An Introduction to Git, Part 1 of 3

We’ve been using Git as a versioning tool for most of our projects for several years now. Despite this, several key questions crop up every so often. Many of them center on the topics of branching, merging and rebasing: What are they, how do they work and why would I want to use them? In the hope of reducing the confusion surrounding these topics, I’ve started to write a short series of posts.

Since we figured they might be useful to others as well, we’ve decided to publish them. If you’re a Git novice struggling with this yourself, it is my hope that I can provide at least some answers. If you have additional questions be sure to leave a comment. Found any errors or important omissions? Be sure to point those out as well.

This first post in the series covers the fundamentals of what Git is and how it works, while also delving a bit into the topic of branches. The second post will be a bit more advanced and focus on merging and rebasing, while the third and final post will cover conflicts during merging and rebasing in more detail.

In case you haven’t heard of Git before or are unsure what its uses are, the first section contains a brief summary. It is not an in-depth technical introduction but rather a quick 101 for absolute beginners. Readers who know their way around Git, Subversion or any other versioning tool can probably skip ahead.

What is Git

At its core, Git is a versioning system that allows you to track changes made to one or more files over the course of time. Files to be tracked need to be part of a repository. Changes made to the files inside a repository contribute to that repository’s history. In a typical usage scenario, a repository is used to keep track of the files that belong to a single project. If there is more than one project, each project will usually reside in its own repository. While it is entirely possible to keep multiple projects within a single repository, doing so will often turn the repository’s history into a mess of unrelated changes. Think of a repository’s history as you would of a person’s genealogy: mixing the records of complete strangers into the same family tree would be confusing.

While Git is predominantly used by software developers to keep track of changes made to an application’s source code, it’s not limited to this use case. In fact, novelists and other professionals are using this tool as well. In the case of a novelist, a project could be a book they’re writing and all of their notes relating to that book. Thanks to Git’s history, jumping back and forth between different revisions of a book is very simple.

To keep things simple the rest of this article will only focus on Git’s application as it relates to software development.

Unlike the automatic backups made by some word processors and IDEs, Git does not automatically keep track of changes made to a file. Instead, users have to manually tell Git to commit their work. Think of this like pressing the save button, except that every time the “save” button is pressed, a new entry in the repository’s history is created. The history then allows users to switch back and forth between different revisions of their files. In a typical usage scenario, work is committed whenever an isolated set of changes has been completed. It is generally recommended to keep commits as small as possible. Ideally, each commit should track a single change, which may, of course, span multiple files.

The reasons for this are quite simple. If a commit needs to be reverted it should be possible to do so without losing other changes that need to be kept. If a commit needs to also be applied to another branch it should be possible to do so without introducing other, unrelated changes. Smaller commits lead to less severe merge conflicts (more about this in the next posts). And last but not least, smaller commits are easier to comprehend when reviewing a repository’s history.
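As a minimal sketch of what committing looks like on the command line, the following creates a throwaway repository and records two small commits. The directory, file name, author identity and commit messages are placeholders chosen for illustration.

```shell
# Create a scratch repository in a temporary directory (hypothetical location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# First change: create a file, stage it, and commit it.
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes file"

# Second change: edit the tracked file; -a stages modified tracked files.
echo "second line" >> notes.txt
git commit -q -a -m "Extend notes with a second line"

# Each commit is one entry in the history.
commit_count=$(git rev-list --count HEAD)
git log --oneline
```

Each call to git commit adds exactly one entry to the history, which is why keeping a commit focused on a single change makes it easy to revert or reuse later.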

One other important characteristic of a commit is its commit message. A commit message is like a comment that summarizes the changes introduced by that commit (e.g. “Add feature xyz”). Here’s a quick refresher on why good commit messages are important and how to write them. In addition to writing a commit message, it is also possible to tag a commit. A tag is like a bookmark that makes a particular commit easier to find. In most cases tags are used to mark specific releases of a software product (e.g. “Version 1.0”).
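The sketch below shows a commit message and a tag in practice, assuming a fresh scratch repository. The file name and identity are placeholders; the tag name version-1.0 matches the one used later in this post.

```shell
# Scratch repository with a single commit (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hello" > app.txt
git add app.txt
git commit -q -m "Add feature xyz"     # the commit message summarizes the change

# Put a (lightweight) tag on the commit we are currently on.
git tag version-1.0

# The tag and HEAD resolve to the same commit hash.
tagged=$(git rev-parse version-1.0)
current=$(git rev-parse HEAD)
subject=$(git log -1 --format=%s)
git log --oneline --decorate
```

The decorated log lists the tag next to the commit it marks, which is what makes a release easy to find later on.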

With these basics out of the way, let’s get started on some of the more complicated topics.

Branching

The illustration below shows an example of what the Git history of a very simple software project might look like. At the very least every Git repository has a single branch, by default known as the “master” branch (newer setups often name it “main” instead). Note the orange “master” label at the top of the history. This history also features a single tag, “version-1.0”, that points to a specific commit. The history currently contains six commits, each with a short commit message, the author who committed the change and the date the change was committed. Each commit also has a hash code (SHA-1) that identifies it. To simplify things, most tools also allow the use of a shortened, seven-character notation, as long as it is unique within the repository. The colors in these images are more or less random and carry no special meaning.

linear-workflow

Technically speaking a branch and a tag are both pointers that point to specific commits. However, while the “version-1.0” tag is supposed to stay on that particular commit, the “master” pointer will move whenever new commits to the master branch are made. The “master” pointer always points to the head of the master branch. In this sense a Git repository’s history is like a linked list. Every commit, with the exception of the first one, points to the commit that came before it. The branch pointer points to the head of the list. When a new commit is made, the new commit will then point to the previous commit and the branch’s pointer is automatically updated to point to the new commit.
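The pointer model described above can be inspected directly. The following is a small sketch on a scratch repository; the commit messages and identity are made up, and empty commits are used only to keep the example short.

```shell
# Scratch repository; force the first branch to be called master,
# whatever the local default branch name is.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

git commit -q --allow-empty -m "Release first version"
git tag version-1.0
first=$(git rev-parse HEAD)

# A new commit moves the master pointer; the tag stays where it is.
git commit -q --allow-empty -m "Start work on version 2.0"

tag_commit=$(git rev-parse version-1.0)      # still the first commit
master_commit=$(git rev-parse master)        # the new commit
parent_of_master=$(git rev-parse "master^")  # link back to the previous commit
```

Here master^ follows the link from the newest commit to its parent, which is the "linked list" step, and it ends up at the same commit the tag still points to.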

Let’s imagine the first release of the project above, tagged as “version-1.0”, contained bugs. Some of these bugs are critical and need to be fixed as soon as possible. However, there’s a problem: the second version of that application is already in development. Two additional commits have been made since the first version was released. These commits introduce functional changes that are incomplete and should not be released at this point in time. In other words, the second version of the application is not yet ready for release and won’t be for some time.

To solve this issue we need to go back in time, to the state of the code before the new changes were introduced, and fix the bugs there. Fortunately Git allows us to do just this with the following command:

git checkout version-1.0

This will “check out” the particular commit the tag points to. Instead of using a tag it is also possible to check out any arbitrary commit by entering its (short) hash code like this: “git checkout 87bd80d”. In our case this would have been equivalent to the above command. After the command is entered, Git will print a warning that we are now in “detached HEAD” state. This means we are now working on top of a commit without having a branch (pointer) checked out. Commits made in this state are not referenced by any branch, so once we switch to something else they become hard to find and can eventually be removed by Git’s garbage collection. This is fine as long as we just want to look at this version’s source code or compile an older version for testing.
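One way to see the detached HEAD state for yourself is sketched below on a scratch repository. The check relies on git symbolic-ref, which only succeeds while HEAD refers to a branch; the variable names and commit messages are illustrative.

```shell
# Scratch repository with a tagged release and one later commit on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git commit -q --allow-empty -m "Release first version"
git tag version-1.0
git commit -q --allow-empty -m "Start work on version 2.0"

# Check out the tagged commit; -q hides the detached HEAD advice.
git checkout -q version-1.0

# symbolic-ref fails (quietly, with -q) when HEAD is not on a branch.
if git symbolic-ref -q HEAD >/dev/null; then
  state_on_tag="on a branch"
else
  state_on_tag="detached"
fi

# Switching back to master attaches HEAD to a branch again.
git checkout -q master
if git symbolic-ref -q HEAD >/dev/null; then
  state_on_master="on a branch"
else
  state_on_master="detached"
fi
echo "tag: $state_on_tag, master: $state_on_master"
```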

Since we want to change our code to fix bugs, we need to create a new branch to keep track of these changes for us. We do this with the following command:

git checkout -b feature/version-1.1

This creates a new branch, or more specifically a new pointer, named “feature/version-1.1”, that points to the commit we’re on. The difference between this pointer and the “version-1.0” tag is that it will keep moving along when we start adding new commits. For now the pointer simply points to the same commit as version-1.0:

feature-branch-1.1

This starts to change as soon as new commits are made to the new branch we just created. As illustrated below, the version-1.0 tag stays in place while the feature/version-1.1 pointer moves along with the commits made to this branch.

feature-branch-1.1_fixes
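The same behaviour can be reproduced in a scratch repository. This is only a sketch: the two fix commits are empty placeholders and the commit messages are invented.

```shell
# Scratch repository: a tagged release followed by one commit on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git commit -q --allow-empty -m "Release first version"
git tag version-1.0
git commit -q --allow-empty -m "Start work on version 2.0"

# Go back to the release and create the bug fix branch there.
git checkout -q version-1.0
git checkout -q -b feature/version-1.1
branch_before=$(git rev-parse feature/version-1.1)

# Two fixes: the branch pointer follows each new commit.
git commit -q --allow-empty -m "Fix first critical bug"
git commit -q --allow-empty -m "Fix second critical bug"
branch_after=$(git rev-parse feature/version-1.1)

tag_commit=$(git rev-parse version-1.0)
two_back=$(git rev-parse "feature/version-1.1~2")   # two steps behind the branch tip
```

Right after creation the branch and the tag resolve to the same hash. After the two commits the tag is unchanged, while the branch tip sits two commits ahead of it.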

As can be seen above, two commits were made to the new branch. Let’s assume those commits fix all known bugs in version 1.0. The last commit would now typically be tagged as a new version (in this case version-1.1) and the build created from this version would be released to customers as a bug fix release. Release cycle management is beyond the scope of this article, so let’s focus on another issue this has created for us as developers: There are now two concurrent branches of our code. The “master” branch containing the changes that lead up to version 2.0 and our bug fix branch. Fixes added to the bug fix branch are not part of our current development build. To avoid losing these fixes they need to somehow be introduced back into the main development branch. Git allows us to do this by merging branches. As the name implies merging combines the changes of two branches into one.

A merge does not create a new branch, however. Instead, we need to pick one of the branches as the branch we want to merge into. The changes introduced by the other branch are then merged into this branch. In our case we want to merge the changes introduced by the bug fix branch back into master, so development can continue on the master branch. This can be done by issuing the following commands:

git checkout master
git merge feature/version-1.1

Note the order of commands. The first command switches to the branch we want to merge into. The second command tells Git which branch we want to merge into our current branch. After the bug fix branch has been merged, our commit history will look as shown below. The merge created a new commit that combines the changes introduced by the bug fix branch and the master branch. This commit is special in that it has two parent commits. Since we no longer need to keep track of the bug fix branch, we could now safely delete it. Remember that a branch is actually just a pointer. Deleting the branch will not delete the commits, only the pointer pointing to them. Since master now refers to a commit that in turn references both sets of changes, removing the branch wouldn’t damage anything.

feature-branch-merged
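To make the two-parent merge commit and the harmless branch deletion concrete, here is a sketch that replays the scenario in a scratch repository. File names, commit messages and the identity are placeholders; the branch and tag names follow the ones used in this post.

```shell
# Scratch repository with a tagged release.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "release" > app.txt
git add app.txt
git commit -q -m "Release first version"
git tag version-1.0

# Development continues on master.
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Start work on version 2.0"

# The bug fix branch starts at the tagged release.
git checkout -q -b feature/version-1.1 version-1.0
echo "bug fix" > fix.txt
git add fix.txt
git commit -q -m "Fix critical bug"
fix_commit=$(git rev-parse HEAD)

# Switch to the branch we merge into, then merge the other branch.
git checkout -q master
git merge -q --no-edit feature/version-1.1

# The merge commit lists two parent hashes.
parent_count=$(git log -1 --format=%P | wc -w | tr -d ' ')

# Deleting the branch removes only the pointer; the fix stays reachable.
git branch -q -d feature/version-1.1
if git merge-base --is-ancestor "$fix_commit" master; then
  fix_kept="yes"
else
  fix_kept="no"
fi
echo "parents: $parent_count, fix still in master: $fix_kept"
```

The --no-edit flag accepts the default merge message so that no editor opens, which keeps the sketch non-interactive.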

There’s one issue with this kind of workflow that becomes apparent if we add a few more bug fix releases to our current example. The history will soon become a mess of interleaving lanes, making it difficult to keep track of changes. In addition, this workflow doesn’t play nicely with most automated build tools. Typically, build tools are set up to build one particular branch. Updating the build job every time a new bug fix branch is added is cumbersome.

feature-branches-merged

In the examples above the main development work was done on the master branch, while fixes were created on separate branches that were subsequently merged back into master. This is an overly simplified (and somewhat broken) illustration of a workflow that is better known as git-flow. A better explanation of what git-flow is supposed to look like can be found here: http://nvie.com/posts/a-successful-git-branching-model/.

This concludes the first post in our short series about working with Git, which covered the very basics of branching and merging. Stay tuned for the next post, which will provide more details on the topics of merging and rebasing.

Read now:
Part 2 Git Merge and Rebase
Part 3 Git Conflicts

Image by Flickr @Tamer Shlash