Git – By example

So, here we go again!

Our purpose here is to take a brief walk through this great VCS, Git. First we introduce the concept of a Version Control System and justify its use. Then we talk a little about the most common Git workflow.

In this guide we focus on users running GNU/Linux who are interested in the command line. This is just the author’s preference and should not be understood as the only way (or even the best way) of using Git. Moreover, we do not cover the installation process, since it differs between systems (e.g., apt-get install git on Ubuntu, pacman -S git on Arch, and so on). So, before continuing, take a look at this page or some equivalent for your distribution.

What is a Version Control System (VCS) and why should you care?

A version control system records changes to a file (or set of files) over time, so that you can recall specific versions later. It also makes it easier to work with other developers, revert files back to a previous state, compare changes over time, see who modified what and when, and more. If you screw things up, you can easily step back with very little overhead.

What is Git?

In April 2005, Linus Torvalds, the same one who created Linux some years before, published the very first version of Git.

As the Linux kernel is a very big and complex project, it needed a proper VCS. From 2002 to 2005, the community used BitKeeper, a proprietary VCS, for the entire project. However, the relationship between the company that owned the software and the Linux community broke down.

Git was born in this scenario, and since it had to support the entire Linux kernel project, it was designed from day one to be incredibly fast, very efficient with large projects and also great with non-linear development.

Nowadays, Git is the most widely used VCS.

Git three states

Some say that this is the main thing to remember about Git. Others say that, if you need to remember something, then remember this. I just say that this is important.

Git has three main states that your files may reside in.

  1. modified: the data was changed but not committed.
  2. staged: the modified data is marked to go into your next commit snapshot.
  3. committed: the data is stored in your local database.

Also, we can define the untracked state, where the file isn’t even tracked by the repository.

Figure: the lifecycle of the status of your files. Image from Git documentation (http://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository).

The basic Git workflow goes something like this:

  1. You modify files.
  2. You stage the files.
  3. You commit the staged files.
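The states above can be watched from the command line. The following is a minimal sketch, assuming a scratch repository in a temporary directory and a hypothetical file called notes.txt; it uses git status --short, whose two-letter codes show where a file currently sits.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > notes.txt
untracked=$(git status --short)   # "?? notes.txt": untracked

git add notes.txt
git commit -q -m "Add notes"
committed=$(git status --short)   # empty: everything is committed

echo "second line" >> notes.txt
modified=$(git status --short)    # " M notes.txt": modified, not staged

git add notes.txt
staged=$(git status --short)      # "M  notes.txt": staged for the next commit

printf '%s\n' "$untracked" "$modified" "$staged"
```

In the short format, the first column describes the staging area and the second column describes the working directory, which is why the M moves from the right to the left after git add.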

Make your bed and wash your teeth

Initially, you will want to set your identity. This is important because Git records this information in every commit. You may also want to change the default text editor and other settings.

We can do this using the git config command. For the settings mentioned, we do:

$ git config --global user.name "Your name"

$ git config --global user.email "your@email.com"

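Which editor Git opens by default depends on your system: it honors the VISUAL and EDITOR environment variables and otherwise falls back to a built-in default (vi in upstream Git; on Ubuntu it usually ends up being nano). If you want to change it to something else, e.g. vim, use: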

$ git config --global core.editor vim

Tip: try using tab, or your autocomplete shortcut, to explore all git config possibilities. Or just read man git-config.
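To check what was actually stored, you can read the values back. This is only a sketch: it points HOME at a throwaway directory (an assumption made here so that the example never touches your real ~/.gitconfig), and on your own machine you would skip those first lines.

```shell
set -e
# Throwaway HOME so this sketch does not modify your real settings (assumption).
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

git config --global user.name "Your name"
git config --global user.email "your@email.com"
git config --global core.editor vim

# Read single values back.
name=$(git config --global --get user.name)
editor=$(git config --global --get core.editor)
echo "$name / $editor"

# List everything stored in the global configuration file.
git config --global --list
```

The --global settings live in a plain text file in your home directory (~/.gitconfig), so you can also open it in an editor.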

Take off

Now we will cover the case where you want to create a brand new repository (the place where everything will be stored). Your code may or may not exist already, but you will start your commit history from scratch.

First, we need a root folder. We will create ours in ~/git/helloworld/.

$ mkdir -pv ~/git/helloworld

$ cd ~/git/helloworld

Great. Now we need to initialize the repository.

$ git init

You can check if everything is ok by running:

$ git status

Your output should be something like this:

On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

Adding a remote repository

There are some places on the internet that you may use to store a copy of your repository. You may do this to back up or share your code, or for any other reason you can imagine. We mention GitHub and Bitbucket, but there are a lot of options. You may even use a local directory for that.

Everything about adding, removing and managing the remote repositories of your code goes through git remote. As an example, we will use our test repository at GitHub, https://github.com/pdroalves/helloworld. Again, for simplicity, in this article we will not cover GitHub stuff, like creating repositories or setting up your account.

To add a new remote repository, we use its address, https://github.com/pdroalves/helloworld.git, and do:

$ git remote add origin https://github.com/pdroalves/helloworld.git

This adds to your local repository a reference to the remote one, under the name “origin”. If some day you want to remove it:

$ git remote rm origin

Notice that both repositories must already exist before you link your local one to the remote. Moreover, you may replace the GitHub link in our example with a local folder, like git/helloworld.

To check your repositories, use:

$ git remote -v
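To try these commands without a GitHub account, a local folder can play the role of the remote. The sketch below assumes temporary directories and a bare repository (created with git init --bare, which is the usual form for a repository that only receives pushes); the paths are hypothetical.

```shell
set -e
base=$(mktemp -d)

# A bare repository standing in for the remote (hypothetical path).
git init -q --bare "$base/helloworld.git"

# The local repository that will reference it.
git init -q "$base/work"
cd "$base/work"

git remote add origin "$base/helloworld.git"
remotes=$(git remote -v)      # one "(fetch)" and one "(push)" line for origin
echo "$remotes"

git remote rm origin
after_rm=$(git remote)        # empty again: no remotes configured
```

git remote -v prints two lines per remote because the address used for fetching and the address used for pushing can be configured separately.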

Cloning a repository

Great, we know how to create repositories and add remotes. But what if we want to download an existing repository? This is done with git clone.

$ git clone https://github.com/pdroalves/helloworld.git

This will create a directory with the repository name (e.g., helloworld), download all the repository data and add the original address to the list of remote repositories as “origin”.

add, commit, modify, add, commit, …

Now we have a Git repository. Let’s add something to it. I propose a very basic, traditional (and useless) Python’s hello world.

$ echo 'print("Hello world!")' > hello_world.py

At this moment, hello_world.py is in the untracked state. Let’s add it to the repository.

$ git add hello_world.py

Good. Now it is in the staged state. Run “git status” again and see with your own eyes. It will be included in the next commit. Commit it!

$ git commit -m "Initial commit."

Great. Your first commit! Check it.

$ git log

Now let’s try something else. Make some change to hello_world.py and add it to the staging area.

$ echo 'print("I see you o/")' >> hello_world.py

$ git add hello_world.py

And make another change, but this time do not add it.

$ echo 'print("Bye world.")' >> hello_world.py

Running git status again, we have:

On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   hello_world.py

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   hello_world.py

Now we have three versions of hello_world.py. The first and oldest is committed in our repository. The second is in the staging area, waiting to be committed or replaced. The third and last is in the modified state. If you commit right now, Git will commit the staged version (the one ending with “I see you o/”).

However, if you want the most recent version to go into the commit, you need to add the file to the staging area again. This happens because, when a file is staged, Git makes a copy of it and puts it someplace safe (but not as safe as a commit).

$ git add hello_world.py

$ git commit -m "My first modification."

And then we have again only one version of our file.
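While the three versions exist side by side, Git can show how they differ. The sketch below rebuilds that situation in a scratch repository (temporary directory and identity are assumptions) and then compares the versions: git diff --staged compares the last commit with the staging area, plain git diff compares the staging area with the working directory, and git show :hello_world.py prints the staged copy itself.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'print("Hello world!")' > hello_world.py
git add hello_world.py
git commit -q -m "Initial commit."

echo 'print("I see you o/")' >> hello_world.py
git add hello_world.py                         # second version: staged
echo 'print("Bye world.")' >> hello_world.py   # third version: only modified

staged_diff=$(git diff --staged)   # last commit vs. staging area
unstaged_diff=$(git diff)          # staging area vs. working directory
staged_copy=$(git show :hello_world.py)

echo "$staged_copy"
```

The staged copy contains the first two lines only, which is exactly what a commit at this point would record.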

Tip: As you may have noticed, in this article we always use the -m parameter with git commit. This is a didactic choice that makes things easier to explain. In daily work, it is often a bad habit. The very purpose of a commit message is to explain to someone else (or to your future self) what you did in that commit, and -m tempts you to write just a few words instead of something more enlightening. In daily work, get used to leaving this parameter aside and take the time to explain what the hell you did in that commit.

Pushing and pulling

If you are working with a remote repository, you will obviously want to send new commits to it and receive new commits from it.

Usually you will first want to receive any new commits that were made while you were working. You do this by pulling from the remote repository.

$ git pull origin

This step may involve a merge. We will talk about it later. Hold on!

If everything is ok, now we push our new commits.

$ git push origin
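One detail worth knowing: on a branch that has never been pushed, Git may refuse a bare git push origin because the branch has no upstream yet. A common way around this is to name the branch once and pass -u, which records the upstream for later pushes and pulls. The sketch below is an assumption-laden playground, not the setup of this article: it uses a local bare repository as the remote and two hypothetical working copies called alice and bob.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"

# First working copy: creates the history and pushes it.
git init -q "$base/alice"
cd "$base/alice"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit."
git remote add origin "$base/remote.git"
git push -q -u origin master              # first push: name the branch, set upstream

# Second working copy: a colleague clones, commits and pushes.
git clone -q -b master "$base/remote.git" "$base/bob"
cd "$base/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "from bob" >> file.txt
git commit -q -am "Change from Bob"
git push -q origin

# Back in the first copy: receive the new commit.
cd "$base/alice"
git pull -q origin
tail -n 1 file.txt
```

After the pull, both working copies point at the same commit, and the last line of file.txt in the first copy is the one written by the second.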

Let’s take a branch

Until now, we have only considered a very basic and unusual scenario, where development is a straight and lonely experience. In the real world, things are much less predictable and well behaved.

Again, for simplicity, I won’t talk about implementation details or anything too complex about how Git works. However, to understand branching, we need to talk about the commit hierarchy.

http://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging
Some commits. Notice how a commit stores a reference to the commit before. Image from Git documentation (http://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository).

Every commit we create (except the very first one) holds a pointer to the commit it succeeds. The main line of commits of our project is called “master” by default and is set as our default branch.
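The pointer to the previous commit can be seen directly. As a small sketch (scratch repository and file names are assumptions), git cat-file -p prints the raw content of a commit, including a “parent” line, and HEAD~1 is shorthand for “the commit before HEAD”.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > f.txt
git add f.txt
git commit -q -m "First"
echo "two" >> f.txt
git commit -q -am "Second"

first=$(git rev-parse HEAD~1)                        # hash of the previous commit
parent_line=$(git cat-file -p HEAD | grep '^parent ')
echo "$parent_line"

# The very first commit has no parent line at all.
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)

git log --oneline
```

The hash printed on the parent line of the second commit is the hash of the first one, which is how Git walks the history backwards.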

However, let’s suppose the following context.

  1. You have working code.
  2. You start working on a new feature.
  3. You receive a call about another critical issue that needs a hotfix immediately.

If you are a careful developer, you know that it isn’t good practice to just leave your current work as it is, switch to another part of the code to fix the issue, and then come back to the feature you were implementing. Sometimes this introduces new bugs. Sometimes it prevents you from testing your hotfix properly. Sometimes it even prevents you from deploying your hotfix.

As a solution for this, we work with branches. The usual workflow is:

  1. You have working code.
  2. You create a new branch called new_amazing_feature.
  3. You start working on a new feature.
  4. You receive a call about another critical issue that needs a hotfix immediately.
  5. You commit your progress and check out the master branch again.
  6. You implement and commit the hotfix on the master branch.
  7. You check out the new_amazing_feature branch again and resume the work.
  8. At the end, you commit your completed feature and merge it into the master branch.
We create a branch for the hotfix. Image from Git documentation (http://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository).
After the branch’s purpose is satisfied, we merge it back into the master branch. Image from Git documentation (http://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository).

In a scenario where we have a lot of developers working on different parts of a project, you can imagine how much easier this branching system makes our lives.

But, what about doing this in practice?

First, we create a new branch and check it out. We use git checkout -b <new_branch_name> for that: it creates the branch and immediately checks it out (if you just want to create it without checking it out, use git branch <new_branch_name>).

$ git checkout -b new_amazing_feature

Do your work.

$ ...

Wow! The hotfix call arrives. Commit your progress, then get back to the master branch and create a branch for the hotfix.

$ git checkout master

$ git checkout -b hotfix

Do your work.

$ ...

Good. At this moment, we have three branches. Check it using:

$ git branch

You should see something like this:

* hotfix
  master
  new_amazing_feature

Write your hotfix and commit it in the hotfix branch. Now, let’s merge it into master, delete the hotfix branch and continue our work on new_amazing_feature.

$ git checkout master

$ git merge hotfix

$ git branch -d hotfix

$ git checkout new_amazing_feature
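Put together, the whole detour can be replayed from start to finish. The following is a sketch under stated assumptions: a scratch repository in a temporary directory, and made-up file names (app.txt, feature.txt) standing in for the “Do your work” steps.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Working code"

# Start the feature, then commit the progress when the call arrives.
git checkout -q -b new_amazing_feature
echo "feature in progress" > feature.txt
git add feature.txt
git commit -q -m "Progress on the new feature"

# Hotfix on its own branch, created from master.
git checkout -q master
git checkout -q -b hotfix
echo "v1 (fixed)" > app.txt
git commit -q -am "Hotfix"

# Merge the hotfix, delete its branch, resume the feature.
git checkout -q master
git merge -q hotfix
git branch -d hotfix
git checkout -q new_amazing_feature

git branch
```

Notice that master now contains the fix but not the unfinished feature, and that new_amazing_feature does not contain the fix until master is merged into it.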

Merging

It would be almost silly to talk about branching and not about merging.

Suppose you are working on a branch where some line X is modified with respect to the master branch. Also suppose that your colleague is working on that same line, but in another branch. He finishes his work. You finish your work. He sends his work to the repository. You send your work to the repository. Well. You can imagine what is coming, right?

At this point, someone will have to decide which version of line X goes into the master branch. Combining two branches is called a merge, and deciding between incompatible changes is called resolving a merge conflict.

Unfortunately, it is not practical to walk through this example using a second person. So, we will force this situation by ourselves.

Create a new repository and initialize it with the following file main.c:

#include <stdio.h>

int main(){
    printf("Hello world!");
    return 0;
}

Do:

$ git init

$ git add main.c

$ git commit -m "Initial commit."

Now, let’s work in two parallel branches.

$ git checkout -b branch_A

Change main.c content to:

#include <stdio.h>

int main(){
    printf("Hello dude!");
    return 0;
}

Add it to the staging area and commit it.

$ git add main.c

$ git commit -m "Changed the output to Hello dude"

Ok. Now, create a new branch from master.

$ git checkout master

$ git checkout -b branch_B

Notice that main.c content is the original one. Change it to:

#include <stdio.h>

int main(){
    printf("Hello mom!");
    return 0;
}

Add it to the staging area and commit it.

$ git add main.c

$ git commit -m "Changed the output to Hello mom"

Aha! Here we go. Go back to master and merge branch_A. We do this with git merge.

$ git checkout master

$ git merge branch_A

Check main.c. Now it has our first modification. Repeat this, but now for branch_B.

$ git merge branch_B

Boom! Git tried to merge branch_B into master, but it noticed that both branches started from the same commit and changed the same line in parallel. Using the default merge strategy, Git couldn’t solve this problem and passed it to the most intelligent person it had access to. You!

You probably received this error:

Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Automatic merge failed; fix conflicts and then commit the result.

It tells you that there is a conflict and which files couldn’t be merged automatically. When you open main.c, you will see something like this:

#include <stdio.h>

int main(){
<<<<<<< HEAD
    printf("Hello dude!");
=======
    printf("Hello mom!");
>>>>>>> branch_B
    return 0;
}

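Git uses lines of “<<<<<<<”, “=======” and “>>>>>>>” to mark where we have a conflict. The first part (between “<<<<<<< HEAD” and “=======”) holds the content of the branch that receives the merge. The second part (between “=======” and “>>>>>>> branch_B”) holds the content of the branch that is being merged. Edit the file so that only what should stay remains, and delete the marker lines. I will say hello to mom!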

#include <stdio.h>

int main(){
    printf("Hello mom!");
    return 0;
}

Add it to the staging area and commit it.

$ git add main.c

$ git commit -m "Merge."

Done. You resolved the conflict and completed the merge! Good work. If you look at git log, you may see something like this:

commit 162e0033c51bf59ed45ae50b0cbc47b2747092d8
Merge: 40e23c4 1844f02
Author: Pedro Alves <pdroalves@gmail.com>
Date:   Tue Mar 3 01:03:07 2015 -0300

    Merge.

commit 1844f02cb3857c99bc1446edd7216d68b18992a1
Author: Pedro Alves <pdroalves@gmail.com>
Date:   Tue Mar 3 00:56:12 2015 -0300

    Changed the output to Hello mom

commit 40e23c44ecb574731857a258f1f901169ff1edcd
Author: Pedro Alves <pdroalves@gmail.com>
Date:   Tue Mar 3 00:55:14 2015 -0300

    Changed the output to Hello dude

commit 052ad3a49feb6fc78aaeae6971be5cf054612be5
Author: Pedro Alves <pdroalves@gmail.com>
Date:   Tue Mar 3 00:54:15 2015 -0300

    Initial commit.

Notice that the most recent commit has a “Merge:” line, which lists the two commits that were merged.

Merging may be as easy as this or as hard as killing Wolverine. So, there are some good tools that you may use to make your life easier. Google it!
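Before reaching for external tools, two built-in commands already help when a merge goes wrong. As a sketch (the scratch repository and the file greeting.txt are assumptions, used instead of main.c to keep it short): git diff --name-only --diff-filter=U lists the files that are still in conflict, and git merge --abort gives up on the merge and restores the state from before it started.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Hello world!\n' > greeting.txt
git add greeting.txt
git commit -q -m "Initial commit."

git checkout -q -b branch_A
printf 'Hello dude!\n' > greeting.txt
git commit -q -am "Changed the output to Hello dude"

git checkout -q master
git checkout -q -b branch_B
printf 'Hello mom!\n' > greeting.txt
git commit -q -am "Changed the output to Hello mom"

git checkout -q master
git merge -q branch_A                           # merges cleanly
merge_output=$(git merge branch_B 2>&1) || true # conflicts; exit status is non-zero

conflicted=$(git diff --name-only --diff-filter=U)   # files still in conflict
echo "$conflicted"

git merge --abort                 # back to the state before "git merge branch_B"
after_abort=$(git status --short)
```

After the abort, the working directory is clean again and greeting.txt holds the content from branch_A, so you can retry the merge later.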

And that’s it?

No! Git has a lot of important features that we didn’t cover in this article. However, we presented the commands and the concepts that you should completely understand before playing with more advanced stuff.

References

We would like to encourage you to take a look at Pro Git, by Scott Chacon and Ben Straub. It is an amazing book that is used as Git’s official documentation. This book is available under a Creative Commons license and in a lot of languages (even Portuguese).
