Lecture 3: Basic concepts¶
Remark¶
- You are not expected to memorize any commands or low-level details.
- The goal is to learn the basic concepts:
- hash sums, blobs, trees, commits, references, branches, …
- Understanding these concepts helps to understand what the commands actually do!
What is Git?¶
- Git is a distributed VCS:
- Does not rely on a server-client model.
- Instead, everyone has a full copy of the entire project (repository).
- Complete history, metadata, etc.
- People can work completely independently.
- An (optional) server is used only to distribute changes.
Why use Git?¶
- It is popular.
- Many projects already use it, people know how to use it, people can tell you how to use it, …
- Relies on hash sums:
- Built-in data corruption detection.
- Built-in security.
- Distributed.
- Fast, simple and flexible.
- Free and open-source.
How does Git store the history?¶
What is inside a repository?¶
$ mkdir repository && cd repository
$ git init
Initialized empty Git repository in .../repository/.git/
$ find
Most directories are empty and the files are not that interesting:
$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/description
Unnamed repository; edit this file 'description' to name the
repository.
Let’s add some content:
$ echo "This file is very interesting" > file.txt
$ git add file.txt
$ git commit -m "This is the first commit"
[master (root-commit) 23b3ed5] This is the first commit
1 file changed, 1 insertion(+)
create mode 100644 file.txt
$ find
Working tree¶
- Everything inside repository/ is a part of the working tree (or the workspace).
- .git/ is not included.
- At the moment, the working tree contains just one file, file.txt.
- Working tree is just a regular directory.
- The git add and git commit commands tell Git to care about file.txt.
- More on that later…
Objects¶
- Git stores files etc. as objects:
- Objects are stored under .git/objects/.
- Git uses content-based addressing.
- A hash sum is computed from the content of the object.
- The hash “uniquely” identifies the object.
- Two objects with identical contents have the same hash and are stored only once.
- We can compute the hash manually:
- We can find the corresponding object:
- We can confirm that two files with identical contents have the same hash:
$ cp file.txt file2.txt
$ git hash-object file.txt file2.txt
09c78e6e971ce9e3d69e75bcb3ffd5de05b0d59a
09c78e6e971ce9e3d69e75bcb3ffd5de05b0d59a
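The content-based addressing described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the default SHA-1 object format: Git hashes a short header (`blob <size>` followed by a NUL byte) together with the file content. The function name `git_blob_hash` is made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # echo appends a newline, so the file holds 30 bytes.
    data = b"This file is very interesting\n"
    print(git_blob_hash(data))
    # Identical contents always give identical hashes.
    print(git_blob_hash(data) == git_blob_hash(bytes(data)))
```

The printed hash can be compared against the output of git hash-object for a file with the same content.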
- Note that we do not have to use the entire hash:
- We only need to use as many characters as is required to uniquely identify the object.
- 7-8 is enough in most cases.
- 12 in larger projects.
- If more characters are required, an error message is printed.
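The idea behind abbreviated hashes can be illustrated with a small Python sketch: a prefix is accepted only when exactly one known object starts with it. The function `resolve_prefix` is a hypothetical helper, not part of Git, and the object list is taken from this lecture's example repository.

```python
def resolve_prefix(prefix: str, object_ids: list[str]) -> str:
    """Return the single object ID that starts with prefix, or raise ValueError."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise ValueError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]


# Object IDs taken from this lecture's example repository.
OBJECTS = [
    "09c78e6e971ce9e3d69e75bcb3ffd5de05b0d59a",  # blob
    "1a098a06bf0bcae9695238d9d5cb96345c00cacf",  # tree
    "23b3ed5b16095bb84b18d06734fdd614c8982841",  # commit
]

print(resolve_prefix("23b3ed5", OBJECTS))
```

The more objects a repository holds, the more likely two of them share a short prefix, which is why larger projects tend to need longer abbreviations.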
- Objects cannot (and should not) be accessed directly:
$ hexdump -C ./.git/objects/09/c78e6e97*
00000000 78 01 4b ca c9 4f 52 30 .... |x.K..OR06`...,VH|
00000010 cb cc 49 55 00 d2 65 a9 .... |..IU..e.E...y%.E|
00000020 a9 c5 25 99 79 e9 5c 00 .... |..%.y.\..I.3|
0000002c
- However, we can observe the type and the content of an object:
- It is also important to realize that the object stays even when the file is removed:
$ rm file.txt
$ find
....
./.git/objects/09/c78e6e971ce9e3d69e75bcb3ffd5de05b0d59a
....
$ git cat-file -p 09c78e6e971ce9e3d69e75bcb3ffd5de05b0d59a
This file is very interesting
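The hexdump above looks opaque because a loose object is stored zlib-compressed. The following Python sketch shows the layout under that assumption: the uncompressed bytes are `<type> <size>`, a NUL byte, and then the content. The function names are made up for this illustration.

```python
import zlib


def encode_loose_object(obj_type: str, content: bytes) -> bytes:
    """Build the compressed bytes of a loose object: zlib("<type> <size>\\0<content>")."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return zlib.compress(header + content)


def decode_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


raw = encode_loose_object("blob", b"This file is very interesting\n")
print(decode_loose_object(raw))
```

Passing the bytes of a file under .git/objects/ to `decode_loose_object` should give roughly what git cat-file -t and git cat-file -p report, although the supported way to inspect objects remains git cat-file.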
- We can restore the file from the object:
Let’s take a second look at the repository:
What are these two other objects?
Trees¶
- Let’s investigate one of the remaining objects:
$ git cat-file -t 1a098a06
tree
$ git cat-file -p 1a098a06
100644 blob 09c78e6e971ce9e3d69e75b.... file.txt
- We can see that the type of the object is tree:
- A tree stores pointers to
- files (blobs) and
- other trees.
- Trees are used to represent directory structures.
In this case, the tree has one level and one blob:
[Diagram: tree 1a098a06b... (entry: blob 09c78e6e.... file.txt) --> blob "This file is very interesting"]
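A tree is hashed the same way as a blob, only the content differs. The sketch below assumes SHA-1 object IDs and the usual entry layout: `<mode> <name>`, a NUL byte, and the 20 raw bytes of the referenced object's ID. The name `git_tree_hash` is hypothetical.

```python
import hashlib


def git_tree_hash(entries: list[tuple[str, str, str]]) -> str:
    """Hash a tree from (mode, name, object_id_hex) entries, assuming SHA-1 IDs."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name; directories compare as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>\0" followed by the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The single entry printed by git cat-file -p above.
print(git_tree_hash([("100644", "file.txt", "09c78e6e971ce9e3d69e75bcb3ffd5de05b0d59a")]))
```

Because the tree's hash depends on the hashes of its entries, changing any file changes the hash of every tree above it.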
Let’s take a third look at the repository:
Just one object remains…
Commits¶
- Let’s investigate the last object:
$ git cat-file -t 23b3ed5b
commit
$ git cat-file -p 23b3ed5b
tree 1a098a06bf0bcae9695238d9d5cb96345c00cacf
author Mirko Myllykoski <....@gmail.com> 1600867851 +0200
committer Mirko Myllykoski <....@gmail.com> 1600867851 +0200
This is the first commit
- The type of the object is commit. It contains
- a pointer to a tree,
- an author and a committer (+time), and
- a commit message.
A commit stores the state of the project at a given point in time.
In this case, the commit points to a tree that has one level and one blob:
[Diagram: commit 23b3ed5b1... (tree 1a098a06b, Mirko Myll..., "This is the first commit") --> tree 1a098a06b... (blob 09c78e6e.... file.txt) --> blob "This file is very interesting"]
[Diagram, continued: metadata --> repository/ --> file.txt "This file is very interesting"]
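The commit object can be rebuilt in the same spirit. This is a sketch assuming SHA-1 object IDs and a plain commit without extra headers (such as signatures); the function name and the identity string used below are hypothetical, since the real e-mail address is elided in the output above.

```python
import hashlib


def git_commit_hash(tree: str, parents: list[str], author: str,
                    committer: str, message: str) -> str:
    """Hash a commit built from its tree, parents, identities, and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # a root commit has no parents
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical identity: name, e-mail, Unix timestamp, time zone.
ident = "Jane Doe <jane@example.com> 1600867851 +0200"
print(git_commit_hash("1a098a06bf0bcae9695238d9d5cb96345c00cacf", [],
                      ident, ident, "This is the first commit"))
```

Since the author, the committer, and the time stamps are part of the hashed content, two people committing the same tree generally end up with different commit hashes.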
In a more general case, the associated tree can contain several levels and multiple blobs:
Working with Git¶
Let’s see what else we can find…
HEAD and other references¶
HEAD points (indirectly) to 23b3ed5b1:
$ cat ./.git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
23b3ed5b16095bb84b18d06734fdd614c8982841
[Diagram: HEAD --> master --> commit 23b3ed5b1... --> tree 1a098a06b... --> blob "This file is very interesting"]
- HEAD and master are references.
- A reference points to a commit or to another reference.
- HEAD determines the “most recent” commit.
- Many commands act on the current HEAD.
- master is the current branch (more later).
- You can create a reference yourself:
[Diagram: first --> commit 23b3ed5b1... --> tree 1a098a06b... --> blob "This file is very interesting"]
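Following a reference down to a commit can be sketched as a small loop. The Python example below models the files under .git/ as a dictionary; `resolve_ref` is a hypothetical helper, and the sketch ignores details such as packed references, which Git may store in a separate file.

```python
def resolve_ref(name: str, files: dict[str, str], max_depth: int = 10) -> str:
    """Follow "ref: ..." indirections until a commit ID is reached."""
    for _ in range(max_depth):
        value = files[name].strip()
        if not value.startswith("ref: "):
            return value  # a plain object ID ends the chain
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic references")


# File contents as shown by the cat commands above, keyed by path under .git/.
FILES = {
    "HEAD": "ref: refs/heads/master\n",
    "refs/heads/master": "23b3ed5b16095bb84b18d06734fdd614c8982841\n",
}

print(resolve_ref("HEAD", FILES))
```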
Index (staging area)¶
Let’s repeat some of the earlier steps:
- The git add command creates a blob that corresponds to the updated file.txt file.
- No other objects are created yet.
- The command also adds the file to the index.
- The index will become the next commit.
- Contains a representation of the tree object.
The index is a binary file:
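Only the first few bytes of the index are easy to read by hand. As a minimal sketch, the Python code below parses the 12-byte header, which holds the signature `DIRC`, a version number, and the number of entries, all in big-endian byte order. The function name is hypothetical and the sample header is hand-made for illustration.

```python
import struct


def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# A hand-made header for illustration: version 2 with one entry.
sample = struct.pack(">4sII", b"DIRC", 2, 1)
print(parse_index_header(sample))
```

The entries that follow the header record, among other things, the path and the blob hash of each staged file, which is the information needed to build the next tree object.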
We can now turn the index into the next commit:
$ git commit -m "This is the second commit"
[master d3c6c63] This is the second commit
1 file changed, 1 insertion(+)
$ find
- Just as before, we have a tree object that describes the directory structure:
- And a commit that describes the state of the repository:
$ git cat-file -p d3c6c635
tree 22b5208bebacfcf745691f799b08df492b2a7da9
parent 23b3ed5b16095bb84b18d06734fdd614c8982841
author Mirko Myllykoski <mirko...> 1601228824 +0200
committer Mirko Myllykoski <mirko....> 1601228824 +0200
This is the second commit
Parent¶
- The major difference is that the commit contains a pointer to a parent:
- The parent pointer points to the previous commit:
[Diagram: second commit (tree 22b5208b, parent 23b3ed5b1, Mirko Myll.., "This is the second commit") --> commit 23b3ed5b1... (tree 1a098a06b, Mirko Myll..., "This is the first commit")]
[Diagram, continued: second commit --> tree 22b5208b... (blob 3b23ff0c file.txt) --> blob 3b23ff0c "This file is very interesting / More content"; first commit --> tree 1a098a06b... (blob 09c78e6e.... file.txt) --> blob 09c78e6e... "This file is very interesting"]
Commit tree¶
- Usually, we have a complete tree of commits (commit tree):
- Each commit represents the state of the repository at a given point in time.
- Each commit is allowed to have multiple parents:
- These parents appear when two (or more) branches are merged.
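The history of a commit is simply everything reachable through its parent pointers. The Python sketch below walks those pointers; the dictionary uses the abbreviated commit IDs from this lecture, and `ancestors` is a hypothetical helper. Because a commit may have several parents, the walk keeps track of what it has already visited.

```python
def ancestors(start: str, parents: dict[str, list[str]]) -> list[str]:
    """List start and every commit reachable from it through parent pointers."""
    seen: list[str] = []
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # reached again through another parent
        seen.append(commit)
        stack.extend(parents[commit])
    return seen


# Abbreviated commit IDs from this lecture; each maps to its parents.
PARENTS = {
    "23b3ed5": [],           # first commit (root)
    "d3c6c63": ["23b3ed5"],  # second commit
}

print(ancestors("d3c6c63", PARENTS))
```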
HEAD and other references (again)¶
- Let’s investigate HEAD and master:
$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
d3c6c635fb44c7084797d47050bff7961853c19b
[Diagram: second commit (tree 22b5208b, parent 23b3ed5b1, Mirko Myll.., "This is the second commit") --> commit 23b3ed5b1... (tree 1a098a06b, Mirko Myll..., "This is the first commit"). Working tree: file.txt "This file is very interesting / More content"]
- Remember, many Git commands act on the current HEAD.
- We can change the HEAD to something else:
$ git checkout 23b3ed5b
....
HEAD is now at 23b3ed5 This is the first commit
$ cat .git/HEAD
23b3ed5b16095bb84b18d06734fdd614c8982841
$ cat file.txt
This file is very interesting
[Diagram: second commit (tree 22b5208b, parent 23b3ed5b1, Mirko Myll.., "This is the second commit") --> commit 23b3ed5b1... (tree 1a098a06b, Mirko Myll..., "This is the first commit"); HEAD --> commit 23b3ed5b1.... Working tree: file.txt "This file is very interesting"]
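What happens to the HEAD file during a checkout can be sketched as follows. This is a simplified Python model, assuming the target is either a branch name or a full commit ID; the real command also updates the working tree and the index and accepts abbreviated IDs. The function `checkout` here is a hypothetical stand-in, not Git's implementation.

```python
def checkout(files: dict[str, str], target: str) -> None:
    """Point HEAD at a branch (by name) or directly at a commit (by full ID)."""
    branch_file = f"refs/heads/{target}"
    if branch_file in files:
        files["HEAD"] = f"ref: {branch_file}\n"  # HEAD follows the branch
    else:
        files["HEAD"] = target + "\n"  # HEAD holds a commit ID (detached)


files = {
    "HEAD": "ref: refs/heads/master\n",
    "refs/heads/master": "d3c6c635fb44c7084797d47050bff7961853c19b\n",
}
checkout(files, "23b3ed5b16095bb84b18d06734fdd614c8982841")
print(files["HEAD"].strip())
```

Note that the master file is left untouched: moving HEAD to an older commit does not move the branch.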
Branches¶
- We can modify the working tree and create a new commit:
$ echo "Different content" >> file.txt
$ git commit -a -m "This is the third commit"
[detached HEAD a118ae8] This is the third commit
1 file changed, 1 insertion(+)
- Let’s investigate the newly created commit:
$ git cat-file -p a118ae8c
tree 5fcc4f83fedf5a94cd773704bdb1ab2cdcadc6fd
parent 23b3ed5b16095bb84b18d06734fdd614c8982841
author Mirko Myllykoski <mirko....> 1601286412 +0200
committer Mirko Myllykoski <mirko....> 1601286412 +0200
This is the third commit
- First, the parent points to the first commit:
[Diagram: third commit (parent 23b3ed5b1..., "This is the third commit") --> commit 23b3ed5b1... "This is the first commit"]
- Second, the commit tree now has two branches:
[Diagram: HEAD --> commit a118ae8c... "This is the third commit" --> commit 23b3ed5b1... "This is the first commit"; master --> commit d3c6c635... "This is the second commit" --> commit 23b3ed5b1...]
[Diagram, continued: dotted links from each commit to its blob: third commit to "This file is very interesting / Different content", second commit to blob 3b23ff0c "This file is very interesting / More content", first commit to blob 09c78e6e... "This file is very interesting". Working tree: file.txt "This file is very interesting / Different content"]
We can give the second branch a name:
$ git checkout -b second_branch
Switched to a new branch 'second_branch'
$ cat .git/HEAD
ref: refs/heads/second_branch
$ cat .git/refs/heads/second_branch
a118ae8cda10a8f0a966ab7b9158b4a6d3b48cfc
[Diagram: HEAD --> third commit "This is the third commit" --> commit 23b3ed5b1... "This is the first commit"; master --> commit d3c6c635... "This is the second commit" --> commit 23b3ed5b1...]
[Diagram, continued: dotted links from each commit to its blob: third commit to blob ea5f4b8e "This file is very interesting / Different content", second commit to blob 3b23ff0c "This file is very interesting / More content", first commit to blob 09c78e6e... "This file is very interesting". Working tree: file.txt "This file is very interesting / Different content"]
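The difference between committing on a branch and committing with a detached HEAD comes down to which file gets rewritten. The Python sketch below models this with a dictionary of files under .git/; `record_commit` is a hypothetical helper that only covers the reference update, not the creation of the commit object itself.

```python
def record_commit(files: dict[str, str], new_commit: str) -> None:
    """Update the reference files the way a new commit moves HEAD or a branch."""
    head = files["HEAD"].strip()
    if head.startswith("ref: "):
        # HEAD is attached: the branch file moves, HEAD keeps pointing at the branch.
        files[head[len("ref: "):]] = new_commit + "\n"
    else:
        # HEAD is detached: only HEAD itself moves, no branch follows.
        files["HEAD"] = new_commit + "\n"


# State before the third commit: HEAD detached at the first commit.
files = {
    "HEAD": "23b3ed5b16095bb84b18d06734fdd614c8982841\n",
    "refs/heads/master": "d3c6c635fb44c7084797d47050bff7961853c19b\n",
}
record_commit(files, "a118ae8cda10a8f0a966ab7b9158b4a6d3b48cfc")
print(files["HEAD"].strip())
print(files["refs/heads/master"].strip())
```

This is why the third commit needed a name of its own: without a branch such as second_branch pointing at it, nothing but HEAD would refer to it.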
Merging¶
We can merge the two branches together:
$ git checkout master
$ git merge --no-ff second_branch
Auto-merging file.txt
CONFLICT (content): Merge conflict in file.txt
Automatic merge failed; fix conflicts and then commit the
result.
$ vim file.txt
We fix some conflicts at this point…
The created commit has two parents:
$ git cat-file -p f0d72989
tree f63f3a4c548f5065cee598bed4ae189bd2c099d8
parent d3c6c635fb44c7084797d47050bff7961853c19b
parent a118ae8cda10a8f0a966ab7b9158b4a6d3b48cfc
author Mirko Myllykoski <mirko....> 1601288485 +0200
committer Mirko Myllykoski <mirko....> 1601288485 +0200
Merge branch 'second_branch'
Finally, the commit tree looks as follows:
[Diagram: HEAD --> merge commit "Merge branch 'second_branch'"; master --> merge commit; second_branch --> commit a118ae8c... "This is the third commit"; merge commit --> commit d3c6c635... "This is the second commit" and commit a118ae8c...; both of these --> commit 23b3ed5b1... "This is the first commit"]
[Diagram, continued: dotted links from each commit to its blob: merge commit to blob e51364b9 "This file is very interesting / More content / Different content", third commit to blob ea5f4b8e, second commit to blob 3b23ff0c, first commit to blob 09c78e6e.... Working tree: file.txt "This file is very interesting / More content / Different content"]
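In terms of the data structures, the merge above did two things: it created a commit with two parents and moved the current branch to it. The Python sketch below models only that bookkeeping with abbreviated commit IDs from this lecture. The function `merge_no_ff` is a hypothetical helper, and the merge commit's ID is passed in as an argument, whereas Git computes it from the commit's content.

```python
def merge_no_ff(parents: dict[str, list[str]], refs: dict[str, str],
                current: str, other: str, merge_commit: str) -> None:
    """Record a merge commit and move the current branch to it."""
    # First parent: tip of the current branch. Second parent: tip of the merged branch.
    parents[merge_commit] = [refs[current], refs[other]]
    refs[current] = merge_commit  # the merged branch itself does not move


parents = {"23b3ed5": [], "d3c6c63": ["23b3ed5"], "a118ae8": ["23b3ed5"]}
refs = {"master": "d3c6c63", "second_branch": "a118ae8"}
merge_no_ff(parents, refs, "master", "second_branch", "f0d7298")
print(parents["f0d7298"])
print(refs)
```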
Switching to a specific commit¶
We can always move back to any of the previous commits:
$ git checkout 23b3ed5b1
....
HEAD is now at 23b3ed5 This is the first commit
$ cat file.txt
This file is very interesting
[Diagram: HEAD --> commit 23b3ed5b1... "This is the first commit"; master --> merge commit "Merge branch 'second_branch'"; second_branch --> commit a118ae8c... "This is the third commit"; merge commit --> commit d3c6c635... "This is the second commit" and commit a118ae8c...; both of these --> commit 23b3ed5b1...]
[Diagram, continued: dotted links from each commit to its blob: merge commit to blob e51364b9, third commit to blob ea5f4b8e, second commit to blob 3b23ff0c, first commit to blob 09c78e6e... "This file is very interesting"]
The end.
An idea: Try to play with the different commands. See what happens to the .git/ directory.