How Git Works Internally
Under the hood, Git is a content-addressable filesystem. Every piece of data (file content, directory structure, commit metadata) is stored as an object identified by a unique SHA-1 hash. These objects form a directed acyclic graph (DAG) that represents your entire project history.
Understanding internals isn't just academic — it makes rebase, cherry-pick, reset, and merge conflict resolution intuitive instead of magical.
The Four Object Types
Git stores everything as four types of objects inside the .git/objects/ directory:
flowchart TD
subgraph "Git Object Types"
B["blob<br/>(file content)"]
T["tree<br/>(directory listing)"]
C["commit<br/>(snapshot + metadata)"]
TG["tag<br/>(named pointer to commit)"]
end
C --> T
T --> B
T --> T2["tree (subdirectory)"]
T2 --> B2["blob"]
TG --> C
| Object | What It Stores | Analogy |
|---|---|---|
| Blob | Raw file content (no filename, no permissions) | A file's contents, nothing more |
| Tree | Directory listing — maps names to blobs and sub-trees | A folder listing with permissions |
| Commit | Points to a tree + author + message + parent commit(s) | A snapshot with a sticky note |
| Tag | Points to a commit with a name and optional message | A labeled bookmark |
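To see all four types side by side, here is a minimal sketch that builds a throwaway repository and asks Git for the type of each object. The temporary directory, the `Demo` identity, the file name, and the tag name are placeholders chosen for illustration, not part of the original walkthrough.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
# Placeholder identity so the commit works without any global config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "Hello World" > README.md
git add README.md
git commit -q -m "Initial commit"
git tag -a v1.0 -m "First release"   # -a creates an annotated tag (a real tag object)

# Ask Git for the type of each object
blob_type=$(git cat-file -t HEAD:README.md)
tree_type=$(git cat-file -t 'HEAD^{tree}')
commit_type=$(git cat-file -t HEAD)
tag_type=$(git cat-file -t v1.0)
echo "$blob_type $tree_type $commit_type $tag_type"
```

This should print `blob tree commit tag`: the file content, the root directory listing, the commit, and the annotated tag.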
SHA-1 Hashing
Every object gets a 40-character SHA-1 hash (written in hexadecimal), computed from a short header (the object type and size) followed by its content:
# See the hash of a blob
echo "Hello World" | git hash-object --stdin
## Output: 557db03de997c86a4a028e1ebd3a1ceb225be238
# Same content → same hash (content-addressable)
echo "Hello World" | git hash-object --stdin
## Output: 557db03de997c86a4a028e1ebd3a1ceb225be238 (identical!)
Identical content = identical hash. If two files have the same content, Git stores only ONE blob. If you commit the same file unchanged, Git just reuses the existing blob. This is how Git stays efficient.
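The hash is not taken over the bare file content: Git first prepends a header made of the object type, the size in bytes, and a NUL byte. The sketch below reproduces the blob hash by hand and compares it with Git's answer. It assumes `sha1sum` from GNU coreutils is available (on macOS, `shasum` is the usual equivalent) and that Git is using its default SHA-1 object format.

```shell
set -e
content='Hello World'
size=$(( ${#content} + 1 ))   # 11 characters plus the newline that echo appends

# Hash the header "blob 12", a NUL byte, then the content itself
manual=$(printf 'blob %d\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)
via_git=$(echo "$content" | git hash-object --stdin)

echo "manual:  $manual"
echo "via git: $via_git"
```

Both lines should show the same hash, which is why identical content always maps to the same blob no matter which file or commit it appears in.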
Anatomy of a Commit
A commit object contains:
commit <size>\0
tree 4b825dc642cb6eb9a060e54bf899d15fd71b3ed9 ← points to root tree
parent a1b2c3d4e5f6... ← points to previous commit
author donnyaw <donnyaw@gmail.com> 1708000000 +0800
committer donnyaw <donnyaw@gmail.com> 1708000000 +0800

Add user authentication module ← commit message (separated from the headers by a blank line)
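To watch these fields come together, a commit can be assembled by hand with Git's low-level plumbing commands. This is a sketch in a throwaway repository; the `Demo` identity, the file name, and the commit message are placeholders, and everyday work would use `git add` and `git commit` instead.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
# Placeholder identity used for the author and committer lines
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. Store file content as a blob (-w writes it into .git/objects)
blob=$(echo "Hello World" | git hash-object -w --stdin)

# 2. Stage the blob under a mode and name, then snapshot the index as a tree
git update-index --add --cacheinfo 100644 "$blob" README.md
tree=$(git write-tree)

# 3. Wrap the tree in a commit (no parent given, so this is a root commit)
commit=$(git commit-tree "$tree" -m "Add README")

# Show the result: tree, author, committer, a blank line, then the message
git cat-file -p "$commit"
```

Because no branch points at the new commit yet, it exists only in the object database until a ref is created for it.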
Visualizing a Commit
flowchart TD
C1["Commit c3d4e5<br/>msg: 'Add auth module'<br/>author: donnyaw"]
C1 --> T1["Tree 4b825d<br/>(root directory)"]
T1 --> B1["Blob f1e2d3<br/>README.md"]
T1 --> T2["Tree b2c3d4<br/>src/"]
T2 --> B2["Blob d4e5f6<br/>auth.js"]
T2 --> B3["Blob e5f6a7<br/>login.js"]
C1 -->|parent| C0["Commit a1b2c3<br/>msg: 'Initial commit'"]
The Commit Graph (DAG)
Commits form a Directed Acyclic Graph — each commit points to its parent(s), creating a chain:
gitGraph
commit id: "Initial"
commit id: "Add README"
branch feature/login
commit id: "Add login form"
commit id: "Add auth logic"
checkout main
commit id: "Fix typo"
merge feature/login id: "Merge login"
commit id: "Release v1.0"
Key properties of the DAG:
- Directed — edges point from child to parent (newer to older)
- Acyclic — you can never follow parent pointers back to the same commit
- Immutable — once created, a commit's hash never changes (content-addressable)
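The parent pointers can be inspected directly. The sketch below recreates a small history with one merge in a throwaway repository and lists each commit next to its parents. The branch names follow the diagram above; the `Demo` identity and the use of empty commits are shortcuts taken for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main regardless of local defaults
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "Initial"
git checkout -q -b feature/login
git commit -q --allow-empty -m "Add login form"
git checkout -q main
git commit -q --allow-empty -m "Fix typo"
git merge -q --no-ff -m "Merge login" feature/login

# Each line shows a commit followed by its parent(s); the merge lists two
git rev-list --parents HEAD

# Count the parent lines stored in the merge commit itself
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "merge commit has $parent_count parents"
```

The last line should report 2 parents for the merge, while the root commit appears in the `rev-list` output with no parent after it.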
Branches Are Just Pointers
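A branch is nothing more than a named pointer to a commit, usually stored as a small file containing the commit hash (Git may later consolidate these files into .git/packed-refs):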
# See what a branch really is
cat .git/refs/heads/main
## Output: a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
# It's just 40 characters — a pointer to a commit!
flowchart LR
HEAD["HEAD"] --> main["main"]
main --> C3["Commit 3"]
feature["feature/api"] --> C5["Commit 5"]
C3 --> C2["Commit 2"]
C5 --> C4["Commit 4"]
C4 --> C2
C2 --> C1["Commit 1"]
| Pointer | What It Points To | How It Moves |
|---|---|---|
| Branch (main) | Latest commit on that branch | Advances with each new commit |
| HEAD | The currently checked-out branch (or a commit directly, when detached) | Changes when you git checkout or git switch |
| Tag (v1.0) | A specific commit | Stays put unless explicitly moved or deleted |
Creating a branch means writing 41 bytes (a 40-character hash plus a newline) to a file. That's why creating a Git branch is nearly instant, unlike SVN, where a branch is a copy of a directory inside the repository.
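A quick way to confirm this is to create a branch and compare what it resolves to with what HEAD resolves to. The sketch assumes a throwaway repository using Git's default loose-ref storage; the `Demo` identity is a placeholder, and the file-size check is skipped if the ref is not stored as a plain file.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # HEAD is itself a pointer to a branch
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "Initial"

git branch feature/api                  # creates refs/heads/feature/api

head_commit=$(git rev-parse HEAD)
branch_commit=$(git rev-parse feature/api)
echo "HEAD:        $head_commit"
echo "feature/api: $branch_commit"

# With loose refs, the new branch is just a small text file
ref_file=.git/refs/heads/feature/api
if [ -f "$ref_file" ]; then
  ref_bytes=$(wc -c < "$ref_file" | tr -d ' ')
  echo "ref file size: $ref_bytes bytes"
fi
```

Both names resolve to the same commit, and on a SHA-1 repository with loose refs the file size is 41 bytes: 40 hex characters plus a trailing newline.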
Inside .git/
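A typical repository's .git directory looks like this. git init creates the skeleton; entries such as index, logs/, and refs/remotes/ appear once you start staging, committing, and fetching: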
.git/
├── HEAD ← Points to current branch (e.g., ref: refs/heads/main)
├── config ← Repository-level configuration
├── objects/ ← All blobs, trees, commits, tags (the database)
│ ├── pack/ ← Compressed object storage (for efficiency)
│ └── info/
├── refs/ ← Branch and tag pointers
│ ├── heads/ ← Local branches (main, feature/x)
│ ├── tags/ ← Tags (v1.0, v2.0)
│ └── remotes/ ← Remote-tracking branches (origin/main)
├── index ← The staging area (binary file)
├── hooks/ ← Scripts that run on events (pre-commit, etc.)
└── logs/ ← History of where HEAD and branches pointed
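Not every entry exists from the first moment; some appear on first use. The sketch below lists the directory right after `git init` and again after a first commit. It assumes a throwaway repository with Git's default settings, and the exact listing can vary with the Git version and its templates.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "after init:"
ls .git

echo "Hello World" > README.md
git add README.md                   # the first git add writes .git/index
git commit -q -m "Initial commit"   # with default settings, the first commit starts .git/logs

echo "after first commit:"
ls .git
```

Comparing the two listings shows which parts are the fixed skeleton (HEAD, objects/, refs/) and which are created as you work.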
Explore It Yourself
# See what HEAD points to
cat .git/HEAD
## Output: ref: refs/heads/main
# See the commit hash that main points to
cat .git/refs/heads/main
## Output: a1b2c3d4e5f6...
# Inspect a commit object
git cat-file -p a1b2c3
## Output: tree, parent, author, committer, message
# Inspect a tree object
git cat-file -p <tree-hash>
## Output: list of blobs and sub-trees with permissions, names
# Inspect a blob
git cat-file -p <blob-hash>
## Output: raw file content
How Common Operations Map to Internals
| Command | What Happens Internally |
|---|---|
| git add file.txt | Creates a blob object, updates the index |
| git commit | Creates tree objects from the index and a commit object pointing to the root tree, then moves the current branch to the new commit |
| git branch feature | Writes the current commit hash to .git/refs/heads/feature |
| git checkout feature | Points HEAD at the branch, then updates the index and working directory to match its commit |
| git merge feature | Creates a new commit with two parent pointers (or simply moves the branch pointer forward when a fast-forward is possible) |
| git tag v1.0 | Writes a ref pointing to the current commit (lightweight tag); with -a, also creates a tag object |
| git push | Sends missing objects to the remote repository and updates its refs |
| git gc | Compresses loose objects into pack files and cleans up old unreachable objects |
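Two rows of this table are easy to verify: `git add` creates the blob before any commit exists, and the two kinds of tag differ in whether a tag object is written. A minimal sketch in a throwaway repository follows; the `Demo` identity, the file name, and the tag name `light-v1.0` are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "Hello World" > file.txt
git add file.txt
# The index already records a blob ID, and that object already exists
staged_blob=$(git ls-files -s file.txt | cut -d' ' -f2)
staged_type=$(git cat-file -t "$staged_blob")

git commit -q -m "Add file"
git tag light-v1.0                  # lightweight tag: only a ref to the commit
git tag -a v1.0 -m "Release v1.0"   # annotated tag: a tag object plus a ref

light_type=$(git cat-file -t light-v1.0)
annotated_type=$(git cat-file -t v1.0)
echo "staged: $staged_type, lightweight: $light_type, annotated: $annotated_type"
```

This should print `staged: blob, lightweight: commit, annotated: tag`. The lightweight tag resolves straight to the commit, while the annotated tag resolves to its own tag object.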
Best Practices
- Don't fear internals: understanding objects and refs makes Git predictable
- Use git cat-file: inspect objects when debugging history issues
- Never edit .git/ manually: use Git commands; manual edits can corrupt the repo
- Run git gc periodically: keeps the object database optimized (Git does this automatically in most cases)
Mini Quiz
1. What are the four Git object types?
2. What does a branch actually store?
3. If two files have identical content, how many blobs does Git create?
4. What makes the commit graph a "DAG"?
Answers:
1. Blob, tree, commit, tag.
2. A 40-character SHA-1 hash pointing to a commit.
3. One: same content means same hash.
4. It's Directed (child→parent), Acyclic (can't loop back), and a Graph (multiple paths possible via branches and merges).
What's Next
- Installation & Setup — Get Git installed and configured
- Configuration — Set up your identity and preferences