The Three Sections of a Git Project

To understand Git, one must understand its workflow, which is centered around three main sections: the Working Directory, the Staging Area (also known as the Index), and the Git Directory (Repository).

Working Directory: A single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on your disk for you to use or modify.
Staging Area: A file, generally contained in your Git directory, that stores information about what will go into your next commit. It is a “technical middleman” that allows you to craft commits precisely.
Git Directory (.git): This is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
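
The relationship between the three sections can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class `ToyProject` and its methods are hypothetical names, file contents are plain strings, and nothing here reflects how Git actually lays out data on disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProject:
    """Toy model of the three sections; not Git's real storage format."""
    working_directory: dict = field(default_factory=dict)  # path -> content on disk
    staging_area: dict = field(default_factory=dict)       # path -> content for the next commit
    repository: list = field(default_factory=list)         # committed snapshots, oldest first

    def edit(self, path: str, content: str) -> None:
        """Change a file in the working directory only."""
        self.working_directory[path] = content

    def add(self, path: str) -> None:
        """Copy the file's current content into the staging area."""
        self.staging_area[path] = self.working_directory[path]

    def commit(self) -> dict:
        """Record the staging area (not the working directory) as a snapshot."""
        snapshot = dict(self.staging_area)
        self.repository.append(snapshot)
        return snapshot


project = ToyProject()
project.edit("README", "v1")
project.edit("notes.txt", "draft")
project.add("README")          # only README is staged
project.edit("README", "v2")   # this later edit is not staged
snapshot = project.commit()
print(snapshot)                # {'README': 'v1'}
```

The commit contains the staged version of README and omits notes.txt entirely, which is the sense in which the staging area lets you craft a commit precisely.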

The Git Object Model

Under the hood, Git is essentially a content-addressable filesystem: a simple key-value store. When you insert any piece of content into the Git repository, it gives you back a unique key (by default, the SHA-1 hash of the content plus a short header) that you can use to retrieve that content.
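
As a minimal sketch of the key-value idea, the snippet below computes the key the way Git does for SHA-1 repositories: it hashes a header of the form `<type> <size>\0` followed by the content. The `ObjectStore` class is a hypothetical in-memory stand-in; real Git additionally zlib-compresses each object and writes it under objects/.

```python
import hashlib


def object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 key for an object: hash of '<type> <size>\\0' + payload."""
    header = f"{obj_type} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


class ObjectStore:
    """In-memory stand-in for the object database (hypothetical helper)."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, payload: bytes, obj_type: str = "blob") -> str:
        key = object_id(obj_type, payload)
        self._objects[key] = payload  # the same content always lands on the same key
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ObjectStore()
key = store.put(b"test content\n")
print(key)             # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(store.get(key))  # b'test content\n'
```

The printed key should match what `echo 'test content' | git hash-object --stdin` reports in a SHA-1 repository.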

Three object types make up every snapshot in the Git database (a fourth type, the tag object, stores annotated tags and is not covered here):

1. Blobs (Binary Large Objects)

A blob stores the file data, but not the file name or any metadata. If two files have the exact same content, they will share the same blob in the Git database, regardless of their names.

2. Trees

A tree solves the problem of storing filenames and also allows you to group files together. One tree object contains a list of entries, each of which is the SHA-1 hash of a blob or another tree, along with its associated mode, type, and filename. This is analogous to a directory in a filesystem.
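
The sketch below serializes a tree in the form Git uses for SHA-1 repositories: each entry is `<mode> <name>\0` followed by the 20 raw bytes of the referenced object's ID, with the type implied by the mode (`100644` for a regular file, `40000` for a subtree). The file names and the helper names `object_id` and `tree_payload` are assumptions made for illustration.

```python
import hashlib


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 of '<type> <size>\\0' + payload, as in SHA-1 repositories."""
    header = f"{obj_type} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries as '<mode> <name>\\0<20 raw id bytes>'."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders subtrees as if their names ended with '/'.
        return (name + "/" if mode == "40000" else name).encode()

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out


blob = object_id("blob", b"test content\n")

# Two different file names, in two different trees, point at the same blob.
docs_tree = object_id("tree", tree_payload([("100644", "copy.txt", blob)]))
root_payload = tree_payload([
    ("40000", "docs", docs_tree),
    ("100644", "README", blob),
])
root_tree = object_id("tree", root_payload)
print(root_tree)
```

Because the file name lives in the tree entry rather than in the blob, README and docs/copy.txt share one blob while the trees record where each name appears.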

3. Commits

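A commit object points to a single tree, marking what the project looked like at that point in time. It also records the author and the committer (each with a timestamp), the commit message, and pointers to its parent commit(s): none for the first commit, one for an ordinary commit, and two or more for a merge.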
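
A commit's content is a short block of text, which the following sketch assembles and hashes. The identity string, timestamp, and messages are hypothetical placeholders, and the helper names are assumptions; the tree ID used is the well-known ID of the empty tree in SHA-1 repositories.

```python
import hashlib


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 of '<type> <size>\\0' + payload, as in SHA-1 repositories."""
    header = f"{obj_type} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def commit_payload(tree_id, parent_ids, author, committer, message) -> bytes:
    """Build the text of a commit: tree, parents, author, committer, blank line, message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical identity: name, email, Unix timestamp, timezone offset.
who = "A U Thor <author@example.com> 1700000000 +0000"

first_payload = commit_payload(EMPTY_TREE, [], who, who, "Initial commit")
first = object_id("commit", first_payload)

second_payload = commit_payload(EMPTY_TREE, [first], who, who, "Second commit")
second = object_id("commit", second_payload)

print(second_payload.decode())
```

Since the parent's ID is part of the child's content, each commit ID depends on the entire history behind it.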

The `.git` Directory Structure

If you look inside the hidden .git folder in any repository, you will see the mechanics of how Git works:

config: Project-specific configuration settings.
description: Used by the GitWeb program.
HEAD: Points to the branch you currently have checked out (or directly to a commit when HEAD is detached).
hooks/: Scripts that run on certain events (e.g., pre-commit).
info/: Holds the exclude file, a list of ignore patterns that applies only to this repository and is not shared the way a .gitignore file is.
objects/: The heart of Git—all the blobs, trees, and commits.
refs/: Pointers to commits, organized as branches (refs/heads/), tags (refs/tags/), and remote-tracking branches (refs/remotes/).
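
To show how HEAD and refs/ fit together, here is a small sketch that follows HEAD to a commit ID by reading plain files. It assumes the traditional files-based ref storage and ignores packed refs, so it is not a substitute for `git rev-parse HEAD`. The function name `resolve_head` and the placeholder ID are hypothetical, and the demo builds a throwaway directory rather than touching a real repository.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir: Path) -> Tuple[Optional[str], Optional[str]]:
    """Return (ref_name, commit_id) for HEAD, reading loose files only.

    Simplification: the packed-refs file is not consulted.
    """
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]
        ref_file = git_dir / ref_name
        commit_id = ref_file.read_text().strip() if ref_file.is_file() else None
        return ref_name, commit_id
    return None, head  # detached HEAD: the file holds a commit ID directly


# Placeholder ID for the demo; it does not name a real commit.
PLACEHOLDER_ID = "0123456789abcdef0123456789abcdef01234567"

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text(PLACEHOLDER_ID + "\n")
    result = resolve_head(git_dir)

print(result)  # ('refs/heads/main', '0123456789abcdef0123456789abcdef01234567')
```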

Immutable Data

One of the most powerful aspects of Git’s architecture is that objects are immutable. Once a blob or a commit is written to the database, it cannot be changed. If you modify a file and commit it, Git creates a new blob and a new commit. The old ones remain in the database (until pruned by garbage collection), which is why it is so difficult to truly lose data in Git once it has been committed.
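
Immutability follows from content addressing: a changed file hashes to a different key, so it can only be stored as a new object. The toy store below illustrates this; `AppendOnlyStore` is a hypothetical class, and it omits the garbage collection that can eventually prune unreachable objects in real Git.

```python
import hashlib


class AppendOnlyStore:
    """Toy object database: objects can be added but never changed in place."""

    def __init__(self) -> None:
        self._objects = {}

    def write_blob(self, content: bytes) -> str:
        key = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
        self._objects.setdefault(key, content)  # an existing object is never overwritten
        return key

    def read(self, key: str) -> bytes:
        return self._objects[key]


store = AppendOnlyStore()
old_id = store.write_blob(b"version 1\n")
new_id = store.write_blob(b"version 2\n")  # "modifying" the file creates a second blob

print(old_id != new_id)    # True
print(store.read(old_id))  # b'version 1\n'
```

The first version stays retrievable under its original key after the second is written, which mirrors why committed data is hard to lose.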

Inspecting the Object Database

    # In a real repository, you would use cat-file to see object details:
    # git cat-file -p <hash> prints the content of an object
    # hash-object prints the ID Git would assign to the content; add -w to also store it
    echo "Hello Git" | git hash-object --stdin

Git Architecture: The Three Stages and Internals