Git and Distributed Version Control

Master the art of version control with Git. From basic workflows to advanced branching strategies and internals used in professional DevOps environments.

Official Documentation

February 2026

Foundations

The Evolution of Version Control Systems
Environment Setup and Configuration
Git Architecture: The Three Stages and Internals
The Standard Git Workflow
Branching and Merging Fundamentals

The Evolution of Version Control Systems

The Rationale for Version Control

In modern software engineering, the ability to track changes, collaborate across disparate teams, and maintain a historical record of a project is not merely a convenience—it is a foundational requirement. Version Control Systems (VCS) provide a mechanism for managing changes to source code over time. Without such systems, developers would be forced to manually manage file copies (e.g., project_v1, project_final_v2), which is inherently error-prone and lacks the granularity required for complex systems.

Centralized vs. Distributed Models

Version control systems fall into two primary architectures: Centralized and Distributed.

Centralized Version Control (CVCS)

Systems like Subversion (SVN) and Perforce rely on a single central server that contains all the versioned files. Clients check out files from that central place. This model offers a single point of authority and fine-grained access control. However, it introduces a single point of failure: if the server goes down, collaboration ceases, and if the disk is corrupted without proper backups, the entire history is lost.

Distributed Version Control (DVCS)

Git belongs to the distributed category. In a DVCS, every client maintains a full clone of the repository, including the entire history. This redundancy ensures that if any server dies, any client repository can be used to restore the system. Furthermore, most operations are local, providing significant performance advantages.

The Genesis of Git

Git was created in 2005 by Linus Torvalds during the development of the Linux kernel, following the loss of access to BitKeeper. Torvalds designed Git with several non-negotiable goals:

Speed and Efficiency: Operations must be nearly instantaneous.
Robust Design: Simple data structures that ensure reliability.
Non-linear Development: Seamless support for parallel branching.
Fully Distributed: No reliance on a central server for core operations.
Data Integrity: Cryptographic protection against corruption.

Snapshots, Not Deltas

Unlike older VCS that store deltas (file changes), Git captures a snapshot of the entire project. When you commit, Git records what every tracked file looks like at that moment. If a file has not changed, Git does not store it again; the new snapshot simply links to the identical file already stored, which optimizes storage and retrieval.
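The reuse of unchanged files can be observed directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file names and the identity are placeholders chosen for illustration.

```shell
# Sketch: an unchanged file is stored once and referenced by both snapshots.
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "stable" > unchanged.txt
echo "v1" > changing.txt
git add . && git commit -q -m "First snapshot"

echo "v2, now longer" > changing.txt
git commit -q -am "Second snapshot"

# Object ID recorded for each file in each commit's snapshot
unchanged_first=$(git rev-parse HEAD~1:unchanged.txt)
unchanged_second=$(git rev-parse HEAD:unchanged.txt)
changing_first=$(git rev-parse HEAD~1:changing.txt)
changing_second=$(git rev-parse HEAD:changing.txt)

echo "unchanged.txt: $unchanged_first -> $unchanged_second"
echo "changing.txt:  $changing_first -> $changing_second"
```

The two IDs printed for unchanged.txt are identical, while changing.txt receives a new ID in the second snapshot.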

Data Integrity

Git uses SHA-1 hashes to identify content. Every file or directory is referred to by its checksum, so the records cannot be altered without Git detecting the change. A commit is identified by a 40-character hexadecimal string computed from its content, which makes every recorded state verifiable.

Performance Considerations

Because Git stores the entire history locally, most operations seem nearly instantaneous. For example, to browse the history of a project, Git doesn’t need to contact a server; it reads the log directly from your local database. This architecture enables a workflow where developers can commit frequently and experiment with branches without network overhead.

Understanding the Hash

# Git hashes a short header ("blob <size>" plus a NUL byte) followed by the content,
# so the digest printed here matches what git hash-object reports for the same content
printf 'blob 16\0Initial Content\n' | openssl sha1

Environment Setup and Configuration

Multi-Platform Installation

Git is designed to be portable across all POSIX-compliant systems and Windows. In academic and professional settings, you will likely encounter a heterogeneous environment where developers use different operating systems.

Installation via Package Managers

The preferred method for installing Git is through the system’s native package manager to ensure compatibility and easy updates.

Linux and BSD

On Debian-based systems (Ubuntu, Mint):

sudo apt update && sudo apt install git

On Red Hat-based systems (Fedora, RHEL):

sudo dnf install git

On Arch Linux:

sudo pacman -S git

On FreeBSD:

pkg install git

macOS

While macOS provides a version of Git through the Xcode Command Line Tools, many developers prefer the more up-to-date version from Homebrew:

brew install git

Windows

For Windows, Git for Windows is the standard distribution. It includes Git Bash, a Bash emulation environment that helps keep scripts compatible across teams. You can install it via Winget:

winget install --id Git.Git -e --source winget

Initial Configuration: The Identity

Git records the identity of the author for every commit. This is not for authentication, but for accountability and metadata. These settings are stored in the ~/.gitconfig file (or %USERPROFILE%\.gitconfig on Windows).

git config --global user.name "Leonardo da Vinci"
git config --global user.email "leo@renaissance.org"

The --global flag ensures that these settings are applied to every repository on your machine. For project-specific identities (e.g., using a work email for a specific repo), you can omit the flag while inside that repository.
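As a minimal sketch of a project-specific identity, assuming a throwaway repository and a hypothetical work address, the same command without --global writes to that repository's own .git/config:

```shell
# Sketch: a repository-level identity set without --global.
repo=$(mktemp -d)                              # throwaway directory for the demo
cd "$repo"
git init -q

# No --global flag: the value is written to this repository's .git/config only
git config user.email "leo@work.example"       # hypothetical work address

effective_email=$(git config user.email)       # value Git will use for commits here
local_email=$(git config --local user.email)   # value stored at repository level

echo "Effective email in this repository: $effective_email"
git config --show-origin user.email            # also prints the file the value came from
```

Repository-level values take precedence over global ones, so commits made in this repository carry the work address while other repositories keep the global identity.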

The Problem of Line Endings: CRLF vs. LF

One of the most common issues in cross-platform development is how different operating systems handle the end of a line in a text file.

Windows: Uses Carriage Return (CR) and Line Feed (LF) together (\r\n).
Linux/macOS/BSD: Use only Line Feed (LF) (\n).

If not managed, Git will see the change in line endings as a change to the entire file, leading to “merge hell.”
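The difference is invisible in most editors but real at the byte level. The sketch below, which uses hypothetical file names in a temporary directory, writes the same visible text with each ending and compares the results:

```shell
# Sketch: identical visible text, different bytes on disk.
cd "$(mktemp -d)"                 # throwaway directory for the demo

printf 'hello\r\n' > crlf.txt     # Windows-style ending (CR + LF)
printf 'hello\n'   > lf.txt       # Unix-style ending (LF only)

crlf_size=$(wc -c < crlf.txt)     # 7 bytes: five letters, CR, LF
lf_size=$(wc -c < lf.txt)         # 6 bytes: five letters, LF

od -c crlf.txt                    # dump shows \r \n at the end of the line
od -c lf.txt                      # dump shows only \n

if cmp -s crlf.txt lf.txt; then
    echo "files are identical"
else
    echo "files differ, so a version control system sees the line as changed"
fi
```

Because every line of a converted file differs by one byte, an unmanaged conversion makes the whole file appear modified.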

Configuration for Windows Users

You should configure Git to convert LF to CRLF when checking out code, and convert CRLF back to LF when committing:

git config --global core.autocrlf true

Configuration for Linux/macOS/BSD Users

You should ensure that Git only converts CRLF to LF on commit, but doesn’t do anything on checkout:

git config --global core.autocrlf input

Configuring the Default Editor

If no editor is configured (through core.editor or the VISUAL and EDITOR environment variables), Git falls back to vi. If you are not comfortable with modal editors, you should change it to a simpler one like nano or a code editor like VS Code:

# To use Nano
git config --global core.editor "nano"

# To use VS Code
git config --global core.editor "code --wait"

Verification of Configuration

# List all active configuration settings
git config --list

Identity and Security

While user.name and user.email identify you in the history, they do not verify your identity. In a DevOps pipeline, you will typically use SSH Keys or Personal Access Tokens (PAT) for authentication with remotes like GitHub or GitLab.

Generating an SSH Key

If you prefer SSH for secure communication without typing passwords:

ssh-keygen -t ed25519 -C "your_email@example.com"

This generates a public/private key pair. You would then provide the public key (~/.ssh/id_ed25519.pub) to your Git hosting provider.

Checking your Environment

# Show which file each configuration value comes from (first five entries)
git config --list --show-origin | head -n 5

Git Architecture: The Three Stages and Internals

The Three Sections of a Git Project

To understand Git, one must understand its workflow, which is centered around three main sections: the Working Directory, the Staging Area (also known as the Index), and the Git Directory (Repository).

Working Directory: A single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on your disk for you to use or modify.
Staging Area: A file, generally contained in your Git directory, that stores information about what will go into your next commit. It is a “technical middleman” that allows you to craft commits precisely.
Git Directory (.git): This is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
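A file's movement through the three sections can be followed with git status. This is a minimal sketch, assuming a throwaway repository, a placeholder identity, and a hypothetical file named notes.txt:

```shell
# Sketch: one file moving from working directory to staging area to repository.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "draft" > notes.txt
state_working=$(git status --short)        # "?? notes.txt": exists only in the working directory

git add notes.txt
state_staged=$(git status --short)         # "A  notes.txt": recorded in the staging area

git commit -q -m "Add notes"
state_committed=$(git status --short)      # empty: the snapshot is stored in the Git directory

echo "working:   $state_working"
echo "staged:    $state_staged"
echo "committed: ${state_committed:-<clean>}"
```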

The Git Object Model

Under the hood, Git is essentially a content-addressable filesystem. It is a simple key-value store. When you insert any piece of content into the Git repository, it gives you back a unique key (the SHA-1 hash) that you can use to retrieve that content.

Git has four object types: blobs, trees, commits, and annotated tags. The first three make up every snapshot and are the primary types discussed here:

1. Blobs (Binary Large Objects)

A blob stores the file data, but not the file name or any metadata. If two files have the exact same content, they will share the same blob in the Git database, regardless of their names.
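The sketch below illustrates this with git hash-object, which computes the ID a blob would receive. The repository location and file names are hypothetical.

```shell
# Sketch: the blob ID depends only on content, never on the file name.
repo=$(mktemp -d)                 # throwaway directory for the demo
cd "$repo"
git init -q

echo "same content" > a.txt
echo "same content" > copy-of-a.txt
echo "different content" > b.txt

id_a=$(git hash-object a.txt)
id_copy=$(git hash-object copy-of-a.txt)
id_b=$(git hash-object b.txt)

echo "a.txt:         $id_a"
echo "copy-of-a.txt: $id_copy"    # same ID as a.txt
echo "b.txt:         $id_b"       # different ID
```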

2. Trees

A tree solves the problem of storing filenames and also allows you to group files together. One tree object contains a list of entries, each of which is the SHA-1 hash of a blob or another tree, along with its associated mode, type, and filename. This is analogous to a directory in a filesystem.

3. Commits

A commit object points to a single tree, marking what the project looked like at that point in time. It also contains the author, the committer, a timestamp, a message, and pointers to the parent commit(s).
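The relationship between commits, trees, and blobs can be inspected with git cat-file. The following sketch assumes a throwaway repository with a placeholder identity and two hypothetical files:

```shell
# Sketch: walking from a commit to its tree and on to the blobs.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

mkdir src
echo "int main(void) { return 0; }" > src/main.c
echo "Demo" > README.md
git add . && git commit -q -m "Initial snapshot"

git cat-file -p HEAD               # tree ID, author, committer, and message
git cat-file -p 'HEAD^{tree}'      # one entry per item: mode, type, ID, name

commit_type=$(git cat-file -t HEAD)            # commit
tree_type=$(git cat-file -t 'HEAD^{tree}')     # tree (the project root)
subtree_type=$(git cat-file -t HEAD:src)       # tree (a subdirectory)
blob_type=$(git cat-file -t HEAD:README.md)    # blob (file content)
echo "$commit_type -> $tree_type -> $subtree_type / $blob_type"
```

The first commit in a repository has no parent line; later commits list one parent, and merge commits list two or more.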

The `.git` Directory Structure

If you look inside the hidden .git folder in any repository, you will see the mechanics of how Git works:

config: Project-specific configuration settings.
description: Used by the GitWeb program.
HEAD: Points to the branch you currently have checked out.
hooks/: Scripts that run on certain events (e.g., pre-commit).
info/: Holds the exclude file, a repository-local list of ignore patterns that is not shared with collaborators the way .gitignore is.
objects/: The heart of Git—all the blobs, trees, and commits.
refs/: Pointers to commits, organized as branches (refs/heads), tags (refs/tags), and remote-tracking branches (refs/remotes).
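A freshly initialized repository is enough to see this layout. This is a minimal sketch, assuming a throwaway directory and the default files-based ref storage; the branch name main is set explicitly so the result does not depend on local defaults.

```shell
# Sketch: looking inside .git right after initialization.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main regardless of local defaults

ls .git                                    # config, description, HEAD, hooks, info, objects, refs

head_content=$(cat .git/HEAD)
echo "$head_content"                       # a symbolic reference to the current branch
```

HEAD is a plain text file, which is why switching branches is so cheap: Git mostly rewrites this one line and updates the working directory.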

Immutable Data

One of the most powerful aspects of Git’s architecture is that objects are immutable. Once a blob or a commit is written to the database, it cannot be changed. If you modify a file and commit it, Git creates a new blob and a new commit. The old ones remain in the database (until pruned by garbage collection), which is why it is so difficult to truly lose data in Git once it has been committed.
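Immutability can be checked by modifying a file and then reading the old blob back by its ID. The sketch below assumes a throwaway repository, a placeholder identity, and a hypothetical file name:

```shell
# Sketch: a new commit adds new objects; the old blob stays readable.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "version one" > file.txt
git add file.txt && git commit -q -m "Add file"
old_blob=$(git rev-parse HEAD:file.txt)

echo "version two, revised" > file.txt
git commit -q -am "Update file"
new_blob=$(git rev-parse HEAD:file.txt)

# The old blob was not overwritten; it can still be retrieved by its ID
old_content=$(git cat-file -p "$old_blob")
echo "old blob $old_blob still holds: $old_content"
echo "new blob $new_blob holds:       $(git cat-file -p "$new_blob")"
```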

Inspecting the Object Database

# Compute the object ID Git would assign to this content
echo "Hello Git" | git hash-object --stdin

# In a real repository, print the content of a stored object by its hash
# git cat-file -p <hash>

The Standard Git Workflow

Initiating a Repository

The lifecycle of a Git project begins with the init command. This creates the .git directory and sets up the necessary infrastructure for tracking.

mkdir my-university-project
cd my-university-project
git init

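Git also sets up a default branch (usually named master or main, depending on the init.defaultBranch setting), although the branch only comes into existence with the first commit. From this point forward, Git can track changes to the files in this directory once you add them.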

Monitoring Status

The most frequently used command is git status. It provides a summary of which files are in which state (Tracked, Untracked, Modified, Staged).

git status

Staging and Committing

The workflow in Git is a two-step process: Staging and Committing.

Staging with `git add`

Staging allows you to group related changes together. You might have changed ten files, but only some of them are related to a specific bug fix. You can stage just those files:

git add file1.c file2.c

Committing with `git commit`

A commit is a permanent snapshot. It is critical to write descriptive commit messages that explain why a change was made, not just what was changed.

git commit -m "Refactor memory allocation in parser to prevent overflow"

Observing Changes

To see exactly what has changed in your working directory but has not yet been staged, use git diff:

git diff

Once staged, you can use git diff --staged to see what is ready to be committed.
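The two forms of git diff answer different questions, which the following sketch makes visible. It assumes a throwaway repository, a placeholder identity, and a hypothetical file named notes.txt.

```shell
# Sketch: git diff covers unstaged changes, git diff --staged covers staged ones.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "line one" > notes.txt
git add notes.txt && git commit -q -m "Add notes"

echo "line two (new)" >> notes.txt
unstaged_before=$(git diff --name-only)            # notes.txt: modified, not yet staged
staged_before=$(git diff --staged --name-only)     # empty: nothing staged yet

git add notes.txt
unstaged_after=$(git diff --name-only)             # empty: the change moved to the staging area
staged_after=$(git diff --staged --name-only)      # notes.txt: ready to be committed

echo "before add: unstaged=[$unstaged_before] staged=[$staged_before]"
echo "after add:  unstaged=[$unstaged_after] staged=[$staged_after]"
```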

Exploring History

The git log command displays the commit history in reverse chronological order.

git log --oneline --graph --decorate

Walking through a workflow

# 1. Initialize the repository
git init

# 2. Create a file and stage it
echo "Hello" > README.md
git add README.md

# 3. Commit the staged snapshot
git commit -m "Initialize project with README"

Excluding Files: The `.gitignore`

In any project, there are files that you never want to track:

Build artifacts (*.o, *.exe, bin/)
User-specific IDE settings (.vscode/, .idea/)
Sensitive information (.env, secrets.json)
Dependency folders (node_modules/, venv/)

A .gitignore file is a text file where each line contains a pattern for files/directories to ignore.

# Ignore all object files
*.o

# Ignore the build directory
/build/

# Ignore sensitive files
.env
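To confirm that patterns behave as intended, git check-ignore reports which paths Git would ignore. The sketch below recreates the patterns above in a throwaway repository; the file names are hypothetical.

```shell
# Sketch: verifying ignore patterns with git check-ignore.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q

printf '%s\n' '*.o' '/build/' '.env' > .gitignore

mkdir build
touch main.c main.o build/app .env

# Prints only the paths that match an ignore pattern (main.c is not among them)
ignored=$(git check-ignore main.o build/app .env main.c)
echo "$ignored"

# Untracked files Git still reports: .gitignore itself and main.c
visible=$(git status --short)
echo "$visible"
```

Adding the -v option to git check-ignore also shows which pattern and which line of which file caused each match.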

Atomic Commits

A “Best Practice” in DevOps is the concept of Atomic Commits. Each commit should represent a single logical change. If you are halfway through a feature and you find a typo in a completely different part of the codebase, don’t include the typo fix in your feature commit. Commit them separately. This makes the history easier to read, revert, and debug (e.g., using git bisect).

Naming the Branch

# Find which branch you are currently on
git branch --show-current

Branching and Merging Fundamentals

The Lightweight Nature of Branches

In many older VCS, branching involved creating a full copy of the source code, which was slow and expensive. In Git, a branch is simply a lightweight, movable pointer to one of the commits in the repository. The default branch is typically named main or master, depending on your configuration. When you create a new branch, Git creates a new pointer; it does not duplicate any file content.

Each pointer is a tiny file (41 bytes) containing the 40-character SHA-1 checksum of the commit it points to, followed by a newline.
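The size of a branch pointer can be measured directly. This sketch assumes a throwaway repository using the default SHA-1 object format and loose (unpacked) refs, which is what a fresh repository normally has; the identity is a placeholder.

```shell
# Sketch: a branch is a small file holding a commit ID.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main regardless of local defaults
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

git commit -q --allow-empty -m "Initial"
git branch testing

ref_target=$(cat .git/refs/heads/testing)      # the commit ID the branch points to
ref_size=$(wc -c < .git/refs/heads/testing)    # 40 hex characters plus a newline
head_commit=$(git rev-parse HEAD)

echo "testing -> $ref_target ($ref_size bytes)"
```

In long-lived repositories Git may pack refs into a single .git/packed-refs file, so the individual file is not guaranteed to exist there.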

Creating and Switching Branches

To create a new branch named testing:

git branch testing

However, creating a branch does not switch you to it. To start working on that branch, you must “check it out” or “switch” to it:

git switch testing

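(Note: In older tutorials, you will see git checkout testing used for the same purpose, or git checkout -b testing to create and switch in one step. Modern Git offers git switch, with git switch -c testing as the one-step form, because it is more intuitive.)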

The Mechanics of Merging

Merging is the process of bringing changes from one branch into another. There are two primary types of merges you will encounter.

1. Fast-Forward Merge

If the branch you are merging into is a direct ancestor of the branch you are merging (i.e., there have been no other commits on the base branch), Git simply moves the pointer forward. No new commit is created.

git switch main
git merge feature-x
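A fast-forward can be reproduced end to end in a scratch repository. The following is a minimal sketch, assuming a throwaway directory, a placeholder identity, and empty commits standing in for real work:

```shell
# Sketch: a fast-forward merge only moves the branch pointer.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main regardless of local defaults
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

git commit -q --allow-empty -m "Base"
git switch -q -c feature-x
git commit -q --allow-empty -m "Feature work"

git switch -q main                         # main has no new commits of its own
git merge feature-x                        # Git reports a fast-forward

main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature-x)
commit_count=$(git rev-list --count main)  # still 2: no merge commit was added

echo "main:      $main_tip"
echo "feature-x: $feature_tip"
```

Both branches end up pointing at the same commit, and the history contains only the two original commits.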

2. Three-Way Merge

If the history has diverged (i.e., both main and feature-x have new, different commits), Git performs a three-way merge. It looks at three snapshots:

The common ancestor (the “Base”).
The tip of Branch A.
The tip of Branch B.

Git creates a new “Merge Commit” that has two parents.
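The two parents of a merge commit can be seen by letting both branches diverge before merging. This sketch assumes a throwaway repository, a placeholder identity, and hypothetical file names that do not overlap, so no conflict arises:

```shell
# Sketch: diverged branches produce a merge commit with two parents.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main regardless of local defaults
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "base" > base.txt
git add . && git commit -q -m "Base"

git switch -q -c feature-x
echo "feature" > feature.txt
git add . && git commit -q -m "Feature work"

git switch -q main
echo "main" > main.txt
git add . && git commit -q -m "Main work"

git merge -q --no-edit feature-x           # histories diverged, so a merge commit is created

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
second_parent=$(git rev-parse 'HEAD^2')    # the tip of the branch that was merged in
feature_tip=$(git rev-parse feature-x)

git log --oneline --graph
```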

Introduction to Merge Conflicts

A conflict occurs when the same part of the same file was modified differently in both branches being merged. Git will stop and ask you to resolve the conflict manually.

<<<<<<< HEAD
printf("Hello from Main\n");
=======
printf("Hello from Feature\n");
>>>>>>> feature-x

You must edit the file, choose the correct version (or combine them), remove the markers, and then git add and git commit to complete the merge.
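The full cycle of provoking and resolving a conflict can be rehearsed safely in a scratch repository. The sketch below assumes a throwaway directory, a placeholder identity, and a hypothetical file named greeting.txt; the resolution simply overwrites the file, standing in for manual editing.

```shell
# Sketch: creating a conflict, then resolving it and completing the merge.
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main regardless of local defaults
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

printf '%s\n' 'Hello' > greeting.txt
git add . && git commit -q -m "Base"

git switch -q -c feature-x
printf '%s\n' 'Hello from Feature' > greeting.txt
git commit -q -am "Feature greeting"

git switch -q main
printf '%s\n' 'Hello from the Main branch' > greeting.txt
git commit -q -am "Main greeting"

# Both branches changed the same line, so the merge stops
git merge feature-x || echo "Merge stopped: resolve greeting.txt"
conflicted=$(git diff --name-only --diff-filter=U)   # files still in conflict
cat greeting.txt                                     # shows the conflict markers

# Resolve: write the final content, stage it, and conclude the merge
printf '%s\n' 'Hello from Main and Feature' > greeting.txt
git add greeting.txt
git commit -q --no-edit

final_content=$(cat greeting.txt)
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
```

If a merge goes wrong before it is concluded, git merge --abort returns the repository to its state before the merge began.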

Switching and Creating

# Create and immediately switch to 'dev' branch
git switch -c dev

Visualizing Branches

# Create a repository with one commit, add a branch, and show where each branch points
git init
git commit --allow-empty -m "Initial"
git branch feature
git log --oneline --decorate --all
