Lecture 1: Motivation


What is version control?


In software engineering, version control (also known as revision control, source control, or source code management) is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections of information. — Wikipedia


Version control systems (VCS)

… systems responsible for managing changes


Why use version control?


In an ideal world, things develop linearly:

  • Every new version is an improvement upon the previous version.
    • No need to backtrack.
  • Everyone knows what everyone else is doing.
  • In the end, things are simply finished.
graph LR
    A@{ shape: stadium, label: "Monday's improvements"} --> B@{ shape: stadium, label: "Tuesday's improvements"}
    B --> C@{ shape: stadium, label: "Wednesday's improvements"}

In the real world, things develop non-linearly:

  • A new version can be anything between
    • a complete catastrophe and
    • a major breakthrough.
  • People do not know what others are doing.
  • Sometimes we are simply fixing earlier mistakes…
graph LR
    Mon@{ shape: stadium, label: "Monday's improvements"}
    Tue@{ shape: stadium, label: "Tuesday's mistakes"}
    Wed@{ shape: stadium, label: "Wednesday's corrections"}
    Mon --> Tue
    Tue --> Wed

Going back to an earlier version

Sometimes it is easier to simply backtrack to an earlier version and continue from there:

graph LR
    Mon@{ shape: stadium, label: "Monday's improvements"}
    Tue@{ shape: stadium, label: "Tuesday's mistakes"}
    Wed@{ shape: stadium, label: "Wednesday's improvements"}
    Mon --> Tue
    Mon --> Wed

Where is this earlier version?

  • CTRL + Z
  • my_file.txt, my_file.txt.old, …
  • My project/
    • 2020-08-12/
    • 2020-08-13/
  • Daily home directory backup

Challenges and obstacles

  • Prone to mistakes
    • CTRL + Z has limits, overwritten/deleted files, human/hardware error
  • How much to save?
    • Individual files? Everything? How much space is required?
  • How to organize versions?
    • What is the difference between different versions?

Overall, difficult to manage!


What about the granularity?

graph LR
    subgraph cluster1 [Monday's changes]
        t1a@{ shape: stadium, label: "Component A improvement"}
        t1b@{ shape: stadium, label: "Component B mistake"}
        t1c@{ shape: stadium, label: "Component C improvement"}
    end
    subgraph cluster2 [Tuesday's changes]
        t2a@{ shape: stadium, label: "Component A improvement"}
        t2b@{ shape: stadium, label: "Component B correction"}
        t2c@{ shape: stadium, label: "Component C mistake"}
    end
    subgraph cluster3 [Wednesday's changes]
        t3a@{ shape: stadium, label: "Component A mistake"}
        t3b@{ shape: stadium, label: "Component B improvement"}
        t3c@{ shape: stadium, label: "Component C correction"}
    end
    t1a --> t2a
    t1b --> t2b
    t1c --> t2c
    t2a --> t3a
    t2b --> t3b
    t2c --> t3c

This compounds the problems!


How does a VCS solve this?

  • Stores the history using snapshots (commits)
    • Each snapshot represents the project at a given point in time
  • Manages snapshots and associated metadata
    • Naming (tags), comments, dates, authors, etc.
  • Easy to move between different snapshots
  • Can handle different degrees of granularity
  • Can handle multiple development paths (branches)
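The ideas in the list above can be sketched in a few lines of Python. This is only a toy model: the names `Snapshot` and `ToyRepo` are hypothetical, everything lives in memory, and no real VCS is claimed to store its data this way. It shows snapshots with metadata, a tag, moving between snapshots, and a second development path that starts from Monday's version.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Snapshot:
    """One commit: the project state at a point in time, plus metadata."""
    files: dict              # path -> file contents
    author: str
    message: str
    parent: Optional[int]    # index of the previous snapshot, None for the first
    date: datetime


@dataclass
class ToyRepo:
    """Hypothetical in-memory repository, for illustration only."""
    snapshots: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)       # tag name -> snapshot index
    branches: dict = field(default_factory=dict)   # branch name -> snapshot index

    def resolve(self, ref):
        """Turn an index, tag name or branch name into a snapshot index."""
        if isinstance(ref, int):
            return ref
        if ref in self.tags:
            return self.tags[ref]
        return self.branches[ref]

    def commit(self, branch, files, author, message):
        """Store a new snapshot on top of the given branch."""
        parent = self.branches.get(branch)
        snap = Snapshot(dict(files), author, message, parent,
                        datetime.now(timezone.utc))
        self.snapshots.append(snap)
        self.branches[branch] = len(self.snapshots) - 1
        return self.branches[branch]

    def tag(self, name, ref):
        """Give a snapshot a human-readable name."""
        self.tags[name] = self.resolve(ref)

    def branch(self, name, start):
        """Start a new development path from an existing snapshot."""
        self.branches[name] = self.resolve(start)

    def checkout(self, ref):
        """Return a copy of the files stored in the referenced snapshot."""
        return dict(self.snapshots[self.resolve(ref)].files)


repo = ToyRepo()
mon = repo.commit("main", {"my_file.txt": "Monday's improvements"},
                  "alice", "Improve component A")
repo.tag("v0.1", mon)
repo.commit("main", {"my_file.txt": "Tuesday's mistakes"},
            "bob", "Rework component A")

# Backtrack: start a new path from Monday's version instead of Tuesday's.
repo.branch("retry", "v0.1")
wed = repo.commit("retry", {"my_file.txt": "Wednesday's improvements"},
                  "alice", "Retry from Monday's version")
```

Here `repo.checkout("v0.1")` returns Monday's files, while the `main` and `retry` branches point at Tuesday's and Wednesday's snapshots respectively.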

Comparing and joining

  • A VCS makes it easy to compare different snapshots
    • Named revisions, comments, time information, author information
    • Diff tools
    • Search tools
    • Bisection search
  • A VCS also allows the joining (merging) of different snapshots
    • Easy to experiment with ideas
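As a rough illustration of two of the tools named above, the sketch below uses Python's standard `difflib` module to produce a diff between two versions of a file, and a small function for bisection search. The function names `diff_snapshots` and `first_bad` are made up for this example, and the bisection assumes that the first snapshot is good, the last one is bad, and that every snapshot after the first bad one is also bad.

```python
import difflib


def diff_snapshots(old, new, name="my_file.txt"):
    """Return a unified diff between two versions of one text file."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"{name} (old)", tofile=f"{name} (new)", lineterm=""))


def first_bad(snapshots, is_good):
    """Bisection search: index of the first snapshot for which is_good fails.

    Assumes snapshots[0] is good, snapshots[-1] is bad, and that all
    snapshots after the first bad one are bad as well.
    """
    good, bad = 0, len(snapshots) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(snapshots[mid]):
            good = mid    # the problem was introduced later
        else:
            bad = mid     # the problem was introduced here or earlier
    return bad


# Show what changed between two versions of a file.
for line in diff_snapshots("alpha\nbeta\n", "alpha\ngamma\n"):
    print(line)

# Find the snapshot that introduced a problem, testing only a few of them.
history = ["v1", "v2", "v3 bug", "v4 bug", "v5 bug"]
culprit = first_bad(history, lambda contents: "bug" not in contents)
```

Because the search interval is halved at each step, only about log2(n) snapshots need to be tested out of n, which is what makes bisection practical for long histories.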

Collaboration

  • One of the primary functions of a VCS is to allow collaboration
  • Usual setup: server (remote) + multiple clients
    • People work locally and send (push) the changes to the server
    • VCS keeps track of what has been done and by whom
  • Safer since mistakes can be easily remedied
  • The contributions of several people can be merged
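How the contributions of two people can be merged is easiest to see with a small sketch. The function below is a simplified, hypothetical three-way merge that works on whole files: each snapshot is a dict from path to contents, and `base` is the common version both people started from. Real systems typically merge at the level of lines within a file; that refinement is left out here.

```python
def merge(base, ours, theirs):
    """Three-way merge of snapshots given as {path: contents} dicts.

    Returns (merged, conflicts). A path is reported as a conflict when
    both sides changed it in different ways relative to the common base.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree, or neither changed the file
            result = o
        elif o == b:      # only "theirs" changed the file
            result = t
        elif t == b:      # only "ours" changed the file
            result = o
        else:             # both changed it differently
            conflicts.append(path)
            result = o    # keep "ours" for now; a person has to decide
        if result is not None:    # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
alice = {"a.txt": "2", "b.txt": "1", "c.txt": "A"}   # changed a.txt and c.txt
bob = {"a.txt": "1", "b.txt": "2", "c.txt": "B"}     # changed b.txt and c.txt

merged, conflicts = merge(base, alice, bob)
```

In this example the changes to `a.txt` and `b.txt` are combined automatically, while `c.txt` is flagged as a conflict because both people edited it.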

Backup

  • A VCS also functions as a backup
  • Locally, the system maintains a copy of each file
    • Usually only the changes or the files that have changed are stored
  • Globally, lost files can be recovered from the server
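One way to store only the files that have changed is to identify each file by a hash of its contents, so that identical contents are kept only once. The sketch below illustrates that idea with a hypothetical `ObjectStore` class; it is an assumption made for illustration, not a description of how any particular VCS is implemented.

```python
import hashlib


class ObjectStore:
    """Hypothetical store that keeps each distinct file content only once."""

    def __init__(self):
        self.objects = {}      # content hash -> file contents
        self.snapshots = []    # each snapshot maps path -> content hash

    def save(self, files):
        """Record a snapshot; unchanged files reuse already stored contents."""
        snapshot = {}
        for path, contents in files.items():
            digest = hashlib.sha256(contents.encode("utf-8")).hexdigest()
            self.objects.setdefault(digest, contents)   # stored only if new
            snapshot[path] = digest
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def restore(self, index):
        """Rebuild the files of an earlier snapshot, e.g. after losing them."""
        return {path: self.objects[digest]
                for path, digest in self.snapshots[index].items()}


store = ObjectStore()
first = store.save({"my_file.txt": "Monday", "notes.txt": "unchanged"})
second = store.save({"my_file.txt": "Tuesday", "notes.txt": "unchanged"})
```

After two snapshots of two files each, the store holds three objects rather than four, since `notes.txt` did not change, and `store.restore(first)` still returns Monday's version of both files.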

Integration

  • VCSs such as Git have been integrated with several services
    • HackMD, Overleaf, …
  • Services such as GitHub can do almost everything for you
    • Store history, distribute, testing / continuous integration, bug reports, milestones, website, …

Summing up

A version control system

  • keeps track of your files and other output
  • tracks what is created and modified
  • tracks who made the modifications
  • tracks why the modifications were made (if you write good commit messages)

Practical use cases

What are the practical use cases for VCS?


Source code

  • Many VCSs are designed for managing source code
  • Manage deployment (production, development, testing, etc.)
  • Manage published versions (v0.1 etc)
  • Manage (experimental) features
  • Bug hunting
  • But also for: writers, artists, composers…

LaTeX files

  • Track which version of a manuscript has been
    • submitted,
    • revised and/or
    • accepted
  • Collaboration between several authors

HPC: batch files and data

  • Track different versions of your batch scripts
    • Easy to check the used configuration afterwards
  • Track input and output files
    • Limited to smallish files

Examples of VCS

  • SCCS (Source Code Control System): Generally considered the first VCS. Created in 1972 at Bell Labs. Mostly used on UNIX and designed for source code files.
  • RCS (Revision Control System): First release July 1985. Largely superseded by later systems such as CVS, which began as a wrapper on top of RCS.
  • CVS (Concurrent Versions System): First release July 1986; based on RCS. Centralized version control. Expands on RCS by adding support for repository-level change tracking and a client-server model.
  • Apache Subversion (SVN): Version 1.0 was released in 2004. Created as a successor to CVS by developers familiar with it. Centralized version control.
  • BitKeeper: Initial release May 2000. Distributed version control. Was used for developing the Linux kernel from 2002 to 2005. Originally proprietary; released as open source in 2016. No longer maintained.
  • Git: Started by Linus Torvalds in April 2005, originally for developing the Linux kernel. Distributed version control. Open source.