Version control, source control, total control

When assisting companies with their development process, the topic of version control inevitably comes up. While developers have very strong opinions about version control, project leads are often ambivalent about the subject, even though they should not be. Git, the system at the center of most of these conversations, is an open-source, distributed version control system that lets every member of a software development team maintain a complete local copy of a project's codebase.

As with any other facet of a project, it is vital that project leads understand how a change in version control techniques affects team dynamics and what extra work may result. In Part 1 of this article series, we will "flip the script" and examine Git's implications for the team and the overall organization from an interpersonal perspective. If we do not understand the full implications of what we are doing, we could unwittingly create obstacles for ourselves and others.

 

Source control is a complex subject, especially as it relates to Git. A good place to begin is to introduce some key concepts and terminology and review what excites your developers about a proposed move to Git (or why they may be dreading the switch). 

  • Version control: Any system that tracks how things change from one edition to another. Modern versions of MS Word® technically have version control, since Word can show you how a document changes over time. Good version control can show who made each edit and what the edit was, and can retrieve old versions on command.
  • Source control: This term usually means version control for source code, but it is often used interchangeably with version control. This article will continue that pattern, but please note that the distinction may be important to your work. For example, version control for architectural drawing files looks nothing like source code version control such as Git.
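  • Commit: A chunk of work registered to source control. It need not contain entire files; most good source control will detect which parts of existing files have changed. What happens to committed work next depends on the source control being used: in Git, for example, a commit stays on your machine until you push it, while in Apache Subversion a commit goes straight to the central server.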
  • Branch: A named sequential group of commits. As someone does their work, they add their commits to one or more branches. The top (or "tip") of the branch represents the current state of the work on that branch. A team will agree upon a central, "canonical" branch to base new work on and to track accepted, finished work. Tradition dictates that this branch be called "master" or "main", but it can be called anything.
  • Merge: Combines the work of two or more branches. Most source control will merge the work together so the result reflects both branches, though sometimes it needs help deciding how things should look. If two branches change the same part of a file in different ways, a dreaded merge conflict results. Merge conflicts require a person to decide what the merged result should be. Some file types cannot be merged at all, and version control will make you choose one of the versions.
  • Checkout: Declares to version control, "I want to use this branch now." Git is more flexible than many other solutions in this regard: it lets users check out any identifiable point in history, such as a commit or a tag, or even individual files from that point, but checking out a branch is the default assumed usage.
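For readers who think best in code, here is a toy Python model of these terms. It is a sketch under simplifying assumptions, not how Git actually stores data, and the `Commit` and `Repo` classes and their methods are hypothetical. It does reflect one real Git idea, though: a branch is little more than a name pointing at its latest commit, and the "sequential group" is the chain of parent links behind that commit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: a message plus links to its parent commit(s)."""
    message: str
    parents: tuple = ()


@dataclass
class Repo:
    """Toy repository: each branch is just a name pointing at its tip commit."""
    branches: dict = field(default_factory=dict)
    current: str = "main"

    def commit(self, message: str) -> Commit:
        # The new commit's parent is the current tip; the branch then moves forward.
        parent: Optional[Commit] = self.branches.get(self.current)
        new = Commit(message, (parent,) if parent is not None else ())
        self.branches[self.current] = new
        return new

    def branch(self, name: str) -> None:
        # Creating a branch only records a new name for the current tip commit.
        self.branches[name] = self.branches[self.current]

    def checkout(self, name: str) -> None:
        # "I want to use this branch now."
        self.current = name

    def merge(self, other: str) -> Commit:
        # A merge commit has two parents: our tip and the other branch's tip.
        ours, theirs = self.branches[self.current], self.branches[other]
        new = Commit(f"Merge {other} into {self.current}", (ours, theirs))
        self.branches[self.current] = new
        return new


repo = Repo()
repo.commit("Initial commit")
repo.branch("feature")
repo.checkout("feature")
repo.commit("Add feature")
repo.checkout("main")
repo.commit("Fix typo on main")
merged = repo.merge("feature")
print([p.message for p in merged.parents])  # ['Fix typo on main', 'Add feature']
```

Because creating a branch only records a new name, branches are cheap in Git, which is part of why Git-centric workflows use so many of them.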


What Git is, why Git is

Git is a version control system that Linus Torvalds created in 2005 to replace the source control tool then used for Linux kernel development. The Linux kernel context is important here: hundreds of developers submit their changes as patches over email, and anyone needs to be able to see the current state of the code, as open-source development demands.

To meet these needs, Git had to align with some specific goals:

  • Allow for mass distribution of the code, with confidence.
  • Allow for mass updating of the current state of the code without publishing manual snapshots or releases.
  • Prove to yourself and others that the code you have downloaded is, in fact, the code everyone else has.
  • Mix and match branches as needed.
  • Enable curated distributed development (i.e., track personal changes locally and only make them available to others when you are ready).

As it turns out, Git also brought features that other source control platforms were not providing at the time. Below are just a few examples:

  • Runs completely locally if you so choose. Bedroom coders and other lightweight users don't need to worry about the overhead of a central server if they don't want it.
  • Runs on just about anything. It was originally created for Linux but, through community efforts, it has been ported to Windows® and just about every other operating system (OS) under the sun. Best of all, copies of a repository on different OSs can collaborate with each other.
  • Is highly controllable; history can be sliced and remade, and files can be committed or ignored. If you suspect it can be done, it probably can be.
  • Makes branch switching safe and reliable. Switching to a different branch changes all tracked files to the state they should be in on the destination branch. Not every version control tool handles this as cleanly.
  • Hosting is highly portable. If a repo hosting solution is not working for you, it is straightforward to move a code base to any other solution that can host Git repositories.
  • Is, and will always be, free and open source.

It is fair to say that Git helped fuel the open-source boom of the last two decades. Letting people chip into projects without huge download commitments or licensing fees has done wonders for the global code base. It was only a matter of time before enterprises took notice.

Git for enterprise

Companies that host their own code will likely already have a source control solution in place and may be wondering what the big deal is. Apache Subversion® or Microsoft's Team Foundation Server® may be names you have heard from your team's coders. Why take on the cost of replacing these established tools and processes with another system? As it turns out, the reasons Git is great for open-source coding map nicely onto enterprise coding needs.

History from the same book

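One of Git's primary assurances is that if one person checks out a commit, it is exactly the same code that someone else committed. Git identifies every file, directory, and commit by a cryptographic hash of its contents (and, for a commit, of its history), so two copies with the same commit ID hold identical code. If there are differences beyond that, it is because someone made a conscious choice to exclude something, or the machine itself is doing something different (converting line endings, for example). Many other version control solutions do not offer this guarantee.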
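The guarantee rests on content addressing. The sketch below recomputes the ID Git assigns to a file's contents (a "blob") in its default SHA-1 mode; for the same bytes it should agree with what `git hash-object <file>` prints. The function name is our own. Commits are hashed the same way over their file tree, parent commit IDs, author, and message, which is why a single commit ID pins down the entire history behind it.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git (default SHA-1 mode) assigns to a file's contents.

    Git hashes a short header, "blob <size in bytes>" plus a NUL byte,
    followed by the raw file bytes.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Identical bytes always produce the identical ID, on any machine.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Change a single byte and the ID changes completely, so a tampered or corrupted copy cannot quietly pass for the original.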

Infinite streams of work

Nothing technical prevents each of your team members from working on their own branch, then syncing up later and merging their work. Most Git-centric workflows heavily leverage this ability. Alternate versions of the product can be trivial to make and, with some finesse, can be created by slicing existing commits into new branches (Git calls copying a commit onto another branch "cherry-picking").

It also goes the other way: team members can experiment on their own and distribute their experiments to others without fear of disturbing the product proper; this is a significant advantage over systems like Apache Subversion.
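To see why independent branches usually merge cleanly, and when they do not, consider a heavily simplified three-way merge. It compares each line of two branches against their common ancestor (the "base"): a line changed on only one side is taken from that side, and a line changed differently on both sides is a conflict. This is an illustrative sketch only; it assumes every version has the same number of lines, whereas Git first aligns lines with a diff, and the function name and conflict labels are our own.

```python
def three_way_merge(base: list, ours: list, theirs: list) -> tuple:
    """Simplified line-by-line three-way merge.

    Assumes base, ours, and theirs all have the same number of lines.
    Returns (merged_lines, indexes_of_conflicting_lines).
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line differently: conflict
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts


base = ["timeout = 30", "retries = 3", "log = info"]
ours = ["timeout = 60", "retries = 3", "log = info"]
theirs = ["timeout = 30", "retries = 3", "log = debug"]
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged, conflicts)  # ['timeout = 60', 'retries = 3', 'log = debug'] []
```

Had both branches edited the `timeout` line to different values, that line would come back wrapped in conflict markers for a person to resolve.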

Automation

Git offers many integration points for automation: it can be driven by automated tools, and it can trigger automated processes of its own (through hooks, for example). Programs like Jenkins have had tools to hook into, modify, and manipulate a Git repository for more than a decade.

Since Git can perfectly reproduce code bases, it is not much of a leap to automatically test code upon detection of a change. This can make the product safer and help prevent massive, showstopping, expensive faults. 
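As a rough illustration of testing "upon detection of a change," the sketch below asks a remote which commit a branch points to (via `git ls-remote`) and decides whether a test run is due. In practice, CI servers such as Jenkins handle this with polling or webhooks; the helper names here are hypothetical, and the remote and branch are whatever your team uses.

```python
import subprocess
from typing import Optional


def parse_ls_remote(output: str) -> dict:
    """Map ref names to commit IDs; `git ls-remote` prints "<commit-id>\\t<ref>" per line."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            commit_id, ref = line.split("\t", 1)
            refs[ref] = commit_id
    return refs


def remote_tip(remote: str, branch: str = "main") -> Optional[str]:
    """Ask the remote which commit its branch currently points to (hypothetical helper)."""
    ref = f"refs/heads/{branch}"
    result = subprocess.run(
        ["git", "ls-remote", remote, ref],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote(result.stdout).get(ref)


def needs_test_run(last_tested: Optional[str], current: Optional[str]) -> bool:
    """A commit ID we have not tested yet means new code has arrived."""
    return current is not None and current != last_tested
```

A scheduler would call `remote_tip` every few minutes, run the test suite whenever `needs_test_run` returns True, and then remember the tested commit ID. Because a commit ID identifies the code exactly, the test results can be tied to precisely the code that was tested.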

Safety

One big thing Git brought to the programming world was the ability to make the practice of coding safer. If a potential solution isn't working out, just roll back to the last time it was working. If there are two potential solutions that might work, implement both and choose the one you like more. A team that feels safe to try things is more willing to experiment and innovate, leading to bigger wins for everyone.

This concept of safety extends to Git itself: if my local code is deleted, or I do something weird to my commits, it is trivial to grab the canonical version of the code from the team server. If an absolute catastrophe happens and the main code server is lost, every team member has a full local copy of the repository, current as of their last sync, that can become a source to rebuild from.
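Rolling back "to the last time it was working" raises a practical question: which commit was that? Git's answer is `git bisect`, which binary-searches the history between a known-good and a known-bad commit. The Python sketch below shows the same idea on a plain list of commit IDs; the function name and the `is_good` callback (standing in for "build and test this commit") are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def find_breaking_commit(
    commits: Sequence[str], is_good: Callable[[str], bool]
) -> Tuple[str, str]:
    """Binary-search an ordered history (oldest first).

    Assumes commits[0] is known good, commits[-1] is known bad, and that once
    the code breaks it stays broken (the same assumption `git bisect` makes).
    Returns (last_good_commit, first_bad_commit).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(commits[mid]):
            good = mid  # the break happened later than mid
        else:
            bad = mid   # the break happened at or before mid
    return commits[good], commits[bad]


history = [f"c{i}" for i in range(1, 9)]  # c1 (oldest) .. c8 (newest)
print(find_breaking_commit(history, lambda c: int(c[1:]) < 6))  # ('c5', 'c6')
```

With 1,000 commits between the good and bad points, this needs only about ten test runs, because each check halves the remaining range.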

Collaboration

If your team has a code base that serves the interests of multiple other teams, you can open that code base to them with whatever access pattern you see fit. If you trust those other teams completely, you can add them as collaborators, and they can commit directly to your codebase. Worst case, if they do something you don't like, their commits can be sliced out or reverted.

Or maybe you would like their contributions to go through a more structured intake process. Almost every Git hosting solution has some form of Pull Request (GitLab calls it a merge request): a formal request to merge a branch into the canonical central branch. This allows the team to review and change the branch in question before it integrates with your work. Good hosting platforms allow for commenting on code, automated test triggers, and other useful features centered on Pull Requests.

These tools cut both ways, too: with most enterprise hosting solutions, you can also selectively lock code down to different levels. You can make the code read-only, limit visibility to certain users or groups, or even lock certain branches to everyone, including your own team, so that the only way to change canonical branches is through Pull Requests.
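Hosting platforms such as GitHub, GitLab, and Bitbucket enforce branch locks on the server through their own settings, so no code is required. Still, the rule being enforced is simple enough to sketch. Everything below (the `PullRequest` record, the protected branch list, the approval threshold) is a hypothetical policy for illustration, not any platform's API.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical, simplified Pull Request record."""
    target_branch: str
    approvals: int
    checks_passed: bool


# Hypothetical team policy: canonical branches only change through reviewed PRs.
PROTECTED_BRANCHES = {"main"}
REQUIRED_APPROVALS = 1


def can_push_directly(branch: str) -> bool:
    """Direct pushes are refused for protected branches, even for the owning team."""
    return branch not in PROTECTED_BRANCHES


def can_merge(pr: PullRequest) -> bool:
    """A PR into a protected branch needs enough approvals and passing checks."""
    if pr.target_branch not in PROTECTED_BRANCHES:
        return True
    return pr.approvals >= REQUIRED_APPROVALS and pr.checks_passed
```

Keeping the rule this blunt (no exceptions for the owning team) is what turns Pull Requests into the single, reviewable path into the canonical branch.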

Git is not perfect

As much as I'll sing the praises of Git, it is still just a tool. Every tool has a right use and an ocean of poor usage. You may hear some of the following points from your developers, so be ready to discuss them if you hope to see the migration through.

Git is primarily for text

If your work deals primarily in Photoshop files, CAD drawings, sound files, or other non-text assets, Git is probably not the solution for you. There are ways to make these files less onerous to use with Git (the Git LFS extension, for example), but there are other, better tools out there for non-text assets.

The problem comes down to the fact that Git has no good way to compare or merge most non-text files, and because such files rarely compress well, every committed version adds roughly its full size to the repository's history. This eats storage space over time, and because every clone downloads that history, it is especially painful once the repository is pushed across the network for others to use.
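Two small sketches make the storage problem concrete. The first roughly mirrors Git's default heuristic for deciding that a file is binary: a NUL byte within the first 8,000 bytes (a `.gitattributes` file can override this per file type). The second is back-of-the-envelope arithmetic under a worst-case assumption that each version of a binary asset adds its full size to the history. Function names and figures are illustrative.

```python
def looks_binary(data: bytes, sniff: int = 8000) -> bool:
    """Approximate Git's default check: a NUL byte near the start means binary."""
    return b"\0" in data[:sniff]


def history_growth_mb(file_size_mb: float, versions: int) -> float:
    """Worst case for an asset that compresses poorly:
    every committed version adds roughly its full size to the history."""
    return file_size_mb * versions


print(looks_binary(b"print('hi')\n"))           # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))  # True (PNG header bytes)
print(history_growth_mb(200, 50))                # 10000
```

In other words, 50 committed revisions of a 200 MB design file could add on the order of 10 GB to the repository, and every fresh clone would have to download it.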

Git has a steep learning curve

If your team is used to Apache Subversion or TFS, which really only have checkout, update, and commit as concepts, things are about to get an order of magnitude more complicated with Git. Checkout, staging (add), commit, fetch, pull, push, reset, branch, and merge are all basic concepts required to use the tool.

On top of that, to make full use of Git, team members need to be comfortable with the command line. There are subtleties in how commands are formatted that can wildly change the result. Also, because Git grew up on Linux, its commands are built to be combined with other command-line tools. Locating a branch in a long list of branches, for example, is easiest when someone knows how to filter output in Bash.
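Here is what that branch filtering might look like. Bash users would typically pipe `git branch` into `grep`, and Git itself accepts a glob with `git branch --list '<pattern>'`; to keep this article's examples in one language, the sketch below does the same job in Python. It assumes a reasonably recent Git that supports `git branch --format`, and the helper names are hypothetical.

```python
import subprocess


def list_branches(repo_dir: str = ".") -> list:
    """Return local branch names, one per line of `git branch --format=%(refname:short)`."""
    result = subprocess.run(
        ["git", "branch", "--format=%(refname:short)"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def filter_branches(branches: list, needle: str) -> list:
    """Case-insensitive substring match, like `git branch | grep -i <needle>`."""
    return [b for b in branches if needle.lower() in b.lower()]


sample = ["main", "feature/login-form", "bugfix/Login-timeout", "release/2.1"]
print(filter_branches(sample, "login"))  # ['feature/login-form', 'bugfix/Login-timeout']
```

Either way, someone on the team has to know these tricks, which is exactly the learning-curve cost described above.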

Git may be free, but hosting a team's code often is not

When a team is the first in a company to use Git, they are going to be hit with a lot of questions they need to answer in short order. Where should we host our code? Should we do it on-premises? Should we pay for a service like Bitbucket or GitHub? What are the security implications of these choices? Who is going to pay for it?

No matter what you choose, someone is also going to have to administer it. That work looks different on premises than in the cloud. Someone is going to have to set up identities, keep an eye on uptime, and manage the day-to-day work of hosting a content server full of some of your company's most valuable assets.

Git will happily tie itself into knots and hand you more rope

Using Git is, for better or for worse, a freeform experience. Nothing stops one of your team members from doing all their work in a single commit, everyone from sharing the same branch, or team members from overwriting each other's work. Git does provide mechanisms to mitigate most of these problems, but using them is opt-in. Successful teams, in my experience, agree to a standard protocol that describes expectations on how work is to be submitted to version control.
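One way to make part of such a protocol enforceable rather than aspirational is a Git hook. Git runs an executable at `.git/hooks/commit-msg` with the path of the proposed message file as its only argument, and a non-zero exit aborts the commit. The sketch below is such a hook written in Python; the convention it checks (a ticket ID prefix like `ABC-123:` and a 72-character summary line) is hypothetical, so substitute your team's own rules.

```python
#!/usr/bin/env python3
"""Sketch of a commit-msg hook enforcing a hypothetical team convention."""
import re
import sys

# Hypothetical convention: "ABC-123: short summary", summary line <= 72 chars.
TICKET_PATTERN = re.compile(r"^[A-Z]+-\d+: \S")
MAX_SUMMARY = 72


def check_message(message: str) -> list:
    """Return a list of problems with the commit message (empty means OK)."""
    lines = message.splitlines()
    summary = lines[0] if lines else ""
    problems = []
    if not TICKET_PATTERN.match(summary):
        problems.append("summary must start with a ticket ID, e.g. 'ABC-123: ...'")
    if len(summary) > MAX_SUMMARY:
        problems.append(f"summary is longer than {MAX_SUMMARY} characters")
    return problems


def main(argv: list) -> int:
    # argv[1] is the path Git passes: the file holding the proposed message.
    with open(argv[1], encoding="utf-8") as f:
        issues = check_message(f.read())
    for issue in issues:
        print(f"commit-msg: {issue}", file=sys.stderr)
    return 1 if issues else 0  # non-zero tells Git to abort the commit


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Note that hooks live in each clone and are not copied by `git clone`, so teams usually distribute them through a setup script or a hook-management tool.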

It is not hard to run into a misconfigured Git installation that refuses to work properly. If a strange issue strikes and there is no one experienced to help, it can easily stall a team member's contribution to the code base.

The journey continues

In this article, we've gone over why your development team may want to switch its version control to Git and what that switch involves. But as a team lead, you have a lot of questions to answer. I've alluded to some, but in the next article, I'll go over the unknowns that need to be answered, problems that need to be solved, and fights that will need to be won.

