The Layman's Guide to git Sync'ing

  • Posted on: 25 June 2016

This article is about syncing a local git repo with a remote. It is aimed at beginners, at people who can't get interested in the details of source control, and at people who plan to dig into those details later and are looking for an entry point.

I'll go over a little personal history here with respect to git; skip it if you feel like it.

History

Recently at my place of work, I suggested that we move to git.

We previously used Microsoft TFS. Prior to that, we used CVS. Since about 90% of our day-to-day was C++ programming in Visual Studio, TFS was a pretty big step up for us. Our sysadmin did the migration from CVS to TFS, and, as a dev., all was well for me. TFS was really a much better experience for me.

Then, one dark day, we had quite the downsizing. Our team was shrunk to four, and I was suddenly a developer/sysadmin. After a little while, I moved the team to git because: a.) I was sick of dealing with Microsoft. They're not really that bad, they're just sort of in the 'too big to function efficiently' category sometimes. I actually really liked TFS, except, b.) I couldn't figure out how to migrate TFS to a new machine, and, c.) I don't know how to sysadmin, so if the TFS server suddenly exploded, I had no solution to that problem.

With git, our team was its own backup mechanism, and I didn't have to understand the details of our MSDN subscription. We migrated to git.

After a while of working with git, the following became clear: a.) Dealing with the local repo is pretty easy, b.) Dealing with the remote is a little less straightforward. While TFS has a pretty 'correct' way to push/pull commits from the server, git is a little more flexible, which creates some confusion sometimes.

To help our team learn how to effectively work with the remote (we had a faux-central bare repo), I decided to suggest some workflows, based on the particular situation the developer is in. For the disinterested, this can be a quick guide to how to 'get the job done.' For people who want to explore git in detail, it can be a good leaping-off point.

Keep in mind I'm talking about a team of three to five here. Larger teams probably have different best practices. Also, a small team new to git is probably going to work mostly on master at first. I have kept that in mind here.

Prereqs

git can be complicated sometimes. There are some great in-depth articles out there about the ins-and-outs of git.

For this article, it's mostly good enough to know the following:

  • with git, you have a mostly self-contained local repo which is pretty much on equal footing with the remote. There is no centralized server really, even if you choose to artificially specify a git repo as such. All this sync business is about 'matching-up' your repo and another git repo.
  • your local checkout has two parts: the repo itself (the committed history) and the working tree. The working tree is your current copy of the files in the repo. When you make changes to your working tree, they are not changes in the repo until you add and commit them.
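
To make the second point concrete, here is a small sketch you can paste into a terminal. It builds a throwaway repo in a temp directory; the file name notes.txt and the "Demo User" identity are placeholders I made up for the demo.

```shell
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)                         # throwaway directory
git init -q "$demo/repo" && cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master   # call the first branch master

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Edit the file: the working tree changes, the repo does not.
echo "second line" >> notes.txt
git status --short                        # lists notes.txt as modified
before=$(git rev-list --count HEAD)       # commits in the repo so far

# Only add + commit records the change in the repo.
git add notes.txt
git commit -q -m "Extend notes"
after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
```

The commit count only moves after the add and commit, which is the whole point: editing a file is a working-tree event, not a repo event.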

We'll learn a bit more along the way.

Some Use-Cases for Sync'ing With a Remote

Here goes:

You have not changed anything in your repo or working tree since last syncing with the remote, and you want to pull updates (from master):

$git pull origin master

This fetches new changesets from the remote, and then merges them into your current branch, which also updates your working tree. If you haven't made a commit since your last pull, and haven't modified any files, this should be mostly worry-free: git simply fast-forwards your branch. You will have the most recent repo from the remote, and your working tree will be up to date.

The takeaway here is: you can always pull a branch hassle-free if you haven't made commits.
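
If you want to check that you really are in this hassle-free situation before pulling, something like the sketch below should do. It fakes a remote with a local bare repo and two clones (mine, teammate); those names, app.txt, and the demo identity are assumptions made for the example, not part of any real setup.

```shell
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
demo=$(mktemp -d) && cd "$demo"

# A stand-in "remote" (origin.git) plus two clones of it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git mine
git clone -q origin.git teammate

# A teammate pushes a change while you do nothing.
echo "v2" > teammate/app.txt
git -C teammate commit -q -am "Teammate change"
git -C teammate push -q origin master

cd mine
dirty=$(git status --porcelain)                      # empty if no files are modified
git fetch -q origin master
ahead=$(git rev-list --count origin/master..master)  # local commits the remote lacks
if [ -z "$dirty" ] && [ "$ahead" -eq 0 ]; then
    # --ff-only refuses to do anything except a plain fast-forward.
    git pull -q --ff-only origin master
fi
cat app.txt
```

With a clean tree and nothing ahead of the remote, the pull can only fast-forward, so there is nothing to merge and nothing to conflict.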

You have commits on your local repo and you want to sync with the remote

$git pull origin master
(possibly merge conflicts)
$git push origin master

This is equivalent to the 'Sync' button in Visual Studio's git plugin.

Remember in git, there is no central repo. So if other devs. have pushed to origin master while you were creating your local commits, you truly have two different repos that need to be merged together. This is why you need to pull, get the repo together, and then push the merged repo.
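
Here is a sketch of that situation, again with a local bare repo standing in for the remote. The clone names and file names are made up. I pass --no-rebase and --no-edit so the script runs unattended; interactively you would usually just type git pull origin master.

```shell
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
demo=$(mktemp -d) && cd "$demo"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git mine
git clone -q origin.git teammate

# The teammate pushes first...
echo "teammate" > teammate/theirs.txt
git -C teammate add theirs.txt
git -C teammate commit -q -m "Teammate commit"
git -C teammate push -q origin master

# ...while you have made a local commit of your own.
cd mine
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "My commit"

# Pushing straight away is rejected: origin has a commit you lack.
if git push -q origin master 2>/dev/null; then pushed_first=yes; else pushed_first=no; fi
echo "push before pull succeeded: $pushed_first"

# Pull merges the two histories, then the push goes through.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
git log --oneline --graph
```

The two commits touch different files here, so the merge is automatic. When both sides edit the same lines, the pull stops at the "(possibly merge conflicts)" step instead.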

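Let's just use this as our de-facto sync idiom. There is a whole merge vs. rebase argument going on if you want to search more about it, but let's keep it simple for now; just use pull/push. One caveat: recent versions of git refuse to guess when your history and the remote's have diverged, and ask you to pick a strategy first. Running git config pull.rebase false once selects the merge behavior used in this article.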

You are about to leave the network and you want to update your repo, but you have a dirty working tree and/or commits you aren't ready to push yet.

$git fetch origin master
(continue working)
$git merge origin/master
(possibly merge conflicts)
$git push origin master

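git fetch downloads the remote's new commits into your local repo, but doesn't touch your working tree or your local branches; the new commits wait on the remote-tracking branch origin/master until you merge them. In fact, by default git pull is just a git fetch followed by a git merge. If the merge would overwrite a file you have modified but not committed, git refuses to start it and asks you to commit or stash first.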
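
A rough sketch of the fetch-now, merge-later idea, using the same kind of fake remote as a stand-in (all repo and file names are invented for the demo):

```shell
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
demo=$(mktemp -d) && cd "$demo"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git mine
git clone -q origin.git teammate

echo "v2" > teammate/app.txt
git -C teammate commit -q -am "Teammate change"
git -C teammate push -q origin master

cd mine
echo "draft" > scratch.txt               # untracked work in progress

# On the network: grab the new commits, change nothing you can see.
git fetch -q origin master
before_merge=$(cat app.txt)              # still the old content
incoming=$(git rev-list --count master..origin/master)
echo "fetched but not merged: $incoming commit(s)"

# Later, possibly offline: integrate what was fetched.
git merge -q origin/master
after_merge=$(cat app.txt)
echo "app.txt went from $before_merge to $after_merge"
```

The merge step needs no network at all, because everything it uses was already downloaded by the fetch.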

Of course, you must be back on the network when you push.

You want to update your repo, but you have a dirty working tree, AND your repo is untouched since last syncing. In other words, you are editing files, but haven't made a local commit yet.

If you like to sync one commit at a time, this is a pretty common case, and the workflow below is a good way to quickly integrate changes.

$git stash
$git pull origin master
$git stash pop

git stash basically stores the uncommitted changes to your tracked files (whether or not you have added them) in some internal storage in git, and then cleans your working tree (i.e., sets it to whatever your local repo is). Brand-new files that git isn't tracking yet are left where they are unless you use git stash -u. Once you update your local repo with git pull, git stash pop reapplies these changes to your working tree. It is much akin to copying your modified files, undoing your changes locally, updating, and then pasting the modified files back.

If you're familiar with TFS, this is very much like a shelveset.
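
To watch the stash round trip happen, here is a sketch against a fake remote. The repo names, file names, and demo identity are placeholders of mine.

```shell
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
demo=$(mktemp -d) && cd "$demo"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git mine
git clone -q origin.git teammate

echo "teammate" > teammate/theirs.txt
git -C teammate add theirs.txt
git -C teammate commit -q -m "Teammate commit"
git -C teammate push -q origin master

cd mine
echo "work in progress" >> app.txt       # uncommitted edit to a tracked file

git stash -q                             # set the edit aside
while_stashed=$(git status --porcelain)  # empty: the tree matches the repo again
git pull -q --ff-only origin master      # now a plain, clean pull
git stash pop -q                         # bring the edit back
git status --short                       # app.txt shows as modified again
```

If the pulled commits touched the same lines as your stashed edit, git stash pop reports a conflict and keeps the stash around, so nothing is lost.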

This idiom can also be combined with the synchronization methods mentioned above. For example, if your local repo is ahead of the remote by a few commits, you can do git stash, update your repo with git fetch / git merge, and then pop the stash back onto your working tree.
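
The combined version might look like the sketch below: one local commit that hasn't been pushed, plus an uncommitted edit on top. As before, the bare repo, clone names, and files are stand-ins invented for the demo.

```shell
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
demo=$(mktemp -d) && cd "$demo"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git mine
git clone -q origin.git teammate

echo "teammate" > teammate/theirs.txt
git -C teammate add theirs.txt
git -C teammate commit -q -m "Teammate commit"
git -C teammate push -q origin master

cd mine
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "My local commit"       # local repo is now ahead by one commit
echo "half-done edit" >> app.txt         # plus an uncommitted change

git stash -q                             # park the uncommitted change
git fetch -q origin master
git merge -q --no-edit origin/master     # a real merge: both sides have commits
git stash pop -q                         # restore the uncommitted change
git log --oneline --graph
```

The half-done edit ends up back in the working tree, sitting on top of the merged history, ready to be finished and committed.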

You are working on a change, and it's getting out of hand. Or, you want to use git as it should be used ;).

Branching is really the forte of git. Unlike in TFS, a branch isn't a physical copy of all the files in the repo. In git, a branch is just a "pointer" to the most recent commit on that line of work; everything earlier is found by following each commit back to its parent. It is super light-weight. A branch isn't exposed to your entire team the instant you make it. You don't have to think ahead whether you want to branch or not. You don't need to undo or redo anything at all to make a branch in git: the new branch starts at your current commit, and whatever uncommitted modifications you have in your working tree come along when you switch to it. You should make a lot of branches. There is no reason to think twice about it.
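
You can see the "pointer" nature of a branch for yourself with a sketch like this one (throwaway repo; notes.txt and the demo identity are placeholders):

```shell
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
demo=$(mktemp -d)
git init -q "$demo/repo" && cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Creating a branch copies no files; it only records a commit id.
git branch my_new_branch
master_tip=$(git rev-parse master)
branch_tip=$(git rev-parse my_new_branch)
echo "at creation: master=$master_tip branch=$branch_tip"

# An uncommitted edit travels with you when you switch branches.
echo "uncommitted edit" >> notes.txt
git checkout -q my_new_branch
git commit -q -am "Work on branch"

# Only the branch you committed on has moved.
new_branch_tip=$(git rev-parse my_new_branch)
echo "after commit: master=$(git rev-parse master) branch=$new_branch_tip"
```

Both names print the same commit id right after the branch is created, and only my_new_branch changes once you commit on it.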

For now, let's just talk about making a local branch, i.e., you'll be the only one to ever work on it; the branch spends its entire life on your machine:

$git branch my_new_branch
$git checkout my_new_branch
(make commits on this branch, as normal, maybe over days or weeks, test, etc)
$git checkout master
$git pull origin master
$git merge my_new_branch
$git push origin master

Remember in the first use-case, when we said it is hassle-free to pull a branch if you haven't made any local commits to that branch? That is the case here, when you do git pull origin master, since you've been working on a branch.

Furthermore, it is much easier to switch back to master during your work on my_new_branch, just to see what's going on. If you are working on your local repo's master, this isn't easy. This basically also solves the "off-network" problem mentioned above. Just switch back to master, pull, head home, and switch back to my_new_branch.
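
Here is the whole round trip as a sketch, with a teammate pushing to the fake remote while you work on your branch. The remote, the clones, and the file names are all invented for the demo, and git checkout -b is just shorthand for git branch followed by git checkout.

```shell
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
demo=$(mktemp -d) && cd "$demo"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git mine
git clone -q origin.git teammate

cd mine
git checkout -q -b my_new_branch         # create the branch and switch to it
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# Meanwhile, a teammate pushes to the remote.
echo "teammate" > ../teammate/theirs.txt
git -C ../teammate add theirs.txt
git -C ../teammate commit -q -m "Teammate commit"
git -C ../teammate push -q origin master

# Your local master has no commits of its own, so this is a plain fast-forward.
git checkout -q master
git pull -q --ff-only origin master

# Bring the branch in and publish the result.
git merge -q --no-edit my_new_branch
git push -q origin master
git log --oneline --graph
```

Because all of your commits live on my_new_branch, the pull on master never has anything of yours to reconcile; the only merge you deal with is the one you choose to start.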

Personally, I think this is the best workflow of them all. Depending on your team, though, it can be hard to convince people that branching is not the behemoth task it may have been in other source control systems.

Okay, that's all the use-cases I set out to cover.

For the team I was working in, this seemed to be a good compromise between allowing the source-control-uninterested to work more effectively with git, and allowing those who were interested a starting point for their git-research.

Hopefully, after following the recipes for a while, one can gain a better feel for when to use what sync-related commands in git.
