Article on Git workflow

Thomas Schwery 2019-08-18 18:53:29 +02:00
parent cbe30e7f1e
commit 1e9100035d

---
title: My Git workflow
date: 2019-08-17 16:00:00
---
[Git](https://git-scm.com/) is currently the most popular Version Control
System and probably needs no introduction. I have been using it for some
years now, both for work and for personal projects.

Before that, I used Subversion for nearly 10 years and was more or less
happy with it. Only more or less, because almost every operation required
being online: committing, reading the logs, checking out an older
revision, and so on.

Git does not require anything online (except, well, `git push` and `git pull/fetch`
for obvious reasons). Branching is also much easier in Git, allowing you to work
offline on a feature in your own branch, commit whenever you need to and then push your
work once you are back online. It was a pleasure to discover these features and the
workflow that derives from them.

This article describes my workflow with Git; it is not a tutorial or
a guide on using Git. It also contains the Git configuration that I use with
this workflow, which may be useful to others.
## Workflow
This workflow borrows heavily from the [GitHub Flow](https://guides.github.com/introduction/flow/index.html)
and the [GitLab Flow](https://docs.gitlab.com/ee/topics/gitlab_flow.html).
Both are based on branches that come out of master and are
merged back into master on completion.

I found the [Git Flow](https://nvie.com/posts/a-successful-git-branching-model/)
to be too complicated for my personal projects. At work, extending the GitHub Flow
with a set of stable branches and tags has worked really well, as
described in [Release branches with GitLab flow](https://docs.gitlab.com/ee/topics/gitlab_flow.html#release-branches-with-gitlab-flow).
### 1. Create a new branch.
I always create a new branch when starting something.
This makes it easy to switch between tasks when urgent work comes in, without
having to pile up modifications in the stash.

When working on personal projects, I tend to be more lax about these branches:
I create one branch that contains more than one change and review them
all in one go afterwards.

Why create a branch and not commit directly into master? Because you
want tests to check that your commits are correct before the changes are
set in stone. A branch can be modified or deleted; the master branch
cannot. Even for small projects, I find that branches let you work
more peacefully and iterate on your work.

A branch is created with `git checkout -b my-branch` and can immediately be used
to commit things.
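
As a rough illustration of switching tasks without touching the stash, here is a minimal sketch in a throwaway repository. The branch name `urgent-fix`, the file name and the commit messages are made up for the example.

```shell
set -eu
# Throwaway repository so the example is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"
git commit -q --allow-empty -m "Initial commit"

# Start a task on its own branch and commit there.
git checkout -q -b my-branch
echo "work in progress" > feature.txt
git add feature.txt
git commit -q -m "Start feature"

# Urgent work comes in: branch off master again, nothing to stash.
git checkout -q master
git checkout -q -b urgent-fix
git branch --list
```

The unfinished work stays safely on `my-branch` and is not visible from `urgent-fix`.
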
### 2. Commit often.
This advice comes up every time Git is discussed: you can commit anything, at any time.
It is much easier to squash commits together further down the line than it is to
split a commit two days after the code was written.

Your commits are still local only, so have no fear of committing incomplete code or
code that you consider sub-par and will refine later. The next points build on this.
### 3. Add only the needed files.
With Git you can, and must, add files to the index before
you commit. When working on large projects, you will modify multiple files.
When committing, you can add one file to the index, commit the changes to this file,
then add the second file to the index and commit those changes in a second commit.

`git add` also lets you add only parts of a file with `git add -p`. This
can be useful if you forgot to commit a step before starting work on the
next step.
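
The file-by-file approach could look like the sketch below, run in a throwaway repository with made-up file names. `git add -p` is interactive, so it is not shown here.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Two unrelated changes made during the same coding session.
echo "cache.enabled=true" > cache.properties
echo "Build notes" > NOTES.md

# Stage and commit each file on its own.
git add cache.properties
git commit -q -m "Add cache configuration"
git add NOTES.md
git commit -q -m "Add build notes"

# One commit per file.
git log --oneline --name-only
```
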
### 4. Write useful commit messages.
Even though your commits are not yet published, commit messages are also
useful for you.

I won't give you advice on how to write a commit message as this depends
on the project and the team I'm working with, but remember that a commit
message describes *what* you did and *why*.

Here are some rules I like to follow:

1. Write a short description of *why* and *what*. Your commit message
should be short but explain both. A `git log --oneline` should produce
a readable log that tells you what happened.
2. Be precise. You polished up your cache prototype? Don't write *General polishing*,
say *what* and *why*, like *Polishing the Redis caching prototype*.
3. Be concise. You fixed tests that were failing because of the alignment of the
moon and planets and some solar flares? Don't write a novel on one line like
*Adding back the SmurfVillageTest after fixing the planet alignment and
the 100th Smurf was introduced through a mirror and everybody danced happily
ever after*. The longest I would go for is *Fixed failing SmurfVillageTest for 100th Smurf*.
4. Use the other lines. You can write a multi-line commit message if you need
to explain the context in detail. Treat your commit like you would an
email: short subject, long message if needed.
The Linux kernel is generally a really good example of good long commit messages, like
[cramfs: fix usage on non-MTD device](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e5aeec0e267d4422a4e740ce723549a3098a4d1)
or
[bpf, x86: Emit patchable direct jump as tail call](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428d5df1fa4f28daf622c48dd19da35585c9053c).
5. In any case, don't write messages like *Update presentation* in 10
different commits, or even worse *Fix stuff*. It's not useful, neither for
you nor for your colleagues.

Here are some links about commit messages. Don't ignore this: in my opinion
it is a really important part of every VCS:
* [Commit Often, Perfect Later, Publish Once - Do make useful commit messages](https://sethrobertson.github.io/GitBestPractices/#usemsg)
* [A Note About Git Commit Messages](https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html)
* [Introduction: Why good commit messages matter](https://chris.beams.io/posts/git-commit/)
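
To illustrate the "short subject, long message" rule, here is a small sketch. The change and its justification are invented for the example; the point is only that each additional `-m` becomes its own paragraph in the commit body.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "cache.ttl=60" > cache.properties
git add cache.properties

# The first -m is the subject, every further -m is a paragraph of the body.
git commit -q \
    -m "Add TTL to the Redis caching prototype" \
    -m "Entries were never evicted, so stale artifacts could be served. Expire them after 60 seconds."

git log --oneline        # subject only
git log -1 --format=%B   # full message
```
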
### 5. Refine your commits.
At this point, if you coded like a normal human being,
you will have a large number of commits: some that introduce
small new features, like *Add cache to build process*, some that fix typos,
like *Fix typo in cache configuration key*, and some that add a missing
library, like *Oops, forgot to add the Redis library to the pom*. Nothing
to worry about: to err is human, and computers are there to catch the errors and let
you fix them easily.

Before pushing the work online, I like to [hide the sausage making](https://sethrobertson.github.io/GitBestPractices/#sausage).
Personally, I find that the downsides are outweighed by the time you save:
you commit quickly while coding and organize things
once your mind is free of code-related thoughts.

These fix-up commits are not useful for other people; they are only there because
you made a mistake. No shame in that, but the reviewers don't need to see
them. They need a clear view of *what* and *why*.

The cache library was added because we added a cache, and the configuration
key is there because we added a cache. The commits should reflect your work,
not your mistakes. In this example, I would keep only one commit, *Add cache
to build process*, and squash the fixes into it.

At this step, I like to rebase my branch on the current master
with `git rebase -i origin/master` so that I can reorder and squash commits
as well as get the latest changes into my branch.
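
The interactive rebase opens an editor, which is hard to show in an article. One non-interactive variant of the same squash, sketched here as an assumption about how it could be done rather than as the way I always do it, uses `git commit --fixup` together with `--autosquash`. The `base` variable stands in for `origin/master`, and the file contents are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"
echo "build" > build.txt
git add build.txt
git commit -q -m "Initial build script"
base=$(git rev-parse HEAD)   # stands in for origin/master

git checkout -q -b add-cache
echo "cache.enabled=ture" > cache.properties   # typo on purpose
git add cache.properties
git commit -q -m "Add cache to build process"

# Fix the typo and record it as a fixup of the commit that introduced it.
echo "cache.enabled=true" > cache.properties
git add cache.properties
git commit -q --fixup HEAD

# --autosquash folds the fixup into its target. GIT_SEQUENCE_EDITOR=true
# accepts the generated todo list without opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --oneline
```

After the rebase, only *Add cache to build process* remains on top of the base, with the corrected file in it.
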
### 6. Rebase your branch
Usually, before your work on a feature is finished, a number of changes
have landed on the master branch: new features, fixes, perhaps new tests if
you are lucky. Before pushing, I thus do a quick `git fetch && git rebase origin/master`,
just so that my branch is up to date with the branch I will merge into.

With the latest changes in my branch, I like to run the test suite one
last time.
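
Here is a self-contained sketch of that fetch and rebase. A local bare repository plays the role of the remote and a second clone simulates a colleague; all names (`mine`, `colleague`, `my-feature`) are made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# A local bare repository stands in for the shared remote.
git -c init.defaultBranch=master init -q --bare origin.git
git -c init.defaultBranch=master init -q mine
cd mine
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$work/origin.git"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git push -q origin master

# My feature branch, one commit ahead of master.
git checkout -q -b my-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, a colleague's change lands on master.
git clone -q "$work/origin.git" "$work/colleague"
git -C "$work/colleague" -c user.name="Colleague" -c user.email="colleague@example.com" \
    commit -q --allow-empty -m "Fix unrelated bug"
git -C "$work/colleague" push -q origin master

# Bring my branch up to date before pushing it.
git fetch -q && git rebase -q origin/master
git log --oneline
```
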
### 7. Check your commit messages
Before pushing, I like to do a quick `git log --oneline` to check my
commits.

Your change descriptions should make sense. At this point you should be
able to remember, for each commit, what changed and why you did it, and the
message should reflect this.

If one commit message is vague, this is the last chance to rewrite it. I
usually do that with an interactive rebase: `git rebase -i origin/master`.
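
As a small sketch of this check: the log can be limited to the commits the branch adds, and the most recent message can be rewritten with `--amend` (older commits need the `reword` action of the interactive rebase). The `base` variable stands in for `origin/master` and the messages are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"
git commit -q --allow-empty -m "Initial commit"
base=$(git rev-parse HEAD)   # stands in for origin/master

git checkout -q -b add-cache
echo "cache.enabled=true" > cache.properties
git add cache.properties
git commit -q -m "Fix stuff"

# List only the commits that the branch adds on top of the base.
git log --oneline "$base"..HEAD

# The message is vague: rewrite it while the commit is still local.
git commit -q --amend -m "Add cache configuration to the build"
git log --oneline "$base"..HEAD
```
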
### 8. Pushing the branch
Once everything is in order, the branch can be pushed, a pull request, merge request or
review request can be opened and other people can be brought in to look at the changes.
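
A first push of a new branch could look like this sketch, where a local bare repository stands in for the real remote and the branch name is made up:

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# A local bare repository stands in for the shared remote.
git -c init.defaultBranch=master init -q --bare origin.git
git -c init.defaultBranch=master init -q mine
cd mine
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "Initial commit"
git push -q origin master

git checkout -q -b my-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# -u records origin/my-feature as the upstream of the local branch,
# so later pushes and pulls need no arguments.
git push -q -u origin my-feature
git status -sb
```
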
### 9. Review
If you work in a team, you will have a code review step before merging changes.
I like to see this as a step to ensure that I did not miss anything. When you
code your fix or feature, it is really easy to forget some corner case or
some business requirement that another customer brought up with a
colleague. The review step gives me peace of mind that I did not
forget something important, and that if I did forget something, it was not
that important as four eyes did not spot it.

The review is also a way for your colleagues to keep up to date with your
work. Whatever is in the master branch has been seen by two people and should
be understood by two people. It's important to have someone else who can
fix that part of the code in case you are absent.

These people will need to quickly know what changed and why you changed
it. Usually the tooling lets people quickly check what changed,
comment on those changes and request improvements. The why will come from
your commit messages.

I also like to keep this step even when working alone. I review my own
code to ensure that the changes are clear and that I committed everything
I needed to and only what I wanted to.
### 10. Changes
Usually you will have to change some parts after the review. It can be
because you remembered something while walking down the corridor to get tea or
because your colleagues saw possible improvements.

For these changes, I like to follow the same procedure as before: write
the changes, commit them, fix the old commits to keep the log clean. I
see the review as part of the work, not as something that comes afterwards
and gets recorded in the logs. In short:

* A new feature is requested by the reviewer? New commit.
* A typo must be fixed? Fix the commit that introduced it.
* Some CI test fails? Fix the commit that introduced the regression or introduce
  a new commit to fix the test.
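
The "fix the commit that introduced it" case can be sketched as follows. This is one possible way to do it, not the only one: a fixup commit targets the older commit, the autosquash rebase folds it in, and because the branch was already pushed, the rewritten history has to be pushed with `--force-with-lease`. Repository layout, branch name and file contents are made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare origin.git
git -c init.defaultBranch=master init -q mine
cd mine
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "Initial commit"
git push -q origin master

# The branch under review: two commits, already pushed.
git checkout -q -b add-cache
echo "cache.enabled=ture" > cache.properties   # typo on purpose
git add cache.properties
git commit -q -m "Add cache to build process"
echo "Cache notes" > NOTES.md
git add NOTES.md
git commit -q -m "Document the cache"
git push -q -u origin add-cache

# The reviewer spots the typo: fix it in the commit that introduced it.
typo_commit=$(git rev-parse HEAD~1)
echo "cache.enabled=true" > cache.properties
git add cache.properties
git commit -q --fixup "$typo_commit"
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash origin/master

# History was rewritten, so a plain push would be rejected. --force-with-lease
# overwrites the remote branch only if nobody else pushed to it meanwhile.
git push -q --force-with-lease origin add-cache
git log --oneline origin/master..HEAD
```
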
## The dangers
This workflow is `rebase` heavy. If changes on the parent branch
conflict with yours, you will have to resolve the conflicts, perhaps
on multiple commits during the rebase, with the possible errors that
come with it. If the conflicts are too much, you can always abort the
rebase and try to reorder your commits to reduce the conflicts, if possible.
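
Aborting is cheap, as this sketch in a throwaway repository shows. The file name and the conflicting values are made up; both branches change the same line so that the rebase is guaranteed to stop.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"
echo "timeout=10" > config.txt
git add config.txt
git commit -q -m "Add config"

# My branch and master both change the same line.
git checkout -q -b my-feature
echo "timeout=30" > config.txt
git commit -q -am "Raise timeout for slow builds"
git checkout -q master
echo "timeout=5" > config.txt
git commit -q -am "Lower timeout"
git checkout -q my-feature
before=$(git rev-parse HEAD)

# The rebase stops on the conflict; --abort restores the branch as it was.
if ! git rebase -q master >/dev/null 2>&1; then
    git status --short   # lists the conflicted file
    git rebase --abort
fi
git log --oneline
```

After the abort, the branch points at the same commit as before the rebase and the working tree is clean.
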

Rebasing also hides the origin of problems coming from
your parent branch. If you pull code with failing tests, you will have
nothing in the history that tells you that your code worked before pulling
the changes. Only your memory (and the `reflog`, but who checks the `reflog`?)
will tell you that it worked before: there is no commit marking the before
and the after like there would be in a `merge` workflow. In tools like
GitLab, you will see that some pipelines succeeded and then
one failed, but you will need to check the changes between the
succeeding and the failing pipelines yourself.

If you are not alone on your branch, rebasing can cause a lot of issues when
pulling and pushing, because each person ends up with a rebased copy of the
branch containing different commits.
Be sure to only rebase when everyone has committed and pushed everything and the branch
is ready to be reviewed and merged.
## Git aliases
Since I run some operations many times each day, I like to simplify
them with aliases in my `.gitconfig`.

The first two, `glog` and `slog`, are aliases to check the logs before pushing the changes.
They print a one-liner for each commit: `glog` includes merge commits and
draws a graph of the branches, `slog` leaves merge commits out.

The `go` and `publish` aliases handle branch creation and publication. Instead of
having to know whether I have to create a new branch or can directly
check out an existing one, I wrote the `go` alias to go to the branch,
creating it if needed. The `publish` alias pushes a branch created
locally to the origin without having to specify anything; it relies on the
`branch-name` helper alias.

The `commit-oups` alias is a shorthand to amend the last commit without changing
the commit message. It often happens that I forget to add a file to the
index, commit too early, forget to run the tests or forget
a library. This alias lets me do a `git add -u && git commit-oups`
in these cases. (Yes, *Oups* is French for *Oops*.)
```ini
[alias]
    # Shorthand to print a graph log with one-line commit messages.
    glog = log --graph --pretty=format:'%C(yellow)[%ad]%C(reset) %C(green)[%h]%C(reset) %s %C(red)[%an]%C(blue)%d%C(reset)' --date=short
    # Shorthand to print a log with one-line commit messages, ignoring merge commits.
    slog = log --no-merges --pretty=format:'%C(yellow)[%ad]%C(reset) %C(green)[%h]%C(reset) %s %C(red)[%an]%C(blue)%d%C(reset)' --date=short
    # Prints the name of the current branch. This alias is used by other aliases.
    branch-name = "!git rev-parse --abbrev-ref HEAD"
    # Shorthand to amend the last commit without changing the commit message.
    commit-oups = commit --amend --no-edit
    # Shorthand to create a branch on the remote. This pushes the current
    # local branch to origin and sets it as upstream.
    publish = "!git push -u origin $(git branch-name)"
    # Shorthand to facilitate the creation of new branches. This switches to
    # the given branch, creating it if necessary.
    go = "!go() { git checkout -b $1 2> /dev/null || git checkout $1; }; go"
```
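
To give an idea of how these aliases feel in daily use, here is a sketch that installs `go` and `commit-oups` in a throwaway repository only (with `git config` instead of editing `.gitconfig`) and walks through a forgotten file. Branch and file names are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"
# Same definitions as in the configuration above, for this repository only.
git config alias.commit-oups "commit --amend --no-edit"
git config alias.go '!go() { git checkout -b $1 2> /dev/null || git checkout $1; }; go'
git commit -q --allow-empty -m "Initial commit"

git go my-feature      # does not exist yet: created
echo "one" > a.txt
git add a.txt
git commit -q -m "Add files"

echo "two" > b.txt     # the forgotten file
git add b.txt
git commit-oups -q     # folded into the previous commit, message kept

git go master          # exists: plain checkout
git go my-feature
git log --oneline --name-only
```
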
## Releases
This article only detailed the daily work on a feature and its
merge; it did not go into the release process. This is deliberate,
as every release is different. In my personal projects alone I have multiple
ways to represent releases.

On my blog there are no releases: everything on master is published
as soon as it is merged.

On my Kubernetes project, a release is something more precise but not
static. I want to be sure that it works, but it can be updated easily.
It is thus represented by a single stable branch into which I merge master
once I want to deploy the changes.

On my keyboard project, a release is something really static,
as it represents a PCB, an object that cannot be updated easily. It is
thus a tag with the PCB order reference. Once the firmware is introduced,
this could change with the introduction of a stable branch that follows
the changes to the firmware and configuration. Or I could continue using tags;
this will be decided once the hardware is finished.
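
The two release styles can be sketched side by side in a throwaway repository. The branch name `stable` comes from the description above; the tag name `pcb-order-0001` and the file are placeholders, not a real order reference.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Author"
git config user.email "author@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch stable

# Work lands on master.
echo "v2" > app.txt
git add app.txt
git commit -q -m "Add new feature"

# Moving release: merge master into the stable branch when ready to deploy.
git checkout -q stable
git merge -q master

# Static release: an annotated tag pins one exact commit.
git tag -a pcb-order-0001 -m "Revision sent to the PCB manufacturer"
git describe --tags
```

The branch can keep moving with later merges, while the tag keeps pointing at the commit that was manufactured.
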
## Conclusion
As always with Git, the tool is so powerful that more or less any workflow
can work with it. There are a number of possible variations on this one, with
each team having a favorite way of doing things.

In this article I did not talk about tooling, but nowadays, with CI/CD
becoming more and more important, tooling is an important part of the workflow.
Tests will need to be run on branches; perhaps `stable` branches will
have more tests than `feature` branches due to server, time or financial limitations.
Perhaps you have Continuous Deployment of stable branches, perhaps you want
to continuously redeploy a development server when code is merged into
master.

Your tooling will need a clear flow. If your convention is that
new features are developed on branches with a `feature/` prefix, everybody
must follow it; otherwise reconciling the exceptions in your tooling will
be daunting for the developer in charge of these tools.
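
Such a convention is easy to check automatically. The function below is a hypothetical sketch of a check that a CI job or a hook could run; the list of allowed names (`master`, `stable`, `feature/*`) is an assumption for the example, not a rule of any particular tool.

```shell
set -eu
# Hypothetical convention: only master, stable and feature/* are allowed.
is_valid_branch() {
    case "$1" in
        master|stable|feature/*) return 0 ;;
        *) return 1 ;;
    esac
}

for branch in master feature/add-cache my-branch; do
    if is_valid_branch "$branch"; then
        echo "$branch: ok"
    else
        echo "$branch: rejected"
    fi
done
```

In a real repository, the current branch name could be passed in with `git rev-parse --abbrev-ref HEAD`.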