Merge branch 'updates-articles' into 'master'

Updates articles

See merge request tschwery/blog-hugo!12
Thomas Schwery 2019-12-02 20:55:05 +00:00
commit 641e14150a
11 changed files with 1277 additions and 29 deletions


@ -0,0 +1,780 @@
---
title: Subversion migration to Git
date: 2017-05-12 18:30:00
---
Some time ago I was tasked with migrating our Subversion repositories to Git. This article was only written
recently because, well, I had forgotten about the notes I had taken during the migration and only stumbled on
them recently.
Our largest repository was something like 500 GB and contained a little more than 50'000 commits. The
goal was to recover the svn history into git, keep as much information as possible about the commits and
the links between them, and keep the branches. Over the years, a number of periodic database dumps
had been committed that now weighed down the repository without serving any purpose. There were also a number
of branches that were never used and contained nothing of interest.
The decision was also taken to split some of the tools into their own repositories instead of keeping them
in the same repository, cleaning up the main repository to keep only the main project and related sources.
## Principles
* After some experiments, I decided to use svn2git, a tool used by KDE for their migration. It has the
advantage of taking a rule file that allows splitting a repository by svn path, processing tags and
branches and transforming them, ignoring other paths, ...
* As the import of such a large repository is slow, I decided to mount a btrfs partition so that each
step can be snapshotted, allowing me to test the next step without fear of having to start
again from the beginning.
* Some binary files were added to the svn history and it made sense to keep them. I decided to migrate
them to git-lfs to reduce the history size without losing them completely.
* A lot of commit messages contain references to other commits. I wanted to process these commit messages
and transform each reference to an `r` revision into a git hash so that tools can create a link automatically.
## Tools
The first tool to retrieve is [svn2git](https://github.com/svn-all-fast-export/svn2git).
Compilation should be easy: first install the dependencies, then build it.
```
$ git clone https://github.com/svn-all-fast-export/svn2git.git
$ sudo apt install libqt4-dev libapr1-dev libsvn-dev
$ cd svn2git
$ qmake .
$ make
```
Once the tool is compiled, we can prepare the btrfs mount in which we will run the migration steps.
```
$ mkdir repositories
$ truncate -s 300G repositories.btrfs
$ sudo mkfs.btrfs repositories.btrfs
$ sudo mount repositories.btrfs repositories
$ sudo chown 1000:1000 repositories
```
We will also write a small tool in Go to process the commit messages.
```
sudo apt install golang
```
We will also need `bfg`, a git cleansing tool. You can download the jar
file on the [BFG Repo-Cleaner website](https://rtyley.github.io/bfg-repo-cleaner/).
## First steps
The first step of the migration is to retrieve the svn repository itself on the local machine. This is not a
checkout of the repository; we need the server folder directly, with the whole history and metadata.
```
rsync -avz --progress sshuser@svn.myserver.com:/srv/svn_myrepository/ .
```
In this case I had SSH access to the server, allowing me to simply rsync the repository. Doing so allowed
me to prepare the migration in advance, only copying the new commits on each synchronisation and not the
whole repository with its large history. Most of the repository files are never updated so this step is
only slow on the first execution.
### User mapping
The first mapping to create is a file that maps the svn users to git users. A user in svn is a username,
whereas in git it is a name and an email address.
To get a list of user accounts, we can use the svn command directly on the local repository like this :
```
svn log file:///home/tsc/svn_myrepository \
| egrep '^r.*lines?$' \
| awk -F'|' '{print $2;}' \
| sort \
| uniq
```
This will return the list of users in the logs. For each of these users, you should create a line in a mapping
file, like so :
```
auser Albert User <albert.user@example.com>
aperson Anaelle Personn <anaelle.personn@example.com>
```
This file will be given as input to `svn2git` and should be complete, otherwise the import will fail.
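To avoid a failed import hours in, a quick sanity check can compare the authors found in the svn log with the mapping file. This is only a sketch, assuming the mapping file is named `accounts-map.txt` as in the migration command later in this article :
```sh
$ svn log file:///home/tsc/svn_myrepository \
  | egrep '^r.*lines?$' \
  | awk -F'|' '{gsub(/ /, "", $2); print $2;}' \
  | sort -u > ./svn-authors
$ cut -d' ' -f1 accounts-map.txt | sort -u > ./mapped-authors
$ comm -23 ./svn-authors ./mapped-authors   # any output here is a missing account
```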
### Path mapping
The second mapping for the svn to git migration is the svn2git rules file. This file tells
the program what goes where. In our case, the repository was not strictly adhering to the standard svn tree:
besides the trunk, tags and branches structure, it contained some other folders for "out-of-branch" projects.
```txt
# We create the main repository
create repository svn_myrepository
end repository
# We create repositories for external tools that will move
# to their own repositories
create repository aproject
end repository
create repository bproject
end repository
create repository cproject
end repository
# We declare a variable to ease the declaration of the
# migration rules further down
declare PROJECTS=aproject|bproject|cproject
# We create repositories for out-of-branch folders
# that will migrate to their own repositories
create repository aoutofbranch
end repository
create repository boutofbranch
end repository
# We always ignore database dumps wherever they are.
# In our case, the database dumps are named "database-dump-20100112"
# or forms close to that.
match /.*/database([_-][^/]+)?[-_](dump|oracle|mysql)[^/]+
end match
# There are also dumps stored in their own folder
match /.*/database/backup(/old)?/.*(.zip|.sql|.lzma)
end match
# At some time the build results were also added to the history, we want
# to ignore them
match /.*/(build|dist|cache)/
end match
# We process our external tools only on the master branch.
# We use the previously declared variable to reduce the repetition
# and use the pattern match to move it to the correct repository.
match /trunk/(tools/)?(${PROJECTS})/
repository \2
branch master
end match
# And we ignore them if they are on tags or branches
match /.*/(tools/)?${PROJECTS}/
end match
# We start processing our main project after r10, as the
# first commits were missing the trunk and moved the branches, trunk and tags
# folders around.
match /trunk/
min revision 10
repository svn_myrepository
branch master
end match
# There are branches that are hierarchically organized.
# Such cases have to be explicitly configured.
match /branches/(old|dev|customers)/([^/]+)/
repository svn_myrepository
branch \1/\2
end match
# Other branches are as expected directly in the branches folder.
match /branches/([^/]+)/
repository svn_myrepository
branch \1
end match
# The tags were used in a strange fashion before the commit r2500,
# so we ignore everything before that refactoring
match /tags/([^/]+)/
max revision 2500
end match
# After that, we create a branch for each tag as the svn tags
# were not used correctly and were committed to. We just name
# them differently and will process them afterwards.
match /tags/([^/]+)/([^/]+)/
min revision 2500
repository svn_myrepository
branch \1-\2
end match
# Our out-of-branch folder will be processed directly, only creating
# a master branch.
match /aoutofbranch/
repository aoutofbranch
branch master
end match
match /boutofbranch/
repository boutofbranch
branch master
end match
# Everything else is discarded and ignored
match /
end match
```
This file will quickly grow with the number of migration operations that you want to do. Ignore
files here if possible, as it will reduce the migration time as well as the postprocessing that will
need to be done afterwards. In my case, a number of files were too complex to match during the migration
or were spotted only afterwards and had to be cleaned in a second pass with other tools.
### Migration
This step will take a lot of time as it will read the whole svn history, process the declared rules and generate
the git repositories and every commit.
```
$ cd repositories
$ ~/workspace/svn2git/svn-all-fast-export \
--add-metadata \
--svn-branches \
--identity-map ~/workspace/migration-tools/accounts-map.txt \
--rules ~/workspace/migration-tools/svnfast.rules \
--commit-interval 2000 \
--stat \
/home/tsc/svn_myrepository
```
If there is a crash during this step, it means that you are either missing an account in your mapping, that
one of your rules is emitting an erroneous branch or repository, or that no rule is matching.
Once this step is finished, I like to do a btrfs snapshot so that I can return to this point when putting the
next steps into place.
```
btrfs subvolume snapshot -r repositories repositories/snap-1-import
```
## Cleanup
The next phase is to clean up our import. There will always be a number of branches that are unused, named
incorrectly or contain only temporary files, as well as branches that are so far from the standard naming that our
rules cannot process them correctly.
We will simply delete them or rename them using git.
```
$ cd svn_myrepository
$ git branch -D oldbranch-0.3.1
$ git branch -D customer/backup_temp
$ git branch -m customer/stable_v1.0 stable-1.0
```
The goal at this step is to clean up the branches that will be kept after
the migration. We do this now to reduce the repository size early on and
thus reduce the time needed for the next steps.
If you see branches that can be deleted or renamed further down the road,
you can also remove or rename them then.
I like to take a snapshot at this stage as the next stage usually involves
a lot of tests and manually building a list of things to remove.
```
btrfs subvolume snapshot -r repositories repositories/snap-2a-cleanup
```
We can also remove files that were added and should not have been, by generating
a list of every file ever checked into our new git repository, inspecting
it manually and adding the identifiers of the files to remove to a new file :
```sh
$ git rev-list --objects --all > ./all-files
$ cat ./all-files | your-filter | cut -d' ' -f1 > ./to-delete-ids
$ java -jar ~/Downloads/bfg-1.12.15.jar --private --no-blob-protection --strip-blobs-with-ids ./to-delete-ids
```
We will take a snapshot again, as the next step also involves checks and
tests.
```
btrfs subvolume snapshot -r repositories repositories/snap-2b-cleanup
```
Next, we will convert the binary files that we still want to keep in our
repository to Git-LFS. This allows git to only keep track of the hash of
the file in the history and not store the whole binary in the repository,
thus reducing the size of the clones.
BFG does this quickly and efficiently, removing every file matching the
given name from the history and storing it in Git-LFS. This step will
require some exploration of the previous `all-files` file to identify which
files need to be converted.
```sh
$ java -jar ~/Downloads/bfg-1.12.15.jar --no-blob-protection --private --convert-to-git-lfs 'my-important-archive*.zip'
$ java -jar ~/Downloads/bfg-1.12.15.jar --no-blob-protection --private --convert-to-git-lfs '*.ear'
```
After the cleanup, I also like to do a btrfs snapshot so that the history
rewrite step can be executed and tested multiple times.
```
btrfs subvolume snapshot -r repositories repositories/snap-2c-cleanup
```
### Linking a svn revision to a git commit
The import log prints, for each revision, a line mapping it to a mark in the git marks file. In the git repository, there
is then a marks file that maps each mark to a commit hash. We can use this information to build a mapping database
that stores that information for later.
In our case, I wrote a Java program that will parse both files and store
the resulting mapping into a LevelDB database.
This database will then be used by a Golang server that will read this mapping
database into memory and serve an RPC endpoint that we will call from Golang
binaries in a `git filter-branch` call. The Golang server will also need
to keep track of the modifications to the git commit hashes as the history
rewrite changes them.
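For reference, the lines that the tools below parse look roughly like this (the revision number, mark and hash are made up) :
```txt
# svn2git import log, one line per imported revision :
progress SVN r1234 branch master = :5678
# git marks file, one line per mark :
:5678 9f3c2d7e1a0b4c5d6e7f8091a2b3c4d5e6f70819
```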
First, the Java tool to read the logs and generate the LevelDB database :
```java
import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.IOFileFilter;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import org.iq80.leveldb.impl.Iq80DBFactory;

public class CommitMapping {
    public static String FILE_LOG_IMPORT = "../log-svn_myrepository";
    public static String FILE_MARKS = "marks-svn_myrepository";
    public static String FILE_BFG_DIR = "../svn_myrepository.bfg-report";

    public static Pattern PATTERN_LOG = Pattern.compile("^progress SVN (r\\d+) branch .* = (:\\d+)");

    public static void main(String[] args) throws Exception {
        List<String> importLines = IOUtils.readLines(new FileReader(new File(FILE_LOG_IMPORT)));
        List<String> marksLines = IOUtils.readLines(new FileReader(new File(FILE_MARKS)));

        // Collect the object-id-map files written by each BFG pass, oldest pass first.
        Collection<File> passFilesCol = FileUtils.listFiles(new File(FILE_BFG_DIR), new IOFileFilter() {
            @Override
            public boolean accept(File pathname, String name) {
                return name.equals("object-id-map.old-new.txt");
            }

            @Override
            public boolean accept(File path) {
                return this.accept(path, path.getName());
            }
        }, DirectoryFileFilter.DIRECTORY);

        List<File> passFiles = new ArrayList<>(passFilesCol);
        Collections.sort(passFiles, (File o1, File o2) -> o1.getParentFile().getName().compareTo(o2.getParentFile().getName()));

        Map<String, String> commitToIdentifier = new LinkedHashMap<>();
        Map<String, String> identifierToHash = new HashMap<>();

        // Map each svn revision to its import mark, using the svn2git log.
        for (String importLine : importLines) {
            Matcher marksMatch = PATTERN_LOG.matcher(importLine);
            if (marksMatch.find()) {
                String dest = marksMatch.group(2);
                if (dest == null || dest.length() == 0 || ":0".equals(dest)) continue;
                commitToIdentifier.put(marksMatch.group(1), dest);
            } else {
                System.err.println("Unknown line : " + importLine);
            }
        }

        File dbFile = new File(System.getenv("HOME") + "/mapping-db");
        File humanFile = new File(System.getenv("HOME") + "/mapping");
        FileUtils.deleteQuietly(dbFile);

        Options options = new Options();
        options.createIfMissing(true);
        DB db = Iq80DBFactory.factory.open(dbFile, options);

        // Map each import mark to the git hash from the marks file.
        marksLines.stream().map((line) -> line.split("\\s", 2)).forEach((parts) -> identifierToHash.put(parts[0], parts[1]));

        BiMap<String, String> commitMapping = HashBiMap.create(commitToIdentifier.size());
        for (String commit : commitToIdentifier.keySet()) {
            String importId = commitToIdentifier.get(commit);
            String hash = identifierToHash.get(importId);
            if (hash == null) continue;
            commitMapping.put(commit, hash);
        }

        System.err.println("Got " + commitMapping.size() + " svn -> initial import entries.");

        // Follow the hash rewrites of each BFG pass so that the mapping points to the final hashes.
        for (File file : passFiles) {
            System.err.println("Processing file " + file.getAbsolutePath());
            List<String> bfgPass = IOUtils.readLines(new FileReader(file));
            Map<String, String> hashMapping = bfgPass.stream().map((line) -> line.split("\\s", 2)).collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));
            for (String hash : hashMapping.keySet()) {
                String rev = commitMapping.inverse().get(hash);
                if (rev != null) {
                    String newHash = hashMapping.get(hash);
                    System.err.println("Replacing r" + rev + ", was " + hash + ", is " + newHash);
                    commitMapping.replace(rev, newHash);
                }
            }
        }

        // Write the final mapping to a human-readable file and to the LevelDB database.
        PrintStream fos = new PrintStream(humanFile);
        for (Map.Entry<String, String> entry : commitMapping.entrySet()) {
            String commit = entry.getKey();
            String target = entry.getValue();
            fos.println(commit + "\t" + target);
            db.put(Iq80DBFactory.bytes(commit), Iq80DBFactory.bytes(target));
        }

        db.close();
        fos.close();
    }
}
```
We will use RPC between a client and a server so that the LevelDB database
can be kept open and the clients that query the running server stay very light,
as they will be executed for each commit. In early tests, opening the
database for every commit was really time consuming, hence this approach, even though the
server does very little.
The structure of our go project is the following :
```txt
go-gitcommit/client-common:
rpc.go
go-gitcommit/client-insert:
insert-mapping.go
go-gitcommit/client-query:
query-mapping.go
go-gitcommit/server:
server.go
```
First, some plumbing for the RPC in `rpc.go` :
```go
package Client

import (
    "net"
    "net/rpc"
    "time"
)

type (
    // Client -
    Client struct {
        connection *rpc.Client
    }

    // MappingItem is the response from the cache or the item to insert into the cache
    MappingItem struct {
        Key   string
        Value string
    }

    // BulkQuery allows to mass query the DB in one go.
    BulkQuery []MappingItem
)

// NewClient -
func NewClient(dsn string, timeout time.Duration) (*Client, error) {
    connection, err := net.DialTimeout("tcp", dsn, timeout)
    if err != nil {
        return nil, err
    }

    return &Client{connection: rpc.NewClient(connection)}, nil
}

// InsertMapping -
func (c *Client) InsertMapping(item MappingItem) (bool, error) {
    var ack bool
    err := c.connection.Call("RPC.InsertMapping", item, &ack)
    return ack, err
}

// GetMapping -
func (c *Client) GetMapping(bulk BulkQuery) (BulkQuery, error) {
    var bulkResponse BulkQuery
    err := c.connection.Call("RPC.GetMapping", bulk, &bulkResponse)
    return bulkResponse, err
}
```
Next the Golang server that will read this database in `server.go` :
```go
package main

import (
    "fmt"
    "log"
    "net"
    "net/rpc"
    "os"
    "time"

    "github.com/syndtr/goleveldb/leveldb"

    Client "../client-common"
)

var (
    cacheDBPath = os.Getenv("HOME") + "/mapping-db"
    cacheDB     *leveldb.DB

    flowMap map[string]string

    f *os.File
    g *os.File
)

type (
    // RPC is the base class of our RPC system
    RPC struct {
    }
)

func main() {
    var cacheDBerr error
    cacheDB, cacheDBerr = leveldb.OpenFile(cacheDBPath, nil)
    if cacheDBerr != nil {
        fmt.Fprintln(os.Stderr, "Unable to initialize the LevelDB cache.")
        log.Fatal(cacheDBerr)
    }

    roErr := cacheDB.SetReadOnly()
    if roErr != nil {
        fmt.Fprintln(os.Stderr, "Unable to initialize the LevelDB cache.")
        log.Fatal(roErr)
    }

    flowMap = make(map[string]string)

    f, _ = os.Create(os.Getenv("HOME") + "/go-server/gomapping.log")
    defer f.Close()

    g, _ = os.Create(os.Getenv("HOME") + "/go-server/gomapping.ins")
    defer g.Close()

    rpc.Register(NewRPC())

    l, e := net.Listen("tcp", ":9876")
    if e != nil {
        log.Fatal("listen error:", e)
    }

    go flushLog()

    rpc.Accept(l)
}

func flushLog() {
    for {
        time.Sleep(100 * time.Millisecond)
        f.Sync()
    }
}

// NewRPC -
func NewRPC() *RPC {
    return &RPC{}
}

// InsertMapping -
func (r *RPC) InsertMapping(mappingItem Client.MappingItem, ack *bool) error {
    old := mappingItem.Key
    new := mappingItem.Value

    flowMap[old] = new

    g.WriteString(fmt.Sprintf("Inserted mapping %s -> %s\n", old, new))

    *ack = true
    return nil
}

// GetMapping -
func (r *RPC) GetMapping(bulkQuery Client.BulkQuery, resp *Client.BulkQuery) error {
    for i := range bulkQuery {
        key := bulkQuery[i].Key
        response, _ := cacheDB.Get([]byte(key), nil)

        gitCommit := key
        if response != nil {
            responseStr := string(response[:])
            responseUpdated := flowMap[responseStr]

            if responseUpdated != "" {
                gitCommit = string(responseUpdated[:])[:12] + "(" + key + ")"
                f.WriteString(fmt.Sprintf("Response to mapping %s -> %s\n", bulkQuery[i].Key, gitCommit))
            } else {
                f.WriteString(fmt.Sprintf("No git mapping for entry %s\n", responseStr))
            }
        } else {
            f.WriteString(fmt.Sprintf("Unknown revision %s\n", key))
        }

        bulkQuery[i].Value = gitCommit
    }

    *resp = bulkQuery
    return nil
}
```
And finally our clients. The insert client will be called from `git filter-branch`
with the previous and current commit hashes after processing each commit. We
store this information into the database so that the hashes are correct when
mapping a revision. The code goes into `insert-mapping.go` :
```go
package main

import (
    "fmt"
    "log"
    "os"
    "time"

    Client "../client-common"
)

// Called by git filter-branch with the old and the new hash of each rewritten
// commit; the pair is sent to the RPC server to keep the mapping up to date.
func main() {
    old := os.Args[1]
    new := os.Args[2]

    rpcClient, err := Client.NewClient("localhost:9876", time.Millisecond*500)
    if err != nil {
        log.Fatal(err)
    }

    mappingItem := Client.MappingItem{
        Key:   old,
        Value: new,
    }

    ack, err := rpcClient.InsertMapping(mappingItem)
    if err != nil || !ack {
        log.Fatal(err)
    }

    fmt.Println(new)
}
```
The query client will receive the commit message for each commit, check
whether it contains an `r` revision reference and query the server for a hash for this
commit. It goes into `query-mapping.go` :
```go
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "regexp"
    "strings"
    "time"

    client "../client-common"
)

// Reads a commit message on stdin, queries the RPC server for every rNNNN
// reference found and prints the message back with the references replaced.
func main() {
    reader := bufio.NewReader(os.Stdin)
    text, _ := reader.ReadString('\n')

    re := regexp.MustCompile(`\Wr[0-9]+`)
    matches := re.FindAllString(text, -1)
    if matches == nil {
        // No revision reference, print the message unchanged.
        fmt.Print(text)
        return
    }

    rpcClient, err := client.NewClient("localhost:9876", time.Millisecond*500)
    if err != nil {
        log.Fatal(err)
    }

    var bulkQuery client.BulkQuery
    for i := range matches {
        if matches[i][0] != '-' {
            // Strip the leading non-word character captured by the regex.
            key := matches[i][1:]
            bulkQuery = append(bulkQuery, client.MappingItem{Key: key})
        }
    }

    gitCommits, _ := rpcClient.GetMapping(bulkQuery)

    for i := range gitCommits {
        gitCommit := gitCommits[i].Value
        key := gitCommits[i].Key
        text = strings.Replace(text, key, gitCommit, 1)
    }

    fmt.Print(text)
}
```
For this step, we first need to compile and execute the Java program.
Once it has succeeded in creating the database, we compile the Go server
and run it in the background.
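A minimal sketch of that sequence, assuming the Java dependencies are in a `libs/` folder and that the Go sources live in `~/migration-tools/go-gitcommit` (adapt the paths and build commands to your setup) :
```sh
$ # Build and run the Java mapping tool; it writes ~/mapping-db and ~/mapping.
$ javac -cp "libs/*" CommitMapping.java
$ java -cp "libs/*:." CommitMapping
$ # Build and start the Go RPC server in the background; it listens on localhost:9876.
$ cd ~/migration-tools/go-gitcommit/server
$ go build -o server server.go
$ ./server &
$ # Build the two clients used by git filter-branch below.
$ (cd ../client-insert && go build -o client-insert insert-mapping.go)
$ (cd ../client-query && go build -o client-query query-mapping.go)
```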
Then, we can launch `git filter-branch` on our repository to rewrite the
history :
```sh
$ git filter-branch \
--commit-filter 'NEW=`git_commit_non_empty_tree "$@"`; \
${HOME}/migration-tools/go-gitcommit/client-insert/client-insert $GIT_COMMIT $NEW' \
--msg-filter "${HOME}/migration-tools/go-gitcommit/client-query/client-query" \
-- --all --author-date-order
```
As after each step, we will generate a snapshot, even though it should be
the last step that cannot be repeated easily.
```
btrfs subvolume snapshot -r repositories repositories/snap-3-mapping
```
We now clean up the repository, which still contains a lot of unused blobs,
branches, commits, ...
```sh
$ git reflog expire --expire=now --all
$ git prune --expire=now --progress
$ git repack -adf --window-memory=512m
```
We now have a repository that should be more or less clean. You will have
to check the history, the size of the blobs and whether some branches can
still be deleted before pushing it to your server.
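To get an idea of what still takes up space, one possibility is to list the largest remaining blobs; the following is only a sketch, the number of results being arbitrary :
```sh
$ git count-objects -vH
$ git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $3, $4 }' \
  | sort -rn \
  | head -n 20
```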


@ -0,0 +1,304 @@
---
title: My Git workflow
date: 2019-08-17 16:00:00
---
[Git](https://git-scm.com/) is currently the most popular Version Control
System and probably needs no introduction. I have been using it for some
years now, both for work and for personal projects.
Before that, I used Subversion for nearly 10 years and was more or less
happy with it. More or less because it required you to be online to do more
or less anything : committing needs to be online, viewing the logs needs to be online,
checking out an older revision needs to be online, ...
Git does not require anything online (except, well, `git push` and `git pull/fetch`
for obvious reasons). Branching is also way easier in Git, allowing you to work
offline on some feature on your branch, commit when you need to and then push your
work when online. It was a pleasure to discover these features and the
workflow that derived from them.
This article will describe my workflow using Git and is not a tutorial or
a guide on using Git. It will also contain my Git configuration that matches
this workflow and could be useful for others.
## Workflow
This workflow comes heavily from the [GitHub Flow](https://guides.github.com/introduction/flow/index.html)
and the [GitLab Flow](https://docs.gitlab.com/ee/topics/gitlab_flow.html).
These workflows are based on branches coming out of master and being
merged back into the master on completion. I found the [Git Flow](https://nvie.com/posts/a-successful-git-branching-model/)
to be too complicated for my personal projects and extending the GitHub Flow
with a set of stable branches and tags has worked really well at work, as
described in [Release branches with GitLab flow](https://docs.gitlab.com/ee/topics/gitlab_flow.html#release-branches-with-gitlab-flow).
### 1. Create a new branch.
I always create a new branch when starting something.
This allows switching easily between tasks if some urgent work comes in, without
having to pile up modifications in the stash.
When working on personal projects, I tend to be more lax about these branches,
creating a branch that will contain more than one change and reviewing them
all in one go afterwards.
Why create a branch and not commit directly into the master ? Because you
want tests to check that your commits are correct before the changes are
written in stone. A branch can be modified or deleted, the master branch
cannot. Even for small projects, I find that branches let you work
more peacefully and iterate on your work.
A branch is created by `git checkout -b my-branch` and can immediately be used
to commit things.
### 2. Commit often.
This advice comes up every time with Git: you can commit anytime, anything.
It is way easier to squash commits together further down the line than it is to
split a commit 2 days after the code was written.
Your commits are still local only so have no fear committing incomplete or
what you consider sub-par code that you will refine later. With that come the next points.
### 3. Add only the needed files.
With Git you can and must add files to the index before
committing. When working on large projects, you will modify multiple files.
When committing, you can add one file to the index, commit the changes to this file,
add the second file to the index and commit these changes in a second commit.
`git add` also allows you to add only parts of a file with `git add -p`. This
can be useful if you forgot to commit a step before starting work on the
next step.
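As a small illustration (the file names are made up), splitting a two-file change into two commits and then staging only part of another file could look like this :
```sh
$ git add src/cache.py
$ git commit -m "Add Redis cache to the build process"
$ git add docs/cache.md
$ git commit -m "Document the cache configuration keys"
$ # Stage only some hunks of a file, answering y/n for each hunk.
$ git add -p src/settings.py
```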
### 4. Write useful commit messages.
Even though your commits are not yet published, commit messages are also
useful for you.
I won't give you advice on how to write a commit message as this depends
on the projects and the team I'm working on, but remember that a commit
message is something to describe *what* you did and *why*.
Here are some rules I like to follow :
1. Write a short description of *why* and *what*. Your commit message
should be short but explain both. A `git log --oneline` should produce
a readable log that tells you what happened.
2. Be precise. You polished up your cache prototype ? Don't write *General polishing*,
say *what* and *why*, like *Polishing the Redis caching prototype*.
3. Be concise. You fixed tests that were failing because of the moon and
planets alignment and solar flares ? Don't write a novel on one line like
*Adding back the SmurfVillageTest after fixing the planet alignment and
the 100th Smurf was introduced through a mirror and everybody danced happily
ever after*. The longest I would go for is *Fixed failing SmurfVillageTest for 100th Smurf*
4. Use the other lines. You can do a multi-line commit message if you need
to explain the context in details. Treat your commit like you would an
email: Short subject, Long message if needed.
The Linux kernel is generally a really good example of good long commit messages, like
[cramfs: fix usage on non-MTD device](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e5aeec0e267d4422a4e740ce723549a3098a4d1)
or
[bpf, x86: Emit patchable direct jump as tail call](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428d5df1fa4f28daf622c48dd19da35585c9053c).
5. In any case, don't write messages like *Update presentation* in 10
different commits, or even worse *Fix stuff*. It's not useful, neither for
you nor for your colleagues.
Here are some links about commit messages. Don't ignore this, in my opinion
it is a really important part of every VCS:
* [Commit Often, Perfect Later, Publish Once - Do make useful commit messages](https://sethrobertson.github.io/GitBestPractices/#usemsg)
* [A Note About Git Commit Messages](https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html)
* [Introduction: Why good commit messages matter](https://chris.beams.io/posts/git-commit/)
### 5. Refine your commits.
At this point, if you coded as a normal human being,
you will have a large number of commits, with some that introduce new
small features, like *Add cache to build process*, some that fix typos,
like *Fix typo in cache configuration key*, and some others that add a missing
library, like *Oops, forgot to add the Redis library to the pom*. Nothing
to worry about, to err is human; computers are there to catch these mistakes and allow
you to fix them easily.
Before pushing the work online, I like to [hide the sausage making](https://sethrobertson.github.io/GitBestPractices/#sausage).
Personally, I find that the downsides are outweighed by the fact that you
reduce the time needed to commit things while coding and organize stuff
once your mind is free of code-related thoughts.
These commits are not useful for other people, they are only there because
you made a mistake. No shame in that but the reviewers don't need to see
these, they need to have a clear view of *what* and *why*.
The cache library was added because we added a cache, the configuration
key is there because we added a cache. The commits should reflect your work,
not your mistakes. In this example, I would only keep one commit, *Add cache
to the build process*, and squash the errors into it.
At this step, I like to rebase my branch on the current master
with `git rebase -i origin/master` so that I can reorder and squash commits
as well as get the latest changes in my branch.
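With the example commits above, the interactive rebase todo list could be edited to look something like this, melding both fixes into the feature commit (the hashes are made up) :
```txt
pick 1a2b3c4 Add cache to build process
fixup 5d6e7f8 Fix typo in cache configuration key
fixup 9a8b7c6 Oops, forgot to add the Redis library to the pom
```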
### 6. Rebase your branch
Usually, before your work on a feature is finished, a number of changes
landed on the master branch: New features, fixes, perhaps new tests if
you are lucky. Before pushing, I thus do a quick `git fetch && git rebase origin/master`,
just so that my branch is up to date with the branch I will merge to.
With the latest changes in my branch, I like to run the test suite one
last time.
### 7. Check your commit messages
Before pushing, I like to do a quick `git log --oneline` to check my
commits.
Your change descriptions should make sense, you should be able at this
point to remember for each commit what changed and why you did it and the
message should reflect this.
If one commit message is vague, this is the last chance to rewrite it. I
usually do that with an interactive rebase: `git rebase origin/master -i`.
### 8. Pushing the branch
Once everything is in order, the branch can be pushed, a Pull Request/Merge request/Review request
can be opened and other people brought into the changes.
### 9. Review
If you work in a team you will have a code review step before merging changes.
I like to see this as a step to ensure that I did not miss anything. When you
code your fix or feature, it is really easy to forget some corner-case or
some business requirement that was introduced by another customer to a
colleague. I like to see the review step as peace of mind that you did not
forget something important, and that if you forgot something, it was not
that important since 4 eyes did not spot it.
The review is also a way for your colleagues to keep up to date with your
work. Whatever is in the master branch has been seen by 2 people and should
be understood by 2 people. It's important to have someone else that can
fix that part of the code in case you are absent.
These people will need to quickly know what changed and why you changed
that. Usually the tooling will quickly allow people to check what changed,
comment on those changes and request improvements. The why will come from
your commit messages.
I also like to keep this step even when working alone. I review my own
code to ensure that the changes are clear and that I committed everything
I needed to and only what I wanted to.
### 10. Changes
Usually you will have to change some parts after the review. It can be
because you remembered something walking down the corridor to get tea or
because your colleagues saw possible improvements.
For these changes, I like to follow the same procedure as before. Write
the changes, commit them, fix the old commits to keep the log clean. I
see the review as part of the work, not as something that comes afterwards
and gets recorded in the logs. In short :
* A new feature is requested by the reviewer ? New commit.
* A typo must be fixed ? Fix the commit that introduced it.
* Some CI test fails ? Fix the commit that introduced the regression or introduce
a new commit to fix the test.
## The dangers
This workflow is `rebase` heavy. If you have some modifications that
conflict with your changes, you will have to resolve the conflicts, perhaps
on multiple commits during the rebase, with the possible errors that will
come out of it. If there are too many conflicts, you can always abort the
rebase and try to reorder your commits to reduce the conflicts, if possible.
The fact that you rebase will also hide the origin of problems coming from
your parent branch. If you pull code with failing tests, you will have
nothing in the history that tells you that your code worked before pulling
the changes. Only your memory (and the `reflog` but who checks the `reflog` ?)
will tell you that it worked before, there are no commit marking the before
and the after like there would be on a `merge` workflow. On tools like
GitLab, you will see that there were pipelines that were succeeding and then
a pipeline failing but you will need to check the changes between the
succeeding and the failing pipelines.
If you are not alone on your branch, rebasing can cause a lot of issues when
pulling and pushing with two rebased branches with different commits in it.
Be sure to only rebase when everyone has committed everything and the branch
is ready to be reviewed and merged.
## Git aliases
Since I do some operations a number of times each day, I like to simplify
them by using aliases in my `.gitconfig`.
The first two are aliases to check the logs before pushing the changes.
They print a one-liner for each commit, one without merge commits, the
other with merge commits and a graph of the branches.
The last two are aliases for branch creation and publication. Instead of
having to know whether I have to create a new branch or can directly
check out an existing branch, I wrote the `go` alias to go to the branch,
creating it if needed. The `publish` alias allows me to push a branch created
locally to the origin without having to specify anything.
The `commit-oups` alias is a shorthand to amend the last commit without changing
the commit message. It often happens that I forget to add a file to the
index, or commit too early, or forget to run the tests, or forget
a library. This alias allows me to do a `git add -u && git commit-oups`
in these cases. (Yes, "oups" is French for "oops".)
```ini
[alias]
# Shorthand to print a graph log with oneliner commit messages.
glog = log --graph --pretty=format:'%C(yellow)[%ad]%C(reset) %C(green)[%h]%C(reset) %s %C(red)[%an]%C(blue)%d%C(reset)' --date=short
# Shorthand to print a log with one-liner commit messages ignoring merge commits.
slog = log --no-merges --pretty=format:'%C(yellow)[%ad]%C(reset) %C(green)[%h]%C(reset) %s %C(red)[%an]%C(blue)%d%C(reset)' --date=short
# Prints out the current branch. This alias is used for other aliases.
branch-name = "!git rev-parse --abbrev-ref HEAD"
# Shorthand to amend the last commit without changing the commit message.
commit-oups = commit --amend --no-edit
# Shorthand to facilitate the remote creation of new branches. This allows
# the user to push a new branch to the origin easily.
publish = "!git push -u origin $(git branch-name)"
# Shorthand to facilitate the creation of new branches. This switches to
# the given branch, creating it if necessary.
go = "!go() { git checkout -b $1 2> /dev/null|| git checkout $1; }; go"
```
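A typical sequence with these aliases then looks something like this (the branch name is made up) :
```sh
$ git go feature/redis-cache     # create the branch, or switch to it if it exists
$ git add -u && git commit-oups  # amend the forgotten file into the last commit
$ git publish                    # push the branch and set origin as its upstream
$ git slog                       # check the one-line log before opening the review
```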
## Releases
This article only detailed the daily work on a feature and the
merge but did not go into detail about the release process. This is deliberate
as every release is different. In my personal projects alone I have multiple
ways to represent releases.
On my blog there are no releases, changes on the master are published
as soon as they are merged.
On my Kubernetes project, a release is something more precise but not
static. I want to be sure that it works but it can be updated easily.
It is thus represented by a single stable branch that I merge the master
onto once I want to deploy the changes.
On my keyboard project, a release is something really static
as it represents a PCB, an object that cannot be updated easily. It is
thus a tag with the PCB order reference. Once the firmware is introduced,
this could change with the introduction of a stable branch that will follow
the changes to the firmware and configuration. Or I could continue using tags,
this will be decided once the hardware is finished.
## Conclusion
As always with Git, the tool is so powerful that more or less any workflow
can work with it. There are a number of possible variations on this, with
each team having a favorite way of doing things.
In this article I did not talk about tooling, but nowadays with CI/CD
being more and more important, tooling is an important part of the workflow.
Tests will need to be run on branches; perhaps `stable` branches will
have more tests than `feature` branches due to server/time/financial limitations.
Perhaps you have Continuous Deployment of stable branches, perhaps you want
to continuously redeploy a development server when code is merged on the
master.
Your tooling will need a clear flow. If you have a convention that
new features are developed on branches that have a `feature/` prefix, everybody
must follow it, otherwise the work to reconcile this in your tooling will
be daunting for the developer in charge of these tools.


@ -0,0 +1,111 @@
---
title: Backup archives migration to Borg
date: 2019-11-21 18:30:00
---
Last weekend I found a number of encrypted hard-drives that were used to do periodic
backups from 2006 to 2014. At the time, the backups were made using rsync
with hard-links to save only one occurrence of the file if it was not changed
since the last backup.
I wanted to check that everything was still there and upgrade this to a
Borg repository so that I can profit from compression and deduplication
to further reduce the size of these backups and store them in a more secure way.
## Check the backups
The backups were made using hard-links, with one backup corresponding to
one folder, as follows :
```
$ ls backups/feronia/
back-2014-06-19T19:05:10/ back-2014-10-10T07:30:00/
back-2014-12-24T14:34:44/ current@
```
To check that the backups were still readable, I listed the content of
the different folders and checked that some known configuration files were
present and matched what was expected. This worked until I processed some
backups made before I was using awesomewm, back when I had not yet changed a lot
of config files to match my usage and was still using the default ones.
All in all, the backups were still good and readable, and I could use these
as a basis for the transition to a more robust and space-efficient system.
I saw a number of freezes during the check that I interpreted as signs of old
age for the spinning rust.
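Concretely, this check was nothing more than listing and comparing a handful of well-known files across the backup folders, something along these lines (the paths are made up) :
```sh
$ ls backups/feronia/back-2014-12-24T14:34:44/home/tsc/.config/awesome/
$ sha256sum backups/feronia/back-2014-*/home/tsc/.gitconfig
```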
## Initialize the Borg backup
The first step is to initialize the borg repository. We will put it on
one of the known good backup drives that still has some room. To estimate
the space needed for the backups, I took the size of the most recent backup
and multiplied it by two, as I know that I did not delete a lot of files and
that the deduplication will reduce the size of the old backups that contained
a lot of checked-out subversion repositories.
So, with a destination for my borg repository, I created a folder on the
disk and gave my user read-write rights on this folder.
```
$ sudo mkdir backups/borg-feronia
$ sudo chown 1000:1000 backups/borg-feronia -R
```
Then, the creation of the repository with borg :
```
$ borg init --encryption=repokey backups/borg-feronia
Enter new passphrase:
Enter same passphrase again:
Do you want your passphrase to be displayed for verification? [yN]: n
[...]
```
I decided to use the `repokey` encryption mode. This mode stores the key
in the repository, allowing me to only remember the passphrase and not have
to worry about backing up the key file.
## Transfer the existing backups to Borg
The borg repository has been initialized, so we can now start migrating the
backups from the hard-linked folders into borg.
As borg does not care about hard-links, we can simply loop over the different
folders and create a new archive from each one. It will take some time because
in each directory it will go over the whole content, hash it, check whether
it changed, deduplicate it, compress it and then write it. Each backup
of approximately 70 GiB took one hour to migrate on my computer. It seems
that the process is limited by the single-thread performance of your CPU.
```
$ export BORG_PASSPHRASE=asdf
$ for i in back*; do \
archivename=$(echo $i | cut -c 6-15); \
pushd $i; \
borg create --stats --progress ~/backups/borg-feronia::$archivename .; \
popd; \
done;
```
The env variable will allow us to walk away at this stage and let the computer
do its magic for some hours.
## Check the migrated backups
Once the backups have been migrated, we need to check that everything is
in order before doing anything else.
I did the same checks as before, this time using `borg list` and `borg extract`
to check whether the files are present and their content is correct.
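For reference, checking a single archive could look roughly like this (the archive name and paths are made up) :
```sh
$ borg list ~/backups/borg-feronia
$ borg list ~/backups/borg-feronia::2014-12-24
$ mkdir /tmp/restore-test && cd /tmp/restore-test
$ borg extract ~/backups/borg-feronia::2014-12-24 home/tsc/.gitconfig
$ diff /tmp/restore-test/home/tsc/.gitconfig ~/.gitconfig
```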
## Archive these backups
Once the migrated backups have been tested, we can shred the old hard drives
that were showing signs of old age.
Since storage is so cheap nowadays, I will also transfer an archive of
the Borg backup folder to an online storage service so as to be able
to retrieve it in case the local storage media are destroyed or otherwise
unreadable in the future.
I chose to simply create a tar archive of the Borg folder and upload it
to AWS S3 since these backups will not be updated. Perhaps some day I will
add the more recent backups to this setup but for now they are a read-only
window into the laptop I had during my studies and during my first jobs.
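As a sketch, the upload can be as simple as streaming the tar archive directly into S3 (the bucket name is made up; for very large streams `aws s3 cp` may need `--expected-size`) :
```sh
$ cd backups
$ tar -cf - borg-feronia | aws s3 cp - s3://my-archive-bucket/borg-feronia-2019.tar
```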


@ -9,17 +9,28 @@ new things. I earned a Masters Degree in Computer Science graduating from
the EPFL (Swiss Federal Institute of Technology in Lausanne), with a
specialization in Software Systems.
When not programming for my job, I like to [design](https://git.inf3.xyz/tschwery/custom-keyboard) and build mechanical keyboards,
[shooting my bow](https://les-archers-du-bisse.ch/) and
[cooking](https://recettes.inf3.ch) with my family.
When not programming for my job, I like to
[design](https://git.inf3.xyz/tschwery/custom-keyboard) and build mechanical keyboards,
[shooting my bow](https://les-archers-du-bisse.ch/),
[cooking](https://recettes.inf3.ch) with my family
and learn new things by coding random and not-so-random projects.
This blog is a collection of things learnt during these explorations.
## My job
I have been working as a Software Developer at [SAI-ERP](https://sai-erp.net) since 2011.
I have been working as a Software Developer at [Groupe T2i](https://groupe-t2i.com)
since 2019.
I have previously worked as a student assistant at [EPFL](https://ic.epfl.ch).
I have previously worked
as a Software Developer at [SAI-ERP](https://sai-erp.net) from 2011 to 2019
and as a student assistant at [EPFL](https://ic.epfl.ch) from 2009 to 2012.
## Contact me
Find me on [Github](https://github.com/tschwery/) / my private [GitLab instance](https://git.inf3.xyz/explore/projects) / [Linkedin](www.linkedin.com/in/thomas-schwery) or just say by email at [thomas@inf3.ch](mailto:thomas@inf3.ch).
Find me on
[Github](https://github.com/tschwery/)
/ my private [GitLab instance](https://git.inf3.xyz/explore/projects)
/ [Linkedin](https://www.linkedin.com/in/thomas-schwery)
or just say Hi by email at [thomas@inf3.ch](mailto:thomas@inf3.ch).


@ -12,6 +12,9 @@ title = "Thomas Schwery"
author = "Thomas Schwery"
copyright = "Thomas Schwery, No rights reserved (CC0)."
pygmentsCodeFences = true
pygmentsCodeFencesGuessSyntax = true
[params]
logo = "/images/logo.png"
subtitle = "A Glog ... Plog ... Blog ..."


@ -1,6 +1,7 @@
body {
font-family: "Roboto", "HelveticaNeue", "Helvetica Neue", Helvetica, Arial, sans-serif;
background-color: #FCFCFC;
text-align: justify;
}
h1 { font-size: 2.1rem; }


@ -303,23 +303,6 @@ ol ul {
li {
margin-bottom: 1rem; }
/* Code
*/
code {
padding: .2rem .5rem;
margin: 0 .2rem;
font-size: 90%;
white-space: nowrap;
background: #F1F1F1;
border: 1px solid #E1E1E1;
border-radius: 4px; }
pre > code {
display: block;
padding: 1rem 1.5rem;
white-space: pre; }
/* Tables
*/
th,


@ -0,0 +1,59 @@
/* Background */ .chroma { color: #93a1a1; background-color: #002b36 }
/* Other */ .chroma .x { color: #cb4b16 }
/* LineTableTD */ .chroma .lntd { vertical-align: top; padding: 0; margin: 0; border: 0; }
/* LineTable */ .chroma .lntable { border-spacing: 0; padding: 0; margin: 0; border: 0; width: auto; overflow: auto; display: block; }
/* LineHighlight */ .chroma .hl { display: block; width: 100%;background-color: #ffffcc }
/* LineNumbersTable */ .chroma .lnt { margin-right: 0.4em; padding: 0 0.4em 0 0.4em; }
/* LineNumbers */ .chroma .ln { margin-right: 0.4em; padding: 0 0.4em 0 0.4em; }
/* Keyword */ .chroma .k { color: #719e07 }
/* KeywordConstant */ .chroma .kc { color: #cb4b16 }
/* KeywordDeclaration */ .chroma .kd { color: #268bd2 }
/* KeywordNamespace */ .chroma .kn { color: #719e07 }
/* KeywordPseudo */ .chroma .kp { color: #719e07 }
/* KeywordReserved */ .chroma .kr { color: #268bd2 }
/* KeywordType */ .chroma .kt { color: #dc322f }
/* NameBuiltin */ .chroma .nb { color: #b58900 }
/* NameBuiltinPseudo */ .chroma .bp { color: #268bd2 }
/* NameClass */ .chroma .nc { color: #268bd2 }
/* NameConstant */ .chroma .no { color: #cb4b16 }
/* NameDecorator */ .chroma .nd { color: #268bd2 }
/* NameEntity */ .chroma .ni { color: #cb4b16 }
/* NameException */ .chroma .ne { color: #cb4b16 }
/* NameFunction */ .chroma .nf { color: #268bd2 }
/* NameTag */ .chroma .nt { color: #268bd2 }
/* NameVariable */ .chroma .nv { color: #268bd2 }
/* LiteralString */ .chroma .s { color: #2aa198 }
/* LiteralStringAffix */ .chroma .sa { color: #2aa198 }
/* LiteralStringBacktick */ .chroma .sb { color: #586e75 }
/* LiteralStringChar */ .chroma .sc { color: #2aa198 }
/* LiteralStringDelimiter */ .chroma .dl { color: #2aa198 }
/* LiteralStringDouble */ .chroma .s2 { color: #2aa198 }
/* LiteralStringEscape */ .chroma .se { color: #cb4b16 }
/* LiteralStringInterpol */ .chroma .si { color: #2aa198 }
/* LiteralStringOther */ .chroma .sx { color: #2aa198 }
/* LiteralStringRegex */ .chroma .sr { color: #dc322f }
/* LiteralStringSingle */ .chroma .s1 { color: #2aa198 }
/* LiteralStringSymbol */ .chroma .ss { color: #2aa198 }
/* LiteralNumber */ .chroma .m { color: #2aa198 }
/* LiteralNumberBin */ .chroma .mb { color: #2aa198 }
/* LiteralNumberFloat */ .chroma .mf { color: #2aa198 }
/* LiteralNumberHex */ .chroma .mh { color: #2aa198 }
/* LiteralNumberInteger */ .chroma .mi { color: #2aa198 }
/* LiteralNumberIntegerLong */ .chroma .il { color: #2aa198 }
/* LiteralNumberOct */ .chroma .mo { color: #2aa198 }
/* Operator */ .chroma .o { color: #719e07 }
/* OperatorWord */ .chroma .ow { color: #719e07 }
/* Comment */ .chroma .c { color: #586e75 }
/* CommentHashbang */ .chroma .ch { color: #586e75 }
/* CommentMultiline */ .chroma .cm { color: #586e75 }
/* CommentSingle */ .chroma .c1 { color: #586e75 }
/* CommentSpecial */ .chroma .cs { color: #719e07 }
/* CommentPreproc */ .chroma .cp { color: #719e07 }
/* CommentPreprocFile */ .chroma .cpf { color: #719e07 }
/* GenericDeleted */ .chroma .gd { color: #dc322f }
/* GenericEmph */ .chroma .ge { font-style: italic }
/* GenericError */ .chroma .gr { color: #dc322f; font-weight: bold }
/* GenericHeading */ .chroma .gh { color: #cb4b16 }
/* GenericInserted */ .chroma .gi { color: #719e07 }
/* GenericStrong */ .chroma .gs { font-weight: bold }
/* GenericSubheading */ .chroma .gu { color: #268bd2 }


@ -1 +0,0 @@
hljs.initHighlightingOnLoad();


@ -13,9 +13,5 @@
</div>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/highlight.min.js"></script>
{{ $highlightInitJs := resources.Get "js/init.js" | resources.Minify | resources.Fingerprint }}
<script src="{{ $highlightInitJs.Permalink }}" integrity="{{ $highlightInitJs.Data.Integrity }}"></script>
</body>
</html>


@ -6,14 +6,15 @@
{{ $skeletonCss := resources.Get "css/skeleton.css" | resources.Minify | resources.Fingerprint }}
{{ $customCss := resources.Get "css/custom.css" | resources.Minify | resources.Fingerprint }}
{{ $normalizeCss := resources.Get "css/normalize.css" | resources.Minify | resources.Fingerprint }}
{{ $syntaxCss := resources.Get "css/syntax.css" | resources.Minify | resources.Fingerprint }}
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="//fonts.googleapis.com/css?family=Roboto:400,700" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
<link rel="stylesheet" href="{{ $normalizeCss.Permalink }}">
<link rel="stylesheet" href="{{ $skeletonCss.Permalink }}">
<link rel="stylesheet" href="{{ $customCss.Permalink }}">
<link rel="stylesheet" href="{{ $syntaxCss.Permalink }}">
<link rel="alternate" href="/index.xml" type="application/rss+xml" title="{{ .Site.Title }}">