---
title: Subversion migration to Git
date: 2017-05-12 18:30:00
---

Some time ago I was tasked with migrating our Subversion repositories to Git. This article was only written now because, well, I had forgotten about the notes I had taken during the migration and only stumbled upon them again a short while ago.

Our largest repository was something like 500 GB and contained a little more than 50'000 commits. The goal was to recover the svn history into git, keeping as much information as possible about the commits and the links between them, and keeping the branches. Over the years, a number of periodic database dumps had been committed that now weighed down the repository without serving any purpose. There were also a number of branches that were never used and contained nothing of interest.

The decision was also taken to split some of the tools into their own repositories instead of keeping them in the same repository, cleaning up the main repository to keep only the main project and related sources.

## Principles
* After some experiments, I decided to use svn2git, a tool used by KDE for their migration. It has the advantage of taking a rule file that allows splitting a repository by svn path, processing tags and branches and transforming them, ignoring other paths, and so on.
* As the import of such a large repository is slow, I decided to mount a btrfs partition so that each step could be snapshotted, allowing me to test the next step without any fear of having to start again from the beginning.
* Some binary files were added to the svn history and it made sense to keep them. I decided to migrate them to git-lfs to reduce the history size without losing them completely.
* A lot of commit messages contain references to other commits. I wanted to process these commit messages and transform each `rXXXX` revision reference into a git hash so that tools can create a link automatically.

## Tools
The first tool to retrieve is [svn2git](https://github.com/svn-all-fast-export/svn2git).

The compilation should be easy: first install the dependencies, then build it.

```
$ git clone https://github.com/svn-all-fast-export/svn2git.git
$ sudo apt install libqt4-dev libapr1-dev libsvn-dev
$ cd svn2git
$ qmake .
$ make
```

Once the tool is compiled, we can prepare the btrfs mount in which we will run the migration steps.

```
$ mkdir repositories
$ truncate -s 300G repositories.btrfs
$ sudo mkfs.btrfs repositories.btrfs
$ sudo mount repositories.btrfs repositories
$ sudo chown 1000:1000 repositories
```
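
Throughout the migration I take a read-only snapshot after each phase. If a later phase goes wrong, a writable copy can be branched off the last good snapshot instead of re-running everything from the start. A minimal sketch of that cycle (the `retry` path is illustrative):

```sh
# after a phase completes: keep a read-only snapshot
$ btrfs subvolume snapshot -r repositories repositories/snap-1-import

# if a later phase goes wrong: branch a writable copy off the snapshot
$ btrfs subvolume snapshot repositories/snap-1-import repositories/retry
```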

We will also write a small tool in Go to process the commit messages, so install the Go toolchain.

```
sudo apt install golang
```

We will also need `bfg`, a git history cleansing tool. You can download the jar file from the [BFG Repo-Cleaner website](https://rtyley.github.io/bfg-repo-cleaner/).

## First steps
The first step of the migration is to retrieve the svn repository itself on the local machine. This is not a checkout of the repository; we need the server folder directly, with the whole history and metadata.

```
rsync -avz --progress sshuser@svn.myserver.com:/srv/svn_myrepository/ .
```

In this case I had SSH access to the server, allowing me to simply rsync the repository. Doing so allowed me to prepare the migration in advance, copying only the new commits on each synchronisation instead of the whole repository with its large history. Most of the repository files are never updated, so this step is only slow on the first execution.
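
Before the final cutover, the same rsync can simply be re-run to pick up the latest revisions, and the copy can be checked before importing from it. A small sketch, assuming the paths used above:

```sh
# incremental re-sync: only new revision files are transferred
$ rsync -avz --progress sshuser@svn.myserver.com:/srv/svn_myrepository/ .

# sanity-check the copied repository before running the import
$ svnadmin verify /home/tsc/svn_myrepository
```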

### User mapping
The first step is to create a mapping file that will map the svn users to git users. A user in svn is a username, whereas in git it is a name and an email address.

To get a list of user accounts, we can use the svn command directly on the local repository like this:

```
svn log file:///home/tsc/svn_myrepository \
    | egrep '^r.*lines?$' \
    | awk -F'|' '{print $2;}' \
    | sort \
    | uniq
```

This will return the list of users in the logs. For each of these users, you should create a line in a mapping file, like so:

```
auser Albert User <albert.user@example.com>
aperson Anaelle Personn <anaelle.personn@example.com>
```

This file will be given as input to `svn2git` and must be complete, otherwise the import will fail.
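
To avoid typing the file by hand, the user list from above can be turned into a skeleton mapping that you then complete with real names and addresses. A sketch, with `example.com` as a placeholder domain:

```sh
$ svn log file:///home/tsc/svn_myrepository \
    | egrep '^r.*lines?$' \
    | awk -F'|' '{print $2;}' \
    | sed -e 's/^ *//' -e 's/ *$//' \
    | sort -u \
    | awk '{printf "%s %s <%s@example.com>\n", $1, $1, $1}' > accounts-map.txt
```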

### Path mapping
The second mapping for the svn to git migration of a repository is the svn2git rules file. This file tells the program what goes where. In our case, the repository was not strictly adhering to the standard svn layout: it contained a trunk, tags and branches structure as well as some other folders for "out-of-branch" projects.

```txt
# We create the main repository
create repository svn_myrepository
end repository

# We create repositories for external tools that will move
# to their own repositories
create repository aproject
end repository
create repository bproject
end repository
create repository cproject
end repository

# We declare a variable to ease the declaration of the
# migration rules further down
declare PROJECTS=aproject|bproject|cproject

# We create repositories for out-of-branch folders
# that will migrate to their own repositories
create repository aoutofbranch
end repository
create repository boutofbranch
end repository

# We always ignore database dumps wherever they are.
# In our case, the database dumps are named "database-dump-20100112"
# or something close to that.
match /.*/database([_-][^/]+)?[-_](dump|oracle|mysql)[^/]+
end match

# There are also dumps stored in their own folder
match /.*/database/backup(/old)?/.*(.zip|.sql|.lzma)
end match

# At some point the build results were also added to the history; we want
# to ignore them
match /.*/(build|dist|cache)/
end match

# We process our external tools only on the master branch.
# We use the previously declared variable to reduce the repetition
# and use the pattern match to move it to the correct repository.
match /trunk/(tools/)?(${PROJECTS})/
  repository \2
  branch master
end match

# And we ignore them if they are on tags or branches
match /.*/(tools/)?${PROJECTS}/
end match

# We start processing our main project after r10, as the
# first commits were missing the trunk and moved the branches, trunk and tags
# folders around.
match /trunk/
  min revision 10
  repository svn_myrepository
  branch master
end match

# There are branches that are hierarchically organized.
# Such cases have to be explicitly configured.
match /branches/(old|dev|customers)/([^/]+)/
  repository svn_myrepository
  branch \1/\2
end match

# Other branches are, as expected, directly in the branches folder.
match /branches/([^/]+)/
  repository svn_myrepository
  branch \1
end match

# The tags were used in a strange fashion before commit r2500,
# so we ignore everything before that refactoring
match /tags/([^/]+)/
  max revision 2500
end match

# After that, we create a branch for each tag as the svn tags
# were not used correctly and were committed to. We just name
# them differently and will process them afterwards.
match /tags/([^/]+)/([^/]+)/
  min revision 2500
  repository svn_myrepository
  branch \1-\2
end match

# Our out-of-branch folders are processed directly, only creating
# a master branch.
match /aoutofbranch/
  repository aoutofbranch
  branch master
end match

match /boutofbranch/
  repository boutofbranch
  branch master
end match

# Everything else is discarded and ignored
match /
end match
```

This file will quickly grow with the number of migration operations that you want to do. Ignore files here if possible, as it will reduce the migration time as well as the postprocessing that needs to be done afterwards. In my case, a number of files were too complex to match during the migration or were spotted only afterwards, and they had to be cleaned in a second pass with other tools.

### Migration
This step will take a lot of time, as it reads the whole svn history, processes the declared rules and generates the git repositories and every commit.

```
$ cd repositories
$ ~/workspace/svn2git/svn-all-fast-export \
    --add-metadata \
    --svn-branches \
    --identity-map ~/workspace/migration-tools/accounts-map.txt \
    --rules ~/workspace/migration-tools/svnfast.rules \
    --commit-interval 2000 \
    --stat \
    /home/tsc/svn_myrepository
```

If there is a crash during this step, it means that you are either missing an account in your mapping, that one of your rules is emitting an erroneous branch or repository, or that no rule matches at all.

Once this step is finished, I like to take a btrfs snapshot so that I can return to this point while putting the next steps into place.

```
btrfs subvolume snapshot -r repositories repositories/snap-1-import
```

## Cleanup
The next phase is to clean up our import. There will always be a number of branches that are unused, are named incorrectly, contain only temporary files, or are so far from the standard naming that our rules cannot process them correctly.

We will simply delete them or rename them using git.

```
$ cd svn_myrepository
$ git branch -D oldbranch-0.3.1
$ git branch -D customer/backup_temp
$ git branch -m customer/stable_v1.0 stable-1.0
```

The goal at this step is to clean up the branches so that only those that will be kept after the migration remain. We do this now to reduce the repository size early on and thus reduce the time needed for the next steps.
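
To decide what can go, it helps to list the branches by the date of their last commit; anything untouched for years is a good candidate for deletion:

```sh
# list branches, oldest last commit first
$ git for-each-ref --sort=committerdate \
    --format='%(committerdate:short) %(refname:short)' refs/heads/
```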

If you see branches that can be deleted or renamed further down the road, you can also remove or rename them then.

I like to take a snapshot at this stage, as the next stage usually involves a lot of tests and manually building a list of things to remove.

```
btrfs subvolume snapshot -r repositories repositories/snap-2a-cleanup
```

We can also remove files that were added and should not have been, by generating a list of every file ever checked into our new git repository, inspecting it manually, and adding the identifiers of the files to remove to a new file:

```sh
$ git rev-list --objects --all > ./all-files
$ cat ./all-files | your-filter | cut -d' ' -f1 > ./to-delete-ids
$ java -jar ~/Downloads/bfg-1.12.15.jar --private --no-blob-protection --strip-blobs-with-ids ./to-delete-ids
```
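
Here `your-filter` stands for whatever selects the unwanted entries in the list. When hunting for candidates, sorting the blobs by size is often a good start; a sketch using only stock git commands:

```sh
# list the 20 largest blobs with their paths, biggest first
$ git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | awk '$1 == "blob" {print $3, $2, $4}' \
    | sort -rn \
    | head -20
```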

We take a snapshot again, as the next step also involves checks and tests.

```
btrfs subvolume snapshot -r repositories repositories/snap-2b-cleanup
```

Next, we will convert the binary files that we still want to keep in our repository to Git-LFS. This allows git to only keep track of the hash of the file in the history instead of storing the whole binary in the repository, thus reducing the size of the clones.

BFG does this quickly and efficiently, removing every file matching the given name from the history and storing it in Git-LFS. This step will require some exploration of the previous `all-files` file to identify which files need to be converted.

```sh
$ java -jar ~/Downloads/bfg-1.12.15.jar --no-blob-protection --private --convert-to-git-lfs 'my-important-archive*.zip'
$ java -jar ~/Downloads/bfg-1.12.15.jar --no-blob-protection --private --convert-to-git-lfs '*.ear'
```
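
After the conversion, it is worth checking the result and making sure `.gitattributes` covers the same patterns, so that future commits of such files also go through LFS (to my knowledge BFG rewrites the blobs, but you should verify the attributes yourself):

```sh
# list the files now stored as LFS pointers
$ git lfs ls-files

# make sure future commits of these patterns stay in LFS
$ git lfs track '*.ear' 'my-important-archive*.zip'
$ git add .gitattributes
```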

After the cleanup, I also like to take a btrfs snapshot so that the history rewrite step can be executed and tested multiple times.

```
btrfs subvolume snapshot -r repositories repositories/snap-2c-cleanup
```

### Linking a svn revision to a git commit
The import log contains, for each svn revision, a line mapping it to a mark. In the git repository, there is then a marks file (from git fast-import) that maps each mark to a commit hash. We can use these two files to build a mapping database that stores the svn revision to git commit relation for later.
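
The two files look roughly like this; the shapes are inferred from the parsing code below, and the values are illustrative:

```sh
$ grep '^progress SVN' ../log-svn_myrepository | head -1
# progress SVN r42 branch master = :1234

$ grep '^:1234 ' marks-svn_myrepository
# :1234 <40-hex-digit commit hash>
```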

In our case, I wrote a Java program that parses both files and stores the resulting mapping in a LevelDB database.

This database will then be used by a Go server that reads the mapping into memory and exposes an RPC service that we will call from Go binaries during a `git filter-branch` run. The Go server also needs to keep track of the modifications to the git commit hashes, as the history rewrite changes them.

First, the Java tool to read the logs and generate the LevelDB database:

```java
import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;
import java.io.File;
import java.io.FileReader;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.IOFileFilter;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import org.iq80.leveldb.impl.Iq80DBFactory;

public class CommitMapping {

    public static String FILE_LOG_IMPORT = "../log-svn_myrepository";
    public static String FILE_MARKS = "marks-svn_myrepository";
    public static String FILE_BFG_DIR = "../svn_myrepository.bfg-report";

    public static Pattern PATTERN_LOG = Pattern.compile("^progress SVN (r\\d+) branch .* = (:\\d+)");

    public static void main(String[] args) throws Exception {

        // Read the svn2git import log and the git fast-import marks file.
        List<String> importLines = IOUtils.readLines(new FileReader(new File(FILE_LOG_IMPORT)));
        List<String> marksLines = IOUtils.readLines(new FileReader(new File(FILE_MARKS)));

        // Collect the BFG object-id mapping files of every cleanup pass, oldest first.
        Collection<File> passFilesCol = FileUtils.listFiles(new File(FILE_BFG_DIR), new IOFileFilter() {
            @Override
            public boolean accept(File pathname, String name) {
                return name.equals("object-id-map.old-new.txt");
            }

            @Override
            public boolean accept(File path) {
                return this.accept(path, path.getName());
            }
        }, DirectoryFileFilter.DIRECTORY);

        List<File> passFiles = new ArrayList<>(passFilesCol);

        Collections.sort(passFiles, (File o1, File o2) -> o1.getParentFile().getName().compareTo(o2.getParentFile().getName()));

        Map<String, String> commitToIdentifier = new LinkedHashMap<>();
        Map<String, String> identifierToHash = new HashMap<>();

        // Map each svn revision to its fast-import mark.
        for (String importLine : importLines) {
            Matcher marksMatch = PATTERN_LOG.matcher(importLine);

            if (marksMatch.find()) {
                String dest = marksMatch.group(2);
                if (dest == null || dest.length() == 0 || ":0".equals(dest)) continue;

                commitToIdentifier.put(marksMatch.group(1), dest);
            } else {
                System.err.println("Unknown line: " + importLine);
            }
        }

        File dbFile = new File(System.getenv("HOME") + "/mapping-db");
        File humanFile = new File(System.getenv("HOME") + "/mapping");

        FileUtils.deleteQuietly(dbFile);

        Options options = new Options();
        options.createIfMissing(true);
        DB db = Iq80DBFactory.factory.open(dbFile, options);

        // Map each fast-import mark to the imported commit hash.
        marksLines.stream().map((line) -> line.split("\\s", 2)).forEach((parts) -> identifierToHash.put(parts[0], parts[1]));

        BiMap<String, String> commitMapping = HashBiMap.create(commitToIdentifier.size());
        for (String commit : commitToIdentifier.keySet()) {

            String importId = commitToIdentifier.get(commit);
            String hash = identifierToHash.get(importId);

            if (hash == null) continue;
            commitMapping.put(commit, hash);
        }

        System.err.println("Got " + commitMapping.size() + " svn -> initial import entries.");

        // Replay each BFG pass to follow the commits to their rewritten hashes.
        for (File file : passFiles) {
            System.err.println("Processing file " + file.getAbsolutePath());

            List<String> bfgPass = IOUtils.readLines(new FileReader(file));
            Map<String, String> hashMapping = bfgPass.stream().map((line) -> line.split("\\s", 2)).collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));

            for (String hash : hashMapping.keySet()) {
                String rev = commitMapping.inverse().get(hash);
                if (rev != null) {
                    String newHash = hashMapping.get(hash);
                    System.err.println("Replacing " + rev + ", was " + hash + ", is " + newHash);
                    commitMapping.replace(rev, newHash);
                }
            }
        }

        // Persist the final mapping, both human-readable and in LevelDB.
        PrintStream fos = new PrintStream(humanFile);
        for (Map.Entry<String, String> entry : commitMapping.entrySet()) {
            String commit = entry.getKey();
            String target = entry.getValue();

            fos.println(commit + "\t" + target);
            db.put(Iq80DBFactory.bytes(commit), Iq80DBFactory.bytes(target));
        }

        db.close();
        fos.close();
    }
}
```

We will use RPC between a client and a server so that the LevelDB database can be kept open: the clients will be executed once per commit, so they must be very light and query an already-running server. In early tests, opening the database on every invocation was really time consuming, hence this approach, even though the server itself does very little.

The structure of our Go project is the following:

```txt
go-gitcommit/client-common:
rpc.go

go-gitcommit/client-insert:
insert-mapping.go

go-gitcommit/client-query:
query-mapping.go

go-gitcommit/server:
server.go
```
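
I built the three binaries next to their sources. A sketch, assuming a pre-modules Go toolchain that accepts the relative `../client-common` import for local builds:

```sh
$ (cd go-gitcommit/server && go build -o server server.go)
$ (cd go-gitcommit/client-insert && go build -o client-insert insert-mapping.go)
$ (cd go-gitcommit/client-query && go build -o client-query query-mapping.go)
```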

First, some plumbing for the RPC in `rpc.go`:

```go
package Client

import (
	"net"
	"net/rpc"
	"time"
)

type (
	// Client wraps the RPC connection to the mapping server.
	Client struct {
		connection *rpc.Client
	}

	// MappingItem is the response from the cache or the item to insert into the cache.
	MappingItem struct {
		Key   string
		Value string
	}

	// BulkQuery allows mass querying the DB in one go.
	BulkQuery []MappingItem
)

// NewClient connects to the mapping server at the given address.
func NewClient(dsn string, timeout time.Duration) (*Client, error) {
	connection, err := net.DialTimeout("tcp", dsn, timeout)
	if err != nil {
		return nil, err
	}
	return &Client{connection: rpc.NewClient(connection)}, nil
}

// InsertMapping records a rewritten commit hash on the server.
func (c *Client) InsertMapping(item MappingItem) (bool, error) {
	var ack bool
	err := c.connection.Call("RPC.InsertMapping", item, &ack)
	return ack, err
}

// GetMapping resolves a batch of svn revisions to git hashes.
func (c *Client) GetMapping(bulk BulkQuery) (BulkQuery, error) {
	var bulkResponse BulkQuery
	err := c.connection.Call("RPC.GetMapping", bulk, &bulkResponse)
	return bulkResponse, err
}
```

Next, the Go server that will read this database, in `server.go`:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"os"
	"time"

	"github.com/syndtr/goleveldb/leveldb"

	Client "../client-common"
)

var (
	cacheDBPath = os.Getenv("HOME") + "/mapping-db"

	cacheDB *leveldb.DB
	// flowMap tracks the hash rewrites done by the current filter-branch run.
	flowMap map[string]string

	f *os.File
	g *os.File
)

type (
	// RPC is the base type of our RPC system.
	RPC struct {
	}
)

func main() {
	var cacheDBerr error

	cacheDB, cacheDBerr = leveldb.OpenFile(cacheDBPath, nil)
	if cacheDBerr != nil {
		fmt.Fprintln(os.Stderr, "Unable to initialize the LevelDB cache.")
		log.Fatal(cacheDBerr)
	}

	roErr := cacheDB.SetReadOnly()
	if roErr != nil {
		fmt.Fprintln(os.Stderr, "Unable to set the LevelDB cache read-only.")
		log.Fatal(roErr)
	}

	flowMap = make(map[string]string)

	f, _ = os.Create(os.Getenv("HOME") + "/go-server/gomapping.log")
	defer f.Close()
	g, _ = os.Create(os.Getenv("HOME") + "/go-server/gomapping.ins")
	defer g.Close()

	rpc.Register(NewRPC())

	l, e := net.Listen("tcp", ":9876")
	if e != nil {
		log.Fatal("listen error:", e)
	}

	go flushLog()

	rpc.Accept(l)
}

func flushLog() {
	for {
		time.Sleep(100 * time.Millisecond)
		f.Sync()
	}
}

// NewRPC creates the RPC handler.
func NewRPC() *RPC {
	return &RPC{}
}

// InsertMapping records that commit `old` was rewritten to `new`.
func (r *RPC) InsertMapping(mappingItem Client.MappingItem, ack *bool) error {
	old := mappingItem.Key
	new := mappingItem.Value

	flowMap[old] = new

	g.WriteString(fmt.Sprintf("Inserted mapping %s -> %s\n", old, new))

	*ack = true

	return nil
}

// GetMapping resolves svn revisions to the current git hashes.
func (r *RPC) GetMapping(bulkQuery Client.BulkQuery, resp *Client.BulkQuery) error {
	for i := range bulkQuery {
		key := bulkQuery[i].Key

		// The LevelDB cache maps the svn revision to the originally imported hash.
		response, _ := cacheDB.Get([]byte(key), nil)

		gitCommit := key
		if response != nil {
			responseStr := string(response[:])
			// flowMap then maps that hash to its latest rewritten form.
			responseUpdated := flowMap[responseStr]
			if responseUpdated != "" {
				gitCommit = string(responseUpdated[:])[:12] + "(" + key + ")"

				f.WriteString(fmt.Sprintf("Response to mapping %s -> %s\n", bulkQuery[i].Key, gitCommit))
			} else {
				f.WriteString(fmt.Sprintf("No git mapping for entry %s\n", responseStr))
			}
		} else {
			f.WriteString(fmt.Sprintf("Unknown revision %s\n", key))
		}

		bulkQuery[i].Value = gitCommit
	}

	*resp = bulkQuery

	return nil
}
```

And finally, our clients. The insert client will be called from `git filter-branch` with the previous and current commit hashes after each processed commit. We send this information to the server so that the mapping stays correct as the rewrite changes the hashes. The code goes into `insert-mapping.go`:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"

	Client "../client-common"
)

func main() {
	// filter-branch passes the original and the rewritten commit hashes.
	old := os.Args[1]
	new := os.Args[2]

	rpcClient, err := Client.NewClient("localhost:9876", time.Millisecond*500)
	if err != nil {
		log.Fatal(err)
	}

	mappingItem := Client.MappingItem{
		Key:   old,
		Value: new,
	}

	ack, err := rpcClient.InsertMapping(mappingItem)
	if err != nil || !ack {
		log.Fatal(err)
	}

	// The commit filter must print the hash of the resulting commit.
	fmt.Println(new)
}
```

The query client will receive the commit message of each commit, check whether it contains an `rXXXX` revision reference, and query the server for the corresponding hash. It goes into `query-mapping.go`:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"regexp"
	"strings"
	"time"

	client "../client-common"
)

func main() {
	// Read the full commit message from stdin; filter-branch expects
	// the rewritten message on stdout.
	data, _ := ioutil.ReadAll(os.Stdin)
	text := string(data)

	re := regexp.MustCompile(`\Wr[0-9]+`)
	matches := re.FindAllString(text, -1)

	if matches == nil {
		fmt.Print(text)
		return
	}

	rpcClient, err := client.NewClient("localhost:9876", time.Millisecond*500)
	if err != nil {
		log.Fatal(err)
	}

	var bulkQuery client.BulkQuery

	for i := range matches {
		// Skip matches preceded by a dash, e.g. version-like strings.
		if matches[i][0] != '-' {
			// Strip the leading non-word character kept by the regexp.
			key := matches[i][1:]
			bulkQuery = append(bulkQuery, client.MappingItem{Key: key})
		}
	}

	gitCommits, _ := rpcClient.GetMapping(bulkQuery)

	for i := range gitCommits {
		gitCommit := gitCommits[i].Value
		key := gitCommits[i].Key

		text = strings.Replace(text, key, gitCommit, 1)
	}

	fmt.Print(text)
}
```

For this step, we will first compile and execute the Java program. Once it has created the database, we compile the Go tools and start the server in the background.
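
A sketch of that sequence; the Java classpath depends on how you built the tool and is left as a placeholder:

```sh
# generate ~/mapping-db from the import log and the marks file
$ java -cp <classpath-with-dependencies> CommitMapping

# start the mapping server on port 9876 in the background
$ ./go-gitcommit/server/server &
```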

Then, we can launch `git filter-branch` on our repository to rewrite the history:

```sh
$ git filter-branch \
    --commit-filter 'NEW=`git_commit_non_empty_tree "$@"`; \
        ${HOME}/migration-tools/go-gitcommit/client-insert/client-insert $GIT_COMMIT $NEW' \
    --msg-filter "${HOME}/migration-tools/go-gitcommit/client-query/client-query" \
    -- --all --author-date-order
```
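
Once the rewrite has run, a quick count shows how many references were rewritten into hashes and how many plain `rXXXX` mentions remain:

```sh
# messages now carrying a rewritten hash followed by the original revision
$ git log --all --pretty=%B | grep -cE '[0-9a-f]{12}\(r[0-9]+\)'

# plain svn revision references that were not rewritten
$ git log --all --pretty=%B | grep -cE '\br[0-9]+\b'
```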

As after each step, we take a snapshot, even though this should be the last step that cannot easily be repeated.

```
btrfs subvolume snapshot -r repositories repositories/snap-3-mapping
```

We now clean the repository, which at this point still contains a lot of unused blobs, branches, commits, and so on.

```sh
$ git reflog expire --expire=now --all
$ git prune --expire=now --progress
$ git repack -adf --window-memory=512m
```
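
To see what the cleanup achieved, compare the object count and repository size before and after:

```sh
# count and total size of all objects, in human-readable units
$ git count-objects -vH
$ du -sh .git
```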

We now have a repository that should be more or less clean. You will still have to check the history, the size of the blobs, and whether some branches can be deleted, before pushing it to your server.