---
title: Subversion migration to Git
date: 2017-05-12 18:30:00
---

Some time ago I was tasked with migrating our Subversion repositories to Git. This article was only written long after the fact because, well, I had forgotten about the notes I had taken during the migration and only stumbled upon them recently.

Our largest repository was around 500 GB and contained a little more than 50'000 commits. The goal was to recover the svn history into git, keeping as much information as possible about the commits and the links between them, and to keep the branches. Over the course of that history, a number of periodic database dumps had been committed that now weighed down the repository without serving any purpose. There were also a number of branches that were never used and contained nothing of interest.

The decision was also taken to split some of the tools into their own repositories instead of keeping them in the same repository, cleaning up the main repository so that it only contains the main project and related sources.

## Principles
* After some experiments, I decided to use svn2git, a tool used by KDE for their migration. It has the advantage of taking a rules file that allows splitting a repository by svn path, processing tags and branches and transforming them, ignoring other paths, and so on.
* As the import of such a large repository is slow, I decided to mount a btrfs partition so that each step could be snapshotted, allowing me to test the next step without any fear of having to start again from the beginning.
* Some binary files were added to the svn history and it made sense to keep them. I decided to migrate them to git-lfs to reduce the history size without losing them completely.
* A lot of commit messages contain references to other commits; I wanted to process these messages and transform each reference to an `rXXXX` revision into a git hash so that tools can create a link automatically.

## Tools
The first tool to retrieve is [svn2git](https://github.com/svn-all-fast-export/svn2git).

The compilation should be easy: install the dependencies, then build it.

```
$ git clone https://github.com/svn-all-fast-export/svn2git.git
$ sudo apt install libqt4-dev libapr1-dev libsvn-dev
$ cd svn2git
$ qmake .
$ make
```

Once the tool is compiled, we can prepare the btrfs mount in which we will run the migration steps.

```
$ mkdir repositories
$ truncate -s 300G repositories.btrfs
$ sudo mkfs.btrfs repositories.btrfs
$ sudo mount repositories.btrfs repositories
$ sudo chown 1000:1000 repositories
```

We will also write a small tool in Go to process the commit messages, so we need the Go toolchain.

```
sudo apt install golang
```

We will also need `bfg`, a git history cleaning tool. You can download the jar file from the [BFG Repo-Cleaner website](https://rtyley.github.io/bfg-repo-cleaner/).

## First steps
The first step of the migration is to retrieve the svn repository itself on the local machine. This is not a checkout of the repository; we need the server-side folder directly, with the whole history and metadata.

```
rsync -avz --progress sshuser@svn.myserver.com:/srv/svn_myrepository/ .
```

In this case I had SSH access to the server, allowing me to simply rsync the repository. Doing so allowed me to prepare the migration in advance, copying only the new commits on each synchronisation instead of the whole repository with its large history. Most of the repository files are never updated, so this step is only slow on the first execution.

### User mapping
The first step is to create a mapping file that maps the svn users to git users. A user in svn is a simple username, whereas in git it is a name and an email address.

To get the list of user accounts, we can use the svn command directly on the local repository like this:

```
svn log file:///home/tsc/svn_myrepository \
| egrep '^r.*lines?$' \
| awk -F'|' '{print $2;}' \
| sort \
| uniq
```

This will return the list of users found in the logs. For each of these users, you should create a line in a mapping file, like so:

```
auser Albert User <albert.user@example.com>
aperson Anaelle Personn <anaelle.personn@example.com>
```

This file will be given as input to `svn2git` and must be complete, otherwise the import will fail.

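To avoid typing every entry from scratch, the extraction pipeline above can pre-fill the file. This is only a convenience sketch; the `accounts-map.txt` name and the placeholder addresses are assumptions to be edited by hand afterwards:

```sh
# Pre-fill the identity map with one placeholder entry per svn user,
# then fix up the names and email addresses manually.
svn log file:///home/tsc/svn_myrepository \
| egrep '^r.*lines?$' \
| awk -F'|' '{print $2;}' \
| sort -u \
| while read -r user; do
    echo "$user $user <$user@example.com>"
  done > accounts-map.txt
```
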
### Path mapping
The second mapping for the svn to git migration is the svn2git rules file. This file tells the program what goes where. In our case, the repository did not strictly adhere to the standard svn layout: it contained the usual trunk, tags and branches structure, but also some other folders for "out-of-branch" projects.

```txt
# We create the main repository
create repository svn_myrepository
end repository

# We create repositories for external tools that will move
# to their own repositories
create repository aproject
end repository
create repository bproject
end repository
create repository cproject
end repository

# We declare a variable to ease the declaration of the
# migration rules further down
declare PROJECTS=aproject|bproject|cproject

# We create repositories for out-of-branch folders
# that will migrate to their own repositories
create repository aoutofbranch
end repository
create repository boutofbranch
end repository

# We always ignore database dumps wherever they are.
# In our case, the database dumps are named "database-dump-20100112"
# or something close to that.
match /.*/database([_-][^/]+)?[-_](dump|oracle|mysql)[^/]+
end match

# There are also dumps stored in their own folder
match /.*/database/backup(/old)?/.*(.zip|.sql|.lzma)
end match

# At some point the build results were also added to the history; we want
# to ignore them
match /.*/(build|dist|cache)/
end match

# We process our external tools only on the master branch.
# We use the previously declared variable to reduce repetition
# and use the pattern match to move each tool to the correct repository.
match /trunk/(tools/)?(${PROJECTS})/
  repository \2
  branch master
end match

# And we ignore them if they are on tags or branches
match /.*/(tools/)?${PROJECTS}/
end match

# We start processing our main project only after r10, as the
# first commits were missing the trunk and moved the branches, trunk and tags
# folders around.
match /trunk/
  min revision 10
  repository svn_myrepository
  branch master
end match

# There are branches that are hierarchically organized.
# Such cases have to be explicitly configured.
match /branches/(old|dev|customers)/([^/]+)/
  repository svn_myrepository
  branch \1/\2
end match

# Other branches are, as expected, directly in the branches folder.
match /branches/([^/]+)/
  repository svn_myrepository
  branch \1
end match

# The tags were used in a strange fashion before commit r2500,
# so we ignore everything before that refactoring
match /tags/([^/]+)/
  max revision 2500
end match

# After that, we create a branch for each tag, as the svn tags
# were not used correctly and were committed to. We just name
# them differently and will process them afterwards.
match /tags/([^/]+)/([^/]+)/
  min revision 2500
  repository svn_myrepository
  branch \1-\2
end match

# Our out-of-branch folders are processed directly, only creating
# a master branch.
match /aoutofbranch/
  repository aoutofbranch
  branch master
end match

match /boutofbranch/
  repository boutofbranch
  branch master
end match

# Everything else is discarded and ignored
match /
end match
```

This file will quickly grow with the number of migration operations that you want to perform. Ignore files here if possible, as doing so reduces both the migration time and the amount of postprocessing needed afterwards. In my case, a number of files were too complex to match during the migration or were only spotted afterwards, and had to be cleaned in a second pass with other tools.

### Migration
This step will take a lot of time, as it reads the whole svn history, processes the declared rules and generates the git repositories and every commit.

```
$ cd repositories
$ ~/workspace/svn2git/svn-all-fast-export \
    --add-metadata \
    --svn-branches \
    --identity-map ~/workspace/migration-tools/accounts-map.txt \
    --rules ~/workspace/migration-tools/svnfast.rules \
    --commit-interval 2000 \
    --stat \
    /home/tsc/svn_myrepository
```

If there is a crash during this step, it means that you are either missing an account in your mapping file, that one of your rules is emitting an erroneous branch or repository, or that no rule is matching.

Once this step has finished, I like to take a btrfs snapshot so that I can return to this point while putting the next steps into place.

```
btrfs subvolume snapshot -r repositories repositories/snap-1-import
```

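If a later step goes wrong, the read-only snapshot can be turned back into a writable subvolume and the work resumed from there; a minimal sketch (the target name is arbitrary):

```sh
# Recover the state saved after the import into a fresh writable subvolume
sudo btrfs subvolume snapshot repositories/snap-1-import repositories/work-after-import
```
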
## Cleanup
The next phase is to clean up our import. There will always be a number of branches that are unused, named incorrectly, contain only temporary files, or are so far from the standard naming that our rules could not process them correctly.

We will simply delete or rename them using git.

```
$ cd svn_myrepository
$ git branch -D oldbranch-0.3.1
$ git branch -D customer/backup_temp
$ git branch -m customer/stable_v1.0 stable-1.0
```

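To decide which branches can go, it helps to list them by last activity; this one-liner is a convenience on top of the original notes:

```sh
# List branches oldest-first with the date of their last commit
git for-each-ref --sort=committerdate \
  --format='%(committerdate:short) %(refname:short)' refs/heads/
```
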
The goal at this step is to clean up the branches that will be kept after the migration. We do this now to reduce the repository size early on and thus reduce the time needed for the next steps.

If you spot branches that can be deleted or renamed further down the road, you can also remove or rename them then.

I like to take a snapshot at this stage, as the next stage usually involves a lot of tests and manually building a list of things to remove.

```
btrfs subvolume snapshot -r repositories repositories/snap-2a-cleanup
```

We can also remove files that should never have been added, by generating a list of every file ever checked into our new git repository, inspecting it manually, and adding the identifiers of the files to remove to a new file:

```sh
$ git rev-list --objects --all > ./all-files
$ cat ./all-files | your-filter | cut -d' ' -f1 > ./to-delete-ids
$ java -jar ~/Downloads/bfg-1.12.15.jar --private --no-blob-protection --strip-blobs-with-ids ./to-delete-ids
```

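As an illustration, for the database dumps described earlier the filter could be a simple `grep`; the pattern below is hypothetical and has to be adapted to whatever the inspection of `all-files` reveals:

```sh
# Hypothetical filter: select the blob ids of anything that looks like a dump
grep -E '(^|/)database[-_](dump|oracle|mysql)' ./all-files \
  | cut -d' ' -f1 > ./to-delete-ids
```
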
We take a snapshot again, as the next step also involves checks and tests.

```
btrfs subvolume snapshot -r repositories repositories/snap-2b-cleanup
```

Next, we will convert the binary files that we still want to keep in the repository to Git-LFS. This allows git to keep track of only the hash of the file in the history instead of storing the whole binary in the repository, thus reducing the size of the clones.

BFG does this quickly and efficiently, removing every file matching the given name from the history and storing it in Git-LFS. This step will require some exploration of the previous `all-files` list to identify which files need to be converted.

```sh
$ java -jar ~/Downloads/bfg-1.12.15.jar --no-blob-protection --private --convert-to-git-lfs 'my-important-archive*.zip'
$ java -jar ~/Downloads/bfg-1.12.15.jar --no-blob-protection --private --convert-to-git-lfs '*.ear'
```

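To sanity-check the conversion, a converted file in the new history should now be a small Git-LFS pointer; a quick probe (the path is a placeholder):

```sh
# A Git-LFS pointer blob starts with the spec version line
git show HEAD:libs/my-important-archive-1.0.zip | head -n 1
# Expected: version https://git-lfs.github.com/spec/v1
```
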
After the cleanup, I again take a btrfs snapshot so that the history rewrite step can be executed and tested multiple times.

```
btrfs subvolume snapshot -r repositories repositories/snap-2c-cleanup
```

### Linking a svn revision to a git commit
For each revision, the import log prints a line that maps the svn revision to a mark in the git marks file. In the git repository, there is then a marks file that maps each mark to a commit hash. We can use this information to build a mapping database that stores that relation for later.

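For reference, the two inputs look roughly like this; the revision, mark and hash values are illustrative:

```txt
log-svn_myrepository, one progress line per imported revision:
  progress SVN r1234 branch master = :5678

marks-svn_myrepository, one line per mark:
  :5678 89abcdef0123456789abcdef0123456789abcdef
```
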
In our case, I wrote a Java program that parses both files and stores the resulting mapping into a LevelDB database.

This database is then used by a Go server that reads the mapping into memory and serves an RPC interface that we call from small Go binaries in a `git filter-branch` run. The Go server also needs to keep track of the modifications to the git commit hashes, as the history rewrite changes them.

First, the Java tool to read the logs and generate the LevelDB database:

```java
import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;
import java.io.File;
import java.io.FileReader;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.IOFileFilter;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import org.iq80.leveldb.impl.Iq80DBFactory;

public class CommitMapping {

    public static String FILE_LOG_IMPORT = "../log-svn_myrepository";
    public static String FILE_MARKS = "marks-svn_myrepository";
    public static String FILE_BFG_DIR = "../svn_myrepository.bfg-report";

    // Lines such as "progress SVN r1234 branch master = :42" in the import log
    public static Pattern PATTERN_LOG = Pattern.compile("^progress SVN (r\\d+) branch .* = (:\\d+)");

    public static void main(String[] args) throws Exception {

        List<String> importLines = IOUtils.readLines(new FileReader(new File(FILE_LOG_IMPORT)));
        List<String> marksLines = IOUtils.readLines(new FileReader(new File(FILE_MARKS)));

        // Collect the "object-id-map.old-new.txt" files written by each BFG pass
        Collection<File> passFilesCol = FileUtils.listFiles(new File(FILE_BFG_DIR), new IOFileFilter() {
            @Override
            public boolean accept(File pathname, String name) {
                return name.equals("object-id-map.old-new.txt");
            }

            @Override
            public boolean accept(File path) {
                return this.accept(path, path.getName());
            }
        }, DirectoryFileFilter.DIRECTORY);

        List<File> passFiles = new ArrayList<>(passFilesCol);

        // The report directories are timestamped, so this sorts the passes chronologically
        Collections.sort(passFiles, (File o1, File o2) -> o1.getParentFile().getName().compareTo(o2.getParentFile().getName()));

        Map<String, String> commitToIdentifier = new LinkedHashMap<>();
        Map<String, String> identifierToHash = new HashMap<>();

        // svn revision -> fast-import mark
        for (String importLine : importLines) {
            Matcher marksMatch = PATTERN_LOG.matcher(importLine);

            if (marksMatch.find()) {
                String dest = marksMatch.group(2);
                if (dest == null || dest.length() == 0 || ":0".equals(dest)) continue;

                commitToIdentifier.put(marksMatch.group(1), dest);
            } else {
                System.err.println("Unknown line: " + importLine);
            }
        }

        File dbFile = new File(System.getenv("HOME") + "/mapping-db");
        File humanFile = new File(System.getenv("HOME") + "/mapping");

        FileUtils.deleteQuietly(dbFile);

        Options options = new Options();
        options.createIfMissing(true);
        DB db = Iq80DBFactory.factory.open(dbFile, options);

        // fast-import mark -> git hash of the initial import
        marksLines.stream().map((line) -> line.split("\\s", 2)).forEach((parts) -> identifierToHash.put(parts[0], parts[1]));

        // svn revision -> git hash
        BiMap<String, String> commitMapping = HashBiMap.create(commitToIdentifier.size());
        for (String commit : commitToIdentifier.keySet()) {

            String importId = commitToIdentifier.get(commit);
            String hash = identifierToHash.get(importId);

            if (hash == null) continue;
            commitMapping.put(commit, hash);
        }

        System.err.println("Got " + commitMapping.size() + " svn -> initial import entries.");

        // Each BFG pass rewrote the hashes; follow the old -> new chains in order
        for (File file : passFiles) {
            System.err.println("Processing file " + file.getAbsolutePath());

            List<String> bfgPass = IOUtils.readLines(new FileReader(file));
            Map<String, String> hashMapping = bfgPass.stream().map((line) -> line.split("\\s", 2)).collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));

            for (String hash : hashMapping.keySet()) {
                String rev = commitMapping.inverse().get(hash);
                if (rev != null) {
                    String newHash = hashMapping.get(hash);
                    System.err.println("Replacing " + rev + ", was " + hash + ", is " + newHash);
                    commitMapping.replace(rev, newHash);
                }
            }
        }

        // Persist both a human-readable file and the LevelDB database
        PrintStream fos = new PrintStream(humanFile);
        for (Map.Entry<String, String> entry : commitMapping.entrySet()) {
            String commit = entry.getKey();
            String target = entry.getValue();

            fos.println(commit + "\t" + target);
            db.put(Iq80DBFactory.bytes(commit), Iq80DBFactory.bytes(target));
        }

        db.close();
        fos.close();
    }
}
```

We will use RPC between a client and a server so that the LevelDB database can be kept open, with very light clients querying the running server, as they will be executed once per commit. In my tests, opening the database was really time consuming, hence this approach, even though the server itself does very little.

The structure of our Go project is the following:

```txt
go-gitcommit/client-common:
    rpc.go

go-gitcommit/client-insert:
    insert-mapping.go

go-gitcommit/client-query:
    query-mapping.go

go-gitcommit/server:
    server.go
```

First, some plumbing for the RPC in `rpc.go`:

```go
package Client

import (
	"net"
	"net/rpc"
	"time"
)

type (
	// Client wraps the RPC connection to the mapping server
	Client struct {
		connection *rpc.Client
	}

	// MappingItem is the response from the cache or the item to insert into the cache
	MappingItem struct {
		Key   string
		Value string
	}

	// BulkQuery allows mass-querying the DB in one go
	BulkQuery []MappingItem
)

// NewClient connects to the mapping server
func NewClient(dsn string, timeout time.Duration) (*Client, error) {
	connection, err := net.DialTimeout("tcp", dsn, timeout)
	if err != nil {
		return nil, err
	}
	return &Client{connection: rpc.NewClient(connection)}, nil
}

// InsertMapping records an old hash -> new hash pair on the server
func (c *Client) InsertMapping(item MappingItem) (bool, error) {
	var ack bool
	err := c.connection.Call("RPC.InsertMapping", item, &ack)
	return ack, err
}

// GetMapping resolves a batch of svn revisions to git hashes
func (c *Client) GetMapping(bulk BulkQuery) (BulkQuery, error) {
	var bulkResponse BulkQuery
	err := c.connection.Call("RPC.GetMapping", bulk, &bulkResponse)
	return bulkResponse, err
}
```

Next, the Go server that reads this database, in `server.go`:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"os"
	"time"

	"github.com/syndtr/goleveldb/leveldb"

	Client "../client-common"
)

var (
	cacheDBPath = os.Getenv("HOME") + "/mapping-db"

	cacheDB *leveldb.DB
	// flowMap tracks the hash rewrites done by the current filter-branch run
	flowMap map[string]string

	f *os.File
	g *os.File
)

type (
	// RPC is the base class of our RPC system
	RPC struct {
	}
)

func main() {
	var cacheDBerr error

	cacheDB, cacheDBerr = leveldb.OpenFile(cacheDBPath, nil)
	if cacheDBerr != nil {
		fmt.Fprintln(os.Stderr, "Unable to initialize the LevelDB cache.")
		log.Fatal(cacheDBerr)
	}

	roErr := cacheDB.SetReadOnly()
	if roErr != nil {
		fmt.Fprintln(os.Stderr, "Unable to initialize the LevelDB cache.")
		log.Fatal(roErr)
	}

	flowMap = make(map[string]string)

	f, _ = os.Create(os.Getenv("HOME") + "/go-server/gomapping.log")
	defer f.Close()
	g, _ = os.Create(os.Getenv("HOME") + "/go-server/gomapping.ins")
	defer g.Close()

	rpc.Register(NewRPC())

	l, e := net.Listen("tcp", ":9876")
	if e != nil {
		log.Fatal("listen error:", e)
	}

	go flushLog()

	rpc.Accept(l)
}

// flushLog periodically syncs the query log to disk
func flushLog() {
	for {
		time.Sleep(100 * time.Millisecond)
		f.Sync()
	}
}

// NewRPC -
func NewRPC() *RPC {
	return &RPC{}
}

// InsertMapping records that a commit hash was rewritten by filter-branch
func (r *RPC) InsertMapping(mappingItem Client.MappingItem, ack *bool) error {
	oldHash := mappingItem.Key
	newHash := mappingItem.Value

	flowMap[oldHash] = newHash

	g.WriteString(fmt.Sprintf("Inserted mapping %s -> %s\n", oldHash, newHash))

	*ack = true

	return nil
}

// GetMapping resolves svn revisions to the current git hashes
func (r *RPC) GetMapping(bulkQuery Client.BulkQuery, resp *Client.BulkQuery) error {
	for i := range bulkQuery {
		key := bulkQuery[i].Key

		// The LevelDB database maps a svn revision to the hash of the initial import
		response, _ := cacheDB.Get([]byte(key), nil)

		gitCommit := key
		if response != nil {
			responseStr := string(response[:])
			// flowMap then maps that initial hash to its rewritten value
			responseUpdated := flowMap[responseStr]
			if responseUpdated != "" {
				gitCommit = string(responseUpdated[:])[:12] + "(" + key + ")"

				f.WriteString(fmt.Sprintf("Response to mapping %s -> %s\n", bulkQuery[i].Key, gitCommit))
			} else {
				f.WriteString(fmt.Sprintf("No git mapping for entry %s\n", responseStr))
			}
		} else {
			f.WriteString(fmt.Sprintf("Unknown revision %s\n", key))
		}

		bulkQuery[i].Value = gitCommit
	}

	*resp = bulkQuery

	return nil
}
```

And finally our clients. The insert client will be called from `git filter-branch` with the previous and current commit hashes after each commit is processed. We store this information so that the hashes stay correct when mapping a revision. The code goes into `insert-mapping.go`:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"

	Client "../client-common"
)

func main() {
	// filter-branch gives us the original and the rewritten commit hash
	oldHash := os.Args[1]
	newHash := os.Args[2]

	rpcClient, err := Client.NewClient("localhost:9876", time.Millisecond*500)
	if err != nil {
		log.Fatal(err)
	}

	mappingItem := Client.MappingItem{
		Key:   oldHash,
		Value: newHash,
	}

	ack, err := rpcClient.InsertMapping(mappingItem)
	if err != nil || !ack {
		log.Fatal(err)
	}

	// The commit filter must print the final commit hash
	fmt.Println(newHash)
}
```

The query client receives the commit message of each commit on its standard input, checks whether it contains `r` revision references, and queries the server for the matching hashes. It goes into `query-mapping.go`:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"regexp"
	"strings"
	"time"

	client "../client-common"
)

func main() {
	// Read the whole commit message from stdin, not just the first line
	raw, _ := ioutil.ReadAll(os.Stdin)
	text := string(raw)

	re := regexp.MustCompile(`\Wr[0-9]+`)
	matches := re.FindAllString(text, -1)

	if matches == nil {
		fmt.Print(text)
		return
	}

	rpcClient, err := client.NewClient("localhost:9876", time.Millisecond*500)
	if err != nil {
		log.Fatal(err)
	}

	var bulkQuery client.BulkQuery

	for i := range matches {
		// Skip matches whose separator is a dash; they are likely part of a
		// name, not a revision reference
		if matches[i][0] != '-' {
			// Strip the non-word character captured before the revision
			key := matches[i][1:]
			bulkQuery = append(bulkQuery, client.MappingItem{Key: key})
		}
	}

	gitCommits, _ := rpcClient.GetMapping(bulkQuery)

	for i := range gitCommits {
		gitCommit := gitCommits[i].Value
		key := gitCommits[i].Key

		text = strings.Replace(text, key, gitCommit, 1)
	}

	fmt.Print(text)
}
```

For this step, we first need to compile and execute the Java program. Once it has succeeded in creating the database, we compile the Go binaries and start the server in the background.

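The exact build commands depend on your layout; under the structure above, a plausible sequence for the Go parts is the following sketch, assuming the Java program has already produced `~/mapping-db`:

```sh
# Build the two clients and the server, then keep the server running
$ cd ~/migration-tools/go-gitcommit
$ go build -o client-insert/client-insert ./client-insert
$ go build -o client-query/client-query ./client-query
$ go build -o server/server ./server
$ ./server/server &
```
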
Then, we can launch `git filter-branch` on our repository to rewrite the history:

```sh
$ git filter-branch \
    --commit-filter 'NEW=`git_commit_non_empty_tree "$@"`; \
        ${HOME}/migration-tools/go-gitcommit/client-insert/client-insert $GIT_COMMIT $NEW' \
    --msg-filter "${HOME}/migration-tools/go-gitcommit/client-query/client-query" \
    -- --all --author-date-order
```

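With the server running, the message rewrite can also be checked in isolation by piping a sample message through the query client; the hash in the output is illustrative:

```sh
$ echo "Fix the regression introduced in r1234" | ./client-query/client-query
Fix the regression introduced in 89abcdef0123(r1234)
```
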
As after each step, we take a snapshot, even though this should be the last step that cannot be repeated easily.

```
btrfs subvolume snapshot -r repositories repositories/snap-3-mapping
```

We now clean the repository, which at this point still contains a lot of now-unused blobs, branches and commits.

```sh
$ git reflog expire --expire=now --all
$ git prune --expire=now --progress
$ git repack -adf --window-memory=512m
```

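To measure what the cleanup gained, `git count-objects` reports the on-disk size of the repacked repository:

```sh
# Show object counts and the human-readable size of the packs
$ git count-objects -v -H
```
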
We now have a repository that should be more or less clean. You will still have to check the history, the size of the blobs and whether some branches can be deleted before pushing it to your server.