I’m using Windows Server 2012 R2 with VisualSVN installed (3.6.4, the latest version at the time of writing, with SVN 1.9.7).
An older repo model was to use the following schema:
The client name was the repo, and sub-folders were used as the projects. Not good. We decided we needed to clean this up, and with an eventual migration to Git in mind we wanted the actual repo to be the project instead of the client. To complicate matters, we had also used further sub-folders inside each project.
Here’s a model of the folder structure of the repo. This is not the physical file folder structure – this is the logical repository structure. You will use both the logical and physical structures in this process.
/ClientA/Project1
/ClientA/Project1/trunk
/ClientA/Project1/trunk/codebase
/ClientA/Project1/documents
/ClientA/Project2
/ClientA/Project2/trunk
/ClientA/Project2/documents
The physical filesystem of SVN for the parent repo is:
C:\repos\ClientA
Note the use of forward slashes in the logical paths, including the leading slash. This will come in handy later. Windows would normally use a backslash, but paths inside the repo are written with forward slashes. For direct file paths to repos you use the regular Windows backslash.
In our case we wanted to take /ClientA/Project1/trunk/codebase and turn it into its own repo with the root of the repo being the /codebase folder:
/client-project1
c:\repos\client-project1
There are four main steps:
- Dump the ClientA repo using svnadmin dump
- Create a new, filtered dump file of only what we need using svndumpfilter
- Clean up the filepaths within the filtered dump file
- Restore the dump to a new repo
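The steps above can be sketched as a short Python helper that only builds the command lines, so you can review them before running anything. This is a rough sketch, not the exact procedure I ran: plan_split, work_dir, and the ClientA.dump file name are hypothetical names chosen for illustration, and nothing is executed.

```python
from pathlib import PureWindowsPath


def plan_split(parent_repo, subtree, new_repo, work_dir="c:\\"):
    """Return the command lines for the dump, filter, create, and load steps.

    Every argument is a placeholder; nothing is executed here.
    """
    work = PureWindowsPath(work_dir)
    # Hypothetical naming: each dump file is named after its repo folder.
    full_dump = str(work / (PureWindowsPath(parent_repo).name + ".dump"))
    filtered_dump = str(work / (PureWindowsPath(new_repo).name + ".dump"))
    return [
        # Step 1: dump the whole parent repo (physical Windows path).
        f"svnadmin dump {parent_repo} > {full_dump}",
        # Step 2: keep only the subtree (logical path, forward slashes).
        f"svndumpfilter include {subtree} --drop-empty-revs --renumber-revs"
        f" < {full_dump} > {filtered_dump}",
        # Step 3 (cleaning the paths) is an edit of filtered_dump, not a command.
        # Step 4: create the new repo and load the cleaned dump into it.
        f"svnadmin create {new_repo}",
        f"svnadmin load {new_repo} < {filtered_dump}",
    ]


for command in plan_split(r"c:\repos\ClientA", "/Project1/trunk/codebase",
                          r"c:\repos\client-project1"):
    print(command)
```

Printing the plan first makes it easy to spot a mixed-up repo path or dump file name before a long-running dump starts.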
Note: there are a number of considerations when using this method. I implore you to read the SVN book. If you have done any repo copying or moving, or want to only dump to or from a specific revision then you will need to take additional steps!
Dump the ClientA repo using svnadmin dump
First, make sure there is no antivirus running. In fact, I would stop the services and set them to manual and reboot. I had killed the services and didn’t see anything running related to the antivirus, but I was still seeing RAM utilization hit 100%!
Open a Command Prompt (cmd.exe) and do the following. It will create a dump file of the entire repo. Note the use of the Windows path of the parent repo. (I use cmd.exe here rather than PowerShell because PowerShell does not support the < redirection used in the next step.)
c:\>svnadmin dump c:\repos\ClientA > c:\ClientA.dump
Create a new, filtered dump file of only what we need using svndumpfilter
Now all we need is that sub-folder. The repo dump also contains all the revisions of the entire repo, most of which we don’t want, so we will clean that up as well (--drop-empty-revs --renumber-revs). If you don’t do this, you will have a lot of empty revisions. Also, renumbering the revisions from 0 means the new repo will be completely orphaned from the parent repo.
We’re going to make a new dump file called client-project1.dump from the full dump we just created. Note the use of the forward slashes!
c:\>svndumpfilter include /Project1/trunk/codebase --drop-empty-revs --renumber-revs < c:\ClientA.dump > c:\client-project1.dump
This might take a while based on the size of your repo.
Clean up the filepaths within the filtered dump file
Now, within your new dump file you have one more task to perform: stripping the original folder structure from the paths so that the codebase folder becomes the root of the new repo.
Take a look at your file using more (a cmd.exe utility similar to head in linux). My cursor in CMD is at c:\.
c:\>more +20 client-project1.dump
There you will find something like this:
Node-path: Project1/trunk/codebase
Node-kind: dir
Node-action: add
Prop-content-length: 10
Content-length: 10
If you were looking at a file in the dump file it would be:
Node-path: Project1/trunk/codebase/readme.txt
We need to replace Project1/trunk/codebase/ with nothing so it will read:
Node-path: readme.txt
Do not delete the space between : and the filename!
Doing the string replace: Updated
See below for using PowerShell to do this. It takes forever on large files (still useful to know, however). Now I use FART (Find And Replace Text).
For whatever reason, the option “--remove” does NOT show in the tool’s list of options. If you leave it out, FART will not write the changes. Weird. I also couldn’t get the replace function to work, so I just use --remove. That said, I didn’t try to use “--replace”.
Download it from here.
For a 6 gig file it takes about 45 seconds to run!
Again, my CMD prompt, fart.exe, and the dump file are in c:\ for clarity:
Note: you will need to do TWO replacements since you have paths with and without the trailing slash.
c:\fart --remove "client-project1.dump" "Project1/trunk/codebase/"
then do (note the missing / at the end)
c:\fart --remove "client-project1.dump" "Project1/trunk/codebase"
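Call me lazy. I am. This is what I used in PowerShell. There are a thousand other ways to do it. I tried others (a VBScript, etc.) but the size of the file (6 gigs) kept choking them. The first line is the general form and the second is the actual replacement. As written, both print the result to the console, so send the output to a new file (for example with Set-Content) if you want to keep it.
get-content client-project1.dump | %{$_ -replace "string-to-find","replacement-string"}
get-content client-project1.dump | %{$_ -replace "Project1/trunk/codebase/",""}
Go get a cup of coffee or three. This will take a couple of hours on large files.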
When it’s done, do another more and check your work.
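If neither tool suits you, here is a rough Python sketch of the same replacement. It rests on a few assumptions: it streams the dump in binary mode and only touches header lines that begin with Node-path: or Node-copyfrom-path:, instead of every occurrence of the string. The function names, the sample file names, and the choice to rewrite Node-copyfrom-path lines as well are mine, and I have not timed it on a 6 gig dump. As far as I understand the dump format, these header lines are not counted by the Content-length fields, so trimming them should not upset the lengths, but check the result with more before loading it.

```python
import os
import tempfile

# Header lines whose path should lose the old folder prefix.
HEADERS = (b"Node-path: ", b"Node-copyfrom-path: ")


def strip_prefix(line, prefix):
    """Drop `prefix` from the path in a Node-path style header line."""
    for header in HEADERS:
        if line.startswith(header):
            body = line[len(header):]
            path = body.rstrip(b"\r\n")
            ending = body[len(path):]  # keep the original line ending
            if path == prefix:
                path = b""  # the folder itself becomes the repo root
            elif path.startswith(prefix + b"/"):
                path = path[len(prefix) + 1:]
            # The space after the colon is part of `header`, so it is kept.
            return header + path + ending
    return line  # anything else is copied through unchanged


def rewrite_dump(src_path, dst_path, prefix):
    """Stream src_path to dst_path, rewriting only the header lines."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(strip_prefix(line, prefix))


# Small demonstration on a throwaway file (the contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.dump")
    dst = os.path.join(tmp, "out.dump")
    with open(src, "wb") as handle:
        handle.write(b"Node-path: Project1/trunk/codebase/readme.txt\n"
                     b"text mentioning Project1/trunk/codebase/ stays as-is\n")
    rewrite_dump(src, dst, b"Project1/trunk/codebase")
    with open(dst, "rb") as handle:
        print(handle.read().decode())
```

Writing to a second file rather than editing in place means the filtered dump is still there if something goes wrong. One caveat: a file in the repo whose own text has a line starting with Node-path: would be rewritten too.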
Remove the first instance of Node-path
As annoying as it is, you now need to manually remove some lines in the beginning of the dump file that tell SVN to create an empty folder:
Node-path:
Node-kind: dir
Node-action: add
Prop-content-length: 10
Content-length: 10
PROPS-END
To edit a very large text file, use gvim (you can also find a list of other options on stackexchange).
I used the “gVim easy” link – it will take a while to load, but it has nicer GUI functionality.
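To avoid scrolling through a huge file looking for that block, a few lines of Python can report where the empty Node-path headers are. This is only a sketch: find_empty_node_paths is a made-up name, the sample file is invented, and the line numbers count newline-terminated lines, which should match what the editor shows but is worth double-checking.

```python
import os
import tempfile


def find_empty_node_paths(dump_path):
    """Return 1-based line numbers of Node-path headers with an empty path."""
    hits = []
    with open(dump_path, "rb") as dump:
        for number, line in enumerate(dump, start=1):
            # After the replacement the header reads "Node-path: " with
            # nothing after the space; strip() also covers a missing space.
            if line.strip() == b"Node-path:":
                hits.append(number)
    return hits


# Small demonstration on a throwaway file (the contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.dump")
    with open(sample, "wb") as handle:
        handle.write(b"SVN-fs-dump-format-version: 2\n\n"
                     b"Node-path: \nNode-kind: dir\n\n"
                     b"Node-path: readme.txt\n")
    print(find_empty_node_paths(sample))
```

In gVim you can then jump straight to a reported line by typing a colon followed by the line number.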
Restore the dump to a new repo
The easy part:
svnadmin create c:\repos\client-project1
svnadmin load c:\repos\client-project1 < client-project1.dump
You’re done!
References:
Using PowerShell commands that are like Grep and Sed on *nix platforms:
https://blogs.msdn.microsoft.com/zainnab/2007/07/08/grep-and-sed-with-powershell/
Python script to make this whole process easier. YMMV.
https://github.com/jasperlee108/svndumpfilterIN
Good FART resource:
http://blog.powercram.com/2009/08/windows-command-line-find-and-replace.html