StackOverflow2013

Separating previously conjoined code into multiple git repositories
<p>This question sounds similar to many posed here, but it's obnoxiously different.</p> <p>I have a git repository that was once an svn repository (that was once a cvs repository). It contains data going back to about 1999.</p> <p>The time has come to split this one repository into several repositories while preserving all of this rich history. However, the structure of the repository has changed frequently. All current projects came from a base project, which grew into a few projects, shrank to two projects, and then grew again. Code has been moved around but never duplicated; it has now all found a final resting place in one of several mature projects.</p> <p>This makes splitting the repository very hard if I want to preserve the history. <code>git filter-branch</code> seems like the right tool, but every approach I've found with it hacks off parts of the repository and truncates their history along with them.</p> <p><strong>EDIT ADDED</strong> To clarify, here's a small example, pretending I'm in the root of the repository. Let's say the repository looks like this:</p> <pre><code>foo/
bar/
    file.txt
baz/
</code></pre> <p>Now let's say I edit the contents of <code>file.txt</code>. Then I rename it to <code>newfile.txt</code>. Then I edit the contents again. Then I move this file out of <code>bar/</code> and into <code>baz/</code>. My repository now looks like this:</p> <pre><code>foo/
bar/
baz/
    newfile.txt
</code></pre> <p>OK, now let's say I want to split <code>baz/</code> out into its own repository. Using <code>git filter-branch</code> or <code>git subtree split</code> will lose all commit messages and history for <code>newfile.txt</code> from when it was inside <code>bar/</code> and when it was named <code>file.txt</code>.</p> <p>I understand that checking out a historical revision might be crazy; it might reference something called <code>../bar/</code>, or it might reference a directory that no longer exists and fail spectacularly.
I don't care, as long as I can look at the file contents at any particular revision.</p> <p><strong>END EDIT</strong></p> <p>There seem to be two paths for what I want to do:</p> <ol> <li><p>Clone the repository N times, keep only the folders I want in each clone (by <code>git rm</code>-ing the other folders), and somehow hack off any revisions that don't touch files that eventually end up in HEAD. I realize this has a downside: checking out old revisions will not give a meaningful code base. I don't care. To do this I'd need a way to get all the paths that the files now in HEAD have had over their history, which I could do with an ugly script.</p></li> <li><p>Build some sort of history index of what the repository looked like at each revision. Use a tree filter to chop off files that aren't matched in their respective revision. Then delete the files that neither appear in HEAD nor were earlier versions of files in HEAD.</p></li> </ol> <p>Is it possible to find all files that don't appear in HEAD and remove all history pertaining to them? I don't care about resurrecting files that were deleted long ago, and this seems to be the crux of my issue.</p> <p>Alternative solutions would also be appreciated. I'm relatively new to git, so I'm probably missing something obvious.</p>
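<p>A minimal sketch of the path-collection step from option 1, assuming Python 3.7+, a <code>git</code> binary on PATH, the script being run from the repository root, and plain ASCII file names that git does not need to quote. For each file currently under a directory such as <code>baz/</code>, it asks <code>git log --follow</code> for every name that file has had. The helper names (<code>parse_name_status</code>, <code>git</code>, <code>historical_paths</code>) and the script name <code>keep_paths.py</code> are hypothetical, not part of git. Rename tracking relies on git's similarity heuristic, so a move combined with heavy edits may be missed, and a path later reused by an unrelated file will also be kept.</p>

```python
import subprocess
import sys
from typing import Set


def parse_name_status(log_text: str) -> Set[str]:
    """Collect every path in `git log --name-status --format=` output.

    Each non-empty line is tab-separated: a status (A, M, D, ...) and one
    path, or, for renames and copies (R100, C075, ...), the old path and
    the new path. Blank lines separate commits and are skipped.
    """
    paths: Set[str] = set()
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank separator line between commits
        paths.update(fields[1:])  # one path, or old and new path for R/C
    return paths


def git(repo: str, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def historical_paths(prefix: str, repo: str = ".") -> Set[str]:
    """Return every path that any file now under `prefix` in HEAD has had.

    `git log --follow` only accepts a single file, so each file listed by
    `git ls-tree` is queried on its own.
    """
    keep: Set[str] = set()
    current = git(repo, "ls-tree", "-r", "--name-only", "HEAD", prefix)
    for path in current.splitlines():
        keep.add(path)
        log = git(repo, "log", "--follow", "--name-status", "--format=",
                  "--", path)
        keep |= parse_name_status(log)
    return keep


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python keep_paths.py baz/ > keep.txt
    for p in sorted(historical_paths(sys.argv[1])):
        print(p)
```

<p>For the example above, the list for <code>baz/</code> would contain <code>baz/newfile.txt</code>, <code>bar/newfile.txt</code> and <code>bar/file.txt</code>. On a fresh clone, that list could then drive an index filter that removes every other path from every commit, for example <code>git filter-branch --prune-empty --index-filter 'git ls-files | grep -vxFf /absolute/path/keep.txt | xargs -r git rm -q --cached' -- --all</code>. This assumes GNU <code>xargs</code> (for <code>-r</code>) and file names without spaces; <code>--prune-empty</code> drops commits that end up changing nothing, which roughly corresponds to hacking off the unrelated revisions.</p>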
