<p>As Grant says, you could combine wget with revision control for space savings. I am actually trying to write a script to do this for my usual browsing, since I don't trust the Internet Archive or WebCite to be around indefinitely (and they are not very searchable).</p>
<p>The script would go something like this: cd to the directory; invoke the correct <code>wget --mirror</code> command or whatever; run <code>darcs add $(find .)</code> to add any new files to the repository; then <code>darcs record --all</code>.</p>
<p>Wget ought to overwrite any changed files with the updated version; <code>darcs add</code> will register any new files and directories; <code>darcs record</code> will save the changes.</p>
<p>To get the view as of date X, you simply pull from your repo all patches up to date X.</p>
<p>You don't store indefinitely many duplicate copies, because DVCSs don't save history unless there are actual changes to file content. You will get 'garbage' in the sense that pages change and no longer require CSS, JS, or images you previously downloaded, but you could just periodically delete everything and record that as a patch, and the next wget invocation will pull in only what is needed for the latest version of a webpage. (And you can still do full-text search; you just search the history rather than the files on disk.)</p>
<p>(If big media files are being downloaded, you can toss in something like <code>rm $(find . -size +2M)</code> to delete them before they get <code>darcs add</code>ed.)</p>
<p>EDIT: I wound up not bothering with explicit version control, instead letting wget create duplicates and occasionally weeding them with <code>fdupes</code>. See <a href="http://www.gwern.net/Archiving%20URLs" rel="nofollow">http://www.gwern.net/Archiving%20URLs</a></p>
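The steps above could be sketched roughly as the shell function below. This is only an illustration under stated assumptions, not the script the answer refers to: the name `archive_snapshot`, the extra `--page-requisites` flag, the timestamped patch name, and the use of `darcs add --recursive .` with a pruned `find` (instead of `darcs add $(find .)` and `rm $(find . -size +2M)`, which would also walk into `_darcs`) are choices made for the example. It also assumes a darcs author is already configured so that `darcs record` does not prompt.

```shell
#!/bin/sh
# Sketch of the wget + darcs snapshot loop described in the answer.
# archive_snapshot is a hypothetical helper; flags beyond `wget --mirror`,
# `darcs add` and `darcs record --all` are assumptions made for illustration.

# Usage: archive_snapshot REPO_DIR URL [FIND_SIZE_TEST]
# The body runs in a subshell, so the caller's working directory is untouched.
archive_snapshot() (
    repo_dir=$1
    url=$2
    max_size=${3:-+2M}   # find(1) size test; files matching it are deleted

    cd "$repo_dir" || exit 1

    # Start a darcs repository on first use.
    [ -d _darcs ] || darcs initialize || exit 1

    # Fetch the site; wget overwrites files that changed upstream.
    wget --mirror --page-requisites "$url" ||
        echo "wget reported errors, recording what was fetched" >&2

    # Drop big media files before they are added, skipping darcs's metadata.
    find . -path ./_darcs -prune -o -type f -size "$max_size" -exec rm -f -- {} +

    # Register new files; complaints about already-tracked files are ignored.
    darcs add --recursive . >/dev/null 2>&1 || true

    # Save everything that changed as one patch named after the current UTC time.
    darcs record --all -m "snapshot $(date -u +%Y-%m-%dT%H:%M:%SZ)"
)
```

Calling something like `archive_snapshot "$HOME/archive/example" "http://example.com/"` from cron would then produce one patch per run, and runs in which nothing changed would add nothing to the history.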