
    <p>You need a crawler. It's pretty trivial to build one yourself (for small crawls!):</p>
    <ul>
    <li>Hit the front page (see below **).</li>
    <li>Parse the page and extract the links. For this you need an HTML parser that can handle badly formatted HTML. Try <a href="http://jericho.htmlparser.net/docs/index.html" rel="nofollow">Jericho</a>, <a href="http://ccil.org/~cowan/XML/tagsoup/" rel="nofollow">TagSoup</a>, <a href="http://sourceforge.net/projects/nekohtml/" rel="nofollow">CyberNeko</a> or <a href="http://tidy.sourceforge.net/" rel="nofollow">HtmlTidy</a>. A normal XML parser probably won't cut the mustard for most HTML pages, as they are rarely well-formed XML.</li>
    <li>Check for the link you are looking for. If you cannot find it, queue the site-local links you have not seen before, then go back to the first step with the next queued page and repeat.</li>
    </ul>
    <p>For a small site (a few thousand pages) you can probably do all this in memory.</p>
    <p>** Use the usual Java <a href="http://download.oracle.com/javase/1.4.2/docs/api/java/net/URLConnection.html" rel="nofollow">URLConnection</a> or Apache <a href="http://hc.apache.org/httpcomponents-client-ga/" rel="nofollow">HttpClient</a> (v4) for making the requests.</p>
    <p><strong>Note: finding your link</strong> - links on a site can be absolute, site-local, or relative to some base href. You'll need to account for this when looking for yours. The easiest approach is to translate all links to absolute form, taking care to resolve against the current page's base href, if it has one.</p>
    <p>Simples.</p>
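
The loop described above can be sketched roughly as follows. This is only a minimal in-memory sketch, not a finished crawler: the class name `LinkFinder`, the method names, the `maxPages` limit and the `fetchHrefs` callback are all hypothetical. The callback stands in for "fetch the page and extract the raw href values", which in practice would be done with URLConnection or HttpClient plus one of the lenient HTML parsers mentioned above. Links are resolved against the page URL with `java.net.URI.resolve`; a page that declares a base href would need that value used as the base instead.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

class LinkFinder {

    /** Resolve a raw href to absolute form against the given base and drop any fragment. */
    static Optional<URI> toAbsolute(URI base, String href) {
        try {
            String resolved = base.resolve(href.trim()).normalize().toString();
            int hash = resolved.indexOf('#');
            // page.html#a and page.html#b are the same page for crawling purposes.
            return Optional.of(URI.create(hash >= 0 ? resolved.substring(0, hash) : resolved));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // malformed href: skip it
        }
    }

    /**
     * Breadth-first crawl from start. Returns the first page found that links to target,
     * or empty if none is found within maxPages fetches.
     * fetchHrefs is an assumed callback: page URL in, raw href strings out.
     */
    static Optional<URI> findPageLinkingTo(URI start, URI target,
                                           Function<URI, List<String>> fetchHrefs,
                                           int maxPages) {
        Set<URI> seen = new HashSet<>();      // everything already queued, kept in memory
        Deque<URI> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        int fetched = 0;
        while (!queue.isEmpty() && fetched < maxPages) {
            URI page = queue.poll();
            fetched++;
            for (String href : fetchHrefs.apply(page)) {
                Optional<URI> absolute = toAbsolute(page, href);
                if (!absolute.isPresent()) {
                    continue;
                }
                URI link = absolute.get();
                if (link.equals(target)) {
                    return Optional.of(page);
                }
                // Only follow links that stay on the starting host.
                boolean sameSite = start.getHost() != null
                        && start.getHost().equalsIgnoreCase(link.getHost());
                if (sameSite && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A made-up three-page site held in a map, so the sketch runs without network access.
        Map<String, List<String>> site = new HashMap<>();
        site.put("http://example.com/", Arrays.asList("/a.html", "b.html#top"));
        site.put("http://example.com/a.html", Arrays.asList("/"));
        site.put("http://example.com/b.html", Arrays.asList("http://target.example.org/page"));

        Function<URI, List<String>> fetchHrefs =
                page -> site.getOrDefault(page.toString(), Collections.emptyList());

        Optional<URI> hit = findPageLinkingTo(
                URI.create("http://example.com/"),
                URI.create("http://target.example.org/page"),
                fetchHrefs, 1000);
        System.out.println(hit.map(URI::toString).orElse("not found"));
    }
}
```

Keeping the fetch-and-parse step behind a callback is just a convenience for the sketch: it lets the queue and seen-set logic be tried out on a fake site before any real HTTP or HTML parsing is wired in.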
 
