<p>As an introduction to the world of screen-scraping, you've picked a very hard case! Canada Post's lookup page works like this:</p>
<ol>
<li>The first page is a form which accepts the address values.</li>
<li>This page POSTs to a second URL.</li>
<li>That second URL in turn redirects (using an HTTP 302 redirect) to a third URL, which actually shows you the HTML response containing the postal code.</li>
</ol>
<p>Making matters worse, the page in step #3 needs to know the cookie set in step #1. So you need to use the same <a href="http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx" rel="nofollow noreferrer"><code>CookieContainer</code></a> for all three requests (although it may be sufficient to send the same <a href="http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx" rel="nofollow noreferrer"><code>CookieContainer</code></a> to #2 and #3 only).</p>
<p>Furthermore, you may need to send additional HTTP headers in these requests as well, like <code>Accept</code>. I suspect where you're running into problems is that <code>HttpWebRequest</code> by default handles redirects transparently for you, but when it follows a redirect on its own it may not add the HTTP headers necessary to impersonate a browser.</p>
<p>The solution is to set the <code>HttpWebRequest</code>'s <code>AllowAutoRedirect</code> property to false, and handle the redirect yourself. In other words, once the POST request (#2) returns a redirect, you'll need to pull the URL out of the <code>HttpWebResponse</code>'s <code>Location:</code> header. Then you'll need to create a new <code>HttpWebRequest</code> (this time a regular GET request, not a POST) for that URL. Remember to send the same cookie! (The <a href="http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx" rel="nofollow noreferrer"><code>CookieContainer</code></a> class makes this very easy.)</p>
<p>You also may need to make an additional request (#1 in my list above) in order to set up the session cookie. If I were you, I'd assume that this is required, simply to eliminate it as a problem, and then try removing that step later to see if your solution still works.</p>
<p>You'll want to download and use Fiddler (<a href="http://www.fiddlertool.com" rel="nofollow noreferrer">www.fiddlertool.com</a>) to help you with all this. Fiddler lets you watch the HTTP requests going over the wire, and its request builder feature lets you create HTTP requests by hand so you can see which headers are actually required.</p>
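<p>Here is a minimal sketch of the three-request flow described above. It is not a tested client for the real site: the two URLs, the <code>Accept</code> value, and the names <code>PostalCodeLookup</code>, <code>CreateRequest</code>, <code>ResolveRedirect</code>, and <code>Lookup</code> are all placeholders of mine, and the actual URLs and form fields would have to come from a Fiddler capture.</p>

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PostalCodeLookup
{
    // Hypothetical URLs: replace them with the ones seen in a Fiddler capture.
    const string FormUrl = "https://example.com/lookup/form";
    const string PostUrl = "https://example.com/lookup/submit";

    // Creates a request that shares the given CookieContainer and
    // leaves redirects to the caller instead of following them.
    public static HttpWebRequest CreateRequest(string url, string method, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.CookieContainer = cookies;
        request.AllowAutoRedirect = false;
        request.Accept = "text/html,application/xhtml+xml"; // assumed value
        return request;
    }

    // The Location header may be relative, so resolve it against
    // the URL of the request that produced the redirect.
    public static Uri ResolveRedirect(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // formBody is the URL-encoded form data; the field names are site-specific.
    public static string Lookup(string formBody)
    {
        var cookies = new CookieContainer();

        // Step 1: GET the form page so the server can set its session cookie.
        using (CreateRequest(FormUrl, "GET", cookies).GetResponse()) { }

        // Step 2: POST the address values, sending the same cookies.
        HttpWebRequest post = CreateRequest(PostUrl, "POST", cookies);
        post.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes(formBody);
        post.ContentLength = body.Length;
        using (Stream stream = post.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        Uri target;
        using (var response = (HttpWebResponse)post.GetResponse())
        {
            string location = response.Headers["Location"];
            if (location == null)
            {
                throw new InvalidOperationException(
                    "Expected a redirect but got status " + (int)response.StatusCode);
            }
            target = ResolveRedirect(post.RequestUri, location);
        }

        // Step 3: follow the redirect by hand with a plain GET and the same cookies.
        HttpWebRequest get = CreateRequest(target.AbsoluteUri, "GET", cookies);
        using (var response = (HttpWebResponse)get.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main(string[] args)
    {
        // Expects the URL-encoded form body as the only argument.
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: PostalCodeLookup <url-encoded-form-body>");
            return;
        }
        Console.WriteLine(Lookup(args[0]));
    }
}
```

<p>Because every request goes through <code>CreateRequest</code>, all three share one <code>CookieContainer</code> and the same headers, which is the part that automatic redirect handling may get wrong. To test whether step 1 is really needed, comment out the first <code>GetResponse</code> call and see if the lookup still works.</p>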
 
