If your problem is waiting for the response from the web request, then the parsing engine or technique you use probably matters far less for performance than the fact that you are waiting for each response synchronously. If you have a long list of pages to scrape, you can do better by running the requests asynchronously and in parallel (see the sketch at the end of this answer). It's not clear from the question whether that is what's going on, though.

Try [CsQuery](https://github.com/jamietre/CsQuery) - also on [NuGet](http://www.nuget.org/packages/CsQuery) - a C# port of jQuery which should do what you want. It has methods for grabbing data both synchronously and asynchronously, so if you do want to start parallel web requests, it can do that out of the box. At the most basic level, the synchronous version looks like this:

```csharp
CQ doc = CQ.CreateFromUrl("http://www.jquery.com");
string allStuffInsideTag = doc["sometag"].Contents().RenderSelection();
```

It works like jQuery: the `CQ` object is the equivalent of a jQuery object. `Contents` is the jQuery method that returns all children of an element, and `RenderSelection` is a CsQuery method that renders the full HTML of every element in the selection set. So this returns the full text & HTML of everything inside every `sometag` block.

CsQuery also indexes each document for all common selector types and is much faster than HTML Agility Pack.
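If the downloads really are the bottleneck, the biggest win usually comes from overlapping them rather than from a faster parser. Below is a minimal sketch of that idea, assuming a hypothetical list of URLs; it uses the standard `HttpClient` with `Task.WhenAll` to fetch the pages in parallel and then feeds each response to CsQuery's `CQ.Create`, rather than CsQuery's own asynchronous helpers mentioned above:

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using CsQuery;

class ParallelScrapeSketch
{
    // Hypothetical URLs standing in for the "long list of pages" being scraped.
    static readonly string[] Urls =
    {
        "http://example.com/page1",
        "http://example.com/page2",
        "http://example.com/page3"
    };

    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Kick off every request at once instead of waiting for each response in turn.
            Task<string>[] downloads = Urls.Select(url => client.GetStringAsync(url)).ToArray();
            string[] pages = await Task.WhenAll(downloads);

            // Parsing is cheap compared to the network round-trips, so it can stay sequential.
            foreach (string html in pages)
            {
                CQ doc = CQ.Create(html);
                string allStuffInsideTag = doc["sometag"].Contents().RenderSelection();
                Console.WriteLine(allStuffInsideTag);
            }
        }
    }
}
```

Because every request is started before any response is awaited, the total wall-clock time is roughly that of the slowest request instead of the sum of all of them.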
 
