Using Generics to accomplish an HTML scraper. Right or Wrong?
<p>My requirement is to download and scrape various HTML pages, extracting lists of objects from the markup on each page depending on what object type we are looking for on that page. E.g. one page might contain an embedded list of doctors' surgeries, another might contain a list of primary trusts, etc. I have to work through the pages one by one and end up with lists of the appropriate object types.</p>
<p>The way I have chosen to do this is to have a generic class called <code>HTMLParser&lt;T&gt; where T : IEntity, new()</code></p>
<p><code>IEntity</code> is the interface that all the object types that can be scraped will implement, though I haven't figured out yet what the interface members will be.</p>
<p>So you will effectively be able to say</p>
<pre><code>HTMLParser&lt;Surgery&gt; parser = new HTMLParser&lt;Surgery&gt;(URL, XSD SCHEMA DOC);
IList&lt;Surgery&gt; results = parser.Parse();
</code></pre>
<p><code>Parse()</code> will validate that the HTML string downloaded from the URL contains a block that conforms to the XSD document provided, then will somehow use this template to extract a <code>List&lt;Surgery&gt;</code> of <code>Surgery</code> objects, each one corresponding to an XML block in the HTML string.</p>
<p>The problems I have are:</p>
<ol>
<li><p>I'm not sure how to specify the template for each object type in a nice way, other than <code>HTMLParser&lt;Surgery&gt; parser = new HTMLParser&lt;Surgery&gt;(new URI("...."), Surgery.Template);</code> which is a bit clunky. Can anyone suggest a better way using .NET 3.0/4.0?</p></li>
<li><p>I'm not sure how, in a generic way, I can take the HTML string, take an XSD or XML template document, and return a generic list of constructed objects of the generic type. Can anyone suggest how to do this?</p></li>
<li><p>Finally, I'm not convinced generics are the right solution to this problem, as it's starting to seem very convoluted. Do you agree with my choice of solution here? If not, what would you do instead?</p></li>
</ol>
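<p>To make the shape of the idea concrete, here is a minimal sketch of what such a parser could look like. It rests on several assumptions that are not settled above: the <code>IEntity</code> members <code>ElementName</code> and <code>Load</code> are hypothetical, the <code>name</code> child element of a surgery block is invented for illustration, the constructor takes the already-downloaded markup rather than a URL, the markup is assumed to be well-formed XML (real HTML often is not), and the XSD validation step is left out entirely.</p>
<pre><code>using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical interface: each scrapable type names the XML element that
// represents it and knows how to populate itself from one such element.
public interface IEntity
{
    string ElementName { get; }
    void Load(XmlElement element);
}

public class Surgery : IEntity
{
    public string Name { get; private set; }

    public string ElementName { get { return "surgery"; } }

    public void Load(XmlElement element)
    {
        // "name" is an assumed child element, used only for illustration.
        XmlElement nameElement = element["name"];
        Name = nameElement != null ? nameElement.InnerText : null;
    }
}

public class HTMLParser&lt;T&gt; where T : IEntity, new()
{
    private readonly string _markup;

    // Assumption: the caller has already downloaded the page.
    public HTMLParser(string markup)
    {
        _markup = markup;
    }

    public IList&lt;T&gt; Parse()
    {
        // A throwaway instance tells us which element name to look for,
        // so no template has to be passed to the constructor.
        T prototype = new T();
        List&lt;T&gt; results = new List&lt;T&gt;();

        XmlDocument document = new XmlDocument();
        document.LoadXml(_markup); // throws XmlException on malformed markup

        foreach (XmlElement element in document.GetElementsByTagName(prototype.ElementName))
        {
            T entity = new T();
            entity.Load(element);
            results.Add(entity);
        }
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        string markup =
            "&lt;html&gt;&lt;body&gt;&lt;surgery&gt;&lt;name&gt;Elm Street&lt;/name&gt;&lt;/surgery&gt;&lt;/body&gt;&lt;/html&gt;";
        IList&lt;Surgery&gt; results = new HTMLParser&lt;Surgery&gt;(markup).Parse();
        Console.WriteLine(results.Count);
    }
}
</code></pre>
<p>In this sketch the <code>new()</code> constraint is what lets the parser construct entities without reflection, and moving the per-type knowledge into the entity itself avoids passing <code>Surgery.Template</code> at every call site. Whether that is preferable to a non-generic parser that is handed a mapping object is exactly the open question in point 3.</p>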