Scraping and Parsing a Wikipedia Page
    <p>I'm wondering if there are any existing libraries in or accessible from Objective-C that would allow me to scrape pages formatted like <a href="http://en.wikipedia.org/wiki/October_27" rel="nofollow noreferrer">this one</a>. Specifically, I want all of the dates and all of the text next to each date. If not, what would be the best way to go about doing this? Regular expressions? I heard that <code>NSString</code> might already have built-in methods for this. Is this true?</p> <p>I was looking around to see if there were any alternatives to scraping, such as an XML file or API. I did find an API, but the only clients I see available are in other languages, and they seem to just be able to post content to pages, not retrieve it.</p> <p><strong>EDIT</strong>: So I found more information regarding the API at these links:</p> <ul> <li><a href="http://en.wikipedia.org/w/api.php" rel="nofollow noreferrer">MediaWiki API</a></li> <li><a href="http://www.mediawiki.org/wiki/API:Query" rel="nofollow noreferrer">API:Query</a></li> </ul> <p>And I was able to come up with <a href="http://en.wikipedia.org/w/api.php?action=parse&amp;page=October_27" rel="nofollow noreferrer">this request</a>, which returns some HTML-encoded text (well, the format is XML, but it includes the page's text such as <code>&amp;raquo;a href=</code> etc.). I'll keep looking through the docs to see if I can make this come out a bit better. If not, though, are there any recommendations on parsing this?</p> <p><strong>EDIT 2</strong>: Alright, so thanks to <a href="http://www.mediawiki.org/wiki/Manual:Parameters_to_index.php#Raw" rel="nofollow noreferrer">this doc page</a>, the simplest and cleanest way I've been able to retrieve the data is using this <a href="http://en.wikipedia.org/w/index.php?title=October_27&amp;action=raw&amp;section=1" rel="nofollow noreferrer">constructed link</a>, which returns the raw data (<em>in wiki markup</em>) of the relevant section. 
However, I guess I would then need to parse that, though if that really is the case, it should be a lot easier than parsing the entire article.</p> <p>Does anyone have any recommendations on parsing wiki markup such as the following in Objective-C?</p> <pre><code>==Events==
* [[710]] &amp;ndash; [[Saracen]] invasion of [[Sardinia]].
*[[1275]] &amp;ndash; Traditional founding of the city of [[Amsterdam]].
*[[1682]] &amp;ndash; [[Philadelphia]], [[Pennsylvania]] is founded.
</code></pre> <p>What I want to end up with is, I guess, an <code>NSDictionary</code> or similar collection that stores each date with its accompanying snippet of information. Thanks!</p>
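
To make the desired result concrete, here is a minimal sketch of the parsing step, written in Python rather than Objective-C purely for illustration. It assumes each entry is a single line of the form `*[[year]] &ndash; text`, as in the sample above; the names `parse_events`, `ENTRY`, and `LINK` are hypothetical, and real pages may contain entries this pattern does not cover.

```python
import html
import re

# One list item: "*", a linked year, a dash (entity, en dash, or hyphen), then the text.
ENTRY = re.compile(r"^\*\s*\[\[([^\]|]+)\]\]\s*(?:&ndash;|\u2013|-)\s*(.+)$")
# A wiki link, either [[Target]] or [[Target|Label]]; group 1 is the visible text.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")


def parse_events(wikitext):
    """Map each year to the list of event descriptions found for it."""
    events = {}
    for line in wikitext.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # headings such as ==Events== and blank lines are skipped
        year, text = match.group(1), match.group(2)
        text = LINK.sub(r"\1", text)       # drop the [[ ]] link brackets
        text = html.unescape(text).strip() # decode any remaining entities
        # A year can appear more than once, so collect descriptions in a list.
        events.setdefault(year, []).append(text)
    return events


if __name__ == "__main__":
    sample = (
        "==Events==\n"
        "* [[710]] &ndash; [[Saracen]] invasion of [[Sardinia]].\n"
        "*[[1275]] &ndash; Traditional founding of the city of [[Amsterdam]].\n"
        "*[[1682]] &ndash; [[Philadelphia]], [[Pennsylvania]] is founded.\n"
    )
    for year, descriptions in parse_events(sample).items():
        print(year, descriptions)
```

The two patterns use only basic regular-expression syntax, so something similar could probably be expressed with Foundation's `NSRegularExpression` and collected into an `NSMutableDictionary`, though that translation is untested here.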