Problem with Eastern European characters when scraping data from the European Parliament website
EDIT: thanks a lot for all the answers and points raised. As a novice I am a bit overwhelmed, but it is a great motivation to continue learning Python!

I am trying to scrape a lot of data from the European Parliament website for a research project. The first step is to create a list of all parliamentarians; however, due to the many Eastern European names and the accents they use, I get a lot of missing entries. Here is an example of what is giving me trouble (notice the accent at the end of the family name):

```html
<td class="listcontentlight_left">
  <a href="/members/expert/alphaOrder/view.do?language=EN&amp;id=28276" title="ANDRIKIENĖ, Laima Liucija">ANDRIKIENĖ, Laima Liucija</a>
  <br/>
  Group of the European People's Party (Christian Democrats)
  <br/>
</td>
```

So far I have been using pyparsing and the following code:

```python
#parser_names
name = Word(alphanums + alphas8bit)
begin, end = map(Suppress, "><")
names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end

for name in names.searchString(page):
    print(name)
```

However, this does not catch the name from the HTML above. Any advice on how to proceed?

Best,
Thomas

P.S.: Here is all the code I have so far:

```python
# -*- coding: utf-8 -*-
import urllib.request
from pyparsing_py3 import *

page = urllib.request.urlopen("http://www.europarl.europa.eu/members/expert/alphaOrder.do?letter=B&language=EN")
page = page.read().decode("utf8")

#parser_names
name = Word(alphanums + alphas8bit)
begin, end = map(Suppress, "><")
names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end

for name in names.searchString(page):
    print(name)
```
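Editor's note: the character that defeats the grammar above, "Ė", is U+0116 in the Latin Extended-A block, which lies outside pyparsing's `alphas8bit` (that constant only covers the Latin-1 accented range, roughly U+00C0 to U+00FF). A minimal sketch of one possible fix is to widen the `Word` character set with the Latin Extended-A range explicitly; the explicit range and the switch to `OneOrMore` (so empty names are not accepted) are assumptions added here, not part of the original post:

```python
import pyparsing as pp

# "Ė" is U+0116 (Latin Extended-A), outside pyparsing's alphas8bit,
# which only covers the Latin-1 accented range - this is why the
# original grammar stops matching mid-name.
assert "Ė" not in pp.alphas8bit

# Assumed fix: extend the Word character set with Latin Extended-A
# (U+0100-U+017F), which covers Ė, Š, ž, ő, and similar letters.
latin_extended_a = "".join(chr(c) for c in range(0x0100, 0x0180))
name = pp.Word(pp.alphanums + pp.alphas8bit + latin_extended_a)

# Same grammar shape as the question, but OneOrMore instead of
# ZeroOrMore so an empty name cannot match.
begin, end = map(pp.Suppress, "><")
names = begin + pp.OneOrMore(name) + "," + pp.OneOrMore(name) + end

snippet = ('<a href="/members/expert/alphaOrder/view.do?language=EN&amp;id=28276" '
           'title="ANDRIKIENĖ, Laima Liucija">ANDRIKIENĖ, Laima Liucija</a>')

for hit in names.searchString(snippet):
    print(hit.asList())  # ['ANDRIKIENĖ', ',', 'Laima', 'Liucija']
```

Recent pyparsing releases also ship predefined Unicode ranges (`pyparsing.pyparsing_unicode`) that can replace the hand-built range; and for extracting names from HTML in general, an HTML parser such as BeautifulSoup is usually more robust than pattern matching on the raw markup.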