
  1. urlopen.request with an umlaut in the URL
    <p>I want to scrape a website with a German umlaut in the URL. Here is my code in Python 3.3, which works fine without umlauts.</p>
<pre><code>from urllib.error import URLError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def numResults(keyword):
    try:
        # The keyword is appended to the query string unencoded
        page_google = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&amp;q=' + keyword
        print(page_google)
        req_google = Request(page_google)
        req_google.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1')
        html_google = urlopen(req_google).read()
        soup = BeautifulSoup(html_google)
    except URLError as e:
        print(e)
    return soup
</code></pre>
    <p>But when I request something like:</p>
<pre><code>print(numResults('älterer'))
</code></pre>
    <p>I get the following error, I guess because urllib cannot handle the umlaut:</p>
<pre><code>Traceback (most recent call last):
  File "C:\Users\zwieback86\Desktop\programming\scrape.py", line 137, in &lt;module&gt;
    print(numResults('älterer'))
  File "C:\Users\zwieback86\Desktop\programming\scrape.py", line 73, in numResults
    html_google = urlopen(req_google).read()
  File "c:\python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "c:\python33\lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "c:\python33\lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "c:\python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "c:\python33\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "c:\python33\lib\urllib\request.py", line 1248, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "c:\python33\lib\http\client.py", line 1061, in request
    self._send_request(method, url, body, headers)
  File "c:\python33\lib\http\client.py", line 1089, in _send_request
    self.putrequest(method, url, **skips)
  File "c:\python33\lib\http\client.py", line 953, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 38: ordinal not in range(128)
</code></pre>
    <p>When I type the address "<a href="http://ajax.googleapis.com/ajax/services/search/web?v=1.0&amp;q" rel="nofollow">http://ajax.googleapis.com/ajax/services/search/web?v=1.0&amp;q</a>=älterer" into the browser, I get the page I want.</p>
    <p>So I assume urllib cannot handle requests with umlauts in the URL. How can I make it accept German umlauts? Replacing the umlauts (ä -&gt; ae) is not an option.</p>
    <p>Many thanks and regards!</p>
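The traceback shows the request line being encoded as ASCII, so one possible approach is to percent-encode the keyword before building the URL. The following is only a sketch under that assumption: `build_search_url` and `BASE_URL` are hypothetical names introduced for illustration, and UTF-8 is assumed as the encoding the server expects for the query string.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; the constant name is hypothetical.
BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web"


def build_search_url(keyword):
    """Return a pure-ASCII URL with the keyword percent-encoded.

    urlencode() encodes non-ASCII characters as UTF-8 bytes by default,
    so 'ä' becomes '%C3%A4'.
    """
    query = urlencode({"v": "1.0", "q": keyword})
    return BASE_URL + "?" + query


url = build_search_url("älterer")
print(url)
```

The resulting string contains only ASCII characters, so it can be passed to `Request(...)` in place of the hand-concatenated `page_google` without triggering the `request.encode('ascii')` failure seen above.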
 
