<p>The urllib2 library uses OpenerDirector objects to handle the actual opening. Fortunately, the Python library provides defaults so you don't have to build them yourself. It is, however, these OpenerDirector objects that are adding the extra headers.</p>
<p>To see what they are after the request has been sent (so that you can log it, for example):</p>
<pre><code>import urllib2

req = urllib2.Request(url='http://google.com')
response = urllib2.urlopen(req)
print req.unredirected_hdrs

# produces {'Host': 'google.com', 'User-agent': 'Python-urllib/2.5'} etc
</code></pre>
<p>The unredirected_hdrs is where the OpenerDirectors dump their extra headers. Simply looking at <code>req.headers</code> will show only your own headers - the library leaves those unmolested for you.</p>
<p>If you need to see the headers before you send the request, you'll need to subclass the OpenerDirector in order to intercept the transmission.</p>
<p>Hope that helps.</p>
<p>EDIT: I forgot to mention that, once the request has been sent, <code>req.header_items()</code> will give you a list of tuples of ALL the headers, with both your own and the ones added by the OpenerDirector. I should have mentioned this first since it's the most straightforward :-) Sorry.</p>
<p>EDIT 2: After your question about an example for defining your own handler, here's the sample I came up with. The concern in any monkeying with the request chain is that we need to be sure that the handler is safe for multiple requests, which is why I'm uncomfortable just replacing the definition of putheader on the HTTPConnection class directly.</p>
<p>Sadly, because the internals of HTTPConnection and the AbstractHTTPHandler are very internal, we have to reproduce much of the code from the Python library to inject our custom behaviour. Assuming I've not goofed below and this works as well as it did in my 5 minutes of testing, please be careful to revisit this override whenever you update your Python version, even by a revision number (i.e. 2.5.x to 2.5.y, or 2.5 to 2.6, etc.).</p>
<p>I should therefore mention that I am on Python 2.5.1. If you have 2.6 or, particularly, 3.0, you may need to adjust this accordingly.</p>
<p>Please let me know if this doesn't work. I'm having waaaayyyy too much fun with this question:</p>
<pre><code>import urllib2
import httplib
import socket


class CustomHTTPConnection(httplib.HTTPConnection):

    def __init__(self, *args, **kwargs):
        httplib.HTTPConnection.__init__(self, *args, **kwargs)
        self.stored_headers = []

    def putheader(self, header, value):
        # Record every header on its way to the wire
        self.stored_headers.append((header, value))
        httplib.HTTPConnection.putheader(self, header, value)


class HTTPCaptureHeaderHandler(urllib2.AbstractHTTPHandler):

    def http_open(self, req):
        return self.do_open(CustomHTTPConnection, req)

    http_request = urllib2.AbstractHTTPHandler.do_request_

    def do_open(self, http_class, req):
        # All code here lifted directly from the python library
        host = req.get_host()
        if not host:
            raise urllib2.URLError('no host given')

        h = http_class(host) # will parse host:port
        h.set_debuglevel(self._debuglevel)

        headers = dict(req.headers)
        headers.update(req.unredirected_hdrs)
        headers["Connection"] = "close"
        headers = dict(
            (name.title(), val) for name, val in headers.items())
        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
            r = h.getresponse()
        except socket.error, err: # XXX what error?
            raise urllib2.URLError(err)
        r.recv = r.read
        fp = socket._fileobject(r, close=True)

        resp = urllib2.addinfourl(fp, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason

        # This is the line we're adding
        req.all_sent_headers = h.stored_headers
        return resp


my_handler = HTTPCaptureHeaderHandler()
opener = urllib2.OpenerDirector()
opener.add_handler(my_handler)
req = urllib2.Request(url='http://www.google.com')

resp = opener.open(req)

print req.all_sent_headers

# shows: [('Accept-Encoding', 'identity'), ('Host', 'www.google.com'), ('Connection', 'close'), ('User-Agent', 'Python-urllib/2.5')]
</code></pre>
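<p>For readers on Python 3, where urllib2 and httplib became <code>urllib.request</code> and <code>http.client</code>, the same idea might look roughly like the sketch below. This is an adaptation, not part of the tested 2.5.1 code above: the names <code>RecordingConnection</code>, <code>CaptureHeaderHandler</code> and <code>fetch_sent_headers</code> are made up for illustration, and it assumes that <code>do_open</code> simply calls whatever it is given as the connection class, which is an implementation detail that could change between versions.</p>

```python
import http.client
import urllib.request


class RecordingConnection(http.client.HTTPConnection):
    """HTTPConnection that remembers every header passed to putheader()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.stored_headers = []

    def putheader(self, header, *values):
        # Let the standard library validate and buffer the header first,
        # then record it (one entry per value).
        super().putheader(header, *values)
        for value in values:
            self.stored_headers.append((header, value))


class CaptureHeaderHandler(urllib.request.HTTPHandler):
    """HTTP handler that exposes the sent headers as req.all_sent_headers."""

    def http_open(self, req):
        # Assumption: do_open() calls its first argument like a class,
        # so a small factory can hand the header list over to the request.
        def make_connection(host, **kwargs):
            conn = RecordingConnection(host, **kwargs)
            req.all_sent_headers = conn.stored_headers
            return conn

        return self.do_open(make_connection, req)


def fetch_sent_headers(url):
    """Open url with a bare OpenerDirector and return the headers sent."""
    opener = urllib.request.OpenerDirector()
    opener.add_handler(CaptureHeaderHandler())
    req = urllib.request.Request(url)
    with opener.open(req) as resp:
        resp.read()
    return req.all_sent_headers
```

<p>Calling <code>fetch_sent_headers('http://www.google.com')</code> should then return a list of (name, value) tuples comparable to the Python 2 output above. Using a factory function instead of copying <code>do_open</code> keeps the override short, at the cost of leaning on that internal calling convention.</p>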
 
