
plurals
  1. PO
    primarykey
    data
    text
    <p>Regex is a bad idea for parsing HTML. It's cryptic to read and relies on well-formed HTML.</p>
    <p>Try <a href="http://www.crummy.com/software/BeautifulSoup/" rel="noreferrer">BeautifulSoup</a> for Python. Here's an example script that returns URLs from the first 10 pages of a site:domain.com Google query.</p>
<pre><code>import sys      # Used to add the BeautifulSoup folder to the import path
import urllib2  # Used to read the html document

if __name__ == "__main__":
    ### Import Beautiful Soup
    ### Here, I have the BeautifulSoup folder at the same level as this Python script
    ### So I need to tell Python where to look.
    sys.path.append("./BeautifulSoup")
    from BeautifulSoup import BeautifulSoup

    ### Create opener with Google-friendly user agent
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    ### Open page &amp; generate soup
    ### the "start" variable will be used to iterate through 10 pages.
    for start in range(0, 10):
        url = "http://www.google.com/search?q=site:stackoverflow.com&amp;start=" + str(start * 10)
        page = opener.open(url)
        soup = BeautifulSoup(page)

        ### Parse and find
        ### Looks like Google puts URLs in &lt;cite&gt; tags.
        ### So for each cite tag on each page (10), print its contents (url)
        for cite in soup.findAll('cite'):
            print cite.text
</code></pre>
    <p>Output:</p>
<pre><code>stackoverflow.com/
stackoverflow.com/questions
stackoverflow.com/unanswered
stackoverflow.com/users
meta.stackoverflow.com/
blog.stackoverflow.com/
chat.meta.stackoverflow.com/
...
</code></pre>
    <p>Of course, you could append each result to a list so you can parse it for subdomains. I just got into Python and scraping a few days ago, but this should get you started.</p>
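The last step mentioned in the answer, collecting the results in a list and parsing them for subdomains, could look roughly like the sketch below. It is not part of the original script: `extract_hosts` is a hypothetical helper name, the sample list just reuses a few lines of the output shown above, and it uses Python 3's `urllib.parse` (in Python 2 the same function lives in the `urlparse` module). It assumes each result is a plain `host/path` string, with or without a scheme.

```python
from urllib.parse import urlsplit


def extract_hosts(cite_texts):
    """Return the sorted, de-duplicated host names found in a list of result strings.

    Hypothetical helper for illustration; assumes entries look like
    "host/path" or "scheme://host/path".
    """
    hosts = set()
    for text in cite_texts:
        text = text.strip()
        if not text:
            continue
        # urlsplit only recognises a host when the string has a scheme
        # or starts with "//", so add the prefix for bare results.
        if "://" not in text:
            text = "//" + text
        host = urlsplit(text).hostname  # lower-cased host, or None
        if host:
            hosts.add(host)
    return sorted(hosts)


# Sample data taken from the output listed in the answer.
results = [
    "stackoverflow.com/",
    "stackoverflow.com/questions",
    "meta.stackoverflow.com/",
    "blog.stackoverflow.com/",
]
print(extract_hosts(results))
```

In the scraping loop, you would append `cite.text` to `results` instead of printing it, then call the helper once after the loop finishes.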
 
