I can offer two popular solutions.

1) Google has a search-engine API: https://developers.google.com/products/#google-search (it is limited to 100 requests per day).

Truncated code:

```python
import json
import re
import urllib2


def gapi_parser(args):
    query = args.text
    count = args.max_sites
    import config
    api_key = config.api_key
    cx = config.cx

    # Note: this API returns up to the first 100 results only.
    # https://developers.google.com/custom-search/v1/using_rest?hl=ru-RU#WorkingResults
    results = []
    domains = set()
    errors = []
    start = 1
    while True:
        req = ('https://www.googleapis.com/customsearch/v1'
               '?key={key}&cx={cx}&q={q}&alt=json&start={start}').format(
                   key=api_key, cx=cx, q=query, start=start)
        if start >= 100:  # the API will not serve more than 100 results
            break
        con = urllib2.urlopen(req)
        if con.getcode() == 200:
            data = con.read()
            j = json.loads(data)
            for item in j.get('items', []):
                match = re.search(r'^(https?://)?\w(\w|\.|-)+', item['link'])
                if match:
                    domain = match.group(0)
                    if domain not in results:
                        results.append(domain)
                        domains.add(domain)
                else:
                    errors.append("Can't recognize domain: %s" % item['link'])
            if 'nextPage' not in j['queries']:
                break  # last page of results reached
            start = int(j['queries']['nextPage'][0]['startIndex'])
        if len(domains) >= count:
            break

    print
    for error in errors:
        print error
    return (results, domains)
```

2) I also wrote a Selenium-based script that parses the results page in a real browser instance, but this approach has some restrictions: for example, Google may show a CAPTCHA if your searches look automated.
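A minimal sketch of what option 2 can look like, assuming Selenium 4 with a chromedriver on PATH; the `selenium_parser` name is mine, and the `'#search a h3'` CSS selector is a guess at Google's current result markup, which changes often:

```python
# Rough sketch only: assumes Selenium 4 and chromedriver installed.
# The CSS selector is an assumption and will likely need adjusting;
# heavy automated use still risks hitting a CAPTCHA, as noted above.
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.common.by import By


def selenium_parser(query, max_sites=10):
    driver = webdriver.Chrome()
    try:
        driver.get('https://www.google.com/search?q=' + quote_plus(query))
        links = []
        for h3 in driver.find_elements(By.CSS_SELECTOR, '#search a h3'):
            # climb from the result heading to the enclosing <a> for its href
            link = h3.find_element(By.XPATH, './ancestor::a')
            href = link.get_attribute('href')
            if href and href not in links:
                links.append(href)
            if len(links) >= max_sites:
                break
        return links
    finally:
        driver.quit()
```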
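For completeness, here is how the `gapi_parser` function from option 1 might be wired up. The `config.py` layout and the argparse flag names are illustrative assumptions (only `api_key`, `cx`, `text`, and `max_sites` come from the snippet), and the credential values are placeholders from the Google Developers Console:

```python
# config.py -- placeholder credentials, not real values
api_key = 'YOUR_API_KEY'       # API key from the Google Developers Console
cx = 'YOUR_SEARCH_ENGINE_ID'   # Custom Search Engine id
```

```python
# Hypothetical command-line wrapper around gapi_parser() above
# (Python 2, to match the snippet). Flag names are assumptions.
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('text', help='search query')
    parser.add_argument('--max-sites', dest='max_sites', type=int, default=10)
    args = parser.parse_args()

    results, domains = gapi_parser(args)
    for domain in results:
        print domain
```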
 
