<p>You can parse at least the USPTO data with any XML parsing tool, such as the lxml Python module.</p> <p>There is a great paper on doing just this by Gabe Fierro, available here: <a href="http://funginstitute.berkeley.edu/wp-content/uploads/2013/06/Extracting_and_Formatting.pdf" rel="nofollow">Extracting and Formatting Patent Data from USPTO XML</a> (no paywall).</p> <p>Gabe also took part in a useful discussion on doing this <a href="https://groups.google.com/forum/#!msg/disambiguation/hm5vsUqhiFo/wh2sdOyaNswJ" rel="nofollow">here on this Google group</a>.</p> <p>Finally, if you know what you're looking for and have plenty of disk space, you can also download the bulk data and store it locally for processing. The USPTO bulk downloads are <a href="http://www.google.com/googlebooks/uspto-patents-grants-text.html" rel="nofollow">here</a>.</p> <p>If you have any more specific questions, please let me know! I've trod some of this ground before :)</p> <p>Also, the Google Patent Search API is deprecated, but you can now run the same searches through the main Google search API using URL tags (I don't have them handy, but you can find them by running a search on Google Patents and looking at the google.com URL it responds with).</p> <p>UPDATE: Now that I'm at home: the flag you want for patent searching with the Google Custom Search API is &amp;tbm=pts. Please note that setting up a Google custom search engine and getting a key for it is hugely beneficial for patent searching, because the JSON it returns has a nice data structure with patent-specific fields.</p> <p>Example code:</p> <pre><code>import json
import time

import requests

# Get both values by signing up for the Google Custom Search Engine API
access_token = "YOUR_API_KEY"
cse_id = "YOUR_CSE_ID"

start = 1
search_text = '+(inassignee:"Altera" | "Owner name: Altera") site:www.google.com/patents/'

# tbm=pts sets you on the patent search; requests URL-encodes every parameter
params = {
    "key": access_token,
    "cx": cse_id,
    "start": start,
    "num": 10,
    "tbm": "pts",
    "q": search_text,
}
response = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
response.raise_for_status()

# Save the JSON response to a timestamped text file
with open("Sample_patent_data" + str(int(time.time())) + ".txt", "w") as f:
    json.dump(response.json(), f, indent=4)
</code></pre> <p>Once you add the free API access info, this will grab the first ten patents owned by Altera (as an example) and save the resulting JSON to a text file. Pull up your favorite web JSON editor and take a look at the file. In particular, I recommend looking at each entry in ['items'] and its ['pagemap'] sub-key. Just by parsing this JSON you can get titles, thumbnails, snippets, links, and even citations (when relevant).</p>
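<p>As a rough illustration of the XML route, here is a minimal sketch that pulls a few bibliographic fields out of one grant document. It uses the standard-library xml.etree.ElementTree instead of lxml so it runs without extra installs (lxml.etree offers a largely compatible API). The sample fragment, its element paths and the function name bibliographic_summary are hypothetical and only loosely modeled on the USPTO grant format; real files carry many more fields and tag names differ between DTD versions, so check them against the actual data.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment loosely modeled on a USPTO grant record.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference>
      <document-id>
        <country>US</country>
        <doc-number>01234567</doc-number>
        <date>20130101</date>
      </document-id>
    </publication-reference>
    <invention-title>Example widget</invention-title>
  </us-bibliographic-data-grant>
</us-patent-grant>
"""


def bibliographic_summary(xml_bytes):
    """Return (doc_number, date, title) from one grant document (hypothetical paths)."""
    root = ET.fromstring(xml_bytes)
    doc_id = root.find("us-bibliographic-data-grant/publication-reference/document-id")
    title = root.findtext("us-bibliographic-data-grant/invention-title")
    return (doc_id.findtext("doc-number"), doc_id.findtext("date"), title)


# Pass bytes so the encoding declaration is honored (lxml also requires this).
print(bibliographic_summary(SAMPLE.encode("utf-8")))
# ('01234567', '20130101', 'Example widget')
```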
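<p>If you go the bulk-download route, note that (as far as I know) each weekly grant file is many complete XML documents concatenated together, each with its own XML declaration, so parsing the whole file in one call fails. A minimal sketch, assuming every document starts on a new line with that declaration, is to split the stream first and parse each piece; split_documents and the sample data are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Assumed start of every document in a concatenated bulk file.
XML_DECL = '<?xml version="1.0"'


def split_documents(lines):
    """Yield each complete XML document from an iterable of lines."""
    current = []
    for line in lines:
        # A new declaration marks the start of the next patent document.
        if line.startswith(XML_DECL) and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


# Hypothetical two-document "bulk file".
BULK = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><invention-title>First</invention-title></us-patent-grant>\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><invention-title>Second</invention-title></us-patent-grant>\n"
)

titles = [
    ET.fromstring(doc.encode("utf-8")).findtext("invention-title")
    for doc in split_documents(BULK.splitlines(keepends=True))
]
print(titles)  # ['First', 'Second']
```

<p>With a real file you would pass the open file object instead, e.g. split_documents(open(path, encoding="utf-8")), which keeps only one document in memory at a time; path here is a placeholder.</p>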
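<p>To get at the title, link, snippet and pagemap fields programmatically rather than in a JSON editor, a small helper can walk ['items'] and read each entry. This is a sketch: summarize_items and the trimmed sample response are hypothetical, and the pagemap keys shown are illustrative, since what appears there varies from result to result.</p>

```python
import json


def summarize_items(results):
    """Pull title, link and snippet (plus pagemap section names) from a Custom Search response."""
    rows = []
    for item in results.get("items", []):
        rows.append({
            "title": item.get("title"),
            "link": item.get("link"),
            "snippet": item.get("snippet"),
            # pagemap contents vary by result; just record which sections exist
            "pagemap_keys": sorted(item.get("pagemap", {})),
        })
    return rows


# Hypothetical, trimmed response shaped like the saved JSON file.
SAMPLE_RESPONSE = json.loads("""
{
  "items": [
    {
      "title": "Example patent title",
      "link": "https://www.google.com/patents/US0000000",
      "snippet": "An example snippet.",
      "pagemap": {"cse_thumbnail": [{"src": "https://example.com/t.png"}],
                  "metatags": [{}]}
    }
  ]
}
""")

for row in summarize_items(SAMPLE_RESPONSE):
    print(row["title"], row["link"], row["pagemap_keys"])
```

<p>For a file saved by the example code, load it with json.load on the open file and pass the result to summarize_items.</p>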
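<p>The example fetches only the first ten results (start=1, num=10). To page further you bump start by 10 per request. A minimal sketch follows, assuming the Custom Search JSON API's documented cap of 100 reachable results per query (worth double-checking against the current docs); page_starts and page_params are hypothetical helper names.</p>

```python
def page_starts(total_wanted, page_size=10, max_results=100):
    """Return the 1-based start offsets needed to cover total_wanted results."""
    # Cap the request count; with the defaults the last possible start is 91,
    # which covers results 91-100.
    total = min(total_wanted, max_results)
    return list(range(1, total + 1, page_size))


def page_params(search_text, access_token, cse_id, total_wanted):
    """Yield one Custom Search query-parameter dict per page."""
    for start in page_starts(total_wanted):
        yield {
            "key": access_token,
            "cx": cse_id,
            "start": start,
            "num": 10,
            "tbm": "pts",  # patent search flag from the answer
            "q": search_text,
        }


for params in page_params("inassignee:Altera", "YOUR_API_KEY", "YOUR_CSE_ID", 30):
    print(params["start"])
```

<p>Each dict can be passed as params= to requests.get against the same endpoint, one request per page.</p>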