<p>This is the case for pymongo. I have also prototyped using SQL Server, SQLite, HDF, and an ORM (SQLAlchemy) in Python. First and foremost, MongoDB (which pymongo talks to) is a document-based DB, so each person would be a document (a <code>dict</code> of attributes). Many people form a collection, and you can have many collections (people, stock market, income).</p>
<p>pd.DataFrame -> pymongo. Note: I use the <code>chunksize</code> in <code>read_csv</code> to keep each batch to 5 to 10k records (pymongo drops the socket if larger).</p>
<pre><code>aCollection.insert_many(df.to_dict('records'))
</code></pre>
<p>Querying (<code>$gt</code> = greater than, <code>$lt</code> = less than):</p>
<pre><code>pd.DataFrame(list(mongoCollection.find({'anAttribute':{'$gt':2887000, '$lt':2889000}})))
</code></pre>
<p><code>.find()</code> returns an iterator, so I commonly use <code>ichunked</code> to chop it into smaller iterators.</p>
<p>How about a join, since I normally get 10 data sources to paste together:</p>
<pre><code>aJoinDF = pandas.DataFrame(list(mongoCollection.find({'anAttribute':{'$in':Att_Keys}})))
</code></pre>
<p>Then merge (in my case I sometimes have to aggregate <code>aJoinDF</code> first before it's "mergeable"):</p>
<pre><code>df = pandas.merge(df, aJoinDF, on=aKey, how='left')
</code></pre>
<p>You can then write the new info to your main collection via the update call below (logical collection vs. physical data sources). Using <code>$set</code> changes only the named field instead of replacing the whole document.</p>
<pre><code>collection.update_one({primarykey: foo}, {'$set': {key: change}})
</code></pre>
<p>For smaller lookups, just denormalize. For example, if the document holds a code, add a field with the code's text and fill it with a <code>dict</code> lookup as you create the documents.</p>
<p>Now that you have a nice dataset based around a person, you can unleash your logic on each case and make more attributes. Finally you can read your key indicators into pandas (three at most, to fit in memory) and do pivots/agg/data exploration. This works for me for 3 million records with numbers/big text/categories/codes/floats/...</p>
<p>You can also use the two methods built into MongoDB (MapReduce and the aggregation framework). <a href="http://docs.mongodb.org/manual/tutorial/aggregation-examples/">See here for more info about the aggregation framework</a>, as it seems to be easier than MapReduce and looks handy for quick aggregate work. Notice I didn't need to define my fields or relations, and I can add items to a document. At the current state of the rapidly changing numpy, pandas, Python toolset, MongoDB helps me just get to work :)</p>
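<p>The batching and denormalizing steps described above can be sketched without a database. This is a minimal illustration, not the exact code I run: the lookup table <code>CODE_TEXT</code>, the helpers <code>to_document</code> and <code>batches</code>, and the field names are all hypothetical, and the actual pymongo call is only mentioned in a comment because it needs a live server.</p>

```python
from itertools import islice

# Hypothetical lookup table: code -> human-readable text.
CODE_TEXT = {"A1": "active", "C3": "closed"}


def to_document(row):
    """Copy a row dict and add the denormalized text for its code."""
    doc = dict(row)
    doc["code_text"] = CODE_TEXT.get(row["code"], "unknown")
    return doc


def batches(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical rows standing in for records read from a CSV chunk.
rows = [
    {"person_id": 1, "code": "A1"},
    {"person_id": 2, "code": "C3"},
    {"person_id": 3, "code": "Z9"},
]

documents = (to_document(row) for row in rows)
for batch in batches(documents, 2):
    # With a live collection, a bulk insert would go here,
    # e.g. collection.insert_many(batch) in recent PyMongo versions.
    print(len(batch), [doc["code_text"] for doc in batch])
```

<p>The same <code>batches</code> helper works on any iterator, so it could also be used to consume a <code>.find()</code> cursor a few thousand documents at a time. In practice the batch size would be in the 5 to 10k range mentioned above rather than 2.</p>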