Best way to aggregate data from NDB datastore?
<p>I have a <code>StatisticStore</code> model defined as:</p>
<pre><code>class StatisticStore(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    created = ndb.DateTimeProperty(auto_now_add=True)
    kind = ndb.StringProperty()
    properties = ndb.PickleProperty()

    @classmethod
    def top_links(cls, user, start_date, end_date):
        '''
        returns the user's top links for the given date range
        e.g.
        {'http://stackoverflow.com': 30,
         'http://google.com': 10,
         'http://yahoo.com': 15}
        '''
        stats = cls.query(
            cls.user == user.key,
            cls.created &gt;= start_date,
            cls.created &lt;= end_date,
            cls.kind == 'link_visited'
        )
        links_dict = {}
        # generate links_dict from stats
        # keys are from the 'properties' property
        return links_dict
</code></pre>
<p>I want to have an <code>AggregateStatisticStore</code> model which stores the aggregate of <code>StatisticStore</code> per day. It could be generated once a day. Something like:</p>
<pre><code>class AggregateStatisticStore(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    date = ndb.DateProperty()
    kinds_count = ndb.PickleProperty()
    top_links = ndb.PickleProperty()
</code></pre>
<p>So that the following would be true:</p>
<pre><code>start = datetime.datetime(2013, 8, 22, 0, 0, 0)
end = datetime.datetime(2013, 8, 22, 23, 59, 59)

aug22stats = StatisticStore.query(
    StatisticStore.user == user.key,
    StatisticStore.kind == 'link_visited',
    StatisticStore.created &gt;= start,
    StatisticStore.created &lt;= end
).count()
aug22toplinks = StatisticStore.top_links(user, start, end)

# .get() returns the first matching entity rather than the query object
aggregated_aug22stats = AggregateStatisticStore.query(
    AggregateStatisticStore.user == user.key,
    AggregateStatisticStore.date == start.date()
).get()

aug22stats == aggregated_aug22stats.kinds_count['link_visited']
aug22toplinks == aggregated_aug22stats.top_links
</code></pre>
<p>I was thinking of just running a cron job with the taskqueue API. The task would generate the <code>AggregateStatisticStore</code> for each day. But I'm worried it might run into memory issues, since <code>StatisticStore</code> could have a lot of records per user.</p>
<p>Also, the <code>top_links</code> property complicates things a bit. I'm not sure yet whether having a property for it in the aggregate model is the best way. Any suggestion for that property would be great.</p>
<p>Ultimately, I only want to keep <code>StatisticStore</code> records for about 30 days. Any record older than 30 days should be aggregated and then deleted, to save space and improve query times for visualization.</p>
<p><strong>EDIT:</strong> How about having every new <code>StatisticStore</code> record create or update the appropriate <code>AggregateStatisticStore</code> record? That way, all the cron job has to do is clean up. Thoughts?</p>
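<p>To make the memory concern concrete, here is a minimal sketch of the per-day aggregation written as a streaming fold in plain Python, with no datastore calls. It is not a tested NDB implementation. The names <code>aggregate_by_day</code> and <code>sample_records</code>, the <code>'page_viewed'</code> kind, and the idea that a visited link is stored under <code>properties['url']</code> are all assumptions made for illustration.</p>

```python
import datetime
from collections import Counter, defaultdict


def aggregate_by_day(records):
    """Fold an iterable of (created, kind, properties) tuples into per-day totals.

    Only one record is examined at a time, so memory grows with the number of
    distinct days, kinds and links, not with the number of records.
    """
    days = defaultdict(lambda: {"kinds_count": Counter(), "top_links": Counter()})
    for created, kind, properties in records:
        day = days[created.date()]
        day["kinds_count"][kind] += 1
        # ASSUMPTION: a 'link_visited' record keeps its URL under properties['url'].
        if kind == "link_visited" and "url" in properties:
            day["top_links"][properties["url"]] += 1
    return dict(days)


def sample_records():
    """Hypothetical stand-in for iterating over a datastore query."""
    d = datetime.datetime
    yield d(2013, 8, 22, 9, 0), "link_visited", {"url": "http://stackoverflow.com"}
    yield d(2013, 8, 22, 9, 5), "link_visited", {"url": "http://google.com"}
    yield d(2013, 8, 22, 9, 7), "link_visited", {"url": "http://stackoverflow.com"}
    yield d(2013, 8, 22, 10, 0), "page_viewed", {}
    yield d(2013, 8, 23, 8, 0), "link_visited", {"url": "http://yahoo.com"}


totals = aggregate_by_day(sample_records())
aug22 = totals[datetime.date(2013, 8, 22)]
```

<p>In a real task the generator would presumably be replaced by iterating over the <code>StatisticStore</code> query, and each day's counters could be converted with <code>dict()</code> before being stored in the pickle properties. The same increment step could also run on every write, as the EDIT suggests, though concurrent updates to one aggregate record would likely need a transaction.</p>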