StackOverflow2013


plurals

POBest way to aggregate data from NDB datastore?
primarykey
Id
18371461
data
AcceptedAnswerId
0
AnswerCount
3
ClosedDate
CommentCount
5
CommunityOwnedDate
CreationDate
2013-08-22T04:00:31.510
FavoriteCount
0
LastActivityDate
2013-08-22T21:11:00.610
LastEditDate
2013-08-22T04:19:29.397
LastEditorUserId
159685
OwnerUserId
159685
ParentId
0
PostTypeId
1
Score
2
ViewCount
652
LastEditorDisplayName
text
Body
I have a <code>StatisticStore</code> model defined as:
<pre><code>from google.appengine.ext import ndb


class StatisticStore(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    created = ndb.DateTimeProperty(auto_now_add=True)
    kind = ndb.StringProperty()
    properties = ndb.PickleProperty()

    @classmethod
    def top_links(cls, user, start_date, end_date):
        '''
        returns the user's top links for the given date range
        e.g. {'http://stackoverflow.com': 30, 'http://google.com': 10, 'http://yahoo.com': 15}
        '''
        stats = cls.query(
            cls.user == user.key,
            cls.created >= start_date,
            cls.created <= end_date,
            cls.kind == 'link_visited'
        )
        links_dict = {}
        # generate links_dict from stats
        # keys are from the 'properties' property
        return links_dict
</code></pre>
I want to have an <code>AggregateStatisticStore</code> model which stores the aggregate of <code>StatisticStore</code> per day. It could be generated once a day. Something like:
<pre><code>class AggregateStatisticStore(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    date = ndb.DateProperty()
    kinds_count = ndb.PickleProperty()
    top_links = ndb.PickleProperty()
</code></pre>
So that the following would be true:
<pre><code>start = datetime.datetime(2013, 8, 22, 0, 0, 0)
end = datetime.datetime(2013, 8, 22, 23, 59, 59)

# user is a User entity, so filter on its key (as top_links does)
aug22stats = StatisticStore.query(
    StatisticStore.user == user.key,
    StatisticStore.kind == 'link_visited',
    StatisticStore.created >= start,
    StatisticStore.created <= end
).count()
aug22toplinks = StatisticStore.top_links(user, start, end)

# .get() fetches the single matching entity instead of a Query object
aggregated_aug22stats = AggregateStatisticStore.query(
    AggregateStatisticStore.user == user.key,
    AggregateStatisticStore.date == start.date()
).get()

aug22stats == aggregated_aug22stats.kinds_count['link_visited']
aug22toplinks == aggregated_aug22stats.top_links
</code></pre>
I was thinking of just running a cronjob with the taskqueue API. The task would generate the <code>AggregateStatisticStore</code> of each day. But I'm worried it might run into memory issues, since <code>StatisticStore</code> could have a lot of records per user.
Also, the <code>top_links</code> property complicates things a bit. I'm not yet sure whether having a property for it in the aggregate model is the best way. Any suggestions for that property would be great.
Ultimately, I only want to keep <code>StatisticStore</code> records for the last ~30 days. Any record older than 30 days should be aggregated and then deleted, to save space and to improve query times for visualization.
EDIT: How about having every <code>StatisticStore</code> write create or update the appropriate <code>AggregateStatisticStore</code> record? That way, all the cronjob has to do is clean up. Thoughts?
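To make the idea in the EDIT concrete, here is a minimal sketch in plain Python, not NDB: an in-memory dict stands in for the datastore so the update logic can be read and run on its own. The names <code>DailyAggregate</code>, <code>aggregates</code> and <code>record_stat</code> are hypothetical, and the sketch assumes a 'link_visited' stat keeps its URL under <code>properties['url']</code>, which the question does not specify.

```python
import datetime
from collections import Counter, defaultdict


class DailyAggregate(object):
    """Hypothetical in-memory stand-in for one AggregateStatisticStore row."""

    def __init__(self):
        self.kinds_count = Counter()  # e.g. {'link_visited': 3}
        self.top_links = Counter()    # e.g. {'http://google.com': 2}


# Keyed by (user_id, date); a real app would use a datastore key instead.
aggregates = defaultdict(DailyAggregate)


def record_stat(user_id, created, kind, properties):
    """Fold one newly recorded statistic into that user's aggregate for that day."""
    agg = aggregates[(user_id, created.date())]
    agg.kinds_count[kind] += 1
    # Assumption: 'link_visited' stats carry the URL under properties['url'].
    if kind == 'link_visited':
        agg.top_links[properties['url']] += 1
    return agg


day = datetime.datetime(2013, 8, 22, 9, 30)
record_stat('john', day, 'link_visited', {'url': 'http://google.com'})
record_stat('john', day, 'link_visited', {'url': 'http://google.com'})
record_stat('john', day, 'page_viewed', {})
result = aggregates[('john', day.date())]
```

Because each write touches only one aggregate, nothing ever needs to load a full day of <code>StatisticStore</code> records into memory. In a real datastore the read-modify-write would presumably need to run inside a transaction.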
Tags
<python><google-app-engine><app-engine-ndb>
Title
Best way to aggregate data from NDB datastore?
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USjohn2x
UserOwnerUserId
1. USjohn2x
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. COHave you looked into the mapreduce API? Updating AggregateStatisticStore whenever you update a StatisticStore is probably a better idea, though. You may also want to shard AggregateStatisticStore, depending on your performance requirements; if the StatisticStore records aren't updated frequently for a given user, you might not need to shard it.
 singulars
 PostPostId
 POBest way to aggregate data from NDB datastore?
 UserUserId
 USdragonx
2. COYes, I've looked at mapreduce, but I'm having a hard time grokking it. What do you mean by "sharding" AggregateStatisticStore?
 singulars
 PostPostId
 POBest way to aggregate data from NDB datastore?
 UserUserId
 USjohn2x
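
As a rough illustration of what "sharding" could mean here: instead of one aggregate record per user and day that every write contends for, the counts are spread over several shard records, and a read sums them. The sketch below is plain Python with a dict standing in for the datastore; <code>NUM_SHARDS</code>, <code>increment</code> and <code>total</code> are hypothetical names, and the shard count is an arbitrary assumption.

```python
import random
from collections import Counter

NUM_SHARDS = 5  # assumption: tune to the expected write rate

# (user_id, date, shard_index) -> Counter of kinds; stands in for shard entities.
shards = {}


def increment(user_id, date, kind, rng=random):
    """Add one to `kind` in a randomly chosen shard for this user and day."""
    index = rng.randrange(NUM_SHARDS)
    shard = shards.setdefault((user_id, date, index), Counter())
    shard[kind] += 1


def total(user_id, date):
    """Sum all shards to get the full per-kind counts for this user and day."""
    result = Counter()
    for index in range(NUM_SHARDS):
        result.update(shards.get((user_id, date, index), Counter()))
    return result


for _ in range(100):
    increment('john', '2013-08-22', 'link_visited')
counts = total('john', '2013-08-22')
```

Writes get cheaper because concurrent updates rarely hit the same shard, while reads get slightly more expensive because they touch every shard.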
