StackOverflow2013

R: How to represent a table augmented by arbitrary key/value pairs for each row?
primarykey
Id: 19203741

data
AcceptedAnswerId: 19209087
AnswerCount: 1
ClosedDate:
CommentCount: 1
CommunityOwnedDate:
CreationDate: 2013-10-05T23:22:08.330
FavoriteCount: 0
LastActivityDate: 2016-03-14T14:26:27.110
LastEditDate: 2016-03-14T14:26:27.110
LastEditorUserId: 322912
OwnerUserId: 300347
ParentId: 0
PostTypeId: 1
Score: 0
ViewCount: 478
LastEditorDisplayName:

text
Body:
This is a newbie R question. I am beginning to explore the use of R for website analytics. I have a set of page view events which have common properties along with an arbitrary set of properties that depend on the page. For instance, all events will have a <code>userId</code>, <code>createdAt</code>, and <code>pageId</code>, but the <code>"signup"</code> page might have a special property <code>origin</code> whose value could be <code>"adwords"</code> or <code>"organic"</code>, etc. In JSON, the data might look like this:
<pre><code>[
  {
    "userId": null,
    "pageId": "home",
    "sessionId": "abcd",
    "createdAt": 1381013741,
    "parameters": {}
  },
  {
    "userId": 123,
    "pageId": "signup",
    "sessionId": "abcd",
    "createdAt": 1381013787,
    "parameters": {
      "origin": "adwords",
      "campaignId": 4
    }
  }
]
</code></pre>
I have been struggling to represent this data effectively in R data structures. In particular, I need to be able to subset the event list by conditions based on the arbitrary key/value pairs, for instance, select all events whose <code>pageId=="signup"</code> and <code>origin=="adwords"</code>. There is enough diversity in the keys used for the arbitrary parameters that it seems unreasonable to create sparsely populated columns for every possible key.
What I'm currently doing is pre-processing the data into two CSV files, <code>core_properties.csv</code> and <code>parameters.csv</code>, in the form:
<pre><code># core_properties.csv (one record per pageview)
userId,pageId,sessionId,createdAt
,home,abcd,1381013741
123,signup,abcd,1381013787
...

# parameters.csv (one record per k/v pair)
# "row" denotes the record index in core_properties.csv
row,key,value
1,origin,adwords
1,campaignId,4
...
</code></pre>
I then <code>read.csv</code> each file into a data frame, and I am now attempting to store the k/v pairs as a list (with names=keys) inside cells of the core events data frame.
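One way to avoid sparse columns is to leave the parameters in the long, one-row-per-pair form of <code>parameters.csv</code> and answer the example query with set operations on row indices. The sketch below is base R only and is an illustration, not the question's or an answer's method: the two data frames are built inline instead of being read from the CSV files, <code>rows_with()</code> is a hypothetical helper, and the signup event's parameters are assumed to carry <code>row = 2</code> (its position in the JSON).

```r
# Hypothetical in-memory copies of the two CSV files; in practice they would
# come from read.csv(). Assumption: the signup event is row 2 of events, as in
# the JSON, so its parameters carry row = 2.
events <- data.frame(
  userId    = c(NA, 123),
  pageId    = c("home", "signup"),
  sessionId = c("abcd", "abcd"),
  createdAt = c(1381013741, 1381013787),
  stringsAsFactors = FALSE
)
parameters <- data.frame(
  row   = c(2L, 2L),
  key   = c("origin", "campaignId"),
  value = c("adwords", "4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: indices of events that carry the pair key == value.
rows_with <- function(params, key, value) {
  unique(params$row[params$key == key & params$value == value])
}

# Combine a core-column condition and a parameter condition on row indices.
hits <- intersect(which(events$pageId == "signup"),
                  rows_with(parameters, "origin", "adwords"))
signup_adwords <- events[hits, ]
```

Further parameter conditions can be chained with additional <code>intersect()</code> calls. The long table never needs one column per key, so the number of distinct keys does not affect its shape.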
This has been a lot of awkward trial and error, and the best approach I've found so far is the following:
<pre><code>events <- read.csv('core_properties.csv', header=TRUE)
parameters <- read.csv('parameters.csv', header=TRUE,
                       colClasses=c("character","character","character"))

# one empty list per event
paramLists <- sapply(1:nrow(events), function(x) { list() })

# fill in each key/value pair, modifying paramLists from inside the function
apply(parameters, 1, function(x) {
  paramLists[[ as.numeric(x[["row"]]) ]][[ x[["key"]] ]] <<- x[["value"]]
})

events$parameters <- paramLists
</code></pre>
I can now access the origin property of the first event with <code>events[1,][["parameters"]][[1]][["origin"]]</code>. Note that, for some reason, it requires an extra <code>[[1]]</code> subscript. Data frames do not seem to accept lists as individual cell values:
<pre><code>> events[1,][["parameters"]] <- list()
Error in `[[<-.data.frame`(`*tmp*`, "parameters", value = list()) :
  replacement has 0 rows, data has 1
</code></pre>
Is there a best practice for handling this sort of data? I have not found it discussed in the manuals and tutorials. Thank you!
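Two points in the question can be shown directly: where the extra <code>[[1]]</code> comes from, and how a single cell of a list column can be replaced. The base-R sketch below assumes the same two-event sample with signup as event 2. It also builds the list column with <code>split()</code> and <code>lapply()</code> as one possible alternative to the <code><<-</code> loop, not as an established best practice; <code>groups</code> and <code>origin_signup</code> are illustrative names.

```r
# Hypothetical sample data mirroring the question (signup event = row 2).
events <- data.frame(pageId = c("home", "signup"), stringsAsFactors = FALSE)
parameters <- data.frame(row   = c(2L, 2L),
                         key   = c("origin", "campaignId"),
                         value = c("adwords", "4"),
                         stringsAsFactors = FALSE)

# Build one named list per event without <<-: split the long table by row
# index (keeping empty groups for events that have no parameters), then turn
# each group's key/value columns into a named list.
groups <- split(parameters,
                factor(parameters$row, levels = seq_len(nrow(events))))
events$parameters <- unname(lapply(groups, function(p) {
  as.list(setNames(p$value, p$key))
}))

# events[2, ][["parameters"]] is the one-row slice of the list column, i.e. a
# list of length 1 wrapping the named list, hence the extra [[1]].
# Taking the column first and then the element avoids it:
origin_signup <- events$parameters[[2]][["origin"]]

# Replacing a single cell: wrap the new named list in list() so that the
# replacement has length 1, matching the one selected element.
events$parameters[1] <- list(list(origin = "organic"))
```

The error in the question arises because <code>events[1,]</code> is a one-row data frame, and assigning the zero-length <code>list()</code> to its <code>parameters</code> column fails the row-count check; a length-1 wrapper such as <code>list(list())</code> satisfies it.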

Tags: <r><hash><dataframe>
Title: R: How to represent a table augmented by arbitrary key/value pairs for each row?

singulars
PostAcceptedAnswerId: 1 post (PostTypePostTypeId: Answer)
PostParentId: (empty)
PostTypePostTypeId: Question
UserLastEditorUserId: Roman Luštrik
UserOwnerUserId: Yetanotherjosh

plurals
PostLinksPostIdRelatedPostId: 1 post link (LinkTypeLinkTypeId: Linked)
PostLinksRelatedPostIdPostId: (empty)
PostsAcceptedAnswerId: (empty)
PostsParentIdCreationDate: 1 post (PostTypePostTypeId: Answer)
VotesPostIdCreationDate: (empty)
CommentsPostId:
1. JSON translates nicely to `list`s in R. The names of the list serve as your keys. For a keyed tabular data structure, have a look at data.table.
   Post: R: How to represent a table augmented by arbitrary key/value pairs for each row?
   User: Ricardo Saporta
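The comment's suggestion (JSON objects as named R lists, with the names serving as keys) can be sketched in base R as follows. This is a minimal illustration under stated assumptions: JSON <code>null</code> is represented as <code>NA</code>, the events are typed in by hand instead of being parsed from JSON, and <code>has_param()</code> is a hypothetical helper. The data.table route the comment also mentions is not shown.

```r
# Each event as a named list mirroring one JSON object.
# Assumption: JSON null is stored as NA.
events <- list(
  list(userId = NA, pageId = "home", sessionId = "abcd",
       createdAt = 1381013741, parameters = list()),
  list(userId = 123, pageId = "signup", sessionId = "abcd",
       createdAt = 1381013787,
       parameters = list(origin = "adwords", campaignId = 4))
)

# Hypothetical helper: TRUE if the event's parameters contain key == value.
has_param <- function(event, key, value) {
  key %in% names(event$parameters) &&
    identical(as.character(event$parameters[[key]]), value)
}

# Keep the events that satisfy both conditions.
signup_adwords <- Filter(function(e) {
  identical(e$pageId, "signup") && has_param(e, "origin", "adwords")
}, events)

# Pull one field back out as a plain vector, e.g. for summaries.
page_ids <- vapply(events, function(e) e$pageId, character(1))
```

Every query here walks the whole list in interpreted R code, so for large event logs a tabular structure is probably the better fit; the list form mainly has the advantage of mirroring the JSON one-to-one.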
