How to represent a table augmented by arbitrary key/value pairs for each row?
<p>This is a newbie R question. I am beginning to explore the use of R for website analytics. I have a set of page view events that share common properties and also carry an arbitrary set of properties that depend on the page. For instance, all events will have a <code>userId</code>, <code>createdAt</code>, and <code>pageId</code>, but the <code>"signup"</code> page might have a special property <code>origin</code> whose value could be <code>"adwords"</code> or <code>"organic"</code>, etc.</p>
<p>In JSON, the data might look like this:</p>
<pre><code>[
  {
    "userId":null,
    "pageId":"home",
    "sessionId":"abcd",
    "createdAt":1381013741,
    "parameters":{}
  },
  {
    "userId":123,
    "pageId":"signup",
    "sessionId":"abcd",
    "createdAt":1381013787,
    "parameters":{
      "origin":"adwords",
      "campaignId":4
    }
  }
]
</code></pre>
<p>I have been struggling to represent this data effectively in R data structures. <strong>In particular, I need to be able to subset the event list by conditions based on the arbitrary key/value pairs,</strong> for instance, to select all events whose <code>pageId=="signup"</code> and <code>origin=="adwords"</code>.</p>
<p>There is enough diversity in the keys used for the arbitrary parameters that it seems unreasonable to create sparsely populated columns for every possible key.</p>
<p>What I'm currently doing is pre-processing the data into two CSV files, <code>core_properties.csv</code> and <code>parameters.csv</code>, in the form:</p>
<pre><code># core_properties.csv (one record per pageview)
userId,pageId,sessionId,createdAt
,home,abcd
123,signup,abcd,1381013741
...

# parameters.csv (one record per k/v pair)
row,key,value  # &lt;- "row" here denotes the record index in core_properties.csv
1,origin,adwords
1,campaignId,4
...
</code></pre>
<p>I then <code>read.table</code> each file into a data frame, and I am now attempting to store the k/v pairs as a list (with names=keys) inside cells of the core events data frame. This has involved a lot of awkward trial and error, and the best approach I've found so far is the following:</p>
<pre><code>events &lt;- read.csv('core_properties.csv', header=TRUE)
parameters &lt;- read.csv('parameters.csv', header=TRUE, colClasses=c("character","character","character"))
paramLists &lt;- sapply(1:nrow(events), function(x) { list() })
apply(parameters, 1, function(x) {
  paramLists[[ as.numeric(x[["row"]]) ]][[ x[["key"]] ]] &lt;&lt;- x[["value"]]
})
events$parameters &lt;- paramLists
</code></pre>
<p>I can now access the origin property of the first event with the syntax <code>events[1,][["parameters"]][[1]][["origin"]]</code>. Note that, for some reason, it requires an extra <code>[[1]]</code> subscript in there. Data frames do not seem to appreciate being given lists as individual values for cells:</p>
<pre><code>&gt; events[1,][["parameters"]] &lt;- list()
Error in `[[&lt;-.data.frame`(`*tmp*`, "parameters", value = list()) :
  replacement has 0 rows, data has 1
</code></pre>
<p>Is there a best practice for handling this sort of data? I have not found it discussed in the manuals and tutorials.</p>
<p>Thank you!</p>
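
For illustration only, and not taken from the original post: one way to sketch the subsetting requirement is to leave the key/value pairs in their long, one-record-per-pair form and filter the core table by matching row indices, rather than nesting lists inside data frame cells. The two data frames below are hypothetical in-memory stand-ins for `core_properties.csv` and `parameters.csv`; the explicit `row` column on `events` and the sample values (modelled on the JSON example, where the signup event is the second record) are assumptions.

```r
# Hypothetical stand-in for core_properties.csv, with an explicit row index
# added so the two tables can be matched (an assumption, not in the post).
events <- data.frame(
  row       = 1:2,
  userId    = c(NA, 123),
  pageId    = c("home", "signup"),
  sessionId = c("abcd", "abcd"),
  createdAt = c(1381013741, 1381013787),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for parameters.csv: one record per key/value pair.
parameters <- data.frame(
  row   = c(2, 2),
  key   = c("origin", "campaignId"),
  value = c("adwords", "4"),
  stringsAsFactors = FALSE
)

# Row indices of events that carry the pair origin == "adwords".
hits <- parameters$row[parameters$key == "origin" &
                       parameters$value == "adwords"]

# Combine a condition on a core column with the key/value match.
selected <- events[events$pageId == "signup" & events$row %in% hits, ]

# Look up a single parameter for a given event.
origin_of_2 <- parameters$value[parameters$row == 2 &
                                parameters$key == "origin"]
```

Under these assumptions, `selected` holds the signup event and no sparsely populated columns are needed; whether this suits a larger data set would depend on how many conditions must be combined per query.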