How to Correctly Use Lists in R?
<p>Brief background: Many (most?) contemporary programming languages in widespread use have at least a handful of ADTs [abstract data types] in common, in particular,</p>
<ul>
<li><p><strong>string</strong> (a sequence composed of characters)</p></li>
<li><p><strong>list</strong> (an ordered collection of values), and</p></li>
<li><p><strong>map-based type</strong> (an unordered array that maps keys to values)</p></li>
</ul>
<p>In the R programming language, the first two are implemented as <code>character</code> and <code>vector</code>, respectively.</p>
<p>When I began learning R, two things were obvious almost from the start: first, <code>list</code> is the most important data type in R (because it is the parent class for the R <code>data.frame</code>), and second, I just couldn't understand how lists worked, at least not well enough to use them correctly in my code.</p>
<p>For one thing, it seemed to me that R's <code>list</code> data type was a straightforward implementation of the map ADT (<code>dictionary</code> in Python, <code>NSMutableDictionary</code> in Objective-C, <code>hash</code> in Perl and Ruby, <code>object literal</code> in JavaScript, and so forth).</p>
<p>For instance, you create them just like you would a Python dictionary, by passing key-value pairs to a constructor (which in Python is <code>dict</code>, not <code>list</code>):</p>
<pre><code>x = list("ev1"=10, "ev2"=15, "rv"="Group 1")
</code></pre>
<p>And you access the items of an R list just like you would those of a Python dictionary, e.g., <code>x['ev1']</code>.
Likewise, you can retrieve just the <em>'keys'</em> or just the <em>'values'</em> by:</p>
<pre><code>names(x)    # fetch just the 'keys' of an R list
# [1] "ev1" "ev2" "rv"

unlist(x)   # fetch just the 'values' of an R list
#       ev1       ev2        rv
#      "10"      "15" "Group 1"

x = list("a"=6, "b"=9, "c"=3)

sum(unlist(x))
# [1] 18
</code></pre>
<p>But R <code>list</code>s are also <strong><em>unlike</em></strong> other map-type ADTs (from among the languages I've learned anyway). My guess is that this is a consequence of the initial spec for S, i.e., an intention to design a data/statistics DSL [domain-specific language] from the ground up.</p>
<p>There are <em>three</em> significant differences between R <code>list</code>s and mapping types in other languages in widespread use (e.g., Python, Perl, JavaScript):</p>
<p><em>First</em>, <code>list</code>s in R are an <em>ordered</em> collection, just like vectors, even though the values are keyed (i.e., the keys can be any hashable value, not just sequential integers). Nearly always, the mapping data type in other languages is <em>unordered</em>.</p>
<p><em>Second</em>, <code>list</code>s can be returned from functions even though you never passed in a <code>list</code> when you called the function, and <em>even though</em> the function that returned the <code>list</code> doesn't contain an (explicit) <code>list</code> constructor. (Of course, you can deal with this in practice by wrapping the returned result in a call to <code>unlist</code>.)</p>
<pre><code>x = strsplit(LETTERS[1:10], "")     # passing in an object of type 'character'

class(x)                            # returns 'list', not a vector of length 2
# [1] "list"
</code></pre>
<p>A <em>third</em> peculiar feature of R's <code>list</code>s: it doesn't seem that they can be members of another ADT, and if you try to do that then the primary container is coerced to a <code>list</code>.
E.g.,</p>
<pre><code>x = c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))

class(x)
# [1] "list"
</code></pre>
<p>My intention here is not to criticize the language or how it is documented; likewise, I'm not suggesting there is anything wrong with the <code>list</code> data structure or how it behaves. All I'm after is to correct my understanding of how they work so I can correctly use them in my code.</p>
<p>Here are the sorts of things I'd like to better understand:</p>
<ul>
<li><p>What are the rules which determine when a function call will return a <code>list</code> (e.g., the <code>strsplit</code> expression shown above)?</p></li>
<li><p>If I don't explicitly assign names to a <code>list</code> (e.g., <code>list(10,20,30,40)</code>), are the default names just sequential integers beginning with 1? (I assume, but I am far from certain, that the answer is yes; otherwise we wouldn't be able to coerce this type of <code>list</code> to a vector with a call to <code>unlist</code>.)</p></li>
<li><p>Why do these two different operators, <code>[]</code> and <code>[[]]</code>, return the <em>same</em> result?</p>
<p><code>x = list(1, 2, 3, 4)</code></p>
<p>Both expressions return "1":</p>
<p><code>x[1]</code></p>
<p><code>x[[1]]</code></p></li>
<li><p>Why do these two expressions <strong>not</strong> return the same result?</p>
<p><code>x = list(1, 2, 3, 4)</code></p>
<p><code>x2 = list(1:4)</code></p></li>
</ul>
<p>Please don't point me to the R Documentation (<a href="http://www.inside-r.org/r-doc/base/list" rel="noreferrer"><code>?list</code></a>, <a href="http://cran.r-project.org/doc/manuals/r-devel/R-intro.html#Lists" rel="noreferrer"><code>R-intro</code></a>); I have read it carefully and it does not help me answer the type of questions I listed just above.</p>
<p>(Lastly, I recently learned of and began using an R package (available on CRAN) called <a href="http://mran.revolutionanalytics.com/packages/info/?hash" rel="noreferrer"><code>hash</code></a> which
implements <em>conventional</em> map-type behavior via an S4 class; I can certainly recommend this package.)</p>
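<p>For anyone wanting to reproduce the cases asked about above, here is a minimal sketch that inspects them with <code>class</code>, <code>length</code>, and <code>names</code> instead of relying on the printed output. The variable names <code>sub</code>, <code>elem</code>, and <code>parts</code>, and the sample strings passed to <code>strsplit</code>, are illustrative assumptions and not part of the original examples.</p>

```r
# Illustrative sketch; sub, elem, parts and the sample strings are assumptions.
x  <- list(1, 2, 3, 4)   # four elements, each a length-1 numeric vector
x2 <- list(1:4)          # one element: an integer vector of length 4

# `[` keeps the container; `[[` extracts the element itself.
sub  <- x[1]             # a list of length 1
elem <- x[[1]]           # the bare number stored in the first slot

class(sub)               # "list"
class(elem)              # "numeric"

length(x)                # 4
length(x2)               # 1

# An unnamed list has no names attribute at all.
names(x)                 # NULL

# strsplit() returns one list element per input string,
# because each string may split into a different number of pieces.
parts <- strsplit(c("a-b", "c-d-e"), "-")
lengths(parts)           # 2 3
```

<p>Checking <code>class</code> and <code>length</code> rather than the console printout makes the difference between the two indexing operators, and between the two constructor calls, visible directly.</p>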
 
