Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <h2>Lists Rock</h2> <p>By far the most friendly data structure for sequential data in Haskell is the List</p> <pre><code> data [a] = a:[a] | [] </code></pre> <p>Lists give you &#1012;(1) cons and pattern matching. The standard library, and for that matter the prelude, is full of useful list functions that should litter your code (<code>foldr</code>,<code>map</code>,<code>filter</code>). Lists are <em>persistant</em> , aka purely functional, which is very nice. Haskell lists aren't really "lists" because they are coinductive (other languages call these streams) so things like</p> <pre><code>ones :: [Integer] ones = 1:ones twos = map (+1) ones tenTwos = take 10 twos </code></pre> <p>work wonderfully. Infinite data structures rock.</p> <p>Lists in Haskell provide an interface much like iterators in imperative languages (because of laziness). So, it makes sense that they are widely used. </p> <h1>On the other hand</h1> <p>The first problem with lists is that to index into them <code>(!!)</code> takes &#1012;(k) time, which is annoying. Also, appends can be slow <code>++</code>, but Haskell's lazy evaluation model means that these can be treated as fully amortized, if they happen at all.</p> <p>The second problem with lists is that they have poor data locality. Real processors incur high constants when objects in memory are not laid out next to each other. So, in C++ <code>std::vector</code> has faster "snoc" (putting objects at the end) than any pure linked list data structure I know of, although this is not a persistant data structure so less friendly than Haskell's lists.</p> <p>The third problem with lists is that they have poor space efficiency. Bunches of extra pointers push up your storage (by a constant factor).</p> <h2>Sequences Are Functional</h2> <p><code>Data.Sequence</code> is internally based on <a href="http://en.wikipedia.org/wiki/Finger_tree" rel="noreferrer">finger trees</a> (I know, you don't want to know this) which means that they have some nice properties</p> <ol> <li>Purely functional. <code>Data.Sequence</code> is a fully persistant data structure.</li> <li>Darn fast access to the beginning and end of the tree. &#1012;(1) (amortized) to get the first or last element, or to append trees. At the thing lists are fastest at, <code>Data.Sequence</code> is at most a constant slower. </li> <li>&#1012;(log n) access to the middle of the sequence. This includes inserting values to make new sequences</li> <li>High quality API</li> </ol> <p>On the other hand, <code>Data.Sequence</code> doesn't do much for the data locality problem, and only works for finite collections (it is less lazy than lists)</p> <h2>Arrays are not for the faint of heart</h2> <p>Arrays are one of the most important data structures in CS, but they dont fit very well with the lazy pure functional world. Arrays provide &#1012;(1) access to the middle of the collection and exceptionally good data locality/constant factors. But, since they dont fit very well into Haskell, they are a pain to use. There are actually a multitude of different array types in the current standard library. These include fully persistant arrays, mutable arrays for the IO monad, mutable arrays for the ST monad, and un-boxed versions of the above. For more check out <a href="http://www.haskell.org/haskellwiki/Arrays" rel="noreferrer">the haskell wiki</a></p> <h2>Vector is a "better" Array</h2> <p>The <code>Data.Vector</code> package provides all of the array goodness, in a higher level and cleaner API. Unless you really know what you are doing, you should use these if you need array like performance. Of-course, some caveats still apply--mutable array like data structures just dont play nice in pure lazy languages. Still, sometimes you want that O(1) performance, and <code>Data.Vector</code> gives it to you in a useable package.</p> <h2>You have other options</h2> <p>If you just want lists with the ability to efficiently insert at the end, you can use a <a href="http://www.haskell.org/haskellwiki/Difference_list" rel="noreferrer">difference list</a>. The best example of lists screwing up performance tends to come from <code>[Char]</code> which the prelude has aliased as <code>String</code>. <code>Char</code> lists are convient, but tend to run on the order of 20 times slower than C strings, so feel free to use <code>Data.Text</code> or the very fast <code>Data.ByteString</code>. I'm sure there are other sequence oriented libraries I'm not thinking of right now. </p> <h2>Conclusion</h2> <p>90+% of the time I need a sequential collection in Haskell lists are the right data structure. Lists are like iterators, functions that consume lists can easily be used with any of these other data structures using the <code>toList</code> functions they come with. In a better world the prelude would be fully parametric as to what container type it uses, but currently <code>[]</code> litters the standard library. So, using lists (almost) every where is definitely okay.<br> You can get fully parametric versions of most of the list functions (and are noble to use them)</p> <pre><code>Prelude.map ---&gt; Prelude.fmap (works for every Functor) Prelude.foldr/foldl/etc ---&gt; Data.Foldable.foldr/foldl/etc Prelude.sequence ---&gt; Data.Traversable.sequence etc </code></pre> <p>In fact, <code>Data.Traversable</code> defines an API that is more or less universal across any thing "list like".</p> <p>Still, although you can be good and write only fully parametric code, most of us are not and use list all over the place. If you are learning, I strongly suggest you do too. </p> <hr> <p>EDIT: Based on comments I realize I never explained when to use <code>Data.Vector</code> vs <code>Data.Sequence</code>. Arrays and Vectors provide extremely fast indexing and slicing operations, but are fundamentally transient (imperative) data structures. Pure functional data structures like <code>Data.Sequence</code> and <code>[]</code> let efficiently produce <em>new</em> values from old values as if you had modified the old values. </p> <pre><code> newList oldList = 7 : drop 5 oldList </code></pre> <p>doesn't modify old list, and it doesn't have to copy it. So even if <code>oldList</code> is incredibly long, this "modification" will be very fast. Similarly</p> <pre><code> newSequence newValue oldSequence = Sequence.update 3000 newValue oldSequence </code></pre> <p>will produce a new sequence with a <code>newValue</code> for in the place of its 3000 element. Again, it doesn't destroy the old sequence, it just creates a new one. But, it does this very efficiently, taking O(log(min(k,k-n)) where n is the length of the sequence, and k is the index you modify. </p> <p>You cant easily do this with <code>Vectors</code> and <code>Arrays</code>. They can be <em>modified</em> but that is real imperative modification, and so cant be done in regular Haskell code. That means operations in the <code>Vector</code> package that make modifications like <code>snoc</code> and <code>cons</code> have to copy the entire vector so take <code>O(n)</code> time. The only exception to this is that you can use the mutable version (<code>Vector.Mutable</code>) inside the <code>ST</code> monad (or <code>IO</code>) and do all your modifications just like you would in an imperative language. When you are done, you "freeze" your vector to turn in into the immutable structure you want to use with pure code. </p> <p>My feeling is that you should default to using <code>Data.Sequence</code> if a list is not appropriate. Use <code>Data.Vector</code> only if your usage pattern doesn't involve making many modifications, or if you need extremely high performance within the ST/IO monads.</p> <p>If all this talk of the <code>ST</code> monad is leaving you confused: all the more reason to stick to pure fast and beautiful <code>Data.Sequence</code>.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload