  1. Handling exceptions in an iteratee library without an error state
    <p>I'm trying to write an enumerator for reading files line by line from a <code>java.io.BufferedReader</code> using <a href="https://github.com/scalaz/scalaz" rel="noreferrer">Scalaz</a> 7's iteratee library, which currently only provides an (extremely slow) enumerator for <code>java.io.Reader</code>.</p>
    <p>The problems I'm running into are related to the fact that all of the other iteratee libraries I've used (e.g. <a href="http://www.playframework.org/documentation/2.0/Iteratees" rel="noreferrer">Play 2.0's</a> and <a href="http://hackage.haskell.org/package/enumerator" rel="noreferrer">John Millikin's <code>enumerator</code></a> for Haskell) have had an error state as one of their <code>Step</code> type's constructors, and Scalaz 7 doesn't.</p>
    <h2>My current implementation</h2>
    <p>Here's what I currently have. First, some imports and <code>IO</code> wrappers:</p>
    <pre><code>import java.io.{ BufferedReader, File, FileReader }

import scalaz._, Scalaz._, effect.IO, iteratee.{ Iteratee =&gt; I, _ }

def openFile(f: File) = IO(new BufferedReader(new FileReader(f)))

def readLine(r: BufferedReader) = IO(Option(r.readLine))

def closeReader(r: BufferedReader) = IO(r.close())
</code></pre>
    <p>And a type alias to clean things up a bit:</p>
    <pre><code>type ErrorOr[A] = Either[Throwable, A]
</code></pre>
    <p>And now a <code>tryIO</code> helper, modeled (loosely, and probably wrongly) on the one in <code>enumerator</code>:</p>
    <pre><code>def tryIO[A, B](action: IO[B]) = I.iterateeT[A, IO, ErrorOr[B]](
  action.catchLeft.map(
    r =&gt; I.sdone(r, r.fold(_ =&gt; I.eofInput, _ =&gt; I.emptyInput))
  )
)
</code></pre>
    <p>An enumerator for the <code>BufferedReader</code> itself:</p>
    <pre><code>def enumBuffered(r: =&gt; BufferedReader) =
  new EnumeratorT[ErrorOr[String], IO] {
    lazy val reader = r
    def apply[A] = (s: StepT[ErrorOr[String], IO, A]) =&gt; s.mapCont(k =&gt;
      tryIO(readLine(reader)) flatMap {
        case Right(None)       =&gt; s.pointI
        case Right(Some(line)) =&gt; k(I.elInput(Right(line))) &gt;&gt;== apply[A]
        case Left(e)           =&gt; k(I.elInput(Left(e)))
      }
    )
  }
</code></pre>
    <p>And finally an enumerator that's responsible for opening and closing the reader:</p>
    <pre><code>def enumFile(f: File) = new EnumeratorT[ErrorOr[String], IO] {
  def apply[A] = (s: StepT[ErrorOr[String], IO, A]) =&gt; s.mapCont(k =&gt;
    tryIO(openFile(f)) flatMap {
      case Right(reader) =&gt; I.iterateeT(
        enumBuffered(reader).apply(s).value.ensuring(closeReader(reader))
      )
      case Left(e) =&gt; k(I.elInput(Left(e)))
    }
  )
}
</code></pre>
    <p>Now suppose for example that I want to collect all the lines in a file that contain at least twenty-five <code>'0'</code> characters into a list. I can write:</p>
    <pre><code>val action: IO[ErrorOr[List[String]]] = (
  I.consume[ErrorOr[String], IO, List] %=
  I.filter(_.fold(_ =&gt; true, _.count(_ == '0') &gt;= 25)) &amp;=
  enumFile(new File("big.txt"))
).run.map(_.sequence)
</code></pre>
    <p>In many ways this seems to work beautifully: I can kick the action off with <code>unsafePerformIO</code> and it will chunk through tens of millions of lines and gigabytes of data in a couple of minutes, in constant memory and without blowing the stack, and then close the reader when it's done. If I give it the name of a file that doesn't exist, it will dutifully give me back the exception wrapped in a <code>Left</code>, and <code>enumBuffered</code> at least seems to behave appropriately if it hits an exception while reading.</p>
    <h2>Potential problems</h2>
    <p>I have some concerns about my implementation, though, particularly of <code>tryIO</code>. For example, suppose I try to compose a few iteratees:</p>
    <pre><code>val it = for {
  _ &lt;- tryIO[Unit, Unit](IO(println("a")))
  _ &lt;- tryIO[Unit, Unit](IO(throw new Exception("!")))
  r &lt;- tryIO[Unit, Unit](IO(println("b")))
} yield r
</code></pre>
    <p>If I run this, I get the following:</p>
    <pre><code>scala&gt; it.run.unsafePerformIO()
a
b
res11: ErrorOr[Unit] = Right(())
</code></pre>
    <p>If I try the same thing with <code>enumerator</code> in GHCi, the result is more like what I'd expect:</p>
    <pre><code>...&gt; run $ tryIO (putStrLn "a") &gt;&gt; tryIO (error "!") &gt;&gt; tryIO (putStrLn "b")
a
Left !
</code></pre>
    <p>I just don't see a way to get this behavior without an error state in the iteratee library itself.</p>
    <h2>My questions</h2>
    <p>I don't claim to be any kind of expert on iteratees, but I have used the various Haskell implementations in a few projects, feel like I more or less understand the fundamental concepts, and had coffee with Oleg once. I'm at a loss here, though. Is this a reasonable way to handle exceptions in the absence of an error state? Is there a way to implement <code>tryIO</code> that would behave more like the <code>enumerator</code> version? Is there some kind of time bomb waiting for me in the fact that my implementation behaves differently?</p>
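    <p>For reference, the short-circuiting seen in the GHCi session can be modelled without any iteratee library at all. The following is only a sketch of the expected semantics, assuming Scala 2.12 or later (where <code>Either</code> is right-biased); <code>ShortCircuitSketch</code>, <code>attempt</code>, and <code>ran</code> are hypothetical names, and nothing here shows how to obtain this behavior from Scalaz 7's <code>IterateeT</code>:</p>
    <pre><code>import scala.collection.mutable.ListBuffer
import scala.util.control.NonFatal

object ShortCircuitSketch {
  type ErrorOr[A] = Either[Throwable, A]

  // Hypothetical helper: run a side effect and capture any non-fatal exception.
  def attempt[A](body: =&gt; A): ErrorOr[A] =
    try Right(body) catch { case NonFatal(e) =&gt; Left(e) }

  // Records which steps actually ran.
  val ran = ListBuffer.empty[String]

  // flatMap on a right-biased Either stops at the first Left,
  // so the third step is never evaluated.
  val result: ErrorOr[Unit] = for {
    _ &lt;- attempt { ran += "a"; () }
    _ &lt;- attempt[Unit](throw new Exception("!"))
    r &lt;- attempt { ran += "b"; () }
  } yield r
}
</code></pre>
    <p>Here <code>result</code> is a <code>Left</code> and only the first step is recorded in <code>ran</code>, which is the behavior the <code>tryIO</code> above does not reproduce: each <code>tryIO</code> finishes in a done state carrying an <code>ErrorOr</code> value, and the <code>for</code> comprehension discards that value instead of inspecting it.</p>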
    1. I won't have a good answer for you anyway, but I'm curious: what are your performance and/or abstraction goals? Presumably you care some about performance, or `Reader` would be fine; and presumably you care some about abstraction, or you'd bail on this overhead and use much more compact and probably better-performing strategies (for cases which are simple and do not need to compose). For example, `Try(closing(io.Source.fromFile("big.txt")){_.getLines.filter(_.count(_=='0') >= 25)}.toList)`, with the obvious three-line definition for `closing`.
    2. @RexKerr: The performance of the `Reader` enumerator that comes with Scalaz is incredibly bad, literally dozens of times worse than the enumerator I've implemented here. But I'm happy to take a reasonable performance hit for the convenience of being able, for example, to `flatMap(enumFile)` an enumerator that lists the files in a directory and get an enumerator of the lines of all those files, without ever having to worry about explicitly closing any resources (which the iteratee approach allows).
    3. If you can provide a fully working gist with your current example, I can try and convert it to what I wrote in my answer.
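
The `closing` helper mentioned in the first comment is never defined in the thread. One possible reading is a loan-pattern function along the following lines; this is a sketch under that assumption, and `ClosingSketch` and `linesWithZeros` are hypothetical names. Because `getLines` is lazy, this version calls `toList` inside the block so that the lines are read before the source is closed.

```scala
import scala.io.Source
import scala.util.Try

object ClosingSketch {
  // Hypothetical loan-pattern helper: run `use`, then always close the resource.
  def closing[R <: AutoCloseable, A](resource: R)(use: R => A): A =
    try use(resource) finally resource.close()

  // Collect the lines containing at least `min` '0' characters.
  // toList forces the lazy iterator while the source is still open.
  def linesWithZeros(path: String, min: Int): Try[List[String]] =
    Try(closing(Source.fromFile(path))(_.getLines().filter(_.count(_ == '0') >= min).toList))
}
```

This handles the single-file case compactly, but it does not compose across many files the way `flatMap(enumFile)` does, which is the trade-off the second comment describes.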