StackOverflow2013

plurals

PO
primarykey
Id
20816935
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2013-12-28T16:50:01.593
FavoriteCount
0
LastActivityDate
2013-12-28T19:15:17.303
LastEditDate
2013-12-28T19:15:17.303
LastEditorUserId
2407038
OwnerUserId
2407038
ParentId
20804329
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
text
Body
Let's start with question 2 because it is easier to answer. Your approach is correct: as you parse things, you remove those characters from the input string, and return a tuple containing the result of the parse and the remaining string. However, there is no reason to write all this from scratch (except perhaps as an academic exercise) - there are plenty of parser libraries which will take care of this for you. The one I will use is <a href="http://hackage.haskell.org/package/parsec-3.0.0" rel="nofollow"><code>Parsec</code></a>. If you are new to monadic parsing you should first read <a href="http://book.realworldhaskell.org/read/using-parsec.html" rel="nofollow">the section on <code>Parsec</code> in RWH.</a> As for question 1, if you use <code>ByteString</code> instead of <code>String</code>, then parsing single bytes is easy since single bytes are the atomic elements of <code>ByteString</code>s! There is also the issue of the <code>Char</code>/<code>ByteString</code> interface. With <code>Parsec</code>, this is a non-issue since you can treat a <code>ByteString</code> as a sequence of bytes or of <code>Char</code> - we will see this later. I decided to just write the full parser - this is a very simple language, so with all the primitives already defined in the <code>Parsec</code> library, it is very easy and very concise. 
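The hand-rolled approach from question 2 (consume some input, return the result together with the rest) can be sketched without any parser library. This is only an illustration, not the parser developed in this answer: the names <code>byte</code>, <code>number</code> and <code>dimensions</code> are made up for the example, and <code>Maybe</code> is assumed as the way to signal failure.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Every step returns the parsed value plus the unconsumed input,
-- or Nothing if the input does not match.

-- One raw byte (what binary pixel data needs).
byte :: ByteString -> Maybe (Word8, ByteString)
byte = B.uncons

-- A decimal number, after skipping leading ASCII whitespace.
-- Note: readInt also accepts a leading sign.
number :: ByteString -> Maybe (Int, ByteString)
number = C8.readInt . C8.dropWhile (`elem` " \n\t\r")

-- Three numbers in a row (hypothetical helper), threading the
-- remaining input by hand from one step to the next.
dimensions :: ByteString -> Maybe ((Int, Int, Int), ByteString)
dimensions s0 = do
  (w, s1) <- number s0
  (h, s2) <- number s1
  (d, s3) <- number s2
  return ((w, h, d), s3)
```

Passing <code>s1</code>, <code>s2</code>, <code>s3</code> along by hand is exactly the bookkeeping that a parser library hides.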
The file header:
<pre><code>import Text.Parsec.Combinator
import Text.Parsec.Char
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Pos
import Data.ByteString (ByteString, pack)
import qualified Data.ByteString.Char8 as C8
import Control.Monad (replicateM)
import Data.Monoid
-- chunksOf is used below; assumed here to come from the split package
import Data.List.Split (chunksOf)
</code></pre>
First, we write the 'primitive' parsers - that is, parsing bytes, parsing textual numbers, and parsing whitespace (which the PPM format uses as a separator):
<pre><code>parseIntegral :: (Read a, Integral a) => Parser a
parseIntegral = fmap read (many1 digit)
</code></pre>
<code>digit</code> parses a single digit - you'll notice that many function names explain what the parser does - and <code>many1</code> will apply the given parser 1 or more times. Then we <code>read</code> the resulting string to return an actual number (as opposed to a string). In this case, the input <code>ByteString</code> is being treated as text.
<pre><code>parseByte :: Integral a => Parser a
parseByte = fmap (fromIntegral . fromEnum) $ tokenPrim show (\pos tok _ -> updatePosChar pos tok) Just
</code></pre>
For this parser, we parse a single <code>Char</code> - which is really just a byte - and convert it to a number with <code>fromEnum</code>. We could safely make the return type <code>Parser Word8</code> because the only values that can be returned are <code>[0..255]</code>.
<pre><code>whitespace1 :: Parser ()
whitespace1 = many1 (oneOf "\n ") >> return ()
</code></pre>
<code>oneOf</code> takes a list of <code>Char</code> and parses any one of those characters - again, the <code>ByteString</code> is being treated as text. Now we can write the parser for the header.
<pre><code>parseHeader :: Parser Header
parseHeader = do
  f <- choice $ map try $
         [string "P3" >> return TextualBitmap
         ,string "P6" >> return BinaryBitmap]
  w <- whitespace1 >> parseIntegral
  h <- whitespace1 >> parseIntegral
  d <- whitespace1 >> parseIntegral
  return $ Header f w h d
</code></pre>
A few notes. 
<code>choice</code> takes a list of parsers and tries them in order. <code>try p</code> takes the parser <code>p</code>, and 'remembers' the state before <code>p</code> starts parsing. If <code>p</code> succeeds, then <code>try p == p</code>. If <code>p</code> fails, then the state before <code>p</code> started is restored and you pretend you never tried <code>p</code>. This is necessary due to how <code>choice</code> behaves: it only moves on to the next alternative if the failed one consumed no input. For the pixels, we have two choices as of now:
<pre><code>parseTextual :: Header -> Parser [Pixel]
parseTextual h = do
  xs <- replicateM (3 * width h * height h) (whitespace1 >> parseIntegral)
  return $ map (\[a,b,c] -> Pixel a b c) $ chunksOf 3 xs
</code></pre>
We could use <code>many1 (whitespace1 >> parseIntegral)</code> - but this wouldn't enforce the fact that we know what the length should be. Then, converting the list of numbers to a list of pixels is trivial. For binary data:
<pre><code>parseBinary :: Header -> Parser [Pixel]
parseBinary h = do
  whitespace1
  xs <- replicateM (3 * width h * height h) parseByte
  return $ map (\[a,b,c] -> Pixel a b c) $ chunksOf 3 xs
</code></pre>
Note how the two are almost identical. You could probably generalize this function (it would be especially useful if you decided to parse the other types of pixel data - monochrome and greyscale). Now to bring it all together:
<pre><code>parsePPM :: Parser PPM
parsePPM = do
  h <- parseHeader
  fmap (PPM h) $
    case format h of
      TextualBitmap -> parseTextual h
      BinaryBitmap -> parseBinary h
</code></pre>
This should be self-explanatory. Parse the header, then parse the body based on the format. Here are some examples to try it on. They are the ones from the specification page. 
<pre><code>example0 :: ByteString
example0 = C8.pack $ unlines
  ["P3"
  , "4 4"
  , "15"
  , " 0 0 0 0 0 0 0 0 0 15 0 15"
  , " 0 0 0 0 15 7 0 0 0 0 0 0"
  , " 0 0 0 0 0 0 0 15 7 0 0 0"
  , "15 0 15 0 0 0 0 0 0 0 0 0"
  ]
example1 :: ByteString
example1 = C8.pack ("P6 4 4 15 ") <>
  pack [0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 15,
        0, 0, 0, 0, 15, 7, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 15, 7, 0, 0, 0,
        15, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
</code></pre>
Several notes: this doesn't handle comments, which are part of the spec. The error messages are not very useful; you can use the <a href="http://hackage.haskell.org/package/parsec-3.0.0/docs/Text-Parsec-Prim.html#v%3a-60--63--62-" rel="nofollow"><code><?></code></a> function to create your own error messages. The spec also indicates 'The lines should not be longer than 70 characters.' - this is also not enforced. edit: Seeing do-notation doesn't necessarily mean that you are working with impure code. Some monads (like this parser) are still pure - they are just used for convenience. For example, you can write your parser with the type <code>parser :: String -> (a, String)</code>, or, as we have done here, use a new type: <code>data Parser a = Parser (String -> (a, String))</code> and have <code>parser :: Parser a</code>; we then write a monad instance for <code>Parser</code> to get the useful do-notation. To be clear, <code>Parsec</code> supports parsing inside another monad, but our parser does not use one - or rather, it uses the <code>Identity</code> monad, which is just <code>newtype Identity a = Identity { runIdentity :: a }</code>, and is only necessary because if we used <code>type Identity a = a</code> we would have 'overlapping instances' errors everywhere, which is not good. 
<pre><code>>:i Parser
type Parser = Parsec ByteString ()
  -- Defined in `Text.Parsec.ByteString'
>:i Parsec
type Parsec s u = ParsecT s u Data.Functor.Identity.Identity
  -- Defined in `Text.Parsec.Prim'
</code></pre>
So then, the type of <code>Parser</code> is really <code>ParsecT ByteString () Identity</code>. That is, the parser input is <code>ByteString</code>, the user state is <code>()</code> - which just means we aren't using the user state - and the monad in which we are parsing is <code>Identity</code>. <code>ParsecT</code> is itself just a newtype of:
<pre><code>forall b.
   State s u
-> (a -> State s u -> ParseError -> m b)
-> (ParseError -> m b)
-> (a -> State s u -> ParseError -> m b)
-> (ParseError -> m b)
-> m b
</code></pre>
The four functions in the middle are continuations for the success and failure cases (with and without consumed input); the <code>ParseError</code> they carry is what lets <code>Parsec</code> pretty-print errors. If you are parsing tens of thousands of characters and an error occurs, you won't be able to just look at the input and see where that happened - but <code>Parsec</code> will tell you the line and column. If we specialize all the types to our <code>Parser</code>, and pretend that <code>Identity</code> is just <code>type Identity a = a</code>, then all the monads disappear and you can see that the parser is not impure. As you can see, <code>Parsec</code> is a lot more powerful than is required for this problem - I just used it due to familiarity, but if you were willing to write your own primitive functions like <code>many</code> and <code>digit</code>, then you could get away with using <code>newtype Parser a = Parser (ByteString -> (a, ByteString))</code>.
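To make that last remark concrete, here is a rough sketch of such a home-made parser type with a few primitives. It is one possible version, not what <code>Parsec</code> does internally. Two things are assumptions of the sketch: the result is wrapped in <code>Maybe</code> so a parse can fail (the type quoted above has no failure case), and <code>satisfy</code> and <code>runParser</code> are names chosen for the example.

```haskell
import qualified Data.ByteString.Char8 as C8
import Data.ByteString (ByteString)
import Data.Char (isDigit)

-- Assumption: Maybe models failure; no error messages or positions.
newtype Parser a = Parser { runParser :: ByteString -> Maybe (a, ByteString) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing      -> Nothing
    Just (f, s') -> case pa s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (f a, s'')

-- The Monad instance is what provides do-notation; it only threads
-- the remaining input from one parser to the next.
instance Monad Parser where
  Parser p >>= f = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> runParser (f a) s'

-- Consume one character if it satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case C8.uncons s of
  Just (c, rest) | ok c -> Just (c, rest)
  _                     -> Nothing

digit :: Parser Char
digit = satisfy isDigit

-- Zero or more repetitions; never fails. It loops forever if the
-- argument parser can succeed without consuming input.
many :: Parser a -> Parser [a]
many p = Parser $ \s -> case runParser p s of
  Nothing      -> Just ([], s)
  Just (a, s') -> runParser (fmap (a :) (many p)) s'

-- One or more repetitions.
many1 :: Parser a -> Parser [a]
many1 p = do
  x  <- p
  xs <- many p
  return (x : xs)

-- Same definition as the Parsec version earlier in the answer.
parseIntegral :: (Read a, Integral a) => Parser a
parseIntegral = fmap read (many1 digit)
```

Everything here is an ordinary pure function from <code>ByteString</code> to a result, which is the point made above about do-notation not implying impurity.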
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POParsing PPM images in Haskell
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USuser2407038
UserOwnerUserId
1. USuser2407038
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POParsing PPM images in Haskell
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId