Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Hopefully someone with more experience will have a more polished, battle-tested and ready answer, but here's my shot at it.</p> <p>You can have your pie and eat part of it too at relatively little cost with GADTs:</p> <pre><code>{-# LANGUAGE GADTs #-} data P0 -- phase zero data P1 -- phase one data Type p where TypeInt :: Id -&gt; Type p TypePointer :: Id -&gt; Type p -&gt; Type p -- target type TypeArray :: Id -&gt; Type p -&gt; Expr p -&gt; Type p -- elt type, elt count TypeStruct :: Id -&gt; [(String, Type p)] -&gt; Type p -- [(field name, field type)] TypeOf :: Id -&gt; Expr P0 -&gt; Type P0 TypeDef :: Id -&gt; Type P0 -&gt; Type P0 data Expr p where ExprInt :: Int -&gt; Expr p -- literal int ExprVar :: Var -&gt; Expr p -- variable ExprSizeof :: Type P0 -&gt; Expr P0 ExprUnop :: Unop -&gt; Expr p -&gt; Expr p ExprBinop :: Binop -&gt; Expr p -&gt; Expr p -&gt; Expr p ExprField :: Bool -&gt; Expr P0 -&gt; String -&gt; Expr P0 -- Bool gives true=&gt;s.field, false=&gt;p-&gt;field </code></pre> <p>Here the things we've changed are:</p> <ul> <li><p>The datatypes now use GADT syntax. This means that constructors are declared using their type signatures. <code>data Foo = Bar Int Char</code> becomes <code>data Foo where Bar :: Int -&gt; Char -&gt; Foo</code> (aside from syntax, the two are completely equivalent).</p></li> <li><p>We have added a type variable to both <code>Type</code> and <code>Expr</code>. This is a so-called phantom type variable: there is no actual data stored that is of type <code>p</code>, it is used only to enforce invariants in the type system.</p></li> <li><p>We've declared dummy types to represent the phases before and after the transformation: phase zero and phase one. (In a more elaborate system with multiple phases we could potentially use type-level numbers to denote them.)</p></li> <li><p>GADTs let us store type-level invariants in the data structure. Here we have two of them. The first is that recursive positions must be of the same phase as the structure containing them. For example, looking at <code>TypePointer :: Id -&gt; Type p -&gt; Type p</code>, you pass a <code>Type p</code> to the <code>TypePointer</code> constructor and get a <code>Type p</code> as result, and those <code>p</code>s must be the same type. (If we wanted to allow different types, we could use <code>p</code> and <code>q</code>.)</p></li> <li><p>The second is that we enforce the fact that some constructors can only be used in the first phase. Most of the constructors are polymorphic in the phantom type variable <code>p</code>, but some of them require that it be <code>P0</code>. This means that those constructors can only be used to construct values of type <code>Type P0</code> or <code>Expr P0</code>, not any other phase.</p></li> </ul> <p>GADTs work in two directions. The first is that if you have a function which returns a <code>Type P1</code>, and try to use one of the constructors which returns a <code>Type P0</code> to construct it, you will get a type error. This is what's called "correct by construction": it's statically impossible to construct an invalid structure (provided you can encode all of the relevant invariants in the type system). The flip side of it is that if you have a value of <code>Type P1</code>, you can be sure that it was constructed correctly: the <code>TypeOf</code> and <code>TypeDef</code> constructors can't have been used (in fact, the compiler will complain if you try to pattern match on them), and any recursive positions must also be of phase <code>P1</code>. Essentially, when you construct a GADT you store evidence that the type constraints are satisfied, and when you pattern match on it you retrieve that evidence and can take advantage of it.</p> <p>That was the easy part. Unfortunately, we have some differences between the two types beyond just which constructors are allowed: some of the constructor arguments are different between the phases, and some are only present in the post-transformation phase. We can again use GADTs to encode this, but it's not as low-cost and elegant. One solution would be to duplicate all of the constructors which are different, and have one each for <code>P0</code> and <code>P1</code>. But duplication isn't nice. We can try to do it more fine-grained:</p> <pre><code>-- a couple of helper types -- here I take advantage of the fact that of the things only present in one phase, -- they're always present in P1 and not P0, and not vice versa data MaybeP p a where NothingP :: MaybeP P0 a JustP :: a -&gt; MaybeP P1 a data EitherP p a b where LeftP :: a -&gt; EitherP P0 a b RightP :: b -&gt; EitherP P1 a b data Type p where TypeInt :: Id -&gt; Type p TypePointer :: Id -&gt; Type p -&gt; Type p TypeArray :: Id -&gt; Type p -&gt; EitherP p (Expr p) Int -&gt; Type p TypeStruct :: Id -&gt; [(String, MaybeP p Int, Type p)] -&gt; Type p TypeOf :: Id -&gt; Expr P0 -&gt; Type P0 TypeDef :: Id -&gt; Type P0 -&gt; Type P0 -- for brevity type MaybeType p = MaybeP p (Type p) data Expr p where ExprInt :: MaybeType p -&gt; Int -&gt; Expr p ExprVar :: MaybeType p -&gt; Var -&gt; Expr p ExprSizeof :: Type P0 -&gt; Expr P0 ExprUnop :: MaybeType p -&gt; Unop -&gt; Expr p -&gt; Expr p ExprBinop :: MaybeType p -&gt; Binop -&gt; Expr p -&gt; Expr p -&gt; Expr p ExprField :: Bool -&gt; Expr P0 -&gt; String -&gt; Expr P0 </code></pre> <p>Here with some helper types we've enforced the fact that some constructor arguments can only be present in phase one (<code>MaybeP</code>) and some are different between the two phases (<code>EitherP</code>). While this makes us completely type-safe, it feels a bit ad-hoc and we still have to wrap things in and out of the <code>MaybeP</code>s and <code>EitherP</code>s all the time. I don't know if there's a better solution in that respect. Complete type safety is something, though: we could write <code>fromJustP :: MaybeP P1 a -&gt; a</code> and be sure that it is completely and utterly safe.</p> <p>Update: An alternative is to use <code>TypeFamilies</code>:</p> <pre><code>data Proxy a = Proxy class Phase p where type MaybeP p a type EitherP p a b maybeP :: Proxy p -&gt; MaybeP p a -&gt; Maybe a eitherP :: Proxy p -&gt; EitherP p a b -&gt; Either a b phase :: Proxy p phase = Proxy instance Phase P0 where type MaybeP P0 a = () type EitherP P0 a b = a maybeP _ _ = Nothing eitherP _ a = Left a instance Phase P1 where type MaybeP P1 a = a type EitherP P1 a b = b maybeP _ a = Just a eitherP _ a = Right a </code></pre> <p>The only change to <code>Expr</code> and <code>Type</code> relative the previous version is that the constructors need to have an added <code>Phase p</code> constraint, e.g. <code>ExprInt :: Phase p =&gt; MaybeType p -&gt; Int -&gt; Expr p</code>.</p> <p>Here if the type of <code>p</code> in a <code>Type</code> or <code>Expr</code> is known, you can statically know whether the <code>MaybeP</code>s will be <code>()</code> or the given type and which of the types the <code>EitherP</code>s are, and can use them directly as that type without explicit unwrapping. When <code>p</code> is unknown you can use <code>maybeP</code> and <code>eitherP</code> from the <code>Phase</code> class to find out what they are. (The <code>Proxy</code> arguments are necessary, because otherwise the compiler wouldn't have any way to tell which phase you meant.) This is analogous to the GADT version where, if <code>p</code> is known, you can be sure of what <code>MaybeP</code> and <code>EitherP</code> contains, while otherwise you have to pattern match both possibilities. This solution isn't perfect either in the respect that the 'missing' arguments become <code>()</code> rather than disappearing entirely.</p> <p>Constructing <code>Expr</code>s and <code>Type</code>s also seems to be broadly similar between the two versions: if the value you're constructing has anything phase-specific about it then it must specify that phase in its type. The trouble seems to come when you want to write a function polymorphic in <code>p</code> but still handling phase-specific parts. With GADTs this is straightforward:</p> <pre><code>asdf :: MaybeP p a -&gt; MaybeP p a asdf NothingP = NothingP asdf (JustP a) = JustP a </code></pre> <p>Note that if I had merely written <code>asdf _ = NothingP</code> the compiler would have complained, because the type of the output wouldn't be guaranteed to be the same as the input. By pattern matching we can figure out what type the input was and return a result of the same type.</p> <p>With the <code>TypeFamilies</code> version, though, this is a lot harder. Just using <code>maybeP</code> and the resulting <code>Maybe</code> you can't prove anything to the compiler about types. You can get part of the way there by, instead of having <code>maybeP</code> and <code>eitherP</code> return <code>Maybe</code> and <code>Either</code>, making them deconstructor functions like <code>maybe</code> and <code>either</code> which also make a type equality available:</p> <pre><code>maybeP :: Proxy p -&gt; (p ~ P0 =&gt; r) -&gt; (p ~ P1 =&gt; a -&gt; r) -&gt; MaybeP p a -&gt; r eitherP :: Proxy p -&gt; (p ~ P0 =&gt; a -&gt; r) -&gt; (p ~ P1 =&gt; b -&gt; r) -&gt; EitherP p a b -&gt; r </code></pre> <p>(Note that we need <code>Rank2Types</code> for this, and note also that these are essentially the CPS-transformed versions of the GADT versions of <code>MaybeP</code> and <code>EitherP</code>.)</p> <p>Then we can write:</p> <pre><code>asdf :: Phase p =&gt; MaybeP p a -&gt; MaybeP p a asdf a = maybeP phase () id a </code></pre> <p>But that's still not enough, because GHC says:</p> <pre><code>data.hs:116:29: Could not deduce (MaybeP p a ~ MaybeP p0 a0) from the context (Phase p) bound by the type signature for asdf :: Phase p =&gt; MaybeP p a -&gt; MaybeP p a at data.hs:116:1-29 NB: `MaybeP' is a type function, and may not be injective In the fourth argument of `maybeP', namely `a' In the expression: maybeP phase () id a In an equation for `asdf': asdf a = maybeP phase () id a </code></pre> <p>Maybe you could solve this with a type signature somewhere, but at that point it seems like more bother than it's worth. So pending further information from someone else I'm going to recommend using the GADT version, being the simpler and more robust one, at the cost of a little syntactic noise.</p> <p>Update again: The problem here was that because <code>MaybeP p a</code> is a type function and there is no other information to go by, GHC has no way to know what <code>p</code> and <code>a</code> should be. If I pass in a <code>Proxy p</code> and use that instead of <code>phase</code> that solves <code>p</code>, but <code>a</code> is still unknown.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload