Preview: Things that amuse me

Last Build Date: Thu, 03 Aug 2017 13:39:28 +0000

A commentary on 24 days of GHC extensions, part 3

Sat, 20 Dec 2014 11:52:00 +0000

It's time for some more Haskell opinions. Day 13: Functional DependenciesSee day 12. The addition of functional dependencies to multi-parameter type classes all of a sudden opened up the possibility of doing type level computations that had not been there before. The type level programming industry seems to have started with Thomas Hallgren's 2001 paper Fun with Functional Dependencies (which got some inspiration from the type level numeric computation available in (classic) Bluespec). Day 14: DerivingI don't know the history of deriving Functor et al. I believe Simon PJ added it after enough nagging. I do have an intense dislike for GeneralizedNewtypeDeriving as implemented in GHC. As illustrated by ghc bug 1496 (found by Stefan O'Rear who was 14-15 at the time) it leads to an inconsistent type system. Instead of removing the root cause of the problem GHC HQ has decided to add a rather ingenious, but grotesque, solution where type variables are given roles. It's complex and it's going to get worse. It also brings its own problems (e.g., join can no longer be made a method in the monad class. WAT?!?!). In the two compilers where I've implemented GND I've chosen a different route. The compiler tries to construct a proper instance declaration for the derived class. If it can be constructed and type checks then all is well. If it doesn't, there's something tricky going on and the instance has to be written by hand. I claim this covers 99% of all GND uses. And it makes me sleep better at night having had the instance declaration approved by the type checker. I also prefer to make the compiler recognise and remove the identity functions that newtype gives rise to rather than forcing the user to explicitly using the new Coercible class. Day 15: Derive GenericThe generic deriving extension is a clever idea from Clean, where all the meta-data about a data type (constructor names, types, etc) is encoded at the type level, and thus available at compile time. The details of this encoding are rather gruesome, but I think it could be a lot nicer using the new type level strings, type level numbers, and type families. Time for a small redesign? Day 16: Overloaded StringOverloaded strings was another extension I added to GHC because of my work on the Paradise DSL. Since Paradise is deeply embedded using DSL-level strings would be ugly unless the string literals were overloaded. It was quite easy to add to GHC. The least easy part was adapting the defaulting rules to strings as well. Worth noting is that pattern matching on overloaded strings turns into an equality comparison (just as with numeric literals), and so requires the string type to be an instance of Eq, which is not a superclass of IsString. Oh, and the name IsString for the class with the fromString was just temporary since I couldn't think of a good name. I guess it stuck. Day 17: Rank-N TypesRank-n types are, of course, a rather old idea, but it took a while before they made it into Haskell. At the Haskell Workshop in La Jolla in 1995 Mark Jones had a very nice paper, From Hindley-Milner Types to First-Class Structures. It shows how to add a form of records to Haskell, where the record components can be polymorhic. This has the same expressivity as current rank-n types, but looks a little different. For example, the archetypical Haskell rank-2 function has this type now runST :: (forall s . ST s a) -> a With records with polymorphic components we have to introduce a new record type to write this. With modern Haskell it would look like data ArgST a = ArgST (forall s . ST s a) runST :: ArgST a -> a Or with Mark's suggested anonymous record types runST :: { arg :: forall s . ST s a } -> a The 1996 release of HBC contained this kind of polymorphic components, but it did not have the anonymous records. GHC got rank-n (or was it rank-2?) types in 2000. The GHC rank-n types are somewhat more convenient, but requires function type signatures which the hbc version did not. At the La Jolla ICFP we had a Haskell committee meeting about the next Haskell rel[...]
A commentary on 24 days of GHC extensions, part 2

Fri, 19 Dec 2014 16:45:00 +0000

Day 7: Type OperatorsI understand the usefulness of the type operator GHC extension, but I don't like it. For value identifiers any operator not starting with a ':' is treated the same way as identifiers starting with a lower case letter. This is how it used to be on the type level too. But with the latest version of the TypeOperators extension these operators now behave like upper case identifiers and can be used to name types. An ugly wart, IMO. Day 8: Recursive doThis extension stems from Levent Erkök's PhD thesis Value Recursion in Monadic Computations. The motivation was to be able to describe circuits with feedback using monadic notation. There's actually two way to get recursive do-notation. You can either use the rec keyword inside the do or you can use the keyword mdo instead of do. Why mdo, you ask? It should really be μdo, where μ is the recursion operator. But people at GHC HQ are scared of Unicode, so instead we got mdo. Day 9: Nullary Type ClassesOnce you have accepted multi-parameter type classes then nullary type classes are a natural consequence; zero is also a number, after all. In 1994 I worked on the pH language when I visited MIT. The acronym pH stands for "parallel Haskell", which is what it is. It had some imperative features that were not in a monad. This was in 1994, and the ST monad was just being invented, so we were trying out other things. To keep track of what parts of the program used imperative features I used a nullary type class Imperative that tainted any code that used assignment et al. (As far as I know, this is the earliest use of a nullary type class.) When mixing imperative code and pure code it is important to not be too polymorphic. ML-like languages use the value restriction to accomplish this. Since Haskell's monomorphism restriction is essentially the same as the value restriction, having the Imperative type class neatly accomplished the value restriction for pH. Day 10: Implicit ParametersThe implicit parameter started as an attempt to simplify Haskell overloading. Haskell overloading, when implemented by dictionary passing, really has two parts: first there is a way to implicitly pass dictionaries around, and second, dictionaries are constructed automatically by using the instance declarations as a logical inference system. The implicit parameters do the automagic passing of parameters, and is thus half of what is needed for overloading. But they were never used for overloading, instead it was discovered that they can be made to behave like dynamic binding, but with static types. This is quite cool! Even if one doesn't like dynamic binding (I don't). There idea is presented Erik Meijer et al's paper Implicit Parameters: Dynamic Scoping with Static Types. It's worth reading. Day 11: Type FamiliesType families grew out of the need to have type classes with associated types. The latter is not strictly necessary since it can be emulated with multi-parameter type classes, but it gives a much nicer notation in many cases. The same is true for type families; they can also be emulated by multi-parameter type classes. But MPTC gives a very logic programming style of doing type computation; whereas type families (which are just type functions that can pattern match on the arguments) is like functional programming. Using closed type families adds some extra strength that cannot be achieved by type classes. To get the same power from type classes we would need to add closed type classes. Which would be quite useful; this is what instance chains gives you. Day 12: Multi-Parameter Type ClassesEven the original paper on Haskell type classes mentions multiple parameters, so this idea has been with us from the start. But it's not present in Stefan Kaes original work on type classes. Unfortunately, multiple type class parameters very quickly lead to type ambiguities in the code, and we are forced to add type signatures. So MPTC was not really useful until Mark Jones introduced functional dependencies. Mark got the idea from[...]
A commentary on 24 days of GHC extensions

Mon, 15 Dec 2014 17:36:00 +0000

Ollie Charles has continued his excellent series of Christmas Haskell blogs. This year he talks about 24 Days of GHC Extensions. His blog posts are an excellent technical introduction to various Haskell extensions. Reading them inspired me to write some non-technical comments about the various extensions; giving a little bit of history and personal comments about them. Day 1: View PatternsView patterns have a long history. As far as I know views were first suggested by Phil Wadler in 1987, Views: a way for pattern matching to cohabit with data abstraction. As the title of the paper suggests, it was about being able to pattern match on abstract data types. Variations of Phil's suggestions have been implemented several times (I did it both in LML and for Haskell in hbc). In all these early suggestions the conversion between the concrete type and the view type were implicit, whereas in the ViewPatterns extension the conversion is explicit. The addition of view patterns was partly spurred by the fact that F# has a very neat way of introducing pattern matching for abstract types, called active patterns. Since Simon Peyton Jones is in the same research lab as Don Syme it's natural that Simon couldn't let F# have this advantage over Haskell. :) Day 2: Pattern SynonymsPattern synonyms is a more recent addition. The first time I remember hearing the idea was this the paper Abstract Value Constructors, and ever since then I wished that Haskell would have something like that. Patterns were one of the few constructs in Haskell that could not be named and reused. The first time they were added to Haskell was in the SHE preprocessor. At a Haskell hackathon in Cambridge, August 2011, Simon Peyton Jones, Simon Marlow and I met to hash out how they could be added to GHC. I wanted to go beyond simple pattern synonyms. The simplest kind is uni-directional synonyms which can only occur in patterns. The simply bi-directional synonyms can be used both in patterns and expressions, but are limited in what they can do. The explicit bi-directional synonyms allow the full power of both patterns and expressions. With explicit bi-directional synonyms and view patterns we can finally implement views as envisioned by Phil, but now broken down into smaller reusable pieces. The result of our discussions at the hackathon can ge found here. But we were all too busy to implement any of it. All the hard implementation work, and fixing all the additional snags found, was done by Gergő Érdi. I find this a very exciting extension, and you can take it even further, like Associated Pattern Synonyms in class declarations. Day 3: Record WildcardsThe record wildcard extension allows you to open a record and get access to all the fields without using qualified names for them. The first time I encountered this idea was in Pascal which as a with-statement. It looks like with expr do stmt where the expr is a record value and inside the stmt all the fields of the record can be accessed directly. The construct later appeared in Ada as well, where it's called use. So having something similar in Haskell doesn't seem so far fetched. But in was actually the dual, namely to construct values using record wildcard notation that inspired me to make this extension. In 2006 I was developing the Paradise EDSL which is a DSL embedded in Haskell for describing computations and user interfaces. I wanted a way to make the EDSL less verbose and that's why record wildcards came about. Here's a simple example to illustrate the idea. We want to be able to input a coordinate record. data Coord = Coord { x :: Double, y :: Double } coord :: P Coord coord = do x [...]
Haskell error reporting with locations, update

`Prelude` and always in scope. The type cannot be compared, shown, or anything. There's just one thing that can be done, namely:

`error` function needs a new exception to throw

`IO` monad.

Sat, 05 Apr 2014 19:53:00 +0000

Since some people (I'm among them) dislike impure features in Haskell I thought I'd present a slight variation on the error location feature that is "pure".First, the `__LOCATION__` variable gets an abstract type. So

data Location __LOCATION__ :: LocationIt's defined in the

extractLocation :: Location -> IO StringThe

data ErrorCallLoc = ErrorCallLoc Location String {-# LOCATIONTRANSPARENT error #-} error :: String -> a error s = throw (ErrorCallLoc __LOCATION__ s)This means that the location string cannot be used when we throw the error. But it can be used where the error is caught, since this can only be done in the

Under the hood the everything is just as before, `Location` is just a string. It just can't be manipulated except in the `IO` monad, so we can pretend it's pure.

newtype Location = Location String extractLocation (Location s) = return sIt now looks a lot like Michael Snoyman's proposal.

Haskell error reporting with locations

Fri, 04 Apr 2014 07:28:00 +0000

Error reporting in GHC is not always the nicest. For example, I often develop code by using undefined as a placeholder for code I've not written yet. Here's a simple example: import System.Environment main = do args a error s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s)) {-# LOCATIONTRANSPARENT undefined #-} undefined = error "undefined" Since both error and undefined are transparent any use of undefined will be reported with the location of the use. Furthermore, we can make a few more functions location transparent, e.g., {-# LOCATIONTRANSPARENT head #-} head :: [a] -> a head [] = error "Empty list" head (x:xs) = x A simple example: main = putStr (head []) Which will print: test/Head.hs:1:16: Empty list which is the location where head was called. ImplementationThere are different ways to implement this feature, and I'm going to sketch two of them. First: Every function that has the LOCATIONTRANSPARENT pragma will be inlined at the point of use, and the __LOCATION__ identifier in the inlined code will be updated to reflect the call site. The definitions must be processed in a bottom-up fashion for this to work. It's fairly simple to implement, but will cause some code bloat due to inlining. Second: Every function that has LOCATIONTRANSPARENT pragma will be rewritten (by the compiler) to have an extra location argument, and each use of this function will be rewritten to pass in the current location. For example (using $$f for the location version of f): main = putStr ($$head __LOCATION__ []) $$head __LOCATION__ [] = $$error __LOCATION__ "Empty list" $$head __LOCATION__ (x:xs) = x $$error __LOCATION__ s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s)) This should be fairly straightforward to implement, but I've not tried it. (It's somewhat like dynamic binding, so maybe ghc could reuse that mechanism for locations.) And, of course, the global __LOCATION__ identifier has to be recognized by the compiler and replaced by a string that is its location. ConclusionI implemented the __LOCATION__ hack quite a while ago, and I like the much improved reporting of error locations. I hope someone will add it to ghc as well. [...]
A small Haskell extension

Thu, 03 Apr 2014 07:40:00 +0000

The extension In Haskell you can give a type to an expression by writing expr :: type. To an untrained eye the :: looks just like an infix operator, even if it is described as a special syntactical construct in the Haskell report. But let's think about it as an infix operator for a moment. For an infix operator you you can for a section, i.e., a use of the operator with one operand left out. For instance (* 2) leaves out the first operand, and Haskell defines this to be the same as (\ x -> x * 2). Regarding :: as an operator we should be able to write (:: type) and it should have the obvious meaning (\ x -> x :: type). I suggest, and I plan sending the haskell-prime mailing list, Haskell should adopt this small extension. Why? First, the extension is very light weight and has almost no extra intellectual weight for anyone learning Haskell. I'd argue it makes the language simpler because it allows :: to be treated more like an infix operator. But without use cases this would probably not be enough of an argument. Example 1 We want to make a function, canonDouble, that takes a string representing a Double and changes it to the standard Haskell string representing this Double. E.g. canonDouble "0.1e1" == "1.0". A first attempt might look like this: canonDouble :: String -> String canonDouble = show . read -- WRONG! This is, of course, wrong since the compiler cannot guess that the type between read and show should be a Double. We can convey this type information in different ways, e.g.: canonDouble :: String -> String canonDouble = show . asDouble . read where asDouble :: Double -> Double asDouble x = x This is somewhat clumsy. Using my proposed extension we can instead write: canonDouble :: String -> String canonDouble = show . (:: Double) . read This has the obvious meaning, and succinctly describes what we want. Example 2 In ghc 7.8 there is a new, better implementation of Data.Typeable. It used to be (before ghc 7.8) that to get a TypeRep for some type you would have to have a value of that type. E.g., typeOf True gives the TypeRep for the Bool type. If we don't have a value handy of the type, then we will have to make one, e.g., by using undefined. So we could write typeOf (undefined :: Bool). This way of using undefined is rather ugly, and relies on non-strictness to work. Ghc 7.8 add a new, cleaner way of doing it. typeRep :: proxy a -> TypeRep The typeRep function does not need an actual value, but just a proxy for the value. A common proxy is the Proxy type from Data.Proxy: data Proxy a = Proxy Using this type we can now get the TypeRep of a Bool by writing typeRep (Proxy :: Proxy Bool). Note that in the type signature of typeRep the proxy is a type variable. This means we can use other values as proxies, e.g., typeRep ([] :: [Bool]). We can in fact use anything as a proxy that has a structure that unifies with proxy a. For instance, if we want a proxy for the type T we could use T -> T, which is the same as (->) T T. The (->) T part makes of it is the proxy and the last T makes up the a. The extension I propose provides an easy way to write a function of type T -> T, just write (:: T). So to get a TypeRep for Bool we can simply write typeRep (:: Bool). Doesn't that look (deceptively) simple? In fact, my driving force for coming up with this language extension was to get an easy and natural way to write type proxies, and I think using (:: T) for a type proxy is a as easy and natural as it gets (even if the reason it works is rather peculiar). Implementation I've implemented the extension in one Haskell compiler and [...]Sat, 09 Jul 2011 17:54:00 +0000

Impredicative polymorphism, a use case In a recent question on stackoverflow I made a comment about how Haskell can be considered a good imperative language because you can define abstractions to make it convenient. When I was going to make my point by implementing a simple example of it I found that what I wanted to do no longer works in ghc-7.0.4. Here's a simple example of what I wanted to do (which works in ghc-6.12.3). It's a simple library that makes Haskell code look a bit like Python. {-# LANGUAGE ExtendedDefaultRules, TupleSections #-} module Main where import qualified Prelude import Boa last_elt(xs) = def $: do assert xs "Empty list" lst <- var xs -- Create a new variable, lst ret <- var (xs.head) while lst $: do ret #= lst.head -- Assign variable ret lst #= lst.tail return ret first_elt(xs) = def $: do l <- var xs l.reverse -- Destructive reverse return (last_elt(l)) factorial(n) = def $: do assert (n<=0) "Negative factorial" ret <- var 1 i <- var n while i $: do ret *= i i -= 1 return ret test = def $: do print "Hello" print ("factorial 10 =", factorial(10)) main = do test l <- varc [1, 2, 3] print ("first and last:",) print (first_elt(l),) print (last_elt(l)) On the whole it's pretty boring, except for one thing. In imperative languages there is (usually) a disctinction between l-values and r-values. An l-value represent a variable, i.e., something that can be assigned, whereas an r-value is simply a value. So in Python, in the statement x = x + 1 the x on the left is an l-value (it's being assigned), whereas on the right it's an r-value. You can use the same notation for both in most imperative languages. If you want to do the same in Haskell you have (at least) two choices. First you can unify the concepts of l-value and r-value and have a runtime test that you only try to assign variables. So, e.g., 5 = x would type check but have a runtime failure. I will not dwell further on this since it's not very Haskelly. Instead, we want to have l-values and r-values being statically type checked. Here's the interesting bit of the types for a simple example. data LValue a data RValue a instance (Num a) => Num (RValue a) class LR lr instance LR RValue instance LR LValue var :: RValue a -> IO (forall lr . (LR lr) => lr a) (#=) :: LValue a -> RValue a -> IO () foo = do x <- var 42 x #= x + 1 We have two type constructors LValue and RValue representing l-values and r-values of some type a. The r-values is an instance of Num. Furthermore, the class LR where the type is either LValue or RValue. The var function creates a new variable given a value. The return type of var is the interesting part. It says that the return value is polymorphic; it can be used either as an l-value or as r-value. Just as we want. The assignment operator, (#=), takes an l-value, an r-value, and returns nothing. So in the example we expect x to have type forall lr . (LR lr) => lr a, in which case the assignment will type check. If we try to compile this we get Illegal polymorphic or qualified type: forall (lr :: * -> *). LR lr => lr a Perhaps you intended to use -XImpredicativeTypes In the type signature for `var': var :: RValue a -> IO (forall lr. LR lr => lr a) The problem is that universal quantification is not normally allowed as the argument to a type constructor. This requires the impredicative polymorphism extension to ghc. If we turn it on it compiles fine in ghc-6.12.3. But, with ghc-7.0.4 we get Couldn't match expected type `LValue a0' with actual type `forall (lr :: * -> *). LR lr => lr a1' In the first argument of `(#=)', namely `x' In the expression: x #= x + 1 [...]Mon, 02 May 2011 18:28:00 +0000

More points for lazy evaluation In a recent blog post Bob Harper shows one use of laziness, but I think he misses the real import points of laziness. So I will briefly enumerate what I think are the important points of lazyness (excuse me for not coming up with any kind of pun about points). First, I'd like to say that I don't think strict or lazy matters that much in practice; they both work fine. I have programmed in regular non-strict Haskell as well as strict Haskell, and most things carry over well to a strict Haskell. Furthermore, when I say strict language I mean a strict language with at least non-termination as an effect; total languages is another matter. Lazy bindings I like to tell Haskell beginners that any subexpression can be named and "pulled out", modulo name capture, of course. For instance ... e ... is the same as let x = e in ... x ... The key thing is that ... e ... e ... is the same as let x = e in ... x ... x ... so that common subexpressions can be given a name. This is (in general) just wrong in a strict language, of course. Just take the simple example if c then error "BOO!" else 0 Which is not the same as let x = error "BOO!" in if c then x else 0 In this case you can easily fix the problem by delaying the computation with a lambda (a common theme). let x () = error "BOO!" in if c then x () else 0 But for a slightly more complicated example this simple technique goes wrong. Consider map (\ a -> a + expensive) xs where expensive does not depend on a. In this case you want to move the expensive computation out of the loop (cf. loop invariant code motion in imperative languages). Like so let x = expensive in map (\ a -> a + x) xs In a lazy language x will be evaluated exactly zero times or once, just as we want. Using the delaying trick doesn't work here: let x () = expensive in map (\ a -> a + x ()) xs since expensive will get evaluated once for every list element. This is easy enough to remedy, by introducing an abstraction for lazy computations (which will contain an assignment to compute the value just once). The signature for the abstract type of lazy values is something like data Lazy a delay :: (() -> a) -> Lazy a force :: Lazy a -> a Note that the delay needs to take a function to avoid the a being evaluated early. (This is probably what Bob would name a benign effect and is easily programmed using unsafePerformIO, which means it needs careful consideration.) And so we get let x = delay (\ () -> expensive) in map (\ a -> a + force x) xs This isn't exactly pretty, but it works fine. In a language with macros the ugliness can be hidden better. Lazy functions Even strict languages like ML and C have some lazy functions even if they don't call them that, like SML's if, andthen, and orelse. You really need the if construct to evaluate the condition and then one of the branches depending on the condition. But what if I want to define my own type with the same kind of functions? In ML I can't, in C I would have to use a macro. The ability to define new functions that can be used as control constructs is especially important when you want to design embedded domain specific languages. Take the simple example of the when (i.e., one-arm if) function in Haskell. when :: (Monad m) => Bool -> m () -> m () A quite common use of this function in monadic code is to check for argument preconditions in a function, like f x = do when (x < 0) $ error "x must be >= 0" ... If the when function is strict this is really bad, of course, since the call to error will happen before the when is called. Again, one can work around this by using lazy values, like myAnd :: MyBool -> [...]Wed, 20 Apr 2011 22:25:00 +0000

Ugly memoization Here's a problem that I recently ran into. I have a function taking a string and computing some value. I call this function a lot, but a lot of the time the argument has occurred before. The function is reasonably expensive, about 10 us. Only about 1/5 of the calls to the function has a new argument. So naturally I want to memoize the function. Luckily Hackage has a couple packages for memoization. I found data-memocombinators and MemoTrie and decided to try them. The basic idea with memoization is that you have a function like memo :: (a->b) -> (a->b) I.e., you give a function to memo and you get a new function of the same type back. This new function behaves like the original one, but it remembers every time it is used and the next time it gets the same argument it will just return the remembered result. This is only safe in a pure language, but luckily Haskell is pure. In an imperative language you can use a mutable memo table that stores all the argument-result pairs and updates the memo table each time the function is used. But how is it even possible to implement that in a pure language? The idea is to lazily construct the whole memo table in the call to memo, and it will then be lazily filled in. Assume that all values of the argument type a can be enumerated by the method enumerate, we could then write memo like this: memo f = let table = [ (x, f x) | x <- enumerate ] in \ y -> let Just r = lookup y table in r Note how the memo table is constructed given just f, and this memo table is then used in the returned function. The type of this function would be something like memo (Enumerate a, Eq a) => (a->b) -> (a->b) assuming that the class Enumerate has the magic method enumerate. This just a very simplified example, if you tried to use this it would be terrible because the returned function does linear lookup in a list. Instead we want some kind of search tree, which is what the two packages I mention implement. The MemoTrie package does this in a really beautiful way, I recommend reading Conal's blog post about it. OK, enough preliminaries. I used criterion to perform the benchmarking, and I tried with no memoization (none), memo-combinators (comb), and MemoTrie (beau). I had a test function taking about 10us, and then i called this functions with different number of repeated arguments: 1, 2, 5, and 10. I.e., 5 means that each argument occurred 5 times as the memoized function was called. 1 2 5 10 none 10.7 10.7 10.7 10.7 comb 62.6 52.2 45.8 43.4 beau 27.6 17.0 10.4 8.1 So with no memoization the time per call was 10.7 us all the time, no surprise there. With the memo combinators it was much slower than no memoization; the overhead for looking something up is bigger than the cost of computing the result. So that was a failure. The MemoTrie does better, at about an argument repetition of five it starts to break even, and at ten it's a little faster to memoize. Since I estimated my repetition factor in the real code to be about five even the fastest memoization would not be any better then recomputation. So now what? Give up? Of course not! It's time to get dirty. Once you know a function can be implemented in a pure way, there's no harm in implementing the same function in an impure way as long as it presents the pure interface. So lets write the memo function the way it would be done in, e.g., Scheme or ML. We will use a reference to hold a memo table that gets updated on each call. Here's the code, with the type that the function gets. import Data.IORef import qualified Data.Map as M memoIO :: (Ord a) => (a -> b) -> IO (a -> IO b) memoIO f = do v <- newIORef M.empty let f' x = do m <- readIORef v case M.lookup x m of Nothing -[...]Sun, 10 Apr 2011 18:51:00 +0000

Phew! Cleaned out a lot of spam comments in my blog. Hopefully my new settings will prevent the crazy onslaught of spammers.Wed, 10 Jun 2009 21:35:00 +0000

More LLVMRecently someone asked me on #haskell if you could use the Haskell LLVM bindings to compile some abstract syntax to a Haskell function. Naturally I said yes, but then I realized I had only done it for a boring language with just one type. I had no doubt that it could be done for more complicated languages with multiple types, but it might not be totally obvious how. So I decided to write a simple compiler, and this blog post is the result. First, a simple example: main = do let f :: Double -> Double Just f = compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)" print (f 2, f 3, f 0) Running this program produces (as expected) (0.25,0.1111111111111111,0.0) What has happened is that the string has been parsed to an abstract syntax tree, translated into LLVM code, then to machine code, and finally turned back into a Haskell callable function. Many things can go wrong along the way, like syntax and type errors, so compile returns a Maybe type to indicate if things went right or wrong. (A more serious version of the compile function would return an error message when something has gone wrong.) The definition of the compilation function is simple and illustrates the flow of the compiler compile :: (Translate a) => String -> Maybe a compile = fmap translate . toTFun <=< mParseUFun The context Translate is there to limit the types that can actually be translated; it's a necessary evil and exactly what types are allowed depends on how advanced we make the compiler. Had we ignored the Maybe type the definitions would have been compile = translate . toTFun . mParseUFun which says, first parse to the type UFun (untyped expressions), then type check and turn it into the type TFun a, and finally translate TFun a into an a by LLVM compilation. Let's see how this all works. The UExp moduleThe first step is to just define an abstract syntax for the expressions that we want to handle. I'm only allowing leading lambdas (this a very first order language), so there's a distiction between the top level UFun type and the expression type UExp. The U prefix indicates that this version of the syntax is not yet type checked. The definition is pretty boring, but here it is: {-# OPTIONS_GHC -fno-warn-unused-binds #-} {-# LANGUAGE RecordWildCards #-} module UExp(Id, UFun(..), UTyp(..), UExp(..), parseUFun, showOp, mParseUFun) where import Data.Maybe import Data.List import Data.Function import Text.ParserCombinators.Parsec import Text.ParserCombinators.Parsec.Expr import Text.ParserCombinators.Parsec.Token import Text.ParserCombinators.Parsec.Language type Id = String data UFun = UFun [(Id, UTyp)] UExp data UTyp = UTBol | UTDbl data UExp = UDbl Double -- ^ Double literal | UBol Bool -- ^ Bool literal | UVar Id -- ^ Variable | UApp Id [UExp] -- ^ Function application | ULet Id UExp UExp -- ^ Local binding Naturally, we want to be able to show the expressions, if nothing else so for debugging. So I make a Show instance that shows them in a nice way respecting operator precedences etc. There's nothing exciting going on, the large number of lines is just to cover operator printing. instance Show UFun where showsPrec p (UFun [] e) = showsPrec p e showsPrec p (UFun vts e) = showParen (p>0) (showString "\\ " . foldr (.) (showString "-> ") (map f vts) . showsPrec 0 e) where f (v, t) = showParen True (showString v . showString " :: " . showsPrec 0 t) . showString " " instance Show UTyp where showsPrec _ UTDbl = showString "Double" showsPrec _ UTBol = showString "Bool" instance Show UExp where showsPrec p (UDbl d) = showsPrec p d showsPrec p (UBol b) = showsPrec p b showsPrec _ (UVar i) = showString i showsPrec p (UApp "if" [c, t, e]) =[...]Sat, 07 Feb 2009 20:31:00 +0000

Here's a simple program.

{-# LANGUAGE ExtendedDefaultRules, OverloadedStrings #-} import BASIC main = runBASIC $ do 10 GOSUB 1000 20 PRINT "* Welcome to HiLo *" 30 GOSUB 1000 100 LET I := INT(100 * RND(0)) 200 PRINT "Guess my number:" 210 INPUT X 220 LET S := SGN(I-X) 230 IF S <> 0 THEN 300 240 FOR X := 1 TO 5 250 PRINT X*X;" You won!" 260 NEXT X 270 STOP 300 IF S <> 1 THEN 400 310 PRINT "Your guess ";X;" is too low." 320 GOTO 200 400 PRINT "Your guess ";X;" is too high." 410 GOTO 200 1000 PRINT "*******************" 1010 RETURN 9999 ENDIn some ways this is a step backwards, since it requires some language extensions in Main. But I wanted to be able to use semicolon in the print statement.

But there it is, an exciting game!

******************* * Welcome to HiLo * ******************* Guess my number: 50 Your guess 50 is too high. Guess my number: 25 Your guess 25 is too low. Guess my number: 37 Your guess 37 is too low. Guess my number: 44 Your guess 44 is too low. Guess my number: 47 Your guess 47 is too low. Guess my number: 48 1 You won! 4 You won! 9 You won! 16 You won! 25 You won!

Fri, 06 Feb 2009 16:50:00 +0000

#includeAnd running itint main(int argc, char **argv) { double i, s; s = 0; for (i = 1; i < 100000000; i++) s += 1/i; printf("Almost infinity is %g\n", s); }

Lennarts-Computer% gcc -O3 inf.c -o inf Lennarts-Computer% time ./inf Almost infinity is 18.9979 1.585u 0.009s 0:01.62 97.5% 0+0k 0+0io 0pf+0wAnd now the Haskell code:

import BASIC main = runBASIC' $ do 10 LET I =: 1 20 LET S =: 0 30 LET S =: S + 1/I 40 LET I =: I + 1 50 IF I <> 100000000 THEN 30 60 PRINT "Almost infinity is" 70 PRINT S 80 ENDAnd running it:

Lennarts-Computer% ghc --make Main.hs [4 of 4] Compiling Main ( Main.hs, Main.o ) Linking Main ... Lennarts-Computer% ./Main Almost infinity is 18.9979 CPU time: 1.57sAs you can see it's about the same time. In fact the assembly code for the loops look pretty much the same. Here's the Haskell one:

LBB1_1: ## _L4 movsd LCPI1_0, %xmm2 movapd %xmm1, %xmm3 addsd %xmm2, %xmm3 ucomisd LCPI1_1, %xmm3 divsd %xmm1, %xmm2 addsd %xmm2, %xmm0 movapd %xmm3, %xmm1 jne LBB1_1 ## _L4

Fri, 06 Feb 2009 11:12:00 +0000

import BASIC main = runBASIC $ do 10 LET X =: 1 20 PRINT "Hello BASIC world!" 30 LET X =: X + 1 40 IF X <> 11 THEN 20 50 ENDYes, it runs. (I'm sorry about the

Wed, 21 Jan 2009 01:07:00 +0000

So here are some updated numbers for computing an option prices for 10,000,000 options:

- Pure Haskell: 8.7s
- LLVM: 2.0s

normcdf x = x %< 0 ?? (1 - w, w) where w = 1.0 - 1.0 / sqrt (2.0 * pi) * exp(-l*l / 2.0) * poly k k = 1.0 / (1.0 + 0.2316419 * l) l = abs x poly = horner coeff coeff = [0.0,0.31938153,-0.356563782,1.781477937,-1.821255978,1.330274429]A noteworthy thing is that exactly the same code can be used both for the pure Haskell and the LLVM code generation; it's just a matter of overloading.

Again, by using overloading, the exact same code can be used to compute over vectors of numbers as with individual numbers.

So what about performance? I used four element vectors of 32 bit floating point numbers and got these results:

- Pure Haskell: 8.7s
- LLVM, scalar: 2.0s
- LLVM, vector: 1.1s
- C, gcc -O3: 2.5s

- Only on MacOS does the LLVM package give you fast primitive functions, because that's the only platform that seems to have this as a standard.
- The vector version of floating point comparison has not been implemented in the LLVM yet.
- Do not use two element vectors of type 32 bit floats. This will generate code that is wrong on the x86. I sent in a bug report about this, but was told that it is a feature and not a bug. (I kid you not.) To make the code right you have to manually insert EMMS instructions.
- The GHC FFI is broken for all operations that allocate memory for a
`Storable`, e.g.,`alloca`,`with`,`withArray`etc. These operations do not take the alignment into account when allocating. This means that, e.g., a vector of four floats may end up on 8 byte alignment instead of 16. This generates a segfault.

[Edit:] Added point about broken FFI.

Sat, 10 Jan 2009 15:33:00 +0000

LLVM arithmetic So we want to compute x2-5x+6 using the Haskell LLVM bindings. It would look some something like this. xsq <- mul x x x5 <- mul 5 x r1 <- sub xsq x5 r <- add r1 6 Not very readable, it would be nicer to write r <- x^2 - 5*x + 6 But, e.g., the add instruction has the (simplified) type Value a -> Value a -> CodeGenFunction r (Value a), where CodeGenFunction is the monad for generating code for a function. (BTW, the r type variable is used to keep track of the return type of the function, used by the ret instruction.) We can't change the return type of add, but we can change the argument type. type TValue r a = CodeGenFunction r (Value a) add' :: TValue r a -> TValue r a -> TValue r a add' x y = do x' CodeGenModule (Function (a -> IO b)) mFib = recursiveFunction $ \ rfib n -> n %< 2 ? (1, rfib (n-1) + rfib (n-2)) It's impossible to generate code for mFib directly; code can only be generated for monomorphic types. So a type signature is needed when generating code to force a monomorphic type. main = do let fib :: Int32 -> Double fib = unsafeGenerateFunction mFib fib' :: Int16 -> Int64 fib' = unsafeGenerateFunction mFib print (fib 22, fib' 22) Another example Let's try a more complex example. To pick something mathematical to have lots of formulae we'll do the Cumulative Distribution Function. For the precision of a Float it can be coded like this in Haskell (normal Haskell, no LLVM): normcdf x = if x < 0 then 1 - w else w where w = 1 - 1 / sqrt (2 * pi) * exp(-l*l / 2) * poly k k = 1 / (1 + 0.2316419 * l) l = abs x poly = horner coeff coeff = [0.0,0.31938153,-0.356563782,1.781477937,-1.821255978,1.330274429] horner coeff base = foldr1 multAdd coeff where multAdd x y = y*base + x We cannot use this directly, it has type normcdf :: (Floating a, Ord a) => a -> a. The Ord context is a problem, because there are no instance of Ord for LLVM types. The comparison is the root of the problem, since it returns a Bool rather than a TValue r Bool. It's possible to hide the Prelude and overload the comparisons, but you cannot overload the if construct. So a little rewriting is necessary regardless. Let's just bite the bullet and change the first line. normcdf x = x %< 0 ? (1 - w, w) And with that change, all we need to do is mNormCDF = createFunction ExternalLinkage $ arithFunction $ normcdf main :: IO () main = do writeFunction "CDF.bc" (mNormCDF :: CodeGenModule (Function (Float -> IO Float))) So what happened here? Looking at normcdf it contains a things that the LLVM cannot handle, like lists. But all the list operations happen when the Haskell code runs and nothing of that is left in the LLVM code. If you optimize and generate code for normcdf it looks like this (leaving out constants and half the code): __fun1: subl $28, %esp pxor %xmm0, %xmm0 ucomiss 32(%esp), %xmm0 jbe LBB1_2 movss 32(%esp), %xmm0 mulss %xmm0, %xmm0 divss LCPI1_0, %xmm0 movss %xmm0, (%esp) call _expf fstps 24(%esp) movss 32(%esp), %xmm0 mulss LCPI1_1, %xmm0 movss %xmm0, 8(%esp) movss LCPI1_2, %xmm0 movss 8(%esp), %xmm1 addss %xmm0, %xmm1 movss %xmm1, 8(%esp) movaps %xmm0, %xmm1 divss 8(%esp), %xmm1 movaps %xmm1, %xmm2 mulss LCPI1_3, %xmm2 addss LCPI1_4, %xmm2 mulss %xmm1, %xmm2 addss LCPI1_5, %xmm2 mulss %xmm1, %xmm2 addss LCPI[...]Wed, 07 Jan 2009 16:14:00 +0000

LLVM The LLVM, Low Level Virtual Machine, is a really cool compiler infrastructure project with many participants. The idea is that if you want to make a new high quality compiler you just have to generate LLVM code, and then there are lots of optimizations and code generators available to get fast code. There are different ways to generate input to the LLVM tools. You can generate a text file with LLVM code and feed it to the tools, or you can use bindings for some programming language and programmatically build the LLVM code. The original bindings from the LLVM project is for C++, but they also provide C bindings. On top of the C bindings you can easily interface to other languages; for instance O'Caml and Haskell. There are also diffent things you can do to LLVM code you have build programmatically. You can transform it, you can write to a file, you can run an interpreter on it, or execute it with a JIT compiler. Haskell LLVM bindings There is a Haskell binding to the LLVM. It has two layers. You can either work on the C API level and have ample opportunity to shoot your own limbs to pieces, or you can use the high level interface which is mostly safe. Bryan O'Sullivan did all the hard work of taking the C header files and producing the corresponding Haskell FFI files. He also made a first stab at the high level interface, which I have since change a lot (for better or for worse). An example Let's do an example. We'll write the LLVM code for this function f x y z = (x + y) * z In Haskell this function is polymorphic, but when generating machine code we have to settle for a type. Let's pick Int32. (The Haskell Int type cannot be used in talking to LLVM; it doesn't a well defined size.) Here is how it looks: mAddMul :: CodeGenModule (Function (Int32 -> Int32 -> Int32 -> IO Int32)) mAddMul = createFunction ExternalLinkage $ \ x y z -> do t Int32 -> IO Int32. That last is the type of f above, except for that IO. Why the IO? The Haskell LLVM bindings forces all defined functions to return something in the IO monad, because there are no restriction on what can happen in the LLVM code; it might very well do IO. So to be on the safe side, there's always an IO on the type. If we know the function is harmless, we can use unsafePerformIO to get rid of it. So the code does a createFunction which does what the name suggests. The ExternalLinkage argument says that this function will be available outside the module it's in, the obvious opposite being InternalLinkage. Using InternalLinkage is like saying static on the top level in C. In this examples it doesn't really matter which we pick. The function has three arguments x y z. The last argument to createFunction should be a lambda expression with the right number of arguments, i.e., the number of arguments should agree with the type. We the use monadic syntax to generate an add, mul, and ret instruction. The code looks like assembly code, which is the level that LLVM is at. It's a somewhat peculiar assembly code, because it's on SSA (Static Single Assignment) form. More about that later. So what can we do with this function? Well, we can generate machine code for it and call it. main = do addMul Int32 -> IO Int32, so it has to be called in the IO monad. Since this is a pure function, we can make the type pure, i.e., Int32 -> Int32 -> Int32 -> Int32. main = do addMul IO Word32)) mFib = do fib do -- Create the two basic blocks. recurse 2, using the cumbersome plus to add the results. defineBasicBlock recurse x1 Phi (a, b) where phis bb (a, b) = do [...]Wed, 10 Dec 2008 15:27:00 +0000

module MkPerson(O: sig type xString type xDouble val opConcat : xString -> xString -> xString val opShow : xDouble -> xString end) = struct open O type person = Person of (xString * xString * xDouble) let display (Person (firstName, lastName, height)) = opConcat firstName (opConcat lastName (opShow height)) end module BasicPerson = MkPerson(struct type xString = string type xDouble = float let opConcat = (^) let opShow = string_of_float end) open BasicPerson let _ = let p = Person ("Stefan", "Wehr", 184.0) in display pNote how the

Similarely, the `open BasicPerson` makes the operations from that module avaiable, so that `Person` and `display` can be used in a simple way.

Wed, 10 Dec 2008 12:34:00 +0000

f :: forall a . a -> a f = \ x -> x b :: Bool b = f TrueThe way I like to think of this (and what happens in ghc) is that this is shorthand for something more explicit, namely the F

f :: forall (a::*) . a -> a f = /\ (a::*) -> \ (x::a) -> x b :: Bool b = f @Bool TrueI'm using

Not something a little more complicated (from my previous post)

class C a b where x :: a y :: b f :: (C a b) => a -> [a] f z = [x, x, z]The type of

x :: forall a b . (C a b) => aSo whenever

f :: forall a b . (C a b) => a -> [a] f = /\ (a::*) (b::*) -> \ (z::a) -> [ x @a @b1, x @a @b2, z]The reason for the ambiguity in type checking is that the type check cannot figure out that the

So I suggest that it should be possible to use explicit type application in Haskell when you want to. The code would look like this

f :: forall a b . (C a b) => a -> [a] f z = [ x @a @b, x @a @b, z]The order of the variables in the

class Ops t where data XString t :: * (+++) :: XString t -> XString t -> XString t instance Ops Basic where type XString Basic = String (+++) = (++)So the class declaration says I'm going to use data types (which was my final try and which works very nicely). But in the instance I provide a type synonym instead. This would be like using a

Wed, 10 Dec 2008 10:55:00 +0000

The abstraction continues I got several comments to my lament about my attempts at abstraction in my previous blog post. Two of the comments involve adding an extra argument to display. I dont regard this as an acceptable solution; the changes to the code should not be that intrusive. Adding an argument to a function is a change that ripples through the code to many places and not just the implementation of display. Reiner Pope succeeded where I failed. He split up the operations in Ops into two classes and presto, it works. data Person t = Person { firstName :: XString t, lastName :: XString t, height :: XDouble t } class (Show s, IsString s) => IsXString s where (+++) :: s -> s -> s class (Num d, IsXString s) => IsXDouble s d where xshow :: d -> s class (IsXDouble (XString t) (XDouble t)) => Ops t where type XString t :: * type XDouble t :: * instance IsXString String where (+++) = (++) instance IsXDouble String Double where xshow = show data Basic = Basic instance Ops Basic where type XString Basic = String type XDouble Basic = Double display :: Ops t => Person t -> XString t display p = firstName p +++ " " +++ lastName p +++ " " +++ xshow (height p + 1) That's neat, but a little fiddly if there are many types involved. Another problem Armed with this solution I write another function. incSpace :: (Ops t) => XDouble t -> XString t incSpace x = xshow x +++ " " It typechecks fine. But as far as I can figure out there is no way to use this function. Let's see what ghci says: > :t incSpace (1 :: XDouble Basic) :: XString Basic :1:0: Couldn't match expected type `[Char]' against inferred type `XString t' In the expression: incSpace (1 :: XDouble Basic) :: XString Basic :1:10: Couldn't match expected type `XDouble t' against inferred type `Double' In the first argument of `incSpace', namely `(1 :: XDouble Basic)' In the expression: incSpace (1 :: XDouble Basic) :: XString Basic Despite my best efforts at providing types it doesn't work. The reason being that saying, e.g., (1 :: XDouble Basic) is the same as saying (1 :: Double). And that doesn't match XDouble t. At least not to the typecheckers knowledge. In the example of display things work because the parameter t occurs in Person t which is a real type and not a type family. If a type variable only occurs in type family types you are out of luck. There's no way to convey the information what that type variable should be (as far as i know). You can "solve" the problem by adding t as an argument to incSpace, but again, I don't see that as a solution. In the paper ML Modules and Haskell Type Classes: A Constructive Comparison Wehr and Chakravarty introduce a notion of abstract associated types. That might solve this problem. I really want XDouble and XString to appear as abstract types (or associated data types) outside of the instance declaration. Only inside the instance declaration where I provide implementations for the operations do I really care what the type is. A reflection on type signatures If I write f x = x Haskell can deduce that the type is f :: a -> a. If I instead write f :: Int -> Int f x = x Haskell happily uses this type. The type checker does not complain as to say "Sorry dude, but you're wrong, the type is more general than what you wrote.". I think that's nice and polite. Now a different example. class C a b where x :: a y :: b f z = [x, x, z] What does ghc have to say about the type of f? f :: (C a b, C [...]Wed, 10 Dec 2008 00:10:00 +0000

A somewhat failed adventure in Haskell abstraction I usually blog about weird and wonderful things you can do in Haskell. Today I'm going to talk about something very plain and not wonderful at all. If you want to try out the code below, use these Haskell extensions: {-# LANGUAGE TypeFamilies, MultiParamTypeClasses, OverloadedStrings, FlexibleInstances, TypeSynonymInstances, ScopedTypeVariables, FunctionalDependencies, RecordWildCards, FlexibleContexts, GeneralizedNewtypeDeriving #-} The simple problem We want to define a type for a person which has a few fields and operations. Like this module Person(Person(..), display) where data Person = Person { firstName :: String, lastName :: String, height :: Double } display :: Person -> String display p = firstName p ++ " " ++ lastName p ++ " " ++ show (height p + 1) Very simple. To use it we can just import the module and the write something like print $ display $ Person { firstName = "Manuel", lastName = "Peyton Jones", height = 255 } But being efficiancy conscious I'm not happy with using String and Double. I'd like to experiment with different types for these. Maybe I should use ByteString and Int instead? Simple enough, let's abstract out the types and operations into a different module. module Ops(XString, XDouble, (+++), xshow) where import Data.String newtype XString = XString String deriving (Eq, Show, IsString) newtype XDouble = XDouble Double deriving (Eq, Show, Num) (+++) :: XString -> XString -> XString XString x +++ XString y = XString (x ++ y) xshow :: XDouble -> XString xshow (XDouble x) = XString (show x) module Person(Person(..), display) where import Ops data Person = Person { firstName :: XString, lastName :: XString, height :: XDouble } display :: Person -> XString display p = firstName p +++ " " +++ lastName p +++ " " +++ show (height p + 1) There, problems solved. By changing the import in the Person module you can try out different types for XString and XDouble. No, this is not problem solved. To try out different implementations I need to edit the Person module. That's not abstraction, that's obstruction. It should be possible to write the code for the Person module once and for all once you decided to abstract, and then never change it again. I also didn't really want to necessarily have newtype in my module. Maybe I'd want this: module Ops(XString, XDouble, (+++), xshow) where type XString = String type XDouble = Double (+++) :: XString -> XString -> XString (+++) = (++) xshow :: XDouble -> XString xshow = show You can define Ops that way, but then the implementation of Ops may leak into the Person module. What you really want is to type check Person against the signature of Ops, like interface Ops where type XString type XDouble (+++) :: XString -> XString -> XString xshow :: XDouble -> XString And later supply the actual implementation. Alas, Haskell doesn't allow this. In ML (SML or O'Caml) this would be solved by using a functor. The Person module would be a functor that takes the Ops module as an argument and yields a new module. And then you can just plug and play with different Ops implementations. This is where ML shines and Haskell sucks. Type classes But the defenders of Haskell superiority say, Haskell has type classes, that's the way to abstract! So let's make Ops into a type class. Let's do old style with multiple parameters first. Since Ops defines two types it will correspond to having two type parameters to the clas[...]Thu, 03 Jul 2008 11:38:00 +0000

Lost and found If I write 10^8 in Haskell, how many multiplications will be used to compute the power? A stupid question? Well, for this example, but if I was computing x^8 and x has 100000 digits then I'd care. So how can I find out? I can look at the definition of the exponentiation operator. Here it is, from the Haskell report and GHC 6.8: (^) :: (Num a, Integral b) => a -> b -> a _ ^ 0 = 1 x ^ n | n > 0 = f x (n-1) x where f _ 0 y = y f a d y = g a d where g b i | even i = g (b*b) (i `quot` 2) | otherwise = f b (i-1) (b*y) _ ^ _ = error "Prelude.^: negative exponent" It's a bit involved, but decipherable. Another way would be to insert some kind of debug trace message in the multiplication. Traced values I'd like to show a different way. Here's a ghci session: Prelude> :m +Debug.Traced Prelude Debug.Traced> let x = 10 :: Traced AsValue Integer Prelude Debug.Traced> let y = x ^ 8 Prelude Debug.Traced> y 100000000 Prelude Debug.Traced> :t y y :: Traced AsValue Integer Prelude Debug.Traced> asExp y 10 * 10 * (10 * 10) * (10 * 10 * (10 * 10)) Prelude Debug.Traced> asSharedExp y let _2 = 10 * 10; in _2 * _2 * (_2 * (10 * 10)) Prelude Debug.Traced> :t asSharedExp y asSharedExp y :: Traced AsExp Integer Prelude Debug.Traced> So what's going on? The value of x is Traced Integer, which means that there's some magic going on. The variable can be used as usual, for instance in computing x^8. A traced value can also be shown as an expression, which is what showAsExp does. So a traced value is somewhat like the symbolic values I had in an earlier post, but in addition to having a symbolic representation they also have a normal value. But the output from showAsExp doesn't really help in answering how many multiplications there are, since the shown expression has no sharing; it is totally flattened. The showAsShared function is the black magic here, it recovers the sharing, and we can see what happened. What we see is that there are actually five (5) multiplications involved in computing 10^8. This shows that the definition of exponentiation is suboptimal, since it can be done with three multiplications (three repeated squarings). The showAsShared really does have some black magic. It recovers information that is not part of the Haskell semantics, so from that we can conclude that it must contain the powerful incantation unsafePerformIO somewhere. How does it reveal implementation secrets? Look at this: Prelude Debug.Traced> asSharedExp $ let a = 1 + 2 in a * a let _1 = 1 + 2; in _1 * _1 Prelude Debug.Traced> asSharedExp $ (1+2) * (1+2) (1 + 2) * (1 + 2) Prelude Debug.Traced> The let expression and the expression where the variable has been expanded are semantically equal in Haskell, so no (pure) function can possibly be able to give different results for them. OK, so how does it work? I'll show a simplified version of the Traced module here that only deals with one traced type, but it can be extended. The (soon to be available) hackage package contains the extended version. In the Traced type we need to represent expressions. We only need constants and function applications. data Traced a = Con a | Apply a String [Traced a] The function application contains the value, the name of the function, and the arguments the function was applied to. In the exported interface from the module we want to be able to convert to and from the Trace[...]Sun, 09 Mar 2008 21:45:00 +0000

Simple reflections of higher order In a recent blog post by Twan van Laarhoven he showed how to reflect Haskell expressions so that we can actually see them symbolically. This has now been included in lambdabot on the Haskell IRC (source). Let's play with it for a moment: *SimpleReflect> x+y x + y *SimpleReflect> foldr f z [1,2,3] f 1 (f 2 (f 3 z)) *SimpleReflect> let swap (x,y) = (y, x) *SimpleReflect> swap (a,b) (b,a) *SimpleReflect> map swap [(a,b),(c,d)] [(b,a),(d,c)] *SimpleReflect> :t x x :: Expr That's very cool! Read Twan's post to find out how it works. All we need to know is on the last line, x :: Expr. So Expr is a special type with special instances. Let's try something else *SimpleReflect> \ x -> x + y :1:0: No instance for (Show (Expr -> Expr)) ... Oh, that's annoying, we can't print functions. But why can't we? If we made up some variable of type Expr and then applied the function we'd get something we could print. Let's add some code to Twan's module. I've just randomly picked the variable a. (To make this compile you need the extension language FlexibleInstances.) instance Show (Expr -> Expr) where show f = "\\ " ++ show a ++ " -> " ++ show (f a) And try again with the example *SimpleReflect> \ x -> x + y \ a -> a + y Pretty smooth. Except it doesn't really work. *SimpleReflect> \ x -> x + a \ a -> a + a Those two functions are not the same. We can't just arbitrarily pick the variable a since it might be free in the expression. So we need to pick a variable that is not free in the expression. How could we do that? No matter what we pick it could be used. And we have no idea what is being used. Or do we? If we just turn the function in to text we could look at the string, and pick some variable that is unused in that string. This is really a gruesome hack, but who cares? How do we print the function? Just invent some variable, I use _, turn it into an expression and tokenize the string. We will then have a string of tokens not to use. To find a variable to use, just pick an infinite supply of variables, remove the ones being used, and then pick one of the remaining ones. instance Show (Expr -> Expr) where show f = "\\ " ++ show v ++ " -> " ++ show (f v) where v = var (head vars) vars = supply \\ tokenize (show $ f $ var "_") supply = [ "x" ++ show i | i \ x -> x + y \ x1 -> x1 + y *SimpleReflect> \ x -> x + var "x1" \ x2 -> x2 + x1 OK, what about multiple arguments? *SimpleReflect> \ x y -> x + y + z :1:0: No instance for (Show (Expr -> Expr -> Expr)) ... Well, yeah, that's true. There is no such instance. But wait, why is the instance we have for the type Expr->Expr? No particular reason, that's just what I wrote. In fact it works equally well for Expr->r as long as we can show r, because that's the only thing we do with f v. So we change the first line of the instance: instance (Show r) => Show (Expr -> r) where And now *SimpleReflect> \ x y -> x + y + z \ x2 -> \ x1 -> x2 + x1 + z *SimpleReflect> foldr (.) id [f::Expr->Expr,g,h] -- a little type help needed \ x1 -> f (g (h x1)) *SimpleReflect> scanr (.) id [(*2),f::Expr->Expr,(+1)] [\ x1 -> f (x1 + 1) * 2,\ x1 -> f (x1 + 1),\ x1 -> x1 + 1,\ x1 -> x1] Well, that wasn't too hard. So let's try another example. *SimpleReflect> \ (x,y) -> x+y+z :1:0: No instance for (Show ((Expr, Expr) -> Expr)) ... Hmmm, yes, the argument must be an Expr. That's [...]Fri, 09 Nov 2007 18:26:00 +0000

Some lambda calculus examples Syntax In a previous blog entry I described a simple evaluator and type checker for the lambda cube, i.e., various forms of lambda calculus. Here I'm going to show some examples of code in pure typed λ-calculus. All the examples are typable in Fω; the full lambda cube is not necessary. Before doing any examples we'd better have some syntax that is not too painful, because writing λ-expression in raw Haskell is tedious. The syntax for variables, * kind, and application is easy, I'll just use Haskell syntax. For λ-expressions the Haskell syntax doesn't allow explicit type annotations, but various Haskell compiler implement that extension, so I'll just pick that. So a λ will be written, "\ (var::type) -> expr". And as in Haskell I'll allow multiple variables between the "\" and "->"; it's just a shorthand for multiple lambdas. So what about the dependent function type? The syntax (x:t)→u suggests (x::t)->u, so I'll use that. And when the variable doesn't occur we'll write t->u as usual. For type variables Haskell (well, not Haskell 98, but extensions) uses forall (a::*) . t, so I'll allow that too. An example, the identity function: \ (a::*) (x::a) -> x with type forall (a::*) . a->a And using it id Int 5 Writing a pretty printer and parser for this is pretty straight forward so I'll skip that and just point you to the code. BTW, instead of using Parsec for the parser like everyone else I used ReadP. The ReadP library is very nice, partly because the alternative operator is actually commutative (unlike Parsec). But the error messages suck. Enter let Now if we want to use, say, the identity function more than once we need to name it. There is a mechanism for that, namely λ. But it looks awkward. Look: (\ (id :: forall (a::*) . a->a) -> ... id ... id ... id ...) (\ (a::*) (x::a) -> x) What makes it awkward is that the name, id, is far away from the body, \ (a::*) (x::a) -> x. From Haskell we are more used the let and where expressions. So let's add that let. Instead of what we have above we'll write let id :: forall (a::*) . a->a = \ (a::*) (x::a) -> x in ... id ... id ... id ... The let construct could be just be syntactic suger for a lambda and an application, but I've decided to add it as a constructor to the expression type instead. | Let Sym Type Expr Expr Adding a new constructor means that we have to modify all the functions operating on Expr, and it's extra much work because Let is a variable binding construct. For substitution we just cheat a little and use the expansion into an application and λ sub (Let i t e b) = let App (Lam i' t' b') e' = sub (App (Lam i t b) e) in Let i' t' e' b' And for normal form we use the same trick. spine (Let i t e b) as = spine (App (Lam i t b) e) as And for now, let's extend the type checker in the same way. To make definitions a little less verbose I'll allow multiple bindings. E.g. let x :: Int = g a; y :: Int = f x in h x y It's just multiple nested Lets in the obvious way. Another shorthand. I'll allow the identity function to be written let id (a::*) (x::a) :: a = x in ... The translation is pretty easy let f (x1::t1) ... (xn::tn) :: t = b in e is let f :: forall (x1::t1) ... (xn::tn) . t = \ (x1::t1) ... (xn::tn) -> b in e And finally, to make it even more like Haskell, I'll allow the type si[...]Tue, 06 Nov 2007 16:28:00 +0000

Benchmarking ray tracing, Haskell vs. OCaml On his web site about OCaml Jon Harrop has a benchmark for a simple ray tracing program written in a number of languages. When I saw it I wondered why Haskell was doing so badly, especially since this benchmark was taken as some kind of proof that Haskell performs poorly in "real life". So I have rerun the benchmarks. I've also rewritten the Haskell versions of the programs. The Haskell versions on Jon's web site were OK, but they were too far from the OCaml versions for my taste. I prefer to keep the programs very similar in a situation like this. My rewrite of the benchmarks from OCaml to Haskell was done with a minimum of intelligence. Here are the only things I did that needed creative thought: Use the type Vector to represent vectors instead of a tuple. This allows the components to be strict. Use the type Scene instead of a tuple to represent a scene. The tuple used in the OCaml code uses the dubious feature of equi-recursive types (even Xavier thinks it's strange enough to have a flag to enable it). Rewrite the loop that computes a pixel's value using an accumulating updatable variable into a list comprehension that sums the list. Finally, the compiler flags needed a bit of tweaking to get good performance, even though "-O3 -funbox-strict-fields -fexcess-precision -optc-ffast-math" were pretty obvious. In addition to this I made the code look a little more Haskellish, e.g., using overloading to allow + and - on vectors. This is really just minor syntactic changes, but makes the code more readable. To make the program size comparison fair I removed some dead code from the OCaml code. I then reran the benchmarks using Haskell, OCaml, and C++. The benchmarks are five programs that starts from very simple ray tracing and specializing the program more and more to speed it up. The numbers are in the tables below. The time is execution time in second, the characters are non-white characters in the file, and the lines are the number of lines in the file. To ease comparison I also include the relative numbers compared to OCaml (smaller numbers are better). Interestingly, and unlike Jon's benchmark, the Haskell code is always smaller than the OCaml code. Furthermore, the Haskell code ranges from much faster to slightly faster than the OCaml code. Again, this is very unlike Jon's benchmark. I find the unoptimized version of the benchmark especially interesting since Haskell is 5 times(!) faster than OCaml. I've not investigated why, but I suspect laziness. Results The programs, ray1-ray5, are variations on the ray tracer as given on Jon's web site. I've used the same size metrics as Jon does. Haskell: My Haskell code compiled with ghc 6.8.1 Haskell old: Jon's Haskell code, compiled with ghc 6.8.1 Haskell old 6.6: Jon's Haskell code, compiled with ghc 6.1.1 OCaml: Jon's OCaml code C++: Jon's C++ code Time: execution time is second Char: number of non-white chracters in the program Lines: number of lines in the program Rel T: execution time relative to OCaml Rel C: non-white characters relative to OCaml Rel L: lines relative to OCaml Mem: Maximum resident memory ray1 Time Chars Lines Rel T Rel C Rel LMem Haskell 15.3 1275 51 0.202 0.990 1.0205M Haskell, old 15.8 1946 88 0.208 1.511 1.7609M [...]