nominolo's Blog

Updated: 2015-11-27T16:08:44.224+01:00


Beyond Package Version Policies


When I read the announcement of the latest GHC release candidate, I did not feel excitement but rather annoyance. The reason is that now I have to go and check all my packages' dependency specifications and see if they require a version bump. I was not the only one. In the following I sketch an approach that I think could take out most of the pain we currently experience in the Hackage ecosystem.

The Problem

The reason for this annoyance is the rather strict Hackage package versioning policy (PVP). The PVP specifies the format of the version number of a Hackage package as a sequence of four integers separated by dots:

    majorA.majorB.minor.patchlevel

The top two components denote the major version number and must be incremented if a potentially breaking change is introduced in a new package release. The minor number must be incremented if new functionality was added but the package is otherwise still compatible with the previous version. Finally, a patchlevel increment is necessary if the API is unchanged and only bugfixes or non-functional changes (e.g., documentation) were made. This sounds fairly reasonable and is basically the same scheme used for shared libraries in the C world.

When specifying dependencies on other packages, authors are strongly encouraged to specify upper bounds on the major version. This is intended to avoid breaking the package if a new major version of the dependency is released (cabal-install always tries to use the latest possible version of a dependency). If the package also works with a newer version of the dependency, then the author is expected to release a new version of his/her library with an increased upper bound for the dependency version.

In Haskell, unfortunately, this system doesn't work too well for a number of reasons. Say my package P depends on package A-1.0 and I now want to test if it works with the newly released version A-1.1. My package also depends on package B-0.5, which in turn also depends on A-1.0.
GHC currently cannot link two versions of the same package into the same executable, so we must pick one version that works with both -- in this case that's A-1.0. D'oh! I now have two options: (a) wait for the author of package B to test it against A-1.1, or (b) do it myself. If I choose option (b) I also have to send my patch to the author of B, wait for him/her to upload the new version to Hackage, and only then can I upload my new version to Hackage. The problem is multiplied by the number of (transitive) dependencies of my package and the number of different authors of these packages. This process takes time (usually months), and the fast release rate of GHC (or many other Haskell packages, for that matter) doesn't make it any easier.

Packages get major-version upgrades rather frequently. One reason is that many Haskell libraries are still in flux. Another is that if a package adds a new instance, a major version upgrade is required. We can protect against new functions/types being added to a package because we can use explicit import lists. New instances, however, are imported automatically, and there's no way to hide them when importing a module. A package version is thus a very crude and conservative approximation of whether a dependent package might break. Generally, I think it's a good thing that Haskell packages are updated frequently and improved upon. The problem is that the current package infrastructure and tools don't work well with it. The PVP is too conservative.

A Better Approach

The key notion is to track dependencies at the level of individual functions, types, etc. rather than at the level of whole packages. When a package P depends on another package A, it usually doesn't depend on the whole package. Most of the time P just depends on a few functions and types. If some other part of A is changed, that shouldn't affect P. We have so much static information available to us; it's a shame we're not taking advantage of it.
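To make the PVP-style bounds from the discussion above concrete, a package's .cabal file might contain something like the following (package names and version numbers are hypothetical, chosen to match the example):

```cabal
-- Dependency bounds following the PVP: each upper bound excludes the
-- next major version, since a major bump may contain breaking changes.
build-depends:
  base >= 4.5 && < 4.7,
  A    >= 1.0 && < 1.1,   -- must be relaxed by hand once A-1.1 is tested
  B    >= 0.5 && < 0.6
```

When A-1.1 is released, the `< 1.1` bound is what forces a new upload of this package (and of B) before the solver will accept the new version.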
Consider the following system: When I compile my code, the compiler knows exactly which functions, types, etc. my program uses and from [...]

Implementing Fast Interpreters: Discussion & Further Reading


My last article on "Implementing Fast Interpreters" was posted to both Hacker News and the Programming subreddit. Commenters raised some valid questions and pointed out related work. This post aims to address some of the issues and to collect the related work mentioned in the HN and Reddit comments, plus some more.

Why not write a simple JIT?

The main benefit of an interpreter usually is simplicity and portability. Writing the interpreter in assembly makes it a bit harder to write and you lose portability. So, if we have to go down to the level of assembly, why not write a simple template-based code generator? Wouldn't that be faster? The answer is: it depends. As Mike Pall (author of LuaJIT) pointed out, LuaJIT v1 was based on a simple JIT and does not consistently beat the LuaJIT v2 interpreter (which is written in assembly). (Performance comparison of LuaJIT v1.1 (simple JIT) vs. the LuaJIT v2 interpreter.)

A JIT compiler (even a simple one) needs to:

  • Manage additional memory. If the code is short-lived (e.g., there is another optimisation level) then unused memory must be reclaimed and fragmentation may become an issue.
  • Mark that memory as executable (possibly requiring flushing the instruction cache or operating system calls).
  • Manage the transition between compiled code segments. E.g., we need a mapping from bytecode targets to corresponding machine code. If the target hasn't been compiled yet, this can be done on demand and the branch instruction can be updated to jump directly to the target. If the target may ever change in the future, we also need a way to locate all the branches to it in order to invalidate them if necessary.
  • If the interpreter has multiple execution modes (e.g., a trace-based JIT has a special recording mode which is only used for a short time), then code needs to be generated for each execution mode.

All this is quite a bit more complicated than a pure interpreter, even one written in assembler.
Whether it's worth taking on this complexity depends on how high the interpreter overhead actually is. If the bytecode format is under our control then we can optimise it for our interpreter, as done in LuaJIT 2. This isn't always the case, though. The DynamoRIO system is a runtime instrumentation framework for x86. The x86 instruction format is very complex and thus expensive to decode. DynamoRIO therefore does not use an interpreter and instead decodes the host program into basic blocks. (No code generation is really needed, because DynamoRIO emulates x86 on x86, but you could imagine using the same technique to emulate x86 on, say, ARM.) The challenges (and solutions) for this approach are discussed in Derek Bruening's excellent PhD thesis on DynamoRIO. Bebenita et al. describe an interpreter-less JIT-compiler design for a Java VM in their paper "Trace-Based Compilation in Execution Environments without Interpreters" (PDF).

DSLs for generating interpreters

The PyPy project generates a C interpreter and a trace-based JIT from a description in RPython ("Restricted Python"), a language with Python-like syntax but C-with-type-inference-like semantics. Laurence Tratt described his experiences using PyPy in his article "Fast Enough VMs in Fast Enough Time". PyPy's toolchain makes many design decisions for you; its usefulness therefore depends on whether these decisions align with your goals. The PyPy Status Blog contains many examples of language VMs written in PyPy (e.g. PHP, BF, regular expressions).

A DSL aimed at developing a fast assembly interpreter can still be useful for other purposes. One problem in implementing a JIT compiler is ensuring that the semantics of the compiled code match the semantics of the interpreter (or baseline compiler). By having a slightly higher-level abstraction it could become possible to use the same specification to generate both the interpreter and parts of the compiler.
For example, in a trace-based JIT, the code for recording and interpreting looks very similar: Interpreter: Trace Recorder: mov tmp, [BASE + 8[...]

ARM's New 64 Bit Instruction Set


You may have heard that ARM, whose CPUs are extremely popular for embedded devices, is trying to move into the low-power server market. One of the current main difficulties for using ARM processors in servers is that ARM only offers a 32 bit architecture (A32). That means that a single process can address at most 4GB of memory (and some of it is reserved for the OS kernel). That isn't a problem on current embedded devices, but it can be for large multi-threaded server applications. To address this issue ARM has been working on a 64 bit instruction set (A64). To my knowledge there is no commercially available hardware that implements this instruction set yet, but ARM has already released patches to the Linux kernel to support it.

To my surprise, this new 64 bit instruction set is quite different from the existing 32 bit instruction sets. (Perhaps I shouldn't be surprised, since the two Thumb instruction sets were also quite different from the existing instruction sets.) It looks like a very clean RISC-style design. Here are my highlights:
  • All instructions are 32 bits wide (unlike the Thumb variants, but like the original A32).
  • 31 general purpose 64 bit wide registers (instead of 14 general purpose 32-bit registers in A32). The 32nd register is either hardwired to zero or the stack pointer. These registers can be accessed as 32 bit (called w0, w1, ..., w31) or 64 bit registers (called x0, x1, ..., x31).
  • Neither the stack pointer (SP) nor the program counter (PC) are general purpose registers. They are only read and modified by certain instructions.
  • In A32, most instructions could be executed conditionally. This is no longer the case.
  • Conditional instructions are not executed conditionally, but instead pick one of two inputs based on a condition. For example, the "conditional select" instruction CSEL x2, x4, x5, cond implements x2 = if cond then x4 else x5. This subsumes a conditional move: CMOV x1, x2, cond can be defined as a synonym for CSEL x1, x2, x1, cond. There are many more of these conditional instructions, but they will all modify the target register.
  • A conditional compare instruction can be used to implement C's short-circuiting semantics. In a conditional compare the condition flags are only updated if the previous condition was true.
  • There is now an integer division instruction. However, it does not generate an exception/trap upon division by zero. Instead (x/0) = 0. That may seem odd, but I think it's a good idea. A conditional test before a division instruction is likely to be cheaper than a kernel trap.
  • The virtual address space is 49 bits or 512TB. Unlike x86-64/AMD64, where the top 16 bits must all be zero or all one, the highest 8 bits may optionally be usable as a tag. This is configured using a system register. I'm not sure if that will require kernel support. It would certainly come in handy for implementing many higher-level programming languages.
  • A number of instructions for PC-relative addressing. This is useful for position independent code.
  • SIMD instruction support is now guaranteed. ARMv8 also adds support for crypto instructions. Both are also available in A32.
All the existing ARM instruction sets (except perhaps Jazelle) will still be supported. I don't think you can dynamically switch between different instruction sets as was the case for A32/Thumb, though.
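The conditional-select semantics described in the list above can be modelled as a pure function. This is a Haskell sketch of the semantics only, not ARM syntax; the names csel and cmov are mine:

```haskell
-- Model of A64 conditional select: the instruction always writes its
-- destination register, choosing between two source values.
csel :: Bool -> a -> a -> a
csel cond a b = if cond then a else b

-- Conditional move as a synonym: CMOV x1, x2, cond = CSEL x1, x2, x1, cond
-- (keeps the old destination value when the condition is false).
cmov :: Bool -> a -> a -> a
cmov cond new old = csel cond new old

main :: IO ()
main = print (csel True (4 :: Int) 5, cmov False (2 :: Int) 1)  -- prints (4,1)
```

Unlike A32's predicated execution, the destination is unconditionally written, which is friendlier to out-of-order pipelines.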

Further reading:

Implementing Fast Interpreters


Many modern virtual machines include either a fast interpreter or a fast baseline compiler. A baseline compiler needs extra memory and is likely slower for code that is only executed once. An interpreter avoids this memory overhead and is very flexible. For example, it can quickly switch between execution modes (profiling, tracing, single-stepping, etc.). So what, then, is the state of the art of building fast interpreters?

Modern C/C++ compilers are very good at optimising "normal" code, but they are not very good at optimising large switch tables or direct-threaded interpreters. The latter also needs the "Labels as Values" GNU extension. In a direct-threaded interpreter, each instruction takes the following form:

op_ADD_INT:
    int op1 = READ_OP1;
    int op2 = READ_OP2;
    int result = op1 + op2;   // The actual implementation.
    WRITE_RESULT(result);
    unsigned int opcode = pc->opcode;
    ++pc;
    goto *dispatch[opcode];   // Jump to code for next instruction.

The variable dispatch (the dispatch table) is an array of labels, one for each opcode. The last three lines transfer control to the implementation of the next bytecode instruction. If we want to change the implementation of an instruction dynamically, we can just update the pointer in the dispatch table, or change the dispatch table pointer to point to a different table altogether.

Note that everything except for the line "result = op1 + op2" is interpreter overhead and its cost should be minimised. The program counter (pc) and the dispatch table are needed by every instruction, so they should be kept in registers throughout. Similarly, we want a variable that points to the stack and possibly a variable that points to a list of literals — two more registers. We also need further registers for holding the operands. These registers had better be the same for each instruction, since otherwise some instructions need to move things around, leading to memory accesses.
On x86 we only have 7 general-purpose registers, which makes it very hard for the compiler to optimally use them for all instructions.

Interpreters Written in Assembly

For this reason, most fast interpreters are hand-optimised assembly routines with a specially designed calling convention that keeps as much state as possible in registers. A particularly nice and fast example is the LuaJIT 2 interpreter. Both the standard Lua interpreter and the LuaJIT 2 interpreter use a register-based bytecode format. Compared to the more well-known stack-based bytecodes, a register-based bytecode has larger instructions, but requires fewer instructions overall. For example, the expression "s = x + (y * z)" in a stack-based bytecode would translate to something like:

PUSHVAR 0  -- push variable "x" onto operand stack
PUSHVAR 1  -- push variable "y"
PUSHVAR 2  -- push variable "z"
MUL        -- top of operand stack is (y * z)
ADD        -- top of operand stack is x + (y * z)
STOREVAR 3 -- write result into variable "s"

With a few optimisations, this can be encoded in only 6 bytes. In a register-based bytecode this would translate to something like this:

MUL 4, 1, 2  -- tmp = y * z
ADD 3, 0, 4  -- s = x + tmp

Each instruction takes variable indices, the (virtual) registers, and reads from and writes directly to the variables (stored on the stack). In LuaJIT 2, each instruction requires 4 bytes, so the overall bytecode size is a bit larger. However, executing these instructions incurs the interpreter overhead only twice instead of 6 times in the stack-based bytecode. It may also avoid memory traffic by avoiding the separate operand stack. The LuaJIT 2 interpreter also uses a very simple bytecode format that avoids further bit shuffling (which would increase the cost of decoding). Instructions can only have one of two forms:

OP A, B, C
OP A, D

Here OP, A, B, and C are 8-bit fields. OP is the opcode and A, B, C are usually register IDs or sometimes literals. D is a 16-bit field and overlaps with B and C[...]
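To make the register-based example above concrete, here is a tiny Haskell sketch of evaluating such bytecode. The datatypes and register layout are invented for illustration and have nothing to do with LuaJIT's actual encoding:

```haskell
import Data.IntMap (IntMap)
import qualified Data.IntMap as IM

-- A made-up register-based bytecode: each operand is a virtual register index.
data Instr
  = MUL Int Int Int   -- MUL dst, src1, src2
  | ADD Int Int Int   -- ADD dst, src1, src2

type Regs = IntMap Int

-- Execute one instruction: read the source registers, write the destination.
step :: Regs -> Instr -> Regs
step rs (MUL d a b) = IM.insert d (rs IM.! a * rs IM.! b) rs
step rs (ADD d a b) = IM.insert d (rs IM.! a + rs IM.! b) rs

run :: Regs -> [Instr] -> Regs
run = foldl step

-- s = x + (y * z) with x, y, z in registers 0, 1, 2; result in register 3.
prog :: [Instr]
prog = [MUL 4 1 2, ADD 3 0 4]

main :: IO ()
main = print (run (IM.fromList [(0,2), (1,3), (2,4)]) prog IM.! 3)  -- 2 + 3*4 = 14
```

Note how the whole expression needs only two dispatches, matching the article's point about interpreter overhead being incurred per instruction.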

Haskell Tip: Redirect stdout in Haskell


Have you ever wanted to make sure that a call to a library cannot print anything to stdout? The following does this, except that it redirects stdout globally and not just across a single library call. Restricting it to a single call should be doable, but I haven't needed it yet.
import GHC.IO.Handle   -- yes, it's GHC-specific
import System.IO

main = do
  stdout_excl <- hDuplicate stdout
  hDuplicateTo stderr stdout  -- redirect stdout to stderr
  putStrLn "Hello stderr" -- will print to stderr
  hPutStrLn stdout_excl "Hello stdout" -- prints to stdout
The above code first creates a new handle to the standard output resource using hDuplicate. The call to hDuplicateTo redirects any output to the Haskell handle stdout to go to the handle stderr. The Haskell handle stdout_excl is now our only handle to the standard output resource.
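To restrict the redirection to a single library call, one could restore the original handle afterwards. A sketch of what that might look like (an assumption on my part, not from the original post; withStdoutToStderr is a made-up name, and it is still GHC-specific and not thread-safe):

```haskell
import Control.Exception (bracket)
import GHC.IO.Handle   -- hDuplicate, hDuplicateTo are GHC-specific
import System.IO

-- Run an action with stdout redirected to stderr, restoring the
-- original stdout afterwards (even if the action throws).
withStdoutToStderr :: IO a -> IO a
withStdoutToStderr action =
  bracket
    (do saved <- hDuplicate stdout     -- keep a handle to the real stdout
        hDuplicateTo stderr stdout     -- stdout now goes to stderr
        return saved)
    (\saved -> do hDuplicateTo saved stdout  -- restore the real stdout
                  hClose saved)
    (const action)

main :: IO ()
main = do
  withStdoutToStderr (putStrLn "inside: goes to stderr")
  putStrLn "outside: goes to stdout again"
```

Because hDuplicateTo mutates the global stdout handle, this is only safe if no other thread writes to stdout during the call.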

The Thing That Should Not Be (Or: How to import 18500+ patches from Darcs into Git in less than three days)


I like Darcs. Really. It is easy to learn and use, and for smallish projects I never had any real problems. Unfortunately, it still has some performance problems, and it is likely that some operations will never be fast. An extreme example of where you run into those problems is the GHC repository. It consists of over 18500 patches and spans over 12 years of history.

When I tried to build the latest version I ran into a linker error which I know I didn't get with the snapshot from one month ago. As GHC builds take quite a while, I wanted an efficient way to find which exact change introduced the problem. More precisely, I wanted git bisect. I knew that Don had converted Darcs repositories to Git in order to get ohloh statistics, but he reported that this process was rather painful. It took four weeks(!) to convert the GHC repository. So, I looked at what tools were out there, and how to improve them. I know that there is Tailor, but I looked at darcs-to-git by Steve Purcell first and found it very hackable. I didn't like that it saved the Darcs patch ID in the Git commit message, so I changed that, and I extended it to properly translate Darcs escape sequences. I also added a parameter to only pull a number of patches at a time, so that I can import a big repository in stages, and I allowed custom mappings from committer names to other committer names. I used this to map various pseudonyms to (a unique) full name and email address. (I hope no one minds being credited with his or her full name. ;) )

It worked rather well for smallish repositories (a bit less than 2000 patches), but I had serious problems getting it to work with GHC. Darcs has a bug on case-insensitive volumes (which OS X uses by default), so Steve suggested using a case-sensitive sparse image. This works, but it is probably a bit slower.
I tried running it on my FreeBSD home server, but it has only 256 MB of RAM (usually fine for a home file server), so Darcs ran out of space and eventually got killed by the OS. (Getting Darcs to compile on my server was an adventure in itself--first a few hours to update the ports tree, then one more to compile GHC 6.8, which then just failed to install...) Fortunately, my laptop has 2 GB, so it works fine there.

At startup darcs-to-git reads the full Darcs patch inventory. For a repo as big as GHC's this takes over a minute (and lots of RAM). Caching it in a file didn't seem to help much. I could have lived with that, but there was a more serious problem: the approach used by darcs-to-git (and, it seems, also by Tailor) doesn't work! darcs-to-git pulls one patch at a time by giving its ID to darcs pull --match 'hash', then git adds the changes on the Git side and git commits them with the appropriate commit message. The patches are pulled in the order in which they were applied in the source repo, so any dependencies should be fulfilled. Nevertheless, Darcs refused to apply some patches -- silently. Darcs just determined that I didn't want to pull any patches and didn't do anything. This is most likely a Darcs bug, but I heard it was only a known bug for some development version of Darcs 2 (I used Darcs 1.0.9 at that time). Anyways, that didn't work; it failed at about patch 30 of the GHC repository.

OK. So instead of pulling patches by ID we could fake user interaction. Something like this:

$ echo "yd" | darcs pull source-repo

The input corresponds to "Yes, I want to pull this patch" and "Ok, I'm done and want to pull all the selected patches". This works reliably and also has the advantage that we don't have to read the whole history up front, but can instead just retrieve the info for the last applied patch via

$ darcs changes --last 1 --xml

By now you might have guessed, though, that this still didn't work very well.
It took about 60 seconds per patch (with about 1 second of this used by Git), resulting in an estimated 13 days(!) of CPU time[...]

A short reminder


Folds and maps are Haskell's default iteration combinators. Mapping is easy enough, but folds can often be rather messy, especially if nested. For example, given a map of sets of some values, we want to write a function to swap keys and values. The function's type will be:

type SetMap k a = Map k (Set a)

invertSetMap :: (Ord a, Ord b) => SetMap a b -> SetMap b a
The resulting map should contain a key for each value of type b occurring in any set in the original map. The new values (of type Set a) are all those original keys for which the new key occurred in the value set. Intuitively, if the original map represented arrows from values of type a to values of type b, this function should reverse all arrows. We can easily implement this function using two nested folds.

invertSetMap sm =
    M.foldrWithKey
      (\k as r ->
          S.fold (\a r' -> M.insertWith S.union a (S.singleton k) r') r as)
      M.empty sm
That's not pretty at all! I had written quite a bit of this kind of code (and hated it each time), until I finally remembered a fundamental Haskell lesson. Haskell uses lists to simulate iteration and specify other kinds of control flow. In particular list comprehensions are often extremely cheap, since the compiler can automatically remove many or all intermediate lists and generate very efficient code. So let's try again.

invertSetMap sm = M.fromListWith S.union
    [ (a, S.singleton k) | (k, as) <- M.assocs sm
                         , a       <- S.toList as ]
So much more readable! A quick benchmark also shows that it's slightly faster (a few percent for a very big map). Lesson to take home: If your folds get incomprehensible consider list comprehensions.
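A quick check of the comprehension version on made-up data (repeated here in self-contained form):

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type SetMap k a = M.Map k (S.Set a)

-- The list-comprehension version from the post.
invertSetMap :: (Ord a, Ord b) => SetMap a b -> SetMap b a
invertSetMap sm = M.fromListWith S.union
    [ (a, S.singleton k) | (k, as) <- M.assocs sm
                         , a       <- S.toList as ]

main :: IO ()
main = print (invertSetMap example)
  where
    -- 1 -> {'a','b'}, 2 -> {'b'}; inverted: 'a' -> {1}, 'b' -> {1,2}
    example :: SetMap Int Char
    example = M.fromList [ (1, S.fromList "ab"), (2, S.fromList "b") ]
```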

New Haskell Tutorial


Conrad Barski recently made a new Haskell Tutorial available. A while ago I stumbled upon Conrad's excellent Lisp Tutorial, which was well received in the community and actually led to a pretty cool (Common) Lisp logo. I haven't yet read the new tutorial completely, but his tutorials are usually very well written, newbie-friendly, and based on interesting problems. So, check it out!

PS: Greetings from the second Hackathon 2007.

Network.HTTP + ByteStrings


Update: I mixed up some numbers. I wrote about 375 MB, but it was 175 MB. (No one seemed to have noticed, though. Anyways, the argument still holds.)

Haskell's Network.HTTP package isn't quite as good as it could be. Well, to be precise, it is not at all as good as it should be. In addition to API problems (for which I proposed a solution in my previous blog entry) there's also a major performance problem, due to strictness and the use of regular list-based Strings. A simple wget-style program written in Haskell, used like ./get http://localhost/file.big on a local 175 MB file, almost locked up my 1GB laptop due to constant swapping. I had to kill it, as it was using up more than 500 MB of RAM (still swapping). At this point it had run for 50 seconds and had written not a single byte to the output file. At the same time a normal wget completed after about 10 seconds. Since the file was retrieved from a local server I assume overall performance was inherently limited by disk speed (or the operating system's caching strategy).

The current implementation performed so badly for two reasons:

  • Since it uses list-based strings, each retrieved byte will take up (at least) 8 bytes in program memory (one cons cell, or tag + data + pointer to tail).
  • It implements custom, list-based buffering. The buffer size is 1000 characters/bytes, which is rather OK for line-based reading, but if the HTTP protocol requests to read a large block of data, this block will be read in 1000 byte chunks, each appended to the part that has already been read. So if we read a block of 8000 bytes, the first chunk will consequently be copied 8 times(!). Let's not think about reading a block of 175000000 bytes. Also because we already know the answer.

But let's not flame the original author(s). It's better than nothing, and it gave me and my project partner Jonas an interesting project topic.
So we decided to overcome the evil at its root and replace Strings with ByteStrings--this way we would get buffering for free. To give you a taste of what this accomplishes:

Program                       Runtime   Memory use
wget                          ~10s      ~0.5MB
./get (strict ByteStrings)    ~18s      ~175MB
./get (lazy ByteStrings)      ~11s      ~3MB

Adding strict ByteStrings was relatively straightforward. Network.HTTP already implements a simple Stream abstraction with a simple interface:

class Stream x where
  readLine   :: x -> IO (Result String)
  readBlock  :: x -> Int -> IO (Result String)
  writeBlock :: x -> String -> IO (Result ())
  close      :: x -> IO ()

Implementing this for strict ByteStrings is just a matter of calling the corresponding functions from the ByteString module. With one small annoyance: the HTTP parsing functions expect readLine to return the trailing newline, which hGetLine does not include, so we have to append it manually, which in turn is an O(n) operation. For simplicity, we also didn't convert the header parsing and writing functions to use ByteStrings, but instead inserted the appropriate calls to pack and unpack. This could become a performance bottleneck if we have many small HTTP requests. OTOH, we might soon have a Parsec version that works on ByteStrings.

As can be seen from the above benchmarks, using strict ByteStrings still forces us to completely load a packet into memory before we can start using it, which may result in unnecessarily high memory usage. The obvious solution to this problem is to use lazy ByteStrings. For lazy ByteStrings things work a bit differently. Instead of calling hGet and hGetLine inside the stream API, we call hGetContents when we open the connection. This gives us a lazy ByteString which we store in the connection object; we then use regular list functions on that string to implement the required API.

openTCPPort uri port = do { s do { bs >= \conn -> case conn of [...]
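A strict-ByteString instance of the Stream class described above might look roughly like this. This is my sketch, not the actual Network.HTTP code: Result is simplified to Either String, and the instance is given directly for Handle rather than for the package's connection type:

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO (Handle, hClose)

type Result a = Either String a  -- simplified stand-in for HTTP's Result

class Stream x where
  readLine   :: x -> IO (Result B.ByteString)
  readBlock  :: x -> Int -> IO (Result B.ByteString)
  writeBlock :: x -> B.ByteString -> IO (Result ())
  close      :: x -> IO ()

instance Stream Handle where
  -- hGetLine drops the newline, but the HTTP parser expects it, so we
  -- snoc it back on -- the O(n) copy mentioned in the text.
  readLine h     = Right . (`B.snoc` '\n') <$> B.hGetLine h
  readBlock h n  = Right <$> B.hGet h n
  writeBlock h s = Right <$> B.hPut h s
  close          = hClose
```

The lazy-ByteString variant replaces these per-call reads with a single hGetContents at connection time, as the post goes on to describe.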

Towards Better Error Handling


A while ago Eric Kidd wrote a rant about inconsistent error reporting mechanisms in Haskell. He found eight different idioms, none of which was completely satisfying. In this post I want to propose a very simple but, IMO, pretty useful and easy-to-use scheme that works with standard Haskell. The Haskell HTTP package is a good test case for such a scheme. The most immediate requirements are:

  • It should work from within any monad (not just IO).
  • It should be possible to catch and identify any kind of error that happened inside a call to a library routine.
  • It should be possible to ignore the error handling (e.g., for simple scripts that just die in case of error).

So far, the public API functions mostly have a signature like

type Result a = Either ConnError a

simpleHTTP :: Request -> IO (Result Response)

This requires C-style coding where we have to check for an error after each call. Additionally, we might still get an IOException and have to catch it somewhere else (if we want to). A simple workaround is to write a wrapper function for calls to the HTTP API. For example:

data MyErrorType = ... | HTTPErr ConnError | IOErr IOException

instance Error MyErrorType where
    noMsg    = undefined  -- who needs these anyways?
    strMsg _ = undefined

instance MonadError MyErrorType MyMonad where ...

-- | Perform the API action, transform any error into our custom
-- error type, and re-throw it in our custom error type.
ht :: IO (Result a) -> MyMonad a
ht m = do { r <- io m
          ; case r of
              Left cerr -> throwError (HTTPErr cerr)
              Right x   -> return x }

-- | Perform an action in the IO monad and re-throw possible
-- IOExceptions as our custom error type.
io :: IO a -> MyMonad a
io m = do { r <- liftIO $ (m >>= return . Right)
                            `catchError` (\e -> return (Left e))
          ; case r of
              Left e  -> throwError (IOErr e)
              Right a -> return a }

We defined a custom error type because we can have only one error type per monad.
Exceptions in the IO monad and API error messages are then caught immediately and wrapped in our custom error type. But why should every user of the library have to do that? Can't we just fix the library? Of course we can! Now that we have a specific solution we can go and generalize. Let's start by commenting out the type signatures of ht and io and asking ghci what it thinks the functions' types are:

*Main> :t io
io :: (MonadIO m, MonadError MyErrorType m) => IO a -> m a
*Main> :t ht
ht :: (MonadIO t, MonadError MyErrorType t) => IO (Either ConnError t1) -> t t1

Alright, this already looks pretty general. There's still our custom MyErrorType in the signature, though. To fix this we apply the standard trick and use a type class.

data HTTPErrorType = ConnErr ConnError | IOErr IOException

-- | An instance of this class can embed 'HTTPError's.
class HTTPError e where
  fromHTTPError :: HTTPErrorType -> e

Our wrapper functions now have a nice general type that allows us to move them into the library.

throwHTTPError = throwError . fromHTTPError

ht :: (MonadError e m, MonadIO m, HTTPError e) => IO (Result a) -> m a
ht m = do { r <- io m
          ; case r of
              Left cerr -> throwHTTPError (ConnErr cerr)
              Right a   -> return a }

-- | Perform an action in the IO monad and re-throw possible
-- IOExceptions as our custom error type.
io :: (MonadError e m, MonadIO m, HTTPError e) => IO a -> m a
io m = do r <- liftIO $ (m >>= return . Right)
                          `catchError` (\e -> return (Left e))
          case r of
            Left e  -> throwHTTPError (IOErr e)
            Right a -> r[...]
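On the caller's side, embedding the library's errors into an application-specific error type then becomes a one-line instance. A sketch under the scheme above (AppError is a hypothetical name; ConnError is reduced to a placeholder to keep the snippet self-contained):

```haskell
import Control.Exception (IOException)

-- Placeholder for the HTTP package's connection-error type.
data ConnError = ConnError String

-- The library-side definitions from the post:
data HTTPErrorType = ConnErr ConnError | IOErr IOException

class HTTPError e where
  fromHTTPError :: HTTPErrorType -> e

-- Application-side: one constructor embeds all HTTP errors, and the
-- instance tells the library how to inject them.
data AppError = AppHTTPError HTTPErrorType
              | AppOtherError String

instance HTTPError AppError where
  fromHTTPError = AppHTTPError
```

With this in place, ht and io throw into whatever MonadError the caller is running in, and simple scripts can still let the error propagate and die.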

More on Syntax


My last post appeared on reddit--thanks dons! This induced some comments I'd like to respond to.

First of all, there already is a macro system for Haskell, called (somewhat misleadingly) Template Haskell. It already provides the capability to generate arbitrary Haskell code. (More correctly, Haskell 98 code, since extensions like Generalized ADTs are not supported.) It also provides the quasi-quotation mechanism I used in my last post's mock-ups: [| ... |]. However, it has two problems. Firstly, macros are marked specially using the $(macro ...) syntax, which is not as seamless as it could be, although there might be good reasons to keep it, namely to make it easily recognizable when macros are involved. Secondly, its quasi-quotation syntax is very limited, i.e., you cannot introduce new bindings and it's hard to modularize code--but I might be wrong here, since I might not have pushed it as far as possible. The problem is: when you cannot use the quasi-quotation syntax, you're left building up the quite complex Haskell parse tree yourself. Due to limited documentation and Haskell's syntax rules, you usually write your macros by first getting the AST of some sample code, e.g. using:

-- | print a human-readable representation of a given AST
printAST :: ExpQ -> IO ()
printAST ast = runQ ast >>= putStrLn . show

pp = printAST [| let x = $([|(4+)|]) in x 5 |]

which then (reformatted) looks like this:

$ pp
LetE [ValD (VarP x_0)
           (NormalB (InfixE (Just (LitE (IntegerL 4)))
                            (VarE GHC.Num.+)
                            Nothing)) []]
     (AppE (VarE x_0) (LitE (IntegerL 5)))

Then you try to customize this for your purposes. Not pretty. My actual attempt was to take a type name as a parameter, inspect it, and then generate some boilerplate code. Well, I tried, but gave up after being unable to construct some type. Maybe I didn't try hard enough. Anyways, macro-writing shouldn't be that hard! My proposed solution certainly is just a sketch of an idea, essentially pointing to prior art.
I don't claim that this will in fact work nicely, or even that it will work at all. I am pretty confident that it might, though, and I am planning to give it a shot later on. Maybe extending Template Haskell with features similar to Scheme's syntax-case would be enough, for a start. And yet, I don't consider this a high-priority project, since a lot of uses for macros in Lisp can be solved differently in Haskell, as has also been mentioned in the comments to my previous post:

- Controlling the order of evaluation is usually not necessary in Haskell, thanks to laziness. And when we do have to control it somehow, we mostly use monads.
- The whole category of (with-something (locally bound vars) ...) can be implemented almost as conveniently using withFoo $ \locally bound vars -> do ...
- A lot of cases of special syntax can be achieved using clever operator and constructor naming. E.g., in wxHaskell: t <- timer f [interval := 20, on command := nextBalls vballs p]. Or, for an in-progress project of mine, I simulate a convenient assembler syntax by allowing a notation like res <-- a `imul` c. (However, I was not able to use the <- notation, since I have different scoping rules than Haskell and I'm not in a monad.)
- Many cases of boilerplate code generation can be covered using generic programming, e.g. using Scrap Your Boilerplate.

So where would (more usable) macros still make sense? They would allow more flexible syntax for domain-specific embedded languages (DSELs); e.g. an XML library or a parser library might profit from this right now. (Yes, I think Parsec could be more readable.) Also, DSLs like Happy would be even nicer if embedded directly into Haskell. Arrows and Monads were considered general enough concepts to introduce new[...]
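The with-style pattern mentioned above is typically built on Control.Exception.bracket in Haskell. Here is a minimal sketch; withCounter and its IORef "resource" are invented for illustration:

```haskell
import Control.Exception (bracket)
import Data.IORef

-- A with-style combinator in the spirit of Lisp's (with-something ...):
-- acquire a resource, hand it to the body, and guarantee release
-- even if the body throws. The "resource" here is just an IORef.
withCounter :: Int -> (IORef Int -> IO a) -> IO a
withCounter start body =
  bracket (newIORef start)   -- acquire
          (\_ -> return ())  -- release (a no-op for an IORef)
          body               -- run the body with the locally bound resource

main :: IO ()
main = do
  r <- withCounter 10 $ \ref -> do
         modifyIORef ref (+ 1)
         readIORef ref
  print r  -- prints 11
```

In real code the release action would close a handle or free a lock; System.IO.withFile is the canonical library example of this pattern.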



Coming to Haskell from Common Lisp, I soon started to miss macros, and tried out Template Haskell... and gave up. Haskell's syntax parse tree is way too complex to be usefully manipulated by Haskell functions. You can get used to Lisp syntax, but you always have to justify it to outsiders, and there certainly lies a lot of value in resembling mathematical syntax. I certainly agree with dons' post on this issue. If only it weren't for the disadvantages! Sure, syntactic extensions have to be done carefully, and powerful abstraction mechanisms in the language always have to come first. But having macros as an integral part of the language definition would make desugaring a library instead of a part of the compiler. This is a very powerful form of syntactic abstraction. I know that ML, being designed as a meta-language, has some ways of syntactic abstraction, though I haven't taken a closer look at these yet.

Let me, however, outline a way of achieving macros almost as powerful as general Lisp macros, one which works for languages with a less simple syntax. This system is (partly) implemented in the Dylan programming language and is called D-Expressions. It is essentially an extension of the Lisp syntax. Lisp in its most primitive form looks like this:

```
Expr ::= Atom | ( Expr* )
```

This represents a simple tree. However, why should we restrict ourselves to representing nested structure with parens? We might as well use brackets or braces or keywords. So:

```
Expr ::= Atom | ( Expr* ) | [ Expr* ] | { Expr* } | def Expr* end
```

This way we still retain the unambiguous nesting structure, but have a more varied syntax. Reader macros in Common Lisp (and probably Scheme, too) do this and already perform some sort of desugaring, by translating { ... } into (some-macro ...). This, however, has one big problem: you can only have one macro for { ... }. Common Lisp "fixes" this by using some prefix to the parens, e.g. #c(3 4) denotes a complex number.
This isn't exactly beautiful and doesn't scale well, though. In fact, we'd rather like the chance to use this more varied syntax inside our macros, and we need a way to make that scalable. Dylan solves this by allowing three kinds of macros:

- Definition-style macros have the form: define modifiers* macro-name Expr* end
- Function-style macros have the form: macro-name(Expr*)
- Statement-style macros have the form: macro-name Expr* end

Macros are defined using pattern-matching rewrite rules, e.g.:

```
define macro when
  { when ?cond:expr ?body:body end }
    => { if ?cond ?body else #f end }
end
```

Here ?cond:expr states that the pattern variable cond matches a form of the syntactic category of an expression; similarly for ?body:body. Multiple patterns are possible, and extending this to Lisp-style abstract syntax tree transformations is possible, too, as shown in this paper on D-Expressions. Adding this feature to Haskell would probably require some small modifications to the syntax of Haskell, but I think we don't have to drop whitespace sensitivity. This way, embedding DSLs in Haskell should be even more seamless, and rants about the do-notation would not be needed. Here's how the implementation of the do-notation syntactic sugar could look:

```
macro do
  { [| do { ?e:expr } |]
      => [| ?e |]
    [| do { ?p:pat <- ?e:expr; ??rest:* } |]
      => [| ?e >>= (\?p -> do { ??rest ... }) |]
    [| do { let ?p:pat = ?e:expr; ??rest:* } |]
      => [| let ?p = ?e in do { ??rest ... } |]
    ...
  }
```

where ??rest:* matches anything up to the closing "}" (causing the pattern variable rest to be bound to a sequence of tokens) and ??rest ... expands all tokens in rest. Sure, there are a lot of open issues to be solved--e.g. type specifications representing a separate sub-language of Haskell, and how to actuall[...]
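To make the rewrite rules concrete, here is the same desugaring written as an ordinary Haskell function over a toy do-notation AST. The Expr and Stmt types are invented for this sketch (a real implementation would work on something like Language.Haskell.TH's Exp):

```haskell
-- A toy expression/statement AST, invented for this sketch.
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr
  | Bind Expr Expr        -- e1 >>= e2
  | Let String Expr Expr  -- let x = e1 in e2
  deriving (Show, Eq)

data Stmt
  = BindStmt String Expr  -- x <- e
  | LetStmt  String Expr  -- let x = e
  | ExprStmt Expr         -- e
  deriving (Show, Eq)

-- Desugar a do-block, one equation per macro clause above.
desugarDo :: [Stmt] -> Expr
desugarDo [ExprStmt e]          = e                                -- do { e }
desugarDo (BindStmt x e : rest) = Bind e (Lam x (desugarDo rest))  -- x <- e; ...
desugarDo (LetStmt  x e : rest) = Let x e (desugarDo rest)         -- let x = e; ...
desugarDo _                     = error "malformed do-block"

main :: IO ()
main = print (desugarDo [ BindStmt "x" (Var "getLine")
                        , ExprStmt (App (Var "putStrLn") (Var "x")) ])
-- prints: Bind (Var "getLine") (Lam "x" (App (Var "putStrLn") (Var "x")))
```

The point of the macro system is exactly that this function would live in a library and run over concrete syntax, instead of being hard-wired into the compiler.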

Specification-based Testing


I just had the rare opportunity to hear a live talk by John Hughes, given to a small group of Chalmers students which I happened to be part of. He repeated the talk he had held one week earlier at the Erlang User Conference, about specification-based testing of Ericsson software (written in Erlang). In the following I'll try to summarize what I consider the most essential parts of his very interesting talk. For further information see John's (et al.) paper or the LtU discussion.

Users of QuickCheck know what nice advantages specification-based testing has: instead of lots and lots of test cases, one writes properties of functions, for example (in Erlang):

```erlang
prop_reverse() ->
    ?FORALL(Xs, list(int()),
    ?FORALL(Ys, list(int()),
        reverse(Xs ++ Ys) == reverse(Xs) ++ reverse(Ys))).
```

In fact, this specification is wrong, as a quick check (pun intended) shows us:

```
Failed! After 12 tests.
[-3,2]
[3,0]
Shrinking....(10 times)
[0]
[1]
```

QuickCheck provides two important features. First, it provides an interface for the controlled generation of test cases (as opposed to simply random ones, which usually are of little help). Second, it generates and runs tests of the specified properties and--this might be new to users of the GHC package Test.QuickCheck--shrinks them to smaller test cases. This is important, as it is often hard to find the actual cause of a bug, especially if test cases are very large. (John mentioned that the Mozilla developers actually offer T-shirts to people just for reducing test cases, as they found this to be an extremely time-consuming task.) In our example the error was in the specification and can be fixed by swapping Xs and Ys in the last line.

Advantages of Specification-based Testing

The most obvious advantage of using QuickCheck over usual test cases is that it allows us to dramatically reduce the number of test cases. In fact, it also does a great job of finding corner cases.
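The same property can be stated in Haskell. A stdlib-only sketch (exhaustive over a few small lists instead of QuickCheck's random generation) confirms that the flipped version is the correct one:

```haskell
-- The property from the talk, in its wrong and corrected forms.
propWrong, propRight :: [Int] -> [Int] -> Bool
propWrong xs ys = reverse (xs ++ ys) == reverse xs ++ reverse ys
propRight xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs

-- A tiny stand-in for QuickCheck's generators: a few small lists.
smallLists :: [[Int]]
smallLists = [[], [0], [1], [0, 1], [1, 0]]

main :: IO ()
main = do
  -- The corrected property holds on all pairs...
  print (and [propRight xs ys | xs <- smallLists, ys <- smallLists])  -- True
  -- ...while the wrong one fails on the shrunk counterexample.
  print (propWrong [0] [1])                                           -- False
```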
In their field study at Ericsson, while spending only six days to implement all the test properties (and a library for state-machine-based testing), they found five bugs in an already well-tested, soon-to-be-released product--one of which would have been very unlikely to be triggered by any test case--and revealed nine bugs in an older version of the project, of which only one had been documented at that time (see the paper). Neil Mitchell also recently posted about the merits of QuickCheck. John added one note, though: the bugs they found were very small (due to shrinking) and in fact would never have occurred in the real system, because the command sequences that led to the failures would never have been triggered by the real controller. However, they tested against the documented (400+ pages) specification.

This leads us to one other important advantage of specification-based testing. QuickCheck forces (or allows) us to actually state the specification as executable properties. Therefore, we might find errors not only in the program but, as seen in our example, also in the specification. This can be very useful, and he gave a nice demonstration of a buggy (or incomplete) specification for the Erlang functions register/2, unregister/1, and whereis/1, which provide a sort of name server for Erlang processes. To do this, he generated a sequence of Erlang calls and checked whether their results corresponded to a state machine model based on the specification. That is, the current state consisted of a simple map, representing the (assumed) state of the name server. Using preconditions he controlled the allowable sequences of generated commands, and he checked their outcomes using postconditions. This small experiment revealed quite a few incompletenesses in the Erlang specification, e.g., it does [...]
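That state-machine idea can be sketched in plain Haskell: a pure Data.Map plays the role of the model of the name server, and each generated command is interpreted against it. The Cmd type, the step function, and the Int pid representation are invented for illustration; this is not the actual Ericsson test code:

```haskell
import qualified Data.Map as M

-- The model: our assumption about the name server's state.
type Model = M.Map String Int   -- name -> pid (pids as plain Ints here)

data Cmd = Register String Int | Unregister String | Whereis String
  deriving Show

-- Advance the model by one command and produce the expected result,
-- which a postcondition would compare against the real system's answer.
step :: Model -> Cmd -> (Model, Maybe Int)
step m (Register name pid)
  | name `M.member` m = (m, Nothing)                -- already taken: refuse
  | otherwise         = (M.insert name pid m, Just pid)
step m (Unregister name) = (M.delete name m, Nothing)
step m (Whereis name)    = (m, M.lookup name m)

main :: IO ()
main = do
  let (m1, _) = step M.empty (Register "server" 42)
      (_,  r) = step m1 (Whereis "server")
  print r  -- prints: Just 42
```

A real model-based test would generate random Cmd sequences (respecting preconditions) and assert after each step that the live system's response matches the model's expected result.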

Being Lazy


Lazy? Me? No. Noo. Never! But here's a Reddit discussion (warning: long!) that--even though blown up by an obvious troll--has some nice statements about performance, usability, and composability implications of lazy evaluation. Quite interesting (if you filter out the noise).



Everything has a beginning and an ending. So, this is the beginning of my humble blog--let's hope it's not going to see its end soon. This blog will (presumably) be about programming languages, compilers, concurrency, maybe a bit about human-computer interaction, and about some general life-related stuff. With time I'll also try to adapt to some blog usability...