Notes on Haskell

Celebrating Ada Lovelace Day - Dr. Rebecca Mercuri

2010-03-24T22:50:36.105-05:00

In honor of Ada Lovelace Day, I want to shine a light on a very important woman in modern computing: Dr. Rebecca Mercuri.

I was lucky enough to have Dr. Mercuri (while she was working on her Ph.D.) as my professor for Programming Language Concepts, the first serious comp sci course in the curriculum when I was in school. This was the course after the intro programming courses where you learned the mechanics and basic data structures, but before the serious topics like system architecture, operating systems and artificial intelligence.

The course had a rather large footprint. It was the first time we used C as the language of instruction, and the first time we had to work on a project that spanned the entire term. The course was all about parsing (specifically, writing recursive descent parsers by hand), and slowly moved towards writing a small Scheme interpreter by the end of it all. That's a lot of material to cover in 10 weeks, especially for a sophomore.

Although the class was daunting, the goal wasn't to learn Scheme, C or basic parsing theory. Dr. Mercuri told us in as many words that this was "a class about learning how to learn a programming language". A good computer science curriculum isn't about learning today's set of fashionable skills, it's about learning the fundamentals. Languages come and go (how many have you used over the past 5 years? 10 years?), but the fundamentals don't change nearly as much.

Sometimes, I think I owe my career to Dr. Mercuri. Whenever I go back to the well and pull something out of my undergrad education, more often than not it's something I learned in that one course. Thank you, Dr. Mercuri.

While I feel privileged to have been one of her many students over the years, that's probably not why you should know about her work. Dr. Mercuri is one of the few people who have spent the last few decades speaking out about the evils of electronic voting machines. When shrouded in secrecy, these inscrutable proprietary systems subvert the very democratic processes they claim to promote. (HBO's documentary Hacking Democracy illustrates some of the problems with actual systems in use in the field.)

The issue isn't that electronic voting machines are an inherently bad idea. Electronic voting machines can simplify balloting and speed tabulation in a secure manner, if they offer voter-verified balloting (also known as the "Mercuri Method"). Current systems merely speed tabulation with no way to detect tampering or perform a recount, making their results worth less than the paper they're printed on.

The world needs more people like Rebecca Mercuri. And we all need to join her to help voter verified ballot systems sweep away egregiously bad electronic voting machines.




Closures and Scoping Rules

2008-09-16T23:04:11.971-05:00

Although the concept of a closure as a programming language construct is fairly clear, some of the details are a little fuzzy around the edges. The central ideas are that (a) closures are anonymous functions, (b) created as first-class values, that (c) capture variables in the enclosing lexical scope. Unfortunately, the notions of "variable" and "enclosing lexical scope" vary subtly across languages. These differences lead to odd edge cases that can be demonstrated with contrived examples, but can also lead to unexpected behaviors in legitimate code in some languages.

Take, for example, the much maligned for loop in C:

if (x) {
    for (int i = 0; i < 10; i++) {
        if (i == 2)
            break;
        /* ... */
    }
    if (i < 10) {
        /* ... */
    }
}

This snippet raises two important questions: what is the scope of int i? And what should it be? Early C compilers avoided the question entirely by forcing variables to be declared before the first statement. C++ and ANSI C relaxed those rules to allow variables to be declared within the body of a function.

In a language that doesn't support closures or variable capture, these questions are mostly meaningless, unless you're writing a compiler or trying to out-geek the nerd in the next cube. Now that closures are on the table, the finer points of language design have important and subtle consequences.

There are a few ways to interpret this snippet of C code. One way is to consider the for loop a lexical block, where the lifespan of int i is limited to the loop itself; the precondition, postcondition and increment clauses are merely syntactic sugar that are still tied to the loop itself.

Another way to look at the code is to interpret the int i declaration as appearing immediately before the loop, making it visible at any subsequent point in the same block (the if block in this example). This is useful if you want to inspect the value of the loop index after the loop terminates to check whether the loop exited prematurely.

A third way to look at the code is to have the declaration of int i be visible at any subsequent point in the body of the function that declares it. This would mean that although the declaration may appear within the if statement, the variable could be used after the if statement, even if the body of the if statement were never executed.

A fourth way would be to use the declaration of int i as a kind of guidance for the parser. That is to say, the parser would enforce the desired scoping rules, but the compiler would always allocate space for the variable that is visible for the entire body of the enclosing function.

There are other possible interpretations. Sometimes they are actively chosen as the desired way to handle variable scoping. Other times, they are merely a quirk of a buggy implementation. Ruby (pre-1.9) lets a variable from an inner block leak into the enclosing scope:

$ ruby --version
ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0]
$ irb
irb> i=0; [1,2,3].each {|i|}; i
=> 3

(from Sam Ruby's presentation on Ruby 1.9 from OSCon 2008)

There are many ways to get lexical scoping wrong. But how should it work? One of the best examples comes from SICP, using Scheme, of course:

(define (make-account balance)
  (define (withdraw amount)
    (if (>= balance amount)
        (begin (set! balance (- balance amount))
               balance)
        "Insufficient funds"))
  (define (deposit amount)
    (set! balance (+ balance amount))
    balance)
  (define (dispatch m)
    (cond ((eq? m 'withdraw) withdraw)
          ((eq? m 'deposit) deposit)
          (else (error "Unknown request -- MAKE-ACCOUNT" m))))
  dispatch)

Assigning a variable to the result of (make-account 100) creates a new object that contains a single variable balance with the value of 100, and two local functions which reference that same value. Passing the token 'deposit or 'withdraw to that object updates the shared referenc[...]
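
For comparison, the same shape can be sketched in Haskell, with an IORef standing in for Scheme's mutable binding. This is my own rough analogue of make-account, not code from SICP:

import Data.IORef

-- Both returned actions close over the same mutable balance.
makeAccount :: Double -> IO (Double -> IO (Either String Double),
                             Double -> IO Double)
makeAccount initial = do
    balance <- newIORef initial
    let withdraw amount = do
            b <- readIORef balance
            if b >= amount
                then do writeIORef balance (b - amount)
                        return (Right (b - amount))
                else return (Left "Insufficient funds")
        deposit amount = do
            modifyIORef balance (+ amount)
            readIORef balance
    return (withdraw, deposit)

main :: IO ()
main = do
    (withdraw, deposit) <- makeAccount 100
    print =<< withdraw 60    -- Right 40.0
    print =<< deposit 10     -- 50.0
    print =<< withdraw 100   -- Left "Insufficient funds"

Two accounts created by separate calls to makeAccount get independent balances, exactly as in the Scheme version.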



The Closures are Coming

2008-09-07T21:46:01.949-05:00

Last month, Chris Lattner at Apple announced support for closures in C for Clang, the llvm-based C compiler Apple has been working on as a potential replacement for the venerable gcc. Because Apple is working on this extension, it means that closures are coming to both C and Objective-C.

This is a sorely needed feature that C has been missing for quite a number of years. The need may not have been pressing two decades ago, in the era of smaller programs on early 32-bit CPUs with kilobytes or maybe a couple of megabytes of RAM. But those days are long behind us, and constructing programs without closures feels clunky, cumbersome, and a throwback to the era of 8-bit home "computers" running some quirky BASIC dialect.

Nearly every major general purpose programming language in use today supports closures or soon will. There are a few exceptions, like PHP, Tcl, Cobol and Fortran, that don't have them, and it's debatable whether these languages are either "major", or will get closures "soon". (Update: PHP 5.3 has/will have closures. See below.) C# has them, and has made them easier to use in the most recent version of the language by providing lambda expressions. Java is likely to get them, and the BGGA proposal looks pretty strong. And even if these languages didn't, F# (which is basically OCaml by another name) and Scala are merging the functional and object oriented disciplines on the CLR and JVM, bringing features like closures and currying to these platforms.

The only major language that doesn't have closures is C, and thanks to Chris Lattner and the llvm team, this is changing. Hopefully their changes will be accepted widely, and this feature will be added to the next C standardization.

Adding closures to C exposes a few leaky abstractions, though. Local variables in C (including parameter values) live on the stack, which doesn't do much good if the stack frame that created a closure disappears before the closure is invoked. Therefore, this change adds a few new features that allow closures to be copied to the heap (via persistent = _Block_copy(stack_closure)), reference counted while on the heap, and freed when no longer in use (via _Block_release(persistent)). There are a few other wrinkles, and the usage is slightly cumbersome, but, well, it's C, and you should be used to that by now.

However, the implications of this change are much more profound. Adding closures to C is tantamount to saying it is unacceptable for a general purpose programming language to omit support for closures. All of the old arguments for withholding them are now void: it can be done in a compiled language. It can be done without using a virtual machine. It can be done in a mainstream language. Closures are useful beyond the functional ghetto where they started (and offering just pointers to functions simply doesn't cut it). If closures are available in C, one of the oldest and most widely used languages without closures, what's your excuse for not providing them in your pet language? It may be hard to do, but the techniques are proven, well known, and open for inspection. The utility of providing programmers with the means to use idioms like map, filter and fold (or reduce) is too overwhelming to claim it's not worth the time, cost or complexity to implement.

The writing is on the wall. If you intend to develop software in the 21st century, you will be using closures, or going to great lengths to avoid them. This may sound like a bold statement, but it really isn't. Programming languages have co-evolved in fits and starts for decades. Some feature, like recursion, proves itself in academia, but is widely regarded as unnecessary in industry. Slowly, industry adapts, and that feature becomes a necessity in all future programming languages. Recursion may sound like an esoteric feature to require in a programming language. After all, it's not something that you expect to use every day. But what about lexical scoping? Believe it or not, block scoped variables weren't standard practice until the mid[...]
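
To make the map/filter/fold point concrete, here is a small Haskell illustration of the idioms closures enable; this is my own example, not code from the Clang announcement:

-- scaleBy returns a closure that captures the local variable 'factor'.
scaleBy :: Double -> (Double -> Double)
scaleBy factor = \x -> x * factor

main :: IO ()
main = do
    let double = scaleBy 2
    print (map double [1, 2, 3])                 -- [2.0,4.0,6.0]
    print (filter (> 2) (map double [1, 2, 3]))  -- [4.0,6.0]
    print (foldr (+) 0 (map double [1, 2, 3]))   -- 12.0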



Arrows in Javascript

2008-08-24T12:21:12.925-05:00

Last year, I mentioned a lesson I learned from one of my CS professors many years ago:

He began by recommending every project start by hiring a mathematician for a week or two to study the problem. If a mathematician could find an interesting property or algorithm, then it would be money well spent, and drastically reduce the time/money/effort needed to develop a system.

A few weeks ago, I came across a paper about Arrows in Javascript. It's an interesting adaptation of arrows to do asynchronous, event driven programming in Javascript. It makes an excellent example of this dictum.

Here's the problem in a nutshell. There's a very interesting language lurking underneath Javascript. It's an object oriented language with namespaces, closures, and dynamic typing. Doug Crockford has written and presented these concepts over the last few years.

As good as Javascript is on its own merits (albeit abused and misunderstood), it is generally used in a pretty horrendous environment. Browsers have compatibility bugs in their language support and the cross platform APIs they expose. Most of all, browsers execute Javascript in a single-threaded manner. This makes asynchronous programming difficult, especially when there are two interwoven threads of execution, like drag and drop and dynamic page updates.

This paper by a team at UMD starts by bringing continuation passing to Javascript, and implementing arrows on top of that. Once arrows are available, building a library for asynchronous event programming is much easier than brute-force chaining of event callbacks. For example, drag and drop programming can be specified using common method chaining:


ConstA(document.getElementById("dragtarget"))
    .next(EventA("mouseover"))
    .next(setupA)
    .next(dragDropCancelA)
    .run()

The plumbing inside the library can be a little hairy, especially if you find CPS or arrows befuddling. But the exposed API is quite elegant and appears to be easy to use. It'll be interesting to watch the Javascript library space to see when and where it pops up.
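
For readers who haven't met arrows outside this paper: in Haskell, where the abstraction originated, plain functions are arrows, and (>>>) composes them left to right -- the same idiom the .next() chaining above mimics. A minimal sketch, using only Control.Arrow (my example, not the paper's code):

import Control.Arrow

-- (>>>) composes arrows left to right; (&&&) runs two arrows on the
-- same input and pairs their results.
pipeline :: Int -> (Int, Int)
pipeline = (+ 1) >>> (* 2) >>> (id &&& (+ 10))

main :: IO ()
main = print (pipeline 1)  -- (4,14)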




Thought of the Day

2008-05-14T17:54:41.974-05:00

Also from Steve Yegge's talk:

[W]hen we talk about performance, it's all crap. The most important thing is that you have a small system. And then the performance will just fall out of it naturally.



Static vs. Dynamic Languages

2008-05-15T13:34:28.838-05:00

One permatopic across programming blogs is the good ol' static-vs-dynamic languages debate. Static languages (like C, C++, C#, C--, Java, etc.) require variables to have type declarations, primarily to generate compiled code and secondarily to catch errors or emit warnings at compile time. Dynamic languages (like Perl, Python, PHP, Ruby, Tcl, Lisp, Scheme, Smalltalk, etc.) do away with all those needless keystrokes, allow programmers to be more productive, and generally aren't as dangerous as the C programmers would have you believe. Furthermore, dynamic language programmers are quick to point out that any real program in a static language requires typecasting, which throws all of the "guarantees" out the window, and brings in all of the "dangers" of dynamic languages with none of the benefits.

However, this debate isn't limited to these opposite points of view. There's another perspective, which I heard from Mark-Jason Dominus about a decade ago: static typing in languages like C and Pascal isn't really static typing! (Correction: Mark actually said that "strong" typing in C and Pascal isn't really strong typing.)

The kind of static typing supported by C, its ancestors and its descendants, dates back to the 1950s and Fortran (or, rather, FORTRAN, since lower case letters hadn't been invented yet). Type annotations in these languages are guides to the compiler in how much storage to allocate for each variable. Eventually, compilers began to emit warnings or compiler errors when mixing variables of incompatible types (like floating point numbers and strings), because those kinds of operations don't make sense. If the irksome compiler gets in your way, you can always tell it to shut up and do the operation anyway, and maybe let the program crash at runtime. Or maybe silently corrupt data. Or maybe something completely different. None of which makes much sense, but there you go.

Modern static typing, on the other hand, uses a strong dose of the Hindley-Milner type inference algorithm to (a) determine what values a variable may contain and (b) prevent the use of incompatible values in expressions. Furthermore, Hindley-Milner prevents values being cast from one type to another on a whim (unlike C), and thus prevents odd runtime errors through abuse of the type system. This leads to a strong, unbreakable type system that is enforced by the compiler, and prevents a whole class of loopholes allowed by lesser type systems. Because types can be inferred, explicit type annotations aren't always needed, which leads to an interesting property: languages that are safer than "safe" static languages like C and Java, and programs that are free of those troublesome, noisy type declarations, just like dynamic languages.

Use of the Hindley-Milner type system comes to us through ML, and descendants like Haskell, OCaml, SML, F# and the like. It's such a good idea that it's slowly making its way into new programming languages like Perl 6, Fortress, Factor, Scala, and future versions of C++, C#, and VB.Net.

So, really, the whole static-vs-dynamic languages debate is both misleading and moot. It's misleading, because it's not about static vs. dynamic typing, it's about weak compile-time typing vs. runtime typing. It's moot, because the writing is on the wall, and weakly typed languages are on the way out, to be replaced (eventually) by strongly-typed languages of one form or another. (It also tends to be a meaningless "debate", because it's typically a shouting match between C/C++/Java/etc. acolytes who are uncomfortable with Perl/Python/Ruby/etc., and Perl/Python/Ruby/etc. acolytes who have a strong distaste for C/C++/Java/etc., and neither group has a hope of persuading the other.)

Normally, I would avoid this topic entirely, but today, I couldn't resist. Steve Yegge posted a transcription of his talk on dynamic languages, where he covers both sides of the debate, and moves beyond the two-sides-talking-past-e[...]
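
As a tiny Haskell illustration of the Hindley-Milner point (my example, not from Steve's talk): neither function below carries a type annotation, yet the compiler infers the most general types and still rejects ill-typed programs at compile time.

-- GHC infers swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

-- GHC infers average :: Fractional a => [a] -> a
average xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = do
    print (swap (1 :: Int, "one"))          -- ("one",1)
    print (average [1, 2, 3, 4 :: Double])  -- 2.5
    -- average "hello" would be rejected at compile time:
    -- no instance for (Fractional Char)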



Finger Exercises

2008-03-31T21:08:56.894-05:00

Wow. Three months, no update. Hm.

I've been working on a bunch of little projects recently that individually don't amount to much, yet. I should have something to post about that soon.

In the mean time, here is a cute little problem (via Uri to the fun with Perl list):

What is an easy way to get all possible combinations of Y/N of length 5? (i.e., "YYYYY", "YYYYN", etc.)

Coming up with a solution to this exact problem seemed to cry out for a list comprehension:

[[a,b,c,d,e] | let l = "YN",
               a <- l, b <- l, c <- l, d <- l, e <- l]

but as soon as I wrote that, I saw how unsatisfactory it was. A better solution would produce all possible combinations of Y and N for any length n > 0:

yn i = (iterate f [""]) !! i
  where f l = map ('Y':) l ++ map ('N':) l
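
For instance, the first few iterations look like this in GHCi, assuming the definition above:

*Main> yn 2
["YY","YN","NY","NN"]
*Main> yn 3
["YYY","YYN","YNY","YNN","NYY","NYN","NNY","NNN"]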

This problem itself isn't very interesting, but it's good to have little diversions like this when you want to take a break while the coffee is steeping, waiting for the Guinness to settle, or whatnot. (Something that's more than a function from the Prelude, but less involved than something from Project Euler.)

Do you have any cute little "finger exercise" problems like these? I'd love to hear about them!




Taxicab Numbers

2007-12-25T16:20:13.747-05:00

Futurama is one of my favorite TV shows. It's the quintessential show about nerds, produced by nerds, for nerds, and it's chock full of inside jokes in math, science, sci-fi and pop culture. I'm happy to note that although Futurama, the series, was cancelled after the fourth season, the show is back with a set of direct-to-DVD features. The first one, Bender's Big Score, is available now, and includes a lecture by Sarah Greenwald as a special feature on the various math jokes that have appeared in the show. Some are silly, like π-in-one oil, or Klein's Beer (in a Klein bottle, of course). Others are subtle, like the repeated appearance of taxicab numbers.

Taxicab numbers are an interesting story in and of themselves. The story comes to us from G. H. Hardy, who was visiting Srinivasa Ramanujan:

I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. "No", he replied, "it is a very interesting number; it is the smallest number expressible as the sum of two [positive] cubes in two different ways."

Many of the staff involved in creating Futurama received advanced degrees in math, computer science and physics, so they enjoy making these obscure references, because they know their fans will find them. It shouldn't be surprising that the number 1729 makes many appearances in Futurama, or that the value 87539319 appears as an actual taxicab number (87539319 is the sum of two cubes in three different ways).

Of course, the idea of taxicab numbers is interesting, the details less so. After watching the new feature, I wondered what Hardy's taxicab number was, and what two pairs of cubes add up to that value. If I were more mathematically inclined, I'd probably sit down with pencil and paper and just do the math until I found something. If I were truly lazy, I'd just Google for the answer. Instead, I sat down, fired up vim and typed this code up, pretty much as you see it here:

cube x = x * x * x

taxicab n = [(cube a + cube b, (a, b), (c, d)) | a [...]
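
The archived post cuts the code off there. A minimal brute-force sketch in the same spirit (my reconstruction, not the original code) might look like:

cube :: Int -> Int
cube x = x * x * x

-- All ways to write n as a sum of two positive cubes, with a <= b so
-- each pair is counted only once.
sumsOfTwoCubes :: Int -> [(Int, Int)]
sumsOfTwoCubes n = [ (a, b) | a <- [1 .. limit], b <- [a .. limit]
                            , cube a + cube b == n ]
  where limit = round ((fromIntegral n :: Double) ** (1 / 3))

main :: IO ()
main = print (sumsOfTwoCubes 1729)  -- [(1,12),(9,10)]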



More thoughts on rewriting software

2007-12-20T01:22:12.708-05:00

I started writing about when it is acceptable to consider rewriting a software project a few months back (part one and two). I remain convinced that it is occasionally economically sound to toss a problematic codebase and start anew. There are dozens of pesky little details to consider, like:

Is the code horrifically overcomplicated, or just aesthetically unpleasing?
How soon before the new system can replace the old?
How soon before there's a net benefit to the rewrite?
Are you rewriting to address fundamental flaws, or adopt this week's fashionable language/library/architecture/buzzword?
Do you understand the problem the code attempts to solve? Really? Really?
Since the project began, is there a new piece of off-the-shelf software that makes a lot of this pain simply go away?

Steve Yegge looks at the problem from a different angle in his essay Code's Worst Enemy. (See Reginald Braithwaite's synopsis if you want to read just the choice quotes.) Steve argues that code size is easily the riskiest element in a project:

My minority opinion is that a mountain of code is the worst thing that can befall a person, a team, a company. I believe that code weight wrecks projects and companies, that it forces rewrites after a certain size, and that smart teams will do everything in their power to keep their code base from becoming a mountain. Tools or no tools. That's what I believe.

Steve's essay focuses on behaviors endemic among many Java programmers that lead to large codebases that only get larger. As codebases grow, they become difficult to manage, both in terms of the tools we use to build software from the code (IDEs and the like), and in terms of the cognitive load needed to keep track of what 20,000 pages of code do on a 1 million line project.

The issues aren't specific to Java, or even Java programmers. Rather, they stem from an idea that the size of a code base has no impact on the overall health of a project.

The solution? Be concise. Do what it takes to keep a project small and manageable.

Of course, there are many ways to achieve this goal. One way is to rewrite large, overgrown projects and rebuild them from scratch. This could be done in the same language (building upon the skills of the developers currently on the project), or by porting the project to a new language. Another strategy involves splitting a large project into two or more isolated projects with clearly defined interfaces and responsibilities. Many more alternatives exist that don't involve ignoring the problem and letting a 1 million line project fester into a 5 million line project.

Interestingly, Jeff Atwood touched upon the issue as well, in his essay Nobody Cares What Your Code Looks Like. Jeff re-examines the great Bugzilla rewrite, when the team at Netscape decided in 1998 to convert the code from Tcl to Perl. The goal was ostensibly to encourage more contributions, since few people would be interested in learning Tcl to make contributions to Bugzilla. Nine years later, the Bugzilla team considers Perl to be its biggest liability, because Perl is stagnating, and Perl culture values the ability of every Perl programmer to speak a slightly different dialect of Perl. Newer projects written in PHP, Python, Java and Ruby outcompete Bugzilla because (in a couple of cases) they are comparable to Bugzilla today (without taking nine years of development to reach that plateau), and can deliver new features faster than the Bugzilla team can.

Nevertheless, Jeff stresses that although it may be ugly, harder to customize and extend, Bugzilla works. And it has worked for nearly a decade. And numerous large projects use it, and have no intentions to switch to the new bug tracker of the week anytime soon. So what if the code is ugly?

There's a lesson here, and I don't think it's Jeff's idea that nobody cares what your code looks like. Jeff[...]



Defanging the Multi-Core Werewolf

2007-10-04T00:06:53.688-05:00

Many developers have a nagging sense of fear about the upcoming “multi-core apocalypse”. Most of the software we write and use is written in imperative languages, which are fundamentally serial in nature. Parallelizing a serialized program is painful: it usually involves semaphores and locks, and comes with problems like deadlock, livelock, starvation and irreproducible bugs. If we’re doomed to live in a future where nearly every machine uses a multi-core architecture, then either (a) we have a lot of work ahead of us to convert our serialized programs into parallelized programs, or (b) we’re going to be wasting a lot of CPU power when our programs only use a single processor element on a 4-, 8-, or even 64-core CPU.

At least that’s the received wisdom. I don’t buy it.

Yes, software that knows how to exploit parallelism will be different. I don’t know that it will be much harder to write, given decent tools. And I certainly don’t expect it to be the norm. Here is some evidence that the multi-core monster is more of a dust bunny than a werewolf.

First, there’s an offhand remark that Rob Pike made during his Google Tech Talk on Newsqueak, a programming language he created 20 years ago at Bell Labs to explore concurrent programming. It’s based on Tony Hoare’s CSP, the same model used in Erlang. During the talk, Rob mentioned that the fundamental concurrency primitives are semaphores and locks, which are necessary when adding concurrency to an operating system, but horrible to deal with in application code. A concurrent application really needs a better set of primitives that hide these low level details. Newsqueak and Erlang improve upon these troublesome primitives by offering channels and mailboxes, which make most of the pain of concurrent programming go away.

Then, there’s Timothy Mattson of Intel, who says that there are just too many languages, libraries and environments available today for writing parallelized software. Timothy is a researcher in the field of parallel computing, and when someone with such a deep background in the field says the tools are too complicated, I’ll take his word for it. The good news is that very few programmers work on the kinds of embarrassingly parallel problems that require these tools. Working on parallel machines isn’t going to change that for us, either. In the future, shell scripts will continue to execute one statement at a time, on a single CPU, regardless of how many CPUs are available, with or without libraries like Pthreads, PVM or MPI. Parallel programmers are still in a world of hurt, but at least most of us will continue to be spared that pain.

Then there’s Kevin Farnham, who posted an idea of wrapping existing computationally intensive libraries with Intel’s Thread Building Blocks, and loading those wrapped libraries into Parrot. If all goes well and the stars are properly aligned, this would allow computationally intensive libraries to be used from languages like Perl/Python/Ruby/etc. without the need to port M libraries to N languages. (Tim O’Reilly thought it was an important enough meme that he drew attention to it on the O’Reilly Radar.)

This sounds like a hard problem, but adding Parrot to the equation feels like replacing one bad problem with five worse problems. If we’re going to live in a world where CPUs are cheap and parallelism is the norm, then we need to think in those terms. If we need Perl/Python/Ruby/etc. programs to interact with parallelized libraries written in C/C++/Fortran, where’s the harm in spawning another process? Let the two halves of the program communicate over some IPC mechanism (sockets, or perhaps HTTP + REST). That model is well known, well tested, well-unders[...]
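
Haskell ships with channel primitives in exactly the CSP spirit Rob Pike describes. A tiny sketch using Control.Concurrent.Chan (my example, not from the talk):

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan

-- A worker receives a request over one channel and answers on another;
-- no explicit semaphores or locks in sight.
main :: IO ()
main = do
    requests <- newChan
    replies  <- newChan
    _ <- forkIO $ do
        n <- readChan requests
        writeChan replies (n * n :: Int)
    writeChan requests 7
    print =<< readChan replies  -- 49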



Rewriting Software, Part 2

2007-09-30T17:36:13.545-05:00

When I wrote my previous post about rewriting software, I had a couple of simple goals in mind. First was answering Ovid’s question on when rewriting software is wise, and when it is foolish. Second was to re-examine Joel Spolsky’s dictum, never rewrite software from scratch, and show how software is so complex that no one answer fits all circumstances. Finally, I wanted to highlight that there are many contexts to examine, ranging from the “small here and short now” to the “big here and long now” (to use Brian Eno’s terminology).

When I wrote that post, I thought there would be enough material for two or three followup posts that would meander around the theme of “yes, it’s OK to rewrite software”, and move on. The more I wrote, the more I found to write about, and the longer it took to condense that material into something worth reading.

Rather than post long rambling screeds on the benefits of rewriting software, I decided to take some time to plan out a small series of articles, each limited to a few points. Unfortunately, I got distracted and I haven’t posted any material to this blog in over a month. My apologies to you, dear reader.


Of course, there’s something poetic about writing a blog post about rewriting software, and finishing about a month late because I couldn’t stop rewriting my post. There’s a lesson to learn here, also from Joel Spolsky. His essay Fire and Motion is certainly worth reading in times like these. I try to re-read it, or at least recall his lessons, whenever I get stuck in a quagmire and can’t see my way out.

In that spirit, here’s a small nugget to ponder.

If you are a writer, or have ever taken a writing class, you’ve probably come across John R. Trimble’s assertion that “all writing is rewriting.” Isn’t it funny that software is something developers write yet fear rewriting?

There’s a deep seated prejudice in this industry against taking a working piece of software and tinkering with it, except when it involves fixing a bug or adding a feature. It doesn’t matter if we’re talking about some small-scale refactoring, rewriting an entire project from scratch, or something in between. The prejudice probably comes from engineering — there’s no good reason to take a working watch or an engine apart because it looks “ugly” and you want to make it more “elegant.”

Software sits at the intersection of writing and engineering. Unlike pure writing, there are times when rewriting software is simply a bad idea. Unlike pure engineering, there are times when it is necessary and worthwhile to rewrite working code.

As Abelson and Sussman point out, “programs must be written for people to read, and only incidentally for machines to execute.” Rewriting software is necessary to keep code concise and easy to understand. Rewriting software to follow the herd or track the latest trend is pointless and a wasted effort.




Rewriting Software

2007-08-21T01:03:47.796-05:00

Ovid is wondering about rewrite projects. It’s a frequent topic in software, and there’s no one answer that fits all situations.

One of the clearest opinions is from Joel Spolsky, who says rewrites are “the single worst strategic mistake that any software company can make”. His essay is seven years old, and in it, he takes Netscape to task for open sourcing Mozilla, and immediately throwing all the code away and rewriting it from scratch. Joel was right, and for a few years Mozilla was a festering wasteland of nothingness, wrapped up in abstractions, with an unhealthy dose of gratuitous complexity sprinkled on top. But this is open source, and open source projects have a habit of over-estimating the short term and under-estimating the long term. Adam Wiggins revisited the big Mozilla rewrite issue earlier this year when he said:

[W]hat Joel called a huge mistake turned into Firefox, which is the best thing that ever happened to the web, maybe even the best thing that’s ever happened to software in general. Some “mistake.”

What’s missing from the discussion is an idea from Brian Eno about the differences between the “small here” vs. the “big here”, and the “short now” vs. the “long now”. Capsule summary: we can either live in a “small here” (a great apartment in a crappy part of town) or a “big here” (a beautiful city in a great location with perfect weather and amazing vistas), and we can live in a “short now” (my deadline is my life) or a “long now” (how does this project change the company, the industry or the planet?).

On the one hand, Joel’s logic is irrefutable. If you’re dealing with a small here and a short now, then there is no time to rewrite software. There are revenue goals to meet, and time spent redoing work is retrograde, and in nearly every case poses a risk to the bottom line because it doesn’t deliver end user value in a timely fashion.

On the other hand, Joel’s logic has got more holes in it than a fishing net. If you’re dealing with a big here and a long now, whatever work you do right now is completely inconsequential compared to where the project will be five years from today or five million users from now. Requirements change, platforms go away, and yesterday’s baggage has negative value — it leads to hard-to-diagnose bugs in obscure edge cases everyone has forgotten about. The best way to deal with this code is to rewrite, refactor or remove it.

Joel Spolsky is arguing that the Great Mozilla rewrite was a horrible decision in the short term, while Adam Wiggins is arguing that the same project was a wild success in the long term. Note that these positions do not contradict each other. Clearly, there is no one rule that fits all situations.

The key to estimating whether a rewrite project is likely to succeed is to first understand when it needs to succeed. If it will be evaluated in the short term (because the team lives in a small here and a short now), then a rewrite project is quite likely to fail horribly. On the other hand, if the rewrite will be evaluated in the long term (because the team lives in a big here and a long now), then a large rewrite project just might succeed and be a healthy move for the project.

Finally, there’s the “right here” and “right now” kind of project. Ovid talks about them briefly:

If something is a tiny project, refactoring is often trivial and if you want to do a rewrite, so be it.

In my experience, there are plenty of small projects discussed in meetings where the number of man hours discussing a change or rewrite far exceeds the amount of time to perform the work,[...]



Universal Floating Point Errors

2007-08-20T21:11:38.430-05:00

Steve Holden writes about Euler’s Identity, and how Python can’t quite calculate it correctly. Specifically,

e^(iπ) + 1 = 0

However, in Python, this isn’t quite true:

>>> import math
>>> math.e**(math.pi*1j) + 1
1.2246063538223773e-16j

Note that the imaginary component is quite small: about 1.2 × 10^-16.

Python is Steve’s tool of choice, so it’s possible to misread his post and believe that Python got the answer wrong. However, the error is fundamental. Witness:
$ ghci
Prelude> :m + Data.Complex
Prelude Data.Complex> let e = exp 1 :+ 0
Prelude Data.Complex> let ipi = 0 :+ pi
Prelude Data.Complex> e
2.718281828459045 :+ 0.0
Prelude Data.Complex> ipi
0.0 :+ 3.141592653589793
Prelude Data.Complex> e ** ipi + 1
0.0 :+ 1.2246063538223773e-16
As I said, it would be possible to misread Steve’s post as a complaint against Python. It is not. As he says:
I believe the results would be just as disappointing in any other language
And indeed they are, thanks to irrational numbers like π and the limitations of IEEE doubles.
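
The size of that residue is no accident: since e^(ix) = cos x + i·sin x, the leftover imaginary part is the sine of the Double named pi, which is tiny but nonzero precisely because that Double is not exactly π. A quick check in GHCi (the digits differ slightly from the complex ** result above, which takes a different code path):

Prelude> sin pi
1.2246467991473532e-16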

Updated: corrected uses of -iπ with the proper exponent, iπ.



Does Syntax Matter?

2007-08-15T22:09:21.178-05:00

An anonymous commenter on yesterday’s post posits that Haskell won’t become mainstream because of the familiar pair of leg irons:

I think one of the biggest problems in Haskell, aside from it not being very easy (whats a monad?), is syntax.

There are many reasons why Haskell may not become mainstream, but syntax and monads aren’t two of them. I’m a populist, so I get offended when a language designer builds something that’s explicitly designed to drag masses of dumb, lumbering programmers about half way to Lisp, Smalltalk, or some other great language. I want to use a language built by great language designers that they themselves not only want to use, but want to invite others to use.

I could be wrong here. Maybe being a ‘mainstream programming language’ means designing something down to the level of the great unwashed. I hope not. I really hope not. But it could be so. And if it is, that’s probably the one and only reason why Haskell won’t be the next big boost in programming language productivity. That would also disqualify O’Caml, Erlang and perhaps Scala as well. Time will tell.

But syntax? Sorry, not a huge issue.

Sure, C and its descendants have a stranglehold on what a programming language should look like to most programmers, but that’s the least important feature a language provides. Functional programmers, especially Lisp hackers, have been saying this for decades. Decades. A language’s syntax is a halfway point between simplifying the job of the compiler writer and simplifying the job of the programmer. No one is going back and borrowing syntax from COBOL, because it’s just too damn verbose and painful to type. C is a crisp, minimal, elegant set of constructs for ordering statements and expressions, compared to its predecessors. Twenty years ago, the clean syntax C provided made programming in all caps in Pascal, Fortran, Basic or COBOL seem quaint. Twenty years from now, programming with curly braces and semicolons could be just as quaint. Curly braces and semicolons aren't necessary; they're just a crutch for the compiler writer.

To prove that syntax doesn’t matter, I offer 3 similar looking languages: C, Java (or C#, if you prefer) and JavaScript. They all use a syntax derived from C, but they are completely separate languages. C is a straightforward procedural language, Java is a full blown object oriented language (with some annoying edge cases), and JavaScript is a dynamic, prototype-based object oriented language. The fact that a for loop looks the same in these three languages means absolutely nothing. Knowing C doesn’t help you navigate the public static final nonsense in Java, nor does it help you understand annotations, inner classes, interfaces, polymorphism, or design patterns. Going backward from Java to C doesn’t help you write const-correct code, or understand memory allocation patterns. Knowing C or Java doesn’t help much when trying to use JavaScript to its full potential. Neither language has anything resembling JavaScript’s dynamic, monkeypatch-everything-at-runtime behavior. And even if you have a deep background in class-based object oriented languages, JavaScript’s use of prototypes will strike you as something between downright lovely and outright weird.

If that doesn’t convince you, consider the fact that any programmer worthy of the title already uses multiple languages with multiple syntaxes. These typically include their language of choice, some SQL, various XML vocabularies, a few config file syntaxes, a couple of template syntaxes, some level of perl-compatible regular expressions, a she[...]



More SPJ

2007-08-15T00:30:26.173-05:00

If, like me, you couldn’t make it to OSCon this year and missed Simon Peyton Jones’ presentations, then you may want to catch up with the videos of his talks. Enjoy!



Haskell: more than just hype?

2007-08-15T00:14:05.673-05:00

Sometimes I wonder if the Haskell echo chamber is getting louder, or if programmers really are fed up with the status quo and slowly drifting towards functional programming. My hunch is that this is more than a mere echo chamber, and interest in Haskell and functional programming is for real.

I’m hardly an objective observer here, since I’m solidly within the echo chamber (note the name of this blog). Functional programming has been on my radar for at least a decade, when I first started playing with DSSSL. Coding with closures now feels more natural than building class hierarchies and reusing design patterns, regardless of the language I am currently using.

If you still aren’t convinced that Haskell is more than a shiny bauble lovingly praised by a lunatic fringe, here are some recent data points to consider, all involving Simon Peyton Jones:

Bryan O’Sullivan pointed out last week that Simon’s talks are the most popular videos from OSCon: “Simon’s Haskell language talks are the most popular of the OSCON videos, and have been viewed over 50% more times than the next ten most popular videos combined.”

In Simon’s keynote presentation at OSCon last month, he points out that threading as a concurrency model is decades old, easy enough for undergraduates to master in the small, but damn near impossible to scale up to tens, hundreds or thousands of nodes. The research on threads has stalled for decades, and it’s time to find a better concurrency model that can help us target multiprocessor and distributed systems. (Nested data parallelism and software transactional memory both help, and are both available for experimentation in Haskell today.)

An interview with Simon and Erik Meijer introduces the topic of nirvana in programming. Start by separating programming languages along two axes: pure/impure and useful/useless. The top left has impure and useful languages like C, Java and C#. The bottom right has pure and useless languages like Haskell (at least before monadic I/O). The sweet spot is the top right, where a pure, useful language would be found, if it existed. However, the C# team is borrowing heavily from Haskell to produce LINQ, and the Haskell research community is folding the idea back into Haskell in the form of comprehensive comprehensions. Both languages are slowly converging on nirvana: C# is getting more pure, and Haskell is getting more useful.

The subtext in all of these tidbits is that computing is not a static discipline. We know that hardware improves at a predictable pace, in the form of faster CPUs, cheaper memory, denser storage and wider networks. Software, or at least the way we construct software, also improves over time. One unscientific way to express this is through relative differences in productivity between programming languages. It’s reasonable to expect a project to take less time to create when using Java instead of C, and even less time when using Ruby or Python instead of Java or C#. By extension, there should be a new language that is even more productive than Python or Ruby. I happen to think that language will be Haskell, or at least very similar to Haskell. But it might also be OCaml, Erlang or Scala. In any case, Simon Peyton Jones is right — the way forward involves functional programming, whether it means choosing a language like Haskell, or integrating ideas from Haskell into your language of choice.[...]



Intro to Haskell, Parts 1, 2 and 3

2007-07-24T11:56:00.126-05:00

It's been a while since I've posted here (too long, in fact!).

Readers of this blog would probably be interested in a set of three articles I'm writing for the O'Reilly Network. The first two are available now. My goals here are to reach the core audience for ONLamp: Perl, Python, PHP and Ruby programmers. Haskell is a very big step away from dynamic languages, and it's also a big change from statically typed languages in the ALGOL family (C, and its descendants C++, Java and C#). The first article makes the case that, although Haskell is strange and different, it's worth the effort to learn -- even if you never manage to use it on a job or a side project. The second article shows how it is possible to write programs using pure functions in the absence of all the familiar tools -- classes, objects, globals and mutable variables.

I'm working on part 3, which will be an introduction to monads. This is something of a rite of passage, since writing monad intros is a cottage industry in the Haskell world -- either you've written one, or you're still a dabbler, beginner, or casually interested in Haskell (and not committed to delving into it).

Articles on ONLamp should be article length, oddly enough. That means roughly 2000 words or so. Given that tutorials like All About Monads run dozens of pages, condensing a full tutorial into a single article is hopeless.

Rather than attempt to outdo what many other fine tutorials already do very well, I'm taking a different approach. The one thing I want to focus on is the mechanics of monads. That is, how they work. Maybe that will be the missing link that the next wave of programmers need to see before they can study Haskell.

Part 3 should be published in a couple of weeks.



Solving Collatz Sequences

2007-06-06T22:36:23.308-05:00

Steve, over at cod3po37ry, is learning Haskell and working through some contest problems, starting with the Collatz sequence. Steve doesn't mention it explicitly, but this is an interesting unsolved problem in mathematics.

The problem starts out with a simple function over a single positive integer n: if n is even, halve it; if n is odd, triple it and add one. A sequence can be generated by starting with some positive integer n, generating the result of this function, and continually applying this function on the series of results until it produces a value of 1:

f 2 = [2,1]
f 3 = [3,10,5,16,8,4,2,1]
f 4 = [4,2,1]
f 5 = [5,16,8,4,2,1]
f 6 = [6,3,10,5,16,8,4,2,1]
f 7 = [7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]

It's an open question as to whether or not this function always converges on a value of 1 for any positive integer n. Wikipedia claims that this property has been proven up through at least 10 × 2^58, but it's unclear if it's true for all positive integers.

Steve's solution is a little, um, cumbersome, and he makes it clear that his goal is to learn:

This post follows the Law of Usenet: if you post a question, nobody will answer it. If you post an answer, they'll all rush to correct it. I welcome corrections.

In that spirit, here is my response. :-)

Unlike most languages I have used, Haskell has a very interesting property -- if you find yourself writing a lot of code, chances are you're doing something wrong. Generally, this is because the Prelude and the standard library have a rich set of tools for building solutions from the bottom up. Writing a lot of code usually means that you're ignoring a key abstraction somewhere. Haskell programs tend to be short because they can be short; many common problems have generalized solutions in the library already. If you find a general problem that's not solved in the standard library, implementing a general solution typically involves writing one missing function.

The first step to solving this problem involves the collatz function which implements the interesting 3n + 1 property:

collatz :: Int -> Int
collatz 1 = 1
collatz n = if (odd n) then (3 * n + 1) else n `div` 2

Next, there's the function to produce a "collatz sequence" starting with a number n and (hopefully) terminating with the number 1. Fortunately, this behavior is precisely what the iterate function in the Prelude provides:

iterate :: (a -> a) -> a -> [a]

That is, iterate takes a function and a seed value, and returns a list that contains the seed, and an infinite sequence of values produced by applying the function to the previous value. Therefore, expanding this simple collatz function to a function that produces a collatz sequence is simply:

collatzSequence :: Int -> [Int]
collatzSequence = iterate collatz

Actually, that's not quite correct, since a collatz sequence terminates with a value of 1:

collatzSequence :: Int -> [Int]
collatzSequence = terminate . iterate collatz
  where terminate (1:_)  = [1]
        terminate (x:xs) = x : terminate xs

Sure enough, this function works as expected:

*Main> collatzSequence 7
[7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]

That solves the problem of generating collatz sequences. But the contest problem was to find the longest collatz sequence within a range of numbers. Given:

1 10
100 200
201 210
900 1000

produce this output:

1 10 20
100 200 125
201 210 89
900 1000 174

Generating these results is simple enough. First, convert a collatz sequence to its length:

*Main> length $ collatzSequence 7
17

Next, convert a range of integers into their collatz lengths:

*Main> map (length . collatzSequence) [1..10]
[1,2,8,3,6,9,17,4[...]
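
The archive cuts the post off there. The remaining step is just a maximum over those lengths; a minimal sketch assuming the definitions above (my code, not the original post's):

-- Length of the longest collatz sequence for any start in [lo .. hi]
longestInRange :: Int -> Int -> Int
longestInRange lo hi = maximum (map (length . collatzSequence) [lo .. hi])

*Main> longestInRange 1 10
20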



Lessons from the White-bearded Professor

2007-05-24T16:54:29.853-05:00

The first part of my Introduction to Haskell series came out on ONLamp.com today. As always, there was a lot of material I wanted to mention that didn't fit. This first part is an exploration of why Haskell deserves more widespread attention. (The next two parts cover functions and monads.)

One topic I wanted to cover dates back to when I was an undergrad. One of my professors, Jim Maginnis, was something of a village elder in computing. He wasn't one of the pioneers in computing that won a Turing Award, or wrote prolifically about his favorite pet trends in computing, or a discoverer of anything fundamental.

This white-bearded, gentle professor was happy teaching decades' worth of students the skills they needed to go out in the world and solve real problems. He felt great joy in both the topics and the students he taught, and that feeling was heartfelt and infectious.

One of the stories Prof. Maginnis told dates back to when he consulted with big businesses as they started computerizing their operations in the 50s, 60s and 70s. He began by recommending every project start by hiring a mathematician for a week or two to study the problem. (Heck, hire two grad students -- they're cheaper!) His point was that if a mathematician could find an interesting property or algorithm, then it would be money well spent, and drastically reduce the time/money/effort needed to develop a system. If they didn't find anything, well, it was only a week or two, and mathematicians are cheap, anyway.

That always struck me as sound advice. Certainly more practical in the early days of computing when everything was new. Today, most projects feel a lot more mundane and predictable, and maybe it isn't as necessary.

But there's always room for good research and deep thought. That's the kind of thinking that gave us relational database engines, regular expressions, automatic garbage collection and compiler generators.

I keep thinking about this anecdote when I describe Haskell to someone for the first time. Instead of taking two grad students for two weeks, get a few dozen PhDs around the world focused on a problem for a couple of decades. Haskell is one of the things you might end up with. So are Unix, Erlang and Plan9, for that matter.

I wonder what Prof. Maginnis would think of the world of computing if he were alive today. I can't say for sure, but more than a few brilliant computer scientists have been working for more than a few weeks on solving some really hard problems. And I think he would be very happy with the results.



Haskell: Ready for Prime Time

2007-05-23T17:21:44.909-05:00

Bryan O’Sullivan, Don Stewart and John Goerzen announced today that they are working together on a book for O'Reilly that covers Haskell programming. Their mission is to cover the topics that are left uncovered in Haskell texts that focus on being undergraduate textbooks. The draft outline reads like a set of diffs that a Java/Perl/Python/Ruby programmer needs to understand to solve the problems they already know in a new language. And that is a very good thing indeed.

Last week, Eric also announced that he is working on a Haskell book for the Pragmatic Programmers.

It sounds like at least two publishers think there is pent-up demand for a decent, practical Haskell book that could sell more than "30 units per month". And that is certainly a good thing.

Good luck, everyone. I can't wait to read your books. :-)



Analyzing Book Sales

2007-05-17T23:32:32.909-05:00

O'Reilly does the tech community a great service by publishing their quarterly analyses of tech book sales. Yesterday, Mike Hendrickson posted part 4 of the Q1 07 analysis, which details programming languages.

The book sales statistics aren't meaningful in and of themselves, because they simultaneously overstate and understate what's actually happening within the tech world. The general opinion is that Java is important, Ruby is hot, and Fortran is deader than dead, and the book sales statistics seem to back this up. Other data, like counting job postings on the web, can lead to similar conclusions. However, this isn't the entire story -- Fortran is still an incredibly useful language in certain circles (physics and climate simulations, for example), even if the people who rely on it don't buy books or hire frequently.

Although book sales don't paint the whole story, they do show some interesting trends and point to questions that deserve further investigation.

For example, Java and C# books each sell at a combined rate of over 50,000 units per quarter, which reinforces (but does not prove) the view that most of the 'developer mass' is focused on these languages. JavaScript, Ruby, and PHP are among the languages that are now shipping over 25,000 units per language per quarter.

In Mike's 'second tier' grouping are languages like Perl and Python, which are now shipping roughly 10,000 units per quarter. That probably deserves some additional analysis; Perl is somewhat stagnant, due to Perl 5 appearing to be in maintenance mode as Perl hackers wait (and wait, and wait) for Perl 6. Also, many recent Perl titles have been hyper-focused on specific modules that aren't widely used, or are only marginally better than the online documentation. So perhaps this indicates that Perl programmers don't buy many Perl books, or perhaps it means that Perl programmers are moving to Java/C#/Ruby to pay the mortgage. Either way, this data is inconclusive.

In the penultimate tier are languages that sell less than 1,000 units per quarter, including Tcl, Lisp, Scheme, Lua and Haskell. Interestingly, 345 Haskell books sold in Q1 07, compared to 47 in Q1 06, an over 600% increase year-on-year. However, Mike also points out: "[t]he four Haskell titles are averaging fewer than 30 copies per month".

Again, this data is interesting, but inconclusive. Ruby titles experienced a meteoric rise a couple of years ago, when the Pragmatic Programmers released two bestsellers around Ruby. Before that, Ruby titles sold poorly; recently, they are selling over 25,000 units per quarter, and the trend is increasing.

Maybe the answer here is that Haskell is poised to make a similar meteoric rise, and leave the functional programming ghetto. Maybe the answer is that people interested in learning Haskell aren't buying expensive new books, but rather buying used copies, borrowing books, or learning from online references. Maybe the answer is that the four English-language titles on Haskell aren't delivering what the book-buying public needs, indicating that the time is right for a Pragmatic Haskell to repeat the success of Programming Ruby. Somehow, I think the answer lies in a combination of these three possibilities.

I know that when I was going through Haskell: The Craft of Functional Programming and The Haskell School of Expression, I read through them to figure out what the compiler was trying to do with the code I was trying to write. Once I got comfortable, I didn't refer back to them at all. Instead, I [...]



Parsing JSON

2007-05-03T22:25:17.063-05:00

Here is the JSON parser I referred to in the previous post. It's not heavily documented because it's pretty close to a 1:1 translation of the specification.

There are some rough edges in this parser. The test suite includes the number 23456789012E666, which is out of range of IEEE doubles, and is read in as Infinity. While this value can be read in as something meaningful, it cannot be emitted, since there is no provision in JSON to express values like Infinity, -Infinity or NaN. The pretty-printer does not re-encode strings containing Unicode characters or escaped characters into a JSON-readable format. Finally, malformed inputs cause exceptions (error).

module JSON where

import Data.Char
import Data.Map hiding (map)
import Text.ParserCombinators.Parsec hiding (token)

--------------------------------------------------------------------------

data JsonValue = JsonString String
               | JsonNumber Double
               | JsonObject (Map String JsonValue)
               | JsonArray [JsonValue]
               | JsonTrue
               | JsonFalse
               | JsonNull
    deriving (Show, Eq)

--------------------------------------------------------------------------
-- Convenient parse combinators

token :: Parser a -> Parser a
token p = do r <- p
             spaces
             return r

parseJSON :: String -> JsonValue
parseJSON str = case (parse jsonFile "" str) of
                  Left s  -> error (show s)
                  Right v -> v

jsonFile :: Parser JsonValue
jsonFile = do contents [...]

-- the escape character parser, of which only fragments survive:
(char 'n' >> return '\n') (char 'r' >> return '\r') (char 't' >> return '\t') do char 'u' hex [...]
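
The tail of the parser is cut off in this archive, but the JsonValue type above is complete, so here is a quick illustration of building a value directly with its constructors (my example, not from the post):

-- Assumes the JsonValue type and the Data.Map import from the module above.
example :: JsonValue
example = JsonObject (fromList
    [ ("name",    JsonString "Ada")
    , ("age",     JsonNumber 36)
    , ("admin",   JsonTrue)
    , ("aliases", JsonArray [JsonString "Countess of Lovelace"])
    ])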



Namespaces Confusion

2007-05-03T22:12:22.670-05:00

I haven't said much about my talk for FringeDC in March. I've been meaning to write this up for a while.

This is a story about a horrific blunder. Thankfully, no Bothans died bringing this information to you.

As I mentioned previously, I wrote a JSON parser to demonstrate how to write a real, live working Haskell program. I started by working off of the pseudo-BNF found on the JSON homepage. From the perspective of the JSON grammar, the constructs it deals with are objects (otherwise known as Maps, hashes, dicts or associative arrays), arrays, strings, numbers, and the three magic values true, false and null.

My first task was to create a data type that captures the values that can be expressed in this language:
data Value = String String
           | Number Double
           | Object (Map String Value)
           | Array [Value]
           | True
           | False
           | Null
    deriving (Eq)
With the datatype in place, I then started writing parsing functions to build objects, arrays, and so on. Pretty soon, I had a JSON parser that passed the validation tests.
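For flavor, here's a sketch of what one of those parsing functions might have looked like -- the name jsonNull is my guess, not the original code, and it assumes the Parsec import from the previous post and the Value type above:

jsonNull :: Parser Value
jsonNull = string "null" >> return Null

(Null parses cleanly, but the corresponding parsers for true and false would have to use the constructors True and False, which are ambiguous with the Prelude's booleans -- a hint of the trouble described below.)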

I used this piece of working Haskell code during my presentation, highlighting how all the parts worked together -- the parsers that returned specific kinds of Value types, those that returned String values, and so on.

Pretty soon I got tongue-tied, talking about how Value was a type, why String was a type in some contexts but a data constructor for Value types in others, and how Number wasn't a number, but a Value.

I'm surprised anyone managed to follow that code.

The problem, as I see it, is that I was so totally focused on the JSON domain that I didn't think about the Haskell domain. My type was called Value, because that's what it's called in the JSON grammar. It never occurred to me as I was writing the code that a type called Value is pretty silly. And, because types and functions are in separate namespaces, I never noticed that the data constructor for strings was called String.
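To make the two-namespaces point concrete, here's a minimal illustration (render is a hypothetical function, not code from the talk):

-- The two occurrences of String below live in separate namespaces:
-- the first is a brand-new data constructor, the second is the
-- standard String type (a synonym for [Char]).
data Value = String String

render :: Value -> String   -- here String names the type...
render (String s) = s       -- ...and here the data constructor.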

Thankfully, the code was in my editor, so I changed things on the fly during the presentation to make these declarations more (ahem) sane:
data JsonValue = JsonString String
               | JsonNumber Double
               | JsonObject (Map String JsonValue)
               | JsonArray [JsonValue]
               | JsonTrue
               | JsonFalse
               | JsonNull
    deriving (Show, Eq)
I think that helped to clarify that String is a pre-defined type, and JsonString is a value constructor that returns something of type JsonValue.
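A quick GHCi session makes the distinction visible (a sketch; the types follow directly from the declaration above):

ghci> :t JsonString
JsonString :: String -> JsonValue
ghci> :t JsonString "forty-two"
JsonString "forty-two" :: JsonValue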

When I gave this presentation again a couple of weeks ago, the discussion around this JSON parser was much less confusing.

Lesson learned: let the compiler and another person read your code to check that it makes sense. ;-)



Fight the next battle

2007-04-12T00:42:42.935-05:00

Tim O'Reilly posted some book sales data that indicates PHP is going mainstream:

We've noticed that one of the signs that a language is becoming mainstream (and perhaps being abandoned by the cutting edge developers) is that the For Dummies book becomes the top seller. [...] In Q1 of 2005, the Dummies book was #7; in Q1 2006, #5; in Q1 2007 it's #1. Not only is the Dummies book now #1, four of the top five titles are now introductory.

Tim raises an important point: PHP is going mainstream, and cutting edge developers are looking elsewhere.

Many of the want ads I've seen for PHP recently read like "we're looking for someone young and cheap to work like mad and crank out web pages". This is a little depressing, but not entirely surprising. The web has been the dominant development platform for over a decade; no one needs to pay top dollar just to build web apps (or worse, maintain and configure them). Why would a professional developer want to use PHP and work on a project where someone who picked up PHP For Dummies last month is nearly as qualified? The whole point of being a professional developer is accepting progressively more challenging tasks, not switching from one entry-level skill to the next.

None of this reflects on PHP itself, just PHP's status as a mainstream tool suitable for beginners. Sure, there's interesting work for PHP experts, but the bulk of PHP work doesn't need an expert of any stripe.

Tim's point is an expression of Paul Graham's Python Paradox, but in reverse:

if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job.

Because PHP skews to beginners and is becoming mainstream, professional programmers are moving away from it. The problem with the Python Paradox is that it's vague about what actually matters. By Paul's logic, if I'm starting a company, I should be using Factor or Io, because those languages are extremely esoteric. Somehow, I don't think esoteric-ness is the important quality.

People who program professionally and truly care about their craft want to solve increasingly hard problems. They want to walk away from the old drudgery and focus on what matters: solving the problem. Using the same tools over and over again means each project starts by fighting the last battle over again. Only then can the real work start: fighting the next battle, and solving the new problem.

The "fight the last battle" syndrome helps explain why languages like Python and Ruby are on the rise with professional programmers, while languages like Java and PHP are on the decline. Java uses a heavyweight, cumbersome, static object model, while Python and Ruby use a lightweight, dynamic object model, supplemented with powerful core types like lists and dictionaries. PHP makes it possible to build web applications, while Ruby and Python offer a variety of frameworks that make web applications much easier to build. Thus, Java and PHP are on the decline because they enforce older, more cumbersome styles of programming than those offered by Python and Ruby.

As Ruby and Python make dynamic languages respectable, the whole "static vs. dynamic" language debate sou[...]



Feedback on H:ACT

2007-04-11T22:06:18.332-05:00

I haven't posted anything since I gave my talk to FringeDC last month. Here's how it went.

First, I want to thank Tom Moertel for his comment on limiting the amount of material to cover. I had outlined a lot of stuff that I wanted to cover, and it didn't really fit or flow very well. So I just started cutting stuff out. That actually worked very well.

It was an informal talk, and mostly extemporaneous. The talk took about 90 minutes, with a decent amount of time for questions and discussion throughout. The slides were about the first half, and hit the highlights:

- program at the highest layer of abstraction possible
- formal mathematical models for programming aren't scary
- you understand lambda calculus, even if you didn't know you understand it
- type inferencing is good
- explicit typing is better, and not odious
- monads are wonderful

The second half was a walkthrough of two Parsec parsers I wrote -- one to parse JSON in about a page of code (including a pretty printer), and a Roman numeral parser. Both are trivial programs that do something more meaningful than the overused examples of Fibonacci and factorial. There was some interesting discussion in this part of the talk.

At the end of it all, Conrad Barski presented his demo of a project he's working on to simplify tagging. It was a pretty interesting demo of something that he whipped up in MzScheme.

About 20-25 people showed up for the talk, which is pretty impressive in DC. Many had some Haskell experience, had dabbled in it, or were in the process of learning it. (I saw a few copies of various Haskell texts sitting around the room.)

The most embarrassing part of the meeting came during the end of my slides, when I was discussing one benefit of the type system. Programming with the standard types is useful, but doesn't let the type checker know what you're trying to do. The true benefit comes from defining domain-specific types and letting the type checker determine whether your program is sane.

The example I used was the Mars Climate Orbiter, the mission that crashed into Mars because of a confusion between lbf and Newtons. Then I threw up some handwavy examples to show how operations could be defined in terms of either Newtons or pounds of force, letting the compiler prevent this kind of mixup. The examples were handwavy because I wanted to refer people to the dimensional library, which automatically converts units of force, and prevents stupid things like adding feet to seconds and coming up with nonsense. (A sketch along these lines appears at the end of this post.)

Why was this embarrassing? Because I said that this was Bjorn Bringert's library, when actually it's Bjorn Buckwalter's library. And, unbeknownst to me, Bjorn Buckwalter was sitting right in front of me during the entire presentation. What does he use this library for? Making sure things like the Mars Climate Orbiter don't happen again on his watch. :-)

Overall, the talk went rather well. I got a lot of good feedback, and I think I covered just enough material.

[...]
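For reference, here's a minimal sketch in the spirit of those handwavy slides -- hand-rolled newtypes of my own invention, not the dimensional library's actual API:

newtype Newtons     = Newtons Double     deriving (Show, Eq)
newtype PoundsForce = PoundsForce Double deriving (Show, Eq)

-- One pound of force is approximately 4.448222 newtons.
toNewtons :: PoundsForce -> Newtons
toNewtons (PoundsForce lbf) = Newtons (lbf * 4.448222)

-- Forces can only be summed once everything is in the same unit.
addForce :: Newtons -> Newtons -> Newtons
addForce (Newtons a) (Newtons b) = Newtons (a + b)

main :: IO ()
main = do
    print (addForce (Newtons 10) (toNewtons (PoundsForce 10)))
    -- addForce (Newtons 10) (PoundsForce 10)  -- rejected at compile time

The dimensional library generalizes this idea to full dimensional analysis, so that mixing feet with seconds fails to type check just as mixing Newtons with pounds of force does.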