

On TermKit


I've been administering Unix machines for many years now, and frankly, it kinda sucks. It makes me wonder, when sitting in front of a crisp, 2.3 million pixel display (i.e. a laptop), why I'm telling those pixels to draw me a computer terminal from the 80s. And yet, that's what us tech nerds do every day.

The default Unix toolchain, marked in time by the 1970 epoch, operates in a world where data is either binary or text, and text is displayed in monospace chunks. The interaction is strictly limited to a linear flow of keystrokes, always directed at only one process. And that process is capable of communicating only in short little grunts of text, perhaps coalescing into a cutesy little ASCII art imitation of things that grown-ups call "dialogs", "progress bars", "tables" and "graphs".

The Unix philosophy talks about software as a toolset, about tiny programs that can be composed seamlessly. The principles are sound, and have indeed stood the test of time. But they were implemented in a time when computing resources were orders of magnitude smaller, and computer interaction was undiscovered country.

In the meantime, we've gotten a lot better at displaying information. We've also learned a lot of lessons through the web about data interchange, network transparency, API design, and more. We know better how small tweaks in an implementation can make a world of difference in usability. And yet the world of Unix is rife with jargon, invisible processes, traps and legacy bits. Every new adept has to pass a constant trial by fire, of not destroying their system at every opportunity it gives them.

So while I agree that having a flexible toolbox is great, in my opinion, those pieces could be built a lot better. I don't want the computer equivalent of a screwdriver and a hammer, I want a tricorder and a laser saw. TermKit is my attempt at building these better tools, and it addresses a couple of major pain points.
I see TermKit as an extension of what Apple did with OS X, in particular the system tools like Disk Utility and Activity Monitor. Tech stuff doesn't have to look like it comes from the Matrix.

Rich Display

It's 2011, and monospace text just doesn't cut it anymore. In the default ANSI color palette, barely any of the possible color combinations are even readable. We can't display graphs, mathematical formulas, tables, etc. We can't use the principles of modern typography to lay out information in a readable, balanced way.

So instead, I opted for a front-end built in WebKit. Programs can display anything that a browser can, including HTML5 media. The output is built out of generic widgets (lists, tables, images, files, progress bars, etc.). The goal is to offer a rich enough set for the common data types of Unix, extensible with plug-ins. The back-end streams display output to the front-end, as a series of objects and commands. I should stress that, despite the WebKit front-end, it is not my intent to make HTML the lingua franca of Unix. The front-end is merely implemented in it, as that makes it instantly accessible to anyone with HTML/CSS knowledge.

Pipes

Unix pipes are anonymous binary streams, and each process comes with at least three: Standard In, Standard Out and Standard Error. This corresponds to the typical Input > Processing > Output model, with an additional error channel. However, in actual usage, there are two very different scenarios. One is the case of interactive usage: a human watches the program output (from Std Out) on a display, and types keystrokes to interact with it (into Std In). The other is the data processing job: a program accepts a data stream in a particular format on Std In, and immediately outputs a related data stream on Std Out. These two can be mixed, in that a chain of piped commands can have a human at either end, though usually this implies non-interactive operation.
These two cases are shoehorned into the same pipes, but happen quite differently. Human input is spontaneous, sporadic and error prone. Data input is strictly formatted and continuous. Hu[...]
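The split between display output and data output can be sketched with a toy example. This is purely illustrative and not TermKit's actual message format: the idea is that a command emits self-describing records instead of pre-formatted text, so a front-end can render them as a rich table while another process consumes them directly, without re-parsing columns of text.

```javascript
// Hypothetical sketch (not TermKit's real protocol): a command emits
// structured records on its data channel instead of formatted text.

function listFiles(files) {
  // Each record is self-describing: a type plus named fields,
  // rather than whitespace-aligned columns.
  return files.map(f => ({
    type: 'file',
    name: f.name,
    size: f.size,
  }));
}

// A downstream consumer operates on the structure directly.
function totalSize(records) {
  return records
    .filter(r => r.type === 'file')
    .reduce((sum, r) => sum + r.size, 0);
}

const records = listFiles([
  { name: 'a.txt', size: 120 },
  { name: 'b.png', size: 2048 },
]);
```

The point is that formatting becomes the front-end's job; the pipe itself carries structure that both humans and programs can use.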

My JS1K Demo - The Making Of


If you haven't seen it yet, check out the JS1K demo contest. The goal is to do something neat in 1 kilobyte of JavaScript code. I couldn't resist making one myself, so I pulled out my bag of tricks from my Winamp music visualization days and started coding. I'm really happy with how it turned out. And no, it won't work in Internet Explorer 8 or less.

Edit: OH SNAP! I just rewrote the demo to include volumetric light beams and still fit in 1K.

Now, whenever size is an issue, the best way to make a small program is to generate all data on the fly, i.e. procedurally. This saves valuable storage space. While this might seem like a black art, often it just comes down to clever use of (high school) math. And as is often the case, the best tricks are also the simplest, as they use the least amount of code. To illustrate this, I'm going to break down my demo and show you all the major pieces and shortcuts used. Unlike the actual 1K demo, the code snippets here will feature legible spacing and descriptive variable names.

Initialization

JS1K's rules give you a Canvas tag to work with, so the first piece of code initializes it and makes it fill the window. From then on, it just renders frames of the demo. There are four major parts to this:

  • Animating the wires
  • Rotating and projecting the wires into the camera view
  • Coloring the wires
  • Animating the camera

All of this is done 30 times per second, using a normal setInterval timer:

    setInterval(function () { ... }, 33);

Drawing Wires

The most obvious trick is that everything in the demo is drawn using only a single primitive: a line segment of varying color and stroke width.
This allows the whole drawing process to be streamlined into two tight, nested loops. Each inner iteration draws a new line segment from where the previous one ended, while the outer iteration loops over the different wires. The lines are blended additively, using the built-in 'lighter' mode, which means they can be drawn in any order. This avoids having to manually sort them back-to-front.

To simplify the perspective transformations, I use a coordinate system that places the point (0, 0) in the center of the canvas and ranges from -1 to 1 in both coordinates. This is a compact and convenient way of dealing with varying window sizes, without using up a lot of code:

    with (graphics) {
      ratio = width / height;
      globalCompositeOperation = 'lighter';
      scale(width / 2 / ratio, height / 2);
      translate(ratio, 1);
      lineWidthFactor = 45 / height;
      ...

I add a correction ratio for non-square windows and calculate a reference line width lineWidthFactor for later. Here, I'm using the with construct to save valuable code space, though its use is generally discouraged.

Then there's the two nested for loops: one iterating over the wires, and one iterating over the individual points along each wire. In pseudo-code they look like:

    For (12 wires => wireIndex) {
      Begin new wire
      For (45 points along each wire => pointIndex) {
        Calculate path of point on a sphere: (x,y,z)
        Extrude outwards in swooshes: (x,y,z)
        Translate and Rotate into camera view: (x,y,z)
        Project to 2D: (x,y)
        Calculate color, width and luminance of this line: (r,g,b) (w,l)
        If (this point is in front of the camera) {
          If (the last point was visible) {
            Draw line segment from last point to (x,y)
          }
        }
        else {
          Mark this point as invisible
        }
        Mark beginning of n[...]
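As a rough sketch of the per-point math in the pseudo-code above (my own readable reconstruction, not the actual 1K source), here is how a point on a sphere might be computed and then perspective-projected, with points behind the camera flagged invisible so their segments can be skipped:

```javascript
// Sketch (assumed reconstruction): spherical coordinates -> 3D point
// on a unit sphere, as in "Calculate path of point on a sphere".
function spherePoint(theta, phi) {
  return {
    x: Math.sin(phi) * Math.cos(theta),
    y: Math.cos(phi),
    z: Math.sin(phi) * Math.sin(theta),
  };
}

// "Project to 2D": a simple perspective divide by distance from the
// camera. Points behind the camera are marked invisible.
function project(p, cameraZ) {
  const depth = p.z + cameraZ;
  return depth > 0
    ? { x: p.x / depth, y: p.y / depth, visible: true }
    : { visible: false };
}
```

In the normalized -1..1 coordinate system set up earlier, the projected (x, y) can be handed straight to lineTo without further scaling.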

Making Worlds 4 - The Devil's in the Details


Last time I'd reached a pretty neat milestone: being able to render a somewhat realistic rocky surface from space. The next step is to add more detail, so it still looks good up close. Adding detail is, at its core, quite straightforward: I need to increase the resolution of the surface textures, and further subdivide the geometry. Unfortunately I can't just crank both up, because the resulting data is too big to fit in graphics memory. Getting around this will require several changes.

Strategy

Until now, the level-of-detail selection code has only been there to decide which portions of the planet should be drawn on screen. But the geometry and textures to choose from are all prepared up front, at various scales, before the first frame is started. The surface is generated as one high-res planet-wide map, using typical cube map rendering. This map is then divided into a quad-tree structure of surface tiles, which allows me to adaptively draw the surface at several pre-defined levels of detail, in chunks of various sizes. (source)

This strategy won't suffice, because each new level of detail doubles the work up-front, resulting in exponentially increasing time and memory cost. Instead, I need to write an adaptive system to generate and represent the surface on the fly. This process is driven by the level-of-detail algorithm deciding if it needs more detail in a certain area. Unlike before, it will no longer be able to make snap decisions and instant transitions between pre-loaded data: it will need to wait several frames before higher detail data is available. Uncontrolled growth of increasingly detailed tiles is not acceptable either: I only wish to maintain tiles useful for rendering views from the current camera position. So if a specific detailed portion of the planet is no longer being used (because the camera has moved away from it), it will be discarded to make room for other data.
Generating Individual Tiles

The first step is to be able to generate small portions of the surface on demand. Thankfully, I don't need to change all that much. Until now, I've been generating the cube map one cube face at a time, using a virtual camera at the middle of the cube. To generate only a portion of the surface, I have to narrow the virtual camera's viewing cone and skew it towards a specific point.

This is easy using a mathematical trick called homogeneous coordinates, which are commonly used in 3D engines. This turns 2D and 3D vectors into 3D and 4D vectors respectively. Through this dimensional redundancy, we can then represent most geometrical transforms as a 4x4 matrix multiplication. This covers all transforms that translate, scale, rotate, shear and project, in any combination. The right sequence (i.e. multiplication) of transforms will map regular 3D space onto the skewed camera viewing cone. Given the usual centered-axis projection matrix, the off-axis projection matrix is found by multiplying with a scale and translate matrix in so-called "screen space", i.e. at the very end. The thing with homogeneous coordinates is that it seems like absolute crazy talk until you get it. I can only recommend you read a good introduction to the concept.

With this in place, I can generate a zoomed height map tile anywhere on the surface. As long as the underlying brushes are detailed enough, I get arbitrarily detailed height textures for the surface. The normal map requires a bit more work, however.

Normals and Edges

As I described in my last entry, normals are generated by comparing neighbouring samples in the height map. At the edges of the height map texture, there are no neighbouring samples to use. This wasn't an issue before, because the height map was a seamless planet-wide cube map, and samples were fetched automatically from adjacent cube faces.
In an adaptive system however, the map resolution varies across the surface, and there's no guarantee that those neighbouring tiles will be available at the desired resolution. [...]
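The homogeneous-coordinate trick described above can be illustrated with a minimal sketch (my own construction, not the engine's code): a 3D point becomes a 4-vector (x, y, z, 1), and a single 4x4 matrix applies the transform in one multiplication. The scale-plus-translate matrix below is the kind of "screen space" factor used to skew a centered projection towards one tile:

```javascript
// Apply a 4x4 matrix (row-major, flat array of 16) to a homogeneous
// 4-vector [x, y, z, w].
function apply(m, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      out[row] += m[row * 4 + col] * v[col];
    }
  }
  return out;
}

// A screen-space scale + translate matrix: multiplied onto the end of
// a centered projection, it zooms the view onto an off-axis region.
function scaleTranslate(sx, sy, tx, ty) {
  return [
    sx, 0,  0, tx,
    0,  sy, 0, ty,
    0,  0,  1, 0,
    0,  0,  0, 1,
  ];
}
```

For example, scaling by 2 and translating by -1 maps the sub-region centered on (0.5, 0.5) onto the origin, which is exactly the "narrow and skew the viewing cone" operation.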

Making Worlds 3 - That's no Moon...


It's been over two months since the last installment in this series. Oops. Unfortunately, while trying to get to the next stage of this project, I ran into some walls. My main problem is that I'm not just creating worlds, but also learning to work with the Ogre engine and modern graphics hardware in particular. This presents some interesting challenges: between my own code and the pixels on the screen, there are no less than three levels of indirection. First, there's Ogre, a complex piece of C++ code that provides me with high-level graphics tools (i.e. objects in space). Ogre talks to OpenGL, which abstracts away low-level graphics operations (i.e. the commands necessary to draw a single frame). The OpenGL calls are handed off to the graphics driver, which translates them into operations on the actual hardware (processing vertices and pixels in GPU memory).

Given this long dependency chain, it's no surprise that when something goes wrong, it can be hard to pinpoint exactly where the problem lies. In my case, an oversight and misunderstanding of an Ogre feature led to several days of wasted time and a lot of frustration that made me put the project aside for a while. With that said, back to the planets...

Normal mapping

Last time, I ended with a bumpy surface, carved by applying brushes to the surface. The geometry was there, but the surface was still just solid white. To make it more visually interesting, I'm going to apply light shading. The most basic information you need for shading a surface is the surface normal. This is the vector that points straight away from the surface at a particular point. For flat surfaces, the normal is the same everywhere. For curved surfaces, the normal varies continuously across the surface. Typical materials reflect the most light when the surface normal points straight at the light source.
By comparing the surface normal with the direction of incoming light (using the vector dot product), you can get a good measure of how bright the surface should be under illumination:

Lighting a surface using its normals.

To use normals for lighting, I have two options. The first is to do this on a geometry basis, assigning a normal to every triangle in the planet mesh. This is straightforward, but ties the quality of the shading to the level of detail in the geometry. A second, better way is to use a normal map. You stretch an image over the surface, as you would for applying textures, but instead of color, each pixel in the image represents a normal vector in 3D. Each pixel's channels (red, green, blue) are used to describe the vector's X, Y and Z values. When lighting the surface, the normal for a particular point is found by looking it up in the normal map. The benefit of this approach is that you can stretch a high resolution normal map over low resolution geometry, often with almost no visual difference:

Lighting a low-resolution surface using high-resolution normals.

Here's the technique applied to a real model. (Source - Creative Commons Share-alike Attribution)

Normal mapping helps keep performance up and memory usage down.

Finding Normals

So how do you generate such a normal map, or even a single normal at a single point? There are many ways, but the basic principle is usually the same. First you calculate two different vectors which are tangent to the surface at the point in question. Then you use the cross product to find a vector perpendicular to the two. This third vector is unique and will be the surface normal. For triangles, you can pick any two triangle edges as vectors. In my case, the surface is described by a heightmap on a sphere, which makes things a bit trickier and requires some math. I asked my friend Djun Kim, Ph.D. and teacher of mathematics at UBC, for help and he recommended Calculus on Manifolds by Michael Spivak.
This deceptively small and thin book covers all the basics of calculus in a dense and compact way, and quickly became my new favo[...]
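The tangent-and-cross-product recipe is easy to show for the simple flat-heightmap case. This sketch of the general principle (my own, and without the spherical complications discussed above) builds two tangents from neighbouring height samples and crosses them:

```javascript
// Cross product of two 3-vectors: yields a vector perpendicular to both.
function cross(a, b) {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

// Estimate the unit normal at heightmap cell (i, j), where `height` is a
// 2D array of samples spaced `step` apart. Central differences give two
// tangent vectors; their cross product is the surface normal.
function normalAt(height, i, j, step) {
  const tx = [2 * step, 0, height[i + 1][j] - height[i - 1][j]];
  const ty = [0, 2 * step, height[i][j + 1] - height[i][j - 1]];
  const n = cross(tx, ty);
  const len = Math.hypot(n[0], n[1], n[2]);
  return n.map(c => c / len); // normalize to unit length for lighting
}
```

On a perfectly flat map this yields (0, 0, 1), i.e. straight up, as expected; bumps tilt the normal towards their downhill side.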

Making Worlds 2 - Scaling Heights


Last time, I had a working, smooth sphere mesh. The next step is to create terrain.

Scale

Though my goal is to render at a huge range of scales, I'm going to focus on views from space first. That strongly limits how much detail I need to store and render. Aside from being a good initial sandbox in terms of content generation, it also means I can comfortably keep using my current code, which doesn't do any sophisticated memory or resource management yet. I'd much rather get something interesting up first than work on invisible infrastructure.

That said, this is not necessarily a limitation. The interesting thing about procedural content is that every generator you build can be combined with many others, including a copy of itself. In the case of terrain, there are definite fractal properties, like self-similarity at different levels of scale. This means that once I've generated the lowest resolution terrain, I can generate smaller scale variations and combine them with the larger levels for more detail. This can be repeated indefinitely and is only limited by the amount of memory available. Perlin noise is a celebrated classic procedural algorithm, often used as a fractal generator.

Height

To build terrain, I need to create heightmaps for all 6 cube faces. Shamelessly stealing more ideas from Spore, I'm doing this on the GPU instead of the CPU, for speed. The GPU normally processes colored pixels, but there's no reason why you can't bind a heightmap's contents as a grayscale (one channel) image and 'draw' into it. As long as I build my terrain using simple, repeated drawing operations, this will run incredibly fast. In this case, I'm stamping various brushes onto the sphere's surface to create bumps and pits. Each brush is a regular PNG image which is projected onto the surface around a particular point. The luminance of the brush's pixels determines whether to raise or lower terrain, and by how much.

Three example brushes from Spore.
(source)

However, while the brushes need to appear seamless on the final sphere, the drawing area consists only of the straight, square set of cube map faces. It might seem tricky to make this work so that the terrain appears undistorted on the curved sphere grid, but in fact, this distortion is neatly compensated for by good old perspective. All I need to do is set up a virtual scene in 3D, where the brushes are actual shapes hovering around the origin and facing the center. Then, I place a camera in the middle and take a snapshot both ways along each of the main X, Y and Z directions, with a perfect 90 degree field of view. The resulting 6 images can then be tiled to form a distortion-free cube map.

Rendering two different cube map faces. The red area is the camera's viewing cone/pyramid, which extends out to infinity.

To get started I built a very simple prototype, using Ogre's scene manager facilities. I'm starting with just a simple, smooth crater/dent brush. I generate all 6 faces in sequence on the GPU, pull the images back to the CPU to create the actual mesh, and push the resulting chunks of geometry into GPU memory. This is only done once at the beginning, although the possibility is there to implement live updates as well.

Here's a demo showing a planet and the brushes that created it, hovering over the surface. I haven't implemented any shading yet, so I have to toggle back and forth to wireframe mode so you can see the dents made by the brushes.

The cubemap for this 'planet' looks like this when flattened. You can see that I haven't actually implemented real terrain carving, because brushes cause sharp edges when they overlap. The narrow dent on the left gets distorted and angular where it crosses the cube edge. This is a normal consequence of the cubemapping[...]
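The reason those 90 degree snapshots tile seamlessly is that the perspective divide is the cube-map projection. A small sketch of my own (real cube-map APIs fix per-face axis conventions that I'm glossing over here): a direction whose largest component is z lands on the ±Z face at (x/|z|, y/|z|), spanning exactly -1..1 across the face.

```javascript
// Map a 3D direction onto a cube face plus face-local (u, v) in -1..1.
// The divide by the dominant axis is exactly what a 90-degree-FOV
// perspective camera does, which is why the 6 snapshots tile seamlessly.
// (Axis conventions per face are simplified relative to real GPU specs.)
function cubeFaceUV(x, y, z) {
  const ax = Math.abs(x), ay = Math.abs(y), az = Math.abs(z);
  if (az >= ax && az >= ay) {
    return { face: z > 0 ? '+z' : '-z', u: x / az, v: y / az };
  }
  if (ax >= ay) {
    return { face: x > 0 ? '+x' : '-x', u: z / ax, v: y / ax };
  }
  return { face: y > 0 ? '+y' : '-y', u: x / ay, v: z / ay };
}
```

Every direction from the planet's center hits exactly one face, so the six renders cover the sphere with no gaps and no distortion beyond the projection itself.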

Making Worlds 1 - Of Spheres and Cubes


Let's start making some planets! Now, while this started as a random-idea kind of project, it was clear from the start that I'd actually need to do a lot of homework for this. Before I could get anywhere, I needed to define exactly what I was aiming for. The first step in this was to shop around for some inspirational art and reference pictures. While there is plenty of space art to be found online, in this case, nothing can substitute for the real thing. So I focused my search on real pictures, both of landscapes (terran or otherwise) as well as from space. I found classy shots like these. Hopefully I'll be able to render something similar in a while.

At the same time, I eagerly devoured any paper I could find on rendering techniques from the past decade, some intended for real-time rendering, some old enough to be real-time today. Out of all this, I quickly settled on my goals:

  • Represent spherical or lumpy heavenly bodies from asteroids to suns.
  • With realistic looking topography and features.
  • Viewable across all scales from surface to space.
  • At flight-simulator levels of detail.
  • Rendered with convincing atmosphere, water, clouds and haze.

For most of these points, I found one or more papers describing a useful technique I could use or adapt. At the same time, there are still plenty of unknowns I'll need to figure out along the way, not to mention significant amounts of fudging and experimentation.

The Spherical Grid

To get started I needed to build some geometry, and to do that I needed to figure out what geometry I should use. After reviewing some options, I quickly settled on a regular spherical displacement map (AKA a heightmap). That is, starting with a smooth sphere, move every surface point up or down, perpendicular to the surface, to create terrain on the surface. If these vertical displacements are very small compared to the sphere radius, this can represent the surface of a typical planet (like Earth) at the levels of detail I'm looking for.
If the displacements are of the same order as the sphere radius, you can deform it into very irregular potato-like shapes. The only thing heightmaps can't do is caves, tunnels, overhangs and other kinds of holes, which is fine for now.

The big question is: how should the spherical surface be divided up and represented? With a sphere, this is not an easy question, because there is no single obvious way to divide a spherical surface into regular sections or grids. Various techniques exist, each with their own benefits and specific use cases, and I spent quite some time looking into them. Here's a comparison between four different tessellations. (source) Note that the tessellation labeled ECP is just the regular geographic latitude-longitude grid.

The main features I was looking for were speed and simplicity, so I settled on the 'quadcube'. This is where you start with a cube whose faces have been divided into regular grids, and project every surface point out from the middle to an enclosing sphere. This results in a perfectly smooth sphere, built out of 6 identical round shells with curved edges. This arrangement is better known as the 'cube map' and is often used for storing arbitrary 360 degree panorama views. Here's a cube and its spherical projection, with the projected cube edges indicated in red. Note that the resulting sphere is perfectly smooth and round, even though the grid has a bulgy appearance.

Cube maps are great, because they are very easy to calculate and do not require complicated trigonometry. In reverse, mapping arbitrary spherical points back onto the cube is even simpler, and in fact natively supported by GPUs as a texture mapping feature. This is important, because I'll be generating the surface terrain and texture dynamically and will need to index and access each surface 'pixel' efficiently. Using a cube map, I simply identify the corr[...]
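The outward projection that turns the cube grid into a sphere needs no trigonometry at all, which is the quadcube's main appeal. A minimal sketch of my own:

```javascript
// Project a point on the cube's surface outward from the centre onto the
// enclosing unit sphere: plain vector normalization, no trig required.
function cubeToSphere(x, y, z) {
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

For example, the cube corner (1, 1, 1) lands at (1/√3, 1/√3, 1/√3), and every projected point sits at distance exactly 1 from the center, which is why the result is a perfectly round sphere despite the bulgy-looking grid.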

Making Worlds: Introduction


For the past year or so I've been reacquainting myself with an old friend: C++.

More specifically, I've been exploring graphics programming again, this time with the luxurious flexibility of the modern GPU at my fingertips. To get me started, I shopped around for an open source engine to play with. After trying Irrlicht and finding its promises to be a bit lacking, Ogre turned out to be a really good choice. Though its architecture is a bit intimidating at first, it is all the more sound. More importantly, it seems to have a relatively healthy open-source community around it.

So with Ogre as my weapon of choice, I've started a new project: Making Worlds. More specifically, I want to procedurally generate a 3D planet, viewable from outer space as well as the ground (at flight-sim levels of detail), which can be rendered real-time on recent graphics hardware.

Why? Because I really like procedural content generation. It's an odd discipline where anything goes, and techniques from across mathematics, engineering and physics are applied. Then, you add a good dose of creativity and artistic sense, and perhaps mix in some real-world data too, until you find something that looks right.

Plus, far from being an exercise in pointlessness, procedural content is gaining in popularity, especially for video games.

So, in the style of Shamus Young's excellent Procedural City series, I'm going to start blogging about Making Planets. Unlike him, however, I'm not going to adhere to a strict schedule.


Here's a teaser for the first installment.

JavaScript audio synthesis with HTML 5


HTML5 gives us a couple of new toys to play with, such as the <audio> element.

Enter the JavaScript audio synth. It generates a handful of samples using very basic time-domain synthesis, wraps them up in a WAVE file header, and embeds them in <audio> tags.

My final attempt was to generate tons of periodic audio loops only a couple of ms long, and to play them back with looping turned on while altering each tag's volume in real time, hence doing a sort of additive wavetable synthesis. Unfortunately, looping is not a fully supported feature, and the only browser I found that does it (Safari) doesn't loop seamlessly at all.
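The synth approach described above can be sketched roughly like this (an assumed reconstruction, not the original code): generate 8-bit PCM samples with basic time-domain synthesis, then prepend a minimal 44-byte RIFF/WAVE header so the bytes could be fed to an <audio> element as a data: URI.

```javascript
// Generate `count` samples of a sine wave as 8-bit unsigned PCM
// (silence is 128). This is the "very basic time-domain synthesis" part.
function sineSamples(freq, sampleRate, count) {
  const samples = new Uint8Array(count);
  for (let i = 0; i < count; i++) {
    samples[i] = Math.round(
      128 + 127 * Math.sin(2 * Math.PI * freq * i / sampleRate)
    );
  }
  return samples;
}

// Wrap raw 8-bit mono samples in a minimal 44-byte RIFF/WAVE header.
function waveFile(samples, sampleRate) {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const write = (off, s) =>
    [...s].forEach((c, i) => view.setUint8(off + i, c.charCodeAt(0)));
  write(0, 'RIFF');
  view.setUint32(4, 36 + samples.length, true); // remaining chunk size
  write(8, 'WAVE');
  write(12, 'fmt ');
  view.setUint32(16, 16, true);       // fmt sub-chunk size
  view.setUint16(20, 1, true);        // format 1 = PCM
  view.setUint16(22, 1, true);        // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate, true); // byte rate (8-bit mono)
  view.setUint16(32, 1, true);        // block align
  view.setUint16(34, 8, true);        // bits per sample
  write(36, 'data');
  view.setUint32(40, samples.length, true);
  return new Uint8Array([...new Uint8Array(header), ...samples]);
}
```

In a browser, the resulting bytes would be base64-encoded into a data:audio/wav URI and assigned to an <audio> element's src.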

All in all, my first brush with the <audio> element was a bit of a disappointment.

Sadly, undeniably true


... at least the last bit:

“when the world ends, the only things left will be cockroaches, rats, Keith Richards, and mangled text that has been escaped one-too-many or one-too-few times” — Dave Walker

(found this little gem via Sam Ruby)

Cocoa, Lemons and Geeks


Greetings from Amsterdam. I'm here for the second day of CocoaDevHouse! About 20 geeks have been camping out in the Post C.S. building to gather, discuss, develop and generally have fun.

I've mainly come to work on my first Cocoa app (LemonJuice) and benefit from the expertise of people who actually know the API ;). I've gone from "typing a bunch of random crap" to "doing cool stuff with ridiculously small amounts of code". Cocoa is definitely interesting, and far more powerful than the Windows API I've used for a couple of years.

More details inside...

The application I've been coding is, essentially, a WebKit-based Terminal. I dislike the fact that a typical bash session in a GUI terminal is not a very consistent experience. For example, bash does not understand the normal home/end keys. And because you're not typing in a normal textbox, you can't select text by holding shift, and you don't get spell checking or non-destructive autocompletion. What's also important to me is that the typical shell command outputs a very ugly, very UNIXy stream of text, whereas usually that information is quite structured (a list of files, a list of key/value pairs, a timestamp, a media file, ...). Files and hyperlinks should be clickable. Tables should be sortable. Etc. etc.

So, I want to use WebKit (the engine behind Safari) to make a Terminal that outputs nicely styled HTML. A directory listing would emerge as a nicely formatted table with icons, for example. But you could just as well pipe that list into another application, without losing the structure (sort of what Microsoft's Monad is supposed to be like).

Of course, I have no clue if I can build such a thing. I'm essentially going in blind, coding until I have a good prototype ;). So far I've got "cd", "ls" and "quit" working. Still, most of the work so far has been figuring out how I can achieve things in a way that is Cocoa-ish and clean. At least I'm having fun...

Current prototype:


Handy Drupal Core Development


Some quick tips for better productivity when developing Drupal core:

  • Alias your editor to e. If you use a GUI editor, see if it comes with a command-line shortcut to use. TextMate by default has mate. Not nearly short enough ;).
  • Set up a d command to perform diffs. I use the following:

        #!/bin/bash
        cvs diff -u -N -F^f . | grep -v '^?' > "$1.patch"
        e "$1.patch"

    This opens up my editor afterwards so I can review the patch before submitting. The grep strips out unnecessary junk (lines about unknown files).
  • Set up a p command to apply patches. I use the following:

        #!/bin/bash
        wget -O - "$1" | patch -p0

    This will take a patch URL and apply it locally.

Anyone else have any more ideas?

The Cocoa Journey Begins


Life is full of nice surprises.

Last week I decided I would start learning Cocoa (for the unaware, that's one of the two APIs/Frameworks for creating applications for OS X). Partially, because I have some cool ideas I want to try out. Mostly, because I want to be able to make applications that are just as nice as all those other sweet programs I've come to depend on since I joined the cult.

So when I was doing some undirected browsing yesterday I found out that Andy 'Termie' Smith is helping organise CocoaDevHouse Amsterdam. I'm definitely going to go there. It'll be interesting to be a total newbie at a geek event for once ;).

It's only 2.5 hours by regular train, and I have a laptop and a copy of the Apple Developer Docs to keep me occupied on the way. Interesting for those of you on the other side of the pond: the typical Belgian (and generally Western-European) mentality would consider such a trip a significant adventure. That's what happens when everyone is so focused on their own little patch of land. For me, it's now just 'popping over'. I might be back later for Barcamp Amsterdam II too.

Degradable Javascript Widget Fun


At BarCamp Amsterdam, I worked with Adrian Rossouw on a UI for styling a website. The result is a pretty cool color picker like in Gimp or Photoshop, but without Flash or Java. It just uses Javascript, CSS and transparent PNGs. It degrades to regular textboxes where you type/paste an HTML color code.

A bit later, Chris Messina suggested a slider control. Not much later, it was finished. It degrades to a plain select box, which is where the slider values are taken from. Its main purpose is to be used to select between options, and not for arbitrary continuous ranges.

These will soon be coming to a Drupal site near you after some more polishing and bug testing. Whether they will be used in Drupal 4.7 remains to be seen (though I can already think of a few spots where they would be useful).

Yay for open-source developer cross pollination :).

Summer of Code - Ajax Functionality for Drupal


This last summer I was sponsored by Google as part of their Summer of Code program to work on Drupal. My goal was to introduce various AJAX functionalities to Drupal.

The official project description was:

"Drupal has recently begun to find meaningful ways to introduce AJAX functionality with the goal of improving the user experience. Work with Drupal's usability experts to identify the next steps and help implement new dynamic functions based on interaction with the XMLHttpRequest object."

I focused on the following Ajax-powered features:

  • Inline Editing of posts: Though I built a working prototype module, I decided not to develop this feature further because it is not flexible enough to work as a generic Drupal module. It would break on too many configurations and has limited usefulness anyhow.
  • Uploading of files: allows you to attach files to Drupal nodes (with upload.module) without having to reload the page.
  • Sorting tables inside a page: this changes the sort order of a table without reloading the entire page. It is not client-side sorting as you'd expect at first sight: because most tables in Drupal are spread across multiple pages, client-side sorting is not very useful.
  • Switching between multiple pages: this was implemented on top of the sorting functionality, and only works on paged tables (this covers most of the useful pagers though).
  • Progressbar widget: a typical progressbar that fetches the status from the server through Ajax.
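Since the sorting and paging stay server-side, the client's job reduces to asking the server for the right table fragment. A hedged sketch of that request-building logic (the URL scheme and function names here are illustrative, not Drupal's actual API):

```javascript
// Build the query string for a server-side sorted, paged table fetch.
// The server renders the requested fragment; the client swaps it in.
function tableQuery(path, sort, order, page) {
  return path + '?sort=' + encodeURIComponent(sort) +
         '&order=' + (order === 'desc' ? 'desc' : 'asc') +
         '&page=' + (page | 0);
}
```

The same endpoint serves both the Ajax case and the plain-link fallback, which is what makes the paging work on top of the sorting functionality.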

The resulting code can be found in my sandbox in the Drupal contributions repository. Note however that most of the code is in patches against the (rapidly changing) Drupal HEAD, so they are likely to go out of date soon.

The file uploader is now already part of the Drupal HEAD, and at least the tablesorter is sitting in the patch queue being reviewed. I will try and keep them up to date.

A big thanks goes to Google for organising the Summer of Code!

PHP, Unicode and ostriches.


Update: I've written a follow-up post that describes how I would like PHP's encoding support to be.

As the resident encoding geek on the Drupal team, it's usually my job to make sure Drupal handles encodings and Unicode correctly. I don't mind doing this, but PHP doesn't exactly make it easy. With the new search.module for Drupal 4.6 being Unicode-aware, this has become very obvious, as we've had to bump the minimum required version of PHP to 4.3.3: the UTF-8 support in the Perl-compatible regular expressions in PHP 4.3.2 and earlier is completely broken. And now I've had a bug report from someone on PHP 4.3.8 who still had problems getting it to work.

I don't know exactly why, but as far as encodings go, PHP is still in the stone age. This is odd, as you'd expect a web-oriented scripting language to have excellent support for sharing and exchanging textual information. There is a multi-byte string extension available, but it's not available on 90% of the PHP hosts out there, and it's more of a black-box library anyway: it does not present your strings as Unicode codepoints, but still as arrays of bytes. Furthermore, if you actually enable the mbstring overrides, you lose the ability to work with raw bytes at will. Apparently, the PHP team still hasn't figured out that bytes and characters are not the same thing. The other extensions which deal with encodings (iconv, recode) are also unavailable on the majority of PHP installs out there.

This means that if you want to build a PHP application which supports any language and runs on the average PHP host, there's only one option: use UTF-8 internally, and write your own functions for string truncation, email header encoding, validation, and so on. Using UTF-8 ensures that you only have one encoding to worry about, and because it's Unicode, it is guaranteed to be able to represent any language. Of course, you can no longer do something as simple as upper/lowercasing a string, as PHP's functions don't handle UTF-8 at all.
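Safe truncation is a good example of the kind of helper you end up hand-rolling. Sketched here in JavaScript rather than PHP: cutting a UTF-8 byte sequence at an arbitrary byte offset can split a multi-byte character in half, so you back up to the nearest character boundary first.

```javascript
// Truncate a UTF-8 byte sequence to at most maxBytes without cutting a
// multi-byte character in half. In UTF-8, continuation bytes all match
// the bit pattern 10xxxxxx, so a character boundary is any byte that
// does NOT match it.
function truncateUtf8(bytes, maxBytes) {
  if (bytes.length <= maxBytes) return bytes.slice();
  var end = maxBytes;
  // Back up past any continuation bytes so the cut lands on a boundary.
  while (end > 0 && (bytes[end] & 0xC0) === 0x80) end--;
  return bytes.slice(0, end);
}
```

A naive byte-level substr would instead leave a dangling lead byte at the end, producing invalid UTF-8 that breaks downstream validation.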

What PHP needs is Unicode string support in the core, along with a good library of functions for handling the very large Unicode character range efficiently. ASP, Perl, Python and Java all have it. For me, it's the only thing that would've made PHP 5 worth upgrading to.

It's as if the entire PHP team has stuck its head in the sand, hoping that all this Unicode stuff will somehow blow over. It won't.

Sprankle Character Map


It hit me a while ago that entering characters which are not available on your keyboard or through your IME is much too complicated. Usually it involves opening up some character map, scrolling through hundreds of symbols to find the one you need and copy/pasting it into the application of your choice.

Not very handy. Enter Sprankle Character Map. The idea is to hit a special key combination when typing (WIN + S for Sprankle) which pops up a character map where you are typing. You then type a symbol to find similar characters and choose one from the list using either numbers or arrows + space. Here's how it looks.


This is just a prototype, but it demonstrates the idea nicely and it's actually pretty usable. Certainly better than firing up a full character map every time.
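The lookup behind the popup can be sketched as a small table from base symbols to related characters. This is a hypothetical stand-in for the real sprankle.txt data, just to show the shape of the interaction:

```javascript
// Tiny stand-in for Sprankle's character data: typing a base symbol
// yields a short candidate list, which the user picks from by number
// or with arrows + space.
var charMap = {
  'e': ['é', 'è', 'ê', 'ë', 'ē'],
  'a': ['á', 'à', 'â', 'ä', 'å'],
};
function candidates(base) {
  return charMap[base] || [];
}
```

Keeping the table in a plain text file is what makes the character sets user-customizable.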


  • Sprankle is a Unicode application and only runs on Windows 2000/XP.
  • The map appears on top of the current text field. For large, multi-line text fields this is far from ideal. It would be better to have it appear at the current caret position.
  • Sprankle doesn't work on Mozilla Firefox (or other applications that do special keyboard processing). If anyone has an idea on how to fix this, please tell.
  • It might be better to implement Sprankle as a real IME so it integrates completely with the text field. I don't know how to do this yet, though I'm sure MSDN has documentation about it. The downside is that it might not work in combination with existing IMEs (e.g. for Japanese).
  • Many of the symbols in the character set are not present in most fonts. Sprankle currently looks for Arial Unicode MS, the universal font that comes with XP and Office.
  • It might be cool to make a JavaScript version of this, so it can be integrated on websites with CMSes like Drupal.
  • You can customize Sprankle's character sets by editing sprankle.txt (UTF-16LE encoded). Right now it covers most of the Latin characters, basic Greek plus some math symbols.

Download Sprankle (source + win32 binary).


Drupal search improvements


I've finished up my search improvements patch for Drupal. It is now ready to be committed to core (if approved).


  • Clean up the text analyser: make it handle UTF-8 and all sorts of characters. The word splitter now does intelligent splitting into words and supports all Unicode characters. It has smart handling of acronyms, URLs, dates, ...
  • It now indexes the filtered output, which means it can take advantage of HTML tags. Meaningful tags (headers, strong, em, ...) are analysed and used to boost certain words' scores. This has the side effect of allowing PHP nodes to be indexed.
  • Link analyser for node links. The HTML analyser also checks for links. If they point to a node on the current site (handles path aliases) then the link's words are counted as part of the target node. This helps bring out commonly linked FAQs and answers to the top of the results.
  • Index comments along with the node. This means the search can distinguish between a single node/comment about 'X' and a whole thread about 'X'. It also makes the search results much shorter and more relevant (before this patch, comments were even shown first).
  • We now keep track of a total count as well as a per-item count for each word. This allows us to divide a word's score by its total count before adding up the scores for different words, which automatically gives noise words less influence than rare words. This dramatically improves the relevancy of multiword searches, and mitigates the switch from AND searching to OR searching.
  • Includes support for text preprocessors through a hook. This is required to index Chinese and Japanese, because these languages do not use spaces between words. An external utility can be used to split these into words through a simple wrapper module. Other uses could be spell checking (although it would have no UI).
  • Indexing is now throttled: only a certain number of items is indexed per cron run. This prevents PHP from running out of memory or timing out, and it makes the reindexing required by this patch automatic. I also added an index coverage estimate to the search admin screen.
  • Code cleanup! Moved all the search code into search.module, rewired some hooks and simplified the functions used. The search form and results now use valid XHTML and form_ functions. The search admin page was moved from search/configure to admin/search for consistency.
  • Improved search output: we show much more info per item: date, author, node type, number of comments and a cool dynamic excerpt à la Google. The search form is now much simpler, and the help is only displayed as tips when no results are found.
  • By moving all search logic to SQL, I was able to add a pager to the search results. This improves usability and performance dramatically.
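The scoring idea from the list above can be sketched in a few lines. This is an illustrative model, not Drupal's actual SQL: each word's per-item count is divided by that word's total count across the whole index, so common noise words contribute little while rare words dominate the sum for a multiword search.

```javascript
// Score one item against a multiword query.
// itemCounts:  occurrences of each query word in this item
// totalCounts: occurrences of each query word across the whole index
function score(itemCounts, totalCounts) {
  var s = 0;
  for (var word in itemCounts) {
    // Rare words (small total) weigh heavily; noise words barely register.
    if (totalCounts[word]) s += itemCounts[word] / totalCounts[word];
  }
  return s;
}
```

An item with five hits on "the" and two hits on "drupal" scores almost entirely on "drupal", which is exactly the behaviour that makes OR searching tolerable.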

UFPDF: Unicode/UTF-8 extension for FPDF


Note: I wrote UFPDF as an experiment, not as a finished product. If you have problems using it, don't bug me for support. Patches are welcome though, but I don't have much time to maintain this.

FPDF is a PHP class for generating PDF files on-the-fly. Unfortunately it does not support Unicode. So I've coded UFPDF, an extension of FPDF which accepts input in UTF-8.

Only TrueType fonts are supported for now. To embed .TTF files, you need to extract the font metrics and build the required tables using the provided utilities (see README.txt). Included is a modified version of TTF2PT1 which extracts the Unicode glyph info.

UFPDF works the same as FPDF, except that all text is in UTF-8, so consult the FPDF documentation for usage.

Download UFPDF Example PDF

UTF-8 conversion support for mIRC


mIRC's lack of UTF-8 support has been an issue for quite some time. The author promised to 'look at it', but in the meantime, chatting in UTF-8 is not possible. This is problematic for any language that uses more than the occasional accented letter.

So I decided to make a temporary fix myself. The result is a flexible conversion mechanism between UTF-8 and the ANSI codepages. The user sees and types regular ANSI characters, but all data which is sent to and received from the IRC server is UTF-8 encoded. You are still limited to one ANSI codepage though: making mIRC support real Unicode is not possible without an mIRC rewrite.

The script performs real UTF-8 encoding and decoding, so unlike a simple find-and-replace approach, characters which do not fit into the current codepage are marked as such.
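The decoding half of that conversion can be sketched as follows (a simplified model, not the mIRC script itself, and with only a stub codepage table): UTF-8 bytes are first decoded to real codepoints, and only then mapped through the ANSI table, so anything without an ANSI equivalent is flagged with a marker instead of being silently garbled.

```javascript
// Decode UTF-8 bytes to codepoints, then map them through an ANSI
// codepage table (codepoint -> ANSI byte). Unmappable characters become
// the marker byte. Surrogate handling and validation are omitted.
function utf8ToAnsi(bytes, table, marker) {
  var out = [];
  for (var i = 0; i < bytes.length; ) {
    var b = bytes[i], cp;
    if (b < 0x80) { cp = b; i += 1; }                       // ASCII
    else if ((b & 0xE0) === 0xC0) {                         // 2-byte seq
      cp = ((b & 0x1F) << 6) | (bytes[i + 1] & 0x3F); i += 2;
    } else if ((b & 0xF0) === 0xE0) {                       // 3-byte seq
      cp = ((b & 0x0F) << 12) | ((bytes[i + 1] & 0x3F) << 6) |
           (bytes[i + 2] & 0x3F); i += 3;
    } else { cp = -1; i += 4; }  // 4-byte seq: outside any ANSI codepage
    out.push(cp in table ? table[cp] : (cp >= 0 && cp < 0x80 ? cp : marker));
  }
  return out;
}
```

Encoding in the other direction is the mirror image: ANSI bytes are looked up in the reverse table and emitted as one-to-three UTF-8 bytes each.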

I included conversion tables for all of the Windows ANSI codepages:

  • 1250 (ANSI - Central Europe)
  • 1251 (ANSI - Cyrillic)
  • 1252 (ANSI - Western Europe / Latin I)
  • 1253 (ANSI - Greek)
  • 1254 (ANSI - Turkish)
  • 1255 (ANSI - Hebrew)
  • 1256 (ANSI - Arabic)
  • 1257 (ANSI - Baltic)
  • 1258 (ANSI/OEM - Viet Nam)

There is also a little utility (with source) for generating conversion tables for more codepages.

For instructions on how to use it, check the top of the utf-8.mrc file. You can download the script here (19 KB).

Important: This script is provided as-is without any guarantees. Use it if you like it, but don't bug me if you can't get it to work. If you find bugs, feel free to report them, but try to give a little more information than just 'it doesn't work'.

Drupal filter system updating


For a long time, people have been complaining about the filter system in Drupal. This is the part that transforms user-supplied input into HTML output, handling responsibilities such as HTML tag stripping, code tags, auto-links, etc. Like most parts of Drupal, it's very modular and pluggable. Still, it doesn't do what most people want; in fact, it lacks some features which are present in most other CMSes. To address these issues I've been thinking about a major filter system upgrade for a couple of months, but I haven't had time to actually do it, until now.

The root of the problem is that Drupal has only one global filtering profile: the same settings and rules are applied to all input, regardless of who posted it or where it was posted. Administrators cannot have looser filters than anonymous visitors. In some cases (blocks, book and site pages), some customizability is available through a module-specific selector for text, HTML or PHP code, which is then only available to administrators, but this is independent of the filter system.

My solution basically consists of multiple filter profiles. Instead of one global profile, administrators will be free to define as many profiles as they want. Each profile contains its own filter configuration: which filters are enabled, in what order, and with what settings. Access to filter profiles is configurable through roles. In addition, some small extra filters will be split off from existing pieces of code, for example one for evaluating PHP code. Instead of block, book and page.module each having a PHP type, the admin can simply set up a PHP filtering profile, restricted to admins, and enter content with that type in blocks and pages to have it run as PHP code. For anonymous users, only one profile is likely to be available, and in that case nothing changes for them. Only when multiple profiles are enabled do you get a selector (dropdown or radio) below a textarea to choose the format.

Now, the idea sounds nice, but how do we implement it?

1) Filters need to be made profile-aware. Since the filter-ordering changes in 4.4, filters have already grown from simple hooks into registered things. We simply expand the filter hook and require modules to store information per profile. This is not a problem because most configuration is done with Drupal variables anyway: simple prefixing will work. For complex filters with extra settings pages, the module can decide itself whether to use global or per-profile settings. For example, smileys.module will probably not need separate sets of smileys per profile: you either have smileys enabled or you don't.

2) How to store type information. Secondly, and this is the biggest problem, is where and how to store the information about which profile a particular piece of content uses. Either we provide a function to output a profile selector and put the responsibility for using it in modules, or we simply include the selector with form_textarea and pick a standard format for handling metadata about pieces of text (a fieldname_meta column for fieldname? change textfields into arrays with 'text' and 'type' members?). I prefer the form_textarea method because it fits in with how we now handle filtering tips below textareas.

3) Updating modules that display content. When a module displays a piece of user-supplied text and passes it to check_output, it would also have to pass the profile used. This is all that is needed, so it keeps the hassle minimal. Checking which pro[...]
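The multiple-profiles idea described above can be sketched with some hypothetical data shapes (these are illustrative, not Drupal's actual API): a profile is just an ordered list of filters plus the roles allowed to use it, and the output check runs a text through the filters of whichever profile is stored with it.

```javascript
// Two hypothetical profiles: a restrictive one for everybody, and a
// permissive one restricted to admins.
var profiles = {
  filtered_html: {
    roles: ['anonymous', 'authenticated'],
    // Filters run in order; here just a crude tag stripper for the sketch.
    filters: [function stripTags(t) { return t.replace(/<[^>]*>/g, ''); }],
  },
  full_html: { roles: ['admin'], filters: [] },
};

// Run a text through its profile's filter chain (cf. check_output).
function checkOutput(text, profileName) {
  return profiles[profileName].filters.reduce(
    function (t, f) { return f(t); }, text);
}

// Role-based access to a profile decides whether a user sees it in the
// format selector below the textarea.
function allowed(profileName, role) {
  return profiles[profileName].roles.indexOf(role) !== -1;
}
```

With this shape, a PHP-evaluating filter is just one more entry in an admin-only profile's filter list, rather than a special content type in every module.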
Now, the idea sounds nice, but how do we implement it? 1) Filters need to be made profile-aware. Since the filter-ordering changes in 4.4, filters have grown already from simple hooks to registered things. We simply expand the filter hook and require modules to store information per-profile. This is not a problem because most configuration is done with Drupal variables anyway: simple prefixing will work. For complex filters which have extra setting pages, the module can decide to have global settings or per-profile settings itself. For example, smileys.module will probably not require separate sets of smileys per profile: you either have smileys enabled or you don't. 2) How to store type information Secondly, and this is the biggest problem, is where and how to store the information about which profile a particular piece of content uses. Either we provide a function to output a profile form selector and put the responsibility for using it in modules, or we simply include the selector with form_textarea, and pick a standard format for handling metadata about pieces of text (a fieldname_meta column for fieldname? change textfields into arrays with 'text' 'type' members?). I prefer the form_textarea method because it fits in with how we now handle tips about filtering below textareas. 3) Updating modules that display content When a module has to display a piece of user-supplied text and passes it to check_output, it would also have to pass the profile used. This is all that is needed, so it keeps the hassle minimal. Checking which pro[...]