Subscribe: blog
http://keithp.com/blog/index.rss
Added By: Feedage Forager Feedage Grade B rated
Language: English
Tags:
bit  code  display  drm master  drm  frame  heap  kernel  lease  master  new  resources  time  vulkan  window system 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: blog

blog



keithp.com



Published: Sat, 01 Jul 2017 01:45:31 -0700

 



DRM-lease-4

Sat, 01 Jul 2017 01:45:31 -0700

DRM leasing part three (vblank) The last couple of weeks have been consumed by getting frame sequence numbers and events handled within the leasing environment (and Vulkan) correctly. Vulkan EXT_display_control extension This little extension provides the bits necessary for applications to track the display of frames to the user. VkResult vkGetSwapchainCounterEXT(VkDevice device, VkSwapchainKHR swapchain, VkSurfaceCounterFlagBitsEXT counter, uint64_t *pCounterValue); This function just retrieves the current frame count from the display associated with swapchain. VkResult vkRegisterDisplayEventEXT(VkDevice device, VkDisplayKHR display, const VkDisplayEventInfoEXT *pDisplayEventInfo, const VkAllocationCallbacks *pAllocator, VkFence *pFence); This function creates a fence that will be signaled when the specified event happens. Right now, the only event supported is when the first pixel of the next display refresh cycle leaves the display engine for the display. If you want something fancier (like two frames from now), you get to do that on your own using this basic function. drmWaitVBlank drmWaitVBlank is the existing interface for all things sequence related and has three modes (always nice to have one function do three things, I think). It can: Query the current vblank number Block until a specified vblank number Queue an event to be delivered at a specific vblank number This interface has a few issues: It has been kludged into supporting multiple CRTCs by taking bits from the 'type' parameter to hold a 'pipe' number, which is the index in the kernel into the array of CRTCs. It has a random selection of 'int' and 'long' datatypes in the interface, making it need special helpers for 32-bit apps running on a 64-bit kernel. Times are in microseconds, frame counts are 32 bits. Vulkan does everything in nanoseconds and wants 64-bits of frame counts. For leases, figuring out the index into the kernel list of crtcs is pretty tricky -- our lease has a subset of those crtcs, so we can't actually compute the global crtc index. drmCrtcGetSequence int drmCrtcGetSequence(int fd, uint32_t crtcId, uint64_t *sequence, uint64_t *ns); Here's a simple new function — hand it a crtc ID and it provides the current frame sequence number and the time when that frame started (in nanoseconds). drmCrtcQueueSequence int drmCrtcQueueSequence(int fd, uint32_t crtcId, uint32_t flags, uint64_t sequence, uint64_t user_data); struct drm_event_crtc_sequence { struct drm_event base; __u64 user_data; __u64 time_ns; __u64 sequence; }; This will cause a CRTC_SEQUENCE event to be delivered at the start of the specified frame sequence. That event will include the frame when the event was actually generated (in case it's late), along with the time (in nanoseconds) when that frame was started. The event also includes a 64-bit user_data value, which can be used to hold a pointer to whatever data the application wants to see in the event handler. The 'flags' argument contains a combination of: #define DRM_CRTC_SEQUENCE_RELATIVE 0x00000001 /* sequence is relative to current */ #define DRM_CRTC_SEQUENCE_NEXT_ON_MISS 0x00000002 /* Use next sequence if we've missed */ #define DRM_CRTC_SEQUENCE_FIRST_PIXEL_OUT 0x00000004 /* Signal when first pixel is displayed */ These are similar to the values provided for the drmWaitVBlank function, except I've added a selector for when the event should be delivered to align with potential future additions to Vulkan. Right now, the only time you can ask for is first-pixel-out, which says that the event should correspond to the display of the first pixel on the screen. DRM events → Vulkan fences With the kernel able to deliver a suitable event at the next frame, all the Vulkan code needed was a to create a fence and hook it up to such [...]



DRM-lease-3

Tue, 30 May 2017 01:41:22 -0700

DRM leasing part three (Vulkan) With the kernel APIs off for review, and the X RandR bits looking like they're in reasonable shape, I finally found some time to sit down and figure out how I wanted to integrate this into Vulkan. Avoiding two DRM file descriptors Given that a DRM lease is represented by a DRM master file descriptor, we want to use that for all of the operations in the driver, including rendering and mode setting. Using the vulkan driver render node and the lease master node together would require passing buffer objects between the kernel contexts using even more file descriptors. The Mesa Vulkan drivers open the device nodes while enumerating devices, not when they are created. This seems a bit early to me, but it makes sure that the devices being enumerated are actually available for use, and not just present in the system. To replace the render node fd with the lease master fd means hooking into the system early enough that the enumeration code can see the lease fd. And that means creating an instance extension as the instance gets created before devices are enumerated. The VK_KEITHP_kms_display instance extension This simple instance extension provides the necessary hooks to get the lease information from the application down into the driver before the DRM node is opened. In the first implementation, I added a function that could be called before the devices were enumerated to save the information in the Vulkan loader. That worked, but required quite a bit of study of the Vulkan loader and its XML description of the full Vulkan API. Mark Young suggested that a simpler plan would be to chain the information into the VkInstanceCreateInfo pNext field; with no new APIs added to Vulkan, there shouldn't be any need to change the Vulkan loader -- the device driver would advertise the new instance extension and the application could find it. That would have worked great, except the Vulkan loader 'helpfully' elides all instance extensions it doesn't know about before returning the list to the application. I'd say this was a bug and should be fixed, but for now, I've gone ahead and added the few necessary definitions to the loader to make it work. In the application, it's a simple matter of searching for this extension, constructing the VkKmsDisplayInfoKEITHP structure, chaining that into the VkInstanceCreateInfo pNext list and passing that in to the vkCreateInstance call. typedef struct VkKmsDisplayInfoKEITHP { VkStructureType sType; /* VK_STRUCTURE_TYPE_KMS_DISPLAY_INFO_KEITHP */ const void* pNext; int fd; uint32_t crtc_id; uint32_t *connector_ids; int connector_count; drmModeModeInfoPtr mode; } VkKmsDisplayInfoKEITHP; As you can see, this includes the master file descriptor along with all of the information necessary to set the desired video mode using the specified resources. The driver just walks the pNext list from the VkInstanceCreateInfo structure looking for any provided VkKmsDisplayInfoKEITHP structure and pulls the data out. To avoid questions about file descriptor lifetimes, the driver dup's the provided fd. The application is expected to close their copy at a suitable time. The VK_KHR_display extension Vulkan already has an API for directly accessing the raw device, including code for exposing video modes and everything. As tempting as it may be to just go do something simpler, there's a lot to be said for using existing APIs. This extension doesn't provide any direct APIs for acquiring display resources, relying on the VK_EXT_acquire_xlib_display extension for that part. And that takes a VkPhysicalDisplay parameter, which is only available after the device is opened, which is why I created the VK_KEITHP_kms_display extension instead of just using the VK_EXT_acquire_xlib_display extension -- we can't increase the capabilities of the render node opened by the driver, and we don't want to keep two file descriptors around. With the i[...]



TeleMini3

Tue, 25 Apr 2017 09:01:19 -0700

TeleMini V3.0 Dual-deploy altimeter with telemetry now available

TeleMini v3.0 is an update to our original TeleMini v1.0 flight computer. It is a miniature (1/2 inch by 1.7 inch) dual-deploy flight computer with data logging and radio telemetry. Small enough to fit comfortably in an 18mm tube, this powerful package does everything you need on a single board:

  • 512kB on-board data logging memory, compared with 5kB in v1.

  • 40mW, 70cm ham-band digital transceiver for in-flight telemetry and on-the-ground configuration, compared to 10mW in v1.

  • Transmitted telemetry includes altitude, speed, acceleration, flight state, igniter continutity, temperature and battery voltage. Monitor the state of the rocket before, during and after flight.

  • Radio direction finding beacon transmitted during and after flight. This beacon can be received with a regular 70cm Amateur radio receiver.

  • Barometer accurate to 100k' MSL. Reliable apogee detection, independent of flight path. Barometric data recorded on-board during flight. The v1 boards could only fly to 45k'.

  • Dual-deploy with adjustable apogee delay and main altitude. Fires standard e-matches and Q2G2 igniters.

  • 0.5” x 1.7”. Fits easily in an 18mm tube. This is slightly longer than the v1 boards to provide room for two extra mounting holes past the pyro screw terminals.

  • Uses rechargeable Lithium Polymer battery technology. All-day power in a small and light-weight package.

  • Learn more at http://www.altusmetrum.org/TeleMini/

  • Purchase these at http://shop.gag.com/home-page/telemini-v3.html

(image)

(image)

I don't have anything in these images to show just how tiny this board is—but the spacing between the screw terminals is 2.54mm (0.1in), and the whole board is only 13mm wide (1/2in).

This was a fun board to design. As you might guess from the version number, we made a couple prototypes of a version 2 using the same CC1111 SoC/radio part as version 1 but in the EasyMini form factor (0.8 by 1.5 inches). Feed back from existing users indicated that bigger wasn't better in this case, so we shelved that design.

With the availability of the STM32F042 ARM Cortex-M0 part in a 4mm square package, I was able to pack that, the higher power CC1200 radio part, a 512kB memory part and a beeper into the same space as the original TeleMini version 1 board. There is USB on the board, but it's only on some tiny holes, along with the cortex SWD debugging connection. I may make some kind of jig to gain access to that for configuration, data download and reprogramming.

For those interested in an even smaller option, you could remove the screw terminals and battery connector and directly wire to the board, and replace the beeper with a shorter version. You could even cut the rear mounting holes off to make the board shorter; there are no components in that part of the board.




DRM-lease-2

Fri, 31 Mar 2017 17:25:22 -0700

DRM leasing part deux (kernel side) I've stabilized the kernel lease implementation so that leases now work reliably, and don't require a reboot before running the demo app a second time. Here's whats changed: Reference counting is hard. I'm creating a new drm_master, which means creating a new file. There were a bunch of reference counting errors in my first pass; I've cleaned those up while also reducing the changes needed to the rest of the DRM code. Added a 'mask_lease' value to the leases -- this controls whether resources in the lease are hidden from the lessor, allowing the lessor to continue to do operations on the leased resources if it likes. Hacked the mutex locking assertions to not crash in normal circumstances. I'm now doing: BUG_ON(__mutex_owner(&master->dev->mode_config.idr_mutex) != current); to make sure the mutex is held by the current thread, instead of just making sure some thread holds the mutex. I have this memory of a better way to do this, but now I can't dig it up. Suggestions welcome, of course. I'm reasonably pleased with the current state of the code, although I want to squash the patches together so that only the final state of the design is represented, rather than the series of hacks present now. Comments on #dri-devel I spent a bit of time on the #dri-devel IRC channel answering questions about the DRM-lease design and the changes above reflect that. One concern was about mode setting from two masters at the same time. Mode setting depends on a number of shared 'hidden' resources; things like memory fifos and the like. If either lessee or lessor wants to change the displayed modes, they may fail due to conflicts over these resources and not be able to recover easily. A related concern was that the current TEST/render/commit mechanism used by compositors may no longer be reliable as another master could change hidden resource usage between the TEST and commit operations. Daniel Vetter suggested allowing the lessor to 'exclude' lessee mode sets during this operation. One solution would be to have the lessor set the mode on the leased resources before ceding control of the objects, and once set, the lessee shouldn't perform additional mode setting operations. This would limit the flexibility in the lessee quite a bit as it wouldn't be able to pop up overlay planes or change resolution. Questions Let's review the questions from my last post, DRM-lease: What should happen when a Lessor is closed? Should all access to controlled resources be revoked from all descendant Lessees? Answer: The lessor is referenced by the lessee and isn't destroyed until it has been closed and all lessees are destroyed. How about when a Lessee is closed? Should the Lessor be notified in some way? Answer: I think so? Need to figure out a mechanism here. CRTCs and Encoders have properties. Should these properties be automatically included in the lease? Answer -- no, userspace is responsible for constructing the entire lease. Remaining Kernel Work The code is running, and appears stable. However, it's not quite done yet. Here's a list of remaining items that I know about: Changing leases should update sub-leases. When you reduce the resources in one lease, the kernel should walk any sub-leases and clear out resources which the lessor no longer has access to. Sending events when leases are created/destroyed. When a lease is created, if the mask_lease value is set, then the lessor should get regular events describing the effective change. Similarly, both lessor and lessee should get events when a lease is changed. Refactoring the patch series to squash intermediate versions of the new code. Remaining Other Work Outside of the kernel, I'll be adding X support for this operation. Here's my current thinking: Extend RandR to add a lease-creation request that returns a file descriptor. This will take a mode and the server will set that mode before returning. Provide some EDID-based matching for HMD di[...]



DRM-lease

Tue, 28 Mar 2017 15:22:22 -0700

DRM display resource leasing (kernel side) So, you've got a fine head-mounted display and want to explore the delights of virtual reality. Right now, on Linux, that means getting the window system to cooperate because the window system is the DRM master and holds sole access to all display resources. So, you plug in your device, play with RandR to get it displaying bits from the window system and then carefully configure your VR application to use the whole monitor area and hope that the desktop will actually grant you the boon of page flipping so that you will get reasonable performance and maybe not even experience tearing. Results so far have been mixed, and depend on a lot of pieces working in ways that aren't exactly how they were designed to work. We could just hack up the window system(s) and try to let applications reserve the HMD monitors and somehow removing them from the normal display area so that other applications don't randomly pop up in the middle of the screen. That would probably work, and would take advantage of much of the existing window system infrastructure for setting video modes and performing page flips. However, we've got a pretty spiffy standard API in the kernel for both of those, and getting the window system entirely out of the way seems like something worth trying. I spent a few hours in Hobart chatting with Dave Airlie during LCA and discussed how this might actually work. Goals Use KMS interfaces directly from the VR application to drive presentation to the HMD. Make sure the window system clients never see the HMD as a connected monitor. Maybe let logind (or other service) manage the KMS resources and hand them out to the window system and VR applications. Limitations Don't make KMS resources appear and disappear. It turns out applications get confused when the set of available CRTCs, connectors and encoders changes at runtime. An Outline for Multiple DRM masters By the end of our meeting in Hobart, Dave had sketched out a fairly simple set of ideas with me. We'd add support in the kernel to create additional DRM masters. Then, we'd make it possible to 'hide' enough state about the various DRM resources so that each DRM master would automagically use disjoint subsets of resources. In particular, we would. Pretend that connectors were always disconnected Mask off crtc and encoder bits so that some of them just didn't seem very useful. Block access to resources controlled by other DRM masters, just in case someone tried to do the wrong thing. Refinement with Eric over Swedish Pancakes A couple of weeks ago, Eric Anholt and I had breakfast at the original pancake house and chatted a bit about this stuff. He suggested that the right interface for controlling these new DRM masters was through the existing DRM master interface, and that we could add new ioctls that the current DRM master could invoke to create and manage them. Leasing as a Model I spent some time just thinking about how this might work and came up with a pretty simple metaphor for these new DRM masters. The original DRM master on each VT "owns" the output resources and has final say over their use. However, a DRM master can create another DRM master and "lease" resources it has control over to the new DRM master. Once leased, resources cannot be controlled by the owner unless the owner cancels the lease, or the new DRM master is closed. Here's some terminology: DRM Master Any DRM file which can perform mode setting. Owner The original DRM Master, created by opening /dev/dri/card* Lessor A DRM master which has leased out resources to one or more other DRM masters. Lessee A DRM master which controls resources leased from another DRM master. Each Lessee leases resources from a single Lessor. Lessee ID An integer which uniquely identifies a lessee within the tree of DRM masters descending from a single Owner. Lease The contract between the Lessor and Lessee which identifies which resources which m[...]



Valve

Tue, 14 Mar 2017 12:10:17 -0700

Consulting for Valve in my spare time

Valve Software has asked me to help work on a couple of Linux graphics issues, so I'll be doing a bit of consulting for them in my spare time. It should be an interesting diversion from my day job working for Hewlett Packard Enterprise on Memory Driven Computing and other fun things.

First thing on my plate is helping support head-mounted displays better by getting the window system out of the way. I spent some time talking with Dave Airlie and Eric Anholt about how this might work and have started on the kernel side of that. A brief synopsis is that we'll split off some of the output resources from the window system and hand them to the HMD compositor to perform mode setting and page flips.

After that, I'll be working out how to improve frame timing reporting back to games from a composited desktop under X. Right now, a game running on X with a compositing manager can't tell when each frame was shown, nor accurately predict when a new frame will be shown. This makes smooth animation rather difficult.




FOSDEM-2017

Mon, 30 Jan 2017 16:40:46 -0800

FOSDEM 2017 -- The Machine and ChaosKeys

Yay! I get to go to FOSDEM this year. I'll be speaking about Free Software for The Machine on Sunday afternoon at 14:00 in K.1.105 (La Fontaine).

I'll also be bringing along a number of ChaosKeys and will have them available for either $40 US or €40. You can either bring cash or pre-pay at the Altus Metrum shop.

Hope to see you there.




embedded-arm-libc

Sat, 07 Jan 2017 23:32:10 -0800

Finding a Libc for tiny embedded ARM systems You'd think this problem would have been solved a long time ago. All I wanted was a C library to use in small embedded systems -- those with a few kB of flash and even fewer kB of RAM. Small system requirements A small embedded system has a different balance of needs: Stack space is limited. Each thread needs a separate stack, and it's pretty hard to move them around. I'd like to be able to reliably run with less than 512 bytes of stack. Dynamic memory allocation should be optional. I don't like using malloc on a small device because failure is likely and usually hard to recover from. Just make the linker tell me if the program is going to fit or not. Stdio doesn't have to be awesomely fast. Most of our devices communicate over full-speed USB, which maxes out at about 1MB/sec. A stdio setup designed to write to the page cache at memory speeds is over-designed, and likely involves lots of buffering and fancy code. Everything else should be fast. A small CPU may run at only 20-100MHz, so it's reasonable to ask for optimized code. They also have very fast RAM, so cycle counts through the library matter. Available small C libraries I've looked at: μClibc. This targets embedded Linux systems, and also appears dead at this time. musl libc. A more lively project; still, definitely targets systems with a real Linux kernel. dietlibc. Hasn't seen any activity for the last three years, and it isn't really targeting tiny machines. newlib. This seems like the 'normal' embedded C library, but it expects a fairly complete "kernel" API and the stdio bits use malloc. avr-libc. This has lots of Atmel assembly language, but is otherwise ideal for tiny systems. pdclib. This one focuses on small source size and portability. Current AltOS C library We've been using pdclib for a couple of years. It was easy to get running, but it really doesn't match what we need. In particular, it uses a lot of stack space in the stdio implementation as there's an additional layer of abstraction that isn't necessary. In addition, pdclib doesn't include a math library, so I've had to 'borrow' code from other places where necessary. I've wanted to switch for a while, but there didn't seem to be a great alternative. What's wrong with newlib? The "obvious" embedded C library is newlib. Designed for embedded systems with a nice way to avoid needing a 'real' kernel underneath, newlib has a lot going for it. Most of the functions have a good balance between speed and size, and many of them even offer two implementations depending on what trade-off you need. Plus, the build system 'just works' on multi-lib targets like the family of cortex-m parts. The big problem with newlib is the stdio code. It absolutely requires dynamic memory allocation and the amount of code necessary for 'printf' is larger than the flash space on many of our devices. I was able to get a cortex-m3 application compiled in 41kB of code, and that used a smattering of string/memory functions and printf. How about avr libc? The Atmel world has it pretty good -- avr-libc is small and highly optimized for atmel's 8-bit avr processors. I've used this library with success in a number of projects, although nothing we've ever sold through Altus Metrum. In particular, the stdio implementation is quite nice -- a 'FILE' is effectively a struct containing pointers to putc/getc functions. The library does no buffering at all. And it's tiny -- the printf code lacks a lot of the fancy new stuff, which saves a pile of space. However, much of the places where performance is critical are written in assembly language, making it pretty darn hard to port to another processor. Mixing code together for fun and profit! Today, I decided to try an experiment to see what would happen if I used the avr-libc stdio bits within the newlib environment. There were only t[...]



AltOS-Lisp-2

Sat, 19 Nov 2016 01:20:28 -0800

Updates to Altos Lisp I wrote a few days ago about a tiny lisp interpreter I wrote for AltOS Really, it's almost "done" now, I just wanted to make a few improvements Incremental Collection I was on a walk on Wednesday when I figured out that I didn't need to do a full collection every time; a partial collection that only scanned the upper portion of memory would often find plenty of free space to keep working for a while. To recap, the heap is in two pieces; the ROM piece and the RAM piece. The ROM piece is generated during the build process and never changes afterwards (hence the name), so the only piece which is collected is the RAM piece. Collection works like: chunk_low = heap base new_top = heap base For all of the heap Find the first 64 live objects above chunk_low Compact them all to new_top Rewrite references in the whole heap for them Set new_top above the new locations Set chunk_low above the old locations top = new_top The trick is to realize that there's really no need to start at the bottom of the heap; you can start anywhere you like and compact stuff, possibly leaving holes below that location in the heap. As the heap tends to have long-lived objects slowly sift down to the beginning, it's useful to compact objects higher than that, skipping the compaction process for the more stable area in memory. Each time the whole heap is scanned, the top location is recorded. After that, incremental collects happen starting at that location, and when that doesn't produce enough free space, a full collect is done. The collector now runs a bunch faster on average now. Binary Searches I stuck some linear searches in a few places in the code, the first was in the collector when looking to see where an object had moved to. As there are 64 entries, the search is reduced from 32 to 6 compares on average. The second place was in the frame objects, which hold the list of atom/value bindings for each lexical scope (including the global scope). These aren't terribly large, but a binary search is still a fine plan. I wanted to write down here the basic pattern I'm using for binary searches these days, which avoids some of the boundary conditions I've managed to generate in the past: int find (needle) { int l = 0; int r = count - 1; while (l <= r) { int m = (l + r) >> 1; if (haystack[m] < needle) l = m + 1; else r = m - 1; } return l; } With this version, the caller can then check to see if there's an exact match, and if not, then the returned value is the location in the array where the value should be inserted. If the needle is known to not be in the haystack, and if the haystack is large enough to accept the new value: void insert(needle) { int l = find(needle); memmove(&haystack[l+1], &haystack[l], (num - l) * sizeof (haystack[0])); haystack[l] = needle; } Similarly, if the caller just wants to know if the value is in the array: bool exists(needle) { int l = find(needle); return (l < count && haystack[l] == needle); } Call with Current Continuation Because the execution stack is managed on the heap, it's completely trivial to provide the scheme-like call with current continuation, which constructs an object which can be 'called' to transfer control to a saved location: > (+ "hello " (call/cc (lambda (return) (setq boo return) (return "foo "))) "world") "hello foo world" > (boo "bar ") "hello bar world" > (boo "yikes ") "hello yikes world" One thing I'd done previously is dump the entire state of the interpreter on any error, and that included a full stack trace. I adopted that code for printing of these continuation objects: boo [ expr: (call/cc (lambda (return) (set (quote boo) return) (return "foo "))) state[...]



AltOS-Lisp

Mon, 14 Nov 2016 23:11:23 -0800

A Tiny Lisp for AltOS I took a bit of a diversion over the last week or so when I wondered how small a lisp interpreter I could write, and whether I could fit that into one of the processors that AltOS runs on. It turns out, you can write a tiny lisp interpreter that fits in about 25kB of ram with a 3kB heap for dynamic data. I decided to target our ChaosKey boards; they're tiny, and I've got a lot of them. That processor offers 28kB of usable flash space (after the 4kB boot loader) and 6kB of ram with the processor running at a steaming 48MHz. I'm not at all sure this is useful, but I always enjoy doing language implementations, and this one presented some 'interesting' challenges: Limited RAM. I don't have space to do a classic stop/copy collector. Limited stack. A simple lisp implementation uses the C stack for all recursion in execution and memory collection. I don't have enough ram for that. Iterative Compacting Allocator I'm betting someone has built one of these before, but I couldn't find one, so I wrote my own. The basic strategy is to walk the heap to find a subset of the active objects which are allocated sequentially in memory with only unused storage between them. These objects are then compacted in-place, and then the heap is walked again to update all references to the moved objects. Then, the process is restarted to find another subset and move them. By looking for these subsets starting at the bottom of the heap, and working upwards towards the top, the whole heap can be compacted into a contiguous chunk at the bottom of memory. Allocation involves moving a pointer along at the top of active memory; when it gets to the top of the heap, collect and see if there's space now. As always, the hardest part was to make sure all active memory was tied down. The second hardest part was to make sure that all active pointers were updated after any allocation, in case a collect moved the underlying object. That was just bookkeeping, but did consume much of the development time. One additional trick was to terminate the recursion during heap walking by flagging active cons cell locations in a global bitmap and then walking that separately, iterating until that bitmap is empty. Nested lambdas form another recursion which should probably get the same approach, but I haven't done that yet. An unexpected "benefit" of the tiny heap is that the collector gets called a lot, so any referencing bugs will have a good chance of being uncovered in even a short program execution. ROM-able Lisp Instead of implementing all of the language in C, I wanted to be able to implement various pieces in Lisp itself. Because of the complex nature of the evaluation process, adding things like 'let' or even 'defun' turn out to be dramatically simpler in Lisp. However, I didn't want to consume bunches of precious RAM to hold these basic functions. What I did was to create two heaps, one in ROM and the other in RAM. References are be tagged as to which heap they're in. 16-bit Values Lisp programs use a pile of references. Using a full 32 bits for each one would mean having a lot less effective storage. So, instead, I use an offset from the base of the heap. The top bit of the offset is used to distinguish between the ROM heap and the RAM heap. I needed a place to store type information, so I settled on using the bottom two bits of the references. This allows for four direct type values. One of these values is used to indicate an indirect type, where the type is stored in the first byte of the object. The direct types are: ValueType 0Cons cell 114-bit int 2String 3Other With 2 tag bits, the allocator needs to work in 32-bit units as the references couldn't point to individual bytes. Finally, I wanted 0 to be nil, so I add four to [...]