Subscribe: benedict makes things start faster
http://www.blogger.com/feeds/7470623142037376753/posts/default/-/planet
Added By: Feedage Forager Feedage Grade C rated
Language: English
Tags:
buffer  cache  client  clients  complicated  data  document  fastload file  fastload  file  map  pointers  startup cache  startup  stream 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: benedict makes things start faster

benedict makes things start faster





Updated: 2014-10-02T22:03:08.618-07:00

 



streams in yer cache

2010-03-03T13:19:17.223-08:00

Hi everyone. Today, I'm going to talk a little about the startup cache, which I've been working on for longer than I like to think about.startupcache, yay! fastload, boooThe startup cache is meant to replace Mozilla's fastload architecture. The purpose of these caches is to preserve intermediate results like compiled javascript and compiled xul documents, so that we don't need to load and parse them every time the browser starts. The main goals of this project are to improve the performance of the cache and to simplify the API. Performance improvement is kind of black magic, especially when doing hard drive access, so I'm going to ignore that part for now and talk about API simplification and some of the pitfalls we pretty much fell directly into.One of the main problems with fastload is that it can be hard to understand, even as a client. It's implemented as a number of streams of data, multiplexed into a file. There are multiple fastload files, and each of them contains dozens of different streams of data, which are called 'documents' in fastload terminology. The fastload service is responsible for switching back and forth between documents within a file. Clients read from a stream called the 'fastload file', which is also manipulated by the fastload service behind the scene to make sure they are reading the right bytes for a document.So a client has to understand how to make a fastload file, how to make a document within the fastload file, and then how to tell the service to seek to that document within the file. After a client is done reading, he has to restore the file to point to the previous document (and sometimes restore the service to point to the previous file).Complicated. So when Taras and I designed the startup cache, our initial design was way over at the other end of the spectrum -- basically just a map, where the keys are c-strings and the values are blobs of data. You just call getBuffer(string), and putBuffer(string, buffer, buffer length), and that's all there is to it. We didn't even offer input/output streams, just plain old buffers. This is backed by a JAR, which does some checksumming and remembers the offset of different entries for us.This works surprisingly well. Clients can serialize complicated data structures into a stream, and just dump the stream into a buffer and hand it to the cache. Next startup, take that buffer, make it into a stream, and read your complicated data representation back out. (We made the cache architecture-dependent, so clients don't even have to worry about endianness and word-size in whatever serialization format they choose. If a user moves to a machine with a different architecture, we just start up a new cache file for that machine.)serializing yer pointersThere's a fundamental problem with this, though, which is probably familiar to anyone who has tried to do serialization. The problem pops up when you try to serialize a pointer. If two serialized structures A and B have pointers to the same structure C, they'd better both point to C again when you deserialize them. But if A and B got serialized to different buffers, someone trying to deserialize A will have a hard time finding out that there's this structure B in a different buffer with a pointer to the same object C.It's possible, of course. The client has to maintain a map of objects and the pointers which refer to those objects (only the pointers he is serializing), and then the client can serialize that map as well. So in our example, when the client sees a reference to C, that is mapped to a structure that tells the client what else is pointing to C and where to find that other pointer. So who cares, the client can take care of it -- the startup cache is meant for blobs, not for complicated object cycles.But it turns out that both of our initial clients would need to implement this kind of map, and a third one we had in mind would need it as well. So it's probably better just to have an interface where they can call readObject(...), and if A and B both readObject(C), they magically come out with pointers to[...]