Subscribe: VaporWarning » Python
Added By: Feedage Forager Feedage Grade C rated
Language: English

Honest to a Segfault: Python category syndication

Updated: 2012-01-06T15:45:00-08:00


String representation in SpiderMonkey


I'm back from holiday break and I need to limber up my tech writing a bit. Reason 1337 of my ever-growing compendium, Nerdy Reasons to Love the Internet, is that there are always interesting discussions going on. [*] I came across "Never create Ruby strings longer than 23 characters" the other day, and despite the link-bait title, there's a nice discussion of string representation within MRI (a Ruby VM).

My recap will be somewhat abbreviated, since I've only given myself a chunk of the morning to write this, so feel free to ask for clarification / follow up in the comments.

Basic language overview

At the language level JavaScript strings are pretty easy to understand. They are immutable, same as in Python:

    >>> foo = 'abc'
    >>> foo[2] = 'd'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object does not support item assignment

    js> options('strict')
    ""
    js> foo = 'abc'
    "abc"
    js> foo[2] = 'd'
    typein:3: strict warning: foo[2] is read-only
    "d"

But you can (without mutating any of the original values) compare them for equality, concat them, slice them, regexp replace/split/match on them, trim whitespace from them, slap them to chop vegetables, and so forth. (See the MDN docs for String.prototype methods.) In the VM, we need to make those operations fast, with an emphasis on the operations that the web uses heavily, which are ideally [†] the ones reflected in benchmarks.

Abstractly

In an abstract sense, a primitive string in SpiderMonkey is a GC cell (i.e. small header that is managed by the garbage collector) that has a length and refers to an array of UCS2 (uniformly 16-bit) characters. [‡]

Recall that, in many dynamic language implementations, type tagging is used to record, at runtime, the actual type of a value whose type is not statically known. This generally allows you to work on integers (and, in SpiderMonkey, doubles) without allocating any space on the heap.
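To make the type-tagging idea concrete, here is a minimal sketch in Python. It is not SpiderMonkey's actual value layout: the three-bit tag width, the tag constants, and every function name are made up for illustration, and a Python list stands in for the heap. The point is only that an integer travels inside the tagged word itself, while a string's word carries a reference to something stored elsewhere.

```python
# Hypothetical tagged-word scheme: the low 3 bits hold the type tag,
# the remaining bits hold the payload. Not SpiderMonkey's real layout.
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
TAG_INT = 0b001
TAG_STRING = 0b010


def box_int(n):
    """Store the integer in the word itself: no heap entry is created."""
    return (n << TAG_BITS) | TAG_INT


def box_string(heap, s):
    """Store the string on the stand-in heap; the word holds its index."""
    heap.append(s)
    return ((len(heap) - 1) << TAG_BITS) | TAG_STRING


def tag_of(word):
    """Read the type tag back out of the low bits."""
    return word & TAG_MASK


def unbox_int(word):
    assert tag_of(word) == TAG_INT
    return word >> TAG_BITS


def unbox_string(heap, word):
    assert tag_of(word) == TAG_STRING
    return heap[word >> TAG_BITS]


def describe(heap, word):
    """Dispatch on the tag, the way a VM checks a value's runtime type."""
    if tag_of(word) == TAG_INT:
        return f"int {unbox_int(word)}"
    if tag_of(word) == TAG_STRING:
        return f"string {unbox_string(heap, word)!r}"
    raise ValueError("unknown tag")
```

Boxing an integer never touches the heap list, which is the property the paragraph above describes; only `box_string` grows it.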
Primitive strings are very important to distinguish quickly and they are subtly distinct from (non-primitive) objects, so they have their own type tag in our value representation, as you can see in the following VM function:

    /*
     * Convert the given value to a string. This method includes an inline
     * fast-path for the case where the value is already a string; if the value is
     * known not to be a string, use ToStringSlow instead.
     */
    static JS_ALWAYS_INLINE JSString *
    ToString(JSContext *cx, const js::Value &v)
    {
        if (v.isString())
            return v.toString();
        return ToStringSlow(cx, v);
    }

Aside

In JavaScript there's an annoying distinction between primitive strings and string objects that you may have seen:

    js> foo = new String('abc')
    (new String("abc"))
    js> foo.substr(0, 2)
    "ab"
    js> foo[2]
    "c"
    js> foo.toString()
    "abc"

For simplicity and because they're uninteresting, let's pretend those new String things don't exist.

Atoms

The simplest string form to describe is called an "atom", which is somewhat similar to an interned string in Python. When you write a literal string or identifier in your JavaScript code, SpiderMonkey's parser turns it into one of these atoms.

    (function() {
        // Both 'someObject' and 'twenty' are atomized at parse time!
        return someObject['twenty'];
    })()

Note that the user has no overt control over which strings get atomized (i.e. there is no intern builtin). Also, there are a bunch of "primordial" atoms that the engine creates when it starts up: things like the empty string, prototype, apply, and so on.

The interesting property of atoms is that any two atoms can be compared in O(1) time (via pointer comparison). Some work is required on the part of the runtime to guarantee that property.

To get an atom within the VM, you have to say, "Hey SpiderMonkey runtime, atomize these characters for me!" In the general case the runtime then does a classic "get or c[...]
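The entry is cut off at this point, so the following is only a sketch of the general interning idea, not of SpiderMonkey's atom table. It assumes a plain dictionary keyed by the characters, with a lookup followed by an insert on a miss. `Atom`, `AtomTable`, and `atomize` are hypothetical names chosen for the example.

```python
class Atom:
    """A canonical string object; at most one exists per distinct content."""

    __slots__ = ("chars",)

    def __init__(self, chars):
        self.chars = chars


class AtomTable:
    """Hypothetical intern table mapping characters to their unique Atom."""

    def __init__(self):
        self._atoms = {}

    def atomize(self, chars):
        # Look the characters up first; only allocate a new Atom on a miss.
        atom = self._atoms.get(chars)
        if atom is None:
            atom = Atom(chars)
            self._atoms[chars] = atom
        return atom


table = AtomTable()
first = table.atomize("twenty")
second = table.atomize("twen" + "ty")  # same characters, built separately
same_atom = first is second  # identity check, no character-by-character compare
```

Because the table hands back the same object for equal contents, comparing two atoms reduces to an identity check (`is` in Python), which is the constant-time pointer comparison described above. The lookup itself still has to hash the characters, which is part of the work the runtime does to maintain that guarantee.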

Blog engine hot swap: from Wordpress to MicroClog


This entry marks the transition from my use of Wordpress software to a small monster of my own creation, which I have named MicroClog.

Of all the things I've lost...

I first switched to Wordpress, from Blogger, about three years ago.

For a long time, I felt that I wasn't crazy enough to write my own blog software. So, in a totally sane fashion, I:

1. Meticulously wrote all of my blog entries in reStructuredText, complete with metadata, in my own Mercurial repository
2. Converted that reStructuredText to HTML with my own custom extension to Docutils' rst2html capability
3. Copied and pasted the HTML from the rendered file into Wordpress
4. Updated the metadata in Wordpress by hand

Of course, with every edit, I repeated all of these steps.

Now, this wouldn't be so bad, if writing weren't such a damn perfectionist art. I'm not sure of the average number of cycles I took around this loop of automation apathy for each blog entry, but I would guess it was around five. Each trip around the loop I hated it more.

They say that the definition of insanity is doing the same thing over and over again, but expecting different results.

Eventually, I snapped, and decided something had to be done. [*]

I think I've successfully channeled my gripes into the implementation of MicroClog. I hope that, ultimately, greasing the wheels on this process will help flush the 83 entries in my drafts folder (along with a small handful of unfulfilled promises to write something) out to the internet.

The idea behind MicroClog

Writing about code is a total pain in most blog engines. Writing in reStructuredText rocks.

MicroClog chooses reStructuredText over WYSIWYG/HTML editing and existing distributed revision control systems over an in-blog-engine revision control system.
The current workflow for MicroClog is:

1. Write a blog entry in reStructuredText on your local machine
2. Commit and push the changes to a repository on the host server
3. The host server's repository hook renders entries that have changed
4. Entries designated for publishing are publicly visible

There are also ways to share drafts in a restricted fashion. I'm currently hacking together a "live preview" on the server side for the reST entries you're editing on the client side, using the fancy new server-sent events API.

Ultimately, there are a few simple tasks that I want to optimize for:

- Start an entry and dump a stream-of-consciousness text in it
- Share draft entries with proofreaders
- Converge on a publication by iterating a read-and-tweak cycle

I love writing in my text editor, especially when writing about code, but I also want to marginalize the advantages WYSIWYG has over markup by getting live previews as smooth as possible.

Feature creep

There are some more sordid incentives for me to have all my blog data easily queried and manipulated in a Django app. A few of the features I'd like to try adding in the future:

First class updates

I would really like to support the idea of an "update" or "followup" as a first class feature. Manually hacking old entries to point at newer ones with followup content is lame, and engine support for that kind of workflow isn't difficult.

More widgets

I've always wanted to have a widget where I could select a handful of my hundred-odd drafts and generate a poll where users could select the title/intro blurb that was most interesting to them. Knowing what people are interested in reading gives me additional motivation.

Decoupling syndication and entry labeling

I find that tying planet syndication directly to the feed generated for a label has been bothersome.
Sometimes I feel like I want to syndicate an entry to a planet but that label isn't appropriate, or sometimes I don't want to syndicate an entry to a planet but I do want to use the label.

Statistics pr0n

Because data is fun to look at. Some ideas I've had:

- tf/idf style analysis to suggest tags automatically
- Plot of entries correlated against start/publish date/time
- Start-to-publish duration versus word count

The good left undone

I'm still writing the software out of a private repo, because [...]
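As an illustration of the tf/idf tag-suggestion idea listed among the statistics ideas, here is a minimal standard-library sketch. It is not MicroClog code. The `tokenize` and `suggest_tags` names, the dictionary-of-entries input, and the plain `tf * log(N / df)` weighting are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest_tags(entries, title, top_n=3):
    """Return the top_n terms of entries[title] ranked by tf-idf.

    entries maps a (hypothetical) entry title to its plain text.
    """
    tokens = {name: tokenize(body) for name, body in entries.items()}

    # Document frequency: in how many entries does each term appear?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    n_docs = len(entries)
    term_freq = Counter(tokens[title])
    total = sum(term_freq.values())

    # Terms that are frequent here but rare elsewhere score highest;
    # a term present in every entry gets log(1) = 0.
    scores = {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

Calling `suggest_tags(entries, "some title")` on a dictionary of entry texts returns candidate tags for that one entry. Words that show up in every entry score zero, so common filler naturally drops to the bottom of the ranking.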