
The Hand of FuManChu



PyData 2013 Slides


The presentation deck from my talk at PyData 2013 is up! Thanks to everyone for their interest and feedback.

Addictive to check out


From 37signals, about their Basecamp iPhone app launch:

Our top priority was fast access to news. You’ll find the app makes it addictive to check in and feel the pulse of your projects throughout the day. You can quickly bounce in and out of projects. Project screens on the phone show the latest news first rather than static project contents.

Cool. As a manager, that's exactly what I want: to feel the pulse.

As an architect and designer and developer, I want the opposite. Now, can someone make an app that makes it addictive to get in the flow instead of to be interrupted all the time?

There's got to be a name for this

You know, the difference between "what is the least we can do to alleviate our current pain?" versus "where do we want to be and how do we get there?" I see this distinction again and again. I've seen both called "strategy" and that can't be good. I would say the former is a product of "management" and the latter of "leadership", but that distinguishes the attitudes or processes, not the results. Lazyweb? Little help?

Wow. Does isinstance blow up with ABC's?


Python 2.6.1. Here's a call to "isinstance(value, basestring)":

--[ (_cprequest:782)
--] (_cprequest:782) 0.044ms

versus "isinstance(value, io.IOBase)":

--[ (_cprequest:791)
----> __instancecheck__ (abc:117)
----. __instancecheck__ (abc:120)
------[ (abc:120)
------] (abc:120) 0.046ms
----. __instancecheck__ (abc:121)
----. __instancecheck__ (abc:123)
----. __instancecheck__ (abc:124)
----. __instancecheck__ (abc:125)
----. __instancecheck__ (abc:126)
----. __instancecheck__ (abc:127)
----. __instancecheck__ (abc:130)
------> __subclasscheck__ (abc:134)
------. __subclasscheck__ (abc:137)
------. __subclasscheck__ (abc:140)
------. __subclasscheck__ (abc:144)
------. __subclasscheck__ (abc:147)
--------[ ABCMeta.__subclasshook__ (abc:147)
--------] ABCMeta.__subclasshook__ (abc:147) 0.043ms
------. __subclasscheck__ (abc:148)
------. __subclasscheck__ (abc:156)
--------[ (abc:156)
--------] (abc:156) 0.043ms
------. __subclasscheck__ (abc:160)
------. __subclasscheck__ (abc:165)
--------[ ABCMeta.__subclasses__ (abc:165)
--------] ABCMeta.__subclasses__ (abc:165) 0.045ms
------. __subclasscheck__ (abc:166)
--------[ (abc:166)
----------> __subclasscheck__ (abc:134)
----------. __subclasscheck__ (abc:137)
----------. __subclasscheck__ (abc:140)
----------. __subclasscheck__ (abc:144)
----------. __subclasscheck__ (abc:147)
------------[ ABCMeta.__subclasshook__ (abc:147)
------------] ABCMeta.__subclasshook__ (abc:147) 0.043ms
----------. __subclasscheck__ (abc:148)
----------. __subclasscheck__ (abc:156)
------------[ (abc:156)
------------] (abc:156) 0.046ms
----------. __subclasscheck__ (abc:160)
----------. __subclasscheck__ (abc:165)
------------[ ABCMeta.__subclasses__ (abc:165)
------------] ABCMeta.__subclasses__ (abc:165) 0.043ms
----------. __subclasscheck__ (abc:166)
------------[ (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
--------------. __subclasscheck__ (abc:144)
--------------. __subclasscheck__ (abc:147)
----------------[ ABCMeta.__subclasshook__ (abc:147)
----------------] ABCMeta.__subclasshook__ (abc:147) 0.043ms
--------------. __subclasscheck__ (abc:148)
--------------. __subclasscheck__ (abc:156)
----------------[ (abc:156)
----------------] (abc:156) 0.043ms
--------------. __subclasscheck__ (abc:160)
--------------. __subclasscheck__ (abc:165)
----------------[ ABCMeta.__subclasses__ (abc:165)
----------------] ABCMeta.__subclasses__ (abc:165) 0.042ms
--------------. __subclasscheck__ (abc:170)
----------------[ set.add (abc:170)
----------------] set.add (abc:170) 0.043ms
--------------. __subclasscheck__ (abc:171)
--------------< __subclasscheck__ (abc:171): False 1.690ms
------------] (abc:166) 1.887ms
----------. __subclasscheck__ (abc:165)
----------. __subclasscheck__ (abc:170)
------------[ set.add (abc:170)
------------] set.add (abc:170) 0.042ms
----------. __subclasscheck__ (abc:171)
----------< __subclasscheck__ (abc:171): False 3.745ms
--------] (abc:166) 3.952ms
------. __subclasscheck__ (abc:165)
------. __subclasscheck__ (abc:166)
--------[ (abc:166)
----------> __subclasscheck__ (abc:134)
----------. __subclasscheck__ (abc:137)
----------. __subclasscheck__ (abc:140)
----------. __subclasscheck__ (abc:144)
----------. __subclasscheck__ (abc:147)
------------[ ABCMeta.__subclasshook__ (abc:147)
------------] ABCMeta.__subclasshook__ (abc:147) 0.044ms
----------. __subclasscheck__ (abc:148)
----------. __subclasscheck__ (abc:156)
------------[ (abc:156)
------------] (abc:156) 0.044ms
----------. __subclasscheck__ (abc:160)
----------. __subclasscheck__ (abc:165)
------------[ ABCMeta.__subclasses__ (abc:165)
------------] ABCMeta.__subclasses__ (abc:165) 0.045ms
----------. __subclasscheck__ (abc:166)
------------[ (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
---------[...]
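If you want to see the effect without a call tracer, here's a rough timing sketch of my own (an assumed harness, not the tracer that produced the dump above). One caveat: abc caches each verdict per class, which is the set.add at abc:170 in the trace, so only the first check pays the full price; and newer Pythons have moved this machinery to C, so the gap is much smaller than on the 2.6.1 shown here.

import io
import time

value = "definitely not a file"

start = time.time()
isinstance(value, io.IOBase)   # first check walks the ABC subclass tree
print("first check: %.3f ms" % ((time.time() - start) * 1000))

start = time.time()
isinstance(value, io.IOBase)   # repeat check hits the negative cache
print("cached check: %.3f ms" % ((time.time() - start) * 1000))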



Statistics about program operation are an invaluable monitoring and debugging tool. How many requests are being handled per second, how much of various resources are in use, how long we've been up. Unfortunately, the gathering and reporting of these critical values is usually ad-hoc. It would be nice if we had 1) a centralized place for gathering statistical performance data, 2) a system for extrapolating that data into more useful information, and 3) a method of serving that information to both human investigators and monitoring software. I've got a proposal. Let's examine each of those points in more detail.

Data Gathering

Just as Python's logging module provides a common importable for gathering and sending messages, statistics need a similar mechanism, and one that does not require each package which wishes to collect stats to import a third-party module. Therefore, we choose to re-use the logging module by adding a statistics object to it. That logging.statistics object is a nested dict:

import logging
if not hasattr(logging, 'statistics'):
    logging.statistics = {}

It is not a custom class, because that would 1) require apps to import a third-party module in order to participate, 2) inhibit innovation in extrapolation approaches and in reporting tools, and 3) be slow. There are, however, some specifications regarding the structure of the dict.

{
   +----"SQLAlchemy": {
   |        "Inserts": 4389745,
   |        "Inserts per Second":
   |            lambda s: s["Inserts"] / (time() - s["Start"]),
   |  C +---"Table Statistics": {
   |  o |        "widgets": {-----------+
 N |  l |            "Rows": 1.3M,      | Record
 a |  l |            "Inserts": 400,    |
 m |  e |        },---------------------+
 e |  c |        "froobles": {
 s |  t |            "Rows": 7845,
 p |  i |            "Inserts": 0,
 a |  o |        },
 c |  n +---},
 e |        "Slow Queries":
   |            [{"Query": "SELECT * FROM widgets;",
   |              "Processing Time": 47.840923343,
   |              },
   |             ],
   +----},
}

The logging.statistics dict has strictly 4 levels. The topmost level is nothing more than a set of names to introduce modularity. If SQLAlchemy wanted to participate, it might populate the item logging.statistics['SQLAlchemy'], whose value would be a second-layer dict we call a "namespace". Namespaces help multiple emitters to avoid collisions over key names, and make reports easier to read, to boot. The maintainers of SQLAlchemy should feel free to use more than one namespace if needed (such as 'SQLAlchemy ORM'). Each namespace, then, is a dict of named statistical values, such as 'Requests/sec' or 'Uptime'. You should choose names which will look good on a report: spaces and capitalization are just fine.

In addition to scalars, values in a namespace MAY be a (third-layer) dict, or a list, called a "collection". For example, the CherryPy StatsTool keeps track of what each worker thread is doing (or has most recently done) in a 'Worker Threads' collection, where each key is a thread ID; each value in the subdict MUST be a fourth dict (whew!) of statistical data about each thread. We call each subdict in the collection a "record". Similarly, the StatsTool also keeps a list of slow queries, where each record contains data about each slow query, in order.
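As a concrete sketch of those four levels (the app and key names here are my own illustrations, not part of the spec), an emitter might look like this:

import logging
import threading

# Four levels: root dict -> namespace -> collection -> record.
if not hasattr(logging, 'statistics'):
    logging.statistics = {}

appstats = logging.statistics.setdefault('My WSGI App', {})  # namespace
appstats['Requests'] = 0                                     # scalar value
workers = appstats.setdefault('Worker Threads', {})          # collection

def handle_request(uri):
    appstats['Requests'] += 1
    # One record per worker thread, keyed by thread ID.
    record = workers.setdefault(threading.get_ident(), {'Requests': 0})
    record['Requests'] += 1
    record['Current URI'] = uri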
Values in a namespace or record may also be functions, which brings us to:

Extrapolation

def extrapolate_statistics(scope):
    """Return an extrapolated copy of the given scope."""
    c = {}
    for k, v in scope.items():
        if isinstance(v, dict):
            v = extrapolate_statistics(v)
        elif isinstance(v, (list, tuple)):
            v = [extrapolate_statistics(record) for record in v]
        elif callable(v):
            v = v(scope)
        c[k] = v
    return c

The collection of statistical data needs to be fast, as close to unnoticeable as possible to the host program. That requires us to minimize I/O,[...]
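Putting the gathering and extrapolation halves together, a reporting tool might read the data like this (again with made-up names, using the extrapolate_statistics function above):

import logging
import time

if not hasattr(logging, 'statistics'):
    logging.statistics = {}

appstats = logging.statistics.setdefault('My WSGI App', {})
appstats.setdefault('Start', time.time())
appstats['Requests'] = 42  # stand-in for a real counter
appstats['Requests per Second'] = (
    lambda s: s['Requests'] / (time.time() - s['Start']))

# extrapolate_statistics calls the lambda with its enclosing scope and
# returns plain data, safe to format for humans or serve as JSON.
report = extrapolate_statistics(logging.statistics)
print(report['My WSGI App']['Requests per Second'])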

A replacement for sessions


I'm tired of sessions. They lock for too long, reducing concurrency, and in my current case, don't fail gracefully when a request takes longer than the session timeout.

Problem: Session locks

Session implementations typically lock very near the beginning of a request, and unlock near the end of a request. They tend to do this even if the current request handler does no writing to the session. Why so aggressive? Because the typical test case trotted out for sessions is that of a page hit counter: session.counter += 1. What if the user opens two tabs pointing at the same page at once? The count might be off by one! But if you don't do any counting, what's the benefit of such aggressive, synchronous locking? What we could really use is a system that uses atomic commits instead of large, pessimistic locks.

Problem: Session timeouts

Sessions are often used for sites with thousands, even millions, of users. When any one of those users walks away from their computer, the servers usually try to free up resources by expiring any such inactive sessions. But lots of my admin-y sites have a few dozen users, not thousands. I'm just not that concerned with expiration of session state. I'm a little bit concerned, still, with cookies, so I still want to expire auth tokens. But there's no need to aggressively expire user data.

Yet I find my current apps are so aggressive at expiring data that we frequently get errors in production where request A locked the session, and while it was processing a large job, request B locked the session because A was taking too long. B finishes normally, but then A chokes because it had the session lock forcibly taken away from it. Not fun. What we could really use is a system that allows tokens to expire, or be reused concurrently, without forcing user data to expire or other, concurrent processes to choke.

Problem: Session conflation

Sessions are used for more than one kind of data. In my current apps, they're used to store:

- Cookie tokens. In fact, the session id is the cookie name.
- Common user information, like user id, name, and permissions.
- Workflow state, such as when a user builds up an action over multiple pages using multiple forms.

The problem is that each of these three kinds of data has a different lifecycle. The session id tends to get recreated often as sessions and cookies time out (taking all of the rest of the data with it). The user info tends to change very rarely, being nearly read-only, but is often read on every page request (for example, to display the user's name in a corner, or to apply the user's timezone to time output). Workflow data, in contrast, persists for a few seconds or minutes as the user completes a particular task, and is then discardable at the end of the process; it never needs concurrency isolation, because the user is working synchronously through a single task. Sessions traditionally lump all of these together into a single bag of attributes, and place the entire bag under a single large lock. What we could really use is a solution with finer-grained control over locking for each kind of data, even for each kind of info or workflow.

Solution: Slates

We can achieve all of the above by abandoning sessions. Let's face it: sessions were cool when they were invented, but they're showing their age. And rather than try to patch them up and keep calling them "sessions", I'm inventing something new: "slates". I'm implementing slates in MongoDB, but you don't have to in order to get the benefits of slates.
All you need is some sort of storage that uses atomic commits, and that allows you to partition such that you have a moderate number of "collections" (one for each user, plus a special "_auth" collection), and a moderate number of "documents" (one for each use case) in each collection. Let's look at an example:

$ mongo
MongoDB shell version: 1.6.2
connecting to:
> use slates
switched to db slates
> sh[...]
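To make that shape concrete, here's a minimal sketch of the same idea against modern pymongo. The function and field names are my own illustrations, not the actual slates implementation:

import pymongo

client = pymongo.MongoClient()
db = client.slates

def write_slate(user, use_case, data):
    # One collection per user, one document per use case. An atomic
    # upsert of a single document replaces the whole-session lock.
    db[user].update_one({'_id': use_case}, {'$set': data}, upsert=True)

def read_slate(user, use_case):
    return db[user].find_one({'_id': use_case}) or {}

# Workflow state lives and dies with its task, untouched by auth expiry:
write_slate('alice', 'checkout wizard', {'step': 2, 'cart': ['widget']})
print(read_slate('alice', 'checkout wizard'))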

Shoji Catalog Protocol version 2


I've updated the Shoji Catalog Protocol to draft version 02. See

The only significant change is that shojiCatalogs, shojiFragments, and shojiViews elements now use an object instead of an array for their IRIs. That is, instead of:

{"element": "shoji:catalog",
 "self": "",
 "catalogs": ["bills", "sellers", "sellers{?sold_count}"],

one would now write something like:

{"element": "shoji:catalog",
 "self": "",
 "catalogs": {"bills": "bills",
              "sellers": "sellers",
              "sellers by sold count": "sellers{?sold_count}"

This allows clients to bind to a more meaningful name across varying documents rather than a potentially opaque and varying URI. In this way, the names function somewhat like link relation types (e.g. the "rel" attributes in HTML, or the relation types in Link headers).
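A quick sketch of what that buys a client (the parsing below assumes the example document above, with its elided "self" IRI left empty):

import json

doc = json.loads('''
{"element": "shoji:catalog",
 "self": "",
 "catalogs": {"bills": "bills",
              "sellers": "sellers",
              "sellers by sold count": "sellers{?sold_count}"}}
''')

# The client binds to the stable, meaningful name; the name survives
# even if the server later changes the URI template behind it.
uri = doc["catalogs"]["sellers by sold count"]
print(uri)  # -> sellers{?sold_count}, resolved against doc["self"]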

Zen of CherryPy video


My PyCon 2010 talk video is up. Enjoy: The Zen of CherryPy