
Joe Shaw





Updated: 2016-08-30T16:13:43-04:00

 



Revisiting context and http.Handler for Go 1.7

2016-08-30T16:10:00-04:00

Go 1.7 was released earlier this month, and the thing I’m most excited about is the incorporation of the context package into the Go standard library. Previously it lived in the golang.org/x/net/context package. With the move, other packages within the standard library can now use it. The net package’s Dialer and os/exec package’s Command can now utilize contexts for easy cancelation. More on this can be found in the Go 1.7 release notes.

Go 1.7 also brings contexts to the net/http package’s Request type for both HTTP clients and servers. Last year I wrote a post about using context.Context with http.Handler when it lived outside the standard library, but Go 1.7 makes things much simpler and thankfully renders all of the approaches from that post obsolete.

A quick recap

I suggest reading my original post for more background, but one of the main uses of context.Context is to pass around request-scoped data: things like request IDs, authenticated user information, and other data useful for handlers and middleware to examine in the scope of a single HTTP request. In that post I examined three different approaches for incorporating context into requests. Since contexts are now attached to http.Request values, this is no longer necessary. As long as you’re willing to require at least Go 1.7, it’s now possible to use the standard http.Handler interface and common middleware patterns with context.Context!

The new approach

Recall that the http.Handler interface is defined as:

    type Handler interface {
        ServeHTTP(ResponseWriter, *Request)
    }

Go 1.7 adds new context-related methods on the *http.Request type:

    func (r *Request) Context() context.Context
    func (r *Request) WithContext(ctx context.Context) *Request

The Context method returns the current context associated with the request. The WithContext method creates a new Request value with the provided context.

Suppose we want each request to have an associated ID, pulling it from the X-Request-ID HTTP header if present, and generating it if not. We might implement the context functions like this:

    type key int

    const requestIDKey key = 0

    func newContextWithRequestID(ctx context.Context, req *http.Request) context.Context {
        reqID := req.Header.Get("X-Request-ID")
        if reqID == "" {
            reqID = generateRandomID()
        }
        return context.WithValue(ctx, requestIDKey, reqID)
    }

    func requestIDFromContext(ctx context.Context) string {
        return ctx.Value(requestIDKey).(string)
    }

We can implement middleware that derives a new context with a request ID, creates a new Request value from it, and passes it on to the next handler in the chain:

    func middleware(next http.Handler) http.Handler {
        return http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
            ctx := newContextWithRequestID(req.Context(), req)
            next.ServeHTTP(rw, req.WithContext(ctx))
        })
    }

The final handler and any middleware lower in the chain have access to all of the request-scoped data set by middleware above it:

    func handler(rw http.ResponseWriter, req *http.Request) {
        reqID := requestIDFromContext(req.Context())
        fmt.Fprintf(rw, "Hello request ID %v\n", reqID)
    }

And that’s it! It’s no longer necessary to implement custom context handlers, adapters to standard http.Handler implementations, or hackily wrap http.ResponseWriter. Everything you need is in the standard library, right there on the *http.Request type. [...]
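The post leaves generateRandomID undefined and does not show the pieces wired into a server. As a minimal, hedged sketch of my own (the random-ID helper and the ":8080" address are illustrative assumptions, not from the original), the middleware and handler above could be assembled like this:

    // Sketch only: wires the post's middleware and handler into a runnable server.
    package main

    import (
        "context"
        "crypto/rand"
        "encoding/hex"
        "fmt"
        "log"
        "net/http"
    )

    type key int

    const requestIDKey key = 0

    // generateRandomID is a stand-in implementation; the post does not define it.
    func generateRandomID() string {
        b := make([]byte, 8)
        rand.Read(b) // error ignored for brevity in this sketch
        return hex.EncodeToString(b)
    }

    func newContextWithRequestID(ctx context.Context, req *http.Request) context.Context {
        reqID := req.Header.Get("X-Request-ID")
        if reqID == "" {
            reqID = generateRandomID()
        }
        return context.WithValue(ctx, requestIDKey, reqID)
    }

    func requestIDFromContext(ctx context.Context) string {
        return ctx.Value(requestIDKey).(string)
    }

    func middleware(next http.Handler) http.Handler {
        return http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
            ctx := newContextWithRequestID(req.Context(), req)
            next.ServeHTTP(rw, req.WithContext(ctx))
        })
    }

    func handler(rw http.ResponseWriter, req *http.Request) {
        reqID := requestIDFromContext(req.Context())
        fmt.Fprintf(rw, "Hello request ID %v\n", reqID)
    }

    func main() {
        // Wrap the final handler with the request-ID middleware.
        log.Fatal(http.ListenAndServe(":8080", middleware(http.HandlerFunc(handler))))
    }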



On Issue 3, Ohio's Proposed Marijuana Legalization Amendment

2015-11-02T10:00:52-05:00

Tomorrow Ohioans will vote on Issue 3, a proposed amendment to the Ohio state constitution that would legalize marijuana for recreational use. Although I support legalization, I am voting against Issue 3 and urge others to do the same.

Below are some of the reasons why I am voting no.

It establishes a cartel for all commercial production.

The amendment gives special power to 10 private corporations, which will control all commercial production in the state. The owners – not all of whom are publicly known – are wealthy donors to the pro-amendment campaign.

This amendment does not set up a highly regulated but free market. It does not permit a competitive, level playing field among entrepreneurs. It simply allows a few rich people to get substantially richer by exploiting the demand for marijuana.

As a constitutional amendment, it is not subject to judicial review or legislative adjustment.

This is a very intentional move by those who crafted the amendment. By being inserted into the state constitution, this cartel structure and implementation are elevated above the state legislature and the state supreme court.

These branches of government, while imperfect, exist to serve the people of the state. This amendment strips away too much of that power in order to reinforce the profitability of the few wealthy individuals who comprise this cartel.

The legislature (or the people, via ballot initiative) should have the ability to change the implementation of marijuana legalization in the future, as we see what works and what doesn’t. The courts should have the ability to interpret these laws to ensure fair and just treatment of Ohioans.

It is a misuse of the constitutional amendment process.

The constitution is – or ought to be – a foundational document. It should lay out the structure of governance of the state. It should not be used to override the legislature, the executive, and the judiciary by special interests and implemented by mob rule.

The casino cartel amendment and the same-sex marriage ban are perfect examples of past abuses of the amendment process. By contrast, Issue 1 this year, which addresses how congressional districts are drawn, is a perfect example of a proper use of constitutional amendments.

I have a lot of issues with the Ohio constitution (like electing judges) but the worst of it is that it is far, far too easy to amend.

Massachusetts decriminalized and Washington legalized marijuana through regular ballot initiatives, not constitutional amendments. Ohio should legalize marijuana this way. It would give the legislature (or the people, through further ballot initiatives) the ability and responsibility to adjust the laws governing marijuana production, regulation, and executive implementation. It would give the judiciary the ability to review the legality and justice of these laws. And most importantly, it would not permanently enshrine in the state’s highest legal document which few individuals and corporations get to set policy for, and profit from, recreational marijuana.

Please vote no on Issue 3.




Smaller Docker containers for Go apps

2015-07-31T14:00:00-04:00

At litl we use Docker images to package and deploy our Room for More services, using our Galaxy deployment platform. This week I spent some time looking into how we might reduce the size of our images and speed up container deployments.

Most of our services are in Go, and thanks to the fact that compiled Go binaries are mostly statically linked by default, it’s possible to create containers with very few files within. It’s surely possible to use these techniques to create tighter containers for other languages that need more runtime support, but for this post I’m only focusing on Go apps.

The old way

We built images in a very traditional way, using a base image built on top of Ubuntu with Go 1.4.2 installed. For my examples I’ll use something similar. Here’s a Dockerfile:

    FROM golang:1.4.2

    EXPOSE 1717
    RUN go get github.com/joeshaw/qotd

    # Don't run network servers as root in Docker
    USER nobody

    CMD qotd

The golang:1.4.2 base image is built on top of Debian Jessie. Let’s build this bad boy and see how big it is.

    $ docker build -t qotd .
    ...
    Successfully built ae761b93e656

    $ docker images qotd
    REPOSITORY   TAG      IMAGE ID       CREATED         VIRTUAL SIZE
    qotd         latest   ae761b93e656   3 minutes ago   520.3 MB

Yikes. Half a gigabyte. Ok, what leads us to a container this size?

    $ docker history qotd
    IMAGE          CREATED BY                                      SIZE
    ae761b93e656   /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "qotd"]   0 B
    b77d0ca3c501   /bin/sh -c #(nop) USER [nobody]                 0 B
    a4b2a01d3e42   /bin/sh -c go get github.com/joeshaw/qotd       3.021 MB
    c24802660bfa   /bin/sh -c #(nop) EXPOSE 1717/tcp               0 B
    124e2127157f   /bin/sh -c #(nop) COPY file:56695ddefe9b0bd83   2.481 kB
    69c177f0c117   /bin/sh -c #(nop) WORKDIR /go                   0 B
    141b650c3281   /bin/sh -c #(nop) ENV PATH=/go/bin:/usr/src/g   0 B
    8fb45e60e014   /bin/sh -c #(nop) ENV GOPATH=/go                0 B
    63e9d2557cd7   /bin/sh -c mkdir -p /go/src /go/bin && chmod    0 B
    b279b4aae826   /bin/sh -c #(nop) ENV PATH=/usr/src/go/bin:/u   0 B
    d86979befb72   /bin/sh -c cd /usr/src/go/src && ./make.bash    97.4 MB
    8ddc08289e1a   /bin/sh -c curl -sSL https://golang.org/dl/go   39.69 MB
    8d38711ccc0d   /bin/sh -c #(nop) ENV GOLANG_VERSION=1.4.2      0 B
    0f5121dd42a6   /bin/sh -c apt-get update && apt-get install    88.32 MB
    607e965985c1   /bin/sh -c apt-get update && apt-get install    122.3 MB
    1ff9f26f09fb   /bin/sh -c apt-get update && apt-get install    44.36 MB
    9a61b6b1315e   /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
    902b87aaaec9   /bin/sh -c #(nop) ADD file:e1dd18493a216ecd0c   125.2 MB

This is not a very lean container, with a lot of intermediate layers. To reduce the size of our containers, we did two additional steps:

(1) Every repo has a clean.sh script that is run inside the container after it is initially built. Here’s part of a script for one of our Ubuntu-based Go images:

    apt-get purge -y software-properties-common byobu curl git htop man unzip vim \
        python-dev python-pip python-virtualenv python-dev python-pip python-virtualenv \
        python2.7 python2.7 libpython2.7-stdlib:amd64 libpython2.7-minimal:amd64 \
        libgcc-4.8-dev:amd64 cpp-4.8 libruby1.9.1 perl-modules vim-runtime \
        vim-common vim-tiny libpython3.4-stdlib:amd64 python3.4-minimal xkb-data \
        xml-core libx11-data fonts-dejavu-core groff-base eject python3 locales \
        python-software-properties supervisor git-core make wget cmake gcc bzr mercurial \
        libglib2.0-0:amd64 libxml2:amd64

    apt-get clean autoclean
    apt-get autoremove -y

    rm -rf /usr/local/go
    rm -rf /usr/local/go1.*.linux-amd64.tar.gz
    rm -rf /var/lib/{apt,dpkg,cache,log}/
    rm -rf /var/{cache,log}

(2) We run Jason Wilder’s excellent docker-squash tool. It is especially helpful when combined with the clean.sh s[...]
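The post is truncated at this point. As a hedged sketch of where this technique generally ends up (a statically linked Go binary copied into an essentially empty image), and not necessarily the author’s exact recipe, a minimal Dockerfile might look like this:

    # Sketch only. Build a static binary on the host first, for example:
    #   CGO_ENABLED=0 go build -a -o qotd github.com/joeshaw/qotd
    FROM scratch
    ADD qotd /qotd
    EXPOSE 1717
    CMD ["/qotd"]

Because the image contains little besides the binary itself, its size is dominated by the few megabytes of the Go executable rather than hundreds of megabytes of base-image layers.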



Go's net/context and http.Handler

2016-08-30T16:10:00-04:00

The approaches in this post are now obsolete thanks to Go 1.7, which adds the context package to the standard library and uses it in the net/http *http.Request type. The background info here may still be helpful, but I wrote a follow-up post that revisits things for Go 1.7 and beyond.

A summary of this post is available in Japanese thanks to @craftgear. こちらに @craftgearによる日本語の要約があります。

The golang.org/x/net/context package (hereafter referred to as net/context, although it’s not yet in the standard library) is a wonderful tool for the Go programmer’s toolkit. The blog post that introduced it shows how useful it is when dealing with external services and the need to cancel requests, set deadlines, and send along request-scoped key/value data. The request-scoped key/value data also makes it very appealing as a means of passing data around through middleware and handlers in Go web servers. Most Go web frameworks have their own concept of context, although none yet use net/context directly. Questions about using net/context for this kind of server-side context keep popping up on the /r/golang subreddit and the Gophers Slack community. Having recently ported a fairly large API surface from Martini to http.ServeMux and net/context, I hope this post can answer those questions.

About http.Handler

The basic unit in Go’s HTTP server is its http.Handler interface, which is defined as:

    type Handler interface {
        ServeHTTP(ResponseWriter, *Request)
    }

http.ResponseWriter is another simple interface, and http.Request is a struct that contains data corresponding to the HTTP request: things like the URL, headers, body if any, etc. Notably, there’s no way to pass anything like a context.Context here.

About context.Context

Much more detail about contexts can be found in the introductory blog post, but the main aspect I want to call attention to in this post is that contexts are derived from other contexts. Context values become arranged as a tree, and you only have access to values set on your context or one of its ancestor nodes.

For example, let’s take context.Background() as the root of the tree, and derive a new context by attaching the content of the X-Request-ID HTTP header:

    type key int

    const requestIDKey key = 0

    func newContextWithRequestID(ctx context.Context, req *http.Request) context.Context {
        return context.WithValue(ctx, requestIDKey, req.Header.Get("X-Request-ID"))
    }

    func requestIDFromContext(ctx context.Context) string {
        return ctx.Value(requestIDKey).(string)
    }

    ctx := context.Background()
    ctx = newContextWithRequestID(ctx, req)

This derived context is the one we would then pass to the next layer of the system. Perhaps that would create its own contexts with values, deadlines, or timeouts, or it could extract values we previously stored.

Approaches

These approaches are now obsolete as of Go 1.7. Read my follow-up post that revisits this topic for Go 1.7 and beyond.

So, without direct support for net/context in the standard library, we have to find another way to get a context.Context into our handlers. There are three basic approaches:

Use a global request-to-context mapping
Create an http.ResponseWriter wrapper struct
Create your own handler types

Let’s examine each.

Global request-to-context mapping

In this approach we create a global map of requests to contexts, and wrap our handlers in a middleware that handles the lifetime of the context associated with a request. This is the approach taken by Gorilla’s context package, although with its own context type rather than net/context.
Because every HTTP request is processed in its own goroutine and Go’s maps are not safe for concurrent access (a deliberate performance trade-off), it is crucial that we protect all map accesses with a sync.Mutex. This also introduces lock contention among concurrently processed requests. Depending on your application and workload, this could become a bottleneck. [...]
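As a rough sketch of this request-to-context mapping (my own illustration, not code from the post or from Gorilla’s package), the map, its mutex, and the middleware that manages the context’s lifetime might look something like this:

    // Sketch of the global request-to-context mapping approach.
    package ctxmap

    import (
        "net/http"
        "sync"

        "golang.org/x/net/context"
    )

    var (
        mu       sync.Mutex
        contexts = map[*http.Request]context.Context{}
    )

    // Set associates ctx with req.
    func Set(req *http.Request, ctx context.Context) {
        mu.Lock()
        defer mu.Unlock()
        contexts[req] = ctx
    }

    // Get returns the context for req, or context.Background() if none is set.
    func Get(req *http.Request) context.Context {
        mu.Lock()
        defer mu.Unlock()
        if ctx, ok := contexts[req]; ok {
            return ctx
        }
        return context.Background()
    }

    // Middleware creates a context for each request and removes it when the
    // request is done, so entries don't leak from the global map.
    func Middleware(next http.Handler) http.Handler {
        return http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
            Set(req, context.Background())
            defer func() {
                mu.Lock()
                delete(contexts, req)
                mu.Unlock()
            }()
            next.ServeHTTP(rw, req)
        })
    }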



Contributing to GitHub projects

2015-04-20T11:20:28-04:00

I often see people asking how to contribute to an open source project on GitHub. Some are new programmers, some may be new to open source, others aren’t programmers but want to make improvements to documentation or other parts of a project they use every day.

Using GitHub means you’ll need to use Git, and that means using the command line. This post gives a gentle introduction using the git command-line tool and a companion tool for GitHub called hub.

Workflow

The basic workflow for contributing to a project on GitHub is:

Clone the project you want to work on
Fork the project you want to work on
Create a feature branch to do your own work in
Commit your changes to your feature branch
Push your feature branch to your fork on GitHub
Send a pull request for your branch on your fork

Clone the project you want to work on

    $ hub clone pydata/pandas

(Equivalent to git clone https://github.com/pydata/pandas.git)

This clones the project from the server onto your local machine. When working in git you make changes to your local copy of the repository. Git has a concept of remotes, which are, well, remote copies of the repository. When you clone a new project, a remote called origin is automatically created that points to the repository you provided on the command line above, in this case pydata/pandas on GitHub. To upload your changes back to the main repository, you push to the remote. Between when you cloned and now, changes may have been made to the upstream remote repository. To get those changes, you pull from the remote.

At this point you will have a pandas directory on your machine. All of the remaining steps take place inside it, so change into it now:

    $ cd pandas

Fork the project you want to work on

The easiest way to do this is with hub:

    $ hub fork

This does a couple of things. It creates a fork of pandas in your GitHub account, and it establishes a new remote in your local repository with the name of your GitHub username. In my case I now have two remotes: origin, which points to the main upstream repository; and joeshaw, which points to my forked repository. We’ll be pushing to my fork.

Create a feature branch to do your own work in

This creates a place to do your work in that is separate from the main code.

    $ git checkout -b doc-work

doc-work is what I’m choosing to name this branch. You can name it whatever you like. Hyphens are idiomatic. Now make whatever changes you want for this project.

Commit your changes to your feature branch

If you are creating new files, you will need to explicitly add them to the to-be-committed list (also called the index, or staging area):

    $ git add file1.md file2.md etc

If you are just editing existing files, you can add them all in one batch:

    $ git add -u

Next you need to commit the changes:

    $ git commit

This will bring up an editor where you type in your commit message. The convention is usually to type a short summary in the first line (50-60 characters max), then a blank line, then additional details if necessary.

Push your feature branch to your fork on GitHub

Ok, remember that your fork is a remote named after your GitHub username (in my case, joeshaw).

    $ git push joeshaw doc-work

This pushes only the doc-work branch to the joeshaw remote. Now your work is publicly visible to anyone on your fork.

Send a pull request for your branch on your fork

You can do this either on the web site or using the hub tool:

    $ hub pull-request

This will open your editor again. If you only had one commit on your branch, the message for the pull request will be the same as the commit. This might be good enough, but you might want to elaborate on the purpose of the pull request. Like commits, the first line is a summary of the pull request and the other lines are the body of the PR. In general you will be requesting a pull from your current branch (in this case doc-work) into the master branch of the origin remote. If[...]



Terrible Vagrant/Virtualbox performance on Mac OS X

2016-03-16T10:00:00-04:00

Update March 2016: There’s a much easier way to enable the host I/O cache from the command line, but it only works for existing VMs. See the update below.

I recently started using Vagrant to test our auto-provisioning of servers with Puppet. Having a simple-yet-configurable system for starting up and accessing headless virtual machines really makes this a much simpler solution than VMware Fusion. (Although I wish Vagrant had a way to take and roll back VM snapshots.)

Unfortunately, as soon as I tried to really do anything in the VM my Mac would completely bog down. Eventually the entire UI would stop updating. In Activity Monitor, the dreaded kernel_task was taking 100% of one CPU, and VBoxHeadless was taking most of another. Things would eventually free up whenever the task in the VM (usually apt-get install or puppet apply) crashed with a segmentation fault.

Digging into this, I found an ominous message in the VirtualBox logs:

    AIOMgr: Host limits number of active IO requests to 16. Expect a performance impact.

Yeah, no kidding. I tracked this message down to the “Use host I/O cache” setting being off on the SATA Controller in the box. (This is a per-VM setting, and I am using the stock Vagrant “lucid64” box, so the exact setting may be somewhere else for you. It’s probably a good idea to turn this setting on for all storage controllers.)

When it comes to Vagrant VMs, this setting in the VirtualBox UI is not very helpful, though, because Vagrant brings up new VMs automatically and without any UI. To get this to work with the Vagrant workflow, you have to do the following hacky steps:

Turn off any IO-heavy provisioning in your Vagrantfile
vagrant up a new VM
vagrant halt the VM
Open the VM in the VirtualBox UI and change the setting
Re-enable the provisioning in your Vagrantfile
vagrant up again

This is not going to work if you have to bring up new VMs often. Fortunately this setting is easy to tweak in the base box. Open up ~/.vagrant.d/boxes/base/box.ovf and find the StorageController node. You’ll see an attribute HostIOCache="false". Change that value to true.

Lastly, you’ll have to update the SHA1 hash of the .ovf file in ~/.vagrant.d/boxes/base/box.mf. Get the new hash by running openssl dgst -sha1 ~/.vagrant.d/boxes/base/box.ovf and replace the old value in box.mf with it.

That’s it. All subsequent VMs you create with vagrant up will now have the right setting.

Update

Thanks to this comment on a Vagrant bug report, you can enable the host cache more simply from the command line for an existing VM:

    VBoxManage storagectl <vm name> --name <controller name> --hostiocache on

where <vm name> is your Vagrant VM name, which you can get from VBoxManage list vms, and <controller name> is probably "SATA Controller". The VM must be halted for this to work.

You can add a section to your Vagrantfile to do this when new VMs are created:

    config.vm.provider "virtualbox" do |v|
      v.customize [ "storagectl", :id, "--name", "SATA Controller", "--hostiocache", "on" ]
    end

And for further reading, here is the relevant section in the VirtualBox manual that goes into more detail about the pros and cons of host I/O caching. [...]
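If you prefer to make the base-box edit from a shell, here is a compact sketch (assuming the stock paths mentioned above; the printed hash still has to be pasted into box.mf by hand):

    # Flip HostIOCache to true in the base box (BSD/macOS sed syntax).
    sed -i '' 's/HostIOCache="false"/HostIOCache="true"/' ~/.vagrant.d/boxes/base/box.ovf

    # Print the new SHA1 of the .ovf; copy it over the old value in box.mf.
    openssl dgst -sha1 ~/.vagrant.d/boxes/base/box.ovf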



Linux input ecosystem

2010-10-01T15:27:24-04:00

Over the past couple of days, I’ve been trying to figure out how input in Linux works on modern systems. There are lots of small pieces at various levels, and it’s hard to understand how they all interact. Things are not helped by the fact that things have changed quite a bit over the past couple of years as HAL – which I helped write – has been giving way to udev, and existing literature is largely out of date. This is my attempt at understanding how things work today, in the Ubuntu Lucid release.

Kernel

In the Linux kernel’s input system, there are two pieces: the device driver and the event driver. The device driver talks to the hardware, obviously. Today, for most USB devices this is handled by the usbhid driver. The event drivers handle how to expose the events generated by the device driver to userspace. Today this is primarily done through evdev, which creates character devices (typically named /dev/input/eventN) and communicates with them through struct input_event messages. See include/linux/input.h for its definition. A great tool to use for getting information about evdev devices and events is evtest. A somewhat outdated but still relevant description of the kernel input system can be found in the kernel’s Documentation/input/input.txt file.

udev

When a device is connected, the kernel creates an entry in sysfs for it and generates a hotplug event. That hotplug event is processed by udev, which applies some policy, attaches additional properties to the device, and ultimately creates a device node for you somewhere in /dev. For input devices, the rules in /lib/udev/rules.d/60-persistent-input.rules are executed. Among the things it does is run a /lib/udev/input_id tool which queries the capabilities of the device from its sysfs node and sets environment variables like ID_INPUT_KEYBOARD, ID_INPUT_TOUCHPAD, etc. in the udev database. For more information on input_id see the original announcement email to the hotplug list.

X

X has a udev config backend which queries udev for the various input devices. It does this at startup and also watches for hotplugged devices. X looks at the different ID_INPUT_* properties to determine whether it’s a keyboard, a mouse, a touchpad, a joystick, or some other device. This information can be used in /usr/lib/X11/xorg.conf.d files, in the form of MatchIsPointer, MatchIsTouchpad, MatchIsJoystick, etc. in InputClass sections, to see whether to apply configuration to a given device.

Xorg has a handful of its own drivers to handle input devices, including evdev, synaptics, and joystick. And here is where things start to get confusing. Linux has this great generic event interface in evdev, which means that very few drivers are needed to interact with hardware, since they’re not speaking device-specific protocols. Of the few needed on Linux nearly all of them speak evdev, including the three I listed above.

The evdev driver provides basic keyboard and mouse functionality, speaking – obviously – evdev through the /dev/input/eventN devices. It also handles things like the lid and power switches. This is the basic, generic input driver for Xorg on Linux.

The synaptics driver is the most confusing of all. It also speaks evdev to the kernel. On Linux it does not talk to the hardware directly, and is in no way Synaptics(tm) hardware-specific. The synaptics driver is simply a separate driver from evdev which adds a lot of features expected of touchpad hardware, for example two-finger scrolling. It should probably be renamed the “touchpad” module, except that on non-Linux OSes it can still speak the Synaptics protocol.

The joystick driver similarly handles joysticky things, but speaks evdev to the kernel rather than some device-specific protocol.

X only has concepts of keyboards and pointers, the latter of which includes mice, touchpads, joysticks, wacom tablets, etc. X al[...]
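The InputClass matching described above can be illustrated with a small snippet. This is a hedged example of my own, not from the post; the option shown is the two-finger scrolling feature mentioned for the synaptics driver:

    # Illustrative file, e.g. /usr/lib/X11/xorg.conf.d/50-touchpad-example.conf
    Section "InputClass"
        Identifier "touchpad settings"
        MatchIsTouchpad "on"
        Driver "synaptics"
        # Enable two-finger vertical scrolling on any matched touchpad
        Option "VertTwoFingerScroll" "on"
    EndSection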



AVCHD to MP4/H.264/AAC conversion

2010-04-10T10:28:03-04:00

For posterity:

I have a Canon HF200 HD video camera, which records to AVCHD format. AVCHD is H.264 encoded video and AC-3 encoded audio in a MPEG-2 Transport Stream (m2ts, mts) container. This format is not supported by Aperture 3, which I use to store my video.

With Blizzard’s help, I figured out an ffmpeg command-line to convert to H.264 encoded video and AAC encoded audio in an MPEG-4 (mp4) container. This is supported by Aperture 3 and other Quicktime apps.

$ ffmpeg -sameq -ab 256k -i input-file.m2ts -s hd1080 output-file.mp4 -acodec aac

Command-line order is important, which is infuriating. If you move the -s or -ab arguments, they may not work. Add -deinterlace if the source videos are interlaced, which mine were originally until I turned it off. The only downside to this is that it generates huge output files, on the order of 4-5x greater than the input file.

Update, 28 April 2010: Alexander Wauck emailed me to say that re-encoding the video isn’t necessary, and that the existing H.264 video could be moved from the m2ts container to the mp4 container with a command-line like this:

$ ffmpeg -i input-file.m2ts -ab 256k -vcodec copy -acodec aac output-file.mp4

And he’s right… as long as you don’t need to deinterlace the video. With the whatever-random-ffmpeg-trunk checkout I have, adding -deinterlace to the command-line segfaults. I actually had tried -vcodec copy early in my experiments but abandoned it after I found that it didn’t deinterlace. I had forgotten to try it again after I moved past my older interlaced videos. Thanks Alex!




Real-time MBTA bus location + Google Maps mashup

2009-11-15T23:31:48-05:00

This weekend I read that the MBTA and Massachusetts Department of Transportation had released a trial real-time data feed for the positioning of vehicles on five of its bus routes. This is very important data to have, and while obviously everyone would like to see more routes added, it’s a start. I decided to hack together a mashup of this data with Google Maps, to see how easy it would be. In the end it took me a few hours on Saturday to get the site up and running, and a couple more on Sunday adding features like the drawing of routes on the map, colorizing markers for inbound vs. outbound buses, and adding reverse geocoding of the buses themselves. To do this I used three technologies (Google App Engine, JQuery, Google Maps) and two data sources (the real-time XML feed and the MBTA Google Transit Feed Specification files).

Google App Engine

App Engine is so perfectly suited for smaller, playtime hacks like this that it’s hard to imagine how anyone got anything done before it existed. The tedious, up-front bootstrapping that is required in so many programming projects has been enough to completely turn me off to small, spare-time hacking projects on occasion in the past. The brilliance behind a hosted software environment is obvious, but the amount of work to build a safe, hosted system with a fairly comprehensive set of APIs seems to be such a mountain of work that in many ways I find it surprising that anyone – even, perhaps especially, Google – built it at all. I chose the Python SDK and the programming was straightforward and easy. It takes some elements from Django, with which I am familiar from work.

JQuery

A no-brainer. Hands down the best JavaScript toolkit available. Making the AJAX calls to get route and vehicle location information was a breeze, and the transparent handling of the XML data of the real-time feed prevents me from losing the will to live – a common feeling when dealing with XML. My only complaint is with the documentation. While the API reference is good for any given piece of the API, the examples are a little light and there is absolutely zero cross-referencing to other parts, especially ones not a part of JQuery itself. It was not obvious, for example, how to deal with the XML document returned by the AJAX call. It sounds like the docs are getting some work, though, so this will hopefully improve.

Google Maps

This was my first endeavor with the Maps API, and it’s good. It’s not the best API in the world, but it’s hardly the worst either. Adding markers of different colors is annoying, but not so onerous as to make it tedious. The breadth of functionality provided is impressive, but then again it has been around for a few years at this point. Markers are easy to add, drawing the route map is absolutely trivial with a KML file, and even the reverse geocoding – which gives you a street address given a latitude/longitude pair – is straightforward. The docs suck, though. There’s no indication that a size or anchor position is required when creating an icon for a custom marker – required for colors other than red – and due to the minified JS files tracking down that error took longer than any other task in the project. Reverse geocoding mentions that a Placemark object will be returned, but that class doesn’t appear anywhere in the reference documentation.

Real-time data feed

Lots to like. Straightforward, easy to parse. It’d be nice if I didn’t have to do the reverse geocoding to figure out what the street address is, but it’s not a dealbreaker. Main downside is that it’s XML as opposed to JSON. And of course, it’s only 5 bus routes and zero subway and commuter rail routes.

MBTA Google Transit Feed Specification files

A comprehensive set of data describing every transit route, every stop, and e[...]



Python daemon threads considered harmful

2015-06-02T16:48:10-04:00

Update April 2015: Reading it again years later, I regret the tone of this post. I was frustrated at the time and it comes across now as just smarmy. Still, I stand by the principal idea: that you should avoid Python’s daemon threads if you can.

Update June 2015: This is Python bug 1856. It was fixed in Python 3.2.1 and 3.3, but the fix was never backported to 2.x. (An attempt to backport to the 2.7 branch caused another bug and it was abandoned.) Daemon threads may be ok in Python >= 3.2.1, but definitely aren’t in earlier versions.

The other day at work we encountered an unusual exception in our nightly pounder test run after landing some new code to expose some internal state via a monitoring API. The problem occurred on shutdown. The new monitoring code was trying to log some information, but was encountering an exception. Our logging code was built on top of Python’s logging module, and we thought perhaps that something was shutting down the logging system without us knowing. We ourselves never explicitly shut it down, since we wanted it to live until the process exited.

The monitoring was done inside a daemon thread. The Python docs say only:

    A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.

Which sounds pretty good, right? This thread is just occasionally grabbing some data, and we don’t need to do anything special when the program shuts down. Yeah, I remember when I used to believe in things too.

Despite a global interpreter lock that prevents Python from being truly concurrent anyway, there is a very real possibility that daemon threads can still execute after the Python runtime has started its own tear-down process. One step of this process appears to be to set the values inside globals() to None, meaning that any module resolution results in an AttributeError when attempting to dereference NoneType. Other variations on this cause TypeError to be thrown.

The code which triggered this looked something like this, although with more abstraction layers which made hunting it down a little harder:

    try:
        log.info("Some thread started!")
        try:
            do_something_every_so_often_in_a_loop_and_sleep()
        except somemodule.SomeException:
            pass
        else:
            pass
    finally:
        log.info("Some thread exiting!")

The exception we were seeing was an AttributeError on the last line, the log.info() call. But that wasn’t even the original exception. It was actually another AttributeError caused by the somemodule.SomeException dereference. Because all the modules had been reset, somemodule was None too.

Unfortunately the docs are completely devoid of this information, at least in the threading sections which you would actually reference. The best information I was able to find was this email to python-list a few years back, and a few other emails which don’t really put the issue front and center.

In the end the solution for us was simply to make them non-daemon threads, notice when the app is being shut down, and join them to the main thread. Another possibility for us was to catch AttributeError in our thread wrapper class – which is what the author of the aforementioned email does – but that seems like papering over a real bug and a real error.

Because of this misbehavior, daemon threads lose almost all of their appeal, but oddly I can’t find people really publicly saying “don’t use them” except in scattered emails. It seems like it’s underground information known only to the Python cabal. (There is no cabal.) So, I am going to say it. When I went searching there weren’t any helpful hints in a Google search of “python daemon threads considered harmful”. So, I am staking [...]
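As a minimal sketch of that fix (my own illustration, not the author’s code; the logging calls stand in for the real monitoring work), a non-daemon monitoring thread joined at shutdown might look like this:

    import logging
    import threading
    import time

    log = logging.getLogger("monitor")
    shutdown = threading.Event()

    def monitor():
        log.info("Monitor thread started")
        # Loop until the main thread signals shutdown; Event.wait() doubles as
        # the periodic sleep but returns immediately once the event is set.
        while not shutdown.is_set():
            log.info("collecting internal state...")  # stand-in for real work
            shutdown.wait(timeout=60)
        log.info("Monitor thread exiting")

    if __name__ == "__main__":
        logging.basicConfig(level=logging.INFO)
        t = threading.Thread(target=monitor)  # daemon defaults to False
        t.start()
        time.sleep(2)  # stand-in for the application's real work
        # At shutdown: signal the thread and join it before the interpreter tears down.
        shutdown.set()
        t.join()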