Subscribe: chrishowie.com
http://www.chrishowie.com/feed/

blog.chrishowie.com



The best laid plans are in my other pants



Last Build Date: Sun, 24 Aug 2014 22:54:00 +0000

 



My thoughts on the ALS challenge

Sun, 24 Aug 2014 22:54:00 +0000

Before you read this entry, please be aware that it might rub you the wrong way. (That's okay with me if it's okay with you. I don't expect that everyone will share my perspective.) There's always the one guy who has to be the party pooper, and I guess it's my turn.

Ever since hearing about the ALS challenge I've felt a bit weird about it. With every video I watched I got a bit more upset and I couldn't quite put my finger on the cause. It seems like a very harmless way to raise money for charity. I don't have a problem with people trying to raise money for charity. After considerable thought and some discussion with friends I have figured out why this particular challenge rubs me the wrong way. I have a problem with the mechanisms by which it has spread from person to person, in particular that it involves challenging specific people on a public forum by name.

Ultimately there are three reasons why people might participate after being named publicly:

  1. They like being part of something big, especially if it helps people. Your natural team player, they will jump on the challenge simply because they want to be involved in it.
  2. They feel guilty at the thought of not participating. After being called out by someone else, they can't help but join in lest they feel like a grinch. The 24-hour time limit on the challenge serves to exacerbate the guilt and get people to react quickly without taking any time to think about what they are doing.
  3. ALS research is their top priority and is something they genuinely care about.

Some people may fit into all of these groups. I don't really fit into any of them. I've never been much of a team player to this extent; I mean, I work fine with others and even enjoy it, but I'm not a bandwagon-jumper. Neither am I one to succumb to social pressure or guilting; I vehemently resist it. (This will be no surprise to those who know me.)

Ultimately, if I donate to a charitable cause it will be because I have given it considerable thought, and because I have considered it in contrast to other charities. It's not that I don't think ALS is a terrible condition, it's just that there are other charities to whom I would prefer to give. If that is truly where you want to give then more power to you. I don't think someone saying my name in a video is a good decision-making process for deciding when, how much, and to what charitable organization I should give. On the contrary, bandwagon mentality and/or guilt trips are extremely poor reasons to give. I don't believe that most people actually become part of the third group in 24 hours.

Now here is where I'm going to step on some toes (if I haven't already), and while I don't want to, I can't really help how I feel about this. I've been having these thoughts long before I was ever nominated to participate in the challenge, and I was really hoping that nobody would mention me because I didn't want to say these things.

By nominating someone, you are making the following assumptions:

  • You assume that they have either the money or the water to participate. I have several friends on the west coast who don't have the funds to participate right now, and many parts of the west coast are under a severe drought. They can't spare either of the resources this challenge calls for.
  • You assume that ALS research is one of their top priorities. Maybe it is and maybe it isn't. Maybe there is something else they have judged more important to them. This isn't a decision for you to make on their behalf.
  • You assume that they would actually like to participate in the first place.

By nominating someone, this is what you are saying: "I don't know if you have the money or water to participate, but since I've said your name you are now socially obligated to give this certain amount of money to charity, or waste water. You must provide us with proof of either action. If you don't then you will be judged."

If you've done your share of nominating you may be rejecting this interpretation because it's not what you meant. But -- and I cannot stress this enough -- it is [...]



SGDQ 2013

Thu, 25 Jul 2013 22:55:00 +0000

Summer Games Done Quick 2013, a video game speedrunning marathon for charity, is underway. 100% of donations go to Doctors Without Borders. Please consider donating!

My brother will be speedrunning The Addams Family: Pugsley's Scavenger Hunt (SNES) on Sunday, 2013-07-28, at approximately 10:05 AM EDT (GMT-4). The official estimate for his run is 35 minutes, but it should take no longer than 25. (There is a full schedule on their website. Maybe there are other games you'll want to tune in for!)




New blog software

Tue, 23 Jul 2013 13:19:00 +0000

I have been migrating my primary server from VPS.NET to Linode, because I can get better specs for the same rate and Linode's disk I/O performance is considerably (5-6x) better. One of the goals of this migration was to avoid deploying Apache or MySQL on the new server. I have been trying to stick to nginx and PostgreSQL for a variety of reasons, and I really don't want two database servers running.

To that end, I've had to dump Wordpress since the only officially-supported database server is MySQL, and PostgreSQL support doesn't seem to be a priority. Nobody is actively working on it from what I can see. (There was some discussion around supporting PostgreSQL, but that was seven years ago, and the history of the Using Alternative Databases page on the Codex shows that interest apparently declined in 2008.)

The replacement I've chosen is Pelican, which is a static blog generator. This significantly lowers the bar for dependencies, as the hosting environment need only support serving static files.

It does mean that comments are not something I get for free, but Pelican does support comments via Disqus. I have been working on importing existing comments, but until that work is complete I will be leaving comment support disabled.
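For anyone curious what the Pelican side looks like, a minimal pelicanconf.py is roughly all there is to it. The sketch below uses placeholder values rather than my actual configuration, and it assumes the stock theme, which (as I understand it) renders a Disqus thread when DISQUS_SITENAME is set.

    # Minimal Pelican configuration sketch; all values here are placeholders,
    # not my real settings.
    AUTHOR = 'Chris Howie'
    SITENAME = 'blog.chrishowie.com'
    SITEURL = 'http://blog.chrishowie.com'

    PATH = 'content'                # content directory (can also be given on the command line)
    TIMEZONE = 'America/New_York'   # placeholder
    DEFAULT_LANG = 'en'

    # To be enabled once the comment import is finished; the stock theme
    # picks this up and embeds Disqus comments on each article.
    #DISQUS_SITENAME = 'example-shortname'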




Lock-free clustering of large PostgreSQL data sets

Fri, 15 Feb 2013 13:45:00 +0000

Since 2012-09-27, I have been collecting overstock data from TF2WH every five minutes and storing a snapshot of this data in my database. This enables me to do some really cool things, such as chart past data as well as make projections about future stock levels. In that time, data collection has proceeded largely uninterrupted, and today I have just shy of 40 million individual records.

However, the optimal physical layout of this data is to have the records for each item type grouped together, sorted by timestamp. Due to the nature of time, these records wind up being grouped first by timestamp, and then by item type. This can make lookups of data for a particular item slow, since many database pages need to be visited in order to collect all of the data about a particular item.

PostgreSQL (among other databases, of course) has two primary solutions to deal with this. The simplest one is the CLUSTER statement, which will reorganize all of the data in a table based on an index, such that lookups making use of that index are quick. Since the table's primary key is composite around the item type and timestamp, this is convenient and does the right thing. Unfortunately, clustering requires an exclusive lock on the table. Sorting and writing out 40 million rows is not a fast operation, and while this is underway no new data can be added to the table, and nobody can query the table for information. This is not a good situation when new data is coming in every five minutes and users are frequently requesting data from the web application.

The more complex solution is to use table partitioning. This requires that one empty parent table be created, and many child tables be created, each one for a specific item type. The parent table will allow querying across the entire data set, but since each item's data is stored in a different table, clustering is no longer required -- each table will be ordered by timestamp (since new rows are appended) and the separate tables keep the data sets for each item type physically separate on disk. This is a nice solution in theory, but PostgreSQL does not currently provide any mechanism for automatically partitioning a table based on a set of columns; the child tables have to be maintained either by hand or by script. I like to avoid this kind of complexity when possible.

A compromise solution I'd thought of would allow collection of data to proceed during a cluster operation, but bring down the web application for the duration of the cluster. Before the cluster operation begins, I would have the data collection script retarget insertions into a different table. Once the cluster operation finishes, the script would return to inserting data into the primary table, and the contents of the other table would be transferred over. This is simple, but of course we want to keep the web application responsive if at all possible.

After asking in #postgresql I received some suggestions, and wound up implementing a modified version of one of them. This one allows me to effectively cluster the data set while allowing data collection to proceed and keeping the web application up (though with degraded performance, as the clustering operation and web application will both be waiting on each other's disk I/O). Further, I was able to implement this change to the production database schema without interrupting data collection or taking the web application offline.
(Although the application was unreachable for a few seconds due to an oversight on my part regarding table permissions. Theoretically, if I'd foreseen this requirement then there would have been no downtime.)

The table is partitioned into three child tables: static1, static2, and latest. The parent table has a rule that redirects all inserts into the "latest" partition. This serves as the data collection point, where new rows sit until they are migrated into one of the other two partitions. The "static1" and "static2" partitions serve as front and back buff[...]
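For reference, the partition layout described above could be set up with DDL along the following lines, driven here from a Python script with psycopg2. The table and column names are assumptions for illustration; the real schema isn't shown in this post.

    # Sketch of the partition layout described above: a parent table, two
    # "static" buffer partitions, and a "latest" collection point that
    # receives all inserts via a rule.  Names and columns are assumptions.
    import psycopg2

    DDL = """
    CREATE TABLE stock_history (
        item_id integer     NOT NULL,
        ts      timestamptz NOT NULL,
        stock   integer     NOT NULL,
        PRIMARY KEY (item_id, ts)
    );

    -- Child tables inherit columns but not constraints or indexes;
    -- those would be created on each child separately.
    CREATE TABLE stock_history_static1 () INHERITS (stock_history);
    CREATE TABLE stock_history_static2 () INHERITS (stock_history);
    CREATE TABLE stock_history_latest  () INHERITS (stock_history);

    -- Redirect all inserts against the parent into the "latest" partition.
    CREATE RULE stock_history_insert AS
        ON INSERT TO stock_history
        DO INSTEAD INSERT INTO stock_history_latest VALUES (NEW.*);
    """

    def create_partitions(dsn):
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        cur.execute(DDL)
        conn.commit()
        cur.close()
        conn.close()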



Object Copying in C#

Tue, 22 Jan 2013 16:18:00 +0000

When working on some sort of data-driven project, I frequently have the need to allow deep-copying of data objects. There are several patterns that accomplish this, but I've settled on one in particular.

Most .NET developers are probably familiar with the ICloneable interface. While this is a good starting point, it is not what I choose to rely on, for two reasons. First, the return type of the Clone method is object, so a cast is required. Second, the interface doesn't really give you any special functionality. Nonetheless, implementing interfaces is usually a good thing, so my approach does use the interface, if only as a tag. (I am leaving out equality test implementations for the sake of brevity, but one usually wants to implement equality testing when an object can be cloned.)

The common pattern I see when implementing a Clone method is to declare one public virtual method on the base class:

    // (Assumes using System.Collections.Generic and using System.Linq.)
    abstract class Animal
    {
        public IList<Animal> Children { get; set; }

        public virtual Animal Clone()
        {
            var copy = (Animal)MemberwiseClone();

            // Deep-copy children
            copy.Children = Children.Select(c => c.Clone()).ToList();

            return copy;
        }
    }

Simple enough. Let's say that we have a derived class that needs some additional logic. We just override the Clone method, right?

    class Dog : Animal
    {
        public Collar Collar { get; set; }

        public override Animal Clone()
        {
            var copy = (Dog)base.Clone();

            copy.Collar = Collar.Clone();

            return copy;
        }
    }

This works, but as soon as you try to clone a Dog directly, the ugliness of this pattern is apparent.

    Animal animal = otherAnimal.Clone(); // Great!
    Dog dog = otherDog.Clone();          // Compile-time error

Unfortunately, the return type of Animal.Clone() is Animal, and subclasses may not change the return type, not even to narrow it. So to clone a Dog into a variable of type Dog means we have to cast:

    Dog dog = (Dog)otherDog.Clone();

Yuck. This is passable, but it's hardly optimal. The good news is that with just one tweak, we can make this pleasant to deal with. First, the Clone method needs to be made protected and renamed. Second, we create a new public Clone method that is not virtual and calls the protected virtual method. Subclasses hide this method with a new implementation that does the same thing, but casts the result. Here's the full implementation:

    abstract class Animal : ICloneable
    {
        public IList<Animal> Children { get; set; }

        public Animal Clone()
        {
            return CloneImpl();
        }

        object ICloneable.Clone()
        {
            return CloneImpl();
        }

        protected virtual Animal CloneImpl()
        {
            var copy = (Animal)MemberwiseClone();

            // Deep-copy children
            copy.Children = Children.Select(c => c.Clone()).ToList();

            return copy;
        }
    }

    class Dog : Animal
    {
        public Collar Collar { get; set; }

        new public Dog Clone()
        {
            return (Dog)CloneImpl();
        }

        protected override Animal CloneImpl()
        {
            var copy = (Dog)base.CloneImpl();

            copy.Collar = Collar.Clone();

            return copy;
        }
    }

Now we have nice class-specific methods that will return a properly-typed reference to the new copy.

    Animal animal = otherAnimal.Clone(); // Works
    Dog dog = otherDog.Clone();          // Also works!

It's worth noting that both patterns will properly copy objects of more specific types than the reference you use to copy them. For example, given this variable:

    Animal animal = new Dog();

animal.Clone() will return a Dog instance, typed as Animal. This is what we'd expect.[...]



httpd migration complete

Thu, 04 Oct 2012 13:32:00 +0000

I've finished the httpd migration process. chrishowie.com is now using nginx as its primary httpd, which reverse-proxies to Apache for only a few mod_python and mod_mono web applications. Over the next few weeks, I'll be trying to eliminate Apache entirely.




httpd and URI-to-site-mapping migration

Tue, 02 Oct 2012 16:27:00 +0000

Over the next week or so I'll be working on migrating this site from Apache to nginx, as well as altering the way that various URLs map to sites/applications. I will be trying very hard to avoid any service interruptions by fully testing my nginx configuration before replacing Apache, but who knows what might happen.

To give some insight into this: I am switching to nginx because the primary performance bottleneck on the server is available RAM, and nginx tends to be extremely conservative with RAM while also using less CPU.

I'm altering the URL mapping configuration to better cope with the fact that I host many different services on the primary www.chrishowie.com virtual host. The first one I installed was this Wordpress blog, and I installed it at the virtual host's document root. This means that any sites hosted on the same virtual host effectively live in the same URI space as my blog, and I really don't like this. After the migration, each site will have its own subdomain, keeping them better isolated. Right now, the subdomain configuration is very simple:

  • www.chrishowie.com hosts this blog and several applications.
  • chrishowie.com redirects to www.chrishowie.com.

After the migration, this will change:

  • www.chrishowie.com will be deprecated. Various URIs will redirect (with HTTP status code 301 "Moved Permanently") to other subdomains in an effort to preserve the functionality of existing incoming links.
  • blog.chrishowie.com will host this blog.
  • chrishowie.com will host a simple static "business card" type site, linking to my blog and a few other applications.
  • Other subdomains will be created as necessary to support the other applications running on this site.



TF2 item store launched

Sun, 01 Jul 2012 23:07:00 +0000

So I had some free time last weekend and coded an item store for Team Fortress 2. I collect items and sell them for varying prices, usually one scrap metal each. If you're interested, check out the store!

The store is built on a few components, all written in Python. There is a script to download the TF2 item schema, which includes data such as item names, quality names, item image URLs, and more. I selectively import this data into a table in a PostgreSQL database. Then, another script fetches the contents of my backpack and stores it into another table. One more table references the backpack table, listing the items I currently have for sale and their prices in item quantities (one scrap metal = two items). Finally, a mod_python publisher script fetches this data and renders it as a visually-pleasing item list, all without any client-side JavaScript.
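As a rough illustration, the schema-import step might look something like the sketch below. The table layout, column names, and the exact shape of the schema JSON are assumptions here; the real scripts aren't published in this post.

    # Hypothetical sketch of the schema-import step.  The JSON layout and the
    # "items" table are assumptions; the real schema file and tables differ.
    import json
    import psycopg2

    def import_item_schema(schema_path, dsn):
        with open(schema_path) as f:
            schema = json.load(f)

        conn = psycopg2.connect(dsn)
        cur = conn.cursor()

        # Selectively import only the fields the store needs.
        cur.execute("DELETE FROM items")
        for item in schema["result"]["items"]:   # assumed layout
            cur.execute(
                "INSERT INTO items (defindex, name, image_url) VALUES (%s, %s, %s)",
                (item["defindex"], item["name"], item.get("image_url")),
            )

        conn.commit()
        cur.close()
        conn.close()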

If the store has something you're interested in, feel free to contact me on Steam! You can add me to your friends list from the store, and I will reply as soon as possible.




C++ references, continued

Tue, 08 May 2012 23:53:00 +0000

So I got some feedback about my last C++ post. The comment states that references are not pointers, they are just names for another object.

Sorry for reopening a topic after nearly 6 months. But I cannot stay silent. I think you got it wrong. Completely. Although a reference might behave like "some sort" of a pointer, it is *not* a pointer. Your statement: "A reference is effectively a pointer, but this is hidden by the language." is completely wrong. To quote the C++ standard: "A reference is an alternative name for an object." It is just a new name for something that you've defined elsewhere. That's the very reason why it cannot be null -> You cannot have an alternative name for an object that you do not have yet. --Willi Burkhardt

Great, in theory. Unfortunately, none of the compilers I have used treat references as anything other than pointers. References are, on some level, supposed to guarantee non-null-ness as well as that they reference a valid object. This is not true in any compiler I have ever used. Take this example (see it run):

    #include <iostream>

    static int const a_const = 5;
    int const& A() { return a_const; }

    static int const* b_ptr = 0;
    int const& B() { return *b_ptr; }

    int main()
    {
        int const& a_ref = A();
        std::cout << "Called A()" << std::endl;
        std::cout << "a_ref: " << a_ref << std::endl;

        int const& b_ref = B();
        std::cout << "Called B()" << std::endl;
        std::cout << "b_ref: " << b_ref << std::endl;

        return 0;
    }

If we are to believe that references are simply another name for an object, then converting *b_ptr to a reference should have caused a runtime error. After all, we dereferenced a null pointer, right? The compiler should emit code to prevent this, right? In an ideal world, this would cause an error -- but it does not. The segmentation fault does not come until b_ref is used; indeed, we see "Called B()" in the program output, indicating that B() successfully returned a reference, which was stored in b_ref.

Obviously, at runtime there was a null pointer dereference. But we didn't use a pointer, I hear you saying. We used a reference! Then please explain this behavior to me. On a language level, sure, references are "names for objects." But this does not change the fact that the implementation is done using memory addresses -- which is fundamentally the same thing pointers do. This helps to explain why we see the behavior of this sample.

As I mentioned in my last post, when you convert an expression to a reference type, it's treated exactly as though you had converted it to a pointer type, with an implicit address-of operator (&). So we can rewrite this function:

    int const& B() { return *b_ptr; }

Like this:

    int const* B() { return &*b_ptr; }

And it becomes immediately clear why the segmentation fault did not occur here -- taking the address of a dereference expression is the same thing as taking the original expression. The & and * cancel out during compilation, and we just return the pointer.

Take a look at this example, which is identical to the above example, except that A() is gone, and B() now returns a pointer, with dereferences added in the appropriate places (see it run):

    #include <iostream>

    static int const* b_ptr = 0;
    int const* B() { return &*b_ptr; }

    int main()
    {
        int const* b_ptr = B();
        std::cout << "Called B()" << std::endl;
        std::cout << "b_ref: " << *b_ptr << std::endl;

        return 0;
    }

Identical behavior.
So you can throw the spec at me all you want, but every implementation I've tried uses pointer-with-automatic-dereference semantics -- if you convert every reference to a pointer, ad[...]



Database versioning and handling branching

Mon, 26 Mar 2012 13:42:00 +0000

It's no secret to developers of database-driven applications that trying to version a database schema (and seed data) is a royal pain. Propagating changes from your development environment to a live environment is not something that most version control systems are well-equipped to do. This is further complicated by distributed VCSes, like Git -- how do you prevent schema migration chaos when developers can branch on a whim?

I've been mulling this issue over for a few months now. There are several development projects I have where database versioning is necessary. Finally, I've come up with a process that is not only effective, but actually really simple!

Taking a cue from Git, we can use a digest of the current schema definition file (the SQL script that will create the database structure) as the schema identity. The schema requires a meta-table to describe the identity of the current schema -- this will allow for automated schema migration, since the migration script can detect the current schema version. (I usually have a meta-table anyway, for storing site configuration in key/value pairs. The schema version fits in this table fairly well.)

So, let's say we're starting a new project. We design a schema, making sure to include this meta-table as well as any seed data required for the application to function. This excludes the "current database schema identity" row, since adding that row in the schema script will cause the identity to change! Then we write a migration script. This script has two functions: load the initial schema, and migrate between schema versions. When performing the initial load, it should follow this by inserting the database schema identity into the meta-table. The identity is of course obtained by digesting the schema file.

Now we are ready to make changes. Let's say we want to add a column to a table. First, we note what the current schema's identity is. Let's call this $SCHEMA_ORIGINAL. We tweak the schema definition file to include this column, and then we obtain the digest of the schema file, calling this $SCHEMA_NEW. Now, we write two migration functions in the migration script: one that will migrate from $SCHEMA_ORIGINAL to $SCHEMA_NEW (an ALTER TABLE ADD COLUMN query) as well as one that will migrate in the opposite direction (ALTER TABLE DROP COLUMN). This will allow us to roll back the database if necessary.

Now, when you ask the migration script to upgrade the database schema, it only has to fetch the database's current version, digest the schema definition file, and then find a path between those versions using a breadth-first search of the available migrations, starting at the current version.

This technique can even account for merges! When merging branches A and B together, you would resolve any conflicts that arise in the schema definition, and then construct four migration functions: A to merged, merged to A, B to merged, and merged to B. The breadth-first search during migrations means that if you are then switching from branch A prior to the merge to branch B prior to the merge, it may actually be faster to migrate the database through the merge instead of backing up to the point where A and B diverged.

It may also be useful to provide a mechanism to tag certain revisions as causing data loss (such as rolling back a column addition). The migration script would then prefer longer migration paths that preserve data over shorter migration paths that destroy it.

There are some downsides to this approach.
For one, migration functions will have names that provide little meaning to humans, something like migrate_0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33_to_62cdb7020ff920e5aa642c3d4066950dd1f01f4d(). And another is that the migration script will need to construct an in-memory graph of every migration so that it can perform its search. If the onl[...]
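To make the mechanics concrete, here is a minimal sketch of what such a migration runner could look like in Python. The digest algorithm, the meta-table accessors, and the registration decorator are all assumptions for illustration (the 40-character identifiers in the example names above look like SHA-1 digests, so that's what the sketch uses); the real script isn't shown here.

    # Minimal sketch of the migration runner described above.  The db accessor
    # methods, the decorator API, and the choice of SHA-1 are assumptions.
    import hashlib
    from collections import deque

    MIGRATIONS = {}  # (from_version, to_version) -> callable(db)

    def migration(src, dst):
        """Register a function that migrates the database from src to dst."""
        def register(fn):
            MIGRATIONS[(src, dst)] = fn
            return fn
        return register

    def schema_identity(schema_path):
        """The schema identity is just a digest of the schema definition file."""
        with open(schema_path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def find_path(current, target):
        """Breadth-first search over the registered migrations."""
        queue = deque([(current, [])])
        seen = {current}
        while queue:
            version, path = queue.popleft()
            if version == target:
                return path
            for (src, dst), fn in MIGRATIONS.items():
                if src == version and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, path + [fn]))
        raise RuntimeError("no migration path from %s to %s" % (current, target))

    def upgrade(db, schema_path):
        current = db.current_schema_version()   # hypothetical: read from the meta-table
        target = schema_identity(schema_path)
        for step in find_path(current, target):
            step(db)
        db.set_schema_version(target)           # hypothetical: record the new identity

A new migration pair would then be registered with @migration($SCHEMA_ORIGINAL, $SCHEMA_NEW) and its reverse, and upgrade() walks whatever path the search finds.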