Subscribe: 20bits
Added By: Feedage Forager Feedage Grade B rated
Language: English
application  bits  english  facebook  game  language  letter  number  poll  polls  prime bits  prime  puzzle  questions  word  words 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: 20bits

Recent Articles at 20bits

20bits by Jesse Farmer


Keith Olbermann Thinks I'm an Idiot

Sat, 05 May 2007 00:03:49 +0000

This story ends with Keith Olbermann dismissing me as "another idiot" on national television, but it begins on a Monday morning with me sitting on my brown leather IKEA couch in Palo Alto, two blocks from Facebook's then-new College Terrace office. Five months earlier I started company with Matt Humphrey, Joe Damato, and Aman Gupta called Bumba Labs. Our first application, Polls, let anyone on Facebook create or vote in a poll on Facebook. When someone voted they were prompted to post their response to their newsfeed, where all their friends could see it. That was enough to make it grow exponentially, and within a month there were millions of people voting. Over 10,000 polls were created each day. Even Mark Zuckerberg used our app to create a poll about Gidget, the Taco Bell dog, who had died earlier that year. Not a bad start. By the end, Facebook, the Secret Service, and every major news station would be involved. Monday, September 2009, 9AM I sat down on my couch, opened my laptop, and logged on to Facebook to see how our application was doing. The application "Polls" is temporarily unavailable due to an issue with its third-party developer. We are investigating the situation and apologize for any inconvenience. That's the message I saw when I visited our application that morning. "Today is going to be awesome!" Every successful Facebook developer has seen that message at least once. It means Facebook found something they didn't like in your application and decided to take it down. Normally they'd warn you a few days in advance and tell you to fix it or else. I double-checked my email and, nope, no warning — the app was just gone. It would take Facebook at least a day to respond to any questions I had, so in the meantime I connected to the Polls database to see if I could spot anything unusual. At Facebook's request we had implemented a feature to report offensive polls a few months earlier, and now took time each morning to spot and delete any truly awful polls. People would report a poll for anything, offensive or not: the poll's prompt didn't agree with their politics, there was a typo in one of the answers, the poll's photo offended them, etc. For most polls there was one report every hundred votes or so, but today there was a poll with only a few hundred votes and thousands of reports. What was it? The Best Laid Plans: 10AM I knew in my gut this is why Facebook shut down our application, but it was still strange that they hadn't warned us. Playing dumb, I sent an email to someone I knew on the Facebook policy enforcement team. Hey XXX, Polls is down and it's displaying the TOS violation screen: The application "Polls" is temporarily unavailable due to an issue with its third-party developer. We are investigating the situation and apologize for any inconvenience. Any idea what's up? Cheers, Jesse Our users had created horrible polls in the past, asking questions like "Should gay people be lynched?" or "Is Mrs. Jones the English teacher at such-and-such High School a racist?" We'd delete those polls as soon as we found them and ban whomever made them from ever using Polls again. In this case, the poll was created by a middle school girl from Orange County, California. She'll be graduating from high school this year. This looked less like someone earnestly plotting to kill Obama and more like a bored kid phoning in a fake bomb threat to their high school. Thankfully, only a few hundred people voted in the poll — the truly viral polls had millions of votes. How did Facebook find this if it had so few votes, though? It took at least several thousand votes over the previous hour to break into the list of popular polls, and nobody outside Bumba saw the complaints. I decided to wait until Facebook got back to me before I did anything else. The poll was removed, the provocateur banned, and hardly anyone had voted in the poll. I left Palo Alto to spend the day working in San Francisco out of Matt, Joe, and Aman's apartment. The Carnival Begins: 12PM When I arri[...]

Facebook job puzzles: Prime bits

Fri, 27 Apr 2007 00:39:05 +0000

Welcome to the second installment of 20bits' Facebook job puzzles solution manual. This time I am going to tackle the Prime Bits puzzle. Like my Korn Shell solution, this puzzle is mostly mathematical. This time, however, we're going to be wading deep into combinatorics territory. Combinatorics is the mathematics of counting. If you have three pairs of socks, two pairs of pants, and four shirts, how many outfits can you wear? If you have a collection of twenty playing cards how many two-card hands are there? These are the sorts of questions combinatorics exists to answer, although the questions quickly become more complicated. Again, I'm aiming to make this understandable to an intelligent layperson interested in the puzzle. If that's you then read on! The Question This time around the question is much more straightforward. Every (positive) integer can be represented using binary. Normally we write using decimal, which is based around powers of ten. For example, 215 is really 2*102 + 1*101 + 5*100. Binary involves using 2s rather than 10s as place values, so, e.g., 5 is written as 101 = 1*22 + 0*21 + 1*20. Every integer therefore can be respresented as a string of zeroes and ones. Define P(x) to be true if the number of ones in the binary representation of x is prime and false otherwise. So, e.g., P(5) is true but P(4) is false. Our job is to implement the function uint64_t prime_bits(uint64_t a, uint64_t b); which returns the number of integers k, a ≤ k ≤ b such that P(k) is true. uint64_t is a way of designating 64-bit numbers in C/C++. The Obvious Solution Assuming we have implemented P(x) the obvious solution in Python is this def prime_bits(a,b): return sum([P(k) for k in range(a,b+1)]); That is, for each integer in the desired range we calculate P(k) and them sum all these values. Since "true" corresponds to "1" and "false" to "0" we get the total number of true entries in our desired range. A more explicit procedural implementation would be def prime_bits(a,b): c = 0 for k in range (a,b+1): if P(k): c = c + 1 return c Assuming P(n) = O(n) (this is called Big-O notation) then we have prime_bits(a,b) = O(b2). Surely we can do better. In fact, according to the constraints on the Facebook page we have to do better to even be considered in the running. The above big-o analysis tells us that we must be careful of two things: one, our imlementation, whatever it is, cannot iterate over every integer between a and b; two, our prime-checking function has to be fast enough that it doesn't dwarf the running time of the rest of our algorithm. Bring a Little Binomial into Your Life Forget about the fact that we're dealing with numbers and think of our binary representation as a string. That is, think of "110101" as nothing more than some sequence of zeroes and ones. It could just as easily be "aababa" or "!!?!?!" or any other two characters we choose. From this perspective we can turn the question on its head. Rather than going through each possible string one by one, let's count groups of strings en masse. How many 6-character strings of zeroes and ones have 3 ones? The answer rests in that foundational function of combinatorics, the binomial coefficient. It is defined as follows: pronounced "n choose k" and sometimes written nCk, where is the factorial. This might seem like gibberish so let's start with what it means. Let's say we're given a vat of balls labeled from 1-6. The binomial coefficient tells us how many ways we can pick three balls from the vat if we don't care about the order in which they are picked. We might choose {1,2,3} or {7,3,9}, for example, but {4,5,3} and {3,4,5} are considered to be the same drawing. How might we go about counting this? Well, let's start by caring about order because that's easier to count. If we do care about order then there are 6 ways to pick the first ball, 5 ways to pick the second ball, and 4 ways to pick the third and final ball. This means there are 6*5*4 ways t[...]

Facebook job puzzles: Korn Shell

Thu, 19 Apr 2007 23:37:14 +0000

Welcome to the first installment of 20bits' Facebook job puzzles solutions manual. My ultimate goal is to solve every interesting puzzle in the aforelinked list and make a public post with the solution code and an explanation. Why? Because I'm a mathematician by training and no good puzzle should go unsolved. I intended to start with the Evil Gambling Monster puzzle because I thought it would be the best learning experience given my lack of formal CS training. I did learn some new algorithms in solving it (e.g., the A* search algorithm), but that's when the Korn Shell puzzle caught my eye. My inner mathematician couldn't resist and I believe I have solved the puzzle mathematically. The solution involves my favorite area of mathematics (group theory), so I'm going to attempt to explain it in a way that is understandable to a layperson. The Rules of the Game You are sitting at a computer terminal whose 26 alphabetic keys have been randomly rearranged (permuted) and you enter your name, say, "Mike Korn." Since the keys are scrambled what appears on the screen is also scrambled. The game consists of you typing the characters you see on the screen until your name appears. The length of the game is the number of times you must type a the string on the screen before your name appears, at which point the game terminates. A game therefore consists of two pieces of information, viz., an input string and a permutation of the 26 alphabetic keys. Everything else is completely determined by these two pieces of information. The puzzle is this: if I give you a name can you give me the length of the longest possible game which uses that name as its first input? Questions to Ask The first step in reasoning mathematically is to isolate your assumptions. The biggest assumption above is that every possible game is guaranteed to terminate in a finite number of steps. This isn't particularly obvious on the outset. This leads us to the following questions. Is every possible game guaranteed to terminate in a finite number of steps? How does the length of game vary with respect to the two inputs, i.e., the name and the permutation? If every game does terminate in a finite number of steps, what is the longest game? If we can answer these questions well enough then we are done. I'm claiming that each of these questions has an explicit answer and that no algorithm is necessary to determine any of them. So, let's get answering. Helpful Notation I will introduce more notation as it becomes necessary but I'd like to introduce a few preliminaries. Let's say the A and B keys have been swapped on the keyboards and no other keys have been touched. We can write this as A → B → A, since, if you type 'A' the screen prints 'B,' and then if you type 'B' the screen again prints 'A.' Similarly, if we took the letters Q,W,E and R and moved each to the right (on a QWERTY keyboard), with R going to Q's place, we could write Q → W → E → R → Q. Additionally, if I want to refer to a specific permutation I will use the symbol σ or τ to represent it. Those are the lower-case Greek letters sigma and tau, respectively. This is only in the case where I am talking about a permutation in the abstract — if I want to talk specifically about what the permutation has done to the keyboard keys then I will use the above notation to describe its action. The single-letter case Is every possible game guaranteed to terminate in a finite number of steps? When in doubt, start simple. What happens if we have a single letter as an input, say, 'A?' We enter A and another letter, say, V, appears. We then enter V and yet another letter appears. Now, at any point, the next letter that appears could be A, in which case we're done. But can this go on forever? Well, let's say we've seen every letter from A to Z exactly once and the last letter on the screen was an X. What happens when we hit the X key? The original letter, A in this case,[...]

10 Tips for Optimizing MySQL Queries (That don't suck)

Tue, 10 Apr 2007 00:27:34 +0000

Justin Silverton at Jaslabs has a supposed list of 10 tips for optimizing MySQL queries. I couldn't read this and let it stand because this list is really, really bad. Some guy named Mike noted this, too. So in this entry I'll do two things: first, I'll explain why his list is bad; second, I'll present my own list which, hopefully, is much better. Onward, intrepid readers! Why That List Sucks He's swinging for the top of the trees The rule in any situation where you want to opimize some code is that you first profile it and then find the bottlenecks. Mr. Silverton, however, aims right for the tippy top of the trees. I'd say 60% of database optimization is properly understanding SQL and the basics of databases. You need to understand joins vs. subselects, column indices, how to normalize data, etc. The next 35% is understanding the performance characteristics of your database of choice. COUNT(*) in MySQL, for example, can either be almost-free or painfully slow depending on which storage engine you're using. Other things to consider: under what conditions does your database invalidate caches, when does it sort on disk rather than in memory, when does it need to create temporary tables, etc. The final 5%, where few ever need venture, is where Mr. Silverton spends most of his time. Never once in my life have I used SQL_SMALL_RESULT. Good problems, bad solutions There are cases when Mr. Silverton does note a good problem. MySQL will indeed use a dynamic row format if it contains variable length fields like TEXT or BLOB, which, in this case, means sorting needs to be done on disk. The solution is not to eschew these datatypes, but rather to split off such fields into an associated table. The following schema represents this idea: CREATE TABLE posts ( id int unsigned not null auto_increment, author_id int unsigned not null, created timestamp not null, PRIMARY KEY(id) ); CREATE TABLE posts_data ( post_id int unsigned not null. body text, PRIMARY KEY(post_id) ); That's just...yeah Some of his suggestions are just mind-boggling, e.g., "remove unnecessary paratheses." It really doesn't matter whether you do SELECT * FROM posts WHERE (author_id = 5 AND published = 1) or SELECT * FROM posts WHERE author_id = 5 AND published = 1. None. Any decent DBMS is going to optimize these away. This level of detail is akin to wondering when writing a C program whether the post-increment or pre-increment operator is faster. Really, if that's where you're spending your energy, it's a surprise you've written any code at all My list Let's see if I fare any better. I'm going to start from the most general. Benchmark, benchmark, benchmark! You're going to need numbers if you want to make a good decision. What queries are the worst? Where are the bottlenecks? Under what circumstances am I generating bad queries? Benchmarking is will let you simulate high-stress situations and, with the aid of profiling tools, expose the cracks in your database configuration. Tools of the trade include supersmack, ab, and SysBench. These tools either hit your database directly (e.g., supersmack) or simulate web traffic (e.g., ab). Profile, profile, profile! So, you're able to generate high-stress situations, but now you need to find the cracks. This is what profiling is for. Profiling enables you to find the bottlenecks in your configuration, whether they be in memory, CPU, network, disk I/O, or, what is more likely, some combination of all of them. The very first thing you should do is turn on the MySQL slow query log and install mtop. This will give you access to information about the absolute worst offenders. Have a ten-second query ruining your web application? These guys will show you the query right off. After you've identified the slow queries you should learn about the MySQL internal tools, like EXPLAIN, SHOW STATUS, and SHOW PROCESSLIST. These will tell you what resources are being spent where, and what side[...]

What is a word? An introduction to computational linguistics.

Wed, 04 Apr 2007 15:09:00 +0000

What is a word? This question is one of the most deceptively simple ones I know. Everyone will say they know the answer, or at least say they know one when they see one, but even native speakers of a language can and do disagree. The dictionary isn't much help since many dictionaries have multi-sentence, ad hoc definitions which basically boil down to "a word is a unit of language that means something, sort of." Let's jump ahead and assume we know what a word is, or that we can get native speakers to identify most words most of the time. Furthermore, let's say that our goal is to get a computer to understand a given language. Since humans learn languages initially by learning words and basic grammar it seems like a good choice to try and get computers to recognize words. So, our goal: given a string of English letters insert spaces between the words. What is a word? To show that the above exercise isn't totally contrived let's look at some of the subtleties in the idea of the word. This is only for people interested in the "linguistics" part of "computaitonal linguistics," but if you want to read it then click here. Words are linguistic constructs, not orthographic ones. Language preceeds writing. Children can speak and comprehend a language before they learn to read and write in that language. That is to say nothing of people who are illiterate or languages which have no formalized writing system. Furthermore, when learning a foreign language one typically learns words and basic grammar long before learning how to write, particularly if the writing system is dramatically different, e.g., an English-speaker learning Chinese or Arabic. This is all to say that words are not just formal orthographic constructs like quotation marks or apostrophes. Words appear to have some linguistic reality and are therefore worth studying from a language perspective. None of this tells us what a word is, only that speakers of a language believe there is such a thing as a word. English speakers can still say, however, that what we see on paper is more-or-less accurate: spaces represent breaks between words. This dodges the question since the idea of a "word" exists cross-linguistically. The definition of a word, therefore, should encompass all such contingencies. Synthetic languages A few definitions, first. A morpheme is the smallest unit of language with a meaning. "Dogs" has two morphemes: "dog" and "s," with the former indicating a canine and the latter indicating multiplicity. Similarly, "looked" as two morphemes, with the suffix "-ed" indicating the tense of the verb "look." Synthetic languages have a high morpheme/word ratio. English-speakers might be familar with the comically long German word. In fact, German allows for essentially arbitrarily long words. Donaudampfschiffahrtsgesellschaftskapitän, for example, means "Danube company steamship captain." Is this one word or four? And even in the English translation, is "steamship" one word, or two? Another extreme are synthetic languages like Hungarian which have many more grammatical affixes than English or German. The ideas of "conditional," "future," etc. are all marked by single morphemes attached to a word. In English "I would go" is three words, but in Hungarian it would be appear to be just one or perhaps two. Phonetics versus Orthography But most linguists wouldn't even say the above isn't that relevant. At best it just provides us with more evidence that words are something real. When we speak, however, there is no "space" between words in the same way that there are spaces between words in English. If you've ever learned a foreign language you probably remember a point at which you realized you were hearing words rather than sounds. Before that it sounded like one continuous stream of nonsense — and you were right about the "one continuous stream" part, anyhow. Once you have learned some of the languag[...]