http://damienkatz.net/atom.xml

Damien Katz



Just Relax. Nothing is Under Control.



Updated: 2014-11-21T23:43:15Z

 



Kayos: Going Full Unix

2014-11-21T23:43:15Z


For Kayos Message Queue, we are going to use a command-line-friendly syntax for commands and responses. The initial versions of the Kayos server will just be a single instance using stdio. This makes it easy to play with locally and to write a fast test suite. Simple and text based.
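The post doesn't spell out the actual Kayos command syntax, so here is only a rough sketch of what a line-oriented, stdio-driven command/response loop can look like in C. The `PUSH`/`POP` verbs, the `OK`/`MSG`/`EMPTY`/`ERR` responses, the fixed-size in-memory queue and all function names are hypothetical, not the real Kayos protocol.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limits for this sketch; not taken from Kayos. */
#define QUEUE_CAP 64
#define MSG_MAX   256

/* A tiny fixed-capacity FIFO ring buffer of text messages. */
struct queue {
    char msgs[QUEUE_CAP][MSG_MAX];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of messages currently queued */
};

/* Handle one command line and write a one-line response into resp.
   Hypothetical syntax: "PUSH <text>" -> "OK", "POP" -> "MSG <text>" or "EMPTY". */
void handle_line(struct queue *q, const char *line, char *resp, size_t resp_len)
{
    if (strncmp(line, "PUSH ", 5) == 0) {
        if (q->count == QUEUE_CAP) {
            snprintf(resp, resp_len, "ERR full");
            return;
        }
        size_t tail = (q->head + q->count) % QUEUE_CAP;
        /* snprintf truncates overly long messages and always NUL-terminates. */
        snprintf(q->msgs[tail], MSG_MAX, "%s", line + 5);
        q->count++;
        snprintf(resp, resp_len, "OK");
    } else if (strcmp(line, "POP") == 0) {
        if (q->count == 0) {
            snprintf(resp, resp_len, "EMPTY");
            return;
        }
        snprintf(resp, resp_len, "MSG %s", q->msgs[q->head]);
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    } else {
        snprintf(resp, resp_len, "ERR unknown command");
    }
}

/* Read commands from in, one per line, and write one response line per command to out. */
void run_stdio(FILE *in, FILE *out)
{
    static struct queue q; /* zero-initialized: an empty queue */
    char line[MSG_MAX + 8];
    char resp[MSG_MAX + 8];

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0'; /* strip the trailing newline */
        handle_line(&q, line, resp, sizeof resp);
        fprintf(out, "%s\n", resp);
        fflush(out);
    }
}
```

Calling `run_stdio(stdin, stdout)` from `main` gives a process you can drive by hand or from a shell pipe: feeding it `PUSH hello` and then `POP` prints `OK` and then `MSG hello`. Keeping the protocol plain text is what makes a fast, scriptable test suite easy.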

The code is straight C. We are using blocking IO, both for commands and file IO, and will fork a process per client. We will rely on the file system cache to keep data in RAM. Each process will have small heaps and small stacks, and the compiled code will be small. This will give us very good concurrency, as the L1 and L2 caches won't be thrashed as much. Simple and fast.
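A minimal sketch of the fork-per-client model with blocking IO, assuming POSIX sockets. `spawn_handler`, `accept_loop` and `echo_handler` are hypothetical names, and echoing bytes back stands in for real command handling. Exited children are reaped opportunistically after each `accept`; a `SIGCHLD` handler is another common choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example per-client handler: blocking reads, echo every byte back until EOF. */
void echo_handler(int fd)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                 /* write() may be partial */
            ssize_t w = write(fd, buf + off, (size_t)(n - off));
            if (w < 0) return;
            off += w;
        }
    }
}

/* Fork a child that owns client_fd and runs handler on it. listen_fd (or -1)
   is closed in the child, which never accepts. Returns the child's pid, or -1. */
pid_t spawn_handler(int client_fd, int listen_fd, void (*handler)(int))
{
    pid_t pid = fork();
    if (pid == 0) {                       /* child: serve one client, then exit */
        if (listen_fd >= 0) close(listen_fd);
        handler(client_fd);
        close(client_fd);
        _exit(0);
    }
    close(client_fd);                     /* parent, or failed fork: drop our copy */
    return pid;
}

/* Reap children that have already exited, without blocking. */
static void reap_finished_children(void)
{
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* keep reaping */
    }
}

/* Accept clients forever on an already-listening socket, one process each. */
void accept_loop(int listen_fd, void (*handler)(int))
{
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) {
            if (errno == EINTR) continue;
            return;
        }
        spawn_handler(client_fd, listen_fd, handler);
        reap_finished_children();
    }
}
```

Each child only ever touches one client with straightforward blocking calls, so there is no event loop or shared in-process state to get wrong; the file system cache is what the processes share.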

Right now Doug Coleman is doing all the heavy lifting. He's going to make the repo public later today. Simple design, simple code, simple protocols. We aren't just building a message queue here, but a community. The more successful a project is, the more likely it is that all the code will be rewritten at some point. Keeping things dead simple at this early stage is important to get us there.




CAP Should Be CLAP

2014-11-16T16:20:46Z


The CAP theorem says you cannot have your data be Consistent (all updates are temporally ordered for all actors), Available (every transaction responds with success or failure) and Partition tolerant (tolerant of a subset of machines being unreachable by some actors, yet still reachable by others) all at the same time.

A problem with CAP is that Availability is a confusing word here. In distributed systems it's really Latency we are interested in. Any resource that's down now might recover soon, soon enough to consider the operation a success if it just waits a little longer. And even available resources might be overloaded and take so long to answer as to be practically unavailable.

The confusion is that Availability implies a binary choice, true or false, available or not. But degrees of availability are more easily conceptualized as latency. Latency can be measured and expressed as a single value, such as an average or an upper limit, or as a range of values, broken out into percentiles.
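To make the latency framing concrete, here is a small C sketch that summarizes measured request latencies as a percentile. It uses the nearest-rank definition (the smallest sample such that at least p percent of samples are less than or equal to it); the function name and the choice of nearest-rank over an interpolating definition are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile of n latency samples (e.g. in milliseconds).
   p is in (0, 100]. Sorts the samples array in place.
   Returns -1.0 for invalid arguments. */
double latency_percentile(double *samples, size_t n, double p)
{
    if (n == 0 || !(p > 0.0 && p <= 100.0)) return -1.0;
    qsort(samples, n, sizeof samples[0], cmp_double);
    double r = p * (double)n / 100.0;  /* 1-based rank, before rounding up */
    size_t rank = (size_t)r;
    if ((double)rank < r) rank++;       /* ceil without needing libm */
    return samples[rank - 1];
}
```

A hypothetical requirement such as "99% of requests complete within 50 ms" then becomes a simple check that `latency_percentile(samples, n, 99.0) <= 50.0`, instead of a yes/no question about whether the server is "up".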

We can't assume a server is available just because a client can connect. It's much more complicated than that. Modern systems are all about resource sharing: multi-core, multi-tasking machines with multiple independent and isolated processes carrying on semi-autonomously, all using shared networks, NICs, DRAM, processors and drives. This resource sharing dramatically lowers the costs of building and extending systems: we don't need all the resources necessary to do everything all at once, we just need enough to satisfy peak concurrent demands.

But the cost of all this resource sharing is that we don't have guarantees about how long something will take, or that it will even succeed. Though each process might be well behaved, the system as a whole is chaotic. Maybe an independent process is doing large bulk file copies and generating lots of disk IO. Or it temporarily needs a bunch of RAM to do an image rendering, causing your processes to page some of their working set out to disk. Or fail a memory allocation. Or get killed by the OOM killer.

And without timeout failures, distributed processes sharing resources can deadlock on each other indefinitely, each holding resources the other needs to continue.

It's because of non-deterministic resource sharing that we must be able to deal with timeout failures. The alternative is deadlocking, or more likely "slowlocking": concurrent operations slow each other down to unusable levels, much slower than executing the requests serially. Systems must be able to tolerate such latency failures, since we can't eliminate them without the expense of assigning dedicated resources to everything. That is how hard realtime systems work, and they are very expensive.
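One concrete way to turn "might wait forever" into a timeout failure is to bound every blocking wait with a deadline. This sketch uses POSIX `poll()` to wait at most `timeout_ms` milliseconds for a file descriptor to become readable; `read_with_timeout` and the `READ_TIMED_OUT` return convention are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)  /* hypothetical return code for "took too long" */

/* Read up to len bytes from fd, waiting at most timeout_ms for data.
   Returns bytes read (0 at EOF), -1 on error, or READ_TIMED_OUT. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* note: a retry restarts the full timeout */
    if (rc < 0) return -1;
    if (rc == 0) return READ_TIMED_OUT;   /* nothing arrived in time */
    return read(fd, buf, len);
}
```

The caller never blocks indefinitely: a slow or overloaded peer shows up as `READ_TIMED_OUT`, which can go down the same error path as any other failure.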

Fortunately we can model any loss of availability as excessive latency and reuse the same error handling. No new error paths need to be created or modified. Retry each failed operation with exponential backoff and the system is guaranteed to clear out its backlog of work if the failures are a shared capacity problem.
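A minimal sketch of retrying a timed-out operation with capped exponential backoff. `retry_with_backoff`, the `attempt_fn` callback type and the demo-only `fail_n_times` helper are hypothetical; production code usually also adds random jitter so that many clients don't retry in lockstep.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* One attempt at some operation: true on success, false on timeout/failure. */
typedef bool (*attempt_fn)(void *ctx);

/* Delay before retry number `attempt` (0-based): base_ms * 2^attempt, capped at max_ms. */
long backoff_delay_ms(unsigned attempt, long base_ms, long max_ms)
{
    long delay = base_ms;
    for (unsigned i = 0; i < attempt && delay < max_ms; i++) {
        delay *= 2;
    }
    return delay < max_ms ? delay : max_ms;
}

static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);   /* may return early on a signal; fine for a sketch */
}

/* Try fn up to max_attempts times, backing off exponentially between failures. */
bool retry_with_backoff(attempt_fn fn, void *ctx, unsigned max_attempts,
                        long base_ms, long max_ms)
{
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        if (fn(ctx)) return true;
        if (attempt + 1 < max_attempts) {
            sleep_ms(backoff_delay_ms(attempt, base_ms, max_ms));
        }
    }
    return false;
}

/* Demo-only attempt function: fails while *ctx > 0 (decrementing it), then succeeds. */
bool fail_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

Because every loss of availability is treated as a timeout, this one retry path covers a down server, an overloaded one and a slow network alike; doubling the delay is what lets a shared, saturated resource drain its backlog instead of being hammered harder.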

This isn't a new notion, but looking at it this way, any availability requirements are re-expressible as latency requirements, and availability failures are simply timeouts. Requests don't fail, they timeout.

CAP should be renamed CLAP, for Consistency, Latency/Availability and Partition tolerance. It doesn't change how anything works, but it makes reasoning about distributed design a bit easier.




7 Habits of Highly Defective Testing

2014-11-11T10:02:51Z


Good testing of software requires effort and discipline. Ain't no one got time for that.

1. Tests should be difficult to run.

Give developers good excuses for not running tests by requiring version-specific dependencies that don't exist in the production code. Also, the more manual steps it takes to set up and tear down test environments, the lower the likelihood they'll get run.

2. Tests should take a long time and require lots of resources.

Tests that run slowly, ideally many hours or even days long, help keep developers out of the zone and easily distracted. To keep development expensive and lengthy, make sure each individual test instantiates fresh instances of everything: new servers, new clients, new datasets, ideally whole new installs. The more CPU, RAM, disk and network IO required, the greater the costs and non-productivity.

3. Tests should have spurious failures.

When tests fail because the tests themselves are buggy, it reduces confidence in the tests, making it much easier to ignore real failures. Any efforts to clean up the tests should be met with suggestions that time is better spent growing production code size and complexity.

4. Don't write tests for bugs found in production.

When a bug is found in production code, don't waste time writing tests that trigger the bug, just fix the code and be done. This increases the likelihood of future regressions, and as a bonus hides the areas of code that are poorly designed and implemented.

5. Tests should rely on timeouts to indicate failure.

Sleeps that wait long enough for production code to finish before checking the outputs will lengthen the time tests take to run AND cause hard-to-reproduce, spurious failures. This reduces confidence that a failing test found a bug, making real bugs easy to ignore.

6. Only test success conditions.

Writing tests for what happens when connections time out, when allocations fail, or when processes are terminated or machines lose power is like wanting the code to fail. So only test what happens when everything goes right; it makes you seem like a team player.

7. Failures should be hard to debug.

Throw away error codes, error strings, log messages, and anything else that could help debug a failure. Writing production code is hard. Debugging it should be hard too.
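Turning habits 6 and 7 around, here is a small sketch of testing a failure path on purpose. The code under test takes its allocator as a parameter so a test can inject an allocation failure, and it reports a specific error code instead of crashing or discarding the error. `make_greeting`, `alloc_fn` and `failing_alloc` are hypothetical names for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*alloc_fn)(size_t size);

enum greet_err { GREET_OK = 0, GREET_ENOMEM = 1 };

/* Code under test: builds "hello, <name>" using the supplied allocator.
   On failure it returns an error code and leaves *out as NULL. */
enum greet_err make_greeting(const char *name, alloc_fn alloc, char **out)
{
    *out = NULL;
    size_t len = strlen("hello, ") + strlen(name) + 1;
    char *s = alloc(len);
    if (s == NULL) return GREET_ENOMEM;  /* surface the failure, don't crash */
    snprintf(s, len, "hello, %s", name);
    *out = s;
    return GREET_OK;
}

/* Test double: an allocator that always fails, to exercise the error path. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

A test can call `make_greeting` once with `failing_alloc` and check for `GREET_ENOMEM`, then once with plain `malloc` and check the result, all in microseconds, with no sleeps and no spurious failures.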




New Messaging and Queueing Project

2014-11-06T09:36:34Z


I'm on a message queue kick.

The most recent thing I worked on at Couchbase was leading the design for the 3.0 Database Change Protocol (DCP), which shipped a few weeks ago. DCP's got some nice properties that allow it to move a lot of data quickly and safely. It's used by Couchbase replication, indexing and a few other places. It's really cool to see how well it's working; I left just as the coding work was starting. I'm always a little surprised when things actually work. I spend a lot of time worrying about design flaws that might have been overlooked.

Couchbase DCP is basically an engine and protocol to quickly receive changes from the database, incrementally and asynchronously. Coincidentally, it's got a lot of the same traits as a message queue, and similar design decisions to the Apache Kafka project. The more I thought about it, the more I realized the basics of messaging and queueing are really the foundation of most distributed systems.
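As a rough illustration of the shared idea (not of DCP's or Kafka's actual wire protocol or storage format), here is a minimal in-memory append-only log in C. Every change gets a monotonically increasing sequence number, and a consumer incrementally asks for everything after the last sequence number it has processed. `change_log`, `log_append` and `log_read_since` are hypothetical names, and the fixed-capacity array stands in for durable storage.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOG_CAP 128
#define KEY_MAX 64

struct change {
    uint64_t seqno;        /* assigned in append order, starting at 1 */
    char key[KEY_MAX];     /* which document/key changed */
};

struct change_log {
    struct change entries[LOG_CAP];
    size_t count;
};

/* Append a change and return its sequence number, or 0 if the log is full. */
uint64_t log_append(struct change_log *log, const char *key)
{
    if (log->count == LOG_CAP) return 0;
    struct change *c = &log->entries[log->count];
    c->seqno = (uint64_t)log->count + 1;
    snprintf(c->key, sizeof c->key, "%s", key);
    log->count++;
    return c->seqno;
}

/* Copy up to max changes with seqno > since into out; return how many were copied.
   A consumer remembers the last seqno it processed and passes it back next time. */
size_t log_read_since(const struct change_log *log, uint64_t since,
                      struct change *out, size_t max)
{
    size_t n = 0;
    /* seqno == index + 1, so the first unseen entry is at index `since`. */
    for (size_t i = (size_t)since; i < log->count && n < max; i++) {
        out[n++] = log->entries[i];
    }
    return n;
}
```

Because a consumer's whole position is a single number, it can fall behind, disconnect and resume later without the producer tracking anything per consumer, which is one reason this shape works for replication, indexing and queueing alike.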

So now I'm starting a new message queue project that's loosely based on Couchbase DCP technology (and using the Couchbase ForestDB durable storage engine code as a starting place). But this is being designed and optimized for messaging and queueing applications, so there are some key differences. This can be used like an enterprise message bus, for email and messaging, for stream processing and analytics, and can even apply to high scale, near realtime interactive applications like Uber.

A lot of the uses fall into the mission-critical infrastructure category, on which an ever-increasing amount of money is being spent. It's when you can find a business model to support your technological interests that things get interesting. And I think things are about to get really interesting.




Pumpkin Combat

2014-10-31T02:40:08Z

(image)




Tank Man

2014-10-10T07:32:13Z


(video: //www.youtube.com/embed/YeFzeNAHEhU)

Years ago, when I was in Charlotte NC working on a new document database named CouchDB, it sounded (at the time) about as sexy as a bowl of diarrhea, I had a bunch of code that didn't quite work, and our savings were shrinking every month. I was embarrassed to tell people what I was working on. And I made a decision to stop being a coward and commit to what I was doing. I pasted a sign onto my monitor, "Commitment: Learn it, Live It", as a reminder to be all in on this. To not feel like a fool for pursuing a path when I was in over my head. Every day was a struggle to make progress. A battle to remain committed.

I used to go to the gym and pretend in my head I had already died, and it was 100 years into the future, and I was dreaming of some guy who wrote a database. I didn't know how the story ended yet; I hadn't gotten that far. It helped me to remember life is a journey, not a destination. Stay on the path, the story is what's interesting. The success or failure might already be predetermined, so fight like a hero you respected, a hero who lost, who went down swinging. Live that story if you have to, but goddammit, fight.




Single Dad

2014-09-25T19:28:19Z

It's been a year since I "separated" from my high school sweetheart, the love of my life. We'd been married for 16 years at the time, and struggling for the last 6. I left Couchbase mostly because I wanted to get away from the stress of startup life and find a way for us to make things work. I cashed in some stock to take time off, but things only got worse. The pain of leaving Couchbase, where most of my friends were, while our marriage was also filled with conflict, was too much.

I think CouchDB and then Couchbase, and how far I took them, were a big factor in our problems. I spent so much time on them, and was isolated for so long, that I ended up losing a big part of myself. It happened before the startup. I became detached and anxious, and wasn't there for my wife as much as I should have been. And that hurt her. I was also sleep eating. I'd devour all kinds of high-calorie food and not remember it. And that also freaked her out. But since I was good at shutting off my own feelings, I couldn't empathize with her and the hard times of raising 3 kids as the primary caregiver, and of moving way too many times over the years. We both tried very hard, but I didn't understand what she was going through. And I felt she didn't understand me either.

Startups are hard. But we try to make them seem glamorous. It's part of the game. But it's hard and a lot of stress. And if it doesn't work out, you end up with nothing. And a bunch of people who believed in you lose their jobs. And in the struggle there comes a point when you realize it's worse for the kids if you stay married, and if things get any worse, our problems will truly mess them up. That all the conflict and hurt is taking its toll on the ones you are fighting so hard to give a good life to.

The divorce was final a few months ago. The terms were pretty standard for 50/50 custody, and we were able to complete it amicably, with no court battle. Overall it was a peaceful process, but at times it felt extremely painful, scary and conflict-ridden. It still does. And sometimes the pain and anger come back and I behave like a big jerk. But compared to most divorces I think we both did pretty well. We now live about 1/2 mile from each other in Alameda, CA. It's a beautiful area and it's great for the kids.

I can't say for sure I'm any happier; the transition has been, and continues to be, hard. Being alone is hard. But I think I'm a better father, though there is still much room for improvement. In my own upbringing I was bounced around a lot. I didn't get a lot of care and attention. And that contributed to low self-esteem, which took years to finally understand. I was lucky I didn't end up in much worse shape. My ex-wife was a big part of me not being a total wreck; she took good care of me over the years. Soothed me when I needed it. I have to learn to function without that. I need to be there for my kids. I don't want them to deal with the crap I went through, some of which affects me to this day.

Dating when you have 3 kids is scary. I want them to be around good people and to see healthy relationships. And I don't want to bring crazy people into their lives. But I'm not sure how to know what's what. So I'm trying to be cautious and take it slower. The problem with marrying so young is I didn't experience all the crap most people do in their twenties, when it's easier to make mistakes. And now, after being married so young and for so long, I have to become a new person. A single person. A single dad. Also I have to grow up more. I have to be able to handle anything that comes my way, and be able to do it alone if necessary. My kids actually make this easier. They are my sense of pu[...]



What a difference a few months make

2013-12-17T08:17:25Z


4 months off and I feel reborn. This time has meant everything to me and especially my kids.

I miss Couchbase terribly, but I'm also glad to be done and to start a new chapter in my career. The thing I miss most is the great people there, super bright, hardworking folks who amazed me on a daily basis. Which, ironically, was the thing that made it easy to leave: seeing the different teams taking the ball and running with it without me leading the charge. Things at Couchbase grew and matured so fast I started to realize I couldn't keep up without spending way more time working. I was no longer the catalyst that moved things forward; I was becoming the bottleneck preventing engineers from maturing and leaders from rising.

Anyway, I'll miss my whole CouchDB and Couchbase odyssey immensely. I know it's a rare thing to have helped create and lead the things I did. I don't take it for granted. It was a hell of a ride.

And now what's next? Well, beginning in January 2014 I'll be starting at salesforce.com and working closely with Pat Helland on a project that eventually will underpin huge amounts of their site infrastructure, improving performance, reliability and predictability, while reducing production costs dramatically. It's quite ambitious and I don't know if I'm yet at liberty to talk about the project details and scope. But if we can pull it off we will change a lot more than Salesforce, in the same way the Dynamo work changed a lot more than Amazon's shopping cart. It's ridiculously cool. And we are hiring the best people possible to make it happen.

Here I go again. I'm a very, very lucky man.




Human, After All

2013-09-24T20:30:04Z

Whoa, I have a blog. Weird.

As I take a break from all work (I left Couchbase about a month ago), one of the things I'm trying to do is get the machine out of my head.

Building a distributed database is a peculiar thing. Building a startup that sells distributed databases is a very peculiar thing. It did something weird to my brain. I'm still not sure what happened.

That moment in chess when I see the mistake, and it suddenly feels like the blood drains from my head. For me it's when the game is decided. Win or lose, it was a mistake to play at all. I didn't want to lose. I didn't want to win. I just wanted to play. To keep the game going.

Somehow I developed social anxiety. Not a fear of people. A fear of causing fear in people.

I lost my voice. Not my physical voice. But the one that says what it really thinks, is gregarious, is angry, is sad, wants to have fun, wants to complain. The one that cares not about the right answer. The one that just wants to interact, with no particular goal.

I forgot how to be human. I didn't know that was possible. I didn't even notice it happened; I didn't know what I had lost until I started to get better.

I saw this thing in my head, the machine. Automata. It was beautiful. The more I thought about it, the more clearly I could see it. I connected all the dots. It was so compelling. It was engineering. It was physics. It was metaphysics. I had to bring it into the real world. I couldn't stop thinking about it. It could be lost forever if I did.

Most people create differently. They create a little, think a little, create a little, think a little. I like to work by thinking very hard until I can clearly see what should be built. Before I write code. Before I write specs. I want to see it, in my mind. I can't explain what I see. I suppose it's like describing color to a blind man.

There is a hidden dimension. The people who can see it, who can move around in this unseen dimension, are special to me. It's like when everyone puts their head down to pray, only you don't. You look around. And you see the other people who didn't put their head down. We broke the rules. But we broke nothing; we just see something others don't. Sacred doesn't exist.

The only language I know for sure to describe it is code. When I can see it working in my head, I know it will work in the real world, in code. Then I move to bring it to the real world through code.

But I took it too far. I thought too long. What I built in my head was too big for a human. Too big for this human anyway. I was compelled to keep the vision of the machine lit, for fear it would vanish before it made it into the real world.

The machine started to take over my mind. No, that's not true. I pushed everything I could aside, squished it up to make room for the machine. Or maybe I fed it to the machine. Or maybe I threw it overboard. It never occurred to me I might be giving up something I needed, that others needed from me, that I wanted to give to them, to myself. Or maybe I didn't care. I wanted to bring the machine to life. I knew if I could bring it to life, it would change the world. Isn't that worth fighting for?

Fear is a powerful motivator. It's also the mind killer. I was afraid of losing the battle. Creating technology is play. Creating a startup is a fight. But I didn't notice I was losing the war. Everything was riding on this. I no longer played with the posture that I couldn't lose. Now I had to win.

Then something happened, and I saw a glimmer of what I once was. I realized I was no longer playing a game of creation, but waging a war of attrition. And my humanity was the resource. I was grinding myself away.[...]



Dynamo Sure Works Hard

2013-05-06T19:11:49Z

We tend to think of working hard as a good thing. We value a strong work ethic and determination in the face of adversity. But if you are working harder than you should to get the same results, then it's not a virtue, it's a waste of time and energy. If it's your business systems that are working harder than they should, it's a waste of your IT budget.

Dynamo-based systems work too hard. SimpleDB/DynamoDB, Riak, Cassandra and Voldemort are all based, at least in part, on the design first described publicly in the Amazon Dynamo paper. It has some very interesting concepts, but ultimately fails to provide a good balance of reliability, performance and cost. It's pretty neat in that each transaction allows you to dial in the levels of redundancy and consistency to trade off performance and efficiency. It can be pretty fast and efficient if you don't need any consistency, but ultimately the more consistency you want, the more you have to pay for it via a lot of extra work.

Network Partitions are Rare, Server Failures are Not

"... it is well known that when dealing with the possibility of network failures, strong consistency and high data availability cannot be achieved simultaneously. As such systems and applications need to be aware which properties can be achieved under which conditions.

For systems prone to server and network failures, availability can be increased by using optimistic replication techniques, where changes are allowed to propagate to replicas in the background, and concurrent, disconnected work is tolerated. The challenge with this approach is that it can lead to conflicting changes which must be detected and resolved. This process of conflict resolution introduces two problems: when to resolve them and who resolves them. Dynamo is designed to be an eventually consistent data store; that is all updates reach all replicas eventually."
- Amazon Dynamo Paper

The Dynamo system is a design that treats a network switch failure as being as likely as a machine failure, and pays the cost with every single read. This is madness. Expensive madness.

Within a datacenter, the Mean Time To Failure (MTTF) for a network switch is one to two orders of magnitude higher than for servers, depending on the quality of the switch. This is according to data from Google about datacenter server failures, and the published numbers for the MTBF of Cisco switches. (There is a subtle difference between MTBF and MTTF, but for our purposes we can treat them the same.)

It is claimed that when W + R > N you can get consistency. But it's not true, because without distributed ACID transactions, it's never possible to achieve W > 1 atomically. Consider W=3, R=1 and N=3. If a network failure, or more likely a client/app tier failure (hardware, OS or process crash), happens during the writing of data, it's possible for only replica A to receive the write, with a lag until the cluster notices and syncs up. Then another client with R=1 can do two consecutive reads, getting newer data first from node A, and older data next from node B for the same key. But you don't even need a failure or crash: once the first write occurs, there is always a lag for the next server(s) to receive the write. It's possible for a fast client to do the same read two times again, getting a newer version from one server, then an older version from another.

What is true is that if R > N / 2, then you get consistency where it's not possible to read a newer value, then have a subsequent read get an older value. For the vast majority of applications, it's okay for a failure leading to temporary unavail[...]
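The W=3, R=1, N=3 scenario above can be acted out in a few lines of C. This is a toy model, not Riak, Cassandra or Voldemort code: each replica holds only a version number, the interrupted write has reached replica A alone, and a read asks `r` replicas and keeps the newest version it sees. `quorum_condition_holds` and `quorum_read` are hypothetical helpers.

```c
#include <stdbool.h>

/* The usual Dynamo-style rule: W + R > N is claimed to give consistent reads. */
bool quorum_condition_holds(int w, int r, int n)
{
    return w + r > n;
}

/* Toy coordinator read: ask r replicas starting at `first` (wrapping around)
   and return the newest version number seen. */
int quorum_read(const int versions[], int n, int first, int r)
{
    int newest = versions[first % n];
    for (int i = 1; i < r; i++) {
        int v = versions[(first + i) % n];
        if (v > newest) newest = v;
    }
    return newest;
}
```

With `versions = {2, 1, 1}` (version 2 landed only on replica A, so W=3 was never actually achieved), `quorum_condition_holds(3, 1, 3)` is true, yet `quorum_read(versions, 3, 0, 1)` returns 2 and a following `quorum_read(versions, 3, 1, 1)` returns 1: a newer value, then an older one.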



Development Methodologies?

2013-01-18T20:01:56Z

Hi Damien,

...

If I were to list projects as small, medium, and large or small to enterprise, what methodologies work across them? My thoughts are Agile works well, but eventually you'll hit a wall of complexity, which will make you wonder why you didn't see it many, many iterations ago. I don't know anyone at NASA or Space-X or DoD so I don't know what software methodology they use? Given your experience can you shed some light on it?

Regards,

Douglas

I don't really use a specific methodology, however I find it very useful to understand the most popular methodologies and when they are useful. Then it's helpful when you are at various stages of projects and know what kinds of approaches are helpful, and how you can apply them to your situation.

For example, I find Test-Driven Development (TDD) very much overkill, but for a mature codebase I find lots of testing invaluable. Early in a codebase I find lots of tests very restrictive; I value the ability to quickly change a lot of code without also having to change a larger amount of tests. Early on, when I'm creating the overall architecture that everything else will hang on, and the code is small and the design is plastic and I can keep it all in my head, I value being able to move very quickly. However, other developers may find TDD very valuable for thinking through the design and problems. I don't work like that. To each his own.

Blindly applying methodologies or even "best practices" is bad. For the inexperienced it's better than nothing, but it's not as good as knowledge of self and team, experience with a variety of projects and their stages, and good old-fashioned pragmatism.




Follow up to "The Unreasonable Effectiveness of C"

2013-01-18T19:27:43Z

My post The Unreasonable Effectiveness of C generated a ton of discussion on Reddit and Hacker News, nearly 1200 comments combined, as people got into all sorts of heated arguments. I also got a bunch of private correspondence about it. So I'm going to answer some of the most common questions, feedback and misunderstandings it's gotten.

Is C the best language for everything?

Hell no! Higher level languages, like Python and Ruby, are extremely useful and should definitely be used where appropriate. Java has a lot of advantages, C++ does too. Erlang is amazing. Almost every popular language has uses where it's a better choice. But when both raw performance and reliability are critical, C is very, very hard to beat. At Couchbase we need industrial grade reliability without compromising performance.

I love me some Erlang. It's very reliable and predictable, and the whole design of the language is about robustness, even in the face of hardware failures. Just because we experienced a crash problem in the core of Erlang shouldn't tarnish its otherwise excellent track record. However, it's not fast enough for our needs and our customers' needs.

This is key: the hard work to make our code as efficient and fast as possible in C now benefits our many thousands of Couchbase server deployments all over the world, saving a ton of money and resources. It's an investment that is paid back many, many times.

But for most projects the extra engineering cost isn't worth it. If you are building something that's only used by your organization, or a small number of customers, your money is likely better spent on faster/more hardware than on very expensive engineers coding, testing and debugging C code. There is a good chance you don't have the same economies of scale we do at Couchbase, where the costs are spread over a high number of customers. Don't just blindly use C; understand its tradeoffs and whether it makes sense in your situation.

Erlang is quite good for us, but to stay competitive we need to move on to something faster and industrial grade for our performance oriented code.

And Erlang itself is written in C. If a big problem was C code in Erlang, why would using more C be good?

Because it's easier to debug when you don't lose context between the "application" layer and the lower level code. The big problem we've seen is that when C code is called from higher level code in the same process, we lose all the debugging context between the higher level code and the underlying C code. So when we were getting these crashes, we didn't have the expertise and tooling to figure out what exactly the Erlang code was doing at the moment it crashed. Erlang is highly concurrent and many different things were all being executed at the same time. We knew it had something to do with the async IO settings we were using in the VM and the opening and closing of files, but exactly what or why still eluded us.

Also, we couldn't reproduce the crash with test code, though we tried, making it hard to report the issue to the Erlang maintainers. We had to run the full Couchbase stack under heavy load in order to trigger the crash, and it would often take 6 or more hours before we saw it. This made debugging problematic, as we had confounding factors of our own in-process C code that also could have been the source of the crashes.

In the end, we found through code inspection that the problem was Erlang's disk based sorting code, the compression options it was using, and the interaction with how Erlang closes files. When Erlang closed files with the com[...]



The Unreasonable Effectiveness of C

2013-01-10T14:42:34Z

For years I've tried my damnedest to get away from C. Too simple, too many details to manage, too old and crufty, too low level. I've had intense and torrid love affairs with Java, C++, and Erlang. I've built things I'm proud of with all of them, and yet each has broken my heart. They've made promises they couldn't keep, created cultures that focus on the wrong things, and made devastating tradeoffs that eventually make you suffer painfully. And I keep crawling back to C.

C is the total package. It is the only language that's highly productive, extremely fast, has great tooling everywhere, a large community, a highly professional culture, and is truly honest about its tradeoffs.

Other languages can get you to a working state faster, but in the long run, when performance and reliability are important, C will save you time and headaches. I'm painfully learning that lesson once again.

Simple and Expressive

C is a fantastic high level language. I'll repeat that. C is a fantastic high level language. It's not as high level as Java or C#, and certainly nowhere near as high level as Erlang, Python, or Javascript. But it's as high level as C++, and far, far simpler. Sure, C++ offers more abstraction, but it doesn't present a high level of abstraction away from C. With C++ you still have to know everything you knew in C, plus a bunch of other ridiculous shit.

"When someone says: 'I want a programming language in which I need only say what I wish done', give him a lollipop." - Alan J. Perlis

That we have a hard time thinking of lower level languages we'd use instead of C isn't because C is low level. It's because C is so damn successful as an abstraction over the underlying machine, and at making that high level, that it's made most low level languages irrelevant. C is that good at what it does.

The syntax and semantics of C are amazingly powerful and expressive. They make it easy to reason about high level algorithms and low level hardware at the same time. Its semantics are so simple and its syntax so powerful that it lowers the cognitive load substantially, letting the programmer focus on what's important.

It's blown everything else away to the point that it's moved the bar and redefined what we think of as a low level language. That's damn impressive.

Simpler Code, Simpler Types

C is a weak, statically typed language and its type system is quite simple. Unlike C++ or Java, you don't have classes where you define all sorts of new runtime behaviors of types. You are pretty much limited to structs and unions, and all callers must be very explicit about how they use the types; callers get very little for free.

"You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." - Joe Armstrong

What sounds like a weakness ends up being a virtue: the "surface area" of C APIs tends to be simple and small. Instead of massive frameworks, there is a strong tendency and culture to create small libraries that are lightweight abstractions over simple types.

Contrast this to OO languages, where codebases tend to evolve massive interdependent interfaces of complex types, where the arguments and return types are more complex types and the complexity is fractal: each type is a class defined in terms of methods with arguments and return types, or more complex return types. It's not that OO type systems force fractal complexity to happen, but they encourage it; they make it easier to do the wrong thing. C doesn't make it impossible, but it makes it harder. C tends to breed simpler, sh[...]



How to achieve lots of code?

2012-10-29T03:51:19Z


I get mail.

hello damien

I read about you from a book on erlang.

Your couchdb application is really a rave.

please can you help me out ,i've got questions only a working programmer can answer.

i'm shooting now:

i've been programming in java for over 3 years

i know all about the syntax and so on but recently i ran a code counter on my apps and
the code sizes were dismal. 2-3k

commercial popular apps have code sizes in the 100 of thousands.

so tell me- for you and what you know of other developers how long does it take to write those large applications ( i.e over 30k lines of code)

what does it take to write large applications - i.e move from the small code size to really large code sizes?

thank you.

Never try to make your project big. Functionality is an asset, code is a liability. What does that mean? I love this Bill Gates quote:

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

More code than necessary will bloat your app binaries, causing larger downloads and taking more disk space; it uses more memory and slows down execution with more frequent cache misses. It also makes the code harder to understand and harder to debug, and it will typically have more flaws.

CouchDB, when we hit 1.0, was fewer than 20k lines of production code, not including dependencies. This included a storage engine (crash-tolerant, highly concurrent MVCC with pauseless compactor), a language-agnostic map/reduce materialized indexing engine (also crash-tolerant, highly concurrent MVCC with pauseless compactor), master/master replication with conflict management, an HTTP API with a security model, and a simple JS application server.

The small size is partly because it was written in Erlang, which generally requires a fifth or less of the code needed for the equivalent in C or C++, and also because the original codebase was mostly written by one person (me), giving the design a level of coherency and simplicity that is harder, but still very possible, to accomplish in teams.

Tests are different. Lines of code are more of an asset in tests. More tests (generally) mean more reliable production code. Tests also document code functionality in a way that can't get out of sync the way comments and design docs can (out-of-date documentation is worse than no documentation), and they don't slow down or bloat the compiled output. There are caveats to this, but generally more code in tests is a good thing.

Also, you can go overboard trying to make code short (CouchDB has some WTFs from terseness that are my fault). But generally you should try to make code compact and modular, with clear variable and function names. Early code should be verbose enough to be understandable by those who will work on it, and no more. You should never strive for lots of code; instead you want reliable, understandable, easily modifiable code. Sometimes that requires a lot of code. And sometimes, often for performance reasons, the code must be hard to understand to accomplish the project goals.

But often with careful thought and planning, you can make simple, elegant, efficient, high quality code that is easy to understand. That should be your goal.




CouchConf SF

2012-08-30T22:49:23Z


CouchConf SF is coming.

This is our premier Couchbase event. We're going ham.

Come hear speakers from established enterprises and how they are betting their business on Couchbase.

Hang out and talk with speakers, me and other Couchbase engineers in the Couchbase lounge.

I'll be talking at the closing session. Let me know what you'd like to hear about!

Killer after-party. Witness my drunken antics ;)

Some highlights:


  • Three tracks and nearly 30 technical sessions for dev and ops

  • 15 customer speakers from companies like:

    • McGraw Hill - who will be sharing their experiences and demoing their Couchbase Server 2.0 app - including full-text search integration among other features

    • Orbitz who will be talking about how they replaced Oracle Coherence with Couchbase NoSQL software

    • Sabre - discussing how they are using NoSQL to reduce mainframe costs

    • Tencent will be sharing their evaluation process (and results) for choosing a NoSQL solution

    • Other speakers include LinkedIn, Tapjoy, TheLadders, and more


  • There are also training sessions for developers and admins the two days prior to CouchConf for those who want to also get more hands-on experience.

When you register, you can get the early bird rate if you use the promotional code Damien.

Register here: http://www.couchbase.com/couchconf-san-francisco