John Arbash Meinel's Bazaar Blog

Posts about the development of Bazaar, a distributed version control system, meant to be something developers like to use, rather than something that gets in the way.

Updated: 2018-04-10T06:05:07.747-06:00


Step-by-step Meliae


Some people asked me to provide a step-by-step guide to debugging memory using Meliae. I just ran into another unknown situation, so I figured I'd post a step-by-step along with the rationale for each step.

First is loading up the data. This was a dump taken while running 'bzr pack' on a large repository.

>>> from meliae import loader
>>> om = loader.load('big.dump')
>>> om.remove_expensive_references()

The last step is done because otherwise instances keep a reference to their class, and classes reference their base types, and you end up getting to 'object', and somewhere along the way you end up referencing too much. I don't do it automatically, because it does remove actual references, which someone might want to keep.

Then, do a big summary, just to get started.

>>> om.summarize()
Total 8364538 objects, 286 types, Total size = 440.4MiB (461765737 bytes)
Index    Count   %       Size   % Cum      Max Kind
    0  2193778  26  181553569  39  39  4194281 str
    1    12519   0   97231956  21  60 12583052 dict
    2  1599439  19   68293428  14  75      304 tuple
    3  3459765  41   62169616  13  88       20 bzrlib._static_tuple_c.StaticTuple
    4       82   0   29372712   6  94  8388724 set
    5  1052573  12   12630876   2  97       12 int
    6     1644   0    4693700   1  98  2351848 list
    7     4038   0    2245128   0  99      556 _LazyGroupCompressFactory

You can see that:

- There are 8M objects, and about 440MB of reachable memory.
- The vast bulk of that is in strings, but there are also some oddities, like that 12.5MB dictionary.

At this point, I wanted to understand what was up with that big dictionary.

>>> dicts = om.get_all('dict')
>>> dicts[0]
dict(417338688 12583052B 1045240refs 2par)

om.get_all() gives you a list of all objects matching the given type string. It also sorts the returned list, so that the biggest items are at the beginning.

Now let's look around a bit, to try to figure out where this dict lives.

>>> bigd = dicts[0]
>>> from pprint import pprint as pp

We'll use pprint a lot, so map it to something easy to type.

>>> pp(bigd.p)
[frame(39600120 464B 23refs 1par '_get_remaining_record_stream'),
 _BatchingBlockFetcher(180042960 556B 17refs 3par)]

So this dict is contained in a frame, but is also an attribute of _BatchingBlockFetcher. Let's try to see which attribute it is.

>>> pp(bigd.p[1].refs_as_dict())
{'batch_memos': dict(584888016 140B 4refs 1par),
 'gcvf': GroupCompressVersionedFiles(571002736 556B 13refs 9par),
 'keys': list(186984208 16968B 4038refs 2par),
 'last_read_memo': tuple(536280880 40B 3refs 1par),
 'locations': dict(417338688 12583052B 1045240refs 2par),
 'manager': _LazyGroupContentManager(584077552 172B 7refs 3716par),
 'memos_to_get': list(186983248 52B 1refs 2par),
 'total_bytes': 774119}

It takes a bit to look through that, but you can see:

'locations': dict(417338688 12583052B 1045240refs 2par)

Note that 1045240refs means there are 522k key:value pairs in this dict, since each pair contributes two references.

How much total memory is this dict referencing?

>>> om.summarize(bigd)
Total 4035636 objects, 22 types, Total size = 136.8MiB (143461221 bytes)
Index    Count   %       Size   % Cum      Max Kind
    0  1567864  38   66895512  46  46       52 tuple
    1   285704   7   24972909  17  64      226 str
    2  1142424  28   20757800  14  78       20 bzrlib._static_tuple_c.StaticTuple
...
    8        2   0       1832   0  99     1684 FIFOCache
    9       35   0       1120   0  99       32 _InternalNode

So about 136MB out of 440MB is reachable from this dict. However, I'm noticing that FIFOCache and _InternalNode are also reachable, and those don't really seem to fit. I also notice that there are 1.6M tuples here, which is often a no-no. (If we are going to have that many tuples, we probably want them to be StaticTuple() because they use a fair amount less memory, can be interned, and aren't in the garbage collector.)

So let's poke around a little bit.

>>> bigd[0]
bzrlib._static_tuple_c.StaticTuple(408433296 20B 2refs 9par)
>>> bigd[1]
tuple(618390272 44B 4refs 1par)
>>> pp(bigd[0].c)
[str(40127328 80B 473par 'svn-v4:138bc75d-0d04-0410-961f-82ee72b054a4:trunk:126948'),
 str(247098672 85B 37par '14@138bc75d-0d04-0410-961f-82ee72b054a4:trunk%2Fgcc%2Finput.h')]
>>> pp(bigd[1].c)
[tuple(618383880 36B [...]
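The summary tables above group objects by type and report a count, a total size, and the largest single instance. As a rough illustration of that aggregation, here is a minimal pure-Python sketch. It is not Meliae's implementation; the `objects` list and the `summarize` helper are hypothetical stand-ins for a loaded dump.

```python
from collections import defaultdict

# Hypothetical records: (type name, size in bytes). A real dump has millions.
objects = [
    ("str", 80), ("str", 85), ("tuple", 44), ("dict", 140),
    ("tuple", 36), ("str", 28),
]

def summarize(objs):
    """Group by type; return (kind, count, total, max) rows, biggest total first."""
    stats = defaultdict(lambda: [0, 0, 0])  # count, total size, max size
    for kind, size in objs:
        entry = stats[kind]
        entry[0] += 1
        entry[1] += size
        entry[2] = max(entry[2], size)
    rows = [(kind, c, t, m) for kind, (c, t, m) in stats.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

rows = summarize(objects)
for kind, count, total, biggest in rows:
    print("%-6s count=%d total=%d max=%d" % (kind, count, total, biggest))
```

Sorting by total size is what makes a single oversized object (like the big dict) stand out next to types that are merely numerous.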

Meliae 0.3.0, statistics on subsets


Ah, yet another release. Hopefully with genuinely useful functionality.

In the process of inspecting yet another unexpected memory consumption, I came across a potential solution to the reference cycles problem.

Specifically, the issue is that often (at least in our codebases) you have coupled classes that end up in a cycle, and you have trouble determining who "owns" what memory. In our case, the objects tend to be only 'loosely' coupled, in that one class passes off a reference to a bound method to another object. However, a bound method holds a reference to the original object, so you get a cycle. (For example, Repository passes its 'is_locked()' function down to the VersionedFiles so that they know whether it is safe to cache information. Repository "owns" the VersionedFiles, but they end up holding a reference back.)

What turned out to be useful was just adding an exclusion list to most operations. This ends up letting you find out about stuff that is referenced by object1, but is not referenced inside a specific subset.

One of the more interesting apis is the existing ObjManager.summarize(). So you can now do stuff like:

>>> om = loader.load('my.dump')
>>> om.summarize()
Total 5078730 objects, 290 types, Total size = 367.4MiB (385233882 bytes)
Index    Count   %       Size   % Cum      Max Kind
    0  2375950  46  224148214  58  58  4194313 str
    1    63209   1   77855404  20  78  3145868 dict
    2  1647097  32   29645488   7  86       20 bzrlib._static_tuple_c.StaticTuple
    3   374259   7   14852532   3  89      304 tuple
    4   138464   2   12387988   3  93      536 unicode
...

You can see that there are a lot of strings and dicts referenced here, but who owns them? Tracking into the references and using om.compute_total_size() just seems to get a lot of objects that reference everything. For example:

>>> dirstate = om.get_all('DirState')[0]
>>> om.summarize(dirstate)
Total 5025919 objects, 242 types, Total size = 362.0MiB (379541089 bytes)
Index    Count   %       Size   % Cum      Max Kind
    0  2355265  46  223321197  58  58  4194313 str
...

Now that did filter out a couple of objects, but when you track the graph, it turns out that DirState refers back to its WorkingTree, and WT has a Branch, which has the Repository, which has all the actual content. So what is actually referred to by just DirState?

>>> from pprint import pprint as pp
>>> pp(dirstate.refs_as_dict())
{'_bisect_page_size': 4096,
 ...
 '_sha1_file': instancemethod(34050336 40B 3refs 1par),
 '_sha1_provider': ContentFilterAwareSHA1Provider(41157008 172B 3refs 2par),
 ...
 'crc_expected': -1471338016}
>>> pp(om[41157008].c)
[str(30677664 28B 265par 'tree'),
 WorkingTree6(41157168 556B 35refs 7par),
 type(39222976 452B 4refs 4par 'ContentFilterAwareSHA1Provider')]
>>> wt = om[41157168]
>>> om.summarize(dirstate, excluding=[wt.address])
Total 5025896 objects, 238 types, Total size = 362.0MiB (379539040 bytes)

Oops, I forgot an important step. Instances refer back to their type, and new-style classes keep an MRO reference all the way back to object, which ends up referring to the whole dataset.

>>> om.remove_expensive_references()
removed 1906 expensive refs from 5078730 objs

Note that it doesn't take many references (just 2k out of 5M objects) to cause these problems.

>>> om.summarize(dirstate, excluding=[wt.address])
Total 699709 objects, 19 types, Total size = 42.2MiB (44239684 bytes)
Index    Count   %       Size   % Cum      Max Kind
    0   285690  40   20997620  47  47      226 str
    1   212977  30    8781420  19  67       48 tuple
    2    69640   9    8078240  18  85      116 set
...

And there you see that we have only 42MB that is directly referenced from DirState. (Still more than I would like, but at least it is useful data, rather than just saying it references all objects.)

I'm not 100% satisfied with the interface. Right now it takes an iterable of integer addresses, which is often good because those integers are small and shared, so the only cost is the actual list. Taking objects requires creating the python proxy objects, which is something I'm avoiding because it actua[...]
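The effect of an exclusion list can be sketched with a toy object graph. This is only an illustration of the idea, not how Meliae implements `excluding`; the addresses, sizes, and the `reachable_size` helper are all made up for the example.

```python
def reachable_size(graph, sizes, start, excluding=()):
    """Sum the sizes of everything reachable from start, never entering excluded addresses."""
    excluded = set(excluding)
    seen = set()
    pending = [start]
    while pending:
        addr = pending.pop()
        if addr in seen or addr in excluded:
            continue
        seen.add(addr)
        pending.extend(graph.get(addr, ()))
    return sum(sizes[addr] for addr in seen)

# Hypothetical addresses: 1=DirState, 2=WorkingTree, 3=Repository, 4=a str held by DirState.
# DirState and WorkingTree refer to each other, which is the cycle that hides ownership.
graph = {1: [2, 4], 2: [1, 3], 3: [], 4: []}
sizes = {1: 100, 2: 500, 3: 10000, 4: 30}

total_all = reachable_size(graph, sizes, 1)                  # follows the cycle into the repository
total_own = reachable_size(graph, sizes, 1, excluding=[2])   # stops at the WorkingTree
print(total_all, total_own)
```

Excluding a single well-chosen address is enough to cut the walk off from the rest of the graph, which mirrors the drop from 362MB to 42MB in the session above.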

Meliae 0.2.1


Meliae 0.2.1 is now officially released.

The list of changes isn't great; it is mostly a bugfix release. There are a couple of quality-of-life changes.

For example you used to need to do:
>>> om = loader.load(filename)
>>> om.compute_parents()
>>> om.collapse_instance_dicts()

However, that is now done as part of loader.load(). This also goes along with a small bug fix to scanner.dump_all_objects() that makes sure to avoid dumping the gc.get_objects() list, since that is an artifact from scanning, and not actually something you care about in the dump file.
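To see why the gc.get_objects() list is a scanning artifact, note that the call itself allocates a new list, and that list is then visible to any later scan. The snippet below only demonstrates that effect in CPython; it is not the code inside scanner.dump_all_objects().

```python
import gc

first = gc.get_objects()    # a brand-new list holding every tracked object
second = gc.get_objects()   # a later scan can now see the first list too

# The first list only exists because we scanned; it is not part of the program's own data.
first_is_visible = any(obj is first for obj in second)
print(first_is_visible)
```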

Many thanks to Canonical for bringing me to Prague for the Launchpad Epic, giving me some time to work on stuff that isn't just Bazaar.

Meliae 0.2.0


And here we are, with a new release of Meliae 0.2.0. This is a fairly major reworking of the internals, though it should be mostly compatible with 0.1.2. (The disk format did not change, and most of the apis have deprecated thunks to help you migrate.)

The main difference is how data is stored in memory. Instead of using a Python dict + python objects, I now use a custom data collection. Python's generic objects are great for getting stuff going, but I was able to cut memory consumption in half with a custom object. This means that finally, analyzing a 600MB dump takes less than 600MB of memory (currently about 300MB). Of course that also depends on your data structures (a 600MB dump that is one 500MB string will take up very little memory for analysis).

The second biggest feature is hopefully a cleaner interface.

- Call references 'parents' or 'children', indicating objects which point to me, and objects which I point to, respectively. 'ref_list' and 'referrers' were confusing. Both start with 'ref', so it takes a bit to sort them out.
- Add attributes to get direct access to parents and children, rather than having to go back through the ObjManager.
- Change the formatting strings to be more compact. No longer show the refs by default, since you can get to the objects anyway.

A third minor improvement is support for collapsing old-style classes (ones that don't inherit from 'object').

So how about an example. To start with, you need a way to interrupt your running process and get a dump of memory. I can't really give you much help, but you'll end up wanting:

from meliae import scanner
scanner.dump_all_objects('test-file.dump')

(This is the simplest method. There are others that take less memory while dumping, if overhead is a concern.)

Once you have that dump file, start up another python process and let's analyze it.

$ python
>>> from meliae import loader
>>> om = loader.load('test-file.dump')
loaded line 3579013, 3579014 objs, 377.4 / 377.4 MiB read in 79.6s

I recommend just always running these lines. If you used a different method of dumping, there are other things to do, which is why it isn't automatic (yet).

>>> om.compute_parents(); om.collapse_instance_dicts()
set parents 3579013 / 3579014
checked 3579013 / 3579014 collapsed 383480
set parents 3195533 / 3195534

Now we can look at the data, and get a feel for where our memory has gone:

>>> s = om.summarize(); s
Total 3195534 objects, 418 types, Total size = 496.8MiB (520926557 bytes)
Index    Count   %       Size   % Cum      Max Kind
    0   189886   5  211153232  40  40     1112 Thread
    1   199117   6   72510520  13  54 12583192 dict
    2   189892   5   65322848  12  66      344 _Condition
    3   380809  11   30464720   5  72       80 instancemethod
    4   397892  12   28673968   5  78     2080 tuple
    5   380694  11   27409968   5  83       72 builtin_function_or_method
    6   446606  13   26100905   5  88    14799 str
    7   189886   5   21267232   4  92      112 _socketobject
    8   197255   6   14568080   2  95    14688 list
...

At this point, you can see that there are 190k instances of Thread, which are consuming 40% of all memory. There is also a very large 12.5MB dict. (It turns out that this dict holds all of those Thread objects.)

But how do we determine that? One thing we can do is just get a handle to all of those Thread instances.

>>> threads = om.get_all('Thread')
>>> threads[0]
Thread(32874448 1112B 23refs 3par)

So this thread is at address 32874448 (not particularly relevant), consumes 1112 bytes of memory (including its dict, since we collapsed threads), references 23 python objects, and is referenced by 3 python objects.

Let's see those references.

>>> threads[0].c # shortcut for 'children'
[str(11409312 54B 189887par '_Thread__block'), _Condition(32903248 344B 11refs 1par), str(11408976 53B 189887par '_Thread__name'), str(32862080 77B 1par 'PoolThread-twisted.internet.reactor-1'), str(1...

It looks like there might be something interesting there, but it is a bit hard to sort out. Step one is to try using python's pprint utility.

>>> [...]
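The 'parents' and 'children' terminology is easiest to see on a tiny graph: a dump records what each object points to (children), and the parents are obtained by inverting that mapping. The sketch below is a toy version of that inversion, not Meliae's compute_parents(); the addresses and the helper are hypothetical.

```python
from collections import defaultdict

def compute_parents(children):
    """Invert {address: [child addresses]} into {address: [parent addresses]}."""
    parents = defaultdict(list)
    for addr, kids in children.items():
        for kid in kids:
            parents[kid].append(addr)
    return dict(parents)

# Hypothetical addresses: 10 and 11 are two threads sharing the attribute-name string 20.
children = {10: [20, 30], 11: [20, 31], 20: [], 30: [], 31: []}
parents = compute_parents(children)
print(parents)
```

A shared string such as '_Thread__block' ends up with one parent per thread, which is why it shows 189887par in the session above.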

Memory Debugging with Meliae


Background of Meliae 0.1.0

Earlier this year I started working on a new memory debugging program for python. I had originally tried to use heapy, but at the time it didn't support Windows, Mac, or 64-bit environments. (Which turned out to be all of my interesting platforms.) The other major problem is that I'm often debugging memory consumption of up to a GB of active data. While I think some of the former issues have been fixed, the latter is still a major issue for me.

So with the help of Michael Hudson, I started putting together a new structure. The code would be split into a scanner and a processor (loader), such that you can interrupt a running process, dump the memory consumption to disk, and then analyze it in a separate process. (Often after the former has stopped.) The scanner can have a minimal memory profile, so even if your system is already swapping, you can dump out the memory info. (Robert Collins successfully dumped a 6GB memory profile, though analyzing that beast is still an issue.) The other advantage of this system is that I don't have to play tricks with objects that represent the current state, like Guppy does with all sorts of crazy decorators.

In recent months, I've also focused on improving Bazaar's memory profile, which also meant improving memory profiling. Enough that I felt it was worth releasing the code. So officially Meliae 0.1.0 has been released. (For those wondering about the name, it is from the Ash-Wood Nymph in Greek Mythology, aka it is just a fun name.)

Doing real work

So how does one actually use the program? Bazaar has a very nice ability: you can use SIGQUIT (Ctrl+\) or SIGBREAK (Ctrl+Pause/Break) to drop into a debugger in the middle of a process to figure out what is going on. At that point, you can just:

from meliae import scanner
scanner.dump_all_objects('filename.json')

(There is an alternative scanner.dump_gc_objects() which has an even lower memory profile, but will dump some objects more than once, creating a larger dump file.)

This creates a file describing all of the Python objects it was able to find along with their known size, references, and for some objects (strings, ints) their content. From there, you start another shell, and use:

>>> from meliae import loader
>>> om = loader.load('filename.json')
>>> s = om.summarize(); s

This dumps out something like:

Total 17916 objects, 96 types, Total size = 1.5MiB (1539583 bytes)
Index    Count   %       Size   % Cum      Max Kind
    0      701   3     546460  35  35    49292 dict
    1     7138  39     414639  26  62     4858 str
    2      208   1      94016   6  68      452 type
    3     1371   7      93228   6  74       68 code
    4     1431   7      85860   5  80       60 function
    5     1448   8      59808   3  84      280 tuple
    6      552   3      40760   2  86      684 list
    7       56   0      29152   1  88      596 StgDict
    8     2167  12      26004   1  90       12 int
    9      619   3      24760   1  91       40 wrapper_descriptor
   10      570   3      20520   1  93       36 builtin_function_or_method
...

This shows the top objects and what data they consume, which can often be revealing in itself. Do you have millions of tuples? One giant dict that is consuming a surprising amount of memory? (A dict with 200k entries is ~6MB on a 32-bit platform.)

There is more that can be done. You can run:

om.compute_referrers()

At this point, you can look at a single node, and find out what was referencing it. (So what was referencing that largest dict?)

>>> om[s.summaries[0].max_address]
MemObject(29351984, dict, 49292 bytes, 1578 refs [...], 1 referrers [26683840])
>>> om[26683840]
MemObject(29337264, function, format_string, 60 bytes, 6 refs...)

However, it also turns out that all 'classic' classes in Python indirect to their data via self.__dict__, which is a bit annoying to walk through. It also makes it look like 'dict' is the #1 memory consumer, when actually it might be instances of Foo, which happen to use dicts. So you can use

om.collapse_instance_dicts()

Which will find[...]
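The "~6MB for 200k entries" figure can be roughly reproduced with arithmetic, and compared against whatever interpreter is at hand. The slot size (12 bytes: hash, key pointer, value pointer) and the power-of-two table of 524288 slots are assumptions about a 32-bit CPython 2 dict, not numbers taken from Meliae; modern 64-bit interpreters lay dicts out differently and will report a different size.

```python
import sys

# Assumed 32-bit CPython 2 layout: 12-byte slots in a power-of-two table
# that is kept at most two-thirds full.
entries = 200000
slots = 524288            # smallest power of two leaving room for 200k entries at 2/3 load
slot_bytes = 12
estimate = slots * slot_bytes
print("estimated table: %.1f MiB" % (estimate / (1024.0 * 1024.0)))

# Measure on the running interpreter. getsizeof counts only the dict's own
# table, not the keys and values it points to.
d = {i: None for i in range(entries)}
table_bytes = sys.getsizeof(d)
print("measured here: %.1f MiB" % (table_bytes / (1024.0 * 1024.0)))
```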

The Joys of multiple releases


I had originally written a longer post over at wordpress, only to have Firefox crash while trying to move an image, and WP doesn't do auto-saving like blogger. So now I'm back...

Bazaar 2.0.1 and 2.1.0b1 have now 'gone gold' in that I've uploaded the official tarballs, and asked people to make installers for them. Once installers are made, then we'll make the official announcement.

For those who haven't been following, Bazaar has now split its releases into 2 series. The 2.0.x series is based on 2.0.0 and has only bugfixes. Things that could cause compatibility problems (new features, removal of deprecated code, etc.) are only done in the 2.1.0.x series. We're hoping that this can give people some flexibility, as well as giving us more flexibility. In the past, we've suffered a bit trying to maintain backwards compatibility for some features/bugfixes, only to break compatibility for a big feature. Instead of suffering the worst of both, we're trying to get the best of both. If something needs to break compatibility, it just goes in the dev branch. Note that the development branch is still considered 'stable', in that the test suite always passes, and the code is pretty much always ready for a release. We just don't make the same guarantees about stable internal apis for 3rd parties to use.

The other change to the process is to stop doing as many "release candidate" builds. Instead, we will just cut a release. If there are problems, we'll cut the next release sooner. The chance for regressions in the 'bugfix-only' 2.0.x series should be low, and getting away from pre-builds means less overhead. We will still be doing releases we call 'rc1' before the next major stable release (2.1.0), and in that vein we expect to do little-to-no changes from the rc1 to the final build.

However, this new system does increase overhead for a single release, as it is now equivalent to doing the rc and the final in the same day. Also, because we now have 2 "integration" branches, it requires a bit more coordination between them.

For example, this is the revision graph for the recent 2.0.1 and 2.1.0b1 release.

The basic workflow that I used was something like:

1) Have a LOSA create 2 release branches lp:~bzr-pqm/bzr/2.0.1 and lp:~bzr-pqm/bzr/2.1.0b1
2) Create a local branch of each
3) Create another branch for doing my updates in, such as lp:~jameinel/bzr/2.0.1
4) Update 2.0.1 with a new version string
5) Update NEWS to clean it up, show that there is an official release, and provide a summary/overview of the changes.
6) Land this update into the official 2.0.1 branch via PQM. (Unfortunately this can take up to 2 hours depending on a bunch of different factors. We are trying to get this down to more like 10 min.)
7) Update my local copy from the final release. Tag it (bzr-2.0.1).
8) Create the tarball
9) Create the release on Launchpad
10) Upload the tarball to the release
11) While this is going on, go through the bugtracker and make sure that things mentioned in NEWS have the appropriate "Fix Released" state in the bug tracker, as well as being associated with the right milestones. With 34 bugfixes, this is a non-trivial undertaking.
12) Merge the 2.0.1 final release into the 2.1.0b1 branch. (All bugfixes in the stable series are candidates for merging at any time into the development series.)
13) Do lots of cleanup in NEWS. The main difficulty here is that bugfixes are present on 2 integration branches simultaneously, and those releases are slightly independent. We've talked about having the bugfix mentioned in both sections, which would be more important if we ever make a development release without doing the corresponding stable release.
14) Do steps 4-10 again for 2.1.0b1.
15) While working or waiting on that, prepare lp:~bzr-pqm/bzr/2.0 since it is now going to be prepped for 2.0.2. This involves bumping the version number, updating NEWS with blank entries for the next release (avoids some conflicts for people landing changes in that branch), and submitting all of that back to [...]

Refactoring work for review (and keep your annotations)


Tim Penhey recently had a nice post about how he split up his changes to make them easier to review. His method used 'bzr pipeline' and some combinations of shelving, merging, and reverting the merges.

However, while I wanted to refactor my changes to make them easier to review, I didn't want to lose my annotation history. So I took a different approach.

To start with, I'll assume you have a single branch with lots of changes, each for a different feature. You developed them 'concurrently' (jumping back and forth between features, without actually committing each to a different branch). And now that you are done, you want to split them out again.

There are a lot of possible ways that you can do this. Some proponents prefer a 'rebase' style, where you replay the commits you made in a new order, possibly squashing them, etc. I'm personally not a big fan of that.

Tim's is another method, where you just cherrypick the changes into new branches, and use something like bzr-pipeline to manage the layering. However, in reading his workflow, he would also lose the history of the individual changes.

So this is my workflow.

1) Start with a branch that has a whole lot of changes on it, and is essentially 'done'. We'll call this branch "dogpile".
2) Create a new branch from it (bzr branch --switch ../dogpile ../feature1), and remove all of the changes but the 'first step'. I personally did that with "bzr revert -r submit: file1 file2 file3" but left "file4" alone.
3) "bzr commit" in that branch. The delta for that revision will show a lot of your hard-worked-on changes being removed. However "bzr diff -r submit:" should show a very nice clean patch that only includes the changes for "feature1".
4) Go back to the original dogpile branch, and create a new "feature2" branch. (bzr branch --switch ../dogpile ../feature2)
5) Now merge the "feature1" branch (bzr merge ../feature1). At this point, it looks like everything has been removed except for the bits for feature1. However, just using "bzr revert file2..." we can restore the changes for "feature2".
6) You can track your progress in a few ways. "bzr diff -r submit:" will show you the combined differences from feature1 and feature2. "bzr diff -r -1:../feature1" will show you just the differences between the current feature2 branch and the feature1 branch. The latter is what you want to be cleaning up, so that it includes all of your feature2 changes, built on top of your feature1 changes. You also have the opportunity to tweak the code a bit, and run the test suite to make sure things are working correctly.
7) "bzr commit" in that branch. At this point, the diff from upstream to feature1 should be clean, and the diff from feature1 => feature2 should be clean. As an added benefit, doing "bzr annotate file2" will preserve all the hard-won history of the file.
8) Repeat steps 4-7 for all the other features you wanted to split out into their own branches.

When you are done, you will have N feature branches, split up from the original "dogpile" branch. By using the "merge + revert things back into existence" trick, you can preserve all of the annotations for your files. This works because you have 2 sources that the file content could come from. One source is the "dogpile" branch, and the other source is a branch where "dogpile" changes were removed. Since the changes are present in one of the parents, the annotations are brought from there.

This is what the 'qlog' of my refactoring looks like.

The actual content changes (the little grey dots) actually span about 83 commits. However, you can see that I split that up into 6 new branches (some more independent than others), all of which generate a neat difference to their parent, and preserve all of the annotation information from the full history. You can also see that now that I have it split out, I can do simple changes to each branch (notice that purple has an extra commit). This will most likely come into play if peo[...]



Jonathan Lange decided to drop some hints about what is going on in Bazaar, and I figured I could give a bit more detail. "Brisbane-core" is the code name we have for our next generation repository format, since we started working on it in our November sprint in Brisbane last year.

I'd like to start by saying we are really excited about how things are shaping up. We've been doing focused work on it for at least 6 months now. Some of the details are up on our wiki for those who want to see how it is progressing.

To give the "big picture" overview, there are 2 primary changes in the new repository layout.

1) Changing how the inventory is serialized. (makes log -v 20x faster)
2) Changing how data is compressed. (means the repository becomes 2.5:1 smaller, now fits in 25MB down from 100MB, MySQL fits in 170MB down from 500MB)

The effect of these changes is both much less disk space used (which also affects number of bytes transmitted for network operations), and faster delta operations (so things like 'log -v' are now O(logN) rather than O(N), or 20x faster on medium sized trees, probably much faster on large trees).

Inventory Serialization

The inventory is our meta-information about what files are versioned and what state each file is at (git calls it a 'tree', mercurial calls it the 'manifest'). Before brisbane-core, we treated the inventory as one large (xml) document, and we used the same delta algorithm as user files to shrink it when writing it to the repository. This works ok, but for large repositories, it is effectively a 2-4MB file that changes on every commit. The delta size is small, but the uncompressed size is very large. So to make it store efficiently, you need to store a lot of deltas rather than fulltexts, which causes your delta chain to increase, and makes extracting a given inventory slower. (Under certain pathological conditions, the inventory can actually take up more than 50% of the storage in the repository.)

Just as important as disk consumption is that when you go to compare two inventories, we would then have to deserialize two large documents into objects, and then compare all of the objects to see what has and has not changed. You can do this in sorted order, so it is O(N) rather than O(N^2) for a general diff, but it still means looking at every item in a tree, so even small changes take a while to compute. Also, just getting a little bit of data out of the tree meant reading a large file.

So with brisbane-core, we changed the inventory layer a bit. We now store it as a radix tree, mapping between file-id and the actual value for the entry. There were a few possible designs, but we went with this, because we knew we could keep the tree well balanced, even if users decide to do strange things with how they version files. (git, for example, uses directory based splitting. However if you have many files in one dir, then changing one record rewrites entries for all neighbors, or if you have a very deep directory structure, changing something deep has to rewrite all pages up to the root.) This has a few implications.

1) When writing a new inventory, most of the "pages" get to be shared with other inventories that are similar. So while conceptually all information for a given revision is still 4MB, we now share 3.9MB with other revisions. (Conceptually, the total uncompressed data size is now closer to proportional to the total changes, rather than tree size * num revisions.)

2) When comparing two inventories, you can now safely ignore all of those pages that you know are identical. So for two similar revisions, you can find the logical difference between them by looking at data proportional to the difference, rather than the total size of both trees.

Data Compression

At the same time that we were updating the inventory logic, we also wanted to improve our storage efficiency. Right now, we store [...]
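The two implications of the paged inventory (shared pages, and diffs that skip identical pages) can be illustrated with a small content-addressed sketch. This is a simplification under stated assumptions, not the brisbane-core format: it splits on the first character of a made-up file-id instead of using a real radix tree, and `page_key`, `build_inventory`, and `diff_inventories` are hypothetical helpers.

```python
import hashlib

def page_key(page):
    """Content-address a leaf page (a dict of file-id -> value)."""
    data = repr(sorted(page.items())).encode("utf-8")
    return hashlib.sha1(data).hexdigest()

def build_inventory(entries, store):
    """Bucket entries by the first character of the file-id; return {prefix: page key}."""
    pages = {}
    for file_id, value in entries.items():
        pages.setdefault(file_id[0], {})[file_id] = value
    root = {}
    for prefix, page in pages.items():
        key = page_key(page)
        store[key] = page       # identical pages from other revisions land on the same key
        root[prefix] = key
    return root

def diff_inventories(root_a, root_b, store):
    """Return ({file_id: (old, new)}, pages_read), reading only pages whose keys differ."""
    changes = {}
    pages_read = 0
    for prefix in sorted(set(root_a) | set(root_b)):
        key_a, key_b = root_a.get(prefix), root_b.get(prefix)
        if key_a == key_b:
            continue            # identical page: skip it without reading
        page_a = store[key_a] if key_a else {}
        page_b = store[key_b] if key_b else {}
        pages_read += (key_a is not None) + (key_b is not None)
        for file_id in sorted(set(page_a) | set(page_b)):
            old, new = page_a.get(file_id), page_b.get(file_id)
            if old != new:
                changes[file_id] = (old, new)
    return changes, pages_read

store = {}
rev1 = {"a-1": "v1", "a-2": "v1", "b-1": "v1", "c-1": "v1"}
rev2 = {**rev1, "b-1": "v2"}
root1 = build_inventory(rev1, store)
root2 = build_inventory(rev2, store)
changes, pages_read = diff_inventories(root1, root2, store)
print(changes, pages_read, len(store))
```

Two revisions with three pages each need only four stored pages, and the diff touches only the one page pair that actually differs.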

This Week in Bazaar


Ah, to take a break from reporting to the world, but now we are back. This used to be a weekly series of posts about the on-going events in the world of Bazaar (and may be yet again). Written by co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad.

Bazaar 1.6rc3 Released

With Martin Pool going on vacation for the next two weeks, John has stepped up to marshal 1.6 out the door. And he started with not 1 but 2 release candidates in 2 days. We're trying hard to get back into a time-based release schedule. The problem with sneaking in a feature-based release is that it always ends up slipping, as everyone tries to get "one-more-thing" in to the delayed release. However, with RC3, we've actually gotten the list of things that must be in 1.6 down to 0, so there is a very good chance it will become 1.6-final next week.

Since it has been a delayed release, there are lots of goodies inside to partake of. Stacked Branches, improved Weave merge, significantly faster 'bzr log --short', improvements to the Windows installation, better server side hooks, and the list goes on. Most of this we have mentioned in previous "This Weeks"; the big difference is that it is available in a release, rather than just in the trunk.

The Windows install is one of the major changes, in that it will now (by default) bundle TortoiseBzr as part of the standalone install. TortoiseBzr still needs work before it is as much of a joy to work with as the rest of the system, but this release is mostly about testing our ability to bundle them together.

Looking forward to Bazaar 1.7

As 1.6 nears its official release, the development community has started planning the 1.7 development process. As it stands now, bzr 1.7 has a planned release date of September 8th. This means there are two whole weeks to get various bugfixes and contributions to bazaar in before getting down to release time (mentoring available).

Among the proposed potential features, there are a few that really stand out. Mark Hammond has been polishing Bazaar on Windows, and there is much desire for someone to help get the bazaar test suite to run cleanly in Mac OS X. These features will greatly add to the existing portability strengths of Bazaar. While the majority of changes needed are actually in the test suite, and not the core functionality, the community could really use someone who could step up, and learn how to do unit testing in Python. Bazaar 1.7 will also see some increased merge flexibility, especially with criss-cross merges.

Improvements to the indexing layer are likely to land in 1.7, though as always, not on the default format. (We want at least 1 release supporting a format before we suggest it as the default, to give people time for compatibility.) The new b+tree layout for indexes makes them smaller (by approx 2:1) and makes them faster to search (e.g., bzr log file being 3x faster).

We also have a chance to land Group Compress, which has been shown to compress repositories by as much as 3:1 over their current size. This change needs a bit more tweaking, though. There are generally tradeoffs between how much time you spend compressing, and how small the result is. And we want to make sure that we make the right tradeoffs. It is currently being evaluated as a test plugin.

Bazaar Bug Day

As Bazaar development speeds up, so do the incoming bugs. There are currently 1062 open bugs in Launchpad, and 287 of them have a "New" status, meaning they have not yet been triaged and categorized. At a past Bazaar sprint, a "bug day" was talked about, and it has been brought up again on the mailing list. Often, we fix many bugs and just haven't gotten around to marking them fixed. This is a great opportunity for members of the community who use Bazaar but don't directly develop[...]

Last Week in Bazaar


Well, I'm late this week, so I'm officially marking this post as Last Week in Bazaar. In my defense, I got busy last Thursday, and then my cohort (Paul Hummer) flew off to New Zealand for a work-related sprint. So today, I (John Arbash Meinel, a developer on Bazaar) get to exercise full control over the content.

Keyword Expansion

People often request the ability to expand keywords, like they are used to in SVN and CVS. We've sort of postponed the implementation, because probably 90% of the time, it isn't really the right solution to the problem users are having. Also, they are kind of a mess in CVS anyway. Where I used to work we tried to use $Id$ style expansion, only to find out that the expanded keywords conflict on every attempt at merging, and we started working hard to strip them out of our files. In a distributed VCS, you usually merge at least an order of magnitude more often, which also tends to reveal this problem.

SVN at least works around the problem, in that when you commit, it actually strips the texts of their expanded keywords, so that the repository never stores the expanded form. Merges are also done on the non-expanded form, which fixes that little problem, though it introduces a couple of others. Specifically, what you have on disk is not what sits in the repository, nor is it exactly what you will get out of a fresh checkout. The biggest reason is that if you commit revno 1354, it will update the tags of files that are touched. But if you checkout revno 1354 it will update the tags of *all* files. (I'm not positive on this, but I know there was a bug which was causing problems for people trying to do conversions, because they couldn't quite find the right invocation to have 'update -r 1354' (from 1353) give the exact same tree as 'checkout -r 1354'.)

The other reason keyword expansion is not usually what you want is that it expands only for the given file. If you make a commit to 5 other files, the *tree* is at revno 1359, but the file with your:
 my_version = "$Version$";
tag is still pointing at 1354. (Again, if 'svn update' would force all the tags to get re-expanded it might work correctly, though you run into performance problems expanding every keyword in every file on every update.) Bazaar has supported the bzr version-info command for a while, which lets you generate a version file (possibly from a template) which can store all the real details, including the last-modified version for every file, whether any files in the working tree have been modified since commit, etc.

The only case where I've really heard a good reason for keyword expansion is a website, where each individual file is spread out into the world. So having a little "last modified" at the bottom can be pretty convenient. You also don't tend to have a "build" process which lets you generate the version information at that time.

However, as Bazaar is meant to be a flexible system, Ian Clatworthy has done a wonderful job of adding the ability to support content munging via plugins, and has continued on to write a plugin specifically for expanding keywords. For all those people who feel they really need keyword expansion, look it up.

I would imagine that once people get a good feel for it, and it matures a bit, it has a good chance to be brought into core code. Or at least make it into the release tarball as an "official" plugin.

Open Source, Python, and Counting My Blessings

Now onto something a bit more personal. This last week I had cause to re-visit an old library I had written, and try to get it up and running again. (Specifically, the project was pydcmtk, python wrappers for the Dicom Toolkit.)

It took me several hours times several days just to get it to build and run the test suite again. All without changing any of the code. It was simply a build-step problem.

Wh[...]

This Week in Bazaar


Welcome back to the terrarium of the Bazaar distributed version control system. Written by co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad as he refines his plans for world domination from his shiny new lair.

Bazaar 1.6b3 released

The next beta release of Bazaar has just been cut, and is available at your local PPA. Windows installers should be available later today. This release provides lots of the shiny things that we've been talking about, like Stacked Branches, Real Weave Merge, more hooks for server-side operation, and lots of bug fixes and general polishing. The full UI for using stacked branches still needs a little bit of polishing, so the feature is not enabled by default. The functionality is all there, and if you are interested, we'd love to hear from you (kudos and complaints are equally welcome).

New updates to Gnome Bazaar Playground

Coming back from a very productive trip to Guadec, Tim Penhey has been overseeing some customizations to the Bazaar Playground for Gnome. All of the branches created at the local server in Turkey for Guadec have been added to the public playground. The Loggerhead installation has received some TLC by way of customizations to the UI. Accerciser's playground page is a good demonstration of the UI changes that have been made. The playground is actively being used by applications such as Brasero, jhbuild, Metacity and more.

One of the fun results of meeting with people at Guadec is that it showed ways to improve Loggerhead when dealing with lots of projects and lots of branches. Work is continuing to make customizing Loggerhead's look-and-feel easier, and to provide better tools for creating these "Bazaar Playgrounds" to use in evaluating Bazaar. The Bazaar developers are committed to making tools easier to use, and making the process as simple and powerful as possible.

Up and Coming Repository Format Updates

Robert Collins has been hard at work to refine how Bazaar stores its history information. We all like to have deep context, but we don't like to have to pay the penalty of downloading all of that context. Because Bazaar has a flexible repository structure, Robert has been able to play with changing the on-disk structure without major surgery to the rest of the code.

First is a change to how indexes are written, switching from a bisectable list to a btree structure. This paged structure allows us to compress the indexes, making them smaller, and faster to process remotely. It also reduces the number of lookups to find a key. (On average, a bisect search is log2 N, while the btree is closer to log100 N.) At the moment, he is testing this with a shared repository containing all of the projects available in the Ubuntu apt repositories. This weighs in at around 13k branches, and somewhere around 20GB of disk space used.

Second is an update to how texts are stored. At the moment we use a simple format which places fulltexts periodically, and then stores deltas against those fulltexts. It has served us rather well, but can be improved upon. With his Group Compress work, we can see a savings of as much as 2x-3x. Further, the data is stored such that you can do simple linear reads to get the base fulltext and all deltas necessary to generate a given fulltext. This reduces the pressure on indices, as you don't have to search for base texts. (Instead you just store a pointer to the start, and give the total length that needs to be read.)

These are still in the development phase, but a format that uses them will likely appear in the next release (bzr 1.7).

Community Agile

Ian Clatworthy has recently released a wonderful document describing the workflow we (generally) use at Canonical. It describes how[...]
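To put rough numbers on the log2 N versus log100 N comparison from the repository format section above, here is a small back-of-the-envelope sketch. It is not Bazaar's index code: the function name, the fan-out of 100 per page (read off the "log100 N" figure) and the one-million-key index are all assumptions made for illustration.

```python
def lookups(num_keys, fanout):
    """Count how many narrowing steps it takes to isolate one key.

    Each step keeps 1/fanout of the remaining candidates: fanout=2 models
    a bisect over a sorted list, fanout=100 models a btree whose pages
    each point at roughly 100 children (an assumed figure).
    """
    steps = 0
    remaining = num_keys
    while remaining > 1:
        remaining = -(-remaining // fanout)  # ceiling division
        steps += 1
    return steps


# Hypothetical index with one million keys.
bisect_steps = lookups(1_000_000, 2)
btree_steps = lookups(1_000_000, 100)
print(bisect_steps, btree_steps)
```

Over a remote transport each step can mean another round trip, so the gap between the two counts matters more than it would for a local file.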

This Week in Bazaar


Here we are again, bringing you the gossip and dirty secrets in the development world of the Bazaar distributed version control system. In this, the 10th week, the series is now under new management, with co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad.

Bundle Buggy

Aaron Bentley has once again been improving his wonderful Bundle Buggy. He just introduced support for multiple projects using a single instance of Bundle Buggy. There are now 5 Bazaar projects using the main Bundle Buggy instance. (Bazaar, bzr-gtk, Bundle Buggy itself, Bzrtools, and PQM.) Of course, Daniel Watkins has made excellent use of his time, and has managed to crank out lots of updates for PQM. At this point the work is code cleanup and reducing the dependencies, making it easier to set up and install.

Bazaar playground for Gnome

Originally, John Carr set up Bazaar mirrors of all the Gnome modules, which people could then use as a starting point for publishing code and collaborating. This week, the Bazaar playground for Gnome was created so that any Gnome developers could be involved in pushing, branching, and sharing code through Bazaar. This new server runs Loggerhead for viewing the code committed to these Bazaar branches. Damned Lies is also set up on the playground. This server was also reproduced locally at GUADEC because of the flaky internet connection at the conference, and all those local branches will be moved to the playground shortly.

Weave merging and handling "interesting" history

One of the great things about having a large project like MySQL using your software is that they push and stretch you in ways that you haven't necessarily encountered before. Specifically, their branch workflow looks a bit like a pile of spaghetti, with several long-term maintenance branches, team branches based off of those, and individual developer branches based off of the team branches. Patches have a tendency to travel in unexpected ways (you may go user => team => release 1 => release 2, or you might go release 1 => team => team-2 => release 2, etc). They also are very fond of 'null merging' patches that aren't relevant to the next release: they merge the change, revert the text changes, and commit.

Bazaar supports all of this, but it exposes weaknesses in simple 3-way merge logic. Because patches don't flow in anything resembling an orderly fashion, you don't have the opportunity to select a "clean" base very often. Bazaar has long had an option for doing a "--weave" merge. It didn't receive much attention for a while, and had become rather slow. It turned out to be a good fit for MySQL's workflow, so John has spent a bit of time recently making the functionality efficient, and correct in some specific edge cases. Expect the improvements to show up in the next release.
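To see why the choice of base matters so much, here is the textbook 3-way rule applied to a single region of text. This is a minimal sketch and not Bazaar's merge code; it says nothing about how "--weave" works internally, and the function name and sample strings are made up for illustration.

```python
def three_way(base, this, other):
    """Resolve one region of text with the classic 3-way rule.

    Returns a (result, conflicted) pair.
    """
    if this == other:
        return this, False   # both sides agree (or neither changed)
    if this == base:
        return other, False  # only OTHER changed: take their text
    if other == base:
        return this, False   # only THIS changed: keep our text
    return None, True        # both changed differently: conflict


# With a good common base, a one-sided change merges cleanly.
clean = three_way(base="x = 1", this="x = 1", other="x = 2")

# With a poorly chosen base, the very same two texts look like
# competing edits and the rule reports a conflict.
stale = three_way(base="x = 0", this="x = 1", other="x = 2")
print(clean, stale)
```

When history criss-crosses the way it does in the workflow described above, there is often no single base that makes every region fall into one of the clean cases, which is where a merge that looks at the full per-line history can help.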

This Week in Bazaar


This is the 9th in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, unlicensed health professional. This week we are joined by Paul Hummer, who works on integrating Bazaar into Launchpad.

How to integrate bzr into your build and release process

Once you are happily using bzr on your project, the next step is some basic integration into your build process. A common desire is to record the revision number during the build, so that you can tell what revision your program was built from. This is easy to do with 'bzr revno', which prints the current revision number. That's not very exciting though.

There is a much more sophisticated command in bzr called version-info. For example, running:
 bzr version-info --custom \
--template="#define VERSION_INFO \"Project 1.2.3 (r{revno})\"\n"

will produce a C header file with a formatted string containing the current revision number. Other supported variables in the templates are: date, build date, revno, revision id, branch nickname, and clean (which shows whether the tree contained uncommitted changes). This makes integrating into make or another build system straightforward, and the templates let you generate a version file for whatever language you are writing in.
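If your build is driven from Python rather than make, the same invocation can be wrapped in a few lines. This is only a sketch: the helper names and the idea of writing the output to a header path are assumptions, while the 'bzr version-info --custom --template=...' command line is the one shown above. The second helper needs bzr on your PATH and must be run inside a branch.

```python
import subprocess

# The template from the example above. The trailing backslash-n is passed
# through literally, exactly as the shell command does.
TEMPLATE = '#define VERSION_INFO "Project 1.2.3 (r{revno})"\\n'


def version_info_command(template):
    """Build the argument list for the bzr invocation shown above."""
    return ["bzr", "version-info", "--custom", "--template=" + template]


def write_version_header(path, template=TEMPLATE):
    """Run bzr and save whatever it prints to `path` (hypothetical helper)."""
    output = subprocess.check_output(version_info_command(template))
    with open(path, "wb") as header:
        header.write(output)


# Building the command has no side effects, so it is safe to inspect.
argv = version_info_command(TEMPLATE)
print(argv)
```

A build script would then call something like write_version_header("version.h") before compiling; the file name here is just an example.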

What else could be automated other than version info? The bzr-stats plugin has a credits command. This is useful for getting a list of contributors to fill out a credits page, easter egg, etc. Also, changelogs can be generated with the gnulog plugin.

Andrew Bennetts has been working on a new server side push hook that can be used to run tests before allowing a push to complete. Wow, this could replace PQM! Well, not quite. This is more of a poor-man's PQM. It doesn't scale as well, but would work for smaller teams that don't necessarily need PQM. Blocking push while tests are running is not a good idea if you have a very long test suite, and PQM will merge and commit, making it easier to deal with multiple people trying to merge changes at the same time. If you're working in a very small group (1-3 people) with a smaller test suite, using these hooks might be just the trick, but for a larger work group you should still set up PQM.

Right now PQM is a fair amount of work to set up, but that should be changing soon. Daniel Watkins has started work on making PQM easier to set up and use, and others have been submitting cleanup patches too.

Finally, if you are using bzr on a project that builds .deb packages, check out the builddeb plugin. It would be great to have plugins for other packaging tools as well! RPM, MSI, JAR, WAR, etc.

This Week in Bazaar


This is the eighth (wow, 2 whole months of solid updates, yippee!) in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who drinks the rain. This week we are joined by Martin Albisetti, talking about Loggerhead, and dreaming of a cold pint.

bzr-search, loggerhead, gnome, and you

Robert Collins recently published his awesome bzr-search plugin, and John Carr has been doing a lot of work on setting up a bzr mirror of Gnome. A neat search module and a bunch of source trees is just begging to be combined in some sort of web interface!

There are a few web front ends for Bazaar at the moment, such as Loggerhead, webserve, viewbzr, and bzrweb. Today we are going to be focusing on Loggerhead (you can also go to its Launchpad project page to watch the development activity). It is probably the one with the most active development at the moment. An installation of the latest stuff in action is available at the bzr mirror of Gnome. Loggerhead shows side-by-side diffs, has RSS feeds, and lets you download specific changes, just like you would expect.

You can get the latest version of it yourself by doing:
bzr branch lp:loggerhead
You'll need python-simpletal and python-paste. Then by running "" in the directory where your branches live, you should be up and running with your own web interface. Eventually it is expected to become a bzr plugin which will let you easily serve your branches with a single bzr command.

We hinted at it above; recent versions have started integrating with bzr-search. So for branches that you've run "bzr index" on, it can give hints in the search dialog, and quickly find revisions that match your search terms. You can try it yourself by just typing a few letters into the search dialog.

In the coming weeks, Loggerhead will be getting a bit of a face lift with a new theme to make its externals as shiny and new as its internals.

So give it a poke, and send any feedback to either, or

This Week in Bazaar


This is the seventh in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who is sentimental today.

MySQL Switches to Bazaar

Very big news for the Bazaar team today, as MySQL announces switching from Bitkeeper to Bazaar.

One of the things that was important in doing this conversion was doing a very high quality import of all the existing history. John did a great job working on that, and even added a new feature to Bazaar and bzr-gtk to enable this: per-file commit messages. Since per-file commit messages had been used for years in the MySQL code base, it was not acceptable to lose them, and none of the DVCS systems under consideration supported these messages. Although this feature is debated by some, it was important to preserve that history, and so support for per-file commit messages was added to Bazaar in a non-invasive way, where projects who wanted to use them could, but existing projects were not forced to adopt them. At the moment, to enter per-file commit messages you need to use the bzr-gtk GUI commit tool, but we'd love it if someone came up with a clean way to enable this in the standard CLI also.

It was also important to have a smooth transition period that did not interrupt delivering MySQL releases. This meant we needed a stable importer where the imports could be periodically refreshed without causing all of the developers around the world working on the project to re-download all their trees. At one point we were doing continuous imports of over 30 trees.

It's been a fun and challenging project providing support to MySQL during this time. Although we're really excited about this milestone, we still have plenty of work to do. Here are a few things we've learned, where we are working to make Bazaar even better.

Stacked Branches. We've talked previously about stacked branches, and for a project like MySQL this new feature will make uploading a new branch to Launchpad much faster.

Merging - Bazaar has several good merge algorithms, but we still have some ideas to make merging go even smoother, particularly for some of the complicated ancestries that MySQL has. All merge algorithms have their own set of trade offs, edge cases that they handle better or worse than other algorithms.

We also need to continue to add GUI tools, and make further enhancements to existing tools. If you are looking for a valuable way to contribute to Bazaar, try lending a hand to one of the Bazaar GUI projects.

Last week we asked about bzr screencasts, and James Westby told me about a screencast that he recorded - if anyone else is interested in getting involved in producing a series of screencasts, please do let us know.

This Week in Bazaar


This is the sixth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who just wants a nice story and a nap.

1.6 on the way

We decided to change the release process a bit for the bzr 1.6 release. We're introducing a bit more than normal in this release (such as Stacked Branches), so we've decided to delay the final release a couple of weeks to ensure that everything gets an extra coat of polish. We've already had 2 beta releases, which are available in the Source.

Please give it a poke and let us know what you think.

Diff and Merge Tools

When you start working with other people on a project, you need some way of seeing what code has changed, doing code reviews, resolving conflicts, etc. The 'bzr diff' command has a '--using=foo' argument that allows you to plug in your favorite diff/merge tool if you don't want the built-in text based diff. You can also add an alias for your favorite tool. For example, Elliot uses meld all the time, so he has 'alias mdiff=diff --using=meld'. You also might want to install the difftools plugin, which adds some smarts to Bazaar about whether a particular tool understands how to diff a full tree or needs to handle the files one at a time. Here are some of the more interesting diff tools that you might want to try out:

Meld
Kdiff3
vimdiff
Wikipedia lists many more file comparison tools

One technique for easily reviewing a lot of incoming code is to keep around a pristine branch of your project that you use for conducting reviews. You can apply a patch to the tree, then run 'bzr mdiff' (or your own favorite tool), and take a look at all the changes in the patch with a lot more context than is included in the patch itself. This also gives you a spot to run the automated tests for that project, see if it compiles, etc. Once you are done with the review you can simply 'bzr revert' to get back to a clean tree and move on to the next patch to be reviewed.

Another neat trick is to use the 'merge --preview' switch. You might want to use this command to take a look at any conflicts that might have been introduced if there have been changes since the patch was generated. It shows you the patch of exactly what would be merged into the branch at that moment in time, which can sometimes have differences from what you would be reviewing by reading the patch.

Another interesting (but commercial) tool is It is a Mac OS X client which integrates with Finder and provides a comparison tool. It has direct support for Bazaar as well as several other version control systems.

Screencasts

Screencasts are becoming a very popular way to show people how to use your fancy tool, and we'd like to get some volunteers to help with putting together some screencasts explaining how to use various parts of bzr and related tools. If you want to help with this, email elliot at canonical dot com. The great thing about screencasts is that they use a different avenue for conveying information (audio, motion, etc), so while they won't replace a written tutorial, they are a wonderful supplement.[...]

DVCS Comparison: On mainline merges and fast forwards


DVCS Comparison: On mainline merges and fast forwards has a discussion about whether 'fast forward' is a "better" method for merging in a distributed topology.

I can understand where he is coming from, and we respect that some users prefer other workflows. Bazaar even has direct support for 'fast forward' with 'bzr merge --pull', and with our aliasing functionality, you can set:
 [ALIASES]
 merge = merge --pull
in ~/.bazaar/bazaar.conf and change the default meaning of 'bzr merge'. However, I still fall on the side of the fence that says fast forward should not be the default.

I can agree that if you have 2 people collaborating on the same feature, you would want fast forward. Though I would argue that is because they are effectively working on the same branch. For my personal workflow, I have a different alias set:
 log = log --short -r -10..-1 --forward
What this means is that when I type 'bzr log' I see just the mainline commits of a branch, without the merge cruft. (Where I define the merge cruft as the individual revisions that make up a feature change, not the 'merge foo' node.)

Take this view of
 Patch Queue Manager 2008-06-02 [merge] (jam) Give Aaron the benefit of bug #202928
 3467 Patch Queue Manager 2008-06-03 [merge] (Martin Albisetti) Better message when a repository is locked.
 3468 Patch Queue Manager 2008-06-03 [merge] (mbp) merge 1.6b1 back to trunk
 3469 Patch Queue Manager 2008-06-04 [merge] (mbp) Update more users of default file modes from control_files to bzrdir
 3470 Patch Queue Manager 2008-06-04 [merge] (Jelmer) Move update_revisions() implementation from BzrBranch to Branch.
 3471 Patch Queue Manager 2008-06-04 [merge] (vila) Split a test
 3472 Patch Queue Manager 2008-06-04 [merge] (jam) Fix bug #235407, if someone merges the same revision twice, don't record the second one.
 3473 Patch Queue Manager 2008-06-05 [merge] Isolate the test HTTPServer from chdir calls (Robert Collins)
 3474 Patch Queue Manager 2008-06-05 [merge] Add the 'alias' command (Tim Penhey)
 3475 Patch Queue Manager 2008-06-05 [merge] (mbp) #234748 fix problems in final newline on Knit add_lines and get_lines

You get to see a nice short summary of everything that has been happening (in proper chronological order). Admittedly, seeing "Patch Queue Manager" on each of those commits is less than optimal (which is why we add the author names). That is just a temporary limitation of our PQM. Bazaar already supports setting an --author, separate from the committer; we just need to teach our integration bot to use it.

The big difference, IMO, is whether you are bringing in someone else's changes to enhance your work, or whether you are collaborating on the same item. I would argue that collaborating on the same item is slightly less common. It also depends what you do with the merge commits. Just saying "merge from branch A" is certainly not helpful. But when you can say "merge Jerry's changes to", it can indeed be helpful when tracking back through and figuring out where and when "foo" changed, without being lost in the forest for having too many trees.[...]
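The two aliases above can be modelled in a few lines to show what ends up being run. This is a toy sketch using Python's standard configparser, not the code bzr uses to read bazaar.conf; the helper names are made up, and the assumption that an alias is expanded only once (so 'merge = merge --pull' does not loop forever) is mine.

```python
import configparser
import shlex

# The two aliases quoted in the post, as they would sit in bazaar.conf.
CONF = """
[ALIASES]
merge = merge --pull
log = log --short -r -10..-1 --forward
"""


def load_aliases(text):
    """Parse the [ALIASES] section into {name: list of arguments}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    return {name: shlex.split(value)
            for name, value in parser["ALIASES"].items()}


def expand(argv, aliases):
    """Replace the command name with its alias, once, keeping extra args."""
    if argv and argv[0] in aliases:
        return aliases[argv[0]] + argv[1:]
    return list(argv)


aliases = load_aliases(CONF)
expanded_merge = expand(["merge", "../feature-1"], aliases)
expanded_log = expand(["log"], aliases)
expanded_status = expand(["status"], aliases)
print(expanded_merge, expanded_log, expanded_status)
```

The point of the model is simply that an alias changes the default meaning of a command while still letting you pass further arguments after it.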

This Week in Bazaar


This is the fifth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot "fresh needle" Murphy. The two topics for this week are not related, but it's our blog and we get to write what we want.

Hosting of Bazaar branches

One of the first questions people ask when moving to Bazaar is "Where can I host my branches?" Even with distributed revision control, it is often handy to have a shared location where you publish your code, and merge code from others. Canonical has put a lot of work into making an excellent place to host code, but there are many other options available.

Because Bazaar supports "dumb" transports like sftp, you can publish your branches anywhere that you can get write access to some disk space. For example, gives projects some web space with sftp access, and you can easily push branches up over sftp. It's also easy to use bzr on any machine that you have ssh access to; you don't even need to install Bazaar on the remote machine.

As Bazaar is a GNU project, we've been working with the Savannah team to enable Bazaar hosting on Savannah also.

Another option is serving Bazaar branches over HTTP. You can do this for both read and write access, and there is a great HOWTO in the Bazaar documentation. Do you know of anywhere else that is offering Bazaar hosting? Let us know in the comments!

Bazaar review and integration process

How do you ensure high quality code when working on a fast moving codebase in a widely distributed team? Here are some things that we've been doing with the Bazaar project, and we think they are useful practices for most projects.

Automated Test Suite

One very important key to having a stable product is proper testing. As people say, "untested code is broken code". In the Bazaar project, we recommend that developers use Test Driven Development as much as possible. However, what we *require* is that all new code has tests. The reason it is important for the tests to be automated is that it transfers knowledge about the code base between developers. I can run someone else's test, and know if I conformed to their expectations about what this piece of code should do.

This actually frees up development tremendously, especially when you are doing large changes. With a good test suite, you can be confident that after your 2000 line refactoring, everything still works as before.

Code Review

Having other people look at your changes is a great way to catch the little things that aren't always directly testable. It also helps transfer knowledge between developers. So one developer can spend a couple weeks experimenting with different changes, and then there is at least one other person who is aware of what those are.

The basic rules for getting code merged into Bazaar are that:

1. It doesn't reduce code clarity
2. It improves on the previous code
3. It doesn't reduce test coverage
4. It must be approved by 2 developers who are familiar with the code base.

We try to apply those rules to avoid hitting the rule "The code must be perfect before it is merged", and the associated project stagnation. Code review is a very powerful tool, but you have to be cautious of "oh, and while you are there, fix this, and that, and this thing over here." Sometimes that is useful to catch things that are easy (drive-by fixes). It can also lead to long delays before you actually get to see the improvements from someone's work, and long delays are demotivating.

Item number 3 is a pragmatic way to a[...]

This Week in Bazaar


This is the fourth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, imaginary boy and part-time impostor.

Stacked branches

Some projects are very big with lots of files and lots of history. Many projects want to maintain the policy that development is done on independent branches, which are then merged back when the development is complete. However, the overhead of downloading, branching, and uploading the full history is prohibitive. There are a couple of different ways to solve this problem.

Dealing with a large branch can be split into two problems: downloading and uploading.

Bazaar has had a storage optimization called shared repositories for quite a while. This serves to dramatically reduce the amount of data downloaded for the second, third, etc branches of a project. A shared repository is a big pool of revisions which multiple branches point to. When you grab a new branch into a shared repository, bzr figures out how much of the history it already has, and only downloads the new revisions. So the first branch of a large project transfers most of the data, and grabbing additional branches is very cheap. In extreme cases, like working on a multi-gigabyte project from a 56k dial-up connection, you could even do things like distribute the initial data on a DVD to prime the shared repository, and then the user only needs to download incremental changes.

This technique can also be used for solving the uploading problem. If the upload location uses a shared repository, then uploading a new branch can just copy the new data. The problem with this comes once you start introducing multiple users, who may decide that they do not want to give other people access to push data into their repository.

Another approach to minimizing the data uploaded is called server side forking, and you can see a nice implementation of this on The user places a request with the code host to do the copy for them, and when it finishes, they have their own location already primed with the current branch.

The Bazaar project is approaching it in a different way. If some data is already public, then you can just reference the other public location when you start uploading your new branch. The first steps in this direction are being termed "Stacked Branches". Basically, instead of requiring all branches to contain the full history, you are allowed to "Stack" a branch on top of another. Because the uploader does not have write access to the lower levels of the stack, this addresses the security risks of shared repositories.

Stacking also opens up possibilities for the "download" side of the equation. Many users don't need a very deep copy of history to get their work done. If there is a location that can be trusted to be available when they need it, they can copy just the tip revisions, which would allow them to do most of their work (commit, diff, etc) without consulting the remote host. And when they need more information (such as during a complex merge), the bzr client is able to fall back to the original source to get any needed information.

The goal of all this is to make it very easy to start working with a large project, while still making all the history available in a meaningful way. The bulk of this work has been completed, and it is likely that it will land in bzr 1.6 (to be released in a couple of weeks.)[...]
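The fallback idea behind stacking can be pictured with a toy model. The class below is purely illustrative and is not bzrlib's repository API: the class name, the revision ids and the dictionary storage are all invented for the sketch. It only shows the lookup order described above, where a stacked branch holds its own new revisions and defers to the branch underneath for everything else.

```python
class ToyRepository:
    """A toy revision store with an optional fallback underneath it."""

    def __init__(self, revisions, fallback=None):
        self.revisions = dict(revisions)  # revision id -> stored data
        self.fallback = fallback          # the repository we are stacked on

    def get(self, revision_id):
        """Look locally first, then ask the repository we are stacked on."""
        if revision_id in self.revisions:
            return self.revisions[revision_id]
        if self.fallback is not None:
            return self.fallback.get(revision_id)
        raise KeyError(revision_id)


# The public trunk holds the shared history.
trunk = ToyRepository({"rev-1": "initial import", "rev-2": "add parser"})

# The stacked branch stores only what is new, so that is all it uploads.
feature = ToyRepository({"rev-3": "my new feature"}, fallback=trunk)

print(feature.get("rev-3"), "|", feature.get("rev-1"))
```

Note that the stacked store never writes to the one beneath it, which mirrors the point above about the uploader not needing write access to the lower levels of the stack.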

This Week in Bazaar


This is the third in an amazingly regular weekly series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, Launchpad developer and relentless agitator. This week we have a special guest, Jelmer Vernooij, Samba developer, and author of the Bazaar Subversion plugin.

In last week's episode, our fearless explorers braved the new world of plugins. Today we will focus on a specific plugin, and talk about how you can use Bazaar with Subversion. Earlier this week there was a very nice blog post about using Git with the Subversion servers on Google Code Hosting, and plenty of interesting discussion afterwards.

Rationale

If you have Bazaar installed, why would you want to work with Subversion? Well, it's nice not to have to force the whole world to change at once. Bazaar-Subversion integration allows you to use Bazaar without any changes required from the project administrators to the central Subversion server.

There are three general cases where you would want to use bzr-svn:

Upstream uses Subversion, and you don't yet have commit access. With bzr-svn, you are able to still make your improvements with all the benefits of a great VCS.

Project has chosen to use Subversion, you want something better, but still want to play nice with your fellow developers. You can commit to your local Bazaar branch, and push those changes back into Subversion. You can even do "bzr commit" in your Subversion checkout and have it commit those changes to the Subversion server.

Migration from Subversion to Bazaar. Often when migrating from one VCS to another, there is a period of time where people are adjusting to the new system. bzr-svn allows you to continue letting people commit to Subversion; it's just another branch with changes to be merged.

Overview

Currently the bzr-svn dependencies can be a bit tricky to install on some platforms, but that should be much easier once Subversion 1.5 is released. Once you get things installed, it's pretty amazing what you can do. On most Debian based systems, it is a simple "apt-get install bzr-svn" away.

Once you have bzr-svn installed, you can start using Subversion branches as though they were regular Bazaar branches.

General usage

Now that you have bzr-svn installed, how do you get a local copy of your Subversion project? Generally, it is just a "bzr checkout URL" away.
 $ bzr checkout svn+
will create a local checkout of your project that contains a local copy of the history present remotely.

You should now be able to use this branch like any regular Bazaar branch. Since this is a bound branch, any commits you make will also show up in the Subversion repository.

It is possible to create new local branches from this branch, for example for feature branches:
 $ bzr branch trunk feature-1
And to merge the branch back into Subversion once it is finished, you can use merge like you would with any ordinary Bazaar branch:
 $ bzr merge ../feature-1
 $ bzr ci -m "Merge feature-1"

In addition to the code changes, bzr-svn will write metadata about the history of the new commit into Subversion. This means that your merge history is available, so when someone else comes along and grabs a copy of the branch using Bazaar, they can see what happened. To a normal Subversion client this is transparent; the custom properties are simply ignored.

It is also possible to push directly from the feature branch into Subversi[...]

This Week in Bazaar


This is the second in a mostly-every-week series of posts about what's been happening in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, Launchpad developer and compulsive conflict avoider.

Plugins

One of the nice things about Bazaar is the API, which enables new features to be added with plugins. Once a feature is polished and proves widely useful, it can move from a plugin into core Bazaar. Most of the plugins are hosted/mirrored on Launchpad, and are a simple "bzr branch lp:bzr-plugin ~/.bazaar/plugins/plugin" away. The rest are indexed at [...]. Here's a quick summary of some of the plugins we are using on our laptops right now:

  - bookmarks: This allows me to store an alias for a branch location, so it is easier to branch/push to a common location. So I can type 'bzr get bm:work/foo' instead of 'bzr get bzr+ssh://[...]'
  - bzrtools: A collection of commands which provide extended functionality, such as 'bzr cdiff' to display colored diffs and 'bzr shelve' to temporarily revert sections of changes.
  - difftools and extmerge: These plugins let me view differences in meld or kdiff3 (or anything that you want to configure, really), and do merges via [...]
  - [...]: Keep people informed of what you are working on by sending an email after every commit.
  - fastimport: This plugin allows me to import code from my friend's Mercurial repository and push it to Launchpad.
  - git: This gives me read access to a local git repository.
  - gtk: This is the Bazaar Gtk GUI, which has some nice tools like visualize and gcommit.
  - htmllog: Useful for generating HTML-formatted logs for publishing on the web.
  - loom: Allows me to manage several "layers" of development in a single branch, and collaborate on those layers with other people.
  - notification: Gives a GUI popup when a pull or push completes.
  - pqm: This formats a merge request to PQM. PQM then takes my branch, merges to main, runs tests, and commits the merge if all was well. This ensures that we always have passing tests in the main tree!
  - push_and_update: This updates the working tree when I push my branch to a remote server. Very useful for doing website updates.
  - removable: I try to keep all branches very small for easier review, so I have a lot of branches at one time. This tells me which branches have already been merged to the main tree (and thus can be removed). It can also let me know why something is not ready to be removed.
  - stats: Provides 'bzr stats', which gives a simple view of how many people have committed to your project and how many commits each has done.
  - update_mirrors: 'bzr update-mirrors' recursively scans for Bazaar branches and updates them to their latest upstream.
  - vimdiff: Adds the commands 'bzr vimdiff' and 'bzr gvimdiff', which open vim in side-by-side mode, showing you your changes.
  - qbzr: Another great GUI for bzr, this one written using Qt.

1.5rc1, 1.5 this Friday

Continuing our pattern of having time-based releases, bzr 1.5rc1 was released last Friday, and 1.5 final should be released tomorrow. Ever wonder how we churn out releases so regularly? The biggest factor enabling us to make consistent releases is our use of a Patch Queue Manager. It ensures that all of our 11,724 unit tests pass before allowing any merge into mainline. Even when lots of changes are landing, the trunk can be considered release quality. Most of the developers use the tip of mainline for their da[...]

Creating a new Launchpad Project (redux)


A while back I posted about how to set up a new Launchpad project. At the time it took quite a few steps to set everything up the way you wanted. I'm happy to report that a lot of those steps have been streamlined, so I am posting new step-by-step instructions for setting up your project in Launchpad.

  1. Make sure the project isn't already registered. A lot of upstream projects have already been registered in Launchpad, as it is used to track issues in Ubuntu. So it is always good to start on the main page and use the search box "Is your project registered yet?".
  2. If you don't find your project, there will be a link to Register a new project
  3. The form for filling out your project details has been updated a bit, but you should know the answers. (I still use 'bazaar' as the "part of" super-project, and bzr-plugin-name for my plugins)
  4. This is where things start to get easier. After you have registered the project you can follow the Change Details link. This is generally [...]. It was the same before, but now more information is on a single page, so you can set up more at once. Here I always set the bug tracker to be Launchpad, and I click the boxes to opt in to the extra Launchpad features.
  5. Optionally you can assign the project to a shared group. Follow the "Change Maintainer" link ([...]). I generally assign them to the bzr group, because I don't want to be the only one who can update information.
  6. At this point you should be able to push up a branch to be used as the mainline using:
    bzr push lp:///~GROUP/PROJECT/BRANCH
    in my example, this is lp:///~bzr/PROJECT/trunk. (You may need to run 'bzr launchpad-login' so that bzr knows who to connect as, rather than using anonymous http:// urls)
  7. You now want to associate your mainline branch with the project, so that people can use the nice lp:///PROJECT urls. You can follow the link on your project page for the "trunk" release series (usually this is [...]). On that page is an "Edit Source" link, or [...]
    Set the official release series branch to your new ~GROUP/PROJECT/BRANCH.
See, now it is only 7 steps instead of 11. (Though really only one or two steps have actually changed.)

This Week In Bazaar First Edition


This is the first in a mostly-every-week series of posts about what's been happening in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, Launchpad developer and wanted criminal.

We get to talk about anything we want. This week:

  - What's been happening for a better GUI on Windows
  - What's new in the 1.4 release
  - Importing from other VCS's with bzr fast-import

... details ...

GUI on Windows

We found this guy named Mark Hammond who claims to know how to make python stuff work well on Windows. There is now an existing GUI tool for Bazaar on Windows called TortoiseBZR, modeled after TortoiseSVN. If you haven't used a Tortoise before, they are extensions that integrate into Windows Explorer, allowing you to see and control the versioning of your files without needing to change to a separate tool.

Mark has taken a look and proposed a series of enhancements to make the tool work even better. Bazaar already works very well from the Windows command prompt, but we want to provide excellent GUI tools as well. Take a look at the TortoiseBZR web page for screenshots of it in action.

What's new in the 1.4 release

The Bazaar team releases a new version of Bazaar just about every month, with both bugfixes and new features. The bzr-1.4 release came out last Thursday, May 1st.

The major changes for 1.4 include improvements in performance of 'log' and 'status', and a new Branch hook called post-change-branch-tip, which will trigger any time a Branch is modified (push, commit, etc). This should enable server generated emails whenever somebody publishes their changes. Write something cool with it and tell us what you did!

The full list of changes for 1.4 can be found at: [...]. A list of all changes is at [...].

fast-import

Bazaar fast-import is a plugin for Bazaar that allows you to import from many different version control systems. The fast-import stuff is intended to support any system that can use the fast-export format. This format was originated by git developers, and quickly adopted elsewhere. So if a source format can generate a "fast-import" stream, you should be able to import it into Bazaar.

  - CVS: To convert from CVS, you currently use the cvs2svn converter, which has a flag to generate a "fast-import" stream.
  - Mercurial: There is a script called [...] bundled with the plugin (in the exporters/ directory).
  - SVN: The svn-fast-export script is also bundled with the bzr-fastimport plugin.
  - git: Bundled with the standard git distribution is the git-fast-export command.
  - Your own exotic system here.

Give fast-import a try. It's mostly designed for 1-time conversions, rather than mirroring, but there are already some rudimentary mirroring capabilities.

That's all for the first installment of "This Week in Bazaar".

(edited for formatting)[...]
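For anyone wondering what "generate a fast-import stream" means for their own exotic system, here is a minimal sketch of an exporter in Python. It emits only a small subset of the stream format documented for git fast-import (blob, commit, mark, committer, data, from, and M lines); whether a given importer accepts exactly this output is an assumption to verify. The function name `fast_import_stream` and the commit dict layout are hypothetical, chosen for this example.

```python
def fast_import_stream(commits, branch="refs/heads/master"):
    """Serialize a linear list of commits into fast-import stream bytes.

    Each commit is a dict (hypothetical layout) with the keys:
      committer: "Name <email>", timestamp: int seconds since the epoch,
      message: str, files: dict of path -> new file text.
    """
    out = []
    mark = 0
    parent = None  # mark of the previous commit on this branch
    for commit in commits:
        modifies = []
        for path, content in sorted(commit["files"].items()):
            mark += 1
            data = content.encode("utf-8")
            # 'data' carries an exact byte count, so payloads need no escaping.
            out.append(b"blob\nmark :%d\ndata %d\n" % (mark, len(data)) + data + b"\n")
            modifies.append(b"M 100644 :%d %s\n" % (mark, path.encode("utf-8")))
        mark += 1
        message = commit["message"].encode("utf-8")
        out.append(b"commit %s\nmark :%d\n" % (branch.encode("utf-8"), mark))
        out.append(b"committer %s %d +0000\n"
                   % (commit["committer"].encode("utf-8"), commit["timestamp"]))
        out.append(b"data %d\n" % len(message) + message + b"\n")
        if parent is not None:
            out.append(b"from :%d\n" % parent)  # link to the previous commit
        out.extend(modifies)
        out.append(b"\n")
        parent = mark
    return b"".join(out)


history = [
    {"committer": "Jane Doe <jane@example.com>", "timestamp": 1210000000,
     "message": "initial import", "files": {"README": "hello\n"}},
    {"committer": "Jane Doe <jane@example.com>", "timestamp": 1210000600,
     "message": "expand README", "files": {"README": "hello world\n"}},
]
stream = fast_import_stream(history)
print(stream.decode("utf-8"))
```

The sketch ignores deletes, renames, merges, file modes other than 100644, and paths that would need quoting, so treat it as a starting point for a one-time conversion rather than a complete exporter.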

Bazaar vs Subversion


Every so often someone comes along wanting to know which VCS they should use. I won't claim to be an impartial observer, but this is a list of things I put together for the last discussion, which I thought I would share here.

SVN requires all commits to go to a central location, and tends to favor having multiple people working on the same branch.

This is both a positive and a negative depending on what you are trying to do. When you have a bunch of developers that don't know a lot about VCS, it simplifies things for them. They don't have to worry about branches, they just do their work and check it in.

The disadvantage is that they can tread on each other's toes (by committing a change that breaks someone else's work), and their work immediately gets mixed together and can't be integrated separately.

Bazaar has chosen to address this with workflows. You can explicitly have a branch set up to send all commits to a central location (bzr checkout), just as you do with SVN. Also, if two people checkout the same branch, they must stay in sync. (Bazaar actually has a stronger restriction here than SVN does, because SVN only complains if they modify the same files, whereas Bazaar requires that the whole tree be up to date.)

However, with a Bazaar checkout, there is always the possibility to either bzr unbind or just bzr commit --local when you are on a plane, or just want to record in-progress work before integrating it into the master branch.

SVN has a lot more 3rd party support.

SVN has just been around longer, and is pretty much the dominant open source centralized VCS. There are a lot of DVCSes at the moment, all doing things a little bit differently. Competition is good, but it makes it a bit more difficult to pick one over the other, and 3rd party tools aren't going to build for everyone.

However, Bazaar already has several good third party tools. For viewing changes to a single file, bzr gannotate can show when each line was modified, and what the associated commit message was. It even allows drilling back in history to prior versions of the file. For viewing the branch history (showing all the merged branches, etc.) there is bzr viz.

There are both gtk and qt GUIs, and a Patch Queue Manager (PQM) for managing an integration branch (where the test suite always must pass or the patch is rejected). There is even basic Windows Shell integration (TortoiseBzr), a Visual Studio plugin, and an Eclipse plugin.

Bazaar is generally much easier to set up.

SVN can only really be set up by an administrator, someone who has a bit more of an idea what they are doing. Setting up WebDAV over http is easier than it used to be, but it isn't something you would ask just anyone to do. Getting a project using Bazaar is usually as simple as bzr init; bzr add; bzr commit -m "initial import". You can push and pull over simple transports (ftp, sftp, http).

Because SVN is centralized, you only really set it up one time anyway, so as long as you have one competent person on your team, you can probably get started.

It is easier to get 3rd party contributions.

If you give a user commit access to your SVN repository, then you have their changes available whenever they commit. But usually this also means that they have access to change things that you don't really want them to touch. (Yes, there are ACLs that you can set up, but I don't know many projects that go to that trouble for casual contributors.)

If you haven't given them commit access, then they have to [...]
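The difference between the two "stay in sync" rules mentioned in the first point above (SVN complains only about the files being committed, a Bazaar checkout requires the whole tree to be up to date) can be shown with a toy model in Python. Neither function is taken from SVN or Bazaar; the names and the data layout are hypothetical, and real clients track more state than this.

```python
def svn_commit_allowed(base_revs, head_revs, changed_paths):
    """SVN-style rule: only the files being committed must be up to date.

    base_revs: path -> revision the working copy has for that file
    head_revs: path -> latest revision of that file on the server
    """
    return all(base_revs[path] == head_revs[path] for path in changed_paths)


def bzr_checkout_commit_allowed(local_tip, master_tip):
    """Bazaar checkout rule: the whole tree must be at the master's tip."""
    return local_tip == master_tip


# Alice has already committed a change to a.py, creating revision 2.
# Bob's working copy is still at revision 1 and he has edited b.py only.
bob_base = {"a.py": 1, "b.py": 1}
server_head = {"a.py": 2, "b.py": 1}

svn_ok = svn_commit_allowed(bob_base, server_head, ["b.py"])
bzr_ok = bzr_checkout_commit_allowed(local_tip=1, master_tip=2)
print(svn_ok)  # True: b.py itself is not out of date
print(bzr_ok)  # False: Bob must update first
```

In the Bazaar case Bob would run an update (or use bzr commit --local, as mentioned above) before his change can land on the master branch.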

Ogg Vorbis and iTunes


I've been a longtime supporter of Ogg Vorbis, and I'm also a Mac user. While I haven't figured out how to get my iPod to play Ogg just yet, I have worked on getting iTunes to play it. I periodically do searches to see if things have improved, but they seem to return mostly old data.

So I just wanted to get it out that the good people at Xiph have started maintaining the Ogg Vorbis plugin. It is available here.

I don't seem to be able to find the page again, but I thought I read there were some small problems with the last release. They have development snapshots here. At least so far, I haven't run into any problems with it. And overall it seems to consume fewer CPU resources than the older releases.