The Chronicles of Richard

A fugitive on an ice planet of information

Updated: 2015-10-07T17:56:14.656+00:00


An Analytical Anniversary


Today is my anniversary.  I have been at Symplectic Ltd for one of your Earth "years".  And a very busy one it has been, what with writing repository integration tools for our research management system to deposit content into DSpace, EPrints and Fedora, plus supporting the integration into a number of other platforms.  I thought it would be fun to do a bit of a breakdown of the code that I've written from scratch in the last 12 months (which I'm counting as 233 working days).  I'm going to do an analysis of the following areas of productivity:

- lines of code
- lines of inline code commentary
- number of A4 pages of documentation (end user, administrator and technical)
- number of version control commits

Let's start from the bottom and work upwards.

Number of version control commits

Total: 700; per day: 3

I tend to commit units of work, so this might suggest that I do 3 bits of functionality every day.  In reality I quite often also commit quick bug fixes (so that I can record the fix details in the commit log), or commit at the end of a day/week when I want to know that my code is safe from hardware theft, nuclear disaster, etc.

Number of A4 pages of documentation

Total: 72; per day: 0.31

Not everyone writes their documentation in A4 form any more, and it's true that some of my dox take the form of web pages, but as a commercial software house we tend to produce well formatted, nice end-user and administrator documentation.  In addition, at a geek level I rather enjoy a nice printable document that's well laid out, so I do my technical dox that way too.

The amount of documentation is relatively small, but it doesn't take into account a lot of informal documentation.  More importantly, though, at the back end of the first version of our Repository Tools software, the documentation is still in development.  I expect the number of pages to triple or quadruple over the next few weeks.

Lines of Code and Lines of Commentary

I wrote a script which analysed my outputs.  Ironically, it's written in Python, which isn't one of the languages that I use professionally, so it's not included in this analysis (and none of my personal programming projects are therefore included).  This analysis covers all of my final code as of my anniversary (23rd March), and does not take into account prototyping or refactoring of any kind.  Note also that blank lines are not counted.

Line counts:

- XML (107 files) :: Lines of Code: 17819; Lines of Inline Comments: 420

XML isn't really programming, but it was interesting to see how much I actually work with it.  This figure is not used in any of the statistics below.  Some of these are large metadata documents and some are configuration (Maven build files, Ant build files, web server config, etc).

- XSLT (36 files) :: Lines of Code: 8502; Lines of Inline Comments: 2762
- JAVA (181 files) :: Lines of Code: 22350; Lines of Inline Comments: 7565
- JSP (16 files) :: Lines of Code: 2847; Lines of Inline Comments: 1
- PERL (58 files) :: Lines of Code: 6506; Lines of Inline Comments: 1699
- TOTAL (291 files) :: Lines of Code: 40205; Lines of Inline Comments: 12027

I remember once being told that 30k lines of code a year was pretty reasonable for a developer.  I feel quite chuffed!

Lines of code/comments per day:

- XSLT :: Lines of Code: 36; Lines of Inline Comments: 12
- JAVA :: Lines of Code: 96; Lines of Inline Comments: 32
- JSP :: Lines of Code: 12; Lines of Inline Comments: 0
- PERL :: Lines of Code: 28; Lines of Inline Comments: 7
- TOTAL :: Lines of Code: 173; Lines of Inline Comments: 52

It looks much less impressive when you look at it on a daily basis.  We just have to remember that this is 173 wonderful lines of code every day!

Comment to code ratio (comments/code):

- XSLT :: 0.33
- JAVA :: 0.34
- JSP :: 0
- PERL :: 0.26
- TOTAL :: 0.30

It was interesting to see that my commenting ratio is fairly stable at about 30% of the overall codebase size.  I didn't plan that or anything.  This includes block comments for classes and methods, and inline programmer documentation.  The reason for the shortfall in Perl is sugge[...]
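For flavour, here is a minimal sketch of the kind of counting involved. My actual script was written in Python, so this Java analogue is purely illustrative, and its comment heuristic is cruder than whatever the real script did:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LineCounter {
    public static void main(String[] args) throws IOException {
        long code = 0;
        long comments = 0;

        // gather all .java files under the root directory given as args[0]
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(Paths.get(args[0]))) {
            sources = walk.filter(p -> p.toString().endsWith(".java"))
                          .collect(Collectors.toList());
        }

        for (Path source : sources) {
            for (String line : Files.readAllLines(source)) {
                String t = line.trim();
                if (t.isEmpty()) {
                    continue; // blank lines are not counted at all
                }
                // crude heuristic: a line is commentary if it opens or continues a comment
                if (t.startsWith("//") || t.startsWith("/*") || t.startsWith("*")) {
                    comments++;
                } else {
                    code++;
                }
            }
        }

        double ratio = code == 0 ? 0.0 : (double) comments / code;
        System.out.printf("Lines of Code: %d; Lines of Inline Comments: %d; ratio: %.2f%n",
                code, comments, ratio);
    }
}
```

Run against a source tree, it reports totals in the same shape as the lists above.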

ORE software libraries from Foresite


The Foresite [1] project is pleased to announce the initial release of two software libraries for constructing, parsing, manipulating and serialising OAI-ORE [2] Resource Maps. The libraries are being written in Java and Python, can be used generically to provide advanced functionality to OAI-ORE aware applications, and are compliant with the latest release (0.9) of the specification. The software is open source, released under a BSD licence, and is available from a Google Code repository:

You will find that the implementations are not absolutely complete yet, and are lacking good documentation for this early release, but we will be continuing to develop this software throughout the project and hope that it will be of use to the community immediately and beyond the end of the project.

Both libraries support parsing and serialising in: ATOM, RDF/XML, N3, N-Triples, Turtle and RDFa

Foresite is a JISC [3] funded project which aims to produce a demonstrator and test of the OAI-ORE standard by creating Resource Maps of journals and their contents held in JSTOR [4], and delivering them as ATOM documents via the SWORD [5] interface to DSpace [6]. DSpace will ingest these resource maps, and convert them into repository items which reference content which continues to reside in JSTOR. The Python library is being used to generate the resource maps from JSTOR and the Java library is being used to provide all the ingest, transformation and dissemination support required in DSpace.
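As a flavour of what a Resource Map amounts to, here is a minimal sketch of the ore:describes / ore:aggregates structure. Note the assumptions: it uses Apache Jena rather than the Foresite libraries themselves, and the example.org URIs are hypothetical:

```java
// Requires Apache Jena on the classpath (this is not the Foresite API itself)
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class ResourceMapSketch {
    public static void main(String[] args) {
        String ORE = "http://www.openarchives.org/ore/terms/";

        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("ore", ORE);
        Property describes = m.createProperty(ORE, "describes");
        Property aggregates = m.createProperty(ORE, "aggregates");

        // hypothetical URIs for a resource map, its aggregation, and one aggregated resource
        Resource rem = m.createResource("http://example.org/rem/1");
        Resource agg = m.createResource("http://example.org/aggregation/1");
        Resource article = m.createResource("http://example.org/article.pdf");

        rem.addProperty(describes, agg);       // the resource map describes the aggregation
        agg.addProperty(aggregates, article);  // the aggregation aggregates resources

        m.write(System.out, "RDF/XML");        // or "N3", "N-TRIPLE", "TURTLE"
    }
}
```

Swapping the final argument of write() exercises most of the serialisations listed above; ATOM and RDFa are beyond this little sketch.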

Please feel free to download and play with the source code, and let us have your feedback via the Google group:

Richard Jones & Rob Sanderson

[1] Foresite project page:
[2] OAI-ORE specification:
[3] Joint Information Systems Committee (JISC):
[4] JSTOR:
[5] Simple Web Service Offering Repository Deposit (SWORD):
[6] DSpace:

DSpace 1.5 Beta 1 Released


I'm pleased to be able to relay that DSpace 1.5 has been released for beta testing. Particularly big thanks to Scott Phillips, the release coordinator and lead Manakin developer, for his contributions to it. From the email announcement:

The first beta for DSpace 1.5 has been released. You may either check out the new tag directly from SVN or download the release from SourceForge. On SourceForge you will note that there are two types of releases:


- The "dspace-1.5.0-beta1-release" is a binary download that just contains dspace, it's manual, configuration, and a few other essential items. Use this package if you want to download DSpace pre-compiled and get it up running with no customizations.

- The other release, "dspace-1.5.0-beta1-src-release" is a full copy of the DSpace source code that you can modify and customize. Use this release as an alternative to checking out a copy of the source directly from SVN.

Sourceforge download URL:

There is going to be a full week testathon next week, which we encourage everyone to get involved in. Please do download and install either or both of the available releases, and let us know how you get on. Give it your best shot to break them, and if you do and are able to, consider sending us a patch to fix what was broken. The developers will be available (depending on time zone) in the DSpace IRC channel to help with diagnoses and fixes and any other questions:

channel: #dspace

See you there!

CRIG Flipchart Outputs


The JISC CRIG meeting which I previously live-blogged from has now had its output formulated into a series of slides with annotations on Flickr, which can be found here:

This was achieved through an intense round of brainstorming sessions, culminating in a room full of flip chart sheets arranged by topic. We then performed a Dotmocracy, and the results that you see on the Flickr page are the ideas which made it through the process as having some interest invested in them.

European ORE Roll-Out at Open Repositories 2008


The European leg of the ORE roll-out has been announced and will occur on the final day of the Open Repositories 2008 conference in Southampton, UK. This is to complement the meeting at Johns Hopkins University in Baltimore on March 3. From the email circular:

A meeting will be held on April 4, 2008 at the University of Southampton, in conjunction with Open Repositories 2008, to roll-out the beta release of the OAI-ORE specifications. This meeting is the European follow-on to a meeting that will be held in the USA on March 3, 2008 at Johns Hopkins University.

The OAI-ORE specifications describe a data model to identify and describe aggregations of web resources, and they introduce machine-readable formats to describe these aggregations based on ATOM and RDF/XML. The current, alpha version of the OAI-ORE specifications is at

Additional details for the OAI-ORE European Open Meeting are available at:

- The full press release for this event:

- The registration site for the event:

Note that registration is required and space is limited.

Fine Grained Repository Interoperability: can't package, won't package


Sadly (although some of you may not agree!), my paper proposed for this year's Open Repositories conference in Southampton has not made it through the Programme Committee. I include here, therefore, my submission so that it may live on, and you can get an idea of the sorts of things I was thinking about talking about. The reasons given for not accepting it are probably valid, mostly concerning a lack of focus. Honestly, I thought it did a pretty good job of saying what I would talk about, but such is life.

What is the point of interoperability, what might it allow us to achieve, and why aren't we very good at it yet? Interoperability is a loosely defined concept. It can allow systems to talk to each other about the information that they hold, about the information that they can disseminate, and to interchange that information. It can allow us to tie systems together to improve ingest and dissemination of repository holdings, and allows us to distribute repository functions across multiple systems. It ought even to allow us to offer repository services to systems which don't do so natively, improving the richness of the information space; repository interoperability is not just about repository to repository, it is also about cross-system communications. The maturing set of repositories such as DSpace, Fedora and EPrints, and other information systems such as publications management tools and research information systems, as well as home-spun solutions, are making the task of taking on the interoperability beast both tangible and urgent.

Traditional approaches to interoperability have often centred around moving packaged information between systems (often other repositories). The effect this has is to introduce a black-box problem concerning the content of the package itself. We are no longer transferring information, we are transferring data! It therefore becomes necessary to introduce package descriptors which allow the endpoint to re-interpret the package correctly, to turn it back into information. But this constrains us very tightly in the form of our packages, and introduces a great risk of data loss. Furthermore, it means that we cannot perform temporally and spatially disparate interoperability on an object level (that is, assemble an object's content over a period of time, and from a variety of sources). A more general approach to information interchange may be more powerful.

This paper brings together a number of sources. It discusses some of the work undertaken at Imperial College London to connect a distributed repository system (built on top of DSpace) to an existing information environment. This provides repository services to existing systems, and offers library administrators custom repository management tools in an integrated way. It also considers some of the thoughts arising from the JISC Common Repository Interfaces Group (CRIG) in this area, as well as some speculative proposals for future work and further ideas that may need to be explored.

Where do we start? The most basic way to address this problem is to break the idea of the package down into its most simple component parts in the context of a repository: the object metadata, the file content, and the use rights metadata. Using this approach, you can go a surprisingly long way down the interoperability route without adding further complexity. At the heart of the Imperial College Digital Repository is a set of web services which deal with exactly this fine structure of the package, because the content for the repository may be fed from a number of sources over a period of time, and thus there never is a definitive package.

These sorts of operations are not new, though, and there are a variety of approaches to them which have already been undertaken. For example, WebDAV offers extensions to HTTP to deal with objects using operations such as PUT, COPY or MOVE, which could be used to achieve the effects t[...]



Last week I was at the ORE meeting in Washington DC, and presented some thoughts regarding SWORD and its relationship to ORE. The slides I presented can be found here:

[Be warned that discussion on these slides ensued, and they therefore don't reflect the most recent thinking on the topic]

The overall approach of using SWORD as the infrastructure to do deposit for ORE seems sound. There are three main approaches identified (a minimal transport sketch follows the list):

- SWORD is used to deposit the URI of a Resource Map onto a repository
- SWORD is used to deposit the Resource Map as XML onto a repository
- SWORD is used to deposit a package containing the digital object and its Resource Map onto a repository
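Whichever of the three options is used, the transport amounts to an HTTP POST against a deposit URI taken from the SWORD service document. A minimal sketch follows; the deposit URI is hypothetical, and while X-No-Op and X-Verbose correspond to the NoOp and Verbose support mentioned below, treat the exact header set here as an assumption rather than a statement of the profile:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class DepositSketch {
    public static void main(String[] args) throws Exception {
        // hypothetical deposit URI, as advertised in a SWORD service document
        URL deposit = new URL("http://repository.example.org/sword/deposit/collection-1");

        // the resource map (or package) to deposit; content elided here
        byte[] body = "<rdf:RDF>...</rdf:RDF>".getBytes(StandardCharsets.UTF_8);

        HttpURLConnection conn = (HttpURLConnection) deposit.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/rdf+xml");
        // assumed SWORD extension headers: ask for a dry run with a verbose response
        conn.setRequestProperty("X-No-Op", "true");
        conn.setRequestProperty("X-Verbose", "true");

        OutputStream out = conn.getOutputStream();
        out.write(body);
        out.close();

        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```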

In terms of complications, there are two which concern me the most:

- Mapping of the SWORD levels to the usage of ORE.

The principal issue is that level 1 implies level 0, and therefore level 2 implies level 1 and level 0. The inclusion of semantics to support ORE specifics could invoke a new level, and if this level is (for argument's sake) level 3, it implies all the levels beneath it, whatever they might require. Since the service, by this stage, is becoming complex in itself, such a linear relationship might not follow.

A brief option discussed at the meeting would be to modularise the SWORD support instead of implementing a level based approach. That is, the service document would describe the actual services offered by the server, such as ORE support, NoOp support, Verbose support and so forth, with no recourse to "bundles" of functionality labelled by linear levelling (this idea is sketched in code after the second issue below).

- Scalability of the service document

The mechanisms imposed by ORE allow for complex objects to be attached to other complex objects as aggregated resources (an ORE term). This means that you could have a resource map which, you wish to tell a repository, describes a new part of an existing complex object. In order to do this, the service document will need to supply the appropriate deposit URI for a segment of an existing repository item. In DSpace semantics, for example, we may be adding a cluster of files to an existing item, and would therefore require the deposit URI of the item itself. To do otherwise would be to limit the applicability of ORE within SWORD and the repository model. Our current service document is a flat document describing what is pragmatically assumed (correctly, in virtually all cases) to be a small selection of deposit URIs. The same will not be true of item-level deposit targets, of which there could be a very large number. Furthermore, in repositories which exploit the full descriptive capabilities of ORE, the number of deposit targets could equal the number of aggregations described (which can be more than one per resource map), which has the potential to be very large indeed.

The consequences are for the scalability of response time, which is a platform-specific issue, and for the scalability of the service document itself and how useful it remains at that size. It may be more useful to navigate hierarchically through the different levels of the service document in order to identify deposit nodes.
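To make the modularisation idea from the first complication concrete, here is a small sketch of a server advertising an arbitrary set of services rather than a linear level, with the client checking for exactly what it needs. The capability names are hypothetical stand-ins for whatever a service document would actually enumerate:

```java
import java.util.EnumSet;
import java.util.Set;

public class CapabilitySketch {
    // hypothetical capability names standing in for advertised services
    enum Capability { DEPOSIT, NO_OP, VERBOSE, ORE, MEDIATION }

    public static void main(String[] args) {
        // the server advertises an arbitrary set of services, not a linear level
        Set<Capability> offered = EnumSet.of(Capability.DEPOSIT, Capability.NO_OP, Capability.ORE);

        // the client checks for exactly the services it needs; nothing is implied "beneath"
        Set<Capability> needed = EnumSet.of(Capability.DEPOSIT, Capability.ORE);
        System.out.println("server supports this client: " + offered.containsAll(needed));
    }
}
```

The hierarchical navigation suggested above could be handled in the same spirit: model the service document as a tree of deposit targets and descend on demand, rather than enumerating every item-level target in one flat document.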

Any feedback on this topic is probably most useful in the ORE Google Group.

Pointless Password Pedantry


Nobody trusts me, and nobody can agree on what the best way of making me trustworthy is.

This is the sense that I get from password form schemes, when I'm signing up for new services. I don't know about you, but I have literally tens of passwords to remember, and so, sensibly, I have devised a personal algorithm to generate passwords in different situations, rather than doing something deeply insecure like writing them down in a text file on my desktop (yes, people really do do this, even with system root passwords!).

Without giving away too much, my password algorithm allows me to domain or namespace my passwords both in terms of the service they are for, and the context they are being used in. Further, there is a feedback loop between these two components which explains how to modify the password further in a way which is not possible to predict in advance, and upon which a further set of standard modifications is then applied. The result: easy to reconstruct without the aid of memory but totally unguessable passwords. They include alphanumeric characters, special characters and both capital and lower case letters. They are a paragon of good password design.

So why oh why oh why do different services have such wildly different notions of a "good" password? Let me give you some examples. SourceForge don't permit special characters in their passwords! eBuyer don't permit passwords of more than 20 characters (the passwords that my algorithm generates can be extremely long, adding to their security). My online bank requires 2 digits and 2 capital letters, and disallows certain special characters. So I still have to remember which services require which variations on the algorithm, and I'm constantly having to make new adjustments to it. The problem is that the requirements of different services conflict: you MUST have special characters, you MUST NOT have special characters. How's a security-conscious person going to win? I suppose I could start writing my passwords down in a plain text file on my desktop ...

Why don't these systems just implement something like:

and reject passwords that come out at less than "Reasonable"?
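Something like the following back-of-an-envelope meter would be a start. The character-class scoring and the thresholds are invented for illustration; a real meter would estimate entropy and check dictionaries:

```java
public class StrengthSketch {
    enum Strength { WEAK, REASONABLE, STRONG }

    // naive heuristic: reward length plus each character class present
    static Strength rate(String pw) {
        int classes = 0;
        if (pw.matches(".*[a-z].*")) classes++;
        if (pw.matches(".*[A-Z].*")) classes++;
        if (pw.matches(".*[0-9].*")) classes++;
        if (pw.matches(".*[^A-Za-z0-9].*")) classes++; // special characters

        int score = pw.length() + 4 * classes; // invented scoring, purely illustrative
        if (score >= 28) return Strength.STRONG;
        if (score >= 18) return Strength.REASONABLE;
        return Strength.WEAK;
    }

    public static void main(String[] args) {
        String[] samples = { "secret", "s3cretPassw0rd", "v3ry!L0ng&Unguessable#Phrase" };
        for (String pw : samples) {
            System.out.println(pw + " -> " + rate(pw));
        }
    }
}
```

A service would then reject anything that rates below REASONABLE, rather than imposing its own arbitrary composition rules.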

The Data Access Layer Divide


Warning: technical post.

One of the things that has been giving me consternation this week is the division between the data storage layer and the application layer. A colleague of mine has been working hard on this problem for DSpace for some months, and his work will form the backbone of the 1.6 release next year. As a new HP Labs employee, I'm just getting involved in this work too, with my focus currently on identifiers for objects in the system (not just content objects, but everything from access policies to user accounts).

We are replacing the default Handle mechanism for exposing URLs in DSpace with an entirely portable identification mechanism which should support whatever identifier scheme you want to put on top of it. DSpace is going to provide its own local identification through UUIDs, so that we can try to break the dependency of identification of artifacts in the system away from the specific implementation of the storage engine. That is, at the moment, database ids are passed around and used with little thought. But what happens if the data storage layer is replaced with something which doesn't use database ids? It's not even slightly inconceivable. Hence the introduction of the UUID.

Now, here's where it gets tricky. The UUID becomes an application level identifier for system artifacts. Fine. The database is free to give columns in tables integer ids, and use them to maintain its own referential integrity. Fine.

I have several questions, and some half-answers for you:

- Why is this a problem?

Suppose I have two modules which store in the database. Let's use a DSpace example of Item and Bitstream objects (DSpace object model sticklers: I know what I'm about to say isn't really true; it's for the purposes of example): I want to store the Item, I want to store the Bitstream, and I want to preserve the relationship between them. Therefore, the Item storage module needs to know how to identify the Bitstream (or vice versa). If I want, I can use the UUIDs, nice long strings, which may have implications for my database performance; why use a relational database if I'm going to burden it with looking up long strings when it could be using nice small integers?

So the problem is: how does the Item get to find out the Bitstream storage id?

- How far up the API can I pass the database id?

The answer to this is "not very far". In fact, it looks like i can't even pass it as far as the DAO API.

- Can I use a RelationalDatabase interface?

The best solution I've come up with so far is to allow my DAO to implement a RelationalDatabase interface, so that other DAO implementations can inspect it to see if they can get database ids out of it. Is that a good solution? I don't know, I'm asking you!
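To make that concrete, here is one shape the RelationalDatabase idea might take. All the names are hypothetical; this is a sketch of the proposal, not anything that exists in the DSpace codebase:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class DaoSketch {
    // hypothetical marker interface: a DAO whose storage is a relational database
    interface RelationalDatabase {
        Integer getDatabaseId(UUID uuid); // map the application-level UUID to a row id
    }

    static class BitstreamDAO implements RelationalDatabase {
        private final Map<UUID, Integer> rows = new HashMap<UUID, Integer>();

        void store(UUID uuid, int rowId) {
            rows.put(uuid, rowId);
        }

        public Integer getDatabaseId(UUID uuid) {
            return rows.get(uuid);
        }
    }

    static class ItemDAO {
        // record the item -> bitstream relationship using the cheapest key available
        String foreignKeyFor(UUID bitstreamUuid, Object bitstreamDao) {
            if (bitstreamDao instanceof RelationalDatabase) {
                Integer id = ((RelationalDatabase) bitstreamDao).getDatabaseId(bitstreamUuid);
                if (id != null) {
                    return id.toString(); // nice small integer foreign key
                }
            }
            return bitstreamUuid.toString(); // fall back to the long UUID string
        }
    }

    public static void main(String[] args) {
        UUID bitstream = UUID.randomUUID();
        BitstreamDAO bitstreams = new BitstreamDAO();
        bitstreams.store(bitstream, 42);
        System.out.println(new ItemDAO().foreignKeyFor(bitstream, bitstreams)); // prints 42
    }
}
```

The appeal is that DAOs which happen to share a relational backend can quietly exchange small integer keys, while everything above the DAO layer sees only UUIDs.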

- What's the point?

At the moment the DSpace API is awash with references to the database id. It's fine for the time being, and most people will never get upset about it. But it bothers engineers, and it will bother people who want to try and implement novel storage technologies behind DSpace.

The title of this post reflects my current feeling that these two particular layers of the system, the application and the data storage, have, at some point, to collide; can we really engineer it so that no damage occurs? Answers on a postcard.

OAI-ORE Alpha Specifications


The ORE Project has released the first draft of the specifications for public consumption. A final Technical Committee meeting is due in January next year, which may cause changes to this initial draft:

BMC and the Free Open Repository Trial


Our good buddies at BioMedCentral's Open Repository team have released the latest upgrade to their service, and are offering 3 month trial repositories for evaluation. From the DSpace home page:

BioMed Central announced the latest upgrades to Open Repository, the open access publisher's hosted repository solution. Open Repository offers institutions a cost-effective repository solution (setup, hosting and maintenance) which includes new DSpace features, customization options, and an improved user interface. Along with the announcement of the upgrades, Open Repository is offering a free 3-month pilot repository, so institutions can test the suitability of the service without obligation. See the full articles in Weekly News Digest and in AlphaGalileo.

Multi-lingualism and the masses


Multi-lingualism, and the provision of multi-lingual services, is one of those problems that just keeps on giving. Like digging a hole in sand which just keeps filling with water as fast as you can shovel it out again, or the loose thread which unravels your clothes when you pull on it. I remember being told, back at the start, that multi-lingualism was a solved problem; that i18n allowed us to keep our language separate from our application.

When the first major work was done on DSpace to convert the UI away from being strictly UK English to being internationalised, there was great cause for celebration. This initial step was extremely large, and DSpace has reaped the benefits of having an internationalised UI, with translations into 19 languages at time of writing. It's also helped me, among others, understand where else we might want to go with the internationalisation of the platform, and what the issues are. This post is designed to allow me to enumerate the issues that I've so far come up against or across, to suggest some directions where possible, but mostly just to help organise my thoughts.

So let's start with the UI. It turns out that there are a couple of questions which immediately come to the fore once you have a basically international interface. The first is whether display semantics should be embedded in your international tags. My gut reaction was, of course, no ... but suppose, for example, that emphasised text needs to be done differently in different locales? The second is in the granularity of the language tags, and the way that they appear on the page. Suppose it is better in one language to reverse the order of two distinct tags, to dispense with one altogether, or to add additional ones? All of these require modifications in the pages which call the language specific messages, not in the messages themselves. Is there a technical solution to these problems? (I don't know, by the way, but I'm open to suggestions; one partial answer to the ordering problem is sketched at the end of this post.)

We also have the problem of wholesale documentation: user and administrator help, and system documentation. Not only are they vast, but they are often changing, and maintaining many versions of them is a serious undertaking. It seems inappropriate to use i18n tagging to do documentation, so a different approach is necessary. The idea of the "language pack" would be to include not only custom i18n tags, but also language specific documentation, and all of the other things that I'm going to waffle about below.

Something else happens in the UI which is nothing to do with the page layout: data is displayed. It is not uncommon to see DSpace instances with hacked attempts at creating multi-lingual application data such as Community and Collection structures, because the tools simply don't yet exist to manage them properly. For example: the English and Swedish terms are included in a single field for the benefit of their national and international readership.

Capturing all data in a multi-lingual way is very, very hard, mostly because of the work involved. But DSpace should be offering multi-lingual administrator controlled data such as Communities and Collections, and at least offering the possibility of multi-lingual items. The application challenges here are to:

- Capture the data in multiple languages
- Store the data in multiple languages
- Offer administrator tools for adding translations (automated?)
- Disseminate in the correct language

Dissemination in the correct language ought not to be too much hassle through the UI (and DSpace already offers tools to switch UI language), but I wonder how much of a difficulty this would be for packaging? Or other types of interoperability? Do we need to start adding language qualifiers to everything? And what happe[...]
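On the ordering problem flagged above, one partial technical answer is to pass values into messages as numbered arguments, so that each locale's pattern can order them as it pleases. This sketch uses java.text.MessageFormat, and the "Swedish" pattern is an invented pseudo-translation, purely for illustration:

```java
import java.text.MessageFormat;

public class I18nSketch {
    public static void main(String[] args) {
        // the same message, with arguments reordered per locale;
        // the second pattern is an invented pseudo-Swedish example, not real DSpace text
        String en = "Item submitted by {0} to the collection \"{1}\".";
        String sv = "Till samlingen \"{1}\" inskickad av {0}.";

        System.out.println(MessageFormat.format(en, "Richard", "Theses"));
        System.out.println(MessageFormat.format(sv, "Richard", "Theses"));
    }
}
```

This solves reordering within a single message, but not the harder cases of dropping or adding tags between locales.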

CRIG Meeting Day 2 (2)


Topics for today:

The ones that interest me the most are probably these:

- Death to Packages

Not really Death to Packages, but let's not forget that packaging sometimes isn't what we want to do, or what we can do.

- Get What?

This harks back to my ORE interest, as to what is available under the URLs, and what that means for something like content negotiation.

- One Put to Multiple Places

Really important to distributed information systems (e.g. EThOSnet integration into local institutions). Also, this relates, for me, to the unpackaging question, because it introduces differences in what the various systems might be expecting.

- Web 2.0 interfaces (ok, ok)

I'm interested in web services. Yes it's a bit trendy. But it is useful.

- Core Services of a Repository

For repository core architecture, this is important. With my DSpace hat on, I'd like to see what sorts of things an internal service architecture or API ought to be able to support.

CRIG Meeting Day 2 (1)


It's first thing on day two. I'm late because I have to get all the way across town, which takes a surprisingly long time in London. I should have just stayed at a nearby hotel. Oh well.

The remainder of yesterday was interesting. Scope for live blogging is difficult, as the conference is extremely mobile. Today I will have to pick a point and hide in a corner to get you up to date.

In the afternoon we discussed the CRIG scenarios, and then implemented something called a Dotmocracy, which involves sticking dots (like house points at school) next to the topics that had emerged which we were interested in. When we start up today, the first order of business will be to see which topics made the cut. From what I saw at the end of the day, this will include Federated Searching, Google Search, and package deconstruction (my personal favourite this week).

As a brief aside, one running theme has been "no more standards". As it happens, I disagree with this. We're never going to get everything thinking the same and working the same. That's why there are so many standards, and why new ones get made all the time. It's the way of the world. At least, with a standard, though, when you have implemented one, you at least have a way of telling people what you did, over the home grown undocumented solutions which are the alternative.

Right, I suppose I'd better get my skates on.

CRIG Meeting Day 1 (2)


See also Jim Downing's live blogging.

We've just done a round of preliminary unconferencing, where the CRIG Podcast topics were brainstormed onto flip charts. Not sure how useful that's going to be, but I'm going to approach the whole thing with an open mind. I've got my marker pen, my balloon, and my three dots.

Wish me luck ...

CRIG Meeting Day 1 (1)


Some live blogging; may be slightly malformed, as this is happening inline, with no post-editing.

Les Carr and Jim Downing have introduced us to the first day of the CRIG workshop. We're unconferencing, which means that there isn't a programme! We're going to try to stay at the abstract or high level of discussion, and not talk about technology.

David Flanders outlines the meeting philosophy. The outputs aimed for from the meeting include ideas (blue-sky thinking), standards, and scenarios, and how they can be linked together. The outputs will be taken to OR08. The best way for a group to produce good stuff is for everyone to think about themselves. Makes me think of an article I read recently:

We are not about creating new specs.

Julie then brings us some stuff about SWORD (see my previous post on this). We are going to have implementations for arXiv, White Rose Research Online and Jorum, a SPECTRa deposit client, and later an article in Ariadne and a presentation at OR08.

Break time ... tea and coffee!

CRIG Podcast


A couple of weeks ago the JISC CRIG (Common Repository Interfaces Group) organised a series of telephone debates on important areas for it. These have now been edited into short commentaries which might be of interest to you, and are aimed at priming and informing the upcoming "unconference" to be held 6/7 December in London:

The "unconference" will take place at Birkbeck College in Bloomsbury, London. Take a listen, and enjoy. Yours truly appears in the "Get and Put within Repositories" and the "Object Interoperability" discussions.(image)

SWORD 1.0 Released


Just a quick heads up to say that the SWORD 1.0 release is now out and ready for download from SourceForge:

Here you will find the common java library which supports repositories wanting to implement SWORD, plus implementations for DSpace and Fedora. There is also a client (with GUI and CLI versions) which you can use to deposit content into the repositories.

The DSpace implementation is designed only to work with the forthcoming DSpace 1.5 (which is currently in Alpha release). Your feedback and experiences with the code would be much appreciated. We expect to be making refinements to the DSpace implementation up until DSpace 1.5 is released as stable.

Scandinavian Dugnad


I was invited by the Scandinavian DSpace User Group meeting to join them in their first official meeting yesterday in Oslo. It was great to see so many people representing a small-ish geographical area and a reasonably small population all together from 4 nations (Norway, Sweden, Finland and Denmark) to talk about DSpace. Probably 35 people all-in, with plans to extend the group to be the Nordic DSpace User Group to include members from Iceland, and perhaps even the Faroe Islands, and Greenland (if DSpace instances appear there).

In the grand traditions of Open Source and Open Access, I borrowed presentations given at the recent DSpace User Group Meeting in Rome, and gave them an update on the state of the DSpace Foundation and DSpace 2.0, and then went on to produce some original slides telling folks how to get involved in DSpace developments. Hopefully all the content will be available on the web soon.

As your humble chronicler struggled with his sub-par Norwegian, he picked up some interesting things. There is good user-end development going on in Scandinavia which could be harnessed to bring improvements to the DSpace UI. There are also increasingly many requests for "integration with ...", where the object of integration is one of a variety of library information systems. Statistics are high on the agenda here, as they are everywhere else. There is also a base of expertise in multi-language problems, stemming from these being polyglot nations with additional letters in their native alphabets.

It's clear where the future of repositories lies in Scandinavian nations, where the national interest and the community feature prominently in society and culture. Bibsys, a major supplier of library systems and services in Norway (and organisers of the meeting), have 29 DSpace clients on their books already, and are looking at tighter integration between DSpace and their other products, right down to the information model level. National research reporting systems are much desired as repository data sources, and internal information systems at each institution are starting to feed into their public repositories.

With such a big user group, and such a community focus, there is little doubt in my mind that the Nordic user group will be a great asset to the DSpace users in that region, and probably to the DSpace community as a whole.

PS: Dugnad is a Norwegian word effectively referring to voluntary, communal work which benefits the community to some degree, but is also social and enjoyable for the participants. It also formed the basis of the 2006 DSpace User Group Meeting in Bergen.

Exciting news from the pages of the Chronicles


Some of you will already know this, but for the benefit of those that don't but wanted to know, here is some job related news on my part.

With the recent launch of Spiral, I have felt free to consider again my place in the world, the work I do on Open Source and Open Access, and my general future, knowing that if I were to leave Imperial College, I would not be leaving having achieved nothing visible.

I have, therefore, decided to make a move from the academic into the commercial sector, and have taken up a position with HP Labs to work with DSpace especially in the context of India, where it has become extremely popular. So towards the end of next month you will see the "About Me" section of this blog get updated, and I may vanish off the radar for a week or two while I get myself up and running in this new post.

I'm greatly looking forward to working with the DSpace folks in HP Labs Bristol, Bangalore and Vermont!

DSpace 1.5 Alpha with experimental binary distribution


The DSpace 1.5 Alpha has now been released and we encourage you to download this exciting new release of DSpace and try it out.

There are big changes in this code base, both in terms of functionality and organisation. First, we are now using Maven to manage our build process, and have carved the application into a set of core modules which can be used to assemble your desired DSpace instance. For example, the JSP UI and the Manakin UI are now available as separate UI modules, and you may build either or both of these. We are taking an important step down the road, here, to allowing for community developments to be more easily created, and also more easily shared. You should be able, with a little tinkering, to provide separate code packages which can be dropped in alongside the DSpace core modules, and built along with them. There are many stages to go through before this process is complete or perfect, so we encourage you to try out this new mechanism, and to let us know how you get on, or what changes you would make. Oh, and please do share your modules with the community! Props to Mark Diggory and the MIT guys for this restructuring work.

The second big and most exciting thing is that Manakin is now part of our standard distribution, and we want to see it taking over from the JSP UI over the next few major releases. A big hand for Scott Phillips and the Texas A&M guys for getting this code into the distribution; they have worked really hard.

In addition to this, we have an Event System which should help us start to decouple tightly integrated parts of the repository, from Richard Rodgers and the guys at MIT. Browsing is now done with a heavily configurable system written initially by myself, but with significant assistance from Graham Triggs at BioMed Central. Tim Donohue's much desired Configurable Submission system is now integrated with both JSP and Manakin interfaces and is part of the release too.

Further to this we have a bunch of other functionality including: IP Authentication, better metadata and schema registry import, move items from one collection to another, metadata export, configurable multilingualism support, Google and html sitemap generator, Community and Sub-Communities as OAI Sets, and Item metadata in XHTML head elements.

All in all, a good looking release. There will be a testathon organised shortly which will be announced on the mailing lists, so that we can run this up to beta and then into final release as soon as possible. There's lots to test, so please lend a hand.

We are also experimenting with a binary release, which can be downloaded from the same page as the source release. We are interested in how people get on with this, so let us know on the mailing lists.

Come and get it:

DSpace User Group 2007, Rome


Last week was the annual DSpace User Group Meeting, this year held in Rome, hosted by the Food and Agriculture Organization of the United Nations:

These guys have an interest in DSpace for sharing knowledge throughout the developing world, and kindly offered to run the user group this year. The FAO building is set at the east end of the incredible Circus Maximus, and just 5 minutes up the road from the Colosseum. And we could see all of this from the 8th floor terrace cafe where lunch and coffee were served every day.

The presentations for this event are mostly available online, at:

If there are presenters reading this whose papers are not yet online, please contact the conference organisers so they can make them available.

I felt that this year the balance between technical and non-technical presentations was struck particularly well. While there were streams of non-technical presentations, there were highly technical tracks for the developers among us to attend. Specifically worth a mention was Scott Phillips' Introduction to Manakin, which is something we will all need to get to grips with in the long run, and something which I knew woefully little about. After that session, though, I'm confident about getting stuck in.

The quality of the work going on with DSpace is definitely reaching a high degree of maturity, with increasingly many developments leveraging the latest features of DSpace in new and innovative ways. For me this suggests that our platform has approached a critical point where we must, as a community, find a way to make these developments easier to share and easier to adopt and easier to write.

So thanks from me to the organisers. It was great to see the usual suspects again, but it was equally great to put faces to names from the mailing lists and IRC. See you all next year!

my my where did the summer go


OK, ok, it's been a long long time since I updated. Did I say at the beginning that this was an experiment in seeing if I was capable of maintaining a blog? If I didn't I should have done.

But there's a good reason that I've not updated for a while. That is, that I've been working flat out on the Imperial College Digital Repository: Spir@l, and am pleased to finally announce in a quiet way that we are officially LIVE:

On the outside it doesn't look too serious. A standard-looking DSpace, I hear you say, with an Imperial College site template on it. And you'd be right. But that's only the tip of the iceberg.

Without wishing to blow my own trumpet (modesty is the third or fourth best thing about me), please do check out the article which I co-wrote with my good colleague Fereshteh Afshari:

And you may also be interested in my presentation at the recent DSpace User Group Meeting in Rome 2007 (more on that later, maybe):

I could probably be persuaded to write a little here about how it works; maybe you'll even get snippets from the monolithic technical documentation that I'm in the middle of writing.

Oh, and there's more news, but now I've got your attention again you have to wait for the next installment.

EThOSnet Kick-Off


On Tuesday of this week the EThOSnet Project Board met for the first time to kick off this significant new project. For background, this project is the successor to the EThOS project, which in turn grew out of the Scottish projects: Theses Alive at Edinburgh, DAEDALUS at Glasgow, and Electronic Theses at the Robert Gordon University.

The aim of EThOSnet is to take the work done under EThOS and bring it up to a point where UK institutions can actually start to become early adopters, to start to digitise the back-catalogue of print theses in the UK, investigate technology for the current and the future incarnations of the system, and to basically kick-start a genuinely viable service for deposit and dissemination of UK theses.

At this stage, the project does not have a Project Manager, which is causing minor hold-ups initially, but the Project Director, Clare Jenkins, who is also Director of Library Services at Imperial College, has stepped in to hold things together until one is appointed (we are expecting to hear very soon). In the interim, the Project Board has also been put in place to check that all 7 Work Packages have the things they need to get going.

Of these 7 work packages, the first and last are concerned with project management and exit strategy, and the meat of the project will take place in packages 2 to 6. Details of these work packages are available in the project proposal, which will hopefully be available on the JISC website soon.

A quick summary, then, of some of the changes and more concrete decisions that we made during the meeting:

  • We have set a pleasingly high target of 20,000 digitised theses and 3,000 born-digital theses by the end of the project. This will be sourced from the many institutions who have already expressed an interest in adopting the service, before the project is even going!

  • The first port of call for the technology is to smooth the process of using the existing software tools for repository users. I would hope to have something which works well for DSpace available quickly, and general enough to be part of the main distribution. EPrints is already fully compliant, and Fedora has representatives from the University of Hull looking after it.

  • Communications will be done primarily through a soon-to-exist project wiki, and it is hoped that the existing E-Theses UK list will be used more heavily than it is already. Imperial College has agreed to host the existing EThOS website, the wiki, and potentially the toolkit if necessary (currently hosted at RGU).

  • Toolkit development will be ongoing, with work being done on it within a wiki, but with the option to move to some XML format for the final product.

This is a very big project, and I can't possibly represent everything that came out of Tuesday's meeting here. In the near future expect to see links to the project wiki appear and more information to come out.

vive la revolucion


Today I'm happy to see a major hardware manufacturer teaming up with a major Linux distro, and doing so in a nice visible place like the BBC:

I've been a Linux user for some years now, but when I first made the switch from the competition it was still a very difficult thing to do, even as a professional computer geek. Ubuntu seems pretty good, and hopefully it will help encourage non-expert users to have it installed before they even get their laptop home.