
Library spring

"On the sunny side of the street." On innovation for academic research libraries, and keeping up with the Googles.

Last Build Date: Wed, 16 Sep 2015 23:43:01 +0000


IDCC14 notes, day 2: keynote Atul Butte

Thu, 06 Mar 2014 16:52:00 +0000

Part 2 in a series of notes on IDCC 2014, the 9th International Digital Curation Conference, held in San Francisco, 24-27 February.

Day two kicked off with a fantastic keynote by Atul Butte, Associate Professor in Medicine and Pediatrics, Stanford University School of Medicine: Translating a trillion points of data into therapies, diagnostics and new insights into disease [PDF] [Video on Youtube]. This one was well worth a separate blogpost.

Butte starts his presentation with some great examples of how the availability of a wealth of open data has already radically changed biomedical research. Over one million datasets are now openly available in the GeneChip standardized format. A search for breast cancer samples in NCBI's GEO datasets database gives 40k results, more than the best lab will ever have in its stores. And PubChem has more samples than all pharma companies combined, completely open.

The availability of this data is leading to new developments. Butte cites a recent study that, by combining datasets, revealed 'overfitting': everybody does an experiment in exactly the same way, leading to reproducible results that are irrelevant to the real world.

But this is tame compared to the change in the whole science ecosystem with the advent of online marketplaces. Butte goes on to show a number of thriving ecommerce sites - "add to shopping cart!" - where samples can be bought for competitive prices. Conversant Bio is a marketplace for discarded samples from hospitals, with identifiers stripped off. Hospitals have limited freezer space, have biopsy samples that can be sold, and presto. What about the ethics? "Ethics is a regional thing. They can get away with a lot of stuff in Boston we can't do in Palo Alto." Now any lab can buy research samples for a low price and develop new blood marker tests. This way a test was recently developed for preeclampsia, the disease now best known from Downton Abbey.

Marketplaces have also sprung up for services, including a clearinghouse for medical research services such as animal tests. Thousands of companies provide these worldwide. Butte stresses that it is not just a race to the bottom and to China: this also creates opportunities for new specialised research niches, such as a lab specializing in mouse colonoscopies. It makes it possible to do real double-blind tests by simply buying more tests from different vendors (with different certifications, just to spread the risk). This makes it especially interesting to investigate other effects of tested and approved drugs. Which is a good thing, because the old way of research on new drugs is not sustainable when patents run out (the "pharma patent cliff of 2018").

This new science ecosystem is built on top of the availability of open data sets, but there are sustainability questions to be solved. Butte sees two players here: funders and the repositories themselves. Incentives for sharing are lacking; altmetrics are just beginning, and funders need to kick in. Secondary use grants are an interesting new development. Clinical trials may be the next big thing: the most expensive experiments in the world, costing $200 million each, 50% fail and not even a paper is written about them... Butte expects funders to start requiring publications on negative trials and publishing of the raw data. The international repositories are at the moment mostly government funded, and this may run out. Butte thinks that mirroring and distributing is the future. He also stresses that repositories need to bring their costs down - outsourcing! - and really show use cases that will inspire people. The repositories that will win are the ones that yield the best research.[...]

IDCC14 notes, day 1: 4c project workshop

Sun, 02 Mar 2014 18:35:00 +0000

Part 1 in a series of notes on IDCC 2014, the 9th International Digital Curation Conference, held in San Francisco, 24-27 February.

In stark contrast with the 'European' 2013 edition, held last year in my hometown Amsterdam, at this IDCC over 80% of the attendees were from the US. That's what you get with a west coast location, and unfortunately it was not made up for by more delegates from Asia and down under. However, as the conference progressed it became clear that despite the vast differences in organisation and culture, we're all running into the same problems.

Day 1: pre-conference workshop on 4C

4C is an EU-financed project to make an inventory of the costs of curation (archiving, stewardship, digital permanence etc.). With a two-year project span it's relatively short. The main results will be a series of reports, a framework for analysis (based on risk management) and the 'curation cost exchange', a website where (anonymized) budgets can be compared.

The project held a one-day pre-conference workshop "4C - are we on the right track?" at which a roadmap and some intermediate results were presented, mixed with more interactive sessions for feedback from the participants. It didn't always work (the schedule was tight) but still it was a day full of interesting discussions.

Neil Grindley noted that since the start of the project the goal has shifted from "just helping people calculate the cost" to a wider context. Beyond the actual cost (model) of curation: also the context, risk management, benefits and ROI. ROI is especially important for influencing decision makers, given the limited resources. See D3.1 - evaluation of cost models and needs/gaps analysis (draft).

Cost models

Cost models are difficult to compare and hard to do. Top topics of interest: risks/trustworthiness, sustainability and data protection issues. Some organizations are unable or unwilling to share budgets. Special praise was given to the Dutch Royal Library (KB) for being a very open organisation in disclosing its business costs.

The exponential drop of storage costs has stopped: the rate has fallen from 30-40% to at most 12%. It is impossible to calculate costs for indefinite storage. This led to a remark from the audience: "we're just as guilty as the researchers really, our horizon is the finish of the project." We have to use different time scales - you have to have some short-term benefits, but also keep the long term in scope.

However, costs are much more than storage. Rule of thumb: 1/2 ingest, 1/3 storage, 1/6 access. Preservation and access are not necessarily linked. An example is the LOC twitter archive, which they keep on tape. Once (if) the legal issues currently prohibiting opening this archive are resolved, access might be possible via Amazon's 'open data sets', where you pay for access by using EC2. The economics work because Amazon keeps it on non-persistent media and provides access, while LOC keeps it on persistent media with no access.

Other misc notes

A detailed mockup of the cost exchange website was demoed; if all the functionality can be realized, this may be a very useful resource. The workshop included a primer on professional risk management, based on the ISO 31000 standard. "Just read this standard, it's not very boring!" Originally from engineering, risk management is now considered mature for other fields as well. The German Nestor project has really clear definitions of what a repository is, a useful resource, comparable to the JISC Planets Foundation - great tools. CDL DataShare is online - a really nice, clean interface.[...]

The EJME plugin: improving OJS for articles with data

Wed, 22 Feb 2012 13:46:00 +0000

The EJME project has wrapped up and delivered! To quote the press release from SURFfoundation: "Enhanced Publications now possible with Open Journal Systems - Research results published within tried-and-tested system using plug-ins". That's all great, and so is the documentation, but it is aimed at those already in the know. A little more explanation is needed.

Who is EJME for?

Any journal that uses OJS for publishing and wants to make it possible to attach data files to articles (and as of December 2011, that's 11,500 journals!).

What does it do?

Three things:
- it improves the standard OJS handling of article 'attachments': files are available to editors and peers during the review process, and the submission process has been made (a little) easier;
- it plays nicely with external data repositories: an attachment can be a link to a file residing elsewhere (but works just like an internal OJS attachment in the review and publishing stages), and an internal attachment that an author has submitted with the article can also be submitted to a data repository, creating a 'one-stop-shop' experience for the author;
- on publication, it automagically creates machine-readable descriptions of an article and its data files (in tech-speak: these are small XML files, so-called Resource Maps, in the OAI-ORE standard). These can be harvested by aggregators such as the Dutch site Narcis, which can then do more great and wonderful things with them, for example slick visualizations.

Great, but I only want some of that!

That's perfectly possible. If you only want the improved attachment handling, it's included in the latest OJS version. The other two features are in separate plug-ins; install only what you need. Though I do recommend installing the resource map plug-in: it won't require any work after installation.

What does it cost?

Just like OJS itself, the plug-in is open source and free of cost. Installation is as easy as most OJS plug-ins.

What does the journal have to do?

Of course, software is only a tool. The real question is deciding what to do with it. Does the journal want a mandatory Data Access policy? Is there a data repository in the field to cooperate with? Once these questions are answered, the journal policy and editorial guidelines will need to be changed to reflect them.

Why would my journal want data along with articles?

As science becomes more and more data-oriented (and that includes the humanities), publishing data along with articles becomes essential for the peer review system to function. There have been too many examples lately of data manipulation that would have been found out by reviewers if they had checked the data. And for that, they need access to the data. Reviewers of course won't change their habits suddenly once data is available to them, but it's a necessary first step. (There are many other reasons, both carrots and sticks, for the greater good or the benefit of journal and author, but IMHO this is the pivotal point.)

Q: Why name it EJME, such a silly name?

Enhanced Journals Made Easy was a little optimistic, I admit. Enhanced Journals Made (A Little) Easier would have been better. You live and learn!

Want to know more about EJME? Get started with the documentation.[...]
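To give a feel for what those machine-readable descriptions look like, here is a rough sketch of an OAI-ORE Resource Map in its Atom serialization, linking an article to its data files. The URLs and the helper function are invented for illustration; the real EJME plug-in's output will be richer and will differ in detail.

```python
# Hypothetical sketch of an OAI-ORE Resource Map (Atom serialization):
# an aggregation that groups an article together with its data files.
# All URLs here are made up; real EJME output differs.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace

def resource_map(article_url, data_urls):
    """Build a minimal Atom entry whose links list the aggregated resources."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = article_url + "/rem"
    for url in [article_url] + list(data_urls):
        link = ET.SubElement(entry, f"{{{ATOM}}}link")
        # ore:aggregates expressed as an Atom link relation
        link.set("rel", "http://www.openarchives.org/ore/terms/aggregates")
        link.set("href", url)
    return ET.tostring(entry, encoding="unicode")

xml = resource_map("http://example.org/ojs/article/1",
                   ["http://example.org/ojs/article/1/data.csv"])
```

An aggregator like Narcis harvests these small XML files and follows the aggregated links to discover the data belonging to each article.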

OR11: Misc notes

Sat, 02 Jul 2011 07:21:00 +0000

I like going to conferences alone; it's much easier to meet new people from all over the world than when you're with a group - groups tend to cling together. With a multitracking conference like OR11, however, the downside is that there's so much to miss. Especially since I like to check out sessions from fields I'm not familiar with. At OR11, I wanted to take the pulse of DSpace and EPrints, and not just faithfully stick with the Fedora talks.

In this entry, I focus on bits and bobs I found noteworthy, rather than give a complete description. I skip over sessions that were excellent but have already been widely covered elsewhere (for instance at library jester), such as Clifford Lynch's closing plenary.

"Sheer Curation" of Experiments: data, process and provenance, Mark Hedges slides [pdf]

"Sheer curation" is meant to be lightweight, with curation quietly integrated in the normal workflow. The scientific process is complex, with many intermediate steps that are discarded; the deposit-at-the-end approach misses these. The goal of this JISC project is to capture provenance and experimental structure. It follows up on Scarp (2007-2009).

I really liked the pragmatic approach (I've written this sentence often - I really like pragmatism!). As the researchers tend to work on a single machine and heavily use the file system hierarchy, they wrote a program that runs as a background process on the scientists' computer. Quite a lot of metadata can be captured from log files, headers and filenames. Notably, it also helps that much work on metadata and vocabulary has already been done in the field, in the form of established practices and standards. Being pragmatic also means discarding nice-to-haves such as persistent identifiers.
That would require the researchers to standardise beyond their own computer, and that's asking too much. The final lesson learned sounded familiar: it took more, much more time than anticipated to find out what it is the researchers really want.

SWORD v2

SWORD v2 looks promising and useful, and actually rather simple. Keeping the S was a design constraint - hey, otherwise we'd end up with Word, and one is more than enough! Version 2 will do full Create/Read/Update/Delete (CRUD), though a service can always be configured to deny certain actions. It's modelled on Google's GData and makes elegant use of Resource Maps and dedicated action URLs. CottageLabs, one of the design partners, made a really good introduction video to SWORD v2 demonstrating how it works. It looks really useful and indeed still easy (as per Einstein's famous quip: as simple as possible, but not simpler). If you're a techie, dive into the details; if you're not, just add SWORD compliance to your project requirements!

Ethicshare & Humbox, two sessions on community building

Two examples of successful subject-oriented communities that feature a repository, each with some good ideas to nick. Ethicshare is a community repository with social features for bioethics:
- one of the project partners is a computer scientist who studies social communities. Because of this mutual interest (for the programmer it's more than just a job) they have had the resources to fine-tune the site;
- the field has a strong professional society that they work closely with;
- glitches at the beginning were a strong deterrent to success - so yes, release early and often, but not with crippling bugs!
- the most popular feature is a folder for gathering links, and many people choose to make theirs public (it's private by default);
- before offering them to the whole site, new features are tried out on a small, active group of around 30 testers;
- for the next grant phase they needed more users quickly, so they bought ads. $300 of Facebook ads yielded 500 clickthroughs, $2000 of Google ads 5000. This (likely) contributed to the number of unique visitors rising from 4k to 20k per month. Tentative conclusion: these ads cost relatively little and are effective for such a specialized subject.[...]
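For what it's worth, the ad figures quoted above work out to a plausible cost per click; a two-line check:

```python
# Cost per click implied by the Ethicshare ad spend quoted above.
fb_cpc = 300 / 500        # Facebook: $300 for ~500 clickthroughs
google_cpc = 2000 / 5000  # Google: $2000 for ~5000 clickthroughs

assert fb_cpc == 0.60     # 60 cents per click
assert google_cpc == 0.40 # 40 cents per click
```

So Google came out somewhat cheaper per click here, and both are well within reach of a modest grant budget.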

OR11: New in EPrints 3.3: large scale research data, and the Bazaar.

Wed, 29 Jun 2011 11:08:00 +0000

As I mentioned in the overview, I was very impressed by what's happening in the EPrints community. The new features of the upcoming 3.3 are impressive, as they seem to strike the right balance between pragmatism and innovation. Thanks to an outstandingly enthusiastic and open developer community, they're giving DSpace (and to a lesser extent Duraspace) a run for their money. "Energize" could've been the motto of the EPrints community.

Support for research data repositories

The new large-scale research data support is also a hallmark of pragmatic simplicity. EPrints avoids getting very explicit about subject data classification and control, taking a generic approach that can be extended. Research data can come in two container datatypes, 'Dataset' and 'Experiment'. A Dataset is a standalone, one-off collection of data; its metadata reflects the collection. The object can contain one or more documents, and must also have a read-me file attached, which is a human-oriented manifest - machine-oriented complex metadata is possible, but would deter actual use. The other datatype is Experiment. This describes a structural process that may result in many datasets; its metadata reflects the process and supports the Open Provenance Model.

Where the standard metadata don't suffice, one of the data streams belonging to the object can be an XML file. If I understood correctly, XPath expressions can then be used for querying and browsing. Effectively this throws off the shackles of the standard metadata definitions and creates flexibility similar to Fedora. It's very similar to what we're trying to do in the FLUOR project with a Sakai plugin that acts as a GUI for a data repository in Fedora. Combining user-friendliness with configurable, flexible metadata schemes is a tough one to pull off; I'll certainly keep an eye out for the way EPrints accomplishes this.

The Bazaar

The EPrints Bazaar is a plug-in management system and/or an 'App Store' for EPrints, inspired by Wordpress. For an administrator it's fully GUI-driven, versatile and pretty fool-proof. For developers it looks pretty easy to develop for (I had no trouble following the example with my rusty coding skills). The primary design goal was that the repository, including the API, must always stay up. They're clever bastards: they based the plug-in structure on the Debian package mechanism, including the tests for dependencies and conflicts, which makes it very stable. Internally, they've run it for six months without a single interruption. Now that's eating your own dog food!

Off the beaten track: EPrints as a CRIS

The third major new functionality of 3.3 is CERIF import & export. Primarily this is meant to link EPrints repositories automatically to CRIS systems, but for smaller institutions that need to comply with reporting in CERIF format and don't have a system yet, using EPrints itself may suffice, as pretty much all the necessary metadata is in there. The big question is whether the import/export allows a full lossless roundtrip; as I joined this session halfway through (after an enthusiastic tweet prompted me to change rooms) I might've missed that. This sounds very appealing to me. Unfortunately, the situation in the Netherlands is very different, as a CRIS has been mandatory for decades for the Dutch universities. Right now we're in the middle of a European tender for a new, nationwide system, and the only thing I can say is that it's not without problems. How I'd love to experiment with this instead in my institution, but alas, that won't be possible politically.

The EPrints attitude

As Les Carr couldn't make it stateside, he presented from the UK. The way this was set up was typical of the can-do attitude of the EPrints developers: Skypeing in to a laptop which was put before a mike, and whenever the next slide was needed Les would cheerily call out 'next slide please!', after which the stateside co[...]
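The XML-datastream-plus-XPath idea described above can be sketched in a few lines. The element names and values here are entirely invented; EPrints' actual schemas and query mechanism will differ (and a full XPath engine such as lxml would allow richer expressions than the limited subset shown here).

```python
# Sketch of querying a free-form XML data stream with XPath, the idea
# attributed to EPrints 3.3 above. Element names are invented for
# illustration; this is not EPrints' actual schema or API.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<experiment>
  <run id="1"><temperature unit="K">293</temperature></run>
  <run id="2"><temperature unit="K">310</temperature></run>
</experiment>
""")

# ElementTree supports a limited XPath subset; enough for browsing fields
# that no fixed metadata schema anticipated.
temps = [float(t.text) for t in doc.findall(".//run/temperature")]
second_run = doc.findall(".//run[@id='2']")
```

The appeal is exactly what the post describes: the repository can index and browse structure it never had to define up front.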

OR11: opening plenary

Wed, 22 Jun 2011 14:49:00 +0000

See also: OR11 overview

The opening session by Jim Jagielski, President of the Apache Software Foundation, focussed on how to make an open source development project viable, whether it produces code or concepts. As El Reg reported today, doing open source is hard. The ASF has unique experience in running open projects (see also "is apache open by rule"). Much nodding in agreement all around, as what he said made good sense but is hard to put into practice. Some choice advice:

Communication is all-important. Despite all the new media that come and go, the mailing list is still king. Any communication that happens elsewhere - wikis, IRC, blogs, twitter, FB, etc - needs to be (re)posted to the list before it officially exists and can be considered. A mailing list is an asynchronous communication channel that participants can control themselves: they read or skip it at a time of their choice, not a time mandated by the medium. A searchable archive of the list is a must.

Software development needs a meritocracy. Merit is built up over time. It's important that merit never expires, as many open source committers are volunteers who need to be able to take time off when life gets in the way (babies, job changes, etc).

You need at least three active committers. Why three? So they can take a vote without getting stuck. You also need ‘enough eyeballs’ to go over a patch or proposal. A vote at ASF needs minimally three positive votes and no negatives.
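The ASF voting rule described above fits in a one-line predicate; a toy sketch, not ASF's actual tooling:

```python
# The ASF rule as described in the talk: a proposal passes with at least
# three +1 votes and no -1 (veto) votes. Abstentions (0) don't count.
def vote_passes(votes):
    """votes: a list of +1, 0 (abstain) or -1."""
    return votes.count(+1) >= 3 and votes.count(-1) == 0
```

With only two committers, `vote_passes([1, 1])` can never succeed, which is exactly why you need at least three active committers.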
To create a community, you also need a 'shepherd': someone who is knowledgeable yet approachable by newbies. It's vital to keep a community open, so as not to let the talent pool become too small. To stay attractive, you need to find out what 'itch' your audience wants to scratch.

The more 'idealistic' software licenses (GPL and all) are "a boon firstmost to lawyers", because terms like 'share alike' and 'commercial use' are not (yet) clear in a juridical context. Choosing an idealistic license can limit the size of the community for projects where companies play a major role. A commenter added that this mirrors the problems of the Creative Commons licenses. In a way, the Apache license mirrors CC0, which CC created to tackle those problems.

Open Repositories 2011 overview

Tue, 21 Jun 2011 19:01:00 +0000

Open Repositories was great this year. Good atmosphere, lots of interesting news, good fun. It's hard to make a selection from 49k of notes (in raw utf8 txt!). This post is a general overview; more details (and specific topics) will follow later.

My key points:

1. Focus on building healthy open source communities

The keynote by Jim Jagielski, President of the Apache Software Foundation, set the tone for much of what was to come: an interesting talk on how to create viable open source projects, from a real expert. The points raised in this talk came back often in panel discussions, audience questions and presentations later. More details here.

2. The Fedora frameworks are growing up

Both Hydra and Islandora now have a growing installed base, commercial support available, and a thriving ecosystem. They've had to learn the lessons of open source building the hard way, but they have their act together. Fez and Muradora were only mentioned in the context of migrating away. Also, several Fedora projects that don't use Hydra still use the Hydra Content Model. If this trend of standardizing on a small number of de facto standard CMs continues, it will greatly ease mixing and moving between Fedora middleware layers.

3. EPrints' pragmatic approach: surprisingly effective and versatile

Out of curiosity I attended several EPrints sessions, and I was pleasantly surprised, if not stunned, by what was shown. Especially the support for research data repositories looks to strike the right balance between supporting complex data and metadata types while keeping it simple and very usable out of the box. And also the Bazaar, which tops Wordpress in ease of maintenance and installation, but on a solid engineering base inspired by Debian's package manager. Very impressive! More details here.

Misc. notes

See part #3: Misc notes.

Elsewhere on the web

OR11 Conference program, presentations. Richard Davis, ULCC: #1 overview, #2 the Developers Challenge, #3: eprints vs. dspace. Disruptive Library Technology Jester: day 1, day 2, day 3. Leslie Johnson - a good round-up with focus on practical solutions. #or11 Tweet archive on twapperkeeper. Photosets: bigD, keitabando, yours truly, all Flickr images tagged with or11, Adrian Stevenson (warning: FB!).

Other observations

Unlike OR09, the audience was not very international. Italians and Belgians were relatively overrepresented, with three and six respectively. I spotted just one German, one Swede and one Swiss, and I was the lone Dutchman. The UK was the exception, though many of the British attendees were presenters of JISC-funded projects, which usually have budget allocated for knowledge dissemination. As OR alternates between Europe and the US, the ratio of participants tends to be weighted toward the 'native continent' anyway. But the recession seems to be hitting travel budgets hard in Europe now. As there were interesting presentations from Japan, Hong Kong and New Zealand, the rumour floating around that OR12 might be in Asia sounded attractive; I'd be very curious to hear more about what's going on there in repositories and open access. The location of OR12 should be announced within a month, let's see.

[updated June 27th, added more links to other writeups; updated June 28, added Hydra CM uptake][...]

Mon, 20 Jun 2011 11:43:00 +0000

Catching up on old news, I came across an interesting presentation given at CNI this spring on the Data Management Plans initiative: abstract, recording of the presentation on YouTube, slides.

DMP online is a great starting point (and one of the inspirations for CARDS), and this looks like the right group of partners to extend it into a truly generic resource. What's also notable about the presentation is the sensible reasons outlined for collaboration between this quite large group of prestigious institutions. All in all, something to keep an eye on.

Don't panic! Or, further thoughts on the mobile challenge

Tue, 05 Oct 2010 15:16:00 +0000

Two weeks ago, I posted some notes on the CILIP executive briefing on 'the mobile challenge', where I presented the effort of my library, the quick-wins 'UBA Mobiel' project. Those notes concentrated on the talks on the day. Now that it's had time to simmer (and a quick autumn holiday), I want to add some reflection on the general theme. Which basically boils down to: Don't Panic (preferably in large, friendly letters on the cover).

Is there really such a thing as a 'mobile challenge' for libraries? Well, yes and no. Yes, the use of the internet on mobile devices is growing fast, and is adding a new way of searching and using information for everyone, including library patrons. The potential of 'always on' is staggering. And it is a challenge. However, it is also just another challenge. After twenty years of continuous disruption - starting with online databases, then web 1.0 and web 2.0 - change is not new any more. Libraries are still gateways to information that is rare and/or expensive (the definitions of expensive and rare depending on and varying with the context, and of course also changing). And the potential of the paperless office may finally come to fruition with the advent of the iPad, but meanwhile printer makers are enjoying a boom selling ever more ink at ridiculous prices.

So, what to do?

There are three ways to adapt. On one side are the forerunners, with full focus on the new and shiny. Forerunners get the spotlights, and tend to be extroverts who give good presentations. However, not everyone can be in front - it would get pretty crowded. It takes resources, both money and a special kind of staff. Two prominent examples given at several of the Cilip talks were NCSU and DOK Delft. Kudos to them, they're each doing exciting stuff, but they are also the usual suspects, and that's no coincidence.

On the other extreme, there's not changing at all. For the institution, a certain road to obsolescence. For a number of library staff, the easy way to retirement. Fortunately, their number seems to be rapidly dwindling. Nevertheless, finding the right staff to fill the jobs at libraries or publishers when the descriptions of these jobs are in flux was a much talked-about topic, both in the talks and in the breaks.

In practice, most libraries are performing a balancing act in between. And it is perfectly acceptable to be in the middle. Keep an eye on things. Stay informed. Make sure your staff gets some time to play with the toys that the customers are walking around with, and if they find that what's on offer in the library is out of sync, do something about it. [from tuesday tech]

Which is pretty much what we did with UBA Mobiel. Nothing world-shattering, not breaking the bank. We're certainly not running in front, but we're making sure our most important content (according to the customers) is usable. This way, when the chance comes along to do Something Utterly Terrific (Birmingham) or merely a Next Step Forward (upgrading our CMS), we know what to focus on. The response to our humble little project has been very positive. We may have hit a nerve, and I'm really glad to hear that it is inspiring others to get going. Go-Go Gadget Libraries![...]

Becoming upwardly mobile - a Cilip executive briefing

Fri, 17 Sep 2010 12:43:00 +0000

On September 15, Cilip (the UK Chartered Institute of Library and Information Professionals) and OCLC held a meeting on the challenge that mobile technology poses for libraries, called the Becoming Upwardly Mobile Executive Briefing.

The attendees came from the British Isles (UK and Ireland); some of the speakers, however, came from elsewhere. Representing the Netherlands, I presented the UBA Mobiel project as a case study, which went well. The mere fact that I was asked to present our small low-key project - which in the end cost less than 1100 euro and 200 hours - as a case study alongside the new public library in Birmingham, with a budget of 179 million pounds sterling, shows how diverse the subject of 'the mobile challenge' is. Thus the talks varied widely, and especially the panel discussion suffered from a lack of focus. It was interesting nevertheless. Attendees were encouraged to turn their mobiles on and tweet away, and a fair number of them did. See the Twitter archive for #mobexec at twapperkeeper.

1. Adam Blackwood, JISC RSC

A nice wide-ranging introduction in a pleasant presentation, using lots of lego animation. In one word: convergence. To show what a modern smartphone can do, he emptied his pockets, then went on from a big backpack, until the table in front of him was covered with equipment, a medical reference, an atlas and so on. "And one more thing...". The versatility of the devices coming at us means not only that current practices will be replaced, but also that they are going to merge in unexpected ways. Reading a textbook online is a different experience from reading it on paper, for instance. Augmented reality (in the broad sense of the word, not just the futuristic goggles) is a huge enabler that we should not block by sticking to old rules (such as asking to turn devices off in the library or during lectures). As for the demos, it's a bit unfortunate that it always seems to be the same ones that are pointed to (NCSU, DOK), though they're still great. Using widgetbox to quickly create mobile websites was new to me, worth checking out further (the example was ad-enabled; hope they have a paid version, too). All in all, a great rallying of the troops.

2. Brian Gambles, Birmingham

A talk about the new public library in Birmingham. An ambitious undertaking, inspired by, amongst others, the new Amsterdam public library. The new library should put Birmingham on the cultural map and itself become one of the major tourist attractions of the city, opening in 2013. It's also meant to 'open up' the vast heritage collection (the largest collection of rare books and photography of any public library in Europe). And to pay for it, they'll have to monetize those as well. A laudable goal, great-looking plans; I wish them luck in these difficult times.

The library is not just the books (the new Kansas City library sends all the wrong messages). The mobile strategy comes forth from the general strategy: open up services and let others do the applications. Open data, etc. They are working with Apple to get on iTunesU, for instance (a partnership with the university). They get inspiration from the cultural sector: many interesting & much-downloaded apps have come from museums. Notable especially is the Street Museum of London (flash-y website, direct iTunes app link). Also, they can't afford to hire enough cataloguers for the special collections, so they are opening this up as well, with crowdsourcing as a helpful tool. He was surprised that there are people who like to correct OCR texts, which he thinks is a dreadful chore. So let's use it.

3. Panel discussion

This wasn't as good as it could have been, unfortunately, due to the wide range of the topic. Still some interesting points: Adrian Northover-Smith from Sony was of course very much pro e-ink devices and against the iPad. It's a c[...]

Notes from CNI Spring meeting 2010

Tue, 04 May 2010 14:53:00 +0000

I was fortunate to attend the CNI Spring 2010 Task Force meeting in Baltimore, USA. This was my second time at a CNI, the first one being in 2007. Compared to my previous experience, it struck me how policy has come to dominate the program, where it used to be technology. Maybe it's because the direction we're heading in is clear - complex objects, enriched publications, open access - and the question is now how to get there. Because the fragmented setup of research and academia in the US differs greatly from the situation elsewhere, this made the meeting more US-centric, which was a tad disappointing. However, it remains an interesting, intense pressure-cooker, of which afterwards it's hard to believe it barely lasted a day and a half. Worth the jetlag.

Two sessions stood out for me. The first was a presentation by Jane Mandelbaum from the Library of Congress on a collaboration with the Stanford Institute for Computational and Mathematical Engineering (iCME), to create "metadata remediation tools" (great name!): generating summaries, short titles and geographical data from wads of text. iCME is located in Silicon Valley, has close ties with companies there - Google, Yahoo, and small start-ups - and deals primarily with algorithms to understand text, especially with taxonomies (which seems to be exactly what Google is trying, too, according to Steven Levy's April 2010 article in Wired). Interesting, as we've tried this in my organization, and failed miserably. Here it was made to work, though it took two years (!) to iron out the wrinkles between two very different cultures. Also, it's not an equal partnership; most of the coding takes place in summer jobs, paid for by LoC.

The main reason is the nature of LoC's metadata, in which collections exist that differ greatly but are internally consistent, which makes them good candidates to refine algorithms on. Results for LoC: apart from the code (rough around the edges, scripts rather than applications) and the generated geographical and other metadata, insight into the usefulness and value-for-money of metadata. The software is available via the project site; among the unexpected results is a visualization of keyword patterns.

The other session I want to mention was on Cornell's LoL approach: Taking the Library Outside the Library: A Light-weight Innovation Model for Heavy-weight Economic Times. An incubator approach, outside regular channels, to quickly respond to trends. This presentation struck a chord with the audience; at moments there was an audible roar of keypresses as dozens of people typed notable phrases into their Twitter or blogging clients or notepads. One of those moments was when a quote from The Simpsons' Krusty the Clown came up: "It's not just good, it's good enough!"; another was the motto "there is no blame in trying something that doesn't work".

I like the setup: a small group, consisting of staff from all departments, including circulation and rare books, who spend at most 5% of their time. Membership is limited to two years. The group runs 3-5 risky projects, categorized as "from trivial to easy". Examples: putting public-domain image collections on Flickr and YouTube, POD books from those Flickr streams with Blurb, maintaining Wikipedia pages, an iPhone app (made by a CS student). For mobile devices they use Siruna. Some projects were successful, some not. When projects finish successfully, they are transferred to the regular organization; if that doesn't work, they are killed off rather than letting them languish or peter out, as that would be discouraging. Very pragmatic and useful - and worth copying!

Finally, the lively Twitter traffic is archived at[...]

The Red Room: workflow photo tour

Mon, 22 Feb 2010 13:34:00 +0000

(part two in a short series)

In response to questions on the RFID_LIB list, I created a short photo tour of the red room, focussing on the staff side of things: the types of crates used, usability issues we encountered, etc.

I've used the full range of Flickr metadata to describe the issues; unfortunately, the slideshow doesn't show descriptions by default, and notes not at all. So it's best viewed as a set: Flickr Red Room.

Alternatively, when watching the slideshow, turn 'always show description' on in the options, and watch it fullscreen (bottom right).


The red crates are made of sturdy plastic. When it became clear that custom crates were way too expensive, we settled for industry-standard parts in standard sizes, and adjusted our shelves accordingly. The same went for silkscreening the numbers: instead we used industrial-strength plastic numbers, which turned out very well; in half a year I haven't seen even the beginning of peeling. The lesson learned: don't try to be special, and look outside the box, err, book world.

For staff determining when to add to an existing crate and when to pick a new one, we use these rules of thumb:
  • The display shows a filling % of each existing crate and the # of items inside. This is enough for staff to figure out if there's still room. If not, new crate. If there is:
  • in peak periods, when the number of empty crates becomes small: always add.
  • otherwise, it depends on the day on which the items in the existing crate were added. If the same day, we add; if in the past, we pick a new one.
This way, we have the flexibility to deal with peak periods with slightly more than 1000 boxes; and in less busy times, we can avoid crates with content from multiple days, which would make the workflow for processing items not picked up more complicated, or force us to leave the whole box until all items have expired, causing delays for other patrons.
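The rules of thumb above amount to a small decision function. A minimal sketch in Python, where the `Crate` class and the "few empty crates" peak threshold are assumptions for illustration, not the actual system:

```python
from datetime import date

# A minimal sketch of the crate rules of thumb; the Crate class and the
# PEAK_EMPTY_THRESHOLD cutoff are assumptions, not the actual system.
PEAK_EMPTY_THRESHOLD = 50  # below this many empty crates, we call it a peak period

class Crate:
    def __init__(self, fill_percent, last_added):
        self.fill_percent = fill_percent  # shown on the staff display
        self.last_added = last_added      # day the current contents went in

def add_to_existing(crate, empty_crates, today):
    """Return True to add to this crate, False to pick a new one."""
    if crate.fill_percent >= 100:            # display says there's no room left
        return False
    if empty_crates < PEAK_EMPTY_THRESHOLD:  # peak period: always add
        return True
    return crate.last_added == today         # quiet period: same-day content only
```

In a quiet period the function only groups same-day items, which keeps each crate expiring as a unit.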

The Red Room: self-service for a closed stack library

Wed, 17 Feb 2010 14:34:00 +0000

Recently, the Libraries of the University of Amsterdam (UvA, not to be confused with Virginia's UVa - yet another reason to avoid small caps for abbreviations!) and the Amsterdam Polytechnic (HvA) completed the introduction of RFID technology for security and self-service. It was an interesting project in many ways. And not just because it finished within budget!

European tendering was mandatory as the costs were well above the 200k€ limit. At first, I balked at this as a necessary bureaucratic evil. My personal opinion on this has completely reversed, however: with an unexpected outsider, Autocheck Systems, winning with a clear margin in both price and quality, this was a textbook case for the merit of the tendering process. By clearly committing our demands to paper in a neutral way, prejudice is taken out of the equation, or at least reduced to a minor multiplier. The trick is writing good specifications.

Self-service: for open and closed stacks

Public libraries have used RFID technology for over a decade now. This has created a mature market for open stacks. However, as an academic library, the vast majority of our circulation comes from closed stacks. Here, a different solution is needed, and when we embarked on this journey two years ago, turnkey products affordable for our amount of traffic did not exist. We were hoping for a clever, high-tech solution, not limited by our own imagination. We wanted to tap the creativity of the vendors - bring on fresh ideas! But we most certainly did not want to write a blank check.

The tender was therefore split up in lots. One was for the mature technology, where the functional requirements were formulated clearly, and the scoring algorithm favoured price over extra features (to be precise, in a 7:3 ratio). For the closed stack solution, however, we described our situation, with detailed circulation figures. The nature of the solution - intelligent shelves, lockers, and so on - was left to the vendor. To judge functionality against cost, the vendor had to supply a detailed description of the number of staff still needed to run the closed stacks, and all the actions in their workflow.

Closed stack circulation: the old situation

In the old days, patrons would request materials in the online catalogue. The items would be picked by the warehouse staff, brought to the back office to be checked and processed, and put in piles on stacks behind the desk, sorted by patron name, accessible only by staff. A few hours or one day later, depending on the location of the items, the patron would come to the desk, and staff would retrieve their material. For this system, a large number of staff was needed. Not only because the patron was serviced, but also because, in the absence of a proper tracking system, the piles had to be checked time and time again: to add new requests for patrons that already had material waiting, to remove materials that had not been picked up, and to keep everything sorted alphabetically... There was clearly room for improvement. Self-service was only one aspect of the overall workflow.

However, there was one important restriction: privacy. A patron must only be able to borrow items that he or she requested, not items requested by others; and the name of the requesting patron must never be visible to others. In other words, the system must be fully anonymous. We've had run-ins in the past with professors that were spying on each other's requested items...

To cut a long story short, we're very pleased with the end result of this project, for both the open and closed stack solutions. In the remainder of this post, I'll concentrate on the Red Room, the closed stack.

The red room

Autocheck Systems supplied the RFID technology and innovative workflow s[...]

OR 09: three more neat Fedora implementations

Mon, 15 Jun 2009 15:45:00 +0000

Open Repositories 2009, Day 4. Three more notable sessions on implementing Fedora. Hopefully the penultimate post before a final round-up. What a frantic infodump this conference was...

Enhanced Content Models for Fedora - Asger Blekinge-Rasmussen (State and University Library Denmark)

A hardcore technical talk, though impressive in the elegance of the two points shown: bringing the OO model to Fedora object creation, and a DB-style 'view' for easily creating searching and browsing UIs. The first is created as an extension of Fedora 3's standard Content Models, yet backward-compatible, which is a feat. Notable extras: it declares allowed relations (in OWL Lite) and a schema for XML datastreams. It includes a validator service (which is planned as a disseminator, too). Open source [sourceforge].

Fedora objects can be manipulated at quite a high level using the API, but population needs to be done at a much lower level. Thus most systems roll their own. Their solution: templates, with data objects created as instances of Content Models, not unlike OO programming. This makes default values very easy. No need for handcoded FOXML anymore, hallelujah! Templates can be created, discovered and cloned using a template web service. Then there are repository views, which bundle atomic objects into logical records: a search engine record might be made up of a bundle of Fedora objects. Views are defined by annotated relations, with view angles to create different logical records. With 'view = none', an object is omitted from results (useful for small particles you don't want to show up in queries, for instance separate slides). These simple API additions make it easy to create elaborate yet simple GUIs. This includes the first one I've seen that comes close to a workable interface for relationship management - not quite full drag'n'drop, but getting there.

Beyond the Tutorial: Complex Content Models in Fedora 3 - Peter Gorman, Scott Prater (University of Wisconsin Digital Collections Center) [presentation]

Summary: a hands-on walkthrough of the Wisconsin DIY approach. Also, an excellent example of what a well-done Prezi presentation can look like: literally zooming in on details and then zooming out to the global context was really helpful to see the forest for the trees.

The outset: migrating more than one million complex, heterogeneous digital objects into Fedora. They use abstract Content Models, atomistic, to gracefully absorb new kinds and new combinations of content. Philosophy: 'fit the model to the content, not the content to the model'. (Not in production yet, prototype app; keep an eye out for 'Uni Wisconsin digital collections'.)

Prater starts out with the note that it's humbling to see that the Hydra and eSciDoc people have been working on the same problem. However, IMHO there's no reason for embarrassment, as their basic solution is very elegant. They use MODS for the top-level datastream (a similar approach to Hydra) and a STRUCT datastream: a valid METS document, tying objects to a hierarchy. Important point: Content Models don't define structure; that's for STRUCT and RELS-EXT. Every object starts with a FirstClassObject, which points to 0-n child objects of arbitrary types. If zero, it's a citation. To deal with sibling relationships (i.e. two pages in a specific order), an umbrella element is put on top with a METS resource map. This allows full METS functionality, with linking using simple STRUCT and RELS-EXT. The advantage over doing everything in RELS-EXT: that doesn't allow you to express sequencing.

Now, to tie this 'object soup' together in an app (a common problem for lots of objects: turning the soup into a tree), the solution is simple: always use one monolithic disseminator, viewMETS(). This takes the PID of a FirstClassObject and returns a valid METS doc containing the object and all its (grand)children. This is brilliant:[...]
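The viewMETS() idea - one disseminator that resolves the object soup into a tree - can be sketched as a simple recursive walk. This is an illustrative toy, not Fedora's actual API; the dict-based store and field names are invented:

```python
# Toy object "soup": each PID maps to metadata plus child PIDs.
# The store layout and field names are invented for illustration;
# this is not Fedora's actual data model or API.
def view_mets(store, pid):
    """Resolve a FirstClassObject PID into one nested record
    containing the object and all its (grand)children."""
    obj = store[pid]
    return {
        "pid": pid,
        "title": obj["title"],
        "children": [view_mets(store, child) for child in obj["children"]],
    }

# A two-page "book" with flat parent->child links, turned into a tree.
store = {
    "book:1": {"title": "A book", "children": ["page:1", "page:2"]},
    "page:1": {"title": "Page 1", "children": []},
    "page:2": {"title": "Page 2", "children": []},
}
```

Calling `view_mets(store, "book:1")` returns the whole nested structure in one call, which is exactly the appeal of the single-disseminator approach.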

OR 09: eScidoc's infrastructure

Thu, 11 Jun 2009 15:00:00 +0000

eSciDoc Infrastructure: a Fedora-based e-Research Framework - Frank Schwichtenberg, Matthias Razum (FIZ Karlsruhe)

I had not expected this presentation to be as good as it was - it was a real eye-opener for me. It dealt solely and bravely with the underlying structure of eSciDoc, not the solutions built on top of it (such as PubMan). So, delving into the technical nitty-gritty.

So far, to me eSciDoc had been an interesting promise that seemed to take forever to materialize into non-vaporware. DANS wanted to use it as the basis for the Fedora-based incarnation of their data repository EASY, a plan they had to abandon when their deadline was looming near and the eSciDoc APIs were still not frozen. Apart from that, the infrastructure also seemed needlessly complex - why was another content model layer necessary on top of Fedora's own?

The idea behind eSciDoc is to take a user-centric approach; in the case of the infrastructure, the user is the programmer. What would she like to see, instead of Fedora's plain datastreams?
Tentative answer: an application-oriented object view.

eSciDoc takes a fully atomistic approach to content modelling: an Item is mapped to a Fedora object (without assumptions about the metadata profile - keeping it flexible). Then, an Item has Components. An Item in practice consists of two Fedora objects, with a 'hasComponent' relation between them.

Objects can be in arbitrary hierarchies, except the top hierarchies, which are reserved for 'context' and can be used for institutional hierarchies (a common approach, I can live with that). All relationships are expressed as structmaps.

So far so good, but now the really neat part.

Consequences of the atomistic content model for versioning: a change can occur in any of the underlying Fedora objects of a compound object, with consequences for both the part and the whole.
The eSciDoc APIs store the object lifecycle automatically. And when one Component changes or is added, the Item object also changes version, but the other Components do not.
(The presentation slides are really instructive on this, worth checking out when they're online.)
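The versioning behaviour just described - one Component changes, the Item gets a new version, sibling Components stay put - can be sketched in a few lines. Class names and the log format here are illustrative, not the eSciDoc API:

```python
# A toy sketch of atomistic versioning: updating one Component bumps
# that Component's version and the parent Item's version, but leaves
# sibling Components untouched. Names are illustrative, not eSciDoc's API.
class Component:
    def __init__(self, name):
        self.name = name
        self.version = 1

class Item:
    def __init__(self, components):
        self.version = 1
        self.components = {c.name: c for c in components}
        self.log = []  # version log of lifecycle events

    def update_component(self, name):
        self.components[name].version += 1
        self.version += 1  # the compound Item itself gets a new version too
        self.log.append(("update", name, self.version))
```

The append-only `log` hints at why such a record can double as a transaction log for rollback.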

This also delivers truly persistent IDs (multiple types supported: DOI, handle, etc.), separate from Fedora's PIDs, which are not really persistent. And every version has one - both the compound object and the separate Item objects. All changes (update/release/submit events etc.) are logged in the version log; if I remember correctly, this log can be used for rollback, i.e. it is a full transaction log.

This is the reason that the security model has to be in the eSciDoc layer, not Fedora's (though the same XACML policies & structures are used). This is eSciDoc's answer to the question common to many Fedora projects: how to extend Fedora's limited security? It might be best to take the whole security layer out of Fedora.

IMHO this is very exciting. This is about the last thing you would want to roll yourself in a project - it is incredibly complex to get working correctly and durably - and here it is, backed by a body of academic research (it is a German project, after all). For me, this puts eSciDoc firmly on the shortlist of frameworks.

OR 09: blogosphere links

Wed, 10 Jun 2009 15:37:00 +0000

Nearly three weeks afterwards, it's time to round up the OR 09 posts... Unfortunately, library life got in the way. Meanwhile, why not read the opinions of these honoured colleagues, who are undoubtedly better informed:

Mark Leggott
Open Repositories 2009 - Peter Sefton's trip report
Open Repositories 2009 - Peter Sefton's further thoughts
Leslie Carr
John Robertson (Strathclyde)
Elliot Metsger, Johns Hopkins

Finally, another bunch'o'links:

OR09: Four approaches to implementing Fedora

Fri, 05 Jun 2009 13:08:00 +0000

Open Repositories 2009, day three, afternoon. So far, the conference had not been disappointing, but now it got really interesting. The sessions I followed in the afternoon each highlighted a specific approach to the problem that IMHO has been standing in the way of wider Fedora acceptance: middleware. What these four have in common is that they all leverage an existing OSS product and adapt it to use Fedora as a datastore.

1. Facilitating Wiki/Repository Communication with Metadata - Laura M. Bartolo

Summary: an interesting approach, a traditional Fez spiced up with MediaWiki. With minimal coding, a relatively seamless integration. For this to work, contributors need to know MediaWiki markup, and to really integrate, they must learn the Fez-specific search markup. Also, I'm not sure how well this can be scaled up to true compound objects, given Fez's limitations.

Notes: The goal is the dissemination of research resources, with specific sites for specific science fields, i.e. a soft matter wiki, materials failure case studies. The MatDL repository (Fedora+Fez) wants to open up two-way communication. Example: the soft matter expert community, set up with MediaWiki. "MediaWiki hugely lowers the barrier for participating": familiarity gives a low learning curve. The question: how to integrate the repository with the wiki, two-way. Thinking from a user-centric approach: accommodate the user; support complex objects (more useful for research & teaching), thus describe the parts as individual objects.

Components:
- Wiki2Fedora: a batch run. It finds wiki upload files and converts referencing wiki pages to DC metadata for ingest in the repository (the wiki has comment, rights and author sections -> very doable), with manual post-processing (the Fez Admin review area function).
- A search results plug-in for the wiki: displays repository results in wiki search. It adds to the MediaWiki markup, to enable writing standard Fez queries in the content.

2. Fedora and Django for an image repository: a new front-end - Peter Herndon (Memorial Sloan-Kettering Cancer Center)

Summary: using Django as a CMS, with internally developed adapters to Fedora 3.1. My gut feeling: a specific use case, images only, so rather limited in scope. Despite choosing the 'hook up with a mainstream package' strategy, this is effectively still NIH-based rolling-your-own. That makes the issues even more instructive.

Notes: Adapting a CMS that expects SQL underneath is challenging - the plugin needs to be a full object-to-relational-database mapper. Also, Fedora 'delete' caused 'undesired results'; 'inactive' should be used. Further, some more unexpected oddities: they had to write their own LDAP plugin to make it work, and Django has tagging, but again a plugin was needed to limit this to controlled vocabularies. Performance was not a problem. Interesting: it's a repository for images only, so EXIF and the like can be used - tags are added using Adobe Bridge! The tested, successful strategy: make use of what is already familiar. In the Q&A the question came up: why use Fedora in this case anyway? Indeed, the only reason would be preservation; otherwise it would have saved a lot of trouble to use a Django blob store. The django-fedora plugins are available.

3. Islandora: a Drupal/Fedora Repository System - Mark A. Leggott (University of PEI)

Summary: Islandora looks *very* promising. I noted before (UPEI's Drupal VRE strategy) that UPEI is a place to watch - they are making radical choices with impressive outcomes.

Notes: UPEI's culture is open-source friendly. They use Moodle and Evergreen (apparently, they were the first Evergreen site in production). Rationale: open-sourcing an in-house system reinforces good behaviour: full d[...]

OR09: On the new DuraSpace Foundation, and Fedora in particular

Mon, 01 Jun 2009 17:53:00 +0000

Open Repositories 2009, day 3, morning: three sessions on Fedora. The morning started with a joint presentation by Sandy Payette (Fedora Commons) and Michele Kimpton (DSpace Foundation), focussing on strategy and organisation; after a caffeine break, a Fedora+DSpace tech overview by Brad McLean; finally, the developers' open house. I'll cover it in one blog post (this OR09 series is getting a bit long in the tooth, isn't it?). For the actual info on DuraSpace and all, see the DuraSpace website. The tech issues were covered more in depth in further sessions.

The merger is by now almost old news, though the incorporation still lies in the future: Fedora Commons and the DSpace Foundation will become DuraSpace. The 'cloud' product, which originally had the same name, is renamed DuraCloud. Not the easiest of presentations, as there is a good deal of scepticism around the merger, and not just on the Twitter #or09 channel. Payette and Kimpton handled it very professionally, dare I say gracefully. Both stood on the floor, in front of the audience, talking in turns (did I imagine it, or did I really hear them taking over a sentence, Huey & Dewey style?), while an assistant standing behind the laptop went back and forth through the slides in perfect timing. All in all, they pulled it off to come across as a seamless team. That bodes well.

Also good was the frankness in the Q&A (as well as later in the developers' open house). After noting some difficulties in finding the right strategy for open source development: "we do not aim to mold DSpace's open source structure to the Fedora core committer model, on the contrary". "We have to ask ourselves: are we really community-driven in the Fedora project? We've been closed in the past, we're opening up." Fedora has started using a new tracker, actually modelled on DSpace's; "please use it, our tracker is our new inbox."

On the state of Fedora: many and diverse new users. eSciDoc is now deployable. WGBH OpenVault, including annotated video; Forced Migration Online; the Jewish Women Archive, which runs in EC2, the first of a new wave of smaller archives now coming online using limited resources. Notably missing on a slide listing 'major contributors' (Mediashelf, Sun, and Microsoft Research): VTLS. Possibly a sponsoring issue? It was more than a bit odd, given their standing in the past.

Q: "How do you see the future of DSpace vs. Fedora - do they compete?" A: "Fedora's architecture is great, but we also need 'service bundles', CMS style on top for instance. The architecture will stay open for any kind of app on top. DSpace is going the other direction. The opportunity is to make sure we're not doing identical things with different frameworks." It is *so* easy to read this as 'the products will meet in the middle', but this was carefully avoided. However, in the tech talk later it was mentioned that Fedora-DSpace replication back-and-forth experiments are actively worked on. I think I'm not alone in thinking that the products will merge eventually. It will take some time, but they will.

Q: (cites another software company merger, IIRC Oracle and PeopleSoft) - a merger brings great unrest in communities; which one is going to die? Are Fedora & DSpace moving together? Technical and cultural changes for both communities? etc. A: Payette: any kind of software eventually becomes obsolete. We are determined not to let that happen, and for that it needs to be modular and organic. Side by side, because they both do things well. When overlap starts to happen, that may change, but by the module. Peter Sefton chimed in: very positive. Right deci[...]

OR09: Repository workflows: LoC's KISS approach to workflow

Sat, 30 May 2009 14:24:00 +0000

Open Repositories 2009, day 2, session 6b.

Leslie Johnston (Library of Congress)

My summary:

A practical approach to dealing with data from varying sources: keep it as simple as possible, but not simpler.
The ingest tools look very useful for any type of digitization project, especially when working with an external party (such as a specialized scanning company).
The inventory tool may be even more useful, as lifecycle events are generally not well covered by traditional systems, be it CMS or ILS.


LoC acts as a durable storage deposit target for widely varying projects and institutions. Data transfers for archiving range from a USB stick in the mail to 2 TB transferred straight over the network. The answer to dealing with this: simple protocols, developed together with the UC digital library (see also John Kunze).

Combined, this is not yet a full repository, but it covers many aspects of ingest and archive functionality. The rest will come. Aim: provide persistent access at the file level.

Simple file format: BagIt

Submitters are asked to describe their files in BagIt format.

BagIt is a standard for packaging files; METS files will fit in there, too. However, BagIt was created because we needed something much, much, much simpler. It's not as detailed; the description is a manifest, and it may omit relationships, individual descriptions, etc. It is very lightweight (actually too light: we've started creating further profiles for certain types of content).

LoC will support BagIt similarly and simultaneously to MODS & METS.
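As a rough illustration of the idea: a bag is a directory with a data/ payload plus a manifest of checksums, and verification means recomputing each file's digest. The hand-rolled verifier below is a sketch for illustration only, not LoC's actual tooling:

```python
import hashlib
import os
import tempfile

# Illustrative sketch only: a "bag" as a directory with a data/ payload
# and a manifest-md5.txt of checksums. Not LoC's actual BagIt tools.
def verify_bag(bag_dir):
    """Check every entry in manifest-md5.txt against the payload files."""
    with open(os.path.join(bag_dir, "manifest-md5.txt")) as manifest:
        for line in manifest:
            checksum, path = line.strip().split(None, 1)
            with open(os.path.join(bag_dir, path), "rb") as payload:
                if hashlib.md5(payload.read()).hexdigest() != checksum:
                    return False
    return True

# Build a tiny bag to demonstrate the layout.
bag = tempfile.mkdtemp()
os.mkdir(os.path.join(bag, "data"))
with open(os.path.join(bag, "data", "thesis.txt"), "w") as f:
    f.write("hello")
digest = hashlib.md5(b"hello").hexdigest()
with open(os.path.join(bag, "manifest-md5.txt"), "w") as f:
    f.write("%s  data/thesis.txt\n" % digest)
```

The manifest-as-description approach is what makes BagIt so much lighter than METS: there is nothing to validate beyond "the files you listed are the files you sent".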

Simple tools

Simple tools for ingest:
- parallel receiver (can handle network transactions over rsync, ftp, http, https)
- validator (checks file format)
- verifyit (checksums files)
These tools are supplied as java lib, java desktop application, and LocDrop webapp (prototype for SWORD ingest).

Integration between transfer and inventory is very important: trying to retrieve the correct information later is very hard.

After receiving, inventory tool records lifecycle events.
Why a standardized tool? 80% of the workflow overlaps between projects.

All tools are available open source [sourceforge]. What's currently missing will be added soon.

OR09: Repository workflows: ICE-TheOREM, semantic infra for theses

Sat, 30 May 2009 13:54:00 +0000

Open Repositories 2009, day 2, session 6b.

ICE-TheOREM - End to End Semantically Aware eResearch Infrastructure for Theses. Jim Downing (University of Cambridge), Peter Sefton (University of Southern Queensland)

Summary: great concept, convincing demonstration. Excellent stuff. Part of the ICE project, a JISC-funded experiment with ORE. [paper] (seems stuck behind a login?)

Importance of ORE: "ORE is a really important protocol - it has been missing from the web for most of its life so far." (DH: Amen!) Motivations for TheOREM: check ORE - is it applicable and useful? What are different ways of using it? How do SWORD and ORE combine? Practically: improving theses' visibility, with embargoes as an enabler.

Interesting: in the whole repository system, the management of embargoes is separated from the repository by design. A special system serves resource maps for the unembargoed theses; the IR polls these regularly. This reflects the real-world political issues, and makes it easier to bring quite radical changes.

Demonstrator (with The Fascinator) with one thesis, with a reference to a data object: a molecule description in chemical markup language (actual data). A simple authoring environment in OpenOffice Writer (Word is also supported), with a stylesheet + convention based approach. When uploaded, the doc is taken apart into atomistic XML objects in Fedora. The chemical element is a separate object with a relation to the doc, versioning, etc.

Embargo metadata is written as text in the doc (on the title page; the date noted using a convention - a KISS approach), and a style (p-meta-date-embargo) is applied. The thesis is again ingested - and voila, the part of the thesis under embargo is now hidden. This simple system also allows a dialogue between student and tutor - remarks on the text - to be embedded in the document itself (and hidden to the outside by default). It looks deceptively like Word's own comments, which I imagine will ease the uptake.

Sidenote: policy in this project is that only the submitter can ever change embargo data. So it is recommended to use OpenID rather than institutional logins, as PhD graduates tend to move on, and then nobody can change it anymore.

Q (from Les Carr): supervisors won't like to have their interaction with students complicated by tech. What is their benefit? A: automatic backing up is a big benefit, as is the workflow (i.e. the comments in the document text). We *know* students appreciate it. Supervisors may not like it, but everyone else will, and then they'll have to. (Note DH: this is of course in the sciences; it will be an interesting challenge to get the humanities to adhere to stylesheet and microformatting conventions.)

Q: can this workflow also generate the 'authentic and blessed copy' of the final thesis? A: Not in the project scope, we still produce the PDF for that. In theory this might be a more authentic copy, but they might scream at the sight of this tech.[...]
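The convention-based embargo mechanism can be sketched as follows: paragraphs carry a named style, the embargo date is read from any paragraph styled p-meta-date-embargo, and embargoed content is withheld until that date. The data structures and parsing below are hypothetical; only the style name comes from the talk:

```python
from datetime import date

# Illustrative sketch of the style-convention embargo. Only the style name
# "p-meta-date-embargo" is from the talk; the (style, text) tuples and the
# date format are assumptions made for this example.
def embargo_date(paragraphs):
    """Return the embargo date declared in the document, if any."""
    for style, text in paragraphs:
        if style == "p-meta-date-embargo":
            year, month, day = (int(part) for part in text.split("-"))
            return date(year, month, day)
    return None

def visible(paragraphs, today):
    """Hide everything while the document is still under embargo;
    afterwards, expose the body but strip the metadata paragraphs."""
    until = embargo_date(paragraphs)
    if until is not None and today < until:
        return []
    return [text for style, text in paragraphs if not style.startswith("p-meta-")]
```

The appeal of the convention is that authors never leave their word processor: applying a paragraph style is the whole embargo workflow.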

OR09: Social marketing and success factors of IR’s.

Sat, 30 May 2009 13:21:00 +0000

Open Repositories 2009, day 2, session 5b. 

Social marketing and success factors of IR’s: two thorough but not very exciting sessions. Though the lack of excitement is maybe also because the message is quite sobering: we already know what needs to be done, but it is very hard to change the (institutional) processes involved.

(where social marketing doesn’t stand for web2.0 goodness, but for marketing with the aim of changing social behaviour, using the tools of commercial marketing).

Generally, face-to-face contact works best - at faculty scale, or in a smaller institution like UPEI.

One observation that stuck with me is that the mere word repository is passive, where we want to emphasize exposure. This is precisely our problem as a whole in moving the repository into an active part at the center of the academic research workflow, instead of a passive end point.

Finally, the list of good examples started out with Cream of Science! We tend to take it for granted here in the Netherlands, and focus on where we're stuck; it's good to be reminded how well that has worked and still does.

Interim news from the uMich MIRACLE project (Making Institutional Repositories A Collaborative Learning Environment).
Not very exciting yet; that might change when they've accumulated more data (it's a work in progress: five case studies of larger US institutions, widely varying in policy, age and technology).

Focus on "outcome instead of output".
Focus on external measurements of success, instead of internal ones (i.e. number of objects, etc.). Harder to enumerate, but it gets more honest results.

OR09: Keynote by John Wilbanks

Wed, 27 May 2009 15:23:00 +0000

Open Repositories 2009, day 1, keynote.

Locks and Gears: Digital Repositories and the Digital Commons - John Wilbanks, Vice President of Science, Creative Commons

Great presentation - in content as well as in format. Worth looking at the slides [slideshare - of a similar presentation two weeks earlier]. [Which was good, because the talk was awkwardly scheduled at the end of the afternoon - just great with a fresh jetlag - straight after the previous panel session without so much as a toilet break.]

The unfortunately familiar story of journals on the internet and scholars' rights eroding, which causes interlocking problems that prevent the network effect.

Choice quotes:
“20 years ago, we would have rather believed there be a worldwide web of free research knowledge, than Wikipedia.”
"The great irony is that the web was designed for scientific data, and now it works really well for porn and shoes."

The CC licenses are a way of making it happen with journals. However, for data even CC-BY makes it hard to do useful integration of different datasets. A survey of 1000 bio databases found more than 250 different licenses! The opposite of the law of open source software applies: the most conservative license wins.

An example of what can happen if data is set free: BitTorrent for genomes. Thanks to CC Zero.

What can we do?
Solve locally, share globally.
Use standards. And don’t fork them.
Lead by example.

Q: opinion on Wolfram Alpha? Or Google Squared?
A: pretty cool, but doubts about scaling. It may be this or something else; rather open source than ‘magic technology’. But it's a sign that the web is about to crack.
“The only thing that’s proven to scale is distributed networks.”

(My comment: with an estimated 500,000 servers, that is precisely what Google is...)

OR09: Panel session - Insights from Leaders of Open Source Repository Organizations

Wed, 27 May 2009 14:42:00 +0000

Open Repositories 2009, day 1, session 4. A panel with the big three open source players (DSpace's Michelle Kimpton and Fedora Commons' Sandy Payette, freshly merged into DuraSpace, and ePrints' Les Carr) and Lee Dirks from Microsoft, whose Zentity (1.0 was officially announced at this conference) brings up lots of good questions. Unfortunately it didn't get to an interesting exchange of ideas.

I'll concentrate on Microsoft, as they were the elephant in the room. Warning: opinions ahead.

Microsoft is walking a thin line, and their stance has been very defensive. Dirks started out quipping that “We wanted to announce Microsoft merging with ePrints, we got together yesterday, but we couldn't agree on who was going to take over who.”

He went on stressing that this is Microsoft Research and they're not required to make a profit. Putting on a philanthropist guise, he said their goal is to offer an open source repository solution to organizations that already have campus licenses: “How can we help you use software that you already paid for but maybe don't use?” They claim they don't want to pull people away from open source solutions.

The most interesting parts were what he was *not* saying. Which open source does MS not want to pull us away from - Java? MySQL? Eclipse? Or did he only mean open source repository packages? Yeah right... getting Visual Studio, IIS, SQL Server and, most dangerous of all, SharePoint a foot in the door.

An audience question nailed the central issue: “The question will be lock-in. Commitments in other parts of the lifecycle are therefore more important. Zentity hooks you up everywhere in the MS stack.”

Dirks responded with “Everything we've done is built on open APIs, be it SharePoint or Office or whatever. You could reconstruct it all yourself.”

Well, with all respect to the Mono and Wine efforts, I wouldn't call SharePoint and Office APIs you could easily replace. The data will still be in a black box.
Especially if you want to make any use of the collaboration facilities. Having open APIs on the outside is fine and dandy, but one thing we've learned so far with repositories is that it is hard to create an exchange (metadata) format that is neither too limited nor so complicated that it hinders adoption.

To an audience question on his stance on data preservation, Dirks initially replied that ODF would solve this, including provenance metadata. No mention of the controversy around this file format - what use is an XML format that cannot be understood? - or of file types outside the Office universe.

When this debate stranded, Sandy Payette turned the mood around by mentioning that MS has contributed much to interoperability issues. It is indeed good to keep in mind that MS is not just big and bad - they aren't. A company that employs Accordionguy can't be all that bad. The trouble is, you have to stay aware and awake, for they aren't all that good, either. Imagine an Office-style lock-in for collaboratories...

OR09: NSF Datanet-curating scientific data

Tue, 26 May 2009 12:29:00 +0000

Open Repositories 2009, day 1, session 3. NSF DataNet: curating scientific data, John Kunze and Sayeed Choudhury.

The first non-split plenary (why a large part of the first two days consisted of 'split plenaries' baffled me, and I was not the only one).

Two speakers, two approaches. First John Kunze from the CDL, focusing on the micro level with a strategy of keeping it simple: “imagining the non-repository”, “avoid the deadly embrace” of tight standards - decouple by design, lower the barrier to entry.

One of the ways to accomplish this is by staying lo-tech: instead of full-blown database systems, use a plain file system and naming conventions: pairtree. I really like this approach. I've worked in large digitization projects with third parties delivering content on hard disks. They balk at databases and complicated metadata schemes, but this might just be doable for them. Good stuff.
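The core trick of pairtree can be sketched in a few lines: an object identifier becomes a directory path by splitting it into two-character segments, so the hierarchy itself encodes the identifier and no database is needed. This is only a simplified illustration of the convention (the actual CDL spec also covers hex-encoding of other special characters and marker files), not a full implementation:

```python
def pairtree_path(identifier: str, root: str = "pairtree_root") -> str:
    """Map an identifier to a pairtree-style filesystem path."""
    # Substitute characters unsafe in file names, per the spec's
    # single-character mappings: '/' -> '=', ':' -> '+', '.' -> ','
    cleaned = identifier.replace("/", "=").replace(":", "+").replace(".", ",")
    # Break the cleaned identifier into two-character "shorties";
    # a trailing odd character simply becomes a one-character segment.
    shorties = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([root] + shorties)

print(pairtree_path("ark:/13030/xt12t3"))
# pairtree_root/ar/k+/=1/30/30/=x/t1/2t/3
```

The payoff for those third parties on hard disks: the mapping is reversible and needs nothing but `mkdir -p`, so content can be laid out and later harvested with ordinary file tools.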

CDL has a whole set of curation micro-services, as they call them. I'm going to keep an eye out for this.

The second talk, by Sayeed Choudhury (Johns Hopkins), focused on the macro level of data conservancy. This was more abstract, and he started out with the admission that "we don’t have the answers, there are unsolved unknowns - otherwise we wouldn’t have gotten that NSF grant".

Interesting: one of the partner institutions (not funded by NSF) is Zoom Intelligence, a venture capital firm interested in creating software services on research data. First VCs bought into ILSes, now they pop up here... we must be doing something right!

Otherwise, the talk was mostly abstract and longer term strategy.

OR09: Institutional Repositories: Contributing to Institutional Knowledge Management and the Global Research Commons

Mon, 25 May 2009 15:47:00 +0000

Day 1, session 2b.

Institutional Repositories: Contributing to Institutional Knowledge Management and the Global Research Commons - Wendy White (University of Southampton)

Insightful, passionate, kick-ass presentation, with some excellent diagrams in the slides (alas, I found no link yet), especially one that puts the repository in the middle of the scientific workflow. The message was clear: tough times ahead for repositories; we have to be an active part of the flow, otherwise we may not survive.

Current improvements (see slides: linking into HR instead of LDAP to follow deployment history, a lightbox for presenting non-text material) are strategy-driven, which is a step forward from tech-driven, but still piecemeal.

Predicts that grants for large-scale collaboration processes could be a tipping point for changing the lone-researcher paradigm.

(In my opinion, this may well be true for some fields, even in the humanities, but not for all. Interesting that, for instance, The Fascinator Desktop aims to serve those ‘loners’.)

Stresses that open access is not just idealism; it can also be a benefit in highly competitive fields - cites a research group that got a contract because the company contacted them after seeing what their researchers were doing.

“build on success stories: symbols and mythology”.
“Repository managers have fingers in lots of pies, we are in a very good position to take on the key bridging role.”
It will, however, require a culture change, also in the management sphere. In the Q&A she noted that Southampton is lucky to have been through that process already.

All in all, a good strategic longer term overview, and quite urgent.