Bryan Lawrence : Bryan's Blog 2007/10/12

Bryan Lawrence

... personal wiki, blog and notes


All for one and one for all

On Wednesday and Thursday I attended the first ever NERC-wide data management workshop [1], where we tried to bring together everyone in the extended family of NERC designated data centres [2]. The event was a great success, and I hope it will be repeated! There is great scope for improving the delivery of our individual programmes by learning how others do things, and there is great scope for enabling new interactions!

Years ago (2001), NERC had a review of its data centres, and the review concluded that the designated data centres should be combined into one distributed data centre. NERC didn't follow that advice, and for good reasons: each of the data centres is embedded in its community, and their strength depends on their disciplinarity. Nonetheless, that strength is also a point of weakness: the existing funding fault lines aren't good at enabling information transfer across discipline boundaries, yet it's just such information transfer that is crucial in dealing with the big interdisciplinary problems we are facing today [3]!

The upcoming NERC strategy (due in a few weeks) addresses the interdisciplinary problem head on by recomposing NERC's attention into a number of specific cross-cutting themes, rather than along discipline-specific lines, but, and this is a key but, the delivery plan still expects this to be delivered by discipline-specific institutes (whether NERC owned or not). I think this is the right approach: by definition, interdisciplinary research depends on there being foundations of discipline-specific expertise, and on funding methodologies that encourage both that discipline-specific expertise and activities on the margins!

It's nice that the NERC data centre community is ahead of the game: we're trying hard to improve our interdisciplinarity with meetings like this one, and with projects like the NDG.

1: The workshop webpage is unfortunately not public.
2: NERC has seven designated data centres plus some other major activities.
3: Yes, I know, we've always been facing them, but now these problems are being actively confronted ...

by Bryan Lawrence : 2007/10/12

Why not dinosaurs?

One of the talks at the data conference was given by Lee-Anne Coleman, the head of Science, Technology and Medicine at the British Library.

In her talk she was positing a role for the British Library as a host for digital data archives. Given that I'm on record as thinking that putting data in institutional repositories is a bad idea, what did I think about this?

Well, my first question was: Why not dinosaurs?

You (the reader) are not alone in finding that cryptic: so did everyone else in the hall! But what I meant was, we don't put our dinosaur bones in libraries, we put them in museums, and we do that for a purpose! So, why not digital data? Well, if we take "digital" away from that sentence, and we (legitimately) consider bones as data, and we know we don't put them in libraries, then the a priori conclusion ought to be that there isn't a role for libraries with (implicitly all) digital data. So the question is better posed as "Is there a role for libraries with some digital data?"

Of course there is: books are now digital data, and as she said, so are recordings and videos etc. [1]. So there is definitely a role for some digital data, but for all types of data? Definitely not!

She went on to say that the library had no intention of curating primary and discipline-specific digital data, but a desire to "link to curated data" and to "identify and hold" reference datasets. Well, I was still in the dark, because (as I explained in a question from the floor) the problem is that while the library uses woolly phrases like "digital data", they will get resistance from those of us who understand the complexities of the problem. I pleaded for the library to start being explicit about the types of data (and by type I mean something explicitly both more and less than the format of the data). I mean type in the same sense as I did when I discussed interoperability: something I can name which has real-world meaning. Types such as "recording" or "mp3" [2] carry enough information that I can assess the ability of the BL to hold data conforming to those types. What I want is a countable list of the types of data the library is interested in, why, and how they are going to maintain them. When I get such a list, I'd be happy to weigh in with my opinion, but until then: why not dinosaurs?

There are many institutions that want to become bit buckets, and while that's an important role, the contents of those bit buckets are useless without a custodian community. If libraries can incorporate such custodian expertise (or rely on it being so pervasive in society that, when format and semantic migration become necessary, the expertise will be available and affordable), then absolutely, get involved! But if not, by actively claiming to hold the data, they're providing a false impression of information persistence!

All that said, I do believe in the idea of the BL holding reference datasets, but they have to make sure they understand what they've got and how they're going to persist it. Books and documents are easy [3]!

I believe in the proposed role for improving search and navigation (better resource discovery). I've invested enough of my life in this area already to know that there is much to do, and that libraries already have much to give!

I also think there's a potential role for libraries in ontology and standards governance (not, you will notice, in constructing or devising these things). Too many standards bodies have business models that get in the way of their function (ISO: are you listening?), whereas libraries, and in particular the BL, understand their duty to society! I was particularly heartened that Coleman introduced this idea herself, in the context of a role for naming the unique associations between authors and their products in the "dataspace" (a role the library already plays for books). Doing this would, of course, have them taking a role in URI maintenance!

Finally, I'm in two minds about using libraries as "bit buckets of last resort", that is, "the place to give your data when you're about to be closed down". This idea was floated from the floor, twice! If, in this situation, one can't find an organisation with the knowledge and capacity to take the data, and it's in any way "specialised", then the reality is that it's probably dead already, and giving it to the library may only be buying time. That's fine if there is a white knight on the horizon, but otherwise it's a cost with little prospect of reward (this is not the same as preserving a document or anything else with commonly understood semantics and syntax)! As a taxpayer I'm not convinced that my taxes should pay for the storage of impenetrable bits!

1: The list could be much longer; I in no way mean to imply a limited list here.
2: Although mp3 is a format, we know that mp3 files are recordings, so in this case the format name carries the feature type semantics.
3: Or are they? Amusingly, the PowerPoint file of the BL presentation on the conference website uses fonts that I don't have, and I can't easily read it. Ironic really, given the topic and venue, but I've been there too. Roll on pervasive use of ODF for "powerpoint".

by Bryan Lawrence : 2007/10/12 : Categories curation

DISCLAIMER: This is a personal blog. Nothing written here reflects an official opinion of my employer or any funding agency.