Bryan Lawrence

... personal wiki, blog and notes

Bryan's Blog 2006/10/20

Citation, Hosting and Publication

Returning to my series on citation (parts one, two, and three).

My last example was an MST data set held at the BADC, and I was suggesting something like this (for a citation):

Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth, [Internet], British Atmospheric Data Centre (BADC), 1990-, urn badc.nerc.ac.uk/data/mst/v3/upd15032006, feature 200409031205 [http://featuretype.registry/verticalProfile] [downloaded Sep 21 2006, available from http://badc.nerc.ac.uk/data/mst/v3/]

which I could also write like this to give some hint of the semantics:

<citation>
<Author> Natural Environment Research Council </Author>
<Title> Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth </Title>
<Medium> Internet </Medium>
<Publisher> British Atmospheric Data Centre (BADC) </Publisher>
<PublicationDate status="ongoing"> 1990 </PublicationDate>
<Identifier> badc.nerc.ac.uk/data/mst/v3/upd15032006 </Identifier>
<Feature>
<FeatureType>http://featuretype.registry/verticalProfile </FeatureType>
<LocalID>200409031205 </LocalID>
</Feature>
<AccessDate> Sep 21 2006 </AccessDate>
<AvailableAt>
<url>http://badc.nerc.ac.uk/data/mst/v3/</url>
</AvailableAt>
</citation>

The tags are made up, but hopefully identify the important semantic content of the citation. As I said last time, there is some redundant information there, but maybe not (there is no guarantee that the Identifier and the AvailableAt carry the same semantic content).

Inherent in that example, and my meaning, was a concept of publication, and I introduced that distinction by comparing the MST and our ASHOE dataset (which is really "published" elsewhere). In the library world, there is a concept of "Version of Record", which isn't exactly analogous, but I would argue BADC holds the dataset equivalent of the version of record for the MST, and NASA AMES the equivalent for the ASHOE dataset.

Generally, in scholarly publication, in the past one distinguished between the refereed literature, the published literature and the grey literature1, where the latter might not have been allowed as a valid citation. The situation has become more complicated with the urge to cite digital material, but one of the reasons for the old rules was about attempting to ensure permanence and access - something that is obviously becoming a problem again. Thus, we should explore the concepts of publication and version of record a bit further, before we create new problems. Cathy Jones, working on the CLADDIER project, has made the point in email that a publisher does something to the original that adds value, and I think in the case of digital data, that something should include at least:

And so, in a reputable article (whatever that means), or in the metadata of a published dataset, I wouldn't allow the citation of a dataset that didn't meet at least those criteria, but once we have met those criteria, then that first version should be the version of record, and copies held elsewhere should most definitely distinguish between the publisher and the availability URI.

Arguably the 2nd and 4th of these criteria could be collapsed down to the use of a DOI. While that's true, I think the use of both helps the citation user (just as I think it best to do a journal citation with all of the volume, page number and DOI). However, if the publisher does choose to use a DOI, it would help if the holders of other copies did not! Whether or not it's true, the use of a DOI does imply some higher level of ownership than simply making a copy available.

Implicit in my discussion of the metadata of a published dataset, is the idea that just as in the document world, we could introduce the concept of some sort of kite-mark or refereeing of datasets. A refereed dataset would be

and it would have been

There might or might not be a special graphical interface to the data and other well known interfaces (e.g. WCS etc) ought probably be provided.

Datasets published after going through such a procedure would essentially have come from a "Data Journal", and so in my example above, such the <Publisher> would become the name of the organisation responsible for the procedure, and the <Title> might well become the title of the "Data Journal".

1: Grey Literature: i.e. documents, bound or otherwise, produced by individuals and/or institutions, but which were not commercially available, and therefore, by implication, not very accessible. (ret).

by Bryan Lawrence : 2006/10/20 : Categories claddier metadata ndg curation : 3 trackbacks : 7 comments (permalink)

The Economist Goes Green

I'm a bit slow to find out about this (which is down to not enough time reading my feeds): Anyway, The Economist is arguing for action on greenhouse gas emissions: Editorial (7 Sep).

Is this a tipping point of sorts?

(Thanks to Andrew Dessler).

by Bryan Lawrence : 2006/10/20 : Categories climate environment (permalink)


DISCLAIMER: This is a personal blog. Nothing written here reflects an official opinion of my employer or any funding agency.