Bryan Lawrence : Bryan's Blog 2006/10

Bryan Lawrence

... personal wiki, blog and notes

The Stern Way Forward

Yesterday I waded through the first two thirds of the Executive Summary ... I thought it best to finish it today, otherwise I would be at risk of not only not reading the whole thing, but not making it through the executive summary :-)

The way forward he proposes depends on three pillars:

  • Establishing a price for Carbon,

  • A technology policy,

  • The removal of barriers to behavioral change.

In terms of pricing Stern recommends use of one or more of tax, trading or regulation, with the mix depending on choices within specific jurisdictions. I have to say, without reading the main text, I don't understand how he can argue that different jurisdictions can achieve carbon prices in different ways (surely it would lead to some form of carousel fraud) ... but I'm no economist, so ok ...

Policy incentives include

... technology policy, covering the full spectrum from research and development, to demonstration and early stage deployment ... but closer collaboration between government and industry will further stimulate the development of a broad portfolio of low carbon technologies and reduce costs.

I find it (amusing, sad, worrying) that the report suggests that existing policy incentives to support the market should only increase by two to five times. This at a time when the UK incentives run out half way through the year. That suggests to me that an order of magnitude increase is necessary (after all, the take up is relatively low, and even with incentives one has to be pretty wealthy to get into home generation).

In terms of behavioural change, he makes the point that:

Even where measures to reduce emissions are cost-effective, there may be barriers preventing action. These include a lack of reliable information, transaction costs, and behavioural and organisational inertia. ... Regulatory measures can play a powerful role in cutting through these complexities, and providing clarity and certainty. Minimum standards for buildings and appliances have proved a cost-effective way to improve performance, where price signals alone may be too muted to have a significant impact.

The clear message throughout is that the market can't do this alone! From the obvious point that carbon costs are an "externality" (the producer of carbon dioxide does not themselves pay the costs), through to the reality that regulation and taxation are going to be necessary to begin to change minds - I reckon hearts will follow (if they're not already there!)

Another obvious (to me) point is that we need to start thinking and planning about adaptation now! We have some decades of climate change ahead of us, regardless of what we can achieve in changing emissions!

We hear a lot about how there is no point in the UK doing anything because it contributes only 2% of global emissions. However, it was good to read

... China's goals to reduce energy used for each unit of GDP by 20% from 2006-2010 and to promote the use of renewable energy. India has created an Integrated Energy Policy for the same period that includes measures to expand access to cleaner energy for poor people and to increase energy efficiency.

It would be nice to hear concrete proposals from the U.S. and Australia!

I'll leave the last word to Stern:

Above all, reducing the risks of climate change requires collective action. It requires co-operation between countries, through international frameworks that support the achievement of shared goals. It requires a partnership between the public and private sector, working with civil society and with individuals. It is still possible to avoid the worst impacts of climate change; but it requires strong and urgent collective action. Delay would be costly and dangerous.

by Bryan Lawrence : 2006/10/31 : Categories environment (permalink)

Stern Facts

Like William Connolley I doubt I'll ever read the whole thing, but it's intriguing to wade through the 27 page executive summary at least.

The Bad News

Under a BAU scenario, the stock of greenhouse gases could more than treble by the end of the century, giving at least a 50% risk of exceeding 5°C global average temperature change during the following decades. This would take humans into unknown territory. An illustration of the scale of such an increase is that we are now only around 5°C warmer than in the last ice age.

Well, that 5 degree figure is a bit hard to fathom (although I think he uses that with respect to the period 2100-2200), but the comparison with the scale of the change since the last ice age is rather a good one. Even if the real number might be 2 to 3 degrees C, it puts things in perspective somewhat - even those of us who may claim to be professionals still have to get a grip on the emotional reaction that a few degrees C isn't that much really. Put like that, it obviously is!

The disaster list is pretty awesome:

  • Water supply problems (Melting glaciers: initially a flood risk, lead to a fall in water availability, not to mention changes in water availability associated with changing weather patterns) ...

  • Declining crop yields (particularly in the higher range of predictions)

  • Death rates from malnutrition, heat stress and vector borne diseases (malaria, dengue fever etc) increase ...

  • Rising sea levels ... threatening the homes of 1 in 20 people!

  • Ecosystem melt down (15-40% of species for only a 2C increase!) (Plus ocean acidification with unquantifiable impact on fish stocks)

Then:

Impacts on this scale could spill over national borders, exacerbating the damage further. Rising sea levels and other climate-driven changes could drive millions of people to migrate ...rise in sea levels, which is a possibility by the end of the century... Climate-related shocks have sparked violent conflict in the past, and conflict is a serious risk in areas such as West Africa, the Nile Basin and Central Asia.

But maybe this will get more attention in the City:

At higher temperatures, developed economies face a growing risk of large-scale shocks - for example, the rising costs of extreme weather events could affect global financial markets through higher and more volatile costs of insurance.

I find all the arguments about changes in GDP difficult to follow, possibly because one never really knows where the baseline is (unless one is an economist), but this seems a pretty straight forward statement:

In summary, analyses that take into account the full ranges of both impacts and possible outcomes - that is, that employ the basic economics of risk - suggest that BAU climate change will reduce welfare by an amount equivalent to a reduction in consumption per head of between 5 and 20%. Taking account of the increasing scientific evidence of greater risks, of aversion to the possibilities of catastrophe, and of a broader approach to the consequences than implied by narrow output measures, the appropriate estimate is likely to be in the upper part of this range.

I've blogged before (Jan 2005a, Jan 2005b, Jul 2005, and Aug 2005) about the future of the oil economy, but maybe I've been on the wrong tack:

The shift to a low-carbon global economy will take place against the background of an abundant supply of fossil fuels. That is to say, the stocks of hydrocarbons that are profitable to extract (under current policies) are more than enough to take the world to levels of greenhouse-gas concentrations well beyond 750ppm CO2e, with very dangerous consequences. Indeed, under BAU, energy users are likely to switch towards more carbon-intensive coal and oil shales, increasing rates of emissions growth.

The economic analysis makes it clear that there is a high price to delay. As he says:

Delay in taking action on climate change would make it necessary to accept both more climate change and, eventually, higher mitigation costs. Weak action in the next 10-20 years would put stabilisation even at 550ppm CO2e beyond reach - and this level is already associated with significant risks.

The Good News

He thinks there is a way out:

Yet despite the historical pattern and the BAU projections, the world does not need to choose between averting climate change and promoting growth and development. Changes in energy technologies and the structure of economies have reduced the responsiveness of emissions to income growth, particularly in some of the richest countries. With strong, deliberate policy choices, it is possible to 'decarbonise' both developed and developing economies on the scale required for climate stabilisation, while maintaining economic growth in both.

I think the Aussie and the American governments understand the last part of this (opportunities), but they want to somehow avoid the first part ... (costs):

Reversing the historical trend in emissions growth, and achieving cuts of 25% or more against today's levels is a major challenge. Costs will be incurred as the world shifts from a high-carbon to a low-carbon trajectory. But there will also be business opportunities as the markets for low-carbon, high-efficiency goods and services expand.

For those of us in paranoid Europe, worried about Russian control of our gas supplies:

National objectives for energy security can also be pursued alongside climate change objectives. Energy efficiency and diversification of energy sources and supplies support energy security, as do clear long-term policy frameworks for investors in power generation.

In Britain today the headlines are all about the taxation that will result from doing something about this, but Stern also points out that while

the social cost of carbon will also rise steadily over time ... This does not mean that consumers will always face rising prices for the goods and services that they currently enjoy, as innovation driven by strong policy will ultimately reduce the carbon intensity of our economies, and consumers will then see reductions in the prices that they pay as low-carbon technologies mature.

I've got that in the good news section on the grounds that the clear message is that the cost of doing something about this isn't going to rise and rise, but it isn't good news for our next thirty years. I fear for the tourism and export agriculture of countries a long way from anywhere else (e.g. New Zealand!)

At this point I'm on page seventeen, and I'm tired ... more soon!

Update, 31 Oct: See James Annan for a critique of the science part ..

by Bryan Lawrence : 2006/10/30 : Categories climate environment : 1 trackback : 0 comments (permalink)

Subtle Discipline Drift

Recently Oxford University advertised for a Lectureship in Atmospheric Physics, and Imperial College is currently advertising for a raft of positions, from lectureships to professorships. In some ways I was and am tempted (despite having supped at the font of "proper" academia before, and having been burned ... a lectureship, in NZ at least, being far far more stressful than what I do now :-). I do love atmospheric science.

However, while I'm tempted, I'm also realistic. In the past five years, I've drifted (and sometimes been pushed) towards what has recently in the UK been called e-science (that name is now deprecated, who knows what we'll call it next year!). I no longer read the atmospheric science journals, not even the title pages, so I haven't a clue what's in the literature. One has only to look at what I blog about nowadays to realise that whatever I'm doing now, it's not atmospheric science per se - although everything I do is predicated towards making the doing of atmospheric science easier.

I keep telling my staff that one should always be appraising one's career options, assessing what doors are closing and opening as time goes by, and carpe diem etc. So, here's my assessment of one of my options right now. Given I'm so out of touch, I'm not sure I'd even feel comfortable supervising atmospheric science students, which is really scary. I think that's the sound of my atmospheric science door closing, if not permanently, pretty tightly anyway - it'd take some effort and time to prise it open again! But I like working with students, so if you're an academic in a computer science department in my part of the world, and fancy getting them working on some atmospheric science related problems, get in touch, we could talk about co-supervision. Hopefully that's another door opening ...

by Bryan Lawrence : 2006/10/26 : 0 trackbacks : 0 comments (permalink)

More Stupid Patent Litigation

I'm with Tim Bray on this. Why isn't the internet in an uproar? IBM is litigating Amazon on patent violations, it's all pretty incredible, but the two most silly are:

If Amazon is found guilty of this, then the entire edifice of data distribution in science will be violating these patents too. In fact, pretty much all e-commerce is covered in these patents (and the others they're claiming are violated).

IBM should be ashamed, even if it's only to over-turn that ludicrous one-click patent ...

by Bryan Lawrence : 2006/10/26 : 0 trackbacks : 0 comments (permalink)

Exploring Web Server Backends - installing fastcgi and lighttpd

A few months ago, I was investigating web server options (one, two, three). I finished that series saying I needed to investigate wsgi. Well, that time has come: there are a number of reasons why wsgi and fastcgi (or scgi) may be important to us. However, I'm a little wary about Apache and fastcgi after getting the impression that lighttpd may be the way to go for fastcgi. So, this note is a list of my experiences getting a wsgi hello world going on my dapper laptop. (As usual, my interest in doing this for myself is to understand the major issues, not because I'm personally going to be working on this.)

(I'm doing this using my own /usr/local/bin/python2.5 rather than the system default python.)

Got Lighttpd:

sudo apt-get install lighttpd

(This started a process running under www-data, and put scripts in /etc/init.d, so I may well have this automatically starting when I boot, which isn't really what I want on a laptop ... I'll investigate that later).

Got flup. Since no 2.5 egg existed, I had to get a tarball and install it with setup.py (nb: remember I'm using my local python):

wget http://www.saddi.com/software/flup/dist/flup-r2030.tar.gz
tar xzvf flup-r2030.tar.gz
cd flup-r2030
sudo python setup.py install

Following cleverdevil (Jonathan Lacour) I grabbed scgi while I was at it.

sudo easy_install scgi

but for my first steps, I'm planning on getting vanilla fastcgi working. I may play with scgi later. Meanwhile for fastcgi, I'm basically following cleverdevil again, adjusted for my ubuntu apt-installed lighttpd.

I modified the file 10-fastcgi.conf in /etc/lighttpd/conf-available to be

## FastCGI programs have the same functionality as CGI programs,
## but are considerably faster through lower interpreter startup
## time and socketed communication
##
## Documentation: /usr/share/doc/lighttpd-doc/fastcgi.txt.gz
##                http://www.lighttpd.net/documentation/fastcgi.html

server.modules   += ( "mod_fastcgi" )

## Start a FastCGI server for python test example
fastcgi.debug = 1
fastcgi.server    = ( ".fcgi" =>
                      ( "localhost" =>
                        (
                          "socket" => "/tmp/fcgi.sock",
                          "min-procs" => 2
                        )
                      )
                    )

and put a sym link to this file into my /etc/lighttpd/conf-enabled directory. (Update 27 Oct: Oops, I had a non-working version of 10-fastcgi.conf here until today. The one above is the one I have working ... today).

I put the test file in my /var/www directory as test.fcgi:

#!/usr/local/bin/python
from flup.server.fcgi import WSGIServer

def myapp(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Hello World!\n']

WSGIServer(myapp, bindAddress = '/tmp/fcgi.sock').run()

And I ran it:

python test.fcgi

and it sits there running.
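
Before involving lighttpd at all, the WSGI callable itself can be exercised directly, which helps separate application bugs from server or socket problems. What follows is a minimal sketch and not part of the setup described here: it assumes Python 3 and the standard-library wsgiref module, and call_app is a hypothetical helper name. The body is returned as bytes because Python 3 WSGI requires that; the Python 2.5 version above returns a plain string.

```python
from wsgiref.util import setup_testing_defaults


def myapp(environ, start_response):
    # Same shape as the test.fcgi application, with a bytes body for Python 3.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World!\n']


def call_app(app):
    """Call a WSGI app once with a fake request; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application passes to the server side of WSGI.
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


if __name__ == '__main__':
    status, headers, body = call_app(myapp)
    print(status, headers, body)
```

If this works but the browser still gets a 500, the problem is more likely in the plumbing between lighttpd and the fastcgi process than in the application.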

Now, trying to access it on http://localhost.localdomain/test.fcgi results in a 500 Internal Server Error. A check in the access log showed many instances of this (associated with much head scratching and time wasting):

2006-10-25 21:54:06: (mod_fastcgi.c.2669) fcgi-server re-enabled: unix:/tmp/fcgi.sock
2006-10-26 08:09:30: (mod_fastcgi.c.1739) connect failed: Permission denied on unix:/tmp/fcgi.sock
2006-10-26 08:09:30: (mod_fastcgi.c.2851) backend died, ...

Eventually the penny dropped. The server is running as www-data, which has no access permissions to the unix domain socket (/tmp/fcgi.sock) created by the user (whether me or root) running the python test.fcgi server code ...

So, I changed the permissions on /var/www to allow www-data access, and reran the python command:

sudo su www-data
python test.fcgi

And lo and behold, I get a "Hello World!" on http://localhost.localdomain/test.fcgi.
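
The permission problem can be inspected from Python rather than inferred from the lighttpd log. This is a rough sketch under stated assumptions: it targets Linux, where connecting to a pathname unix domain socket needs write permission on the socket file, and the helper names describe_socket and others_can_connect are made up for illustration. It only looks at the "other" permission bits, so it ignores group membership and directory permissions.

```python
import os
import stat


def describe_socket(path):
    """Return (is_socket, owner_uid, permission_string) for a filesystem path."""
    info = os.stat(path)
    return (
        stat.S_ISSOCK(info.st_mode),  # True only for socket files
        info.st_uid,                  # numeric uid of the creating user
        stat.filemode(info.st_mode),  # e.g. 'srwxr-xr-x'
    )


def others_can_connect(path):
    """True if accounts other than the owner/group may write to the socket.

    This is a crude stand-in for "can a different account, such as
    www-data, connect to a socket that I created".
    """
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


if __name__ == '__main__':
    sock_path = '/tmp/fcgi.sock'  # the socket path used in the config above
    if os.path.exists(sock_path):
        print(describe_socket(sock_path))
        print('others can connect:', others_can_connect(sock_path))
```

Running the fastcgi process as www-data, as above, sidesteps the issue because the socket is then owned by the same account that lighttpd uses.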

by Bryan Lawrence : 2006/10/26 : Categories badc ndg computing python : 1 trackback : 0 comments (permalink)

Citing data with ISO19139

I thought I might try and work out exactly what tags I might use for my previous citation example, if I was using ISO19139 (i.e. in the metadata of another dataset).

The appropriate piece of ISO19139/19115 is the CI_Citation element, which defines the metadata describing authoritative reference information ... which in my mind should also include other datasets!

Some of it is "straight forward" (I don't plan to admit how long it took to work this out :-) :

<CI_Citation xmlns="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/citation.xsd">
<title>
<gco:CharacterString>Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth </gco:CharacterString>
</title>
<alternateTitle>
<gco:CharacterString>MST </gco:CharacterString>
</alternateTitle>
<date>
<CI_Date>
<date>
<gco:Date>2006 </gco:Date>
</date>
<dateType>
<CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/CodeList/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication"> </CI_DateTypeCode>
</dateType>
</CI_Date>
</date>
<identifier>
<MD_Identifier>
<code>
<gco:CharacterString>badc.nerc.ac.uk/data/mst/v3/upd15032006 </gco:CharacterString>
</code>
</MD_Identifier>
</identifier>
<citedResponsibleParty>
<CI_ResponsibleParty>
<organisationName>
<gco:CharacterString>Natural Environment Research Council </gco:CharacterString>
</organisationName>
<role>
<CI_RoleCode codeList="http://www.isotc211.org/2005/resources/CodeList/gmxCodelists.xml#CI_RoleCode" codeListValue="author"> </CI_RoleCode>
</role>
</CI_ResponsibleParty>
</citedResponsibleParty>
<citedResponsibleParty>
<CI_ResponsibleParty>
<organisationName>
<gco:CharacterString>British Atmospheric Data Centre </gco:CharacterString>
</organisationName>
<role>
<CI_RoleCode codeList="http://www.isotc211.org/2005/resources/CodeList/gmxCodelists.xml#CI_RoleCode" codeListValue="publisher"> </CI_RoleCode>
</role>
</CI_ResponsibleParty>
</citedResponsibleParty>
<citedResponsibleParty>
<CI_ResponsibleParty>
<organisationName>
<gco:CharacterString>British Atmospheric Data Centre </gco:CharacterString>
</organisationName>
<contactInfo>
<CI_Contact>
<onlineResource>
<CI_OnlineResource>
<linkage>
<URL>http://badc.nerc.ac.uk/data/mst/v3/</URL>
</linkage>
<function>
<CI_OnLineFunctionCode codeList="http://www.isotc211.org/2005/resources/CodeList/gmxCodelists.xml#CI_OnLineFunctionCode" codeListValue="download"></CI_OnLineFunctionCode>
</function>
</CI_OnlineResource>
</onlineResource>
</CI_Contact>
</contactInfo>
<role>
<CI_RoleCode codeList="http://www.isotc211.org/2005/resources/CodeList/gmxCodelists.xml#CI_RoleCode" codeListValue="custodian"></CI_RoleCode>
</role>
</CI_ResponsibleParty>
</citedResponsibleParty>
<presentationForm>
<CI_PresentationFormCode codeList="http://www.isotc211.org/2005/resources/CodeList/gmxCodelists.xml#CI_PresentationFormCode" codeListValue="profileDigital"> </CI_PresentationFormCode>
</presentationForm>
</CI_Citation>

OK, it's pretty nasty in terms of verbiage, but as (some) folk keep saying, this is for computers not humans - never mind that a human has to write some code to handle it - but it's not as bad as I feared!
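
To give a feel for what "a human has to write some code to handle it" means in practice, here is a minimal sketch of pulling the title and the responsible parties out of such a citation. It assumes Python 3 and the standard-library xml.etree.ElementTree; SAMPLE is an abbreviated copy of the citation above (one responsible party only), and summarise_citation is a hypothetical helper name, not part of any ISO19139 toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the path expressions below.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Abbreviated copy of the citation above, for illustration only.
SAMPLE = """<CI_Citation xmlns="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <title>
    <gco:CharacterString>Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth </gco:CharacterString>
  </title>
  <citedResponsibleParty>
    <CI_ResponsibleParty>
      <organisationName>
        <gco:CharacterString>British Atmospheric Data Centre </gco:CharacterString>
      </organisationName>
      <role>
        <CI_RoleCode codeList="http://www.isotc211.org/2005/resources/CodeList/gmxCodelists.xml#CI_RoleCode" codeListValue="custodian"></CI_RoleCode>
      </role>
    </CI_ResponsibleParty>
  </citedResponsibleParty>
</CI_Citation>"""


def summarise_citation(xml_text):
    """Return (title, [(organisation, role), ...]) from a CI_Citation string."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        "gmd:title/gco:CharacterString", default="", namespaces=NS
    ).strip()
    parties = []
    for party in root.findall(
        "gmd:citedResponsibleParty/gmd:CI_ResponsibleParty", NS
    ):
        name = party.findtext(
            "gmd:organisationName/gco:CharacterString", default="", namespaces=NS
        ).strip()
        role = party.find("gmd:role/gmd:CI_RoleCode", NS)
        parties.append((name, role.get("codeListValue") if role is not None else None))
    return title, parties


if __name__ == "__main__":
    print(summarise_citation(SAMPLE))
```

Note that every path has to be namespace-qualified, and that the role lives in an attribute rather than in element text, which is part of why the verbiage feels heavy.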

In getting that far, we see that I've nearly managed to get the same information content, but there are some pretty important omissions (I think, caveat emptor, I'd be glad to be wrong about this):

  1. I don't see any way to indicate that the dataset is being updated (the "ongoing" tag in my previous example; ideally this would require a spot for an MD_MaintenanceFrequencyCode in the citation).

  2. I don't see any way of indicating a particular part of a dataset (that is, having separate identifiers both for the dataset and for particular features within it).

  3. Despite support for feature-type descriptions within an ISO19139 document proper (in the MD_FeatureTypeDescription tag), one can't identify which features are in a cited dataset. We're reduced to using CI_PresentationFormCode, which strikes me as a completely ugly compromise between feature descriptions and a text element. The one I've chosen here (profile) is partly right, but doesn't get across that this dataset consists of timeseries of vertical profiles!

  4. One can't, as far as I can see, identify when the dataset was accessed (or the date of the citation's validity) and I think this is rather crucial for citation of online material.

I guess those are the minimum extensions we'd need to support citeable datasets! (By the way, I've ignored the option of using otherCitationDetails as one is only allowed one of those in the citation!)

Update: Note that BADC appears as both a publisher and a custodian. Actually, following my discussion of the distinction, I think at the moment one would want to remove the publisher role ... and leave only the custodian role (that is, in the ISO19139; the text citation form can't distinguish between these roles).

by Bryan Lawrence : 2006/10/25 : Categories iso19115 metadata ndg claddier : 911 trackbacks : 1 comment (permalink)

Weird unicodeness

For some reason my blog has suddenly developed some sort of unicode problem, which is making a large number of pages core dump. I don't know why. I'm investigating ... meanwhile, I'm trapping the error, but you may assume strange things will happen today!

Update (10-15am): At the moment, I'm trapping the page content in the wiki formatter, forcing it to utf-8, doing my wiki formatting (in wikiBNL), and then forcing it back to ascii before returning the content to the leonardo page provider. I need to do the last step, because there is some error higher up which is breaking with a strict ascii code conversion. What is utterly weird is that this was fine yesterday! Since then I have changed my embedhandler (to support namespaces in xml pretty printing, something I'll blog about sometime), but I fail to understand how that would lead to this problem ...

Update (10-35am): Well, it's not the new embedhandler. I commented out the new xml stuff, and the problem still exists ... (I never thought it was, but I think it's the only thing I've touched ...)

Update (11am): I don't understand this, and haven't time to fix it right now. This means that some pages with non-ascii characters may have some spurious ? until I can fix this ...
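
For the curious, the "spurious ?" is what a lossy conversion to ascii produces. The following is a minimal sketch in Python 3 of that behaviour, not the actual wikiBNL or leonardo code; force_ascii and is_pure_ascii are hypothetical names used only for illustration.

```python
def is_pure_ascii(text):
    """True if a strict ascii conversion would succeed for this text."""
    try:
        text.encode("ascii")  # strict error handling is the default
    except UnicodeEncodeError:
        return False
    return True


def force_ascii(text):
    """Convert to ascii, replacing each non-ascii character with '?'."""
    return text.encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    sample = "a 5\u00b0C rise"  # contains a degree sign
    print(is_pure_ascii(sample))
    print(force_ascii(sample))
```

The strict conversion raises an error on the first non-ascii character, which is the kind of failure being trapped; the "replace" handler avoids the error at the cost of losing the character.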

by Bryan Lawrence : 2006/10/25 : Categories python : 0 trackbacks : 0 comments (permalink)

Persistence

Just after I wrote my last post on data citation, I found Joseph Reagle's blog entry on bibliography and citation. He's making a number of points, one of which was about transience. In the comments to his post, and in Joseph's comment on my post, two solutions to deal with internet transience are mentioned: the wayback machine and webcite.

I've looked at the wayback machine in the past, but there is no way that it represents any realistic full sample of the internet (for example, as of today, it has exactly one impression of home.badc.rl.ac.uk/lawrence - from 2004!) ... but how could it? It's an unrealistic task. What I do see it as is a (potentially very useful) set of time capsules ... that is samples!

By contrast, webcite allows the creator of content to submit URLs for archival, thus ensuring when one writes an academic document, the material will be archived, and the citation will be persistent. This is a downright excellent idea, provided you believe in the persistence of the webcitation consortium (and I have no reason not to). The subtext however, is that the citation is a document, it won't help us with data - and not just because data may be large, the other issue is that the webcitation folk would have to take on support for data access tools, and I think the same argument applies to them as applies to libraries in this regard!

This brings me back to my point about data citation: we had better only allow it when we believe in the persistence of the organisation making the data available, and that will consist of rather more than just having the bits and bytes available for an http GET!

by Bryan Lawrence : 2006/10/23 : Categories curation claddier (permalink)

Citation, Hosting and Publication

Returning to my series on citation (parts one, two, and three).

My last example was an MST data set held at the BADC, and I was suggesting something like this (for a citation):

Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth, [Internet], British Atmospheric Data Centre (BADC), 1990-, urn badc.nerc.ac.uk/data/mst/v3/upd15032006, feature 200409031205 [http://featuretype.registry/verticalProfile] [downloaded Sep 21 2006, available from http://badc.nerc.ac.uk/data/mst/v3/]

which I could also write like this to give some hint of the semantics:

<citation>
<Author> Natural Environment Research Council </Author>
<Title> Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth </Title>
<Medium> Internet </Medium>
<Publisher> British Atmospheric Data Centre (BADC) </Publisher>
<PublicationDate status="ongoing"> 1990 </PublicationDate>
<Identifier> badc.nerc.ac.uk/data/mst/v3/upd15032006 </Identifier>
<Feature>
<FeatureType>http://featuretype.registry/verticalProfile </FeatureType>
<LocalID>200409031205 </LocalID>
</Feature>
<AccessDate> Sep 21 2006 </AccessDate>
<AvailableAt>
<url>http://badc.nerc.ac.uk/data/mst/v3/</url>
</AvailableAt>
</citation>

The tags are made up, but hopefully identify the important semantic content of the citation. As I said last time, there is some redundant information there, but maybe not (there is no guarantee that the Identifier and the AvailableAt carry the same semantic content).
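
The relationship between the two forms can be made concrete with a small sketch that renders the text citation from the tagged version. This assumes Python 3 with the standard-library xml.etree.ElementTree; the tag names are the made-up ones above, and render_citation is a hypothetical helper, not a proposal for a standard.

```python
import xml.etree.ElementTree as ET

# The tagged citation from above, reproduced so the sketch is self-contained.
CITATION = """<citation>
<Author> Natural Environment Research Council </Author>
<Title> Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth </Title>
<Medium> Internet </Medium>
<Publisher> British Atmospheric Data Centre (BADC) </Publisher>
<PublicationDate status="ongoing"> 1990 </PublicationDate>
<Identifier> badc.nerc.ac.uk/data/mst/v3/upd15032006 </Identifier>
<Feature>
<FeatureType>http://featuretype.registry/verticalProfile </FeatureType>
<LocalID>200409031205 </LocalID>
</Feature>
<AccessDate> Sep 21 2006 </AccessDate>
<AvailableAt>
<url>http://badc.nerc.ac.uk/data/mst/v3/</url>
</AvailableAt>
</citation>"""


def field(node, path):
    """Text of the element at path, stripped of surrounding whitespace."""
    return (node.findtext(path) or "").strip()


def render_citation(xml_text):
    """Build the one-line text citation from the tagged form."""
    c = ET.fromstring(xml_text)
    date = c.find("PublicationDate")
    year = date.text.strip()
    if date.get("status") == "ongoing":
        year += "-"  # an open-ended date range, as in "1990-"
    return (
        f"{field(c, 'Author')}, {field(c, 'Title')}, [{field(c, 'Medium')}], "
        f"{field(c, 'Publisher')}, {year}, urn {field(c, 'Identifier')}, "
        f"feature {field(c, 'Feature/LocalID')} "
        f"[{field(c, 'Feature/FeatureType')}] "
        f"[downloaded {field(c, 'AccessDate')}, "
        f"available from {field(c, 'AvailableAt/url')}]"
    )


if __name__ == "__main__":
    print(render_citation(CITATION))
```

Going the other way (from the text form back to the tags) is much harder, which is one argument for keeping the tagged form as the primary record.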

Inherent in that example, and my meaning, was a concept of publication, and I introduced that distinction by comparing the MST and our ASHOE dataset (which is really "published" elsewhere). In the library world, there is a concept of "Version of Record", which isn't exactly analogous, but I would argue BADC holds the dataset equivalent of the version of record for the MST, and NASA AMES the equivalent for the ASHOE dataset.

Generally, in scholarly publication, in the past one distinguished between the refereed literature, the published literature and the grey literature [1], where the latter might not have been allowed as a valid citation. The situation has become more complicated with the urge to cite digital material, but one of the reasons for the old rules was about attempting to ensure permanence and access - something that is obviously becoming a problem again. Thus, we should explore the concepts of publication and version of record a bit further, before we create new problems. Cathy Jones, working on the CLADDIER project, has made the point in email that a publisher does something to the original that adds value, and I think in the case of digital data, that something should include at least:

  • provision of catalogue metadata

  • some commitment to maintenance of the resource at the AvailableAt url

  • some commitment to the resource being conformant to the description of the Feature

  • some commitment to the maintenance of the mapping between the identifier and the resource.

And so, in a reputable article (whatever that means), or in the metadata of a published dataset, I wouldn't allow the citation of a dataset that didn't meet at least those criteria, but once we have met those criteria, then that first version should be the version of record, and copies held elsewhere should most definitely distinguish between the publisher and the availability URI.

Arguably the 2nd and 4th of these criteria could be collapsed down to the use of a DOI. While that's true, I think the use of both helps the citation user (just as I think it best to do a journal citation with all of the volume, page number and DOI). However, if the publisher does choose to use a DOI, it would help if the holders of other copies did not! Whether or not it's true, the use of a DOI does imply some higher level of ownership than simply making a copy available.

Implicit in my discussion of the metadata of a published dataset, is the idea that just as in the document world, we could introduce the concept of some sort of kite-mark or refereeing of datasets. A refereed dataset would be

  • available at a persistent location

  • accompanied by more comprehensive metadata (which might include calibration information, algorithm descriptions, the algorithm codes themselves etc)

  • quality controlled, with adequate error and/or uncertainty information

and it would have been

  • assessed as to its adherence to such standards.

There might or might not be a special graphical interface to the data, and other well known interfaces (e.g. WCS etc) ought probably to be provided.

Datasets published after going through such a procedure would essentially have come from a "Data Journal", and so in my example above, the <Publisher> would become the name of the organisation responsible for the procedure, and the <Title> might well become the title of the "Data Journal".

1: Grey Literature: i.e. documents, bound or otherwise, produced by individuals and/or institutions, but which were not commercially available, and therefore, by implication, not very accessible.

by Bryan Lawrence : 2006/10/20 : Categories claddier metadata ndg curation : 3 trackbacks : 7 comments (permalink)

The Economist Goes Green

I'm a bit slow to find out about this (which is down to not enough time reading my feeds): Anyway, The Economist is arguing for action on greenhouse gas emissions: Editorial (7 Sep).

Is this a tipping point of sorts?

(Thanks to Andrew Dessler).

by Bryan Lawrence : 2006/10/20 : Categories climate environment (permalink)

On substitution groups and ISO19139

I have bleated already about the difficulties of using ISO19139 with restrictions which introduce new tag names.

Now the official way to do this is probably to exploit substitution groups in the new xml schema associated with your restriction. So, if one wanted to restrict, for example, gmd:MD_Metadata, one might start in the new schema with something like

<element substitutionGroup="gmd:MD_Metadata" type="ourMD_Metadata" name="restricted_MD_Metadata"> stuff </element>

(See w3schools, the xml schema primer, or Walmsley, 2001, to explain the syntax). Then in the instance document, one would have

<restricted_MD_Metadata> stuff </restricted_MD_Metadata>

At this point one could use the xml schema validation machinery to ensure one had a nice valid instance of the new restricted schema.

My beef is how we use this. The gurus will tell you there is no problem, and maybe there isn't if one wants to invest an enormous amount of time in complex handlers (even so, maybe it's not that straight forward for data binding, and perhaps the tools aren't really that mature - or weren't in 2004).

So, if I'm writing code to handle ISO19139 documents, I'm going to be writing xslt or using xpath or xquery to get at particular content, or I'm going to have to invest in brute force if I want to handle things in a high level language like python (as far as I know there are no pythonic tools that get close to this sort of requirement internally).

Let's just explore the brute force method, and a simple use case: I have harvested ISO19139 profiles (I'm starting to think "variants" - complete with quotes - would be a better term :-) from a number of places, and want to deliver the titles to a web page ... so I need to find the titles. I can't assume I can use a simple xpath expression (which is supported in python) to find all the titles. I have to parse all the relevant schemas, and do something complex to find the new title elements. In practice, I have to support each profile as a completely different schema; they might as well not share the ISO19139 heritage, even though there are advantages in the ISO19115 content heritage. Yuck.
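As a rough illustration of the brute force route, here is a minimal Python sketch using only the standard library. It assumes a hypothetical profile schema (the urn:example:profile namespace and the trimmed instance are invented for illustration), reads the schema to find elements declared substitutable for gmd:MD_Metadata, and only then looks for titles. It ignores schema imports and includes, chains of substitution, and profiles that also rename elements further down the tree, all of which a real handler would have to deal with.

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
HEAD = f"{{{GMD}}}MD_Metadata"  # Clark notation for gmd:MD_Metadata

# Hypothetical profile schema: namespace and names are invented for illustration.
PROFILE_XSD = f"""<xs:schema xmlns:xs="{XS}" xmlns:gmd="{GMD}"
    xmlns:our="urn:example:profile" targetNamespace="urn:example:profile">
  <xs:element name="restricted_MD_Metadata" type="our:ourMD_Metadata"
      substitutionGroup="gmd:MD_Metadata"/>
</xs:schema>"""

# Trimmed instance (not a complete, valid record) that uses the new root tag.
INSTANCE = f"""<our:restricted_MD_Metadata xmlns:our="urn:example:profile"
    xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gmd:identificationInfo><gmd:MD_DataIdentification><gmd:citation><gmd:CI_Citation>
    <gmd:title><gco:CharacterString>A harvested dataset</gco:CharacterString></gmd:title>
  </gmd:CI_Citation></gmd:citation></gmd:MD_DataIdentification></gmd:identificationInfo>
</our:restricted_MD_Metadata>"""

TITLE_PATH = ("gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
              "gmd:CI_Citation/gmd:title/gco:CharacterString")


def substitutes(schema_text, head):
    """Return Clark-notation names of elements declared substitutable for head."""
    prefixes = {}  # flat prefix map; assumes prefixes are not redefined mid-schema
    target_ns = ""
    found = set()
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif item.tag == f"{{{XS}}}schema":
            target_ns = item.get("targetNamespace", "")
        elif item.tag == f"{{{XS}}}element" and "substitutionGroup" in item.attrib:
            # Resolve the prefixed name in the attribute value to a namespace URI.
            prefix, _, local = item.get("substitutionGroup").rpartition(":")
            if f"{{{prefixes.get(prefix, '')}}}{local}" == head:
                found.add(f"{{{target_ns}}}{item.get('name')}")
    return found


def find_titles(doc_text, root_names):
    """Return titles only if the root element is one of the accepted names."""
    root = ET.fromstring(doc_text)
    if root.tag not in root_names:
        return []
    return [el.text for el in root.findall(TITLE_PATH, {"gmd": GMD, "gco": GCO})]


known_roots = {HEAD} | substitutes(PROFILE_XSD, HEAD)
print(find_titles(INSTANCE, {HEAD}))       # the plain ISO name misses the record
print(find_titles(INSTANCE, known_roots))  # found once the schema has been read
```

Even in this toy form, the schema has to be read before a single title can be found, and the same lookup would be needed for every renamed element on the path, for every profile.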

OK, now suppose I hand this off to an xquery engine. How easy is that? Let's assume it's not buggy ... This is essentially the use case described as 1.9.4.7 Q7 in the June 2006 use case document. I'm not that familiar with xquery, but it appears that

schema-element(gmd:MD_Metadata)

should then match any element which is linked to it via a substitution group declaration like that above. If it really is that simple, then this is much easier than the brute force method, and a good reason for passing my problems to a real xquery engine.

However, this may well work fine for handling document reshaping type tasks, but returning to the use case, I could well have tens of thousands of harvested documents, if not millions, and so I may well be considering indexing. I don't know, but it would appear that one has to rewrite all substitution group elements when producing an index - does our eXist native xml database technology do this automatically for me? I don't know, and that's the point.

All this marvellous xml technology is bloody complicated, and all to handle the case that a community wants to restrict the usage of some tags or lists! Why make all this grief? Wouldn't it be far easier to give community guidance, but accept perfectly valid ISO19139 documents which fall outside that guidance, because we could all simply follow the simple rules in David Orchard's article, and especially the one I've highlighted before:

Document consumers must ignore any XML attributes or elements in a valid XML document that they do not recognise.

We could rewrite that as:

Communities should give guidance on those ISO19139 attributes or elements that need populating for usage within the community (and which might need to be handled by community tools).

Job done. No complicated machinery. More tools available, easier indexing, and much easier human parsing of everything ... (from the schemas, to the instances, and all the code that handles them).
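By way of contrast, a minimal Python sketch of a consumer following the "must ignore" rule might look like this. The two records are invented, heavily trimmed fragments (not complete valid ISO19139 documents); the second populates gmd:purpose, which this hypothetical community tool does not handle and simply never looks at.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Trimmed, invented fragments using plain ISO19139 tag names throughout.
TEMPLATE = """<gmd:MD_Metadata xmlns:gmd="{gmd}" xmlns:gco="{gco}">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>{title}</gco:CharacterString></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      {extra}
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

PURPOSE = ("<gmd:purpose><gco:CharacterString>Not used by this community"
           "</gco:CharacterString></gmd:purpose>")

RECORDS = [
    TEMPLATE.format(**NS, title="Record one", extra=""),
    TEMPLATE.format(**NS, title="Record two", extra=PURPOSE),
]

TITLE_PATH = ("gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
              "gmd:CI_Citation/gmd:title/gco:CharacterString")


def harvest_titles(records):
    """Collect titles with one fixed path; anything unrecognised is ignored."""
    titles = []
    for text in records:
        root = ET.fromstring(text)
        titles.extend(el.text for el in root.iterfind(TITLE_PATH, NS))
    return titles


print(harvest_titles(RECORDS))
```

One fixed path serves every record, no schema is parsed, and the same path could be handed straight to an index.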

by Bryan Lawrence : 2006/10/19 : Categories xml iso19115 metadata ndg : 1 trackback : 0 comments (permalink)

Two completely unconnected things

I've just had a burst of rather intensive work over the last couple of weeks (hence the silence), and this lunch time I've rather ground to a halt. So, by way of light entertainment, I clicked on my akregator and started reading from the enormous number of unread things from the various feeds I think/thought I need/want to follow ...

Herewith are two links which, apart from how I found them, are completely unconnected. I'm drawing them to your attention because for very different reasons I valued reading them.

1) The first is from John Fleck's blog, and is actually a higher profile restatement of a comment on an earlier entry by Nick Brooks discussing cultural responses to climate change. Go read the entire thing, but this is a taster:

... even if we accept that environmental crises led to the emergence of civilisation, we are still looking at collapse - the collapse of the societies that preceded these new cultures. In the Sahara we know that lifestyles based on mobile cattle herding collapsed, as did hunting and gathering. It's the survivors who adapt, after the event.

It's that last sentence that got me thinking ...

2) And now from the April the first collection at the bmj, where quite clearly there is some sort of calendar problem, we have a series of fabulous articles and responses, but John Fleck (again) pointed out this one on the half-life of teaspoons ...

I wonder if they would get the same results with plastic teaspoons!

One serious. One frivolous. Viva Blogging! How else would one find this sort of stuff?

by Bryan Lawrence : 2006/10/12 : Categories climate : 0 trackbacks : 2 comments (permalink)


DISCLAIMER: This is a personal blog. Nothing written here reflects an official opinion of my employer or any funding agency.