... personal wiki, blog and notes
Bryan's Blog 2009
Oh no I can't keep up ..
As you have all noticed, I'm a tad busy at the moment, and blogging has been the obvious thing missing out. The other thing missing out is new technology:
Google Wave I hear you say? No ...
Google Fusion Tables API... oh no not another thing I can't pay attention to.
Tony Hey, a veteran British computer scientist now at Microsoft, said ... "In the U.K. I saw many generations of graduate students really sacrificed to doing the low-level IT."
Except it wasn't, and isn't, just the UK!
So timely. I gave a talk today about the pressure of big data, from models and earth observation ... and having to take it all seriously from a national infrastructure point of view ... I'll eventually stick the talk up on my publications page (it's too big to upload from home), but the slides alone won't tell the story ...
Actually there is a lot of other timely stuff going on. I'd like to be talking about the live blogging from AGU ... but I only have time to skim the content, not to take it in and comment here ...
by Bryan Lawrence : 2009/12/16 : 0 trackbacks (permalink)
I've just been reading "How well do we understand and evaluate climate change feedback processes?" by Bony et al. (2006), which appears in the Journal of Climate. While I've delved into GCM cloud physics in the past, I've never really taken the trouble (beyond this) to get into cloud feedbacks in the climate sense. I've been happy to accept the received wisdom that cloud feedbacks are the dominant uncertainty in climate sensitivity, but that most folks (Lindzen apart) believe that despite their uncertainty, the sign at least is very probably positive, that is, to enhance the effect of increasing CO2 on surface temperature.
This post is by way of notes from half a day following my nose down the rabbit hole, because for various reasons, I need to educate myself on the issue.
Storm intensity and Frequency
(Nothing to do with why I wanted to read the paper, but something I've been interested in for a while.)
As far as I can tell, this paper is close to the basis (see something like 90 citations) of the oft repeated statement that we expect storms to be less frequent but more intense in a future climate.
There have obviously been many follow-up papers with other models, including, for example, Leckebusch and Ulbrich (2004), who in an analysis of GCMs and RCMs found (according to their abstract):
Although the overall number of modelled tracks is underestimated in the control period of the global model's simulation with present-day greenhouse gas forcing, compared to reanalysis data, realistic patterns of the track density over the investigation area are simulated.
Changes occur in particular with respect to the A2 scenario for extreme cyclone systems, while for B2 the changes are less pronounced. Especially over western parts of Central Europe, the track density of extreme cyclones increases for A2, accompanied by a tendency towards more intense systems. With respect to the A2 scenario, a tendency towards more extreme wind events caused by deepening cyclones is identified for several regions of Western Europe such as Spain, France, United Kingdom or Germany.
Additionally, the climate change signal in the regional climate model (RCM) HadRM3H is analysed. In accordance with the signal of the wind speed changes in the GCM simulation, the RCM reveals an increase of the 95th percentile of the daily maximum wind speed over extended parts of Western Europe related to the areas of increased track density of extreme cyclones under the A2 scenario. Changes with respect to the SRES B2 scenario are similar in their structure, but less pronounced in their amplitude.
A bit of googling resulted in an interesting powerpoint by Ruth McDonald which reviews a lot of similar studies ... and then I managed to get hold of Lambert and Fyfe, 2006, which has an analysis making the same point using a CMIP3 multi-model ensemble:
Key changes in clouds in a future climate
(This is what I was after):
Several analyses show no consensus in the global response of clouds and cloud radiative forcing to a given climatic perturbation.
Two analyses showed that the global cloud feedback is positive in all models, but there are large intermodel differences in the magnitude of the feedback.
The frequencies of occurrence of different cloud types (both observed and modelled) are highly unequal, and so the behaviour of certain clouds may matter more than that of others in explaining the range of feedbacks.
Several studies show that the responses of deep convective clouds and of low level clouds differ among GCMs.
Changes in the water content of different types of clouds also differ among GCMs ...
Differences in cloud feedbacks in areas dominated by low-top cloud responses make the largest contribution to the variance in the global feedbacks: in the tropics, the response differs most between models in subsidence regions (which is consistent with low cloud anyway), with the large fraction of the earth covered by this type of cloud being an important contributor ...
At the same time however, we have reference to Zhang et al, 2005 which took me off on another riff, but that's a topic for another day.
Cloud Physics Parameterisations
(this is what I remain most interested in, science-wise, should I ever get any time to do any science myself ever again ....)
One of the things I'm looking forward to getting out of metafor is a decent summary of what the current state of play in cloud parameterisations is in GCMs ... I've been out of it for just long enough that it's hard to get back in ... a souped up version of the following table (for just three models, from Wyant et al. 2006) for all the CMIP5 models should be just the ticket:
Papers I now want to read
For these I could only read the abstracts, due to the paucity of support for climate science in my institutional library (which is great for high energy physics, apparently, but crap for climate) ... I'm glad that I could get to most of the others via self-archived pdfs (yes, I could get to some via our library subscription!)
Uncertainty in Projections of UK Climate Change Resulting from Regional Model Formulation, Rowell, 2006
Changes in mid-latitude variability due to increasing greenhouse gases and sulphate aerosols, Carnell and Senior, 1998
(Note that the figures and table have been downgraded in quality by removal of information, consistent with my fair use policy.)
by Bryan Lawrence : 2009/12/14 (permalink)
The Back of the Envelope and the Removal of Guilt
Some of the most important things one learns in a physics degree are:
Dimensional Analysis (have I stuffed up anything obvious?)
Scale Analysis (what terms in this equation actually matter for this problem?), and
Start with an Envelope (know what your answer is roughly, before further calculation).
The last is pretty fundamental. I used to teach a course called "Nursing Mathematics" at my local polytechnic, for, you'll not be surprised to know, nurses. The entire point of the course was to give nurses the mental arithmetic tools (supplemented on occasion by an envelope and a pen) to know what the answers to most of their day-to-day calculations are, before using a calculator to get things right. You may or may not be surprised to know how important this is, folk have died because of incorrect factors of ten in IV flow rates ... Scale problems happen often with calculators, but less often when individuals have a grip on the scale of answers before attempting the "real" calculation (either from experience, or explicit "pre-calculation").
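The "envelope first" habit is easy to mechanise. Here's a toy Python sketch of the idea applied to the IV flow-rate example: estimate the order of magnitude first, then reject any "exact" answer that disagrees with the estimate by a factor of ten. The numbers and the check itself are illustrative only, not clinical guidance.

```python
# Envelope-first arithmetic: compute an exact answer, then sanity-check it
# against a rough mental estimate, catching slipped decimal points.

def drip_rate(volume_ml, hours, drops_per_ml=20):
    """Exact drops-per-minute for a given infusion (illustrative numbers)."""
    return volume_ml * drops_per_ml / (hours * 60)

def envelope_check(exact, estimate, tolerance=10):
    """True if the exact answer is within a factor of `tolerance`
    of the back-of-the-envelope estimate."""
    return 1 / tolerance < exact / estimate < tolerance

# Envelope: 1000 ml over 8 h is "a bit over 100 ml/h", i.e. roughly
# 2 ml/min, i.e. about 40 drops/min at 20 drops/ml.
estimate = 40
exact = drip_rate(1000, 8)                       # about 41.7 drops/min
assert envelope_check(exact, estimate)

# A slipped decimal (10000 ml instead of 1000) fails the envelope check:
assert not envelope_check(drip_rate(10000, 8), estimate)
```

The point, as with the nurses, is that the estimate comes from understanding the scale of the problem; the calculator (or code) only confirms it.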
Which is a long winded way of introducing a post which both validates my original decision to buy an electric lawn mower instead of a petrol mower, and removes my guilt over not buying a push-mower.
I spend a lot of time thinking about this sort of issue, but never get down to writing it out. Well done "King of the Road". My current suspicion is that it would be a good thing to get an electric garden shredder, rather than pile the stuff on a regular basis in a car and drive to the dump ... one day, I will do that calculation ...
Meanwhile, alongside David MacKay's fabulous book, I feel I now have two "back to basics" places to go to find some numbers about practical ways of dealing with our energy futures.
Drawn into climategate
It's a long time since I bothered to write a letter to the editor, but the CRU email controversy pushed me over the edge. Or to be more precise, Anne McElvoy's opinion piece did. Of course I would never have known anything about it, not normally having access to the London Evening Standard, but their letters editor actually approached me to solicit an intervention in the next edition (by which I assume that a) he was desperate, and/or b) he figured I might say something silly, or both).
In the event, I must say I was pretty impressed with the way he elicited the following from me:
Anne McElvoy's opinion piece (25 November) strikes an unrealistic view of what has occurred in the University of East Anglia hacked emails controversy. Extended scientific conversations occur in parallel through various channels, and it's completely unreasonable to expect these conversations to be comprehensible based on a small subset without all the previous baggage. The point of scientific record and peer review is for scientists to stand up their claims and have them evaluated. Of course scientists have a prior view of what data could be telling them but the difference between scientists and most "sceptics" is scientists change their hypotheses when necessary. Most practising scientists will go out of their way to have an honest discussion about issues in their work. Among the sceptics are those who frame their questions in terms which can be addressed; but there aren't enough climate scientists or hours in the day to educate those who believe in the climate physics equivalent of a flat Earth.
with Most practising scientists will go out of their way to have an honest discussion about issues in their work being the bit that they chose to highlight.
Of course I never had time to write anything that brief, the following is what I actually wrote (in a tiny amount of time, obviously while doing other things):
The Original (decorated with a link to my favourite parody)
Anne's opinion piece strikes a harsh and unrealistic view of what has occurred in this recent controversy.
My personal opinion is that extended scientific conversations occur in parallel using multiple mechanisms (phone, email, actual meetings etc), and we are only seeing a part of conversations where the correspondents have sensibly used shorthand like "trick" ... and it's completely unreasonable to expect these sorts of conversations to be comprehensible in their entirety based on just the email subset without all the other baggage from prior and parallel conversations that's not explicitly included. Indeed, we can construct versions of reality from these subsets of actuality which are completely bogus, and that's what we are seeing happen ... (and the fatuousness of doing so is what some of the parodies that Anne is so dismissive of are trying to demonstrate, for example this).
Further, the entire point of the scientific record and peer review is for folk to be able to stand up their claims and have them evaluated (and to put effort into defending if that's appropriate). The process of doing science, however, involves the construction of hypotheses and their evaluation without documenting every blind alley to a publishable level for the benefit of other folk who steal notes or emails. Of course scientists have a prior view of what the data is or could be telling them, but the difference between scientists and sceptics is that scientists evaluate the evidence and change their hypotheses when necessary. There seems to be no evidence that the sceptics are doing this ... and there is no evidence that I'm aware of in these emails that Phil Jones or any of the other key participants are not.
Finally, most practising scientists will go out of their way to have an honest discussion about the facts and issues of their work, and it's my opinion that that is particularly true of those working in climate sciences. However, what we can't do is make the time for every argument from a community which does indeed seem to "deny" basic physics. Of course, amongst the sceptics are those who can and do frame their questions in terms which can and are addressed, but there aren't enough climate scientists or hours in the day, to educate those who believe in the climate physics equivalent of "a flat earth".
A few days later
I had clearly realised that by writing something so long I had made myself a hostage to editorial fortune, so I have to say, all kudos to their editorial team; their abstraction is a fair reflection of what I was trying to say (and by virtue of its brevity, probably a lot more effective).
However, despite thinking brevity is a good thing, I can't quite leave it alone because I think some things still need to be said. But, I can't do justice to what I want to say either, because I don't have time ...
In the best of possible worlds I'd like to spend time stressing and explaining how unreasonable this deconstruction of the CRU email is, but Gavin has already done that (context and original), and Anne didn't buy that.
I'd like to spend some time explaining just how time consuming debating the issue is when one has to spend most of the time dealing with basic scientific issues rather than the issue of global warming. Frankly, I don't think that's a good use of taxpayers' money; I'm paid to do climate science, not teach high school and/or undergraduate physics. Which is not to say I don't think more communication should happen, just that I'm not the best person to do it, and neither are most of my colleagues. But she'd probably think I was in my ivory tower, rather than just trying to be practical ...
I'd like to spend some time explaining what we know, and what we think, what we mean by probability and uncertainty, and give her that rigour, but like all communication, it takes two to party, and given that she appears not to have actually herself talked to a climate scientist about the import of these emails, I'm not sure she'd listen.
I'd like to explain why "deniers" is exactly the right word for some of the "sceptic" community, and why it's not sloppy to use slang and abbreviations when the other party to your communication knows what you mean, and you have no expectation that anyone else is ever going to see what you have written. If I really thought posterity was going to judge everything I wrote, I would be much less efficient ... is that lack of efficiency worth the price? Is it really? Of course, when I do try to communicate to a wider audience, then of course it's reasonable to expect a professional effort ...
And there is much much more.
However, it's not the best of possible worlds, because I haven't the time to take on all these conversations and simultaneously do what I'm paid to do ... this week alone, I have no time that is not already committed to meetings and actions, and it's a rather typical week.
Quality communication and education takes time, lots of it, and that's why I wish for more quality journalism ... and, if society wants it, more professional scientifically educated communicators to work in the interface between the scientific coal face and the public and their policy engine of state.
by Bryan Lawrence : 2009/11/29 : 0 trackbacks : 2 comments (permalink)
erase to transparency
The easy way (with The GIMP): somehow select an area (choose by fuzzy selector contiguous colour is one good option), then, if your image hasn't got an alpha channel, add one (right click, choose transparency, add an alpha channel) ... then color to alpha (right click, choose transparency, color to alpha).
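For the curious, here's a rough, pure-Python sketch of the colour-to-alpha idea for a single grayscale channel: pixels matching the key colour become transparent, pixels far from it stay opaque, with a linear ramp in between. This illustrates the concept only; it is not GIMP's actual algorithm, and the function name and parameters are my own.

```python
# Toy colour-to-alpha for one grayscale channel: alpha is proportional
# to how far each pixel's value is from the chosen key colour.

def color_to_alpha(pixels, key, full_opacity_distance=255):
    """Map each 0-255 grayscale value to a (value, alpha) pair."""
    out = []
    for p in pixels:
        distance = abs(p - key)
        alpha = min(255, round(255 * distance / full_opacity_distance))
        out.append((p, alpha))
    return out

row = [255, 250, 128, 0]   # white background, near-white, grey, black
print(color_to_alpha(row, key=255))
# → [(255, 0), (250, 5), (128, 127), (0, 255)]
```

With white (255) as the key, the white background pixel gets alpha 0 (fully transparent) while black stays fully opaque, which is exactly the effect the GIMP recipe above produces interactively.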
I'm still here ... but my day job is all consuming at the moment. Blame CMIP5.
by Bryan Lawrence : 2009/11/09 (permalink)
Everyone needs to understand this:
A 50%-good solution that people actually have solves more problems and survives longer than a 99% solution that nobody has because it's in your lab where you're endlessly polishing the damn thing. Shipping is a feature. A really important feature. Your product must have it.
It applies to science too ... politics, you name it ...
Reading in 2009, 15-18: Teenage action fodder for summer
Even the unobservant readers of this blog will have noticed the silence by now. It's workload folks ... something has to give. I'll be back ...
Meanwhile, for your amusement, and my records, here's the first half of my summer reading ... as a snapshot.
All four of these came for 20p each at the village Fete (you'll be right if you were to suppose that I didn't just buy four books for 80p: there is a fifth, and I just finished it, but there were a few in between I need to tell you about, and I'm not ready for that yet).
The thing these four have in common is that they're more teenage boy action fodder. I chose this lot on the grounds that I expected to spend lots of the summer with only five minutes here and there to read, and it was pointless to try and read something which required anything approaching attention. I also chose them because the rest of the books on the stand sucked even more ...
Anyway, here they are, starting with three from Jack Higgins:
The Keys of Hell and the Day of Reckoning. These are the same book, oh no they're not, they've got different covers, and the characters have different names, and the country they're set in is different, and the events are different but they are the same book! As I read the second, I kept having deja vu. Surely I just read this? The word formulaic jumped into my mind and would not let go ... even James Bond is less formulaic ... oh, the plot? Well I've almost deliberately forgotten already, but basically it's superhero agent type bloke beats mafia (or equivalent) butt (in one country or another ) ... yawn. Still somehow I read them, so perhaps despite my age, there's a dose of brainless teenager in me yet, I could have stopped, but they were so easy to read in a low intensity sort of a way ...
Fortunately Thunder Point did actually have something approaching a plot, and although one of the same superhero agent dudes was in this one, somehow he came across more plausibly (or maybe my ability to suspend disbelief had been souped up by a day or two of warm sunshine) ... anyway, if you have to read one of these three, read this one - it's on safer ground for Higgins being vaguely related to WWII.
The other one was a different kettle of fish.
Bernard Cornwell's Sharpe has been fighting the battlefields of the early 19th century for a wee while now, on paper and on the tellie, and I've been an avid reader and collector since the beginning. Yes, this too is boys own stuff (it might seem like this is a recurring theme, but be fair, a lot of what I read isn't), but it's historically accurate, and when I was a teenager I was right into the history of the Napoleonic wars, so it's fiction set in a period for which I once knew the history pretty well.
Anyway, this one was Sharpe's Fortress. It's typical fare: Sharpe is mistreated by erstwhile colleagues because he's not a gentleman, makes friends with some real men, dallies with a female (incredibly peripherally for a relatively modern book), beats various foes by his own bloodymindedness and martial skill, and conquers a fortress to boot. No wonder he was on the tellie.
I must be getting older though, so while I enjoyed it for what it was, I'm getting less keen on following Sharpe, the books seem more formulaic (There's that word again), and I'm finding it harder and harder to enjoy books that are primarily about kicking butt without some more interesting twists.
Anyway, amazing what happened when I pitched up at a School Fete, and there were piles of books for 20p each isn't it? I ended up buying a bunch nearly all the same (the fifth book was very different ... but I'll leave that as a tease).
The relationship between collecting metadata, and the optimum size of a child's plate of food
I've said it before, and no doubt I'll have to keep saying it, but the word metadata is understood by nearly every individual differently. This has a number of consequences, starting with defining (in any given case) what comprises metadata. The problem is nicely encapsulated in a recent email on the Galeon list. I hope Gerry Greager won't mind me further publicising his statement:
A lot of otherwise really sharp folks tend to define everyone's data and metadata by their own prejudices, including me. After all, MY data's easy to identify and define, and I can see how YOUR data should be identified and defined, too. What? you don't agree with me? How dare you?
The corollary of this position I'd state as:
It's really easy for me to create my metadata, and time consuming and unrewarding to create the metadata you need. How dare you ask me to waste time doing it? Oh, by the way, can you please create the metadata I need to consume your data? What, you don't want to unless you're a co-author? Bizarre!
(In this context, consumption goes well beyond merely being able to load and manipulate the data, the meaning and context matter too ...)
This has interesting consequences for those of us trying to collect metadata within projects, like, for example, metafor. There, one of our goals is to document the models used and simulations produced in CMIP5. That means we're going to be asking the modelling groups to enter metadata about those models and simulations, and it's going to be time consuming to do so. I expect many will consider it not of direct benefit to them (that said, I hope just as many, if not more, will recognise direct benefits). Indirect benefits should be obvious: the better documented we make these models and simulations, the better the interpretations and derivative science should be, particularly when those interpretations and derivations are done by those outside the normal community of model data users, by folks who need that extra metadata to be sure of what they are doing (or even to do it at all).
Ok, so I think I make a cogent argument about benefits, so where does children's eating behaviour come in? Well, I think when one is trying to gather metadata, we're in the same boat as parents are with young children: if you put too much food on the plate, kids just dabble round the sides and don't eat much. Put the right amount on the plate, and kids gobble it up. Too little, and you're back to "don't eat much".
So, when asking for metadata, it's crucial to ask for just the right amount: enough for a large proportion (but not all) of the potential data consumers, but not so much that the task of producing it puts off the metadata producers, and you end up getting little or none of what you need. (And don't ask for so little that you end up getting little or none of what you need either.)
Returning to metafor, the question I keep asking myself is: "Should we ask that, will it put folk off answering at all?" The problem of course is, knowing what the answer is. Again, like children, we need to try and second guess how much capacity and desire there is ...
With a metadata entry tool, we're in the even more complicated situation of a children's party: we have different children (with different capacities and desires), so we have to work out the average plate size, but allow for second and third helpings for those with large capacity. That is, we need to guess how much metadata we can reasonably ask for in the average without making it look too large, but make it possible for those with the interest and desire to give us much much more information within the same structures.
The answer is further complicated by the situation. To push this analogy even further, in the case of CMIP5 we have the advantage of a peer-induced pressure (for increased metadata production), just like at that party, where there is peer-induced pressure (for increased food consumption).
Silence is golden, so this must be a golden blog ... the last month I've either been head down over the keyboard slaving on the development of a tool to collect descriptions of the models used and simulations produced for CMIP5, or on holiday. More of the latter soon.
Meanwhile, as I sat in the backyard on Sunday afternoon, supping a Grove Mill Sauvignon Blanc, my head fell backward as it is wont to do, and I gazed at the contrails decorating our sky. They often decorate our sky, which means amongst other things that a) we seem to sit under the entry flightpaths for a lot of routes into Europe, and b) our sky is often clear enough to see them ... but those two factors are not unrelated. Sometimes our sky isn't clear enough to see contrails, because it's full of clouds made up of ... dispersed contrails!
Looking at them reminded me of their radiative importance, and influence on climate. The literature used to be full of stuff about it, ten years ago, but things seemed to have moved on. A couple of numbers stuck in my head though: Mannstein (1999) found in an algorithmic analysis of AVHRR derived contrails a global mean annual contrail coverage of 0.5%, with significantly higher values in some times and places (and this is itself a factor of about 50-100% less than other manual analyses). I vaguely remember figures as high as 6% for central Europe at some times (but can't track down the reference quickly). One also has to remember that these numbers have to be an underestimate, because as they spread out they become less detectable, and obviously the presence of other clouds can both mask such contrails or, indeed, be caused by them.
Bryan's Fishy Example
I've had a few days lately where I've trotted out my favourite "easy to understand example" as to why some scientific datasets need to be preserved, and some do not. In the final analysis, we find that it's easy to identify some datasets as "must preserve" and then we enter "value judgement" territory. But it's helpful to have a strawman to consider on the way.
With apologies to those of an aquatic bent, I like to pick on fish.
Imagine that I make a series of complex measurements of the anatomy of my favourite fish using my equally favourite instruments. I publish those measurements. Should those data be actively managed for posterity by some third parties (other than me)?
Well, I would answer no, with a couple of caveats we'll get to. For a start, someone else could wheel out their favourite instruments and make the same measurements on another example of the species1. This of course assumes a) there are enough of them around that more can be found, and b) they're easily accessible (eg. not at the bottom of the Mariana trench). If either of those assumptions are broken, then maybe my measurements might be nearly as important as I think they are (like all scientists my natural assumption is that any measurement I make must be really important). We'd better also add the caveat that if we can predict the usefulness of these data in some composite study, then we might want to have it managed too ...
On the other hand, if I made measurements of the number of this fish in some geographical region, that measurement can be repeated for sure, but it will be at a different time, and the time sequence will in and of itself be another interesting measurement. So by default, in this case my (original) data will be (or could be, there are few guarantees in this game) of use in some future study.
Clearly the difference between these two cases (sans caveats) is that in first instance, I was making a measurement that was repeatable (and the likelihood is that the repetition would be in the future with better, stronger, faster instruments). There is no intrinsic interest in joining those two sets of data (my original and the later better) to build a newer more useful dataset. Nor, I presume, is it likely that my numbers will form an input into some later calculations. In the second instance, it's obvious that my data could form part of a useful composite dataset, and it's obvious from the moment I made the measurements.
So, now let's abstract that a little bit. We have two sets of data:
typeB: immediately and obviously of potential interest in the future, and
typeA: not obviously a candidate for future composite analysis.
Given we (humanity) have finite resources for data management, it's a no brainer: we manage typeB, and consider typeA for a bit longer.
typeB was easy, but for typeA we have a few things to think about. We've got our caveats:
If it's likely because of rarity or difficulty in obtaining samples, that the typeA measurement can't be repeated: Manage it. If not, ask the next set of questions:
Is it likely that some sort of composite dataset will be constructed using my data (perhaps we can foresee the necessity to build the statistics of these anatomical measurements across the population)? The keyword here is likely and it has to be weighed up against the difficulty, expense, and likelihood of repeating the measurement. So here we are in value judgement territory.
Some of these value judgements are obvious, some are less so, but at the end of the day it's pretty hard to set up a hard and fast set of rules. So it's a continuum: at one end, we just make a rule, manage that type of data (e.g. gene sequences), at the other we decide not to bother (simulation sensitivity analyses). In between, we dither. For the "not to bother" stuff we shouldn't do nothing though: that material is suitable for personal data management.
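The decision rules above can be caricatured in a few lines of code. This is a toy encoding only: the dataset fields, the 0-1 "likelihood" guess, and the threshold are all my own inventions for illustration, because, as the text says, the real decision is a value judgement.

```python
# A toy encoding of the fishy preservation decision: typeB is a no-brainer,
# unrepeatable typeA gets managed, and the rest is a (fake, numeric)
# value judgement between "manage" and personal data management.

from dataclasses import dataclass

@dataclass
class Dataset:
    obviously_composite: bool    # typeB: future composite use is obvious now
    repeatable: bool             # can the measurement be repeated at all?
    likely_composite_use: float  # 0-1 guess at future composite use (typeA)

def preservation_decision(d: Dataset) -> str:
    if d.obviously_composite:
        return "manage"                     # typeB: manage it, no question
    if not d.repeatable:
        return "manage"                     # rarity/inaccessibility caveat
    if d.likely_composite_use > 0.5:        # arbitrary illustrative threshold
        return "manage"                     # value judgement territory
    return "personal data management"       # keep it like a lab book

fish_count_timeseries = Dataset(True, True, 1.0)   # time series: typeB
fish_anatomy = Dataset(False, True, 0.1)           # repeatable anatomy: typeA
print(preservation_decision(fish_count_timeseries))  # → manage
print(preservation_decision(fish_anatomy))           # → personal data management
```

The point of writing it out is to make visible where the rule ends and the judgement begins: only the middle branch involves a number you have to argue about.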
What? Personal data management? Has Bryan been snorting something naughty? (Editor: no).
None of the above considers the issue of "preserving the scientific record". Regular readers will know that I consider that a pretty important issue, and one that also leads to an argument about keeping data. However, I think much of the typeA data above is more like the lab book you (should) keep: it should exist in your possession for some period (years) after you publish the data, but at the end of the day, it doesn't really need to persist. In this case, the scientific record will get on just fine without the numbers. TypeB data, though, we need to keep it, not for my sake, but for all our sakes.
Before we leave this example though, for completeness, we should consider one more issue. If, when I made my anatomical measurements, I tagged them with the geographical location from where I nicked my species examplar, then I'm vastly increasing the chances of my data being useful in some environmental application. So the lesson there is, the more complete my metadata is, the more likely my data is to become typeB and an obvious candidate for future use. Now, when we get data citation working, it also becomes a candidate for citation independent of the paper I wrote about my fish. Collect and write that metadata. You will get the credit one day!
Reading in 2009, 14: Canadian Wildish West
I can see why this was a prize winning book. It is a good read, although I thought a few too many threads were left hanging at the end. In that sense it felt not quite finished, although to be fair, the hanging threads weren't dangling completely unsuspended. Read it for yourself to make sense of this paragraph!
It is worth the read. The book is set in Canada in 1867, where in a small community a man is found dead, scalped, in a cabin. Much follows, with several treks across the wintry desolate Canadian north (probably by a few too many people, there's one lot whose only function seems to be to link two disparate parts of the story). There are some characters who are drawn in lots of detail, and then seem to disappear without trace, and others are drawn in less detail, but more central to the story. However, all this seems a bit carping. I liked the book, and as a tale of epic derring do of (wo)man against nature, it does very well.
I'm amused that wikipedia says that Penny was suffering from agoraphobia while writing the book, and the research was all done in the library. What was that about plagiarism being copying from one person, and research (novelty in this case) being copying from many people?
UM Crack Cocaine
One of the items of discussion at yesterday's NCAS advisory group was whether or not NCAS should be working with more than one mesoscale model (currently we do, with significant efforts invested in both the Met Office Unified Model and the U.S. WRF model).
The issues on the table included:
Can we support the model users for two models? Answer: yes, because we (NCAS CMS) support the academic Unified Model users, and NCAR support the WRF users, so WRF requires relatively little NCAS support.
Does the use of WRF contribute towards numerical weather prediction (NWP) (as opposed to understanding of weather processes)? Answer: not directly, in the UK! But we're told that because WRF is currently "easier" to use than the UM, it's easier to build a wider community of modellers, whom one might then move on to the more complex environment of the Unified Model, and who could then contribute directly to UK NWP improvements. (And, in any case, the NCAS role in this area is really about "basic underpinning science", not about direct, measured improvements in NWP, which in the UK is primarily a Met Office issue.)
I just couldn't help feeling that WRF was being seen as the entry level drug for the academic UK NWP community to the UM crack-cocaine. But maybe I was just looking for anything to enliven two days of committee meetings ...
by Bryan Lawrence : 2009/07/08 : 1 trackback : 4 comments (permalink)
cf standard names growth
Adequate descriptions of scientific data depend on precise descriptions of what the data actually are. At the heart of that are what we call "phenomenon descriptions", which at BADC and in much of the climate community, we handle using "CF standard names".
Alison Pamment here at BADC is the international CF standard names manager. Today she posted an update to the CF standard names list, and buried in her email to the CF list was this:
The current version is now version 12, dated 6 July 2009. This has been a very large modification to the table - 804 new names, 67 aliases, 19 clarifications to definitions and 3 modifications to units - almost doubling the length of the table to just over 1900 entries.
This specific large update is part of the preparation for CMIP5, and represents a huge amount of work. We don't do nearly enough to recognise this sort of essential underpinning work, without which we couldn't build automatic tools to find data ... and without which, scientific users couldn't have confidence in knowing exactly what a particular variable measured (or simulated). Well done Alison, and all the folk who worked hard proposing and defining the names!
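To make concrete why a controlled table matters: here's a minimal sketch of the kind of automatic checking a standard-name table enables. The three names and canonical units below are real CF entries as far as I recall, but a real checker would load the full published table, and would test unit *convertibility* with udunits rather than string equality:

```python
# A tiny, illustrative slice of the CF standard name table: name -> canonical units.
# A real tool would parse the published XML table, not hard-code three entries.
CF_STANDARD_NAMES = {
    "air_temperature": "K",
    "air_pressure": "Pa",
    "wind_speed": "m s-1",
}

def check_variable(standard_name, units):
    """Return a list of problems with a variable's CF metadata (empty if ok)."""
    problems = []
    canonical = CF_STANDARD_NAMES.get(standard_name)
    if canonical is None:
        problems.append(f"unknown standard_name: {standard_name}")
    elif units != canonical:
        # Crude: real CF checking only requires units convertible to canonical.
        problems.append(f"units {units!r} do not match canonical {canonical!r}")
    return problems

print(check_variable("air_temperature", "K"))   # []
print(check_variable("temprature", "K"))        # flags the typo
```

The point being: with an agreed vocabulary, a machine can flag the typo; with free text, nobody can.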
Reading in 2009, 13: Songs of Earth and Power
... however, as always, I could remember nothing from the previous read. Actually, in this case, really nothing: at no time while reading this book did I even have a sense of deja vu, and perhaps that's because I've recycled those brain cells because they could be better used for other tasks (like remembering who scored what when in some random game of rugby). From which you can rightly infer that I don't think this is one of Greg Bear's best books, not even close.
It's actually two books, based around the premise that there is one particular parallel world that intersects with Earth in various ways, and on that world most of the faerie live ... until that world starts to decay ... We have one hero who gradually grows into the recognition that he is special (eventually really special), and soon he's a super hero, dealing with the faerie (and the most powerful magical faerie) as an equal, and then a superior. It's a harmless enough book (or pair of books), significantly better than any of my previous foray into fantasy this year, but I suspect I'll not bother reading it again (primarily because my shelves really are groaning, and something has to go, and given I'm not overly enthusiastic about this one, it might as well be one of the discards).
One of the things that happened while I was away was the release of the latest UK climate predictions. This was (and is) a relatively big deal! While some have griped about the apparently unrealistic level of detail, the reality is that we do have to provide our very best predictions of climate over the next century, and we have to do it at the highest resolution we can. Of course there are caveats, but given we're all convinced that the status quo is no longer a good predictor of the future, then we have to do better. Hence this government initiative which has run for a number of years to fund the running of ensembles of high resolution predictions, to come up with probabilistic estimates of the future, and to produce a website so that prospective users can tailor the predictions to their own backyard. We've been responsible for the last bit!
There are multiple levels to the information on the web, but deep down in the technical site, there is a user interface that allows you to tailor plots to your own backyard. A team of folk at CEDA have been beavering away at that for a number of years building the infrastructure to underpin it, and this blog post is by way of my public thank you to that group of folk, who have worked long hours, to deliver a state-of-the-art data manipulation and visualisation system. Newcastle University was responsible for much of the front end, but most of the hard yards, and the integration, were done by our folk. The infrastructure was built on the knowledge we gained from ndg, exploits a raft of home grown, and specially customised, OGC services, and runs on virtualised machines with all sorts of load balancing and monitoring goodness.
Over the next few weeks I'll expose a few of the things that you can do with the UI, for now, a quick taster:
The results I'm about to show are for local summer (JJA) at my local grid-box:
I've got just two plots to show for today, and given the current hot weather, I've selected the "mean daily maximum temperature", and how that will change for summer in the next thirty years, and the last thirty years of this century. (Change, with respect to a 1960-1990 baseline.) To really understand this stuff, you need to read the report guidance, not just look at the diagrams ... but for now, like I say, a taster:
The key points to note are that,
Regardless of which emissions scenario you think will pan out over the next thirty years, the likelihood is that we will have summers where the mean summer daily maximum temperature is going to be around two degrees hotter than it used to be ...
However, by the last thirty years of this century, choosing the emissions scenario really does matter. I'm pretty certain that six degrees as the average daily maximum increase could cause real problems here in England; it just won't be nicer, finer summer days!
Now of course there are caveats to this sort of prediction, but for my money you'd be better betting on these sorts of numbers than on any hope for the status quo ... and what that means is
It is by no means too late to want to do something about our emissions (i.e. to mitigate even more worrying future change), and
In terms of adaptation now, these numbers are the best predictions that you're going to get for a few years!
So, I've been quiet. That's what happens when you go on holiday, where the electrons don't go ... camping in the "Doone Valley" for three nights, followed by six nights "on the beach" in Hayle.
A flavour of the time:
Down the left: the valley itself (we camped a few hundred yards or so down the river from that point), an Exmoor pony near Dunkery Beacon, and an image to remind me of the night cold - and the beauty of real fire.
Down the right: the beach at Hayle looking towards St Ives, visiting the Chysauster iron era village, and the night warmth - and the beauty of seaside sunsets.
by Bryan Lawrence : 2009/07/01 : 1 comment (permalink)
Interoperability, Data Fusion and Mashups
I'm involved in the development and implementation of a new NERC information strategy, and it's quite clear that amongst those of us developing the strategy, and amongst those making decisions in response to the strategy, there is a clear lack of agreement and/or understanding of what the following three terms mean: interoperability, mashups, and data fusion. Some folk use them interchangeably, some use them with specific activities in mind. Confusion about definitions always leads to slower progress.
So here are my working definitions:
Some Generic Answers
From the Princeton Wordnet: The ability to exchange and use information
From Wikipedia: A property referring to the ability of diverse systems and organizations to work together
From the Digital Television Glossary(!): Interoperability refers to the ability of a system or a product to work with other systems or products without special effort on the part of the customer... (my italics).
Getting more specific:
From the Software Engineering Institute: The capability of two or more components or component implementations to interact.
The ability to exchange and use information requires:
Tools to exchange information (and the ability to use those tools). Where those tools consist of systems that integrate information from a variety of systems, without special effort, then we can talk about service interoperability.
The ability to interpret and use the information when it is acquired. When that use involves integrating information from disparate sources, we want data interoperability - that is, we want to be able to integrate the data/information to a common format/view without special effort. This last requires a range of activities:
Standardised metadata structures, and
Standardised (controlled) vocabularies, with
Ontologies to mediate between community standard vocabularies.
It's important to reiterate that interoperability is nothing more than the ability to do something quickly and easily without special effort by the user, it generally doesn't involve the doing of something that otherwise couldn't be done. What it does do is:
Enable the doing of something that wouldn't otherwise be done because it's too much effort (e.g. a wider spectrum of hypothesis testing), and/or
Provide efficiencies of scale, if a task needs to be done many times (by different communities or users), then it obviates the necessity for the development of specific solutions by each community/user (thus saving in time and cost).
Practical uses of interoperable services include providing "quick look" visualisation and sophisticated graphics with relatively little effort by the consumer, data manipulation services, and a host of other useful tools for the scientific toolbox.
The "WikiWorld" offers us two definitions for this:
Wiktionary offers us: A derivative work consisting of two pieces of (generally digital) media conjoined together, such as a video clip with a different soundtrack applied for humorous effect, or a digital map overlaid with user-supplied data.
Wikipedia itself gives us: Mashup (web application hybrid), a web application that combines data and/or functionality from more than one source
There's a clear implication that a mashup is generally a temporary construct, and in the context of our data/information strategy, the place of mashups is in the construction of maps/views which show the spatial relationships between different data/information entities. I would argue that such mashups are generally done as part of hypothesis testing of the sort "is there a spatial relationship between these two quantities worth pursuing?" or "what data/information is available in the neighbourhood of this spatial feature?"
Of course to deliver mashups we have to have interoperable services capable of providing views of data in (or on) a common visualisation paradigm (e.g. a map). (Almost by definition, a mashup occurs because of the use of interoperable services; while you might achieve the same result - map or whatever - via a different technique, it wouldn't be a mashup without the underlying assumption that it was delivered via interoperable web services.)
The WikiWorld provides a set of useful starting definitions for data fusion too:
Wiktionary: Set of methodologies for fusing information coming from different, and sometimes non homogeneous, sources. The result of fusion is a qualitatively different knowledge always referred to a context.
Wikipedia: The use of techniques that combine data from multiple sources and gather that information in order to achieve inferences, which will be more efficient and potentially more accurate than if they were achieved by means of a single source ... combines several sources of raw data to produce new raw data. The expectation is that fused data is more informative and synthetic than the original inputs.
It's clear that data fusion is a more mature activity than a mashup: the expectation is that data fusion results in a product, and that the product is "more informative" than the individual constituents. One might be tempted to suggest that a data fusion product is more than the sum of its parts!
Another point of distinction between data fusion and mashups is that the latter are generally done "geospatially", while data fusion can occur along any useful axis of the resulting data object (e.g. time, wavelength etc).
It's also clear that data fusion doesn't require service interoperability, nor does it require data interoperability, but it's absolutely true that delivering data fusion is easier with data in common formats, described in a common manner (data interoperability). It's also absolutely true that data fusion possibilities can be explored more quickly by using interoperable services.
Putting all that together
I spelt out my world view of the scientific method in a previous post. Nearly every one of those steps can be enhanced by one or more of the facilities outlined above:
Background research involves evaluating existing data, by obtaining, visualising and comparing it (mashups and interoperable visualisation services facilitate this process).
Constructing hypothesis may involve the evaluation of mashups, data fusion activities and visualisations.
The creation of new data may involve the fusion of existing data just as much as it might include new observations and/or simulations
Analysis of results includes the comparison of new data with old data (interoperability) and
Archiving data should involve the creation of interoperable data.
Reports should involve interoperable data,
and the annotation of data should involve interoperable tooling.
So put simply, the NERC information strategy needs to improve one's ability to carry out data and information processes at all the steps in the scientific method. However, if one was to pull out the real game breaker here, I would argue that the interoperable, mashable, fuse-able spectrum improves one's ability to a) answer "what if?" questions with relatively little effort, and b) carry out data fusion faster, with less effort, and hopefully (because of better metadata) with better provenance associated with more accurate results.
Of course, we'll only ever know whether those last steps did result in better scientific outcomes, and if our investment in more interoperability was worth it, when we can see the ultimate scientific metric: dataset citation!
Reading in 2009, 12: Roman capers
Fire in the East by Harry Sidebottom was another airport purchase (the second of the two I took to Vienna). Unlike the last one this one came home, partly because I hadn't finished it, and partly because it's a rollicking good read if you're into "late classical period military romps" ... (when the Romans were Romans and the Gauls - with the honourable exception of Asterix and all the other anti-Roman heroes - were cowering behind their druids' skirts).
Anyway, this novel is set in the "late Roman period" when more than just cracks in Roman invincibility were appearing on the boundaries of the Roman empire(s), and it's right on the boundary where it all happens. It's all a bit Boy's Own, the goodies kick butt by various strategies which wouldn't have shamed Baden-Powell, and the plot is simplistic to say the least. However, I didn't give a damn. I enjoyed the book, the prose is good, the action almost believable. I did have a problem with the comparison between being under a barrage of whistling stones and being under modern shellfire; the language used (artillery) and the extent of the feelings invoked (terror of the noise etc being equivalent to a modern experience) just didn't feel right - but what do I know?
The goodies are good, the baddies weird, and the spies caricatures. This book is perfect for the adolescent boy in any reader (whatever their gender). It's the first novel of the "Warrior of Rome" series, so there is more to come. Good!
A controlled vocabulary for advection schemes
Anyone who collects metadata for a living knows the problem. We ask a bunch of folk to give us metadata, and the results are variable, and not much use for searching and comparison.
Here's an example, colleagues in the Curator project asked a bunch of folk to provide a name for their advection scheme in a dynamical core comparison project. This is what they got:
(sorry about the crap quality screenshot, but that was off a shared desktop in an Adobe Connect session).
In an attempt to do something approaching science, I offered to help tease these apart in the Metafor controlled vocabulary work (needed for CMIP5). In Metafor we've potentially got a bunch of places to put this information, as part of the Metafor dynamic core description, including:
advection name (Prather etc)
advection type (Semi-Lagrangian)
accuracy order (1st order etc)
conservation fixer type
conserved properties etc
In the rest of the dynamic core we currently have sections on the basic approximations, time-stepping framework, horizontal discretization, horizontal diffusion, boundary conditions, and conservation methods. We clearly need to decide whether the conservation properties are part of the total equation set, or of the advection scheme alone. We also need to distinguish between advection for tracer transport, momentum transport, and advection within component models (such as chemistry models).
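The curation task itself is mundane enough to sketch: map the free-text answers we got onto controlled terms, and flag the rest for a human. The vocabulary and answers below are invented for illustration, not the actual Metafor/CMIP5 lists:

```python
# Illustrative controlled vocabulary: pattern -> (canonical name, scheme type).
# These entries are plausible guesses, not the real Metafor vocabulary.
CONTROLLED = {
    "prather": ("Prather", "flux-form"),
    "semi-lagrangian": ("Semi-Lagrangian", "semi-Lagrangian"),
    "ppm": ("Piecewise Parabolic Method", "flux-form"),
}

def normalise(free_text):
    """Map a free-text survey answer to (name, type), or None if unrecognised."""
    key = free_text.strip().lower()
    for pattern, term in CONTROLLED.items():
        if pattern in key:
            return term
    return None  # unrecognised: needs manual curation (and maybe a new term)

print(normalise("  Semi-Lagrangian advection "))  # ('Semi-Lagrangian', 'semi-Lagrangian')
print(normalise("our own home-grown scheme"))     # None
```

The interesting (and scientific) part is deciding what belongs in the vocabulary in the first place, which is where the literature combing comes in.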
I also need to comb the literature for some key papers, to construct some a priori expectations for default names.
This will be rather more fun than my normal day job!
(By the way, this is not an effort to describe models precisely enough to rerun them; it's an effort to describe what has been run ...)
Aral sea desiccation
Back in January I "reviewed" Fred Pearce's "When the Rivers Run Dry", and in that I mentioned in passing the fate of the Aral Sea. Tonight I came across the Nasa Earth Observatory's feature on the changes in the Aral Sea using Modis images. Just as I said back in January, I vaguely knew about this, and the detail was scary. However, it's one thing to know something intellectually, and another thing altogether to know it emotionally - there's something really immediate about pictures! So here are concatenated two images from the Nasa sequence: a map from 2000 and one from 2009 - with the 1960 shoreline shown for comparison. I reckon it's truly stunning to see the change from 1960, to 2000, and on to 2009!
(For the record, these images were halved in size from the original using ImageMagick's convert with its -resize option, then concatenated using convert a.jpg b.jpg +append output.jpg, and then I crudely drew over the top of the Nasa shoreline using the Gimp, so it was clearer in these reduced resolution images, before a final shrinkage.)
Reading in 2009, 11: One to miss
The fundamental plot idea caught my attention in the airport bookshop. Time was short, about to spend a week in Austria, quick I thought, buy it. Given a bit longer to browse, I might not ...
... and I would probably have been happier not having read this one. The "genesis" part of the plot is reasonable, but the main narrative weaves between the activities of a psychopathic killer who delights in horrific methods of killing people and "our" hero, Rob. Obviously the threads are brought together to an ultimately satisfactory conclusion, but for my taste, there is far too much gruesome detail about the killings, and the main tension in the book is about when one will next turn a page and find something horrific described. The character descriptions are uneven, but the plot does have enough momentum to drag you through the book, even as it might sicken you. (I wonder what I might have done had I other reading choices ...)
I did finish it, but I have no desire to read it again. I left it in my hotel room. I've never done that before: bought a book, and not got it home, or even thought it worthy of trading in.
confidence and competence
Last night I read an article in New Scientist, the general gist of which is that we all prefer advice from a confident source, even to the point that we are willing to ignore a poor track record.
Well, I'm not sure that's true about scientists in their domain of expertise, but I don't for a moment think we're any different in fields where our confidence in our own expertise is lower (and I like to think we're slightly better than average at measuring that ... but, see below ...)
The article goes on to say:
With complex but politicised subjects such as global warming, for example, scientific experts who stress uncertainties lose out to activists or lobbyists with a more emphatic message.
And less than twenty-four hours later, Steve Easterbrook has a really thoughtful and inciteful (is that my word of the week?) post on incompetence and awareness, drawing on some psychology research. With apologies to Steve, I liked it so much that I've nicked the graph that he nicked:
and the strapline he put with it:
People who are in the bottom quartile of actual performance tend to dramatically over-estimate how well they did. People in the top quartile tend to slightly under-estimate how well they did.
while noting that he was summarising original research, which I haven't read :-).
Anyway, it's quite a juxtaposition isn't it? Those who are most wrong are least likely to perceive it, and we're more likely to listen to those who are most confident, regardless of their relative competency and experience.
It would be a recipe for disaster, were it not for the fact that not all decisions are made by those who have a propensity to overrate confidence and ignore competency (i.e. the general public). However, decision makers also need to be confident enough in their own decision making to ignore majority opinions (based on confident incompetent advice). The problem there is that the payback (on making unpopular decisions) is generally longer than the electoral cycle ...
Peak Everything revisited
Despite appearances to the contrary, I'm not planning on disappearing down a computing geek hole and forgetting about climate and the environment; it's just that the day-to-day problems on the top of the list at the moment are "informatics" related.
Anyway, in the interests of showing willing, and doffing my hat towards Michael Tobis, who continues to plough a unique and important furrow in the blogosphere: he pointed me towards two fascinating things today, which I read while masticating my sarnies:
At some point in the future, existing economic ideas just have to break; growth forever is a broken idea. I still remember fervently the day I first realised it, when as a young teenager some Isaac Asimov book (I think) pointed out that at the then current growth of the human race, all the molecules in the known universe would eventually have to be used to make humans ... and it was only a few thousand years away! It doesn't take a genius to do the extrapolation back from that to economics and growth, particularly when one factors in rather more planetary scales and resources.
And maybe that day isn't so far away! I've been worrying about Peak Oil for a long time (perhaps before it was a fashionable term, I don't know), but I'd never considered peak capital ...
(Last time I visited Peak Everything, it was at Michael's invitation.)
the doi saga continued
Chris made a fairly acerbic (and fair) comment when I complained about the disappearance of the digital identity of one of my papers. Amongst other things, he found the current(!) doi for it ...
It's still possible I've got the doi wrong, but given that the previous doi resolves, and the subsequent one as well, it suggests this one is just broken. This next example is instructive: same DOI pattern, same vintage, nearly works, yet things are not as they should be:
Right next to that last doi is a link to what Wiley thinks about Dois...
... there is the added problem that ownership of information changes, and location of electronic files changes frequently over the life of a work. Technology is needed that permits an identifier to remain persistent although the links to rights holders may vary with time and place ...
In this case the publishers (Wiley) seem to think that the first part of the doi (before the slash) has to point to them in perpetuity, rather than understanding that its only purpose is to give them the authority to ensure that, at assignment, they can assign the right hand part uniquely. Once they've done that, the publisher can move with impunity ... which is the whole point of the system.
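To make the prefix/suffix point concrete, here's a trivial sketch of the structure (the resolver host is the standard doi.org handle proxy; nothing about the prefix obliges it to point at the current owner of the content):

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix).

    The prefix (before the first slash) identifies the registrant that
    *minted* the name; the suffix is that registrant's unique string.
    Together they are just an opaque, permanent identifier.
    """
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix

def resolver_url(doi):
    # Resolution goes via the global handle system, never via the
    # publisher directly - which is why changes of ownership shouldn't matter.
    return "https://doi.org/" + doi

print(split_doi("10.1256/smsqj.57607"))   # ('10.1256', 'smsqj.57607')
print(resolver_url("10.1256/smsqj.57607"))
```

All the publisher ever has to do is keep the handle system's record pointing at wherever the content currently lives.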
Wiley is never going to assign anything in the 10.1256 space, but it has inherited a lot. Dear Mr Wiley, YOU DO NOT NEED TO REASSIGN, you're just screwing with the system.
Dear Met Society, you should instruct your publisher to behave!
And now back to my paper ... which seems to have disappeared into the long grass during the reassignment (which sounds rather Orwellian). I guess it does have a new DOI, but on the evidence thus far, I might as well not bother with it either ...
unhinged doi: who ya gonna call?
My publications list has
Lawrence, B.N., 2001: A Gravity Wave Induced Quasi-Biennial Oscillation in a three-dimensional mechanistic model. Q. J. R. Meteorol. Soc., 127, pg(s). 2005-2021. doi:10.1256/smsqj.57607
listed on it.
However, if I dereference the doi, I get:
Content Not Found
Ingenta is no longer hosting these titles of the Royal Meteorological Society. Content transferred to Wiley Interscience effective from 1 February 2007. The journals may now be found at www3.interscience.wiley.com/browse/?type=JOURNAL
Now call me stupid, I thought the whole point of a doi was that the publishers arranged the porting of the urls that the doi points to from one place to another for exactly this situation.
So, one of the following is happening:
I had the wrong doi in my publication list ... and the error message arises from that (wrong message, but fundamentally my problem)
The Royal Met Society and Wiley failed to handle the transfer properly, or
The doi system never got updated properly.
If it's either of the latter two, then it brings into disrepute the whole use of DOIs, and both the old publisher (the Royal Met Society) and the new one (Wiley) should share the blame (unless of course behind the scenes one or the other has been pleading for the other to do the right thing)! Of course it's not the only time publishers have got this wrong ...
Catalogues, Shopping Carts and Portals
The NDG (and INSPIRE) vision is that data providers stand up services to expose their data and metadata. We imagine that catalogue services harvest our metadata, and expose it via catalogue service interfaces. We imagine that portals exploit the catalogue service interfaces to allow users to select data, and exploit the data in visualisation services.
All that's fine, but there's a missing link, I think; if I'm wrong let me know! I think we need a common shopping cart formalism ... (some folk will know I hate the phrase "shopping cart", but sadly, it does carry most of the correct semantics, as long as you allow it to be used when the contents can be free).
Imagine a world in which there were three data providers (A, B and C), two catalogue services (C1, C2), two portal services (P1, P2), and two visualisation services (V1, V2).
Metadata from A, B, and C would be harvested by both C1 and C2. A user might visit P1 (which might exploit C1), and choose a dataset from A and B because they exposed services understood by V1. So P1 needs to pass to the visualisation service handles to both A and B, and the user may or may not need to interact further directly with V1 (to say, choose specific subsets of the data). In truth, it might be that the subsetting is done in either P1 or V1. Similarly, the user might use P2, select B and C, and want to exploit V1 or V2 ...
We have standard mechanisms for harvesting the metadata to the catalogue services. We have standard mechanisms for portals to talk to catalogue services (we might not like them, but they exist). However, if we want a real service orientated architecture (in a generic sense, not in a WS-* sense) we need some mechanism of handing the context sets to the visualisation services. In an interoperable world, P1 and P2 would use the same format to pass those context sets (user selections of datasets, layers, subsetting etc) to the visualisation services V1 and V2.
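To be concrete, here's a deliberately invented sketch of what such a context set might look like - every field name here is made up for illustration, not any real standard:

```python
import json

# A minimal "shopping cart" context set: the portable bucket a portal (P1)
# might hand to a visualisation service (V1). The bucket structure is
# common; the per-target detail (service type, subsetting) varies.
context = {
    "selections": [
        {"provider": "A", "dataset": "a-dataset-id",
         "service": {"type": "WMS", "url": "http://a.example/wms"},
         "subset": {"time": "2009-06", "bbox": [-11, 49, 2, 61]}},
        {"provider": "B", "dataset": "b-dataset-id",
         "service": {"type": "WCS", "url": "http://b.example/wcs"},
         "subset": {"time": "2009-06"}},
    ],
}

# Serialise for hand-off (or for parking in a user's workspace); any
# service that understands the bucket format picks out the targets it
# knows how to render and ignores the rest.
blob = json.dumps(context)
restored = json.loads(blob)
print(len(restored["selections"]))  # 2
```

The essential property is exactly the one argued for below: bucket-level commonality, target-level detail, and something small enough to save, catalogue, and pass between users.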
In the Web Map Server world, one can use Web Map Context to do this; the WMS Context specification states:
... how a specific grouping of one or more maps from one or more map servers can be described in a portable, platform-independent format for storage in a repository or for transmission between clients. This description is known as a "Web Map Context Document," or simply a "Context." Presently, context documents are primarily designed for WMS bindings. However, extensibility is envisioned for binding to other services.
Indeed it goes on to say:
The Context document could be saved from one client session and transferred to a different client application to start up with the same context.
Contexts could be catalogued and discovered, thus providing a level of granularity broader than individual layers.
and so clearly they anticipated the "hijacking" (aka extension) of WMC for exactly the purpose I outlined. The problem of course is that it's just so WMS centric (i.e. it works only for one class of visualisation service); the whole "layer" thing is so redolent of maps that it's difficult to work with from any other perspective. (And it's not great that "Context" is defined in XML schema rather than UML; it makes it more difficult than it needs to be to get at the underlying abstractions rather than the implementation detail.)
All that said, the overall Context concept is that of a bucket of XML that can be passed around, with "bucket"-level commonality and "target"-level detail, and that must be the right idea. It's also right that it's a thing that users might want to park in personal "workspaces" and pass to each other.
Reading in 2009, 10: Careless in Red
There can't be many readers of crime novels who don't eagerly await new Elizabeth George novels. For obvious reasons I wasn't paying attention last year, so we missed the arrival of Careless in Red. However, on my March/April US trip, I clearly passed through an airport bookshop or two, and was very pleased to find a new Lynley mystery.
Like all EG books, this one has many well drawn characters, and a complicated enough plot to keep one guessing most of the way through. Oh to be sure I'd guessed the ultimate outcome early on, but it was one of a spectrum of possible outcomes and she managed to keep all those balls in the air for a long time ... I couldn't be sure. In that sense it was a perfect crime novel, following the scent, with a hunch, but without the evidence. There's a nice twist in the tail ... It's set in Cornwall, and we've got the usual suspects: Lynley and Havers, but for a change there's another senior female police person, who plays a role rather like one might imagine Havers having in a parallel universe (where she's married another copper - Lynley say - and had kids, and it hasn't quite panned out). I liked the juxtaposition of these two, both bossing Lynley around in their different ways. Both being controlled by Lynley, in different ways.
If you like crime novels, and I do, you'll like this one, and I did. This one's a keeper.
Journals and Data Curation
I'm hopelessly behind on my blog reading, otherwise I would have spotted this new Nature policy sooner:
Accordingly, we have modified the Nature journal policy on authorship, which is detailed on our website. For papers submitted by collaborations, we now delineate the responsibilities of the senior members of each collaboration group on the paper. Before submitting the paper, at least one senior member from each collaborating group must take responsibility for their group's contribution. Three major responsibilities are covered: preservation of the original data on which the paper is based, verification that the figures and conclusions accurately reflect the data collected and that manipulations to images are in accordance with Nature journal guidelines, and minimization of obstacles to sharing materials, data and algorithms through appropriate planning.
To echo Chris who was the last link in the chain to get the news to me, this really is good news, both for science, and for those of us trying to convince folk of the importance of data curation.
i had to laugh, part 2
Maybe I should now have a category on programming language humour. Meanwhile, Chris responded to this by digging out an email from 1994 on how to shoot yourself in the foot using a multitude of languages. I googled it, and found a more extensive compendium (especially within the 301 comments) ...
So, for a taste of what lurks at that link (and a multitude of other places on the net), we start with:
You shoot yourself in the foot.
You accidentally create a dozen clones of yourself and shoot them all in the foot. Providing emergency medical assistance is impossible since you can't tell which are bitwise copies and which are just pointing at others and saying, "That's me, over there."
After importing java.awt.right.foot.* and java.awt.gun.right.hand.*, and writing the classes and methods of those classes needed, you've forgotten what the hell you're doing.
... (maybe this also belongs in the category of blog archaeology given the history).
Why is it that when we talk to the private sector, they're always impressed with what we can do with the resources available (money, people), yet when we talk to our scientific colleagues, many of them think we must be amazingly inefficient to cost what we do (in terms of, yep, money and people)?
i had to laugh
kubuntu frustration ... again
Late last year, I wanted to upgrade my kubuntu to 8.10 (intrepid)... but I couldn't because bluetooth was broken in the release.
Now I can't upgrade to 9.04 (jaunty) because wireless is broken.
All this because I really wanted to have openoffice 3.0 ... and better support for usb wireless dongles (because the internal wireless in my laptop is getting dodgy).
Oh well, maybe I will be back to taking macs seriously again ...
Reading in 2009, 9: Under Enemy Colours
Well, I've read them all (I think), from C.S. Forester to Alexander Kent, via Patrick O'Brian, Dudley Pope and Richard Woodman, even Julian Stockwin - others too. My shelves groan with "wooden walls" fiction; blame my dad for introducing me to Hornblower at an impressionable age.
So when I found Under Enemy Colours by Sean Thomas Russell, I knew that I would like the genre, but would I like the new author? Given my new found impatience with books, could I be bothered with this one, would Russell have a new angle?
Short answer: Yes.
Longer answer: I thoroughly enjoyed this book, up there amongst the best of its genre already. I liked most of the characters (although some were pretty thinly drawn), and yes, there was a new twist: a British naval character, Charles Hayden, with split French/English parentage and a degree of personal confusion. I hope I'm going to find out more of this man's life and career (seems likely, most everyone who creates a character in this period/genre ends up with a series). A fair plot, albeit a tad predictable, and some (for this landlubber) pretty well described naval manoeuvres.
Just in case you're wondering why the blogging has dried up yet again (from its already droughty state): I'm half way through a seriously silly week. Monday: Brussels, Tuesday, Wednesday: Lancaster, Thursday, Friday: Paris. I seem to be in a maelstrom of meetings and deadlines that never quite ends. This week it's just a bit more obvious because I'm out of town, and I'm generating a huge number of train miles/hours: I will have been travelling for roughly 28 hours this week (although that time is inclusive of trips to and from stations, and underground etc, it still leaves in excess of 15 hours for real work done using my trusty laptop, albeit with a newly dodgy wireless connection).
So, Monday was the year one metafor review meeting. You can download the talks if you want to. It went well. The main points of feedback were to use more of a model driven architecture (!), and to widen the net of model contributors and metafor users.
Yesterday I did some work to prepare for today and Thursday/Friday, and travelled to Lancaster. Here we're concentrating on MOLES, running through the new version with folks representing the diversity of NERC's environmental science.
Tomorrow and Friday, it's back to Metafor, working on the semantic structure and software implementation of a questionnaire that can capture key aspects (scientific descriptions) of the models being used in the upcoming CMIP5 model intercomparison project.
lowering my co2 emissions one step more
Our blue 1987 Megane catastrophically failed an MOT a few weeks ago (which means that it's hors de combat: we can't drive it on public roads, nor get it insured). So it's off to the knackers yard ... (yes, we really did have two Renault Meganes, albeit very different models, and yes, we do need two cars given where we live and work) ...
... so we've just bought a year-old Seat Ibiza ecomotive. So that's two cars we've bought where a primary discriminator between cars has been the expected CO2 emissions per km. You may recall that last time we upgraded a car (also due to knackeraciousness of the previous car) we were pleased to significantly decrease our emissions ... well this time, the difference is even greater. The ecomotive official extra urban figure is 88 mpg!! (Mind you, I think our red Megane has an official figure of 69 mpg, and we get more like 55 ... so if we get 70 with the ecomotive, I think we can be pleased.)
From a life cycle point of view, the ecomotive really stands out, currently third in the supermini category at whatgreencar.com, and seventh overall if you ignore electric cars (for now). (As an aside, I think whatgreencar.com is a great site, even if it's not that firefox friendly, and together with the fuel figures at vcacarfueldata.org.uk it should be an essential port of call in pre-car-buying research.)
The only annoying thing about this car is that it doesn't have a trip computer telling us our fuel consumption, and as I remarked in 2006, I think one of those does influence driving style (and fuel consumption) significantly. Not that I care much what Jeremy Clarkson thinks, but it's relatively fun to drive too ... so you can have some cake and eat it too!
Reading in 2009, 6,7,8: The Language of the Stones Trilogy
These three books are the literary equivalent of "quaffing wine", that is they're inoffensive users of time which don't leave a memorable taste. I suppose one can't drink (or read) the best stuff all of the time.
The basic idea of the books is quite a good one (how, and to some extent, why, the wars of the roses play out in a parallel earth where there is still magic). I should note that while the books all carry a prominent subtitle, "the third coming of Arthur", I think that's a bit of a marketing ploy: the Arthurian link in the concept is fairly weak, and doesn't really justify the subtitles.
Given that I think the idea's a good one, why is it "just" a quaffing sort of a book? Primarily because the books meander along with many uninspiring passages of weak prose. Nonetheless, I was interested enough to meander along with Will (the main character) all the way through the three books (I guess that must have been 1500 pages in all). If you have the time, and any sort of "fantasy" bent, then you'll probably be happy to read these books. If you're time pressed and/or fantasy isn't one of your genres, then there are other books to read first ...
Having read the 1500 pages of this lot, I'm left thinking that if Reader's Digest got hold of these books, they could probably condense the three books down into one good one: the concept, some of the events, and the characters could have made a really cracking book if it had had some pace and tight prose.
In a vague attempt at some even-handedness in my 2009 book reviews, having read this lot, I'm minded to compare them with my criticism of The Pillars of the Earth. While I thought that book was too long, and I claimed not to be enthralled, compared to these three, Pillars is vastly superior in pace and prose, so perhaps I was just a wee bit more enthralled reading that than I thought at the end.
These are strange times we live in ...
Oracle and Sun ...
Tim Bray's analysis (I originally called this link a "nice" analysis, but somehow it seemed wrong to use the word "nice" given the topic).
In truth I'm not much worried about the impact on MySQL itself (we use postgres), but it is, as many have said, a poster child for the open source movement. Some of the commentators that Tim links to mention the fate of Berkeley DB since Oracle got it. Sadly, I think that's the future that awaits MySQL: slow decline by organisational neglect - mind you, perhaps that's no different than slow decline in affordability by increasingly aggressive renewal cost increases.
The immediate potential impact that worries me most is upon OpenOffice ... which I've begun using as my default choice for most things (over MS-Office), primarily because, gasp, it's better (and it obviously produces xml that could be, in extremis, consumable - unlike the MS-Office XML mess). I hope that Oracle, who have no track record in this space, can maintain the momentum (both in the free version and the commercial version).
by Bryan Lawrence : 2009/04/23 : 0 trackbacks : 0 comments (permalink)
WCS is dead, long live WFS
For many years Steve Hankin has been asking me why we want WCS when OPeNDAP has similar functionality, and many, many, working implementations. For just as many years I've argued that OPeNDAP has/had three major flaws:
It wasn't easily securable (soon not to be a problem),
Didn't have good relationship with metadata, and in particular,
Was all about syntax, not semantics - you subset by array index, not by the desired portion of a semantically described domain (e.g. "give me array elements 4-6" as opposed to "give me the array elements which lie between latitudes 40 and 60 degrees").
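To make that syntactic/semantic distinction concrete, here's a minimal Python sketch (the grid, values and variable names are all mine, purely illustrative - this is not OPeNDAP or WCS client code):

```python
# Hypothetical 1D temperature field on a 10-degree latitude grid
# (names and values are invented for illustration).
lats = list(range(-90, 91, 10))                  # -90, -80, ..., 90
temps = [230.0 + 2.0 * i for i in range(len(lats))]

# Syntactic (OPeNDAP-style) subsetting: you must already know which
# storage indices correspond to the region you care about.
by_index = temps[4:7]                            # "give me array elements 4-6"

# Semantic (WCS-style) subsetting: describe the domain, not the storage.
# The service resolves "latitudes 40 to 60" to indices for you.
by_latitude = [t for lat, t in zip(lats, temps) if 40 <= lat <= 60]
```

The point is that in the first case the coordinate knowledge lives in the user's head, while in the second it lives in the (metadata-aware) service.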
but I've also admitted that WCS had some flaws too:
It might be easier to secure, but only because it might be easier to implement your own stack ...
Nearly no working implementations.
You would have thought the latter would be a show stopper, and indeed it is, but the first flaw for OPeNDAP has also been a show stopper ... until now. So, we are going to deploy OPeNDAP soon ... but we still want to deploy something which addresses semantic subsetting as well. So we've been investing in WCS ...
... but today I heard a presentation that filled me with horror. Very well presented, but still horrific. The plan for the future evolution of WCS is so flawed that I can't see it surviving!
Fortunately, the talk on WCS was followed by one on WFS (declaration: from someone in my group) which outlined how WFS can deliver much of the same functionality as WCS. It remains to be seen whether it can deliver a semantic version of OPeNDAP ... I hope so, in which case it'll be "WCS is dead, long live WFS".
So predictions, should either of them read this:
Rob Atkinson will smile: He's been claiming for a while that WCS was, or would be, only a convenience API to a WFS!
Steve Hankin will roll his eyes, and think: "oh no, not another WXS ..."
by Bryan Lawrence : 2009/04/23 : 0 trackbacks : 3 comments (permalink)
Tales from the EGU
Vignettes from EGU
None (for now). I forgot the cable for my compact camera!
Vienna is a lovely city, and the Austria Centre is the best venue I've seen for a conference of this size.
EGU is mired in the past: halls with literally thousands of posters. However, that many posters, coupled with a "no photography of scientific material" rule, makes little sense. Not only do I not approve of the rule, I think it's counterproductive: posters simply don't reach the audiences they deserve. While allowing photography won't solve that in the way an electronic archive of the posters could, it would at least mean that some posters which folk found interesting enough to find, but didn't have time to consume, would be digestible later.
Session chairs still don't seem to understand the importance of staying on schedule: if someone doesn't show, don't advance the talks, you just piss off those of us who have tried to generate a cross-session schedule.
The wireless network(s) and proxy server(s) were/are not up to the load! In this day and age that's unacceptable ...
... and neither is having about one power socket per 100 people. (Both of which factors are why I'm not live blogging. All kudos to Steve.)
The folks who went to the trouble of developing a Java application that talked to an SQLite database of abstracts, and giving it to all the attendees, might have considered making it more easily available on a Linux platform! Why bother with an interoperable platform and then bury it in O/S-specific launchables?
Stefan Rahmstorf's presentation on thermohaline stability was thought provoking.
The prevalence of model intercomparison projects demonstrates both the scientific utility of the approach and the necessity to find more ways of rewarding those whose careers are devoted to model development and model integration as opposed to scientific interpretation.
Downscaling, both dynamical and statistical, is a bigger business than I had appreciated. (It's a long time since I've come to such a broad-spectrum conference as this.)
Seasonal predictability appears to be just as dependent on the physical parameterisations of convection and land surfaces as we expect longer term climate to be.
Comments off the record:
If you want to have a highly cited paper analysing the CMIP5 database, make sure you publish in a journal which moves fast, not necessarily one with the highest impact factor.
State of mind after three of five days?
Pleasantly surprised. I've been a pretty trenchant critic of conferences this size. I think I might have been wrong. The networking opportunities are outstanding and the opportunity to get educated across a breadth of things, simply excellent (particularly for someone in my role)!
by Bryan Lawrence : 2009/04/22 : 0 trackbacks : 0 comments (permalink)
There are loads of depictions of the scientific method out there ...
This is one I've just prepared for a presentation next week. I thought it was worth sharing as is ... the key difference is in this one I wanted to highlight the role of existing data in the process, and the necessity of archiving and annotating product data. In anticipation of obvious objections: of course we don't archive everything, but the point here is that we need to consider archival and annotation as a formal part of the method.
Newark Liberty International Airport
Avoid it if you're travelling internationally.
Inbound, the immigration folk are the slowest I've met in years of travelling into "fortress America".
Outbound, the security folk are the most officious and nasty of any airport I've ever been to (imagine travelling with a three year old, who having gone through the scanner wearing a fleece is "offered" a choice of having to go back and take the fleece off or have a body search, even though the scanner didn't ping). That after ensuring that all the families with young children are in the longest queue possible. Then, given the time one has to spend "airside" of security, the food and reading options there are terrible ...
If Newark is your only experience of flying in or out of the US, you may not want to come back. Fortunately it's not my only experience of the US.
by Bryan Lawrence : 2009/04/07 (permalink)
Reading 2009, 5: Pillars need shortening
Well, it's some time since I caught up with my reading reports. Many weeks ago I waded through Ken Follett's The Pillars of the Earth. I say wade in both a positive and negative sense: it's a long shallow book, with lots of good bits and a fair bit of meandering, rather like wading through a river via warm pools interspersed with cold.
I first read this book a long long time ago, and liked it enough then to keep a copy. However, the copy I read this time was yet another Christmas present, and this read was probably enough to convince me that my groaning shelves can probably do without either copy. Don't get me wrong, this is a pretty good book, but it's not a monumental book, and the issue for me now is not the quality, but the length: I don't have the patience and time for 1000 page books that I once had. That said, when I finished it, I was way happier with having done so than with the previous book. If you're reading this to decide whether to read it yourself, then I suspect if you're looking to read "literature", finding a book with fewer characters and a smaller canvas could be your thing. But if you want medieval entertainment (without ever quite being enthralled), for night after night, then go for it.
the publication ecosystem
Cameron Neylon drew my attention to an absolutely fascinating set of figures that appeared in a Research Information Network paper last year (pdf). You should read Cameron's post, it's interesting in its own right for anyone interested in the place of peer review in the firmament. However, what I wanted to do here was abstract some rather interesting numbers from the report. They all appear here:
This figure requires a bit of explanation, which you're not going to get here, other than to say the numbers are estimates of the totals spent globally in the entire process of scholarly communication and research. Note that the cost of "reading" is split out from the cost of "doing research"; if you really wanted to know the cost of research itself you might add those two figures together. Amongst the interesting conclusions that we can draw is that the cost of the publication machinery, from submission to consumption via the eyeballs, is roughly 1/7 of the total costs of the global research ecosystem - and half of that is the cost of searching and printing out the resulting documents. Deep in the model, we find an assumption that it takes 12.5 minutes of search time per article, and that dominates the search/print cost. I'm not quite sure where that number comes from, but it's very plausible to me. Another interesting number buried in the publishing cost is 2 billion on peer review, i.e. peer review is of the order of 1-2% of the total cost of research.
It's not a great leap to wonder if those large figures on discovery and reading are the consequences of "publish or perish" coupled to the "smallest publishable unit", leading to far too much crap literature to wade through in many disciplines. (However, to be fair, as one who always used to argue to my graduate students that one hasn't "done anything" until one has published it, simply bashing the amount of material out there isn't the whole story.)
Anyway this piece of work prompted me to wonder if anyone has actually quantified what proportion of research time/effort/budget is spent dealing with data handling? I've heard lots of anecdotes, and I've created a few guesstimates myself, but I wonder if anything half as authoritative as this report has been done?1 If so, then the obvious question to ask would be whether those numbers would support more or less of a professional data handling infrastructure?
The initial thrust of the day was the presentation of a plan to deliver the first steps of a national research data service, followed by some stakeholder perspectives, and then funder perspectives. Sadly, I thought the proposal was seriously flawed, being based on a naive expectation of what could and should be achieved. I found little dissent from those I spoke with in the margins (or indeed, from those who spoke from the floor during the day). There were one or two individuals who saw this as an opportunity that needed to be seized, despite the flaws in the proposal, but mostly I heard practical objections, followed by rational arguments about what might be done - and total agreement that something should be done. But what?
Firstly, scope. What's in a name? Are we trying to build a national research data infrastructure, or a research data service? Is the concept of curation relevant? In my mind, having a clear view on the distinction between these concepts is crucial to working out what can be achieved. To help with what I mean here, I'm thinking
of an infrastructure as being composed of many services and facilities. So, if we need a national infrastructure, what "services" need to be national? What facilities deliver which services?
of a distinction between helping with storage, management and use of data in the near term (data facilitation), and into the medium and longer term future (data curation).
(It certainly seemed like some of the folks at this meeting were thinking curation while talking about facilitation and others vice versa, something that became particularly clear during the video presentation about the Australian National Data Service - of which another day, if I find the energy and time!)
Who are the stakeholders and players in these activities? I might summarise them as:
some of whom are in universities, some of whom are not, who sometimes appear as producers, and sometimes as consumers,
some of which are universities, some of which are not, who might teach, or run libraries and other facilities which may or may not have a role to play,
existing data facilities which have trans-institutional (and often trans-national) mandates,
advisors (of good practice),
assessors (of good practice),
Of those, I think of both the research and non-research consumers as users, and all the rest are players in the research infrastructure. In particular, the funders are as much players as those who deliver storage facilities! Without the funders being engaged to the point of mandating and rewarding appropriate levels of engagement by the other players, then some of the goals one might aspire to are simply not feasible. Similarly, the institutions that employ and educate the producers of data are just as much players.
For all their flaws, I think the UKRDS proposers understood the stakeholder and player relationships to a degree. Where it all went wrong, I think, was their understanding of a) the distinction between data and documents, and b) their understanding of the nature, discipline dependence, and fundamental importance of data management plans, as well as c) how one would need to balance the desire to facilitate data reuse against the desire to collect new data and to carry out new research.
Taking these in turn:
data are not documents. There are no common understandings of how to format data, how to store data, and how to document them, and neither should there be1. The consequences of this heterogeneity are profound, not least because such heterogeneity means that in any given institution, there can be many more formats and facilitation requirements than there can be experts in those formats and requirements.
data management plans are discipline and data dependent, reflecting the varying importance of varying maturities of data, and the presence or absence of overarching agreements (e.g. collaboration agreements, etc.). Again, while any given institution can have expertise in the concepts of data management plans, the construction and relevance of the plans are likely to be trans-institutional more often than not.
the choice about the relative importance of facilitation, curation, and new research is discipline dependent. Yes, it might well be that if we don't look after old data it will be lost, but the decision as to whether that matters in the context of the available funds can only really be made within the scope of the research priorities and expectations.
I appreciate that teaching institutions may have other reasons for curating data, but those reasons need to be assessed against other possible ways of using the effort.
All these issues boil down to the fact that, on the national scale, data facilitation and curation are not overheads, that is, activities that can occur by right. Within NERC, we have multiple data centres, precisely because different components of NERC make the judgments in different ways. It's important to note however that this is not an argument that NO component of the infrastructure can be delivered via the mechanism of overheads on research (which might be regularly reviewed), but it is an argument that the entire procedure should not (particularly if some parts of the community who already fund data management see an overhead charged on them to do data management in other communities - something that is a big risk with the UKRDS proposal). So, whatever one does for a national infrastructure has to deal with heterogeneity on all fronts: from funding mechanisms, to facilities and objectives!
If we return to the NERC example, we have an overarching policy and strategy2, but the implementation is sub-discipline dependent. Can we scale any of the lessons out from the NERC experience nationally? Yes I think so3.
The lessons are simple:
We do need to get to the data producers while they are still producing the data. But doing so is incredibly time and human intensive. So we only want to get to some data producers some of the time. The key role of the data management plan is to identify who and what gets attention and when.
Distinguish between generic data management plans, which might essentially say, no we aren't going to do various levels of data management, and specific plans that will need to engage professionals.
By all means preserve digital data that you haven't invested in ingesting professionally, but don't kid yourselves that you'll be able to use it years later. If you don't invest in professional ingestion (i.e. time-consuming metadata production - whether manual or via software systems - and quality control), then the data may or may not be recoverable down the track, but if it is, it'll be mightily expensive to do so.
But it's a balance, preserving everything to a recoverable standard is mighty expensive too. One of the roles of DMPs is to take an educated punt on what we can risk not doing!
Be very clear about the goals, what is desirable, and what is not, and allow the balance to vary between disciplines and within institutions.
Reward data management, and by this I don't mean the professionals alone, I mean the academics who do their bit. That's a key role for both institutions and funders!
Also a key role for publishers and data journals! I can't stress enough that I think the only way that digital curation will ever really take off is when it's realised that the act of curating data (including documentation) is as much a part of scientific discourse as journal articles.
But do assess the quality of the professionals too, and invest in their training and skills, and facilitate interdisciplinary data management knowledge transfer.
So what would a national infrastructure that did that look like? Well, we'll have to save that for another day ...
I don't know what it says, because I'm not about to pay $18 to read it, but I assume I'd like what he wrote ...
Strongly Defend Weakly Held Positions
I've just been at a NERC data management workshop. I may well blog some more about it, but one thing I spent a lot of time repeating to lots of people was one of Bob Sutton's mantras: "Strongly Defend Weakly Held Positions".
There were (at least) two strands at the meeting where this mantra is particularly important, both in the context that we (the data management community, like everyone else) are trying to find our way forward in uncertain technical times - we know the science drivers are demanding interoperability, but they do so without strong constraints on what that actually means:
As Chris Rusbridge pointed out, integrated science (a loosely defined phrase, but one which at least points to why interoperability matters) means that scientists will be using unfamiliar data, therefore someone (data curators and managers) must make data available for unfamiliar users. This means we're often struggling to work out what needs to be communicated, how it can be communicated, and how to get data into differing toolsets, which brings us to
The technical options available to support interoperability are evolving rapidly, funding is limited, and while potential usage is effectively infinite, actual usage will drive us forward.
Both of these lead to the necessity for a strategy which simultaneously makes progress while yielding ground when progress via a particular route is overtaken by progress via another route. However, we need to actually make progress, and it's incumbent on those of us who have an overview of the options (scientific, social and technical) to provide leadership in terms of directions - and strongly defend those choices. But it's just as incumbent on us to yield quickly and gracefully when we (inevitably) make some wrong calls. It's also incumbent on us, individually and collectively, to recognise that sometimes it's the mistakes we make - even the expensive and time consuming ones - that tell us how we should have done things, and sometimes doing something wrong is the only way to find out how to do it right!
As an aside, just in case I don't follow up with more about the conference (and let's face it, this blog is a bit stop and start), Chris's point about unfamiliar users led him to introduce to us the "fourth Rumsfeld": the unknown knowns. I rather like to think that leads to part of the job description for data management:
We exist to mitigate against the unknown knowns associated with the collection (or production) of data and its usage becoming known unknowns!
For some reason both bloglines and technorati seem to have turned up their noses at my blog. I wouldn't mind, but I kind of rely on them to find if there are any inbound links to my blog (and thus anything worth pursuing). There ain't much point asking the lazy web a question if I can't find any links pointing to the question ...
Technorati is stuck claiming my latest post is from the 6th of October last year (despite the little preview graphic actually being current and a ping from yesterday).
Bloglines has a post from November the 14th, but amusingly finds a citation to a January post when I look for posts, but doesn't find that post when I look for citations.
Google blog reader finds my blog ok, and google web search finds stuff in my blog, but not google blog search. What to make of that?
Oh well. I guess I shouldn't be surprised, there's never been much overlap in what they found as citations, so the fact my posts themselves have disappeared into obscurity shouldn't surprise me.
by Bryan Lawrence : 2009/02/03 : 0 trackbacks : 2 comments (permalink)
NDG papers appear in Phil Trans
I'm really glad to see that our papers on
environmental metadata requirements of grids (really service orientated architectures), and
describing the deployment of NDG
have appeared in the NERC e-science special edition of the Philosophical Transactions of the Royal Society.
Reading in 2009, 4: Engleby not for me
I have a t-shirt that reads "too many books, too little time".
Every now and then I read a book and think, "with all the books in the world, why did I bother with this one?" Despite the amount of trash I read, this happens relatively rarely; if a book is entertaining, or interesting, or both, then I'm usually a happy bunny, and I don't set the bar high.
And so to Sebastian Faulks' Engleby. Half way through I seriously considered not finishing it. It starts well enough, but it doesn't take long to suss the plot, and despite the odd passage of quality prose, mostly the mood it builds up is tedium, and a feeling of "when is this thing going to end" instead of any of the feelings I would rather have had, like "how is this going to end" (frankly I didn't care) or "give me more" (please, no more) ... or "how time has flown while I've been reading" (nope). So, sadly, this was an "I wish I'd never started it" book. Having started a book, it has to be really bad for me not to finish it, and this wasn't really bad ...
If you ever make the mistake of reading it (all the way through), you'll realise that tedium may well have been exactly what Faulks wanted you to feel, but being true to your plot while boring your readers is something you can only get away with when you have a big reputation.
So, I'm sure some folk will like it (a quick look at Amazon seems to imply that lots of people liked it!). But not me. I wish I had spent the time reading something else.
(I hope that when I grow up I'll learn to not finish books I'm not enjoying!)
when the paparazzi are ok
I stumbled across this a week or two ago, and have had it sitting tabbed waiting for a response since, because it really got my goat!
The basic thesis of the writer is that it's a bad idea for someone to wander round a poster session at a (scientific) conference, snapping away at the posters using a camera. Leaving aside the issue that I thought I was the first person to think of doing this with a camera phone (obviously not), like I say, it got my goat!
His justification of his position comes down to the fact that he sees "taking" (information) without "giving" (feedback) as not keeping up with the taker's part of a two-way process. He's also worried about what he calls "espionage", and data getting discussed before it's peer reviewed.
Firstly, as to the taking without giving: in some communities, presenting is the price of attendance; the feedback is incidental. In all communities only a tiny percentage of attendees ever give feedback. Does not giving feedback mean I can't/shouldn't listen (to a talk)? Can't read (a poster)? Given how much it costs (in time, money, and emissions) to go to a conference, shouldn't we make damn sure we get as much out of it as possible? I could never engage with (or often even read) all the posters at many conferences.
As to the "discussion" before peer review. What's the point of putting an idea into the community if you don't want it to be discussed? (Risk of the data being analysed by someone else I hear him respond? They can't publish it without credible provenance, so what's the issue, the idea was out the moment you took it to a conference?)
Finally, in my opinion, the best conferences make sure the posters (and the presentations if you're really lucky) are on a memory stick and/or in a repository, so I can have access later. If they don't do that, how is it not better for me to at least take a copy so I can read it later?
Science is about communication. Anything that hinders communication hinders science. Attribution is important, but a camera copy of a paper doesn't make attribution any less likely than publication, in fact, if we extrapolate from the open access experience, it's likely to make attribution more likely.
by Bryan Lawrence : 2009/01/30 : 0 trackbacks : 0 comments (permalink)
bbc goes to rdf
And not only that, they built their domain model first then built an RDF ontology:
We set about converting our programmes domain model into an RDF ontology which we've since published under a Creative Commons License (www.bbc.co.uk/ontologies/programmes/). Which took one person about a week. The trick here isn't the RDF mapping - it's having a well thought through and well expressed domain model. And if you're serious about building web sites that's something you need anyway.
Someone once said to me that RDF wasn't big out there. Well I knew it was, and maybe he will believe me now!
Reading in 2009, 2: Water Supply
And so to "When the Rivers Run Dry" by Fred Pearce. Which is about what it says on the tin ...
Another apocalyptic read (I'm not in an apocalyptic mood, it just happened that I got two birthday presents last year in the same vein). This book reads well, but it's another one that could get you breaking out the whiskey before the sun gets over the yardarm. It's absolutely not a book about global warming! Although global warming gets a few mentions, it's a book primarily about good intentions going bad coupled with bungled engineering and short term thinking. It is scary precisely because it would appear we're stuffed on the water front before we even get to the implications of warming ...
There are some really fascinating bits in this book - the state of the Aral Sea, for example. I guess I vaguely knew what had been going on, but the detail presented in this book is scary: not just because of what has happened, but because it (the drying up) was planned that way (and despite all the planning, the resulting water for "use" is being frittered away).
Here are a few bits that I noted (for my own nefarious purposes, not because they were necessarily the most important or most interesting ...). All the numbers (except where stated) are from the book, I don't know what the original sources might have been.
Not enough water in the first place
A back of the envelope calculation (p33-35) of water availability goes something like this: we will run out of water unless we only use the water that falls as rain (somewhere), that is, from the "fast water cycle". In practice we only care about that which falls on land (60K cubic km per annum). If we neglect that which evaporates, and that which is transpired (hmm, I'll get back to that), that leaves about 40K cubic km of runoff for "consumption". Of that, hydrologists reckon it's practical to "capture" 14K (why?). Take out the runoff in inaccessible places (like Siberia), and we're left with 9K, or about 1400 cubic m per annum per person. But earlier on (p22) he's calculated that he himself consumes around 1500-2000 cubic m per annum in terms of water needed to feed and clothe him (as well as that directly consumed, which is far less). So the bottom line is that if everyone wants to live like him, then there's a problem.
But we neglected the transpiration earlier on, and surely that's part of the water consumed to feed him? So I'm not so sure about the budget. However, whether or not he's got the budget details right, the actual efficiencies (or lack thereof) of actual hydrological systems that he discusses throughout the book make it clear that we have a major problem, and we're eating into water from the "slow water cycle" (deep aquifers etc, which are slowly, but surely, being drained).
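Pearce's arithmetic is easy to check. The volumes below are the book's; the population figure (roughly the 2009 world total) is my assumption, since the book's per-person number implies something like it:

```python
# Back-of-the-envelope water budget (figures from the book, pp. 33-35).
# The population is my assumption (~2009 world total), not the book's.

land_precip_km3 = 60_000   # annual precipitation over land
runoff_km3 = 40_000        # left after evaporation and transpiration
capturable_km3 = 14_000    # what hydrologists reckon is practical to capture
accessible_km3 = 9_000     # after removing inaccessible runoff (e.g. Siberia)

population = 6.5e9         # assumed
m3_per_km3 = 1e9

per_person_m3 = accessible_km3 * m3_per_km3 / population
print(f"{per_person_m3:.0f} cubic metres per person per year")
# ~1385 -- the book's "about 1400", and below Pearce's own estimated
# footprint of 1500-2000 cubic metres per annum.
```

Which makes the squeeze obvious: the accessible supply per head is already below one (Western) person's food-and-clothing footprint.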
We can feed them, but can we water them?
(p38) The UN FAO says that globally we now grow twice as much food as we did a generation ago, but we abstract three times as much water from rivers and aquifers to do so.
Dam them all
As a kiwi I both appreciate(d) the benefits of hydro power and mourn(ed) the losses from flooding ... but I always thought of dams as being a Good Thing (TM). However, it appears that it's not always that way: A World Commission on Dams (appointed by the World Bank) made some interesting observations in 2000 (p157-159):
Two thirds of all dams built globally for water supply to cities deliver less than planned (a quarter, less than half)
A quarter of dams built to irrigate fields irrigated less than a third of the land intended.
Half produced significantly less power than advertised
An interesting number is the kW per flooded hectare: ranging from 0.2 to 5 for some examples he gives. I thought about that a bit: an interesting comparison is that a tenth of the area could provide between one and four times the same energy for the best of these (at 5-20 W/sq m - this last number from the synopsis - pdf - of a new book I want to read).
Even dams built to protect against flooding have increased vulnerabilities (because they're generally kept full and "emergency releases" are floods in their own right).
Dams have resulted in at least 80 million rural folks losing homes, lands and livelihoods!
Many have been poorly sited, often on the basis of faulty estimates of climatic flow (even in wealthy countries like the States: consider the poor future for the Colorado, and Lake Powell in particular - p223 and more recently).
and that's without considering silting and wetland removal etc
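The kW-per-flooded-hectare comparison in the list above can be made explicit (the dam figures are from the book; the 5-20 W/sq m figure is from the synopsis mentioned there):

```python
# Converting the book's dam power densities (kW per flooded hectare) to
# W/sq m, and comparing with an alternative delivering 5-20 W/sq m on a
# tenth of the flooded area.

M2_PER_HECTARE = 10_000

def kw_per_ha_to_w_per_m2(kw_per_ha):
    return kw_per_ha * 1000 / M2_PER_HECTARE

best_dam = kw_per_ha_to_w_per_m2(5)      # 0.5 W/sq m
worst_dam = kw_per_ha_to_w_per_m2(0.2)   # 0.02 W/sq m

# 5-20 W/sq m on a tenth of the area, expressed per sq m of original area:
tenth_area_low = 5 * 0.1    # 0.5
tenth_area_high = 20 * 0.1  # 2.0

print(tenth_area_low / best_dam, tenth_area_high / best_dam)  # 1.0 4.0
```

So even against the best of those dams, a tenth of the land delivers one to four times the energy - which is the comparison in the list above.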
Water from thin air
On the positive side! Chapter 31 discusses technologies for "generating" water.
The discussion of water budgets above was about precipitated water. Of course at any given time a lot of water is sitting in the atmosphere as water vapour - roughly 98% of the 13K cubic km in the atmosphere at any one time (about six times the amount in the world's rivers - again, at any one time).
There is a discussion of dew ponds, and artificial dew producers (using cold ocean water to cause condensation in the desert), and fog capture. Inspiring stuff. (He also talks about desalination and cloud seeding, both of which are rather less inspiring! Even if the former is widely deployed and/or necessary in some places, it's too energy intensive to be a "solution" to the global water issue.)
The bottom line
Actually, just like climate change, the water problem is not just a supply side problem, it's a demand problem too. In the final analysis, we need to drop demand, as well as address changing modes of supply. As far as the latter is concerned, there will be no one solution.
If I got one take-home thread from this book it would be that there is a dire need for rational politicians and (more) sensible water management practices, coupled with geographically realistic assessments of crop suitability. (And on the demand side, less cotton production - and consumption.)
Reading in 2009, 3: Miss Smilla
We've just spent a week on holiday in Cornwall, where, apart from the IPCC tomes (mostly unopened) and Obama souvenir edition newspapers (mostly discarded unread), my holiday reading was a reread of Peter Hoeg's Miss Smilla's Feeling for Snow.
I first read this about a decade ago, and had dug it out for someone else to read because we'd been talking about excellent translations (and translators). There is an interesting story (pdf) about the translation itself - there being substantial differences between the US and UK editions. Anyway, the book came back, and languished on the floor of the car until a couple of weeks ago, when I was stuck for half an hour in the car, and so I started to read it again ...
Of course a decade was plenty long enough for me to have forgotten the entire plot so it was effectively a fresh read.
The first thing to say about this book is that despite the Danish origins, even in English, the prose is just fabulous! Mostly I don't care for "fabulous" prose ... when I'm reading a novel I just want a direct connect from text to my brain that doesn't have me realising that I'm actually reading at all ... fancy language gets in the way of that (for me). But this book is different. I can't pull out a sentence or paragraph for you, because I somehow managed to read it in my normal way (ie without being conscious of actually reading), but I still have a sense of joy from the process of reading it. It was clearly wonderful in the original Danish, so all kudos to the translator(s).
The story itself is a pretty good thriller, with an engaging, resourceful lead character (Miss Smilla) who manages to segue in a nearly believable way from one scarcely survivable event to another. I enjoyed the first half more, as the believability/survivability function fell significantly in the second half, but for all that, it was a good read as a thriller. A majorly unbelievable bit was the reason for it all, but since that only became clear right at the denouement (which, as wikipedia puts it, is "unresolved"), it mattered not.
There is more to it than the thriller and the prose: the glimpses of Danish and Inuit culture and society, and their relationships with each other and with Smilla, weave throughout and give it real character.
Miss Smilla's Feeling for Snow is a seriously good read on many levels. If you even half like thrillers, read it!
Well, I haven't been to a major conference for a while, and I received a raft of invitations to give talks at EGU this year.
So, with colleagues, we have a raft of abstracts submitted:
SD cards to the rescue
Next generation storage (press release pdf):
The next-generation SDXC (eXtended Capacity) memory card specification, pending release in Q1 2009, dramatically improves consumers' digital lifestyles by increasing storage capacity from 32 GB up to 2 TB and increasing SD interface read/write speeds up to 104 MB per second in 2009 with a road map to 300 MB per second. SDXC will provide more portable storage and speed, which are often required to support new features in consumer electronic devices and mobile phones.
Never mind the electronic devices and mobile phones, my data centre will scale to petabytes without issues associated with air conditioning, power consumption and physical volume!
It also removes another worry for me. In 2009 we expect to add between 500 TB and 1 PB of new physical storage (on spinning disk). This is a rather large perturbation to our normal growth, and I had been worried about how we would replace it in four years' time. If consumer electronics does what it normally does, then in 2012-2013 we'll be replacing a room full of spinning disk with a rack full of SDXC cards ...
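The rack-versus-room claim is just arithmetic (mine, not the press release's):

```python
# How many 2 TB SDXC cards (the specification's headline maximum) would
# hold the ~1 PB of new storage mentioned above? My arithmetic.

TB_PER_PB = 1024        # binary units; decimal (1000) barely changes things
CARD_CAPACITY_TB = 2

cards_needed = TB_PER_PB / CARD_CAPACITY_TB
print(f"{cards_needed:.0f} cards for 1 PB")  # 512 -- a rack full, not a room full
```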
The faster bus speeds in the SDXC specification also will benefit SDHC, Embedded SD and SDIO specifications.
and scientific data analysis!
Hat tip (of all places) the online photographer!
Heat not Drought
Just as my night time reading is all about drought (I'll tell you about that another day), I find this fascinating paper in this weeks Science:
Battisti, David S. and Rosamond L. Naylor, Historical Warnings of Future Food Insecurity with Unprecedented Seasonal Heat, Science (2009)
The bottom line is that heat waves may be more important than droughts for some food production. They give the example of a major perturbation on wheat production and consequential world wheat prices arising from the hot, dry 1972 summer in the Ukraine and Russia. They then point out that while that summer ranked in the top ten percent of temperature anomalies between 1900 and 2006 (with temperatures 2-4C above the long term mean), one third of the summers in the observation period were drier!
They then use observational data and simulations from 23 global climate models to show a high probability (>90%) that growing season temperatures in the tropics and subtropics by the end of the 21st century will exceed the most extreme seasonal temperatures recorded from 1900 to 2006. The consequences for food production are extreme!
A couple of choice quotes:
... regional disruptions can easily become global in character. Countries often respond to production and price volatility by restricting trade or pursuing large grain purchases in international markets - both of which can have destabilizing effects on world prices and global food security. In the future, heat stress on crops and livestock will occur in an environment of steadily rising demand for food and animal feed worldwide, making markets more vulnerable to sharp price swings.
... with growing season temperatures in excess of the hottest years on record for many countries, the stress on crops and livestock will become global in character. It will be extremely difficult to balance food deficits in one part of the world with food surpluses in another, unless major adaptation investments are made soon to develop crop varieties that are tolerant to heat and heat-induced water stress and irrigation systems suitable for diverse agroecosystems. The genetics, genomics, breeding, management, and engineering capacity for such adaptation can be developed globally but will be costly and will require political prioritization ...
Two degrees of Warming
William asked what ML thought would happen with two degrees. I suspect the reason he asked is that most of us believe two degrees is in the pipeline, and pretty much inescapable now. Indeed, I reckon we'll see it (relative to 1960) within a few decades (from now).
Ideally folk should go read the book, but this is the gist of the one and two degree chapters - via section titles (and my parenthetic summary):
America's Slumbering Desert (droughts, soil loss etc)
(An aside on the fact that the Day After Tomorrow hasn't and isn't likely to happen)
Africa's Shining Mountain (farewell glaciers on Kilimanjaro, implications for water)
Ghost Rivers of the Sahara (greening the Sahara, yes, no, maybe, floods and droughts).
The Arctic Meltdown Begins (tipping points for ice, permafrost melt, drying)
Danger in the Alps (mountains and villages at risk of destruction as permafrost melts)
Queensland's Frogs Boil (dramatic biodiversity loss, in rainforests and reefs, in Queensland and elsewhere)
Hurricane Warnings in the South Atlantic (are hurricane characteristics changing?)
Sinking Atolls (bye bye Tuvalu, Kiribati etc)
China's Thirsty Cities (water shortages)
Acidic Oceans (real problems for phytoplankton, and thus everything)
Mercury Rise in Europe (i.e. more heat waves)
Mediterranean Sunburn (fires and drought)
The coral and the icecap (sea level rise beyond the IPCC predictions)
Last stand of the polar bear (arctic melting)
Indian summer (food production decline, water issues)
Peru's melting point (glacier melt leading to water shortage)
Sun and Snow in California (water crisis)
Feeding the Eight Billion (ups and downs in food production, net down)
Silent Summer (climate change too quick for ecosystems, mass extinctions)
There is clearly much more. Please read the book, even if you want to disagree with a few of the details! Obviously ML is cherry picking the literature, but there is much more out there, and I don't think it's unrepresentative!
Some of this echoes what I've always said: In the near future, the climate change risks are not about magnitude and human comfort (hot, cold), they're about climate change speed, and it's implications for ecosystems, water, and food.
curation and specification
Chris Rusbridge has a couple of interesting posts (original and followup) about specification and curation. The gist is that he's reporting some conversations, which I might baldly summarise as saying something like:
Any application stores data and renders it for consumption (by a human or another layer of software). In the best of possible worlds, a specification for the data structure AND the application functionality should be enough to ensure that a third party could render the data for consumption at the next level without reference to the application code itself. However, certain real world experience suggests that the specifications are not enough, you need the code as well, because real implementors break their specifications.
There was some discussion about the ability of OOXML and ODT to preserve semantic content preferentially over latex and PDF ... (primarily I think because some of the key semantic content would have been in figures which could be parsed for logical content, and both latex and pdf would have turned those figures into images).
As a consequence, Chris gets to this position:
So ... running code is better than specs as representation information, and Open Source running code is better than proprietary running code. And, even if you migrate on ingest, keep BOTH the rich format and a desiccated format (like PDF/A). It won't cost you much and may win you some silent thanks from your eventual users!
You'll not be surprised to find I have some opinions on this ...
I think in nearly all the cases where the specification is not enough, it's because the specification was a) not designed for interoperability, and b) not the definition of the format and functionality. In these cases we find the spec is an a posteriori attempt to document the format (almost certainly the problem with the Microsoft and Postscript examples discussed in the links). In particular, in those cases where we're dealing with preserving information from a format and specification from one vendor, we find both a) and b) violated, nearly all the time. What that says to me is that we should avoid trying to curate information which is in vendor-specific formats, in favour of those where there are multiple (preferably open-source) implementations.
Running code will become non-running code in time, and not much time at that. What I hope Chris means is keep the source code which ran the application. Even then, every software engineer knows that the code is not documentation, and that with sufficiently complex code, NO ONE will understand it. So, code without specification is a candidate for obsolescence and eventual residence in the canon of "not translated, not understood, not useful" write only read never (WORN) archives.
What do we do at BADC? (In principle!)
Preserve input data. Copy on ingest if we have to, but we prefer (for the data itself) to demand that the supplier reformat into a format which a) is designed for interoperability, and b) has a specification which preserves enough of the information content. (Duplicates of TB of data are not viable.)
Preserve input documentation. Preserve specifications. Demand PDF (for now). Yes, the images are an issue, but if the images are data, then they ought to be actively preserved as data in their own right.
Ban MS documentation. History suggests that MS documents become WORN in about 6-8 years. Those who know no history are doomed to repeat it ...
So, I would argue that if you are doing curation, you have to address workflow before you get to the point of curation. If you know you want to preserve it, then think about that from the off. If you know you don't care about the future (shame on you), then yeah, ok, use your cool vendor tool ... but don't give the data to someone to curate, because curation is, in the end, about format conversion. If not now, sometime in the future INEVITABLY. If the documentation doesn't exist to do it, it's not curating. Don't kid yourselves.
All that said, much of the initial conversation was in the context of document curation, not data curation. IMHO the reason for much of what I perceive as confusion in their discussion is not recognising the distinction! In the final analysis, I think that
if your object is to curate documents (i.e. what I would call the library functionality), then preserving PDF/A, latex etc, is perfectly fine - after all, with the spec, you're preserving with the same fidelity that documents have always been preserved.
if your object is to preserve the data, then it's a different ballgame, and folk need to confront the fact that curating data requires changes to the original workflow!
european summer drying
OK, I confess, I'm clearly reading my abstract summaries this morning ...
Briffa, van der Schrier and Jones: Wet and dry summers in Europe since 1750: evidence of increasing drought (International Journal of Climatology, 2009):
Moisture availability across Europe is calculated based on 22 stations that have long instrumental records for precipitation and temperature. The metric used is the self-calibrating Palmer Drought Severity Index which is based on soil moisture content. This quantity is calculated using a simplified water budget model, forced by historic records of precipitation and temperature data, where the latter are used in a simple parameterization for potential evaporation.
The Kew record shows a significant clustering of dry summers in the most recent decade. When all the records are considered together, recent widespread drying is clearly apparent and highly significant in this long-term context. By substituting the 1961-1990 climatological monthly mean temperatures for the actual monthly means in the parameterization for potential evaporation, an estimate is made of the direct effect of temperature on drought. This analysis shows that a major influence on the trend toward drier summer conditions is the observed increase in temperatures. This effect is particularly strong in central Europe.
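The paper only says "a simple parameterization for potential evaporation" driven by temperature; the classic temperature-only scheme behind the original PDSI is Thornthwaite's, so a sketch of that (my assumption that something like it is meant, with the day-length correction omitted for brevity) gives the flavour of how temperature enters the drought index:

```python
# A minimal sketch of Thornthwaite's temperature-based potential
# evapotranspiration, the classic parameterization behind the PDSI.
# That Briffa et al. use exactly this form is my assumption.

def thornthwaite_pe(monthly_temps_c):
    """PE (mm/month) from 12 monthly mean temperatures (degC).

    Omits the day-length correction (assumes 12 h days, 30-day months),
    so values are indicative only.
    """
    # Annual heat index from the positive monthly means
    I = sum((t / 5.0) ** 1.514 for t in monthly_temps_c if t > 0)
    a = 6.75e-7 * I**3 - 7.71e-5 * I**2 + 1.792e-2 * I + 0.49239
    return [16.0 * (10.0 * t / I) ** a if t > 0 else 0.0
            for t in monthly_temps_c]

# Illustrative mid-latitude annual cycle; substituting 1961-1990
# climatological means for the actual monthly means (as the paper does)
# isolates the direct temperature effect on the drought index.
temps = [3, 4, 6, 9, 12, 15, 17, 17, 14, 10, 6, 4]
print([round(pe) for pe in thornthwaite_pe(temps)])
```

The point of the substitution experiment then falls out directly: run the water budget twice, once with observed temperatures and once with climatological ones, and the difference in the index is the temperature-driven part of the drying.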
Reading in 2009, 1: Six Degrees
So clearly in the last few weeks I've not been working and I've had half as many kids to look after ... so in between tears I've been dealing with fears .... the sort that abound if you make it much past the first chapter of Mark Lynas' book: Six Degrees: Our Future on a Hotter Planet.
Inspired by Aaron Swartz, albeit recognising reality (although heavily modified by recent events), I've decided to try and blog my year's reading. Don't expect it to be too erudite; when I've got time I read pretty eclectically, and often choose mindless crap just to eat up time with as few brain cells as possible involved.
Anyway, back to the book of the day. The basic thesis is that there are six chapters describing the likely outcomes should our planet heat by between one and six degrees as a result of anthropogenic CO2 climate change.
It's a pretty well written book, with what looks to me like a reasonable coverage of the apocalyptic end of the literature. Clearly it's got a journalistic tone, with a fair dose of hyperbole, but he does temper it with some qualification from time to time. While we all hope the six degree end is pretty unlikely, the possible consequences of even (!) the 2-3 degree changes make scary reading.
It'd be pretty easy, I think, to find the "ifs buts and maybes" in the original literature, and not much of it was "news" to me, but the thing about reading it all in one place was that it brought home to me that if even some of the predictions come home to roost, the world (both geographically and socially) is going to be a pretty different place in just a few decades, let alone a few centuries. Again, maybe that wasn't news, but there's something about having it rammed home all in one volume ...
So it's inspired me in two ways: I'm going to get back to the entire IPCC report and read the bits I don't normally (i.e. WG2 and WG3 stuff), and I'm going to try much harder to avoid business travel (you may well ask about personal travel, but we'll save the answer for another day). As regular readers will know, I've been avoiding business travel this last year anyway, in favour of virtual conferencing. You now know why, Evan having been pretty sick for a long time, and while looking after Evan is no longer an excuse for not travelling, I think given my profession, and given what we now believe about the future, it'd be wrong not to continue to try. Which brings me to my new years resolution: to try and convince my colleagues, especially the senior ones, to try harder to avoid physical meetings - particularly where the meetings are part of a regular sequence.
Update later same day: One of the things Lynas worries about is declining growth of corals: Ka ching!
blimey! solar wind and extratropical cyclones
Here are a couple of papers that I'm going to have to find time to read properly:
Prikryl, P., Rušin, V., and Rybanský, M.: The influence of solar wind on extratropical cyclones - Part 1: Wilcox effect revisited, Ann. Geophys., 27, 1-30, 2009, and
Prikryl, P., Muldrew, D. B., and Sofko, G. J.: The influence of solar wind on extratropical cyclones - Part 2: A link mediated by auroral atmospheric gravity waves?, Ann. Geophys., 27, 31-57, 2009.
Some choice excerpts from the abstracts:
A sun-weather correlation, namely the link between solar magnetic sector boundary passage (SBP) by the Earth and upper-level tropospheric vorticity area index (VAI), that was found by Wilcox et al. (1974) and shown to be statistically significant by Hines and Halevy (1977) is revisited. A minimum in the VAI one day after SBP followed by an increase a few days later was observed. Using the ECMWF ERA-40 re-analysis dataset for the original period from 1963 to 1973 and extending it to 2002, we have verified what has become known as the "Wilcox effect" for the Northern as well as the Southern Hemisphere winters.
Cases of mesoscale cloud bands in extratropical cyclones are observed a few hours after atmospheric gravity waves (AGWs) are launched from the auroral ionosphere. It is suggested that the solar-wind-generated auroral AGWs contribute to processes that release instabilities and initiate slantwise convection thus leading to cloud bands and growth of extratropical cyclones.
It is also observed that severe extratropical storms, explosive cyclogenesis and significant sea level pressure deepenings of extratropical storms tend to occur within a few days of the arrival of high-speed solar wind.
Do I believe in this?
Well, I haven't read the papers, but I'm on record as believing that upper boundary effects can reach the troposphere, so it's feasible - particularly in that the basic thesis seems to revolve around small-scale waves driving systems across instability boundaries, a non-linear effect that is more than feasible.