Bryan's Blog 2007/03
Channel Four Shame
I didn't see the C4 programme, and hadn't planned on commenting on it here, but last night, while I was watching the Kiwis take another step towards the World Cup, my wife was on the phone to a teaching colleague. Apparently this colleague had seen the programme and had found it "pretty convincing", and, worse from my point of view, hers was a common position! She had discussed it with other colleagues, science teachers all, who had also found it convincing. So I dipped further into the sleep bank and wrote most of this last night.
I don't blame them, one doesn't expect the mainstream "believable" media to be that poor: while TV is not refereed literature, in the UK at least one expects a level of integrity in "factual" or "discussion" pieces1. But I digress, this post is not supposed to be a diatribe, I wanted to write something that could be accessible to a teacher, with some links to folk who had seen the programme and had made some cogent responses.
So, remembering that I didn't see this programme, let me start with a comment about the contenders: on the one side we have the Intergovernmental Panel on Climate Change (representing thousands of active scientists, most of whom are not directly funded by any government, but are simply assembled by governments), and on the other a handful of disaffected, often out of touch, or simply misled and misquoted individuals, orchestrated by a man with form for misleading the public:
Martin Durkin, for his part, achieved notoriety when his previous series on the environment for the channel, called Against Nature, was roundly condemned by the Independent Television Commission for misleading contributors on the purpose of the programmes, and for editing four interviewees in a way that "distorted or misrepresented their known views". Channel 4 was forced to issue a humiliating apology. But it seems to have forgiven Mr Durkin and sees no need to make special checks on the accuracy of the programme.
Now this isn't about weight of numbers alone; the point is that there really is no significant argument amongst active scientists about this: climate change is real, happening now, and rather more rapidly than we hitherto expected (although that's not to say the potential impacts aren't overstated here and there). Yet C4 felt the need to make a "there's my Johnny, the only one in the entire army marching in step" kind of programme. This is a classic situation where a few amateur voices who can't get anything peer reviewed have a take on things that is purported to be as valid as that which results from peer review! What nonsense!
OK, so the two key links you want, addressing the programme itself, are:
The Met Office (home of one of the best climate analysis groups on the planet) who issued a press release on the myths exposed in the programme. See http://www.metoffice.gov.uk/corporate/pressoffice/myths/index.html.
If you have a lot of time on your hands you can read the article and all 532 comments (as of now) at Real Climate, written by a couple of active climate scientist bloggers on one of the few sites on the internet with real blogging street cred on climate. See http://www.realclimate.org/index.php/archives/2007/03/swindled/.
And this is a blog entry I link to above (for the benefit of anyone who prints this out), on amateurism and peer review:
OK, and here's my summary and interpretation of their key points, along with some other bits and bobs.
Met Office: The bottom line is that temperature and CO2 are linked.
Real Climate implies they made a big deal out of CO2 not matching the temperature record over the 20th C. Apparently the graph they showed had been doctored (see below), and the very good explanation (sulphate aerosol) for the discrepancy is well known, so the programme makers were lying by omission.
Met Office: The bottom line is that observations are now consistent with increased warming through the troposphere.
The troposphere should warm faster than the surface, say the models and basic theory. And Real Climate implies they claimed the data didn't agree with that. But it does! Unless you want to use data with known errors that have since been fixed (and the folk on this programme knew that perfectly well).
Apparently they tried blaming cosmic rays as well, in passing, so I'll address that in passing too: see my blog entry on that house of cards. The Met Office again: The bottom line is, even if cosmic rays have a detectable effect on climate (and this remains unproven), measured solar activity over the last few decades has not significantly changed and cannot explain the continued warming trend. In contrast, increases in CO2 are well measured and its warming effect is well quantified. It offers the most plausible explanation of most of the recent warming and future increases ... changes in solar activity do affect global temperatures. However, what research also shows is that increased greenhouse gas concentrations have a much greater effect than changes in the Sun's energy over the last 50 years.
So let's wrap up with two more quotes:
The bottom line is that current models enable us to attribute the causes of past climate change and predict the main features of the future climate with a high degree of confidence. We now need to provide more regional detail and more complete analysis of extreme events.
... it means they have "touched up" pretty well all the graphs they've used (the solar one omitted the recent data; the 400y solar "filled in" some missing data that was missing for a good reason). Swindle indeed!
Data Journals or whither the Earth System Atlas
Aside from the assumptions that they were the first to think that scientific data should be preserved (wrong), or first to think of a proper citation based data publishing effort (wrong), or even that they would be the first to deliver such a thing (already done, and not even by us), many of their arguments are good. But I would say that, since I've been making many of the same arguments.
However, I had some substantive issues with their presentation that I didn't get a chance to discuss then, so I'm recording them here (I'm aiming for a meeting with their technical director next week, but it may not happen).
I wonder how hard they've thought about what it means to be a data journal, and all that entails: journals need to be persistent, and while that's relatively easy1 for something that has a paper copy, it gets more difficult for something that is digital only, and much more difficult for something which uses formats and conventions which are not in the commercial mainstream. Essentially, a data journal has to be a fully operational data centre first, and a peer reviewed entity second. What I mean by that is: if you can't be sure that the data is properly managed, then will the citation be persistent? It's not for nothing that the AGU only allows data references to "proper" data centres.
Further, they've asserted that their ESA will provide "a centralised model", and that this is a good thing. I would contend that's downright wrong. If they're halfway successful, then supporting data access will require a distributed model; and the more distributed the better, ideally distributed across continents!
I didn't hear a clarion-call of standards compliance. The reason why electronic journals work is that everyone can read them, and that depends on the use of standard output formats (and frankly, the persistence thing depends on standard input formats, and both depend on lossless translation and the work of copy-editors :-). In the case of a data journal, the output formats will include figures (easy) and downloadable data (harder). They declared that "they'd start" with NetCDF, but without a convention for how to use NetCDF that's not enough. Of course they're aware they need metadata, but I fear they've only scratched the surface. I didn't hear ISO, I didn't hear OGC, I didn't hear CF-compliance (well I did hear all those things, but that was in my comments from the floor). Mind you, the QUEST science meeting may have been the wrong place for them to deliver acronym soup ... we avoided it in our NDG presentation to the same meeting and ended up with gross (inaccurate) generalisations as a consequence. So it may be that this is all in hand, and anyway they are still very much in the spin up phase.
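To make the NetCDF-without-a-convention point concrete, here's a toy sketch (plain Python dicts standing in for NetCDF variable attributes; the example variables and the function name are invented, though the attribute names are real CF attributes) of the kind of self-description a convention like CF asks for over and above the raw format:

```python
# Attributes the CF convention requires or recommends on a data variable.
# Without them, a file is valid NetCDF but another group's tools cannot
# know what the numbers mean.
REQUIRED = ('units',)
RECOMMENDED = ('standard_name', 'long_name')

def cf_check(attrs):
    """Return (missing required, missing recommended) for one variable."""
    missing_req = [a for a in REQUIRED if a not in attrs]
    missing_rec = [a for a in RECOMMENDED if a not in attrs]
    return missing_req, missing_rec

# A bare NetCDF variable might carry only a name:
bare = {'name': 'tas'}
# whereas a CF-compliant variable describes itself:
described = {'units': 'K',
             'standard_name': 'air_temperature',
             'long_name': 'Near-Surface Air Temperature'}

print(cf_check(bare))       # (['units'], ['standard_name', 'long_name'])
print(cf_check(described))  # ([], [])
```

The point of the sketch is simply that "we'll use NetCDF" constrains the container, not the content; it's the conventions layered on top that make the data interoperable.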
I certainly didn't hear any discussion of what they will actually cite, beyond "we'll use DOI's". Regular readers will know that citing data is not going to be trivial.
All that aside, they're going to get some of that metadata structure for free: the QUEST part of the ESA will have to hold data that is compliant with the NERC data policy, and hence conform to BADC metadata requirements, and we will be holding duplicate copies of the data.
I note the existing ESA site has a UAH copyright, and that's not consistent with what they said about the data access being open. Further, they'll need to step into the data licensing cesspit (it's not enough to say it's free and open, they will have to license it, if for no other reason to avoid liability).
Reasons 1 and 2 above are the reasons why, in our efforts to develop a data journal (of which more another time; we've only just received funding for the second step, following on from CLADDIER, and it will involve a pilot project with the Royal Meteorological Society, RMS), we're going for an overlay journal. The data journal part of it (which boils down to the specific metadata and documents needed to elevate a dataset to "peer-reviewed", over and above the metadata and documents needed to archive it) will exploit existing reliable archives. That way we can rely on persistence via professional data archives, and point to multiple duplicate copies (where we have confidence they are the same thing) to get performance; we could even encourage different archives to hold copies as mirrors if necessary, following the Lots Of Copies Keep Stuff Safe (LOCKSS) mentality.
As I said at the meeting, there is room for a spectrum of data journals in academia, so I don't see this as a "them or us" situation, far from it, I suspect our different approaches may appeal to different segments of the community.
I do notice that they've spent a lot of effort building up an editorial board and contacting the great and good, and we've spent no effort on that (although for the editorial board we'll exploit the RMS). It remains to be seen whether that has been a blunder on our part.
planes, trains, and automobiles.
I'm sitting in a hotel in Paris, communicating by virtue of the hotel next door, which has an open wireless network. Well done France; none of the British effort to make money from absolutely everything ...
Anyway, I felt like waxing lyrical about the Eurostar experience. Why does anyone fly from London to Paris? Oh yes, I know it's more expensive coming by train, but it's much much more pleasant. I've done this trip nearly entirely by train ... the first 20 minutes from our country estate (not everything on this blog is entirely accurate) was by car, but then
train into London (on time)
underground across London (quick and easy)
train to Paris
train to the last mile (ok, 100m, also quick and easy)
on foot (à pied).
Total time, equivalent to flying, total cost not that different (given what it costs to get to Heathrow!), queueing time: Minimal. Wasted time hanging around: Minimal.
Now if only I could get a fast train to other places in Europe direct from home ... (and if only my mates who live in Paris weren't on the other side of the world this week).
Management Technique - Or Lack of It
Over the last six months I've been pretty poor at keeping track of my blogroll - my Akregator tells me I have 4020 unread articles. What I tend to do is simply ignore entire blogs for long periods of time, and then have a purge. On the train back from Manchester yesterday I finally got round to having a look at Esther Derby's blog: Insights You Can Use. As I read through nearly all the articles I realised two things:
her blog is a must-read for anyone who manages anyone (let alone people who build software, in which case it really ought to be a must MUST read). (Obviously I knew that, sort of, since I have her feed, but I've just elevated it back up to my "read nearly every day" category).
I've been a pretty poor manager lately.
I have to hang my head in shame about how I've been treating people, again for two reasons:
I haven't been as patient as I should have been, and
the reason I have been getting frustrated with folk is in many cases down to me!
There is so much in her blog that it's hard to pick specific things to share with you, but anyway, here's a couple (not necessarily the best things, just fairly typical: insights I can use)!
The Prime Directive
Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.
Two threads here:
people who aren't following my ideas are "resisting". Why's that then?
they don't know how
they don't feel they have time
they think their way is better
they don't think the new way will work
they don't like/respect the person requesting the change
the new suggestion is counter-intuitive given people's existing mental models (or what they've been taught)
the new suggestion runs counter to existing reward structures or other organizational systems
the new suggestion doesn't make sense to them
they have no experience that tells them the new way will work, or how it will work.
Maybe we need to be careful about how we introduce new ideas? Our wonderful ideas of how to make things better for other people may not be greeted with enthusiasm, because:
Other people may value things about the old way that we don't see or don't appreciate.
Things that we don't like about the old way may be valued by other people.
It takes time for most sentient beings to adjust to a new way.
People will accept the new way to retain something they value (ok, so help them with that!)
Even when the new way is accepted, people may look back fondly on the old way (deal with that).
Quick Code Reviews
Well, OK, she didn't write this one, but she did link to it, so I wouldn't otherwise know about it. The bottom line is that, rather than hold formal code reviews, a good way to create good software and promulgate good knowledge about that software through a team is to:
unintrusively request a small piece of time (max 5-10 minutes) from a colleague to discuss code whenever a new and small (less than 10 minutes, remember) piece has been written to fix a bug or satisfy a new unit test (or both, I suppose, since in the best of all possible worlds any bug should probably give rise to a new unit test),
go through it line by line, and
get the benefit that, over time, no code is understood by only one person!
Up-to-date technical skills : 14% (9 votes)
The ability to juggle multiple, constantly reprioritized tasks: 38% (25 votes)
Skin thick enough to take constant end-user abuse: 8% (5 votes)
The ability to say "no" without making people angry: 15% (10 votes)
A sense of humor: 26% (17 votes)
As she says:
The respondents to this survey apparently want employees who have out-of-date technical skills and are thin-skinned yes-men and yes-women. And they'll take those employees and subject them to multitasking on constantly shifting priorities.
I'm thinking they won't be producing much working software.
Oh s**t. Well, I want my team to be up-to-date, but I also want them to have those other characteristics, so that implies I may be one of those who leads teams that don't deliver as much as they might (we do deliver working software, but I'm always moaning about how slowly; perhaps it's me ...)
After all that, can I change?
Review of the ESA HMA project
From all perspectives, the technical opportunities for involvement in HMA in the future are good: the underlying technology is being developed in a public manner, a testbed and service validation system is planned, there is considerable scope for expansion (both by adding services and data products to the DAIL layer), and an investment in HMA technology is likely to have payoffs in the wider deployment of geospatial services (including commercial deployment).
Technical summary points are:
The HMA project is being developed with methodologies based on the ISO and OGC specifications. To understand the HMA, data providers and data consumers will need to be familiar with those specifications.
Not only are the baseline specifications in the public domain, but many of the HMA architectural specifications are in the public domain in the form of OGC documents, so the only barrier to uptake of HMA technology is appropriate funding.
The project is on target to deliver new functionality based on the existing SSE toolkit and a Data Access and Interoperability Layer (DAIL).
The functionality that will be delivered by the DAIL will be limited by design decisions that have been made for pragmatic reasons in a changing landscape of what should be reliable interoperable web-service technologies.
As the underlying technologies change, and as the requirements of the HMA are driven by the wider GMES project, it is inevitable that changes in the DAIL (and associated toolkits) will be required. This is recognized by the establishment of an HMA project Architectural Board (HAB).
The membership of HAB may need to be reviewed to ensure it is forward looking and not limited to just the existing HMA partners (it may not be enough to have mechanisms for adding new members as new missions are added). HAB deliberations should be public (although obviously individual mission implementation timescales should remain confidential if desired).
There is considerable prospect for expansion of the HMA into other ESA activities.
There are some issues associated with identity management technologies which may slow progress in moving from prototypes to implementation. These are compounded by a potential lack of trust by data providers in (1) the ability of the DAIL to protect information about data/service use by individual users, and (2) the protection that their IPR has within the SOA. (The latter concern is, in our opinion, unfounded.)
While the project has made good use of OGC specs for metadata management, service description and control, there has not been any significant data modelling, and that will limit the use that can be made of OGC web services for data consumption, either within DAIL services, or by DAIL consumers.
The current development is based around layers; instruments that provide atmospheric profiles will not be well supported in the initial phases. (This is a consequence of the lack of data modelling and consequential lack of feature-type definition beyond the implicit assumption that the data consists of layers).
The permanent testbed to be created as part of the HMA-T project should ease development of HMA compatible services (both those which consume services via the DAIL and those which expose services via the DAIL).
The proposed OGC pilot project should expose the HMA technologies for wider constructive critique, and this will be of significant benefit both to GMES and the wider community.
Update 07-17-2008: There is a new HMA wiki with more information about subsequent activities!
wsgi, unicode and paste
I woke up this morning with a sore head ... no, not from the demon drink, but because I had a unicode problem. Every time I have a unicode problem I get a sore head. Nearly every time the solution is obvious, but usually I can't frame the question well enough ...
Anyway, this time I was getting this error in a paste application:
```
ERROR:root:Traceback (most recent call last):
  File "build/bdist.linux-i686/egg/wsgiutils/wsgiServer.py", line 131, in runWSGIApp
    self.wsgiWriteData(data)
  File "build/bdist.linux-i686/egg/wsgiutils/wsgiServer.py", line 177, in wsgiWriteData
    self.wfile.write(data)
  File "/usr/local/lib/python2.5/socket.py", line 254, in write
    data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\xad' in position 5710: ordinal not in range(128)
```
It was fairly obvious that my wsgi application was returning a unicode string where a vanilla string was required. But why was this a problem and what should one do about it?
It turns out that the wsgi spec requires vanilla strings, which means the programmer (me) needs to handle this explicitly. Thanks to Ian Bicking on the paste mailing list, the solution is obvious (well, it is now):
```python
def __call__(self, environ, start_response):
    '''This is an example wsgi application'''
    # go do some real work and return some (possibly) unicode string
    r = somefunction(environ)
    # declare the charset in the Content-Type header, and encode the
    # body to match, so wsgi gets the vanilla string it requires
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8')])
    return [r.encode('utf-8')]
```
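Since the requirement is a property of the wsgi spec rather than of any one application, another option (my sketch, not from Ian's reply; the name encode_middleware and the charset default are invented) is a small piece of middleware that encodes any text chunks on the way out, so individual applications can stay blissfully unicode:

```python
def encode_middleware(app, charset='utf-8'):
    '''Wrap a wsgi app so that any text (non-byte) chunks in the
    response body are encoded before they reach the server.'''
    def wrapped(environ, start_response):
        for chunk in app(environ, start_response):
            if not isinstance(chunk, bytes):
                # wsgi wants vanilla strings, so encode text here
                chunk = chunk.encode(charset)
            yield chunk
    return wrapped

# a hypothetical application that (incorrectly) returns a unicode body
def app(environ, start_response):
    start_response('200 OK',
                   [('Content-Type', 'text/html; charset=utf-8')])
    return [u'r\xe9sum\xe9']

safe_app = encode_middleware(app)
body = list(safe_app({}, lambda status, headers: None))
```

The charset passed to the middleware should of course match whatever the Content-Type header declares.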
Roundtripping openoffice and msword - bullets
I'm gradually moving to using openoffice more and more (yes, I think I'd admitted to using msoffice before, but I'm finding it more and more unreliable - especially since my combination of crossover office and ubuntu has stuffed up the font support so pdf output from msoffice is broken). However, to share with some colleagues, I do need to use .doc format ...
And I find that second-level bullets get munged in translating from odt to and from doc in such a way that I can't seem to undo it. This problem is apparently well known, but there doesn't seem to be an ubuntu solution (or indeed much comment from the ubuntu space).
If anyone knows one for dapper, I'd be keen to know ...
python soap library proliferation
Implementations of python soap libraries appear to be like buses: none come along for a while, and then suddenly there are two in a row. In the beginning there were soapy and ZSI, and they became one: soapy is old and no longer supported, and the soapy and ZSI communities are coalescing on ZSI, under the name Python Web Services.
Both of the "new" python-soap stacks make use of decorators, which make the code a good deal cleaner than ZSI (starting later has its advantages), but I'm sad that yet again the python community is fragmenting around a key piece of infrastructure.
I can understand why the proponents of the new python libraries didn't like ZSI, and I should be the last person to criticise someone else for thinking green fields are easier to build on than brown fields (I have a history of doing it myself), but yet ...
While competition (and the consequential survival of the fittest) is a good thing, I think our community isn't really big enough to support three fully functional stacks. It sounds like the two new ones might themselves coalesce, which would leave two python soap implementations, and maybe that's supportable, and even good for our community.
Meanwhile, neither of the new ones has a working wsdl2py, which is a key part of service consumption (starting later has its disadvantages), and it's not yet clear how the two companies involved will handle open-source communities trying to build around their babies: it's been hard enough for the ZSI community to avoid forking. So, for the while, we'll stick with ZSI, and we'll probably contribute our ws-security implementation back to ZSI, but I suppose we'll have to evaluate that decision properly now ... I hate shifting sands.
Another Nail in the Cosmic Ray Conspiracy Coffin
I liked this paper because of the conclusion, and because they cited us as the location from which the data is available. I wish more folk would do that. We need to be able to demonstrate this sort of thing to those who fund us.
Anyway, the conclusion is pretty unambiguous, and like those who pointed me at the paper, I can't resist quoting the abstract:
The International Satellite Cloud Climatology Project (ISCCP) multi-decadal record of cloudiness exhibits a well known global decrease in cloud amounts. This downward trend has recently been used to suggest widespread increases in surface solar heating, decreases in planetary albedo, and deficiencies in global climate models. Here we show that trends observed in the ISCCP data are satellite viewing geometry artifacts and are not related to physical changes in the atmosphere. Our results suggest that in its current form, the ISCCP data may not be appropriate for certain long-term global studies, especially those focused on trends.