... personal wiki, blog and notes
Bryan's Blog 2005/07
The Linux Desktop and the Network
I've just been up to Edinburgh for a couple of days of meetings, but while up there, I tried to drop Linux onto a machine for a mate who is sick of spyware and other Windows-acious problems.
I installed Suse 9.3 on his system (keeping Windows), and noting the possibility of him wanting to run software suspend to disk I made sure there was a little boot partition running ext3.
The major thing from his point of view was that he had to have good internet access and a stable system. Bear in mind that this bloke is very familiar with the various misbehaviours of "that other OS", but knows nothing at all about Linux.
He uses tiscali broadband, and that turned out to involve a USB modem (like most broadband users, I guess). The particular USB modem involved was a "fast 800" type modem ...
The installation requires access to the kernel source, but was otherwise straightforward for me ... but then I'm not afraid of the command line ...
Further problems arise in using it ... in practice there are three steps, which are encapsulated in the following script ...
/usr/local/sbin/eaglectrl -d
sleep 20
/usr/local/sbin/startadsl
(there is a stopadsl command available too).
One can automate the up/down for a user ("user") by the following steps:
Place the above script in /home/user/bin as mystartadsl
Edit /etc/sudoers (using visudo as root) and add the following line: user ALL = NOPASSWD: /home/user/bin/mystartadsl, /usr/local/sbin/stopadsl
Use the kde menu editor (or whatever else you use) to add "start tiscali" and "stop tiscali" entries to the internet menu, which invoke (respectively) "sudo /home/user/bin/mystartadsl" and "sudo /usr/local/sbin/stopadsl".
This works, but the user doesn't know automagically whether the connection is up or down. We would need to interact with the kde internet management, and I didn't have time to do that. There is no way my mate has the necessary expertise (now).
Software suspend also breaks (with a usb key disk inserted). The bottom line at the moment is that my mate claimed "the OS is buggy". It's not much chop for me to claim it's not the OS, it's the distribution ... particularly with a "state-of-the-art" commercial distro ... So he's gone back to Windows (hopefully only for a while).
It would appear that mandrake claims this usb modem works out of the box. I'll be interested to know if that means kde recognises it properly, and the mandrake (or should I say mandriva) firewall behaves properly. I also saw on the ubuntu forum that they hope to have it in a future ubuntu distribution.
I guess my point (apart from documenting what I did should I ever get back up to try and finish this), is that broadband access by usb modem is what nearly every home linux desktop user will want to do (at least until they get a wireless router). Until the desktop distributions support that easily, it's going to be very hard for folk to use these distros.
Business Models and Curation
While up in Edinburgh, I visited the National Digital Curation Centre. Amongst the many interesting things we talked about was the pesky difficulty of applying business models to curation. Chris Rusbridge at the DCC differentiates between curation and preservation as:
Preservation: something you do for the future, and
Curation: something you do for both now and the future.
This is a useful distinction, although I would perhaps modify curation to be both preservation and facilitation, to make clear what the curator is doing for the current users (facilitating their doing something else). In any case, this definition of curation works for us.
It's rather easy to provide a business model for facilitation, but not for preservation. If what you do is only preservation, then one has a difficult road to follow in establishing a business model. You have no users, and the future value of what you preserve is probably unknown. You only have costs associated with ingestion and ongoing storage (+migration etc). What are the metrics associated with successful preservation?
If you do both, then the risk you have is that facilitation dominates over preservation, because the business model for facilitation is rather easier to determine, and metrics for measuring success are much easier to determine.
It turns out that the DCC ran (with the Digital Preservation Coalition) a workshop on such issues, at the same time as a subcommittee of the NERC Data Management Advisory Group (DMAG) met to discuss output performance measures (or OPMs). Clearly OPMs have to relate to the objectives of the organisation, which to some extent come down to the business model. I attended neither meeting, but Chris was kind enough to send me some key presentations (they should be available on the web at some point, I'll update this page when they are). Sam Pepler from the BADC attended the DMAG meeting, so I've got some feedback from both.
The material Chris sent introduced me to the Balanced Scorecard approach (see here for an introduction to the balanced scorecard; the basic idea is to apply more than just short-term finances to the evaluation of success, particularly when developing intangible capital). The espida group are applying this to digital curation.
One of the things James Currall (from espida) appears to have talked about (at the curation cost model meeting) is the value-time behaviour of items, which he depicted in the following diagram (I will give a proper reference to where this comes from when I have one):
and he talked about a number of asset classes that one might look at preserving, which included research data. What struck me though was the implicit assumption in the above figure that all things decrease in value with time. Leaving aside how historians feel about that, I felt obliged to redraw his figure as follows:
The point I'm making is that for us, and for our science community, data usually has an immediate value, associated with (and preceding) paper writing, and then, because it's generally something about the real world, the value increases with time, as it becomes part of our timeseries of world observations (although, until it's used, which might be when the timeseries is "long enough" - whatever that means - we might have difficulty in justifying holding it).
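To make the shape of my redrawn curve concrete, here's a toy sketch in python. It is entirely my own invention (the function and its parameters are illustrative, not Currall's): an early bump around paper-writing time that decays away, sitting on top of a slowly growing long-term component as the timeseries lengthens.

```python
import math

def data_value(t, bump=1.0, decay=2.0, growth=0.05):
    """Toy value (arbitrary units) of a dataset t years after collection."""
    immediate = bump * t * math.exp(-t / decay)  # paper-writing bump, peaks near t ~ decay
    longterm = growth * t                        # value of a lengthening timeseries
    return immediate + longterm
```

With these (invented) knobs, a two-year-old dataset is worth more than a six-year-old one (the bump), but a fifty-year record beats them both (the timeseries effect), which is exactly the point about environmental data.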
This has implications on our cost model (and perhaps our balanced scorecard when/if we get to that). In this case, facilitation is generally about helping that first bump, and then preservation is about ensuring that the slope of the value-time graph for environmental data is positive!
The difficulty with our output performance measures is how to capture the latter. One of the things that the DMAG subcommittee discussed was measuring the number of datasets published. Leaving aside the definition of publication, I would argue we need to measure the amount of ingestion work that is done (both of new datasets, which is generally hard, and of additional data for old datasets, which, although it ought to be easier, is something we generally do badly in terms of updating metadata). Perhaps we can add to our list of criteria the age of the datasets we hold - the older they are, and the more complete the timeseries, the more important they are - even if they have no current users. Worse still, if the timeseries is not being added to, and we have no (current) users, then how do we evaluate how well we are preserving it, and what resources should be devoted to doing so?
Update, August 2nd: James Currall tells me that when he actually gave his talk at the meeting the diagram had another curve on it: an upward increasing line representing the value of malt whisky with time ... so he was already thinking about the class of things with values that increase with time (another example he suggested was artworks).
Me: I think wine would be a better example than malt whisky (which apparently always increases in value with time :-) ... wine might or might not, but generally you don't know til you open the bottle!
Quite often I have to make tiny modifications to existing pdf files.
I downloaded flpsed and the Fast Light Toolkit (fltk), and a few minutes later I could. Installing flpsed was a wee bit annoying on my Suse (still 9.2, dammit) system:
Configuring, making and installing fltk was a doddle.
The configure script for flpsed couldn't find fltk though; I had to do this:
Then it's straightforward to use. Take a pdf document. Use pdf2ps to produce a postscript file. Annotate (not edit) it with flpsed, and then export the result as pdf. Done.
Icehouse and Greenhouse Worlds
I've been ignoring paleoclimatology for a long time as being yet another interesting field that I haven't time to pay attention to. However, today's lunchtime reading was Kump's letter to Nature, which introduced me to the idea that in the Eocene (55-34 million years ago) the earth is thought to have been essentially warm and ice free most of the time, associated with higher levels of CO2. However, apparently that all changed at the Eocene-Oligocene boundary, when the earth moved into its current glaciated state, i.e. masses of permanent ice over Antarctica and elsewhere.
It turns out that during the Eocene there probably were glaciations, but they didn't persist. The hypothesis advanced for the transition into the Oligocene with permanent glaciation (or why the earlier "minor" glaciations didn't persist) seems to be related to a rapid drawdown in atmospheric CO2 resulting from increased weatherability of the continents associated with Himalayan uplift.
There were two things I took from this:
I had no idea that weatherability of rocks could be so important for the atmospheric CO2 loading (ok, I should have, I've heard colleagues witter on about this in the past, but hadn't paid enough attention), and
Kump's final conclusion:
If decreasing atmospheric CO2 stabilized the glacial state in the Oligocene, might increasing atmospheric CO2 from fossil-fuel burning destabilize it in the future? The lesson to be learned here is that we should watch for subtle signs that we are moving from the icehouse world in which Earth has remained for 34 million years into a new, greenhouse world.
As I said in January, this is something we should all give a bit of thought to from time to time, not least because these letters give us a timescale of, say, two or three decades, so it's going to affect our retirements!
(I especially liked the quote from Bartlett on the upper limit of this period: assume all the earth is oil, and current growth rates in usage, and it'll all be gone in 342 years!)
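Bartlett's number is easy to sanity-check with the standard exhaustion-time formula for a resource R consumed at an initial rate r0 growing exponentially at rate k: T = ln(1 + kR/r0)/k. The inputs below are my own round numbers (earth's mass, a rough figure for current oil consumption, and a guessed 7% growth rate), not necessarily Bartlett's exact inputs, but they land in the same few-hundred-year ballpark:

```python
from math import log

k = 0.07      # assumed growth rate in usage, 7% per year (my guess at Bartlett's rate)
R = 5.97e24   # mass of the earth, kg -- pretend it is all oil
r0 = 4.0e12   # rough current oil consumption, kg/yr (~30 billion barrels)

# integrating r0*exp(k*t) from 0 to T and equating the total to R gives
# T = ln(1 + k*R/r0) / k
T = log(1 + k * R / r0) / k   # a few hundred years
```

The striking thing, and Bartlett's point, is how insensitive T is to R: even an earth-sized reserve only buys a few hundred years under steady exponential growth.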
Joseph Reagle has been using elementtree, and that link points to a useful set of notes on how to use it.
I've had a play with a number of ways of processing XML in python, and I like elementtree the best of all ... but I find the original docs slightly disappointing (i.e. not complete enough). Given I don't get to use it day to day, I need an easy-to-use crib sheet ... if I had the time I'd link all the bits and bobs I've seen together ... but that day is probably a long way off, regrettably.
Atlantic Ocean Oscillations and European Climate
Rowan Sutton and Daniel Hodson have an intriguing paper in Science on the influence of the Atlantic Ocean on summer climate in Europe (and North America).
The bottom line of this paper is that the "Atlantic Multidecadal Oscillation" (AMO) has an influence which lasts for tens of years on whether the summer precipitation and temperatures are wetter and/or warmer than the average or not. They conclude with:
In the absence of anthropogenic effects and assuming a period of 65 to 80 years, we should now be entering a warm phase of the AMO. Our results would then suggest a forecast of decreased (relative to 1961 to 1990) summer precipitation (increasing drought frequency) and warmer temperatures in the United States together, possibly, with increased summer precipitation and temperatures in western Europe.
They go on to point out a possible non-linear coupling between the AMO and possible anthropogenic effects on the thermohaline circulation (TC) which could ameliorate (presumably for a while) anthropogenic climate changes by moving the AMO (presumably more rapidly than normal) into a negative phase (with presumably cooler and wetter conditions) as the TC changed. (All the presumablies are mine not theirs).
All this is good stuff, but as Rowan is a mate, I have to take issue with a sentence earlier in the paper:
A simple significance test suggests that the major observed anomalies shown in Fig. 2 are unlikely to have arisen from internal fluctuations of the atmosphere.
I looked at the supporting material (pdf) for all of ten minutes and couldn't fathom it directly. I can think of no way that a significance test on the observational data alone can tell me anything about whether any patterns are due to internal or external fluctuations. If, as I assume from the supporting material (it doesn't state it directly with reference to that sentence), it's based on comparing the variability with the atmosphere-only model (driven by constrained sea surface temperatures), then it lives or dies on whether that model has realistic variability on those scales (not on the significance test). The paper goes on to justify the argument based on the coupled atmosphere/ocean experiments, but the way it reads implies there is an a priori belief that it is external variability based only on the observed statistics, without having done all the model experiments. However, that's just a quibble ... I'd like the paper to have made that more clear, but it in no way detracts from the results.
RSS and Atom
Now that Atom 1.0 is pretty much out, it's useful to point to the comparison between RSS and Atom. I'm doing that here so I can easily find it again.
(As an aside, Sam Ruby who hosts that wiki page was slashdotted and coped, his description is here).
leonardo (the software that runs this site) is being upgraded to atom-1.0 even as I type, but I'm sad to say that I'm not contributing ... perhaps when Elizabeth sleeps through the night I'll regain some extra time for extra activities.
Update July 22: and Niels Leenheer compares Atom 0.3 and Atom 1.0. Via Sam Ruby again.
Microsoft Please Save Paper
I refuse to believe that the world needs default margins of one inch (or 2.54 cm) ... imagine how much paper could be saved if the default margin were 1.5 cm. The first thing I do with a new Office installation is change the default template ... but couldn't it be better from the beginning? (Those who really want the white space could change their own defaults.)
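A back-of-envelope estimate of the saving (my numbers: A4 paper, uniform margins, and the first-order assumption that the number of pages needed scales inversely with printable area):

```python
def printable_area(margin_cm, page_w=21.0, page_h=29.7):
    """Printable area (cm^2) of an A4 page with a uniform margin."""
    return (page_w - 2 * margin_cm) * (page_h - 2 * margin_cm)

default = printable_area(2.54)  # one-inch margins
slim = printable_area(1.5)      # proposed 1.5 cm margins
saving = 1 - default / slim     # first-order fraction of pages saved
```

On that crude assumption the slimmer margins print roughly 18% fewer pages: nearly a fifth of the paper.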
(Yes, I do use MS-Office!)
Patent Status of XML-Signature etc
I found out yesterday that WS-Security has patent/license problems which make it difficult to use in a GPL environment. That got me worried about NDG security. We depend on (or will depend on) three pieces of technology:
XML-Signature,
PKI X509 certificate handling (and signatures), and
our Attribute Certificates.
Taking these one at a time.
What is the patent/license status of XML-signature? It's a W3C standard, which is a Good Thing (TM), but that doesn't guarantee much. What the W3C knows is summarised here, but probably the best summary of the status appears as a comment on the patent status of xml-signature by Joseph Reagle (the W3C co-chair), which, because it's so relevant, I'll repeat in its entirety:
Unfortunately, it's difficult for the patent status of anything to be very clear. (It's like proving a negative: God doesn't exist.) The only clear patent status IMHO is one that has been upheld in court or otherwise considered uncontestable, and it's license has been publically excercised by many implementors.
Regardless, there are a few ambigous statements from a few years back that folks should be aware of, but I'm not personally aware of any specific claims of infringement or licenses with respect to the 12+ implementations.
PKI. Well, ideally we'll concentrate on using OpenSSL, which has a useful FAQ on the topic of GPL and patents and OpenSSL. The key points are that
OpenSSL itself is not a problem, but the various algorithms it uses are patent encumbered (as described in the README). In principle, however, we can always change the actual algorithm we use.
The GPL issue is ok on linux systems, but in case of other O/S it is summarised with this:
If you develop open source software that uses OpenSSL, you may find it useful to choose an other license than the GPL, or state explicitly that "This program is released under the GPL with the additional exemption that compiling, linking, and/or using OpenSSL is allowed." If you are using GPL software developed by others, you may want to ask the copyright holder for permission to use their software with OpenSSL.
Finally, our attribute certificates are just XML documents which describe our own security policies. I don't think anyone else's patent could affect that. However, we talked about migrating to SAML at one point, so an interesting question would be what would happen if we migrated to SAML to encode our attribute certificates (why we should do this is another question that needs an answer I can't give - because I haven't one - but people keep telling me we should) ...
The situation for OpenSAML seems somewhat more unclear, and I'll probably need to follow it up at some point, but meanwhile it seems like apache binned OpenSAML because of patent issues, but on the other hand the Shibboleth team seem happy with the patent license that would be granted for them (see the attachment on the previous link), and presumably we could get the same (perpetual terms that appear to allow the delegation of authority to use). Fortunately, SAML isn't on the agenda yet, so we don't need to go there ...
More Good Ideas about Blogging
So here's a what if: what if you could do 20% time projects ONLY if you blogged about the effort with customers? Even if you couldn't share all of the details for competitive reasons, would this make 20% time more valuable to the organization? Something to ponder.
we don't find the time for blogging, we make time for it. I've commented in the past that blogging isn't an addition to our day job, it's part of our day job. In recent weeks I've come to think of it as something akin - though different - to Google's 20% time. We have nothing so formalized, but we probably spend something like 20% of our time (ok, more) researching and writing and pursuing what we consider to be new and interesting avenues of interest. Some of these bear fruit for RedMonk, some don't. But it only takes a couple of hits to make the whole thing worthwhile.
In our context (an atmospheric data centre), one of the difficulties I have is that our staff need to remain research active (in the sense that the computer people need to keep their skills up, and the atmospheric scientists need to keep interested and up with the play). I've traditionally said this should be a twenty percent activity, but it's been nearly impossible to find ways to make it happen.
Perhaps getting them to actually blog about it would achieve both (since most of what we do wont be publishable in the refereed journal sense).
by Bryan Lawrence : 2005/07/18 (permalink)
ECMWF to increase their operational model resolution
I have had my attention drawn to the planned increase in the resolution of the European Centre for Medium-Range Weather Forecasts operational model from (in the case of the atmosphere) T511N256L60 to T799N400L91.
This corresponds to improving the horizontal resolution for the dynamics from about 40 km to about 25 km and the physics from about 80 km to about 50 km (where here I'm quoting the shortest resolved wave at the equator). Of course the true resolution is less than this because the numerics can't support two-grid waves (or anything near that). A rule of thumb might be to say four grid points, in which case the resolution is still about 100 km in reality, which is amazing!
If you're anything like me you can't remember - or easily calculate- what these represent in "real" surface resolution terms. So, for my own benefit, I reminded myself by perusing Laprise, 1992 and writing some simple python to generate the numbers.
# This code follows Laprise, 1992, in BAMS,
# See http://blue.atmos.colostate.edu/publications/pdf/NT-27a.pdf
#
from math import *

def gridres(N=None, T=None):
    pi = 3.141592
    a = 6371.0
    if N is None and T is None:
        return 'Unknown Resolution (need N and/or T)'
    if T is not None and N is None:
        N = T
    if N is not None:
        L1 = 2.*pi*a/(3*N+1)
        L2 = pi*a/N  # note ECMWF seem to quote this.
        L3 = sqrt(4*pi)*a/(N+1)
        L4 = pi/sqrt(N*(N+1)/(2*a*a))
        return int(L1), int(L2), int(L3), int(L4)

if __name__ == "__main__":
    # print gridres(T=31)  # checking code against the paper
    # ECMWF September Upgrade:
    # print "Best to quote the 2nd of these numbers on each line:"
    print gridres(T=799)
    print gridres(N=400)
WS-Security Licensing Problems
Oh how I hate Intellectual Property wars ...
Although WS-Security, along with the other so-called WS-* specifications such as BPEL (Business Process Execution Language), is under the jurisdiction of OASIS, users still must sign license agreements with IBM and Microsoft.
And oh dear, the terms of that license are probably incompatible with the GPL, so we probably can't build something based on WS-Security and distribute it with GPL-based software. See David Berlind blogging at ZDnet for a more detailed discussion of this issue. This article also points out the problems with OASIS "standards" in general ...
Frightening stuff ... but there is some hope for the future in another Berlind blog. Meanwhile I begin to understand why W3C standards are far more useful (I don't have to investigate the licensing status of their standards; it is clear).
Linux drivers for HP Color LaserJet 2820
I'm considering buying a color (sic) laserjet 2820 (an all-in-one printer) to replace my aging deskjet G55 ...
Anyone out there running one? The hpinkjet site doesn't yet mention it, but I've seen that before ... and things have worked just fine ...
Aerosol Climate Forcing
There is an excellent paper in Nature by Andreae, Jones and Cox 1 on what the future may hold as greenhouse gas forcing increases global temperature, just as the protective effect of polluting aerosols decreases. In fact, it's worse than that: as the authors say:
The twentyfirst-century climate will therefore suffer the treble hit of an increasing warming from greenhouse gases, a decreasing cooling from aerosols, and positive feedbacks from the carbon cycle, whereby increased temperatures cause accelerated release of soil carbon by decomposition.
However, we don't really know how bad it will be because
we don't really understand how much the aerosol forcing is protecting us now. Again, as they say:
Do we live in a world with weak aerosol cooling and thus low climate sensitivity, in which case future climate change may be expected to be relatively benign? Or do we live in a highly forced, highly sensitive world with a very uncertain and worrying future that may bring a much faster temperature rise than is generally anticipated?
it would appear that the parameters constraining the carbon release by decomposition are also ambiguous.
While this paper admits a wide range of possible futures (and describes very well why), the bottom line is that
there is a possibility that climate change in the twentyfirst century will follow the upper extremes of current IPCC estimates, and may even exceed them.
(Actually, if one reads the paper in detail, they imply this isn't just possible, but likely, or at least that's my reading of what happens using best estimates of the parameters and likely emissions scenarios.) They go on to say:
Such a degree of climate change is so far outside the range covered by our experience and scientific understanding that we cannot with any confidence predict the consequences for the Earth system.
However, these predictions are cloaked in an enormous range of possibilities, and they rightly point out that a number of approaches (including improving parameterisation of cloud processes) are needed to improve confidence in such predictions.
Norwegian Government Says Yes to Open Standards
by the end of 2006 every body of the public sector in Norway must have in place a plan for the use of open source code and open standards.
ERA40 Precipitation and Antarctic Ice
There is a fascinating article by Davis et al. in Science on Antarctic ice thickness. There is also a perspective piece by David Vaughan of the British Antarctic Survey. The gist of the substantive article is that some parts of the Antarctic land mass are thickening while others are thinning (the results are based on radar altimetry measurements over eleven years). Not surprisingly, they link the thickening versus thinning to precipitation changes. Leaving aside the implications (evidence for climate change and concomitant sea level rise, which are the main thrust of the paper 1), one of the things I find most interesting is how good the ERA40 precipitation measurements are! The following image shows the ERA40 snow precipitation (left) and ERS elevation changes (reds more precip, snowmass increasing; blues, less precip, less snowmass):
Given that there are nearly no precip measurements going into ERA40 at these latitudes, it shows that the precipitation physics in ERA40 are rather better than I thought they were, although, as the paper points out, the magnitudes are not as good as the spatial patterns (and some of the differences in some areas are due to ice dynamics).
(As usual the figures have been degraded in resolution and have had the colour scale removed to comply with my copyright fair use criteria).