... personal wiki, blog and notes
Bryan's Blog 2004/11
The NDG b-schema is the heart of how we plan to support data browsing (as opposed to searching).
The key concept is that we have a simple relationship between observation stations (ObsStn), Data Production Tools (DPT), and Activities, which are linked together by Deployments as depicted below:
Two examples of how it could work, first for instruments and then for models:
describe a bunch of tools with SensorML.
describe a bunch of places where they could be deployed with the ObsStn schema (we'll have to roll our own)
describe the activities with free text
link them together in a deployment, which includes:
start date, end date
activityID, ObsStnID, DPT ID
any deployment-specific settings (e.g. calibration coefficients as actually used) ... we'd have to make sure that the DPT schema supported this, because the attributes of the deployment should only be selectable from schemas defined at the higher level.
describe the model capabilities with the Earley Suite
describe a bunch of computational environments with ObsStn (ok, we need to change the name of this thing)
describe the activities
link them together as above except the deployment specific settings would be from the schema of possible model settings implicit in the Earley Suite.
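As a rough sketch of the linkage described above (not the actual b-schema, which would be expressed in XML), a deployment record might look something like this in Python. All of the class, field and function names are hypothetical, as is the idea of holding the allowed settings per DPT in a plain dictionary; the only things taken from the description are the fields themselves and the rule that deployment settings must come from what the tool's schema allows.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Set


@dataclass
class Deployment:
    """Links an Activity, an ObsStn and a DPT over a period of time."""
    activity_id: str
    obs_stn_id: str
    dpt_id: str
    start_date: date
    end_date: Optional[date] = None          # None: still deployed
    settings: Dict[str, str] = field(default_factory=dict)


def unsupported_settings(deployment: Deployment,
                         allowed_by_dpt: Dict[str, Set[str]]) -> list:
    """Return the setting names that the DPT description does not allow.

    allowed_by_dpt is an assumed stand-in for "schemas defined at the
    higher level": it maps a DPT id to the setting names it permits.
    """
    allowed = allowed_by_dpt.get(deployment.dpt_id, set())
    return sorted(set(deployment.settings) - allowed)


# Hypothetical identifiers, purely for illustration.
allowed = {"dpt-radiometer-1": {"calibration_gain", "calibration_offset"}}
dep = Deployment(
    activity_id="activity-42",
    obs_stn_id="obsstn-7",
    dpt_id="dpt-radiometer-1",
    start_date=date(2004, 11, 1),
    settings={"calibration_gain": "1.02", "sample_rate": "10Hz"},
)
print(unsupported_settings(dep, allowed))
```

The same record would serve the modelling case unchanged: the "station" becomes a computational environment and the allowed settings come from the model description instead.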
Yesterday and today I got to spend some substantial time on trains (a trip to Liverpool actually) ... and spent the time learning about pyxmlsec (and by implication xmlsec). I was most interested in the application of digitally signing xml documents (and the subsequent verification). We need to do this for NDG authorisation.
Because xmlsec implements the W3C xml-signature standard, the whole thing is trivial. I had expected that I would have to do work parsing the signature element to find out what algorithm to use ... and I was worried about how to find the public key of the signer.
As I say, it all turned out to be relatively trivial, especially in python. In pseudo xml, we go from something like: <Document> <children> ... </children> </Document> to <Document> <children> ... </children> <signature> ... </signature> </Document>
But the beauty of it is that everything in the signature element is standardised, and one can even load the X509 public certificate of the signer into the signature. Having done that, the public key is travelling with the document. Of course to reliably verify one needs the public key of the signer of that person's (server?'s) certificate, but often that's going to be the root certificate, so we're likely to have that in a repository anyway.
So, it all becomes transparent and trivial in python ... I'm in the process of building a lightweight sign and verify class that uses pyxmlsec, and then we won't have to worry about this any more ...
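To make the "signature travels inside the document" idea concrete, here is a deliberately simplified sketch using only the Python standard library. It is not pyxmlsec and it is not W3C XML-Signature: it uses a shared-secret HMAC as a stand-in for the X509/public-key signature, and the element name, attribute and function names are all my own assumptions. What it does show is the enveloped pattern: canonicalise the document without its signature child, sign those bytes, and append the result as a child of the root.

```python
import base64
import copy
import hashlib
import hmac
import xml.etree.ElementTree as ET


def _canonical_bytes(root):
    """Canonical (C14N 2.0) bytes of the tree with any <signature> child removed."""
    stripped = copy.deepcopy(root)
    for sig in stripped.findall("signature"):
        stripped.remove(sig)
    xml_text = ET.tostring(stripped, encoding="unicode")
    # ET.canonicalize needs Python 3.8 or later.
    return ET.canonicalize(xml_text, strip_text=True).encode("utf-8")


def sign(root, key):
    """Append a <signature> child holding an HMAC-SHA256 of the rest of the document."""
    digest = hmac.new(key, _canonical_bytes(root), hashlib.sha256).digest()
    for old in root.findall("signature"):
        root.remove(old)                      # replace any earlier signature
    sig = ET.SubElement(root, "signature")
    sig.set("algorithm", "hmac-sha256")       # illustrative label only
    sig.text = base64.b64encode(digest).decode("ascii")
    return root


def verify(root, key):
    """Return True if the embedded signature matches the document content."""
    sig = root.find("signature")
    if sig is None or not sig.text:
        return False
    try:
        supplied = base64.b64decode(sig.text)
    except ValueError:
        return False
    expected = hmac.new(key, _canonical_bytes(root), hashlib.sha256).digest()
    return hmac.compare_digest(expected, supplied)


doc = ET.fromstring("<Document><children><child>data</child></children></Document>")
sign(doc, b"shared-secret")
print(ET.tostring(doc, encoding="unicode"))
print(verify(doc, b"shared-secret"))
```

With real XML-Signature the signature element also names the canonicalisation and signature algorithms and can carry the signer's certificate, which is exactly the part that xmlsec handles for you.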
Is Access Grid worth it?
At a meeting I attended yesterday, there was some discussion as to whether expenditure on supporting Access Grid (AG) was cost effective.
My personal experience of video conferencing consists of many AG sessions and many H323 based videocons. For video conferences with multiple sites involved, H323 is just plain hopeless (at least the way we do it). AG beats it hands down. So, if we need more than three sites, we have to have a physical meeting or use AG. So there is a case for AG over H323 (maybe in addition).
Let's do a worst case analysis: each site provides only one attendee to a meeting, and let's say that each time there is a saving for that site of about a hundred quid in travel, plus 100 quid in productive time not spent travelling. Let's guess at about 200 meetings a year (a vast underestimate at the CCLRC), which implies a saving of about 20,000 a year, which is about what the kit was in the first year.
With 40 sites in the country, we're saving about 800K per annum (this must be very much a lower bound; the real number must be much more than that, because most meetings average several attendees at each site).
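The back-of-envelope sum above can be written out in a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, and the inputs are the guesses quoted in the text. One thing it makes visible is that 200 quid per meeting over 200 meetings comes to 40,000 rather than 20,000, so the 20,000 per site (and hence the 800K) corresponds to counting only one of the two 100 quid components, which makes it an even more conservative bound.

```python
def annual_saving_per_site(meetings_per_year, travel_cost, time_cost=0):
    """Saving for one site sending one attendee to each meeting."""
    return meetings_per_year * (travel_cost + time_cost)


def national_saving(per_site_saving, sites):
    """Scale a per-site saving up to all sites."""
    return per_site_saving * sites


# Inputs are the guesses from the text: 200 meetings, 100 quid travel,
# 100 quid of productive time, 40 sites.
travel_only = annual_saving_per_site(200, travel_cost=100)
travel_and_time = annual_saving_per_site(200, travel_cost=100, time_cost=100)

print(travel_only)                           # 20000
print(travel_and_time)                       # 40000
print(national_saving(travel_only, 40))      # 800000
print(national_saving(travel_and_time, 40))  # 1600000
```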
Such an analysis doesn't even address the carbon savings in not travelling, and to my mind these are even more important, nor does it address the international aspect.
However, the AG experience isn't as good as it could be, and I haven't included the (significant) cost of having an operator in these figures. For more sites to get involved and use AG, and to drop the operator cost, we need to improve it. On those figures alone, there is a prima facie case for us to spend very significant sums and effort on AG support.
Maths and Blogging
Inevitably I'm going to want to put maths in this blog, and if not here, on scientific wikis to support the CF (and other) projects.
I've just done a bit of a wander around the internet looking for wiki solutions, figuring that I'm going to want to extend the Leonardo wiki parser to do this. I started at http://c2.com/cgi/wiki?MathWiki, which as of today is a sequence of thoughts from a number of folk which, as the comment at the top says, "needs refactoring". Nonetheless, there are a lot of good links off to math wiki projects.
The general consensus is that one needs somehow to support inline latex commands, at least until MathML support is widespread amongst common browsers (hopefully in my lifetime?).
I had a look at two solutions in particular, firstly, the Ian Hutchinson Tex-to-HTML translator (TtH), and secondly, Bob McElrath's LatexWiki. After spending a few minutes looking at each of them, it seemed that the former probably required the wiki viewer to do too much hard work (my KDE 3.3 konqueror didn't render things right first time ...), whereas the latter just worked (via presenting in-line images). The latter has the advantage of being python based too ... (and released under the GPL, the former has a "commercial version with additional functionality", which is a bit of a turn-off). Both have problems with IE, and what you see if you choose to print the page will be very disappointing.
It seems clear to me that the state of play in this area is far more immature than one might have expected, and might repay some sort of investment in time (although regrettably not by me).
Knowledge Management Horizon
Today I attended a workshop on "Towards Integrated Knowledge Management", organised as part of a DEFRA horizons scanning project. These horizon projects are meant to give DEFRA a chance (!) of being ready for the next environmental crisis, instead of always being in reactive mode. A totally laudable aim.
I was taken with both the number of people there, and the evident belief of some of them that it was possible to guess what questions could be asked of (let alone answered by) a knowledge management system in twenty years' time.
While some things should be taken as read (the systems will be more complex, incorporate more provenance, and provide far more context for data), others are not nearly so predictable. In fact, I would argue that the only way to get from now to then (ten+ years) is to improve our existing systems so they are capable of effectively answering today's questions. Arguably, the questions of tomorrow will be the same as today's (which have been the same for a thousand years): how can we make our homes safer, improve our own lives without exploiting others, and give our children a better future? However, no one can predict what technology we'll have in ten years' time, so the very best we can do is take incremental steps from where we are now and we'll get there ... so we need to concentrate on maximising the amount of contextual data we store with our data, and on interoperability. If we do that, and avoid Intellectual Property Rights (IPR) and ownership issues, then we'll be on the right path.
I hope this leads to modular service orientated architectures, with published standards-compliant data structures underpinning them, and populated with as much data and metadata as possible. Such systems should be built around the communities of existing users, and not central monoliths. However, I fear an over-engineered centralised approach ... If only governments would spend as much on acquiring and managing data as they do on poorly defined software projects.
I've been interested in blogs, and blogging software, for a long while. I run the KDE akregator on my laptop, and find that I have learnt a lot from other folks' ruminations (as well as having been appalled and amused to various degrees).
For a long time I've also tried various ways of keeping notes, many of which I want to make public to my friends and colleagues. I've finally found a piece of software that meets most of my requirements, and because it's written in python, I can add the functionality that I need as I require it.