taking OGSA-DAI seriously again
Early on in the evolution of the NERC DataGrid we investigated OGSA-DAI, which is a "data access and integration" component of the Globus stable. We rejected it for a number of reasons, chief of which were that the software was immature and that it didn't seem to offer much more than what has recently been termed WS-JDBC (albeit perhaps with a dash of WS-XML:DB API).
Of course the years roll on, and maybe we should have revisited it, but it has always seemed much like the rest of the Globus Toolkit: a good idea, a bit immature, maturity just over the horizon, maybe usable next year ... The fatal flaw was that, for a group with relatively little spare engineering time, it was always over the horizon, every year!
It still looks like for us it'll be next year (technically "next year" now means "in some successor project to NDG2"), despite the fact that it's gaining some traction even within the Met community. However, if it's going to be next year, we need to know more about it sooner than that, so I was pleased to see that Ian Foster had followed up on the criticism by getting Malcolm Atkinson to say a few words.
The trouble is, even after reading the article, I'm not convinced we should investigate further. Malcolm's points were essentially:
it supports multiple backend formats (OK, but this actually just means you still have to know how to query each backend and understand the schema that defines its content) ...
it is extensible ("OGSA-DAI has three popular extensibility points, the data resource adapters, the activities and the client libraries") ... Hmmm, in what way is JDBC not extensible in the same sense? (OK, I know the answer for the first of those: OGSA-DAI provides a consistent framework for access to multiple backends. But surely the other two are the same? Still, score a point for OGSA-DAI.)
"... OGSA-DAI contains a variety of multiple-data source functions, such as DQP (still in prototype), and multi-site query facilities which deal with partial availability." Hmmm, well, I understand the words "Distributed Query Processing", but I don't really see how that is much use unless the backend resources are pretty homogeneous (never the case for me). The rest is a mystery to me.
So, I'm still not enlightened, which is a source of frustration (it always seems like Globus nearly offers us what we want). I can see one minor place where OGSA-DAI might have helped us in NDG2, but I can't help wondering whether the investment in effort would have been commensurate with the reward. My problem remains that, for all the problems I find interesting, we worry about the nature of the things we store (their "feature-type"). For us, describing those features in a way that is queryable and can be interpreted by client software is the domain of the OGC web services. It's not clear to me what role OGSA-DAI could play for us in that context (although there is a "delivery" component of OGSA-DAI which nags at me in the context of asynchronous delivery of data, which could be important for WPS or WCS with big data objects).
In the UK we have a couple of projects running under the banner of "Grid-OGC collision", at least one of which seems to be aiming to confront OGSA-DAI with OGC WFS/WCS. As I say, I can't see the mileage in it myself, but I've been wrong before, will be wrong again, and am glad someone else is doing the investigating. If there is an NDG3, we'll be looking to those projects to tell us whether there really is a role for OGSA-DAI in our activities that goes beyond a putative "WS-JDBC".