... personal wiki, blog and notes
Bryan's Blog 2006/10/19
On substitution groups and ISO19139
I have bleated already about the difficulties of using ISO19139 with restrictions which introduce new tag names.
Now the official way to do this is probably to exploit substitution groups in the new xml schema associated with your restriction. So, if one wanted to restrict, for example gmd:MD_Metadata one might start in your new schema with something like
At this point one could use the xml schema validation machinery to ensure one had a nice valid instance of the new restricted schema.
My beef is how we use this. The gurus will tell you there is no problem, and maybe there isn't if one wants to invest an enormous amount of time in complex handlers (even so, maybe it's not that straight forward for data binding, and perhaps the tools aren't really that mature - or weren't in 2004).
So, if I'm writing code to handle ISO19139 documents, I'm going to be writing xslt or using xpath or xquery to get at particular content or I'm going to have to invest in brute force if I want to handle things in a high level language like python (as far as I know there are no pythonic tools that get close to this sort of requirement internally).
Let's just explore the brute force method, and a simple use case: I have harvested ISO19139 profiles (I'm starting to think "variants" - complete with quotes - would be a better term :-) from a number of places, and want to deliver the titles to a web page ... so I need to find the titles. I can't assume I can use a simple xpath expression (which is supported in python) to find all the titles. I have to parse all the relevant schemas, and do something complex to find the new title elements. In practice, I have to support each profile as a completely different schema, they might as well not share the ISO19139 heritage - even though there are advantages in the ISO19115 content heritage. Yuck.
OK, now suppose I hand this off to an xquery engine. How easy is that? Let's assume it's not buggy ... This is essentially the use case described as 18.104.22.168 Q7 in the June 2006 use case document. I'm not that familiar with xquery, but it appears that
should then match any element which is linked to it via a substitution group declaration like that above. If it really is that simple, then this is much easier than the brute force method, and a good reason for passing my problems to a real xquery engine.
However, this may well work fine for handling document reshaping type tasks, but returning to the use case, I could well have tens of thousands of harvested documents, if not millions, and so I may well be considering indexing. I don't know, but it would appear that one has to rewrite all substitution group elements when producing an index - does our eXist native xml database technology do this automatically for me? I don't know, and that's the point.
All this marvellous xml technology is bloody complicated, and all to handle the case that a community wants to restrict the usage of some tags or lists! Why make all this grief? Wouldn't it be far easier to give community guidance, but accept perfectly valid ISO19139 documents which fall outside that guidance, because we could all simply follow the simple rules in David Orchard's article article, and especially the one I've highlighted before:
Document consumers must ingore any XML attributes or elements in a valid XML document that they do not recognise.
We could rewrite that as:
Communities should give guidance on those ISO19139 attributes or elements that need populating for usage within the community (and which might need to be handled by community tools).
Job done. No complicated machinery. More tools available, easier indexing, and much easier human parsing of everything ... (from the schemas, to the instances, and all the code that handles them).