Bryan Lawrence

... personal wiki, blog and notes

cf2responses

On November the 9th I publicised the first public draft of the Future of CF white paper, both on my blog and on the CF mailing list.

The significant community responses are listed here. They are broadly approving, so the next steps are to integrate the specific suggestions into our plans, and deal with the issues raised.

All responses have been slightly modified for anonymity, and enumeration has been added. If any of the response authors would like me to go back to their response verbatim, I'd be glad to add links to the verbatim text, with the obvious caveat that the authorship would then be attributable.

(If you know you sent a response, and it isn't here, please send it again ... sorry).

Response A

I strongly support the need for intellectual rigour - it's a real credit to the current co-authors that they have kept the current standard so well defined given their resources. I also like your 10 point list for managing modifications - point 5 appears to be new (as far as I'm aware) but seems like a good idea if you can move it into common practice.

  1. I think the root of any concerns I have with the proposed governance structure is related to the details of the way the funding works. In the short term I think the funding model is via Benevolent Organisations. On the one hand these organisations are putting something in and so should be 'rewarded' for this (in fact this is a motivation for becoming a benevolent funder); on the other, you do not want this to bias the development of CF or give undue weight to these organisations in the community consensus. This isn't meant to imply any intentional biasing on the part of those organisations: I'm sure they will be committed to an open community process, and the committee structure should minimise any bias, though there may be biases almost as a side effect of centring the activity in a couple of places.

    • An example of this possible biasing is the reference implementation(s) (which has to be qualified by the fact I'm not sure I fully understand what is meant by a reference implementation). I'm taking it to mean a library/module of code that would read a CF NetCDF file, check for compliance, and put it into a set of structures that encapsulate the metadata (and implied operations). This may introduce a conflict of interests. If the conventions staff member (who is likely to be responsible for maintaining a reference implementation) sits at an institute that develops a particular piece of analysis code then this could favour that analysis code environment.

    • So I think we need community consensus on what the reference implementation looks like (the CF checker presumably is its first manifestation?) and what language/environment the reference implementation should be developed in. (Either that, or you need to explain to people like me what a reference implementation is.)

  2. This may not be an issue at the moment but how do you sort out the details of transitioning to different funding models? If either of the funding models 'Institutional Subscription' or 'National Subscription' become viable then presumably there are issues related to establishing the relationship of the funding body to the institutions where the staff are hosted. I imagine the host institutions would be contractors to the funding body but what say would committee members or funding streams have in, say, staff recruitment, work planning, or management?

  3. Those are the main concerns; what follows are some more random additions/comments.

    1. Main Text: Do you need to make it more explicit what funding needs to pay for (or is it sufficiently self evident to be unnecessary?) I could think of

      1. Staff time

      2. Infrastructure hosting costs (reference file repository, interface hosting etc)

      3. Outreach activities (travel, training/publicity, e.g. AMS, EGU,...)

        1. The role of outreach type activities is not really talked about explicitly: this has little bearing on the governance structure, but could have implications for the roles of the funded staff. I imagine both staff would need to do some of this.

    2. To your bullet point lists of recommendations for both committees, is it worth adding the need for representation from different disciplines? The CF standard names staff member in particular will need a lot of support from their committee, and if this committee has wide discipline representation it is going to help the staff member a lot.

    3. Appendices:

      • 4. Ontologies and nomenclature: the problem of coordinate information contained in parameter names is already present internally in CF which recognises qualifiers like 'surface' but would also have a 1.5 m value represented by an unqualified standard_name, but with a singleton vertical coordinate. I can sort of see the logic in this - but it adds a layer of complexity to any code implementation.

      • 7. Discovery information: this is loosely supported through source attributes (CMOR imposes a convention on these to identify model and forcing). But of course you are right, this isn't currently systematic.

      • 10. Auto processing of data and standard name creation. When you pass metadata aware data types through processing you need to update the metadata to reflect the processing. This has two consequences in CF (that I can see):

        1. you can create standard names that are not on the standard name list e.g. by adding a transformation qualifier such as 'derivative_of_X_wrt_Y'

        2. after processing, even if you apply transformation qualifiers you create a name that has a currently available standard name alias (e.g. 'derivative_of_latitude_wrt_time' looks like 'northward_wind' - sorry not sure this is the best example because there should probably be some spherical geometry terms in there too.) Perhaps this isn't really an issue for data interchange since you can agree the names in advance and build this in to your code using rename mappings at the last stage. But if you want CF compliant general data processing then I think it is an issue. It has clear links to your ontologies and nomenclature point.
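The two consequences raised under point 10 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the three-entry name table and the equivalence map are hypothetical stand-ins for the real standard name table, and the function names are invented for the example. The pairing of 'derivative_of_latitude_wrt_time' with 'northward_wind' simply repeats the respondent's own, admittedly imperfect, example.

```python
# Hypothetical three-entry excerpt; the real standard name table is far larger.
STANDARD_NAMES = {"latitude", "northward_wind", "air_temperature"}

# Hypothetical map from a derived name to an existing standard name that
# describes (roughly) the same quantity. The single entry is the respondent's
# own example, which ignores spherical geometry terms.
KNOWN_EQUIVALENTS = {"derivative_of_latitude_wrt_time": "northward_wind"}


def derivative_name(x: str, y: str) -> str:
    """Build a derived name with the 'derivative_of_X_wrt_Y' qualifier."""
    return f"derivative_of_{x}_wrt_{y}"


def classify(name: str) -> tuple:
    """Report how a (possibly derived) name relates to the name table.

    Returns ("listed", name) if the name is in the table,
    ("alias", existing_name) if it duplicates an existing entry,
    and ("unlisted", None) otherwise.
    """
    if name in STANDARD_NAMES:
        return ("listed", name)
    if name in KNOWN_EQUIVALENTS:
        return ("alias", KNOWN_EQUIVALENTS[name])
    return ("unlisted", None)


if __name__ == "__main__":
    for x, y in [("latitude", "time"), ("air_temperature", "time")]:
        name = derivative_name(x, y)
        print(name, classify(name))
```

The "unlisted" case corresponds to consequence 1 (a valid-looking name that is not on the list) and the "alias" case to consequence 2 (a derived name that shadows an existing one). General purpose processing code would need some policy for both.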

Response B

This is a great start. My main concern is to clarify governance / decision making.

  1. The WGCM/CF panel seems ok to me, though I don't know much about the WGCM, but I trust it's an appropriate body. Is there an executive board who is capable of appointing the panel, etc? The panel, as I understand, oversees governance but doesn't make technical decisions?

  2. I'm unclear exactly on who is on the 2 standing committees. It appears that they are appointed by the panel (although in another place you mention "self-selected"). It might be useful to distinguish voting members vs non-voting members, since you would like to get maximum input, and allow people to think of themselves as members, but reserve voting rights to those who have a big enough stake not to be obstructive.

  3. Consensus normally means unanimity. For this to work, people have to be able to give up their own ideas for the sake of moving forward with good enough ideas. This is a norm that needs to be explicitly stated and encouraged. OTOH, there is the danger of mediocre "design by committee". Possibly you want to use consensus, but have an escape clause, where a super-majority (2/3, 3/4?) of votes cast can decide to proceed without consensus.

  4. The role of ad-hoc groups could be crucial. I would expect many new ideas would come out of smaller, motivated groups, possibly not part of the established committees. These should be encouraged; I suppose their job is to create semi-formal proposals to be presented to the working committees.

I would advocate these kinds of governance issues be made into a concise, written document that the CF panel create as one of their first tasks.

Response C

The plans outlined in the paper look good to me.

There is an official working group on metadata within the PRISM Support Initiative (PSI). This group should keep a close interaction with the CF developments.

Response D

  1. Your white paper brings up some excellent points to bring about the next level of standardization to the (NetCDF) community. I think this effort integrates well with the IOOS DMAC Expert Team on Metadata and would have impacts on other Expert Teams/Caucus (Modeling).

  2. This also has a far-reaching impact on the usability of OPeNDAP and of clients that utilize or enforce the CF-1.0 standard (Live Access Server). I see a well-vetted CF-2 version as a major step forward in bringing the community together (hopefully).

  3. I would throw my support behind this if IOOS were to add its support behind this effort and/or a potential. A combined effort between IOOS and Unidata would be excellent. I mention Unidata as a key contributor because of the existence of the UdUnits package they wrote many years ago.

  4. I would like to see usage of the udunits convention enforced in the definition for 'units'. In combination with the udunits package, data can be quickly crosswalked between various quantities. This will eliminate one of the standards we need to come up with for the community.
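What "crosswalking" between units amounts to can be sketched in a few lines. This is not the udunits API: the table and the convert function are hand-rolled, hypothetical stand-ins that cover only linear conversions (a scale and an offset to a base unit), which is the kind of conversion a package like udunits handles far more completely.

```python
# Hypothetical unit table, NOT taken from udunits.
# Each entry is (quantity, scale, offset), with
#     value_in_base_unit = value * scale + offset
UNITS = {
    "m": ("length", 1.0, 0.0),
    "km": ("length", 1000.0, 0.0),
    "K": ("temperature", 1.0, 0.0),
    "degC": ("temperature", 1.0, 273.15),
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert value from unit src to unit dst via the shared base unit."""
    src_quantity, src_scale, src_offset = UNITS[src]
    dst_quantity, dst_scale, dst_offset = UNITS[dst]
    if src_quantity != dst_quantity:
        raise ValueError(f"cannot convert {src} ({src_quantity}) "
                         f"to {dst} ({dst_quantity})")
    base = value * src_scale + src_offset      # into the base unit
    return (base - dst_offset) / dst_scale     # out to the target unit


if __name__ == "__main__":
    print(convert(2.5, "km", "m"))
    print(convert(0.0, "degC", "K"))
```

The point of enforcing one convention for 'units' is that a table like this only has to be written once, by the library, rather than by every data provider and client.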

Response E

Firstly I would like to congratulate you and the other authors on the development of the CF Convention. For a long time I have been uncomfortable with the existing NetCDF Conventions, in particular COARDS. My lack of comfort stems from the fact that the group I work with mostly do Coastal modelling, and many of our grids are curvilinear in nature. Most conventions I know of lack the structure to support such numerical grids, but CF appears to have this capacity. Your standard is also penetrating the tools area which is exciting.

I have a few comments I'd like to pass on to you:

  1. Some years ago, my then boss and I engaged in a discussion about Coordinate Conventions on the Unidata listserv. As part of the discussion we proposed a convention by which coordinates were bound to data variables through the use of attributes. At the time there was little enthusiasm for our suggestions, but we were pretty convinced of its merits so we ploughed on and developed our own convention. In many ways the convention is similar to CF, although there are many significant differences too (namely the standard name attributes).

    • I hope I am not giving you a 'me too' impression, because that is not my intent. CF is clearly a more structured, open and well developed convention than our own.

  2. I love the standard_name. It is a wonderfully simple but insanely powerful attribute. In our group we not only develop hydrodynamic models, but data processing and visualisation tools (in Java). These graphical tools have the capability to ingest multiple data sources, and without the standard_name, we have no generic way of knowing that, for example, the 'temp' variable in one data source is logically equivalent to the 'temperature' variable in another. The standard_name is the perfect glue to bind these variables.

    • Of course the standard_name is only as good as the ontology, and I suspect this needs a little work. Unfortunately, I foresee ongoing disagreements about what to call variables, especially uncommon variables. One thought I had is to associate a 'scope' with some standard_names. For example, variables that may only be used within an organisation (e.g. internal QC flags) could still use the standard_name attribute, but their scope would only make them internal to the organisation. The assumption is that these variables would never be published outside of their intended 'operational scope'. If however, they were published by accident, the 'scope' would still be useful as it would inform the recipient that the variable in question has no known meaning, and they should ignore it or proceed at their own peril.

  3. I am interested in your thoughts regarding stewardship of the standard and how it would be developed and funded. I, or members of my teams, would be happy to contribute in our 'spare' time, as well as contribute ideas. However, like you, I strongly agree that there needs to be a group that makes sure the standard stays on track.

  4. I'll be honest and say that the only area of CF that I don't like is how Map Projections are handled. I'm not sure if the convention I have seen is actually part of the standard, or a separate convention overlaying CF. The syntax I have seen defines the projection by using multiple attributes. This is a GeoTIFF-like approach and, to my way of thinking, is cumbersome.

    • In our standard we too had to support Map Projections, but instead we adopted the USGS's projection convention as used by the PROJ.4 library, and also by other tools such as the GDAL imaging tools.
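The contrast in point 4, one projection string versus several attributes, can be sketched as follows. This is a minimal illustration, not the PROJ.4 library or any CF tool: the parser is hand-rolled, the example projection values are made up, and the attribute names are modelled on CF-style grid mapping attributes for a Lambert conformal conic projection and should be checked against the convention document before being relied on.

```python
def parse_proj4(definition: str) -> dict:
    """Split a PROJ.4-style string ('+key=value +flag ...') into a dict.

    Values are kept as strings; bare flags such as '+south' map to True.
    """
    params = {}
    for token in definition.split():
        if not token.startswith("+"):
            raise ValueError(f"unexpected token: {token!r}")
        key, sep, value = token[1:].partition("=")
        params[key] = value if sep else True
    return params


def to_multi_attribute_form(params: dict) -> dict:
    """Spread a Lambert conformal conic definition over separate attributes.

    Attribute names are an assumption modelled on CF-style grid mappings;
    only '+proj=lcc' is handled in this sketch.
    """
    if params.get("proj") != "lcc":
        raise ValueError("this sketch only handles +proj=lcc")
    return {
        "grid_mapping_name": "lambert_conformal_conic",
        "standard_parallel": [float(params["lat_1"]), float(params["lat_2"])],
        "longitude_of_central_meridian": float(params["lon_0"]),
        "latitude_of_projection_origin": float(params["lat_0"]),
        "false_easting": float(params["x_0"]),
        "false_northing": float(params["y_0"]),
    }


# Made-up example projection, for illustration only.
EXAMPLE = "+proj=lcc +lat_1=-30 +lat_2=-40 +lat_0=-35 +lon_0=145 +x_0=0 +y_0=0"

if __name__ == "__main__":
    for name, value in to_multi_attribute_form(parse_proj4(EXAMPLE)).items():
        print(f"{name} = {value}")
```

Both forms carry the same information. The trade-off is between a compact string that needs a parser and a set of self-describing attributes that any NetCDF reader can inspect directly.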

Response F

  1. Separate the namespace/controlled vocabulary stuff from the data model stuff. Offer to maintain the domain-specific namespace through Unidata. (I suspect it's not costly, and they could publish it as a service and get some visibility for doing so.)

  2. Move the data models for things like irregular grids to OGC/WCS. I don't think OGC does domain-specific namespace stuff, but it is surely the place to talk about data models. Moreover, it'll get the private sector to consider creating software products that use it. I see this as an incredible advantage for the consumer.

  3. If OGC isn't the way to go, then they need to explain why. They've gone so far as to talk about mimicking their structure, but then to propose something new. This is the most serious deficiency in the manuscript.

  4. Unidata gets base NSF funding -- why shouldn't it go to this kind of activity? It seems to me that it is consistent with their mission and funding. To keep it from eating too much of their budget, let Unidata be the R&D test-bed development coordinator, and look to OGC as something to migrate toward as things mature. NSF should like that, and Unidata can look for more broad support at NSF to do so, if they want to extend outside ATM.

  5. Finally and in summary: As a (harsh) reviewer of this document, I'd say it's naïve, arrogant and wasteful to suggest creation of a new organization for CF standards development and maintenance. That's not to say it's a bad idea, rather, the document just doesn't justify it.

This page last modified Thursday 11 September, 2008
DISCLAIMER: This is a personal blog. Nothing written here reflects an official opinion of my employer or any funding agency.