... personal wiki, blog and notes
Farewell - Move Over to my new blog
It is time to move on.
After a decade of blogging at this address, using the Leonardo blogging platform, time and job changes have caught up with me.
I've started a new blog at http://www.bnlawrence.net. I intend to migrate everything that's worth taking from here, and then eventually close this down.
Meanwhile, I have started blogging there, and intend to keep it up. Early in the new year I'll explain why this time I think I'll manage to get back into the flow.
For now, it's goodbye from here, and hello from there!
by Bryan Lawrence : 2017/12/05 (permalink)
Science and the Digital Revolution: Data, Standards, and Integration
I was asked to give a talk at this CODATA meeting which was aimed at developing a roadmap for:
Mobilising community support and advice for discipline-based initiatives to develop online data capacities and services;
Priorities for work on interdisciplinary data integration and flagship projects;
Approaches to funding and coordination; and
Issues of international data governance.
I gave a talk on Data Interoperability and Integration: A climate modelling perspective. For this talk I was asked to address an example from the WMO research community on what we have accomplished in standardising a range of things, and reflecting on what has worked/failed and why. I wasn't given much time to prepare, so this is what they got: pdf (11.5 MB).
There were some other interesting talks, some of which resonated particularly with me, including
Bill Michener talking about DataONE, and the importance of governance activities where all lines of responsibility are clear: no committees or panels whose remit is unclear or whose reporting lines are non-existent.
Jeremy Fry talking about digital infrastructure in support of (physical) chemistry, who had some thought-provoking comments on creativity, "engrooved" bad habits (harking back to a talk by Paul Trowler, which I have yet to get a copy of), and the potential role of AI in scientific discovery:
Also, human creativity seems to depend increasingly on the stochasticity of previous experiences - particular life events that allow a researcher to notice something others do not. Although chance has always been a factor in scientific discovery, it is currently playing a much larger role than it should. (The Atlantic, April 2017).
Robert Hanisch talking about the experience of the IAU in developing technical infrastructure, and in particular the governance of technical components for the International Virtual Observatory Alliance. The IVOA attempts to follow the W3C governance model:
Data access protocols require two independent implementations.
Notes are promoted to Working Drafts, which become Proposed Recommendations, which get endorsed to become Recommendations, which eventually become standards, although he did note that the formal endorsement step wasn't (yet?) happening in practice.
All communities reported difficulties in funding infrastructure at the sorts of levels necessary, and little or no investment in usability ... funders like prototypes ... but not boring maintenance and improvement ... which means that sometimes scientific take-up falls far below what was envisaged.
Unfortunately I was not able to go to days one and three of this three day meeting.
by Bryan Lawrence : 2017/11/15 (permalink)
Spent nearly the entire week (and certainly more than a "European maximum" 48 hours) on things to do with ESIWACE (the deliverable I mentioned last time, reporting for the EC, etc). I did have a fair chunk of one day on other things, as I also chaired a meeting of the advisory panel for our climate predictions for the Copernicus climate data store project (CP4CDS, a project about deploying and maintaining software for a special ESGF data node to support climate services) - that took up the best part of a day. However, that's pretty much it for the week. Hard to believe we're two months into the year - and I still haven't come up for air and managed to create any significant blogging time.
by Bryan Lawrence : 2017/02/27 (permalink)
Weeks six and seven
Two weeks to report this time, primarily because I had a day off on sick leave, and two days on real leave, so there were only seven weekday workdays to talk about (although I have just spent big chunks of both yesterday and today, that is, Saturday and Sunday, on work as well).
Early on the main thing I was doing was trying to catch up on the ever increasing email mountain, but in week six, by the time I took out a day in London for a NERC Information Strategy Group meeting (mostly about the future of NERC data centres), a half day on technical futures for CEDA, another half day on my final CEDA monthly meeting and a follow up meeting on CEDA support for SPARC, and a day off on sick leave, that only left a few hours here and there to get much done. The net effect of course was that the email mountain grew.
To be fair, it wasn't so much that the email mountain grew, but the Nozbe task list grew. I did manage to process a lot of email, but quite a lot of things got thrown onto the pile for "later".
Then this last week (week seven), we had a few days on holiday down Dorset/Devon way. Just a couple of nights, but real recharge territory. However, it was back to work on Wednesday, and back to London - this time for an NCAS Science Strategy Board meeting - so only two real work days were available.
So, in the work time, I did a wee bit of work on the ENES infrastructure foresight that I've talked about before, and quite a lot of work on the first deliverable we've got for ESIWACE: which is on requirements and business modelling for (weather and climate) data centres. Given the constraints on my time during the week, it's been a big effort on that this weekend too. I suspect quite a few things from that will turn up on my blog ...
... but anyway, that's the weeks that were.
by Bryan Lawrence : 2017/02/19 : 0 comments (permalink)
Space and Open Plan Offices
I was paying a bit more attention to twitter this morning than usual (I'm hoping I'll get some feedback on my analysis of citations that I posted yesterday). One thing that blew by was this headline in the Washington Post:
"Google got it wrong. The open-office trend is destroying the workplace."
which took me back to something that I wrote in 2005, reporting on work done long before.
It seems that Google was ignoring history as well. Unlike them.
There's been lots more work done since the work I cited in that blog post, for example:
Workers in open-plan offices are more distracted, unfriendly and uncollaborative than those in traditional workplaces, according to the latest industry survey.
Employees who have to share their office with more than two people experience high levels of colleague distrust and form fewer co-worker friendships than those working in single-occupancy offices ...
"...the open-plan proponents' argument that open-plan improves morale and productivity appears to have no basis in the research literature."
Why so much data? Part I: The rise of direct numerical simulation
Over the last decade or so, many in the scientific community (especially in the environmental sciences) have been surprised by the increased cost of data handling - not only in absolute terms, but in terms of the percentage of the cost of doing "normal science".
This increase has been problematic on a number of fronts: not only do individuals not always plan appropriately for managing their data storage and handling, but even where they have, institutions and funding agencies have themselves been surprised and not always too keen to pay up. After all, in a world where there is very little new money in real terms (or even none), an increase in one part of the budget needs to be offset elsewhere. So, the real consequence of these increases in storage cost has to be a decrease in the amount of science done (fewer staff/instruments/computers), and that's unpalatable even if it's unavoidable. At the moment many choose to think it is avoidable, which is quite a feasible position if you're an ostrich, but not so good if you're responsible for delivering science!
In this post, and maybe a couple to follow, I want to address why data handling is becoming a bigger deal in environmental science, and why we can't avoid spending more money on it (but also how we can avoid spending more than we need to).
So, what are the factors in play? Well, as well as the background economics, there are four:
The direct influence of Moore's Law on instrumentation and simulation (finer resolution in space and time means more numbers),
The indirect influence of Moore's Law on what can be simulated (more compute means more things are computable),
The growth of interdisciplinarity (more things need to be compared and contrasted) and more people are doing it, and
The relationship between Moore's Law and Kryder's Law (is the cost of storage falling as rapidly as the cost of creating the numbers to be stored?).
Before delving into the technical details, I want to look at one of the underlying scientific trends that arises, partially as a consequence of the "indirect influence of Moore's Law": the rise and rise of direct numerical simulation, especially in the environmental sciences.
As noted above, more computability means more things are computable, and when we couple that to increased mathematical sophistication, more and more of the real world is amenable to direct numerical simulation: that is, it can be numerically simulated from fundamental equations rather than approximated by heuristics. The importance of this from a scientific point of view is that if one believes one is simulating the underlying processes properly, one can use the "simulation system" to predict how the system will behave under different circumstances than have been observed (either by coupling it into more complex systems or by using the system to predict past or future behaviour).
To some extent this is the holy grail of science: when one can simulate a system so well that one can't tell whether one is observing a simulation or the real world, one can believe we understand that part of the real world.
For some time I have been asserting that an ever greater part of environmental science is engaging in direct numerical simulation (DNS), year on year. At the same time, I've been asserting that larger and wider communities were interacting around data, and especially around the data from model intercomparison projects, again, year on year. Of course, these assertions were not unrelated!
These assertions were based primarily on my interactions with the scientific community (one of the things about running an environmental data centre for much more than a decade is that my day job has involved interacting with individuals from across the scientific spectrum), and so one might dispute them. However, a year or so ago, I realised I might be able to get some quantitative information to support them by a bit of careful text mining. Unfortunately, I also realised I was never going to get the time to do it properly, so what follows is very amateur, but I hope still interesting. (If you have the skills and access to the data to do this properly, get in touch!)
The following figure is generated by spending a lot of time doing searches on Google Scholar (it would have been rather less time if Google didn't actively stop one doing this sort of work programmatically - I did try, and it worked fine until their "no robot code" stopped me in my tracks). Each point reflects the number of hits from a search on a specific set of terms from material restricted to two years. The terms were chosen to try and reflect five specific categories of interaction:
direct numerical simulation (across any discipline) - "dns" in the figure,
direct numerical simulation in the environmental sciences - "dns+env" in the figure,
model intercomparison projects - "mips",
use of satellite data in environmental science - "sats", and
regular observations - "sondes".
(The table at the end of this post gives the details of the exact searches carried out.)
The numbers in the legend are firstly the ratio of the last couple of points over the first couple of points - a measure of proportional growth, and secondly, the gradient from a fit to the number of hits per annum - a measure of absolute growth.
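These two legend numbers are simple to compute; here is a minimal sketch, using made-up illustrative hit counts rather than the actual Google Scholar results (the function name and example values are mine, not from the original analysis):

```python
import numpy as np

def growth_measures(years, hits):
    """Return (proportional growth, absolute growth) for a series of annual hit counts.

    Proportional growth: mean of the last two points over the mean of the first two.
    Absolute growth: gradient (hits per annum) from a linear least-squares fit.
    """
    years = np.asarray(years, dtype=float)
    hits = np.asarray(hits, dtype=float)
    ratio = hits[-2:].mean() / hits[:2].mean()
    gradient = np.polyfit(years, hits, 1)[0]  # slope of the straight-line fit
    return ratio, gradient

# Illustrative numbers only:
years = [2004, 2006, 2008, 2010, 2012, 2014]
hits = [100, 140, 190, 260, 350, 480]
ratio, gradient = growth_measures(years, hits)
```

Note the two measures can disagree: a small category growing from a tiny base can have a huge ratio but a modest gradient, which is why both are quoted in the legend.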
The figures back my observations rather nicely:
The number of papers on any facet of environmental science is growing. (No news there).
The number of papers using direct numerical simulation is growing rapidly, but the use of DNS in environmental sciences is growing even more rapidly, at least using the proportional measure.
The increase in papers which use MIP data is explosive, and one can see the direct influence of CMIP5 in the numbers.
Growth in observational science is slower than in numerical science, although the effect of increased availability of satellite data is apparent.
Obviously these conclusions could be heavily affected by the search terms I used, so if your mileage varies, let me know!
For the record, these are the exact (full text) searches used:
dns: search for the exact phrase "direct numerical simulation"
"dns+env": as above, but additionally require at least one of the following: cloud, rain, weather, climate, ocean, atmosphere, land, river, biogeochemistry, aerosol.
"sats": search for an exact match on "nadir sounder" (as a proxy for atmospheric satellite data only, and using that as a proxy for environmental science use of satellites in general).
"mips": exact match for "model intercomparison project" AND the use of the word simulation AND either atmosphere OR ocean.
"sondes": at least one of "radiosonde" or "dropsonde" appears (being a proxy for any sort of "traditional" observation - words like Lidar and Radar being too difficult to limit to environmental science, at least in this first cut at the problem).
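For concreteness, the five searches above might be written down as query strings something like this; this is my hedged reconstruction assuming Google Scholar's usual syntax (quoted exact phrases, OR between alternatives), not the literal queries used:

```python
# Terms required alongside "direct numerical simulation" for the env variant.
env_terms = ["cloud", "rain", "weather", "climate", "ocean", "atmosphere",
             "land", "river", "biogeochemistry", "aerosol"]

# Category names match the figure legend; query syntax is an assumption.
queries = {
    "dns": '"direct numerical simulation"',
    "dns+env": '"direct numerical simulation" ({})'.format(" OR ".join(env_terms)),
    "sats": '"nadir sounder"',
    "mips": '"model intercomparison project" simulation (atmosphere OR ocean)',
    "sondes": "radiosonde OR dropsonde",
}
```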
by Bryan Lawrence : 2017/02/05 : 0 comments (permalink)
Nothing exciting to report. I spent nearly the entire week processing email, interacting with my team (both directly and on slack), and producing short things (e.g. a management level one pager on why NCAS should continue to support the CF conventions, and how). However, I did spend a wee bit of time trying to reinforce some of my observations around why storage is becoming a much bigger deal in environmental science. More on that next ...
by Bryan Lawrence : 2017/02/05 (permalink)
Not much to report for week four, since the work I did this week was mostly "processing email" - although we did submit the outline EPSRC bid, so one thing actually done (for now). The lack of anything else substantial was down to still being in the States in the early part of the week, coupled with a virulent bout of (probably) food poisoning which knocked me out for a couple of days.
by Bryan Lawrence : 2017/01/29 (permalink)
Quite a different week, for different reasons.
Firstly, the work. Up before the larks on Monday to take my daughter to school and then on by Eurostar to Paris. Monday through Wednesday this week was the final general assembly for the IS-ENES2 project: the second "Infrastructure (to support) the European Network for Earth System Modelling".
IS-ENES2 is a pretty important project to European climate science, although for a lot of folks it's invisible, but it has two important facets:
data-wise, for us, it has underwritten our support for the CF NetCDF conventions, the support for constructing the data request for CMIP6, and the entire es-doc initiative to document CMIP climate models and their simulations. IS-ENES2 has been the major supporter of the ESGF in Europe.
hpc-modelling-wise, it has underwritten work on devising plans for future model infrastructures, from workflows and couplers to the model codes themselves. Looking forward, much of that part is morphing into ESIWACE, but significant elements of support for current "production" climate science have been supported by IS-ENES2, and are not included within ESIWACE.
I had a couple of key roles in this meeting, which was held in a "room with a view" at the top of a tower in UPMC in Paris:
I've been helping coordinate a mid-term update to an ENES infrastructure strategy, and I've also been working on coding issues for future models, so I gave talks on both aspects.
Unfortunately I had to leave before the end of the Tuesday, and couldn't be there for the Wednesday. I had to head back to Blighty to work on the EPSRC bid, before flying to the states on Thursday. So Wed was mostly about the EPSRC bid, although I fitted in a bunch of other small things around that.
The US trip is a family thing, so I won't say much about that, but because I had a long day flight, and because although I need to be in the States I don't need to be "off work" all the time, I've got a lot of other work done ...
... catching up on a lot of reading around data analytic futures suitable for JASMIN, as well as learning about some new software to use for information management. Sometime I'll blog about that too, but not now. I've also caught up on a lot of other bits and bobs.
I also managed a couple of long telcos on Friday, one on HPC futures around storage for the NERC community (tied up in the JASMIN funding I've been talking about), and one on making some measurements to help plan a migration away from parallel file systems to object store disk. That's a big story for another day too!
by Bryan Lawrence : 2017/01/23 : 0 comments (permalink)
Another week, another load of paperwork written ... another week that didn't feel much like science except for the Intel bit which at least was interesting ...
Pretty much the same topics as last week on my mind:
(Foresight) Last week I didn't spend any time on "Foresight", this week I did! This is the mid-term update to the European Network for Earth System Simulation (ENES) infrastructure strategy from 2012 (pdf). We held a meeting last year in October, and I'm coordinating the update, but it's been on the back burner because of other commitments. However, we're talking about it on Tuesday (i.e. in a couple of days), so I had to push on with it this week. I got a skeleton structure for a document done and discussed it with some colleagues.
(C3S Magic Lot2) I didn't spend any time on this last week either, but this week I spent a couple of hours on it in the context of our (CEDA) CP4CDS contract with ECMWF to supply ESGF data to the Copernicus Climate Services project (Lot 1). This activity deserves its own blog post and will get one when I come up for air ... but meanwhile, just to say that this week was about the interaction between Lot2 (where I have a UoR involvement), which is about providing code to run in the climate services system that will be delivered by Lot1, and Lot1 itself.
(Chasm) Spent a couple of hours today preparing, for tomorrow, a summary presentation of the outputs from our Chasm workshop, also held in October last year. This is about the future of how we programme climate models and their infrastructure. It's not going to be easy!
(EPSRC Data Science Bid) A lot more time on that this week, impacts, objectives, updates to the outline, and some iterations around effort and finances.
(JASMIN Funding) Updated the brief for NERC with more details about the science programme consequences associated with the various financial and technical scenarios, and dealt with some of the consequential questions.
One new thing this week. Spent a day in Hamburg getting a restricted secret (!) briefing from Intel about their future plans. Very interesting stuff, none of which I can talk about, suffice to say I worry about programmability of next generation architectures (that's no secret, I worry about how we programme current architectures such as KNL, and our entire Chasm activity is about this issue ...). I think this is an oncoming train which much of the environmental science community is treating in the best traditions of ostrich escapism (collective heads in the sand).
As always on the Hamburg metro, I notice how everyone seems just a bit more relaxed than on the equivalent journey in the UK. It might be just that there are fewer people in the carriage, but then I visit DKRZ and again people just seem less hassled, so it's more than that. I get the impression that they still actually fund new things in Germany, rather than just asking people to carry on doing the old things and do new things with no new money; and as a consequence people have sensible workloads, unlike here (he says at the end of a 63-hour working week). Things seem to get done, and maintained ...
Of the other things I talked about last week, I progressed most of them in some small way as part of my normal email flow - I spent hours more on email this week ...
by Bryan Lawrence : 2017/01/15 : 0 comments (permalink)