Quite a different week, for different reasons.
Firstly, the work. Up before the larks on Monday to take my daughter to school and then on by Eurostar to Paris. Monday through Wednesday this week was the final general assembly for the IS-ENES2 project: the second "Infrastructure (to support) the European Network for Earth System Modelling".
IS-ENES2 is a pretty important project for European climate science, even though for a lot of folks it's invisible. It has two important facets:
data-wise, for us, it has underwritten our support for the CF NetCDF conventions, the construction of the data request for CMIP6, and the entire es-doc initiative to document CMIP climate models and their simulations. IS-ENES2 has also been the major supporter of the ESGF in Europe.
hpc-modelling-wise, it has underwritten work on devising plans for future model infrastructures, from workflows and couplers to the model codes themselves. Looking forward, much of that part is morphing into ESIWACE, but significant elements of support for current "production" climate science have been supported by IS-ENES2 and are not included within ESIWACE.
I had a couple of key roles in this meeting, which was held in a "room with a view" at the top of a tower in UPMC in Paris:
I've been helping coordinate a mid-term update to an ENES infrastructure strategy, and I've also been working on coding issues for future models, so I gave talks on both aspects.
Unfortunately I had to leave before the end of Tuesday, and couldn't be there on the Wednesday. I had to head back to Blighty to work on the EPSRC bid, before flying to the States on Thursday. So Wednesday was mostly about the EPSRC bid, although I fitted in a bunch of other small things around that.
The US trip is a family thing, so I won't say much about that, but because I had a long day flight, and because although I need to be in the States I don't need to be "off work" all the time, I've got a lot of other work done ...
... catching up on a lot of reading around data analytic futures suitable for JASMIN, as well as learning about some new software to use for information management. Sometime I'll blog about that too, but not now. I've also caught up on a lot of other bits and bobs.
I also managed a couple of long telcos on Friday, one on HPC futures around storage for the NERC community (tied up in the JASMIN funding I've been talking about), and one on making some measurements to help plan a migration away from parallel file systems to object store disk. That's a big story for another day too!
by Bryan Lawrence : 2017/01/23 : 0 comments (permalink)
Another week, another load of paperwork written ... another week that didn't feel much like science except for the Intel bit which at least was interesting ...
Pretty much the same topics as last week on my mind:
(Foresight) Last week I didn't spend any time on "Foresight", this week I did! This is the mid-term update to the European Network for Earth System Simulation (ENES) infrastructure strategy from 2012 (pdf). We held a meeting last year in October, and I'm coordinating the update, but it's been on the back burner because of other commitments. However, we're talking about it on Tuesday (i.e. in a couple of days), so I had to push on with it this week. I got a skeleton structure for a document done and discussed it with some colleagues.
(C3S Magic Lot2) I didn't spend any time on this last week either, but this week I spent a couple of hours on it in the context of our (CEDA) CP4CDS contract with ECMWF to supply ESGF data to the Copernicus Climate Services project (Lot 1). This activity deserves its own blog post and will get one when I come up for air ... but meanwhile, just to say that this week was about the interaction between Lot2 (where I have a UoR involvement), which is about providing code to run in the climate services system that will be delivered by Lot1, and Lot1 itself.
(Chasm) I spent a couple of hours today preparing a summary presentation, for tomorrow, of the outputs from our Chasm workshop, also held in October last year. This is about the future of how we programme climate models and their infrastructure. It's not going to be easy!
(EPSRC Data Science Bid) A lot more time on that this week, impacts, objectives, updates to the outline, and some iterations around effort and finances.
(JASMIN Funding) Updated the brief for NERC with more details about the science programme consequences associated with the various financial and technical scenarios, and dealt with some of the consequential questions.
One new thing this week. Spent a day in Hamburg getting a restricted secret (!) briefing from Intel about their future plans. Very interesting stuff, none of which I can talk about, suffice to say I worry about programmability of next generation architectures (that's no secret, I worry about how we programme current architectures such as KNL, and our entire Chasm activity is about this issue ...). I think this is an oncoming train which much of the environmental science community is treating in the best traditions of ostrich escapism (collective heads in the sand).
As always on the Hamburg metro I notice how everyone seems just a bit more relaxed than on the equivalent journey in the UK. It might be just that there are fewer people in the carriage, but then I visit DKRZ and again people just seem less hassled, so it's more than that. I get the impression that they still actually fund new things in Germany, rather than just asking people to carry on doing the old things and to do new things with no new money; and as a consequence people have sensible workloads, unlike here (he says at the end of a 63 hour working week). Things seem to get done, and maintained ...
Of the other things I talked about last week, I progressed most of them in some small way as part of my normal email flow - I spent hours more on email this week ...
by Bryan Lawrence : 2017/01/15 : 0 comments (permalink)
So this is the end of week one of 2017. What did I spend the week doing?
Well, to answer that, I'm going to start by going back a couple of years: back then, in Jan and Feb 2015 I was looking at Pagico as a "getting things done" (GTD) tool. I ended up not choosing Pagico, but I did within a few weeks settle on using nozbe ...
... and I'm still using it, nearly every day! The picture above is a screenshot of my active projects this afternoon, most of which arose from deadlines from last year.
Without (today) getting into the details of how I use Nozbe, I'll just say that this was the list of projects which, at the beginning of the week, I expected I might want, or need, to be working on during the week. (There are loads of other projects I'm working on, or need to work on, that are managed in my Nozbe, but I didn't expect to have to put a lot of work into them this week, although the odd task did come up and get done.) Of course, in the best traditions of what Harold Macmillan might have said, and Helmuth von Moltke did say, my plans don't always survive the inbound email and the demands therein, but you have to start somewhere ...
As it happened, I didn't need to work on all of those things this week (LTS), or didn't find the time (C3S Magic Lot2, chasm and foresight) - so I'll explain those a bit more when I do have something to say, which in the case of chasm and foresight will of necessity be after next week, because things have to be done on both next week.
So this week, I did spend time on:
(Reorganising Bryan) Early Monday morning I put the finishing touches on a proposal for how my job description should be rewritten (I'll have a lot more to say about this another day, but this is part of the "consequences" I alluded to in my last post).
(NC Commissioning of CEDA) We (CEDA) had been asked to produce a five page vision statement on the strategic need for NERC's national capability spending on data management and it was due on Friday. My colleagues had produced some bullet points I might want to consider in this document, and on Monday (yes, that public holiday Monday) and Tuesday I produced the document in time to circulate it to colleagues late Tuesday for feedback on Wednesday. I finished it and submitted it on Thursday. This is part of a large body of work which has been carried out over the last couple of years to allow NERC to change the way it commissions its data management support. There will be more to do on this before the new commissioning is complete ... but for the moment I have no active tasks in this project. Bliss.
(HPC Replacement) I am responsible for a grab bag of activities associated with HPC replacement for NERC. This week the major task was producing a draft summary for NERC of options to share a new HPC platform with the Met Office to replace the existing MONSOON machine which is due to be turned off at the end of March. I sent the new draft off to Met Office colleagues for fact checking ... (MONSOON is a shared HPC platform to allow model development under the auspices of the Joint Weather and Climate Research Programme.)
(CF) NCAS invests considerably in supporting the Climate and Forecast conventions for NetCDF. As part of preparing for commissioning that activity within Long Term Science (LTS, so now you know what that acronym means), key NCAS staff had a three hour meeting on Wednesday morning. It was a wide ranging discussion, covering many things, and leaving us all with some actions to help support CMIP6, as well as deal with the commissioning activity.
(EPSRC Data Science Bid) I am leading a bid from the University of Reading, STFC, and NCAS, for EPSRC funding for data science. At this stage we need an outline bid. I finished that off and circulated it to colleagues as well as had several physical and virtual conversations about it.
(PhD supervision) I try to meet at least weekly with my PhD student. Sometimes I have to do things, sometimes I have to chase up on things. This week was really easy, I just had to sit and listen to the cool things he'd done since I saw him before Christmas.
(CF Data Model Paper) We have a paper, nearing completion, which tries to clarify a common understanding of some aspects of CF. I didn't have anything I needed to do at the beginning of the week, and the outstanding tasks might not ever need doing, but I still spent some time talking about it with my co-authors and contributing to some new diagrams - it had to be a priority because, as one gets really near submission, it's important that we can discuss things together while we all have the details at our fingertips (i.e. at the top of our mental stacks).
(JASMIN Funding) In many ways the most significant thing I have done in recent years is instigate and lead on the delivery of the JASMIN supercomputer. The big issue at the moment is hardware replacement and the necessary upgrades to storage which follow from everyone's pesky habit of creating more data. This week I've been producing technical and financial scenarios associated with a range of possible funding futures. This is tedious but necessary stuff, and took most of Thursday, Friday, and a few hours today (Sunday).
I also processed a lot of incoming email. "Processed" in the sense that I did one of the following things with all of it (thanks to Nozbe I have nearly email inbox zero):
I deleted it.
I filed it (in evernote, or in gmail archive)
I responded to it (which may or may not have required some work)
I put it in Nozbe to do at some future time.
Not a very exciting week, but a lot of important (I think) stuff done. I hope that some of the things I have to say in future blog posts will be more interesting ...
by Bryan Lawrence : 2017/01/08 : 0 comments (permalink)
Back to the Future - I think therefore I WILL blog
(with apologies to Descartes :-)
It looks like I haven't written a blog post for more than eighteen months. That's really sad on a number of levels; I think it reflects a combination of my workload in terms of both volume and content.
I want to get back to blogging. I am a scientist (still, just), I should be communicating about what I'm doing because that'll help me do it better (and help society get value from its investment in me - that might be a bit of a pompous statement, but it's true I think).
Much of what I have done in the last year has been writing various management documents about data, HPC, and finances. I have done some other bits, but they've either been as part of things that have actually turned up as papers or talks (if you look at my publication list and talks page you'll see that there has been activity, even though my blog has been a bit empty), or as things that I haven't felt I could publicise because of the interests of other people (my student, my colleagues in a couple of projects etc).
On top of those content issues, there have been volume issues. Much of what I have done has been against deadlines, often with little warning, and coming from multiple directions at once. I kept thinking I would get a breather, but it hasn't turned out that way, and so for the entire year I was under pressure and feeling knackered; lots of things had to give, and the blog was one of them. If nothing else I could and should have blogged about the papers and talks, but even that seemed too much.
Clearly the workload thing is problematic (and there have been consequences which I hope to discuss here anon), but not blogging is problematic in its own right. I think the lack of blogging has been detrimental to the delivery of my job itself. I believe that when I was blogging it was good for communication, it helped me learn, and it helped me organise my thinking about things that have actually ended up in papers and production services (at CEDA etc). I have missed out on all those things by not blogging, although some of the management work has been surprisingly good for helping me organise thinking, just as some of it has been completely nugatory. (I should blog about the good and bad of my recent management experiences sometime; there were some lessons worth sharing!)
It used to be that I also used this place for notes, but it's clear that that role has been supplanted by evernote, project wikis (closed and public), and ipython notebooks (I can't get used to saying "jupyter", is that an age thing?). In the future I'd like to host notebooks alongside my blog, and include some blog articles as notebooks, but that'll require a technology change ...
So, in the best traditions of new year's resolutions, I have committed to myself to try and get at least one blog article out a week (except when on holiday) - if nothing else I'll try and give a bland summary of the week that was. Look out for week one coming here soon!
by Bryan Lawrence : 2017/01/07 : 1 comment (permalink)
playing with docker
From time to time I get a very short opportunity to try and do some science, and I find the context switching harder and harder. To that end, I want to make more use of ipython-notebook.
Nowadays my compute environment is a MacBook Pro running Mavericks, and I have two VMs built: a JASMIN Analysis Platform (JAP) image (based on CentOS, for science) and a Linux Mint image (primarily to give me a route to a reliable LibreOffice - unlike the version running on the Mac). Both can run ipython notebook, but I couldn't work out how to make that visible to browsers running in my Mac environment (which is what I wanted this time).
I could probably have worked that out, but I thought, it's about time I got some hands on experience with docker, since everyone is raving about it, and folks in my team are also starting to use it ... so why not try that route?
Herewith a couple of hours on a Saturday afternoon and another hour or so on a Sunday morning:
I got boot2docker working, and then I thought I'd try out the continuumio anaconda image ... but I immediately discovered that it didn't have netcdf4 and basemap by default (and that matplotlib was broken), so herewith my first Dockerfile:
# aims to run basemap, and eventually, cf-python
# the anaconda base image is itself based on debian
FROM continuumio/anaconda
MAINTAINER Bryan Lawrence <email@example.com>
# as of May 10, 2015 the base image needs this to work with matplotlib:
RUN apt-get -y install libglib2.0-0
# now the stuff we want
RUN conda install netcdf4
RUN conda install basemap
I was able to build that using
docker build -t bnlawrence/cfconda .
and run it using:
docker run -it -v $(pwd):/usr/data -w /usr/data -p 8888:8888 bnlawrence/cfconda
(from within a directory on the mac where I wanted my notebooks to reside.) I then have to run
ipython notebook --ip=0.0.0.0 --no-browser
inside the container, whereupon, as if by magic, I can access my notebooks on the mac at
(I would have liked to have run the notebook directly on the end of the docker run statement, but when I do that, the notebook kernel seems to be really unstable and repeatedly crashes. I don't know why.)
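One obvious way to avoid the extra step would be to bake the command into the image with a CMD line - hypothetical and untested, given the kernel instability just described:

```dockerfile
# hypothetical addition to the Dockerfile above - untested, see crash note
CMD ["ipython", "notebook", "--ip=0.0.0.0", "--no-browser"]
```

(The exec form avoids a shell wrapper, so signals reach the notebook process directly; whether that helps with the crashes, I can't say.)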
Now, hopefully I can start using ipython notebook during my working week ...
by Bryan Lawrence : 2015/05/10 : 0 comments (permalink)
A citation and provenance system for climate modelling
What would a modelling citation and provenance system need to do?
We start from the assumption that I will be accessing files from a local "copy" of some files of data, and that I have a subset of files that I've used for a particular problem.
So, I have to describe that compendium of data, which means I need a tool which identifies which data I used ... It needs to be able to do something notionally like:
makecite "list of files" > provenance.list
What's actually in provenance.list should be a list of permanent identifiers to data actually used, not the data itself.
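None of these tools exist yet, but a makecite might look something like the sketch below. It assumes each file carries a permanent identifier in a global attribute (as CMIP netCDF files do with tracking_id); here the attribute lookup is stubbed with a dict, and the filenames and identifiers are all invented, so only the shape of the tool is real:

```python
# Sketch of a hypothetical "makecite": map local files to permanent identifiers.
# Real code would read a tracking_id global attribute with netCDF4; here the
# lookup is stubbed with a dict so the overall shape is clear.

def tracking_id(path, _stub={"tas_day1.nc": "hdl:21.14100/aaa-111",
                             "tas_day2.nc": "hdl:21.14100/bbb-222"}):
    """Return the permanent identifier recorded inside a data file."""
    return _stub[path]

def makecite(files):
    """Return the provenance list: one identifier per file, duplicates removed."""
    seen, ids = set(), []
    for f in files:
        pid = tracking_id(f)
        if pid not in seen:
            seen.add(pid)
            ids.append(pid)
    return ids

if __name__ == "__main__":
    # makecite "list of files" > provenance.list
    for pid in makecite(["tas_day1.nc", "tas_day2.nc"]):
        print(pid)
```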
I expect I will want to cite this provenance.list in my publication, so the provenance list itself should be a (published) dataset, with an identifier. So, there needs to be a way of describing and publishing my provenance.list.
Now you, reading my paper, need to be able to obtain and use that provenance list. Assuming my provenance.list has a DOI, getting it should be straightforward (it should be small).
Now you need a tool which allows you to use the provenance list to get the relevant data or check that you already have it, something like:
which should result in a set of files, or
might confirm that you have those files. Alternatively (or additionally)
might give you (or me) an updated set of versions for the same datasets ...
That user story is very file-centric. We could probably make it more "data-centric" by, for example, including opendap urls to bounding boxes, but as it stands it's very simple, and hopefully doable (none of these tools actually exist!)
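The checking half of the story might be sketched like this - the function name, the identifier scheme, and the local index are all invented for illustration:

```python
# Sketch of a hypothetical checking tool: given a provenance list of permanent
# identifiers, report which data you already hold locally and which you would
# still need to fetch. The index mapping identifiers to local files is invented.

LOCAL_INDEX = {  # what a local archive index might record
    "hdl:21.14100/aaa-111": "tas_day1.nc",
}

def check_provenance(provenance_ids):
    """Split a provenance list into (have, missing) identifier lists."""
    have = [pid for pid in provenance_ids if pid in LOCAL_INDEX]
    missing = [pid for pid in provenance_ids if pid not in LOCAL_INDEX]
    return have, missing

if __name__ == "__main__":
    have, missing = check_provenance(["hdl:21.14100/aaa-111",
                                      "hdl:21.14100/bbb-222"])
    print("have:", have)
    print("missing:", missing)
```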
This story doesn't address credit, but it does address scientific repeatability!
So what to do about credit? We could of course pull out of the list of permanent identifiers a list of contributing simulations.
What to do with it? Do we believe it will be possible to go from those simulation identifiers to appropriate "traditional papers"? In principle yes, in practice no. We can expect to do this exercise before the appropriate formal scientific model and simulation description papers have even been written!
So, can one use "data" DOIs? It rather depends on whether we believe an appropriate data publication system is in place and on an appropriate granularity. However it too may not be in place when the citation is necessary.
However, that's a very traditional way of thinking: that we have to give the modelling group credit by putting a traditional citation to them in my paper. If one has a more altmetrics focus, we should be happy that the metrics can be calculated; we don't have to have the right way of doing it a priori!
by Bryan Lawrence : 2015/03/02 : 1 comment (permalink)
Pagico Experience at week one.
Ok, I promised to report my experience with pagico at week one.
I really like the dashboard view, and the must-do and might-do lists ... I found them a really good way of thinking about what I need to do next. However, the bottom line is that it's harder than I'd like to get information into Pagico, and to move between Pagico's view and other views of information (in particular, Evernote).
Email integration: I did go ahead and route my gmail through Apple Mail to investigate. I can drag and drop an email from Apple Mail into Pagico, but it ends up as an item in a collection of items for the Project. So, if I understand it correctly, if I want to postpone answering an email because it'll take more than a couple of minutes, the expected workflow is: drag and drop it into a project collection, then create a task, then give it a due date. Not really frictionless; I really wanted emails to become tasks with minimal intervention.
However, it looks like there is a better option for (Apple/Outlook) mail integration if you have mailtags installed. I don't, but I suspect I would if I lived in Apple Mail land. (Actually, if you live in Apple email land, and email is the main issue for you in terms of GTD, mailtags might be of interest in its own right.)
Evernote integration is simply via the ability to drag and drop links, which point to the browser version and always open in a tab that requires logging in. Compared to competitors, this isn't really integration.
Drag and drop feels incomplete. In some views, I can easily drag files onto tasks, but in other places, come what may, I can't help dragging them onto the parent project. I found that frustrating ...
So, my final feeling with Pagico - at the moment - is that it has a really good interface for task management, but until they fix email and evernote integration properly, and deal with the drag and drop issues, the friction of getting information into Pagico is just too high. I'm going to look elsewhere (and hope I can get all my tasks out of my trial version of Pagico). I could easily be persuaded to come back if they sorted the evernote/email integration.
Still, I'm inspired enough to keep on with GTD tools, so I expect there will be more to report anon.
by Bryan Lawrence : 2015/02/01 : 0 trackbacks (permalink)
Pagico and getting things done
Getting Things Done. GTD.
Why? Well, I'm continually feeling hassled by the number of things I'm trying to keep track of, the size of my inbox, etc., so I've long been looking for a good GTD tool.
Years ago I used Remember the Milk (successfully, for some months, but in the end it couldn't deal with the complexity of information I wanted to store in it). I've used a range of notes tools, and I'm currently using Evernote. I use it for everything of course, but for GTD I just use a weekly to-do list with check boxes; it doesn't organise things ... and I find I ignore the reminders ... so is there something better out there?
Well, Pagico on Eric's screen certainly looked like it. I particularly liked the someday tasks that show up for tomorrow. I liked the "dashboard" (pseudo-Gantt) ... and when I read about it I liked the idea of Evernote and email integration. So, as I said, I spent some hours with it. Of course, in doing so, I did a bit of googling ... and started wondering whether Pagico was really what I want.
However, meanwhile, in just trying to work out how to use Pagico, I did some really useful thinking about how to organise my workflow into task lists, tasks, projects and collections. I (manually) moved a bunch of emails into Pagico (and archived them in Gmail). I archived everything else. I achieved inbox zero for the first time in, well, it seems like forever (certainly at least a year). The lesson I take from that particular exercise is that the "organise" part of GTD is incredibly important, and probably independent of the tool (provided it has at least three levels of hierarchy).
Experience with Pagico itself? Well, I had some little glitches I didn't like, so on Saturday I wrote to the developer. On Sunday I had a reply, I replied, he replied. Blimey, that's responsive (and I made it clear I was only using the trial version and might not buy). Sounds like he'll fix some of the things I didn't like/wanted. Blimey again.
At this point I have a lot of actions in a few projects. We'll see how the week goes, but I'm already a bit disappointed in that the evernote integration is weak - one only gets to drag a web-link in, so it doesn't work with the native (mac) application I use. Also, my email is in gmail. I haven't found a way of marking an email as a task, or of dragging an email into Pagico. For me that's absolutely crucial. I suppose I could load gmail into the Mac email app (from where d&d apparently works), but I am rather partial to google's filtering into priority inbox etc ... (but maybe I don't need that with a good GTD tool).
So, just this few hours of playing with Pagico has made me realise that: I do need a really good GTD tool, even the thinking that this GTD tool made me do was useful, but it needs evernote, email and calendar integration. Oh, and it has to have a good android interface on phone and tablet. Will it be Pagico, or will it be something else?
As I said above, I did some googling, and in doing so I discovered that the whole GTD world has moved on a lot since I last paid attention. purplezengoat, for example, has a very interesting list of tools to think about. I intend to think about them, possibly concentrating on iqtell and zendone. But I'm going to give Pagico a week. Stay tuned.
by Bryan Lawrence : 2015/01/25 : 1 trackback : 2 comments (permalink)
I was honoured by the informatics section of the American Geophysical Union this year by being awarded the Leptoukh Lecture. The abstract and talk itself are on my talks page.
by Bryan Lawrence : 2014/12/18 (permalink)
Building your own JASMIN Virtual Machine
I make a good deal of use of the JASMIN science virtual machines, but sometimes I just want to do something locally for testing. Fortunately you can build your own virtual machine using the "JASMIN Analysis Platform" (JAP) to get the same base files.
Here's my experience building a JAP instance in a VMware Fusion virtual machine (I have a Macbook, but I have thus far done all the heavy lifting inside a linux mint virtual machine ... but the JAP needs a centos or redhat base machine, hence this).
Step One: Base Virtual Machine
We want a base linux virtual machine on which we build the JAP.
Start by downloading a suitable base linux installation (Centos or RedHat). Here is one I got some time ago: CentOS-6.5-x86_64-bin-DVD1.iso
From VMware Fusion choose File > New, double click on the "Install from Disc or Image" option, and find your .iso from the previous step.
Inside the Linux easy install, configure your startup account.
You might want to configure the settings. I chose to give mine 2 cores and 4 GB of memory and access to some shared folders with the host.
Start your virtual Machine.
(Ignore the message about unsupported hardware by clicking OK)
Wait ... do something else ...
(This is a good place to take a snapshot of the bare machine if you have the available disk space. Snapshots take up as much disk as you asked for memory.)
Step Two: Install the JAP
Following the instructions from here: there are effectively three steps plus two wrinkles. The three steps are: get the Extra Packages for Enterprise Linux (EPEL) repository into your config path; get the CEDA JAP repository into your config path; and build. Then the wrinkles: the build currently fails! However, the fixes to make it build are pretty trivial.
Open up a terminal window and su to root.
Follow the three steps on the installation page, then you'll see something like this:
--> Finished Dependency Resolution
Error: Package: gdal-ruby-1.9.2-1.ceda.el6.x86_64 (ceda)
       Requires: libarmadillo.so.3()(64bit)
...
Error: Package: grib_api-1.12.1-1.el6.x86_64 (epel)
       Requires: libnetcdf.so.6()(64bit)
...
Not found
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
But never fear, two easy fixes are documented here. You need to:
Force the install to use the CEDA grib_api, not the EPEL version. You do that by putting
at the end of the first (EPEL) section in the /etc/yum.repos.d/epel.repo file, and
Add the missing (older version of the) armadillo library by downloading the binary rpm on the ticket and installing it locally. Then you can redo the final step:
yum install jasmin-sci-vm
And stand back and wait. You'll soon have a jasmin-sci-vm.
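(One note on the first fix above: I haven't reproduced the exact exclusion line - it's documented on the ticket - but a yum repo exclusion normally takes this shape, the package pattern here being my guess rather than the real thing:)

```
# hypothetical - the exact line is on the ticket
exclude=grib_api*
```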