4th MOBILISE Training School: “Next Step in the Digitisation Process of Natural History collections: Publishing of Biological, Geological, Palaeontological and Mineralogical data”.

Written by Olivia Beavers, Assistant Curator of Vertebrate Zoology at World Museum, National Museums Liverpool.

In December I was selected, along with 17 others, to attend the 4th MOBILISE Action Training School held in Brussels, 6–7th February. This training school gave students and professionals from Natural History institutions across Europe and Israel an opportunity to learn more about publishing our collections’ datasets. A crucial aim of the training school was to learn how to map data to the Darwin Core Standard and, in doing so, create a Darwin Core Archive file to be uploaded to GBIF.

The Training School consisted of two parts: the first was an introduction to the group. This was conducted online and addressed theoretical issues associated with our datasets. Part 2 was the two-day, face-to-face trip to Brussels to clean and validate our data so it was ready for publishing on GBIF (for biological data) or GeoCASe (for geological, palaeontological and/or mineralogical data).

Figure 1: A group photo of the attendees and group leaders at the 4th MOBILISE Action Training School, Brussels ©Katerina Voreadou

Brussels Face to Face

Most of Day 1 was spent mapping the data using a tool called OpenRefine and getting it ready to create the Darwin Core Archive file. As many of us are aware, cleaning datasets to get them ready for publishing on platforms like GBIF can be quite time consuming. The first step was to extract our own datasets, in my case bird skins from Jamaica, and save the file in Microsoft Excel or CSV format. We then imported the file into OpenRefine for editing. In OpenRefine, we checked our data’s column headings and their values and compared them to the Darwin Core column headings/terms. Most of the columns contained the correct data – and just needed a little polishing.
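The renaming step can be sketched in Python with pandas (a stand-in here for what OpenRefine does interactively; the local column headings and the single sample record are made-up illustrations, while the target names are genuine Darwin Core terms):

```python
import pandas as pd

# Hypothetical museum export with local column headings
df = pd.DataFrame({
    "Species": ["Todus todus"],
    "Date Collected": ["1899-04-12"],
    "Locality": ["Jamaica"],
})

# Map the local headings onto the matching Darwin Core terms
# (the terms are case-sensitive, e.g. "scientificName", "eventDate")
dwc_mapping = {
    "Species": "scientificName",
    "Date Collected": "eventDate",
    "Locality": "locality",
}
df = df.rename(columns=dwc_mapping)

print(list(df.columns))
```

In OpenRefine the same result is achieved by editing each column header by hand; the point is simply that the values stay untouched while the headings take on the standard Darwin Core spelling.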

Figure 2: The group working through datasets and creating Darwin Core Archive files ©Katerina Voreadou

There were five main columns that were essential to include: scientificName, eventDate (i.e. date collected: YYYY-MM-DD), basisOfRecord (e.g. PreservedSpecimen, HumanObservation, FossilSpecimen, LivingSpecimen, MachineObservation), kingdom (e.g. Animalia, Plantae, Fungi) and location (latitude and longitude). It was mostly a case of changing the header names to match the Darwin Core terms (including their exact upper and lower case), so that the terms could be recognised later on in GBIF.
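A minimal sketch of that check, assuming the latitude/longitude pair is held in the Darwin Core terms decimalLatitude and decimalLongitude (the sample record itself is invented for illustration):

```python
import re

# The core Darwin Core fields described above
REQUIRED = [
    "scientificName", "eventDate", "basisOfRecord",
    "kingdom", "decimalLatitude", "decimalLongitude",
]

# A made-up specimen record for illustration
record = {
    "scientificName": "Trochilus polytmus",
    "eventDate": "1843-06-01",
    "basisOfRecord": "PreservedSpecimen",
    "kingdom": "Animalia",
    "decimalLatitude": 18.0,
    "decimalLongitude": -76.8,
}

# Which required fields are absent, and is the date in YYYY-MM-DD form?
missing = [f for f in REQUIRED if f not in record]
date_ok = bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["eventDate"]))

print(missing, date_ok)
```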

Some historic specimens were different, as they didn’t have GPS coordinates for the exact location where the specimen was collected. In this case there were headers such as verbatimLocality or verbatimEventDate – these allowed us to include information like ‘Spanish Town’, ‘Before 1856’ or ‘Between 1841/1869’ without causing an issue in the software, since such values are not formatted as YYYY-MM-DD, for example. We were encouraged to fill in other columns that were provided, to map as many Darwin Core terms as possible and get a better validation reading at the end. Once this dataset was mapped and saved, we had a Darwin Core Archive file.
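The routing of awkward dates can be sketched as a small helper (the function name `place_date` is my own, purely for illustration): a value that parses as YYYY-MM-DD goes into eventDate, while anything else is preserved verbatim in verbatimEventDate rather than breaking validation.

```python
import re

def place_date(record, raw_date):
    """Put a well-formed date in eventDate; keep anything else verbatim."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", raw_date):
        record["eventDate"] = raw_date
    else:
        record["verbatimEventDate"] = raw_date
    return record

print(place_date({}, "1843-06-01"))
print(place_date({}, "Before 1856"))
```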

Once we had the Darwin Core Archive file, we could upload it from OpenRefine to the GBIF IPT (Integrated Publishing Toolkit). We then edited the metadata page, where we added information like a dataset title, a description or abstract, geographic coverage etc. before publishing to GBIF. Once the file was uploaded, we could update it to a newer version by re-uploading an updated file.

We could check how GBIF rated our Darwin Core Archive files by dragging the file into the validator tab. The resulting report showed which Darwin Core terms were missing that we might want to include, telling us how to improve the dataset and confirming whether we had valid Darwin Core data.

Figure 3: Outside the Royal Belgian Institute of Natural Sciences, Brussels ©Olivia Beavers

The rest of the afternoon was spent having a look around the very impressive Royal Belgian Institute of Natural Sciences, where the course was held. As an emerging professional, it was really useful to gain an insight into how data is published, how Darwin Core Archive files work, and how to improve the quality of the data we send off to be published. The course was funded by COST Trainee Grants, and at the end of the Training School we were awarded the MOBILISE COST Action Certificate with ECVET credits (European Credit system for Vocational Education and Training).

Thank you to all of the trainers and organisers who helped to facilitate this successful course. 

If you would like to know more about MOBILISE Training Schools and upcoming events, check out their webpage for more information: https://www.mobilise-action.eu/training-schools/. Future courses will be posted here: https://cetaf.org/dest/upcoming-courses/
