The RBGE Herbarium and citizen research

Since 2017 the RBGE Herbarium has enlisted the help of volunteers to undertake the transcription of collection label information from herbarium specimens.

We are now looking for volunteers to take part in a new phase of our citizen research on these specimens. This is an exciting opportunity to contribute to our understanding of the Herbarium collections.

Visit the Zooniverse website to get involved!

Specimen stories help inform conservation priorities

A herbarium specimen of Codonoboea lanceolata (C.B.Clarke) A.Weber from the Malay Islands herbarium region. A pressed plant is attached to a white board. In the top right-hand corner there is a capsule. At the bottom right of the specimen board is the collection label. To the right of the specimen there is a colour chart and scale bar. There are also two other labels on the specimen that were added later by botanists carrying out species identification research.
A herbarium specimen of Codonoboea lanceolata (C.B.Clarke) A.Weber from the Malay Islands herbarium region.

Herbarium specimens are pressed plants, mounted on archival board alongside a collection label. This label contains information on where and when the specimen was collected and who collected it.

When collection label data from specimens are made available digitally, they can shed light on species distributions and on how and why these change over time.

So far, a dedicated group of volunteers has taken part in 65 digital expeditions, transcribing 67,661 of our specimens on the Digivol platform (a collaboration between the Australian Museum and the Atlas of Living Australia, a CSIRO-hosted NCRIS facility). These digital expeditions have focused on plant families from Australia, the British Isles and Ireland, and Myanmar.

Choosing your own pathway through a project

To date, citizen researchers working on our expeditions have been shown transcription tasks in a random order. Our new citizen research project hosted by Zooniverse, ‘The RBGE Herbarium: Exploring Gesneriaceae, the African violet family’, lets volunteers choose the herbarium specimens they want to transcribe, thanks to a newly developed indexing tool (read the Zooniverse blog to find out more about the tool’s development).

We hope the indexing tool will make the experience more engaging for participants by letting them follow in the footsteps of a particular botanist, or trace a species, over time and space.

This project was made possible through our participation in Engaging Crowds: Citizen research and heritage data at scale, a collaboration between The National Archives, the Royal Botanic Garden Edinburgh, the National Maritime Museum and Zooniverse. Engaging Crowds is a foundation project within the AHRC-funded Towards a National Collection programme.

Building on our digitisation pipeline

Engaging Crowds offered a unique opportunity to combine citizen research with our existing digitisation pipeline, which processes our digital specimen images through Optical Character Recognition (OCR) software.

In-house, we have found that specimens sorted using data from the OCR output are quicker to digitise than unsorted, randomly ordered specimens. This is particularly true when they are filtered by collector/botanist and country, because transcribers become familiar with the local geography and the collector’s handwriting (Haston et al. 2014). At present, the OCR server we use only reads typed text reliably, so we search the typed label headers that name places and people to create specimen batches. Most of the collection label information on our older specimens is handwritten and needs to be transcribed by people.
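For readers curious about the batching step, here is a minimal Python sketch of the idea. It is not RBGE’s actual pipeline. It assumes that OCR output arrives as plain text for each specimen, keyed by a specimen ID, and that typed headers appear on their own lines as hypothetical `Country:` and `Leg.`/`Collector:` entries. The header patterns, IDs and label text are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical typed-header patterns. Real label headers differ between
# collections, so these are placeholders, not RBGE's actual rules.
# [ \t]* (rather than \s*) keeps each match on a single OCR line.
HEADER_PATTERNS = {
    "collector": re.compile(
        r"^(?:Leg\b\.?|Collector\b)[ \t]*:?[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE
    ),
    "country": re.compile(
        r"^Country\b[ \t]*:?[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE
    ),
}


def extract_headers(ocr_text):
    """Return the text following each typed header, or None if it was not found."""
    fields = {}
    for name, pattern in HEADER_PATTERNS.items():
        match = pattern.search(ocr_text)
        value = match.group(1).strip() if match else ""
        fields[name] = value or None
    return fields


def build_batches(specimens):
    """Group specimen IDs into (country, collector) batches using OCR header values.

    Specimens missing either header (e.g. handwritten labels the OCR could not
    read) are placed in an 'unsorted' batch instead.
    """
    batches = defaultdict(list)
    for specimen_id, ocr_text in specimens.items():
        fields = extract_headers(ocr_text)
        if fields["country"] and fields["collector"]:
            batches[(fields["country"], fields["collector"])].append(specimen_id)
        else:
            batches["unsorted"].append(specimen_id)
    return dict(batches)


# Made-up OCR output keyed by hypothetical specimen IDs.
specimens = {
    "E001": "ROYAL BOTANIC GARDEN EDINBURGH\nCountry: Malaysia\nLeg. Clarke\n",
    "E002": "Country: Malaysia\nCollector: Clarke",
    "E003": "Leg: Burtt\nlocality and country handwritten",
}

for batch, ids in build_batches(specimens).items():
    print(batch, ids)
```

Specimens whose headers cannot be read fall back to an unsorted batch. This reflects the point above: where the label is handwritten, the information still has to be transcribed by people.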

A word cloud, in greens and browns, of words found in the OCR text of herbarium collection labels from West Asia, Egypt and the Arabian Peninsula. The size of each word reflects how often the term occurs on labels. The cloud is dominated by the words ‘Miller’, ‘Arabia’, ‘Date’, ‘Det’ and ‘Habitat’. The words ‘Name’, ‘Leg’ (short for the Latin legit, indicating the collector) and ‘Locality’ are also frequent. Smaller words include place names, habitat descriptors and months of the year.
A word cloud of the OCR collection label output data taken from herbarium specimens collected in West Asia, Egypt and the Arabian Peninsula.

We are using these OCR-based filters to create geographical subject sets for our Zooniverse project, with collector indexes within each set. The indexing tool uses this information to let volunteers choose which specimens they transcribe.

Top left: a screenshot of the subject set choice panel, where volunteers choose a set of herbarium specimens. The sets are grouped by geographical region. Four sets are visible in the screenshot: Inner China, Korea and Taiwan; India, Bangladesh and Pakistan; Malay Islands; and Southern Africa. An arrow points to a second screenshot (bottom right), which lists the specimens in the Malay Islands subject set. Red boxes around the Botanist and Species column headers show where volunteers can sort alphabetically or search each column. Behind the screenshots are sections of herbarium specimens showing collection labels with typed headers.
Image showing how volunteers first choose a subject set for a particular region of the world (top left) and can then search and sort by botanist or species (bottom right). Behind these screenshots are examples of herbarium collection labels with a mixture of typed and handwritten information.

Unlike Digivol, where most of the label is transcribed as a single task, our Zooniverse project asks volunteers to transcribe selected information from these collection labels through a series of short workflows.

We have chosen our Gesneriaceae specimens for this project. Gesneriaceae is a pantropical, species-diverse family of ecological importance and one of our research-focus families, with in-house research staff working on the classification, naming and identification of its species.

Once the project is complete, the data transcribed by volunteers will be available through our online catalogue and through the Global Biodiversity Information Facility (GBIF), an international organisation that makes biodiversity data from institutions across the globe available through a single portal.

Get involved

If you’d like to try out this new project, please visit the Zooniverse site.

No previous experience is necessary, and we welcome everyone who might like to get involved. We’d be hugely grateful for your time.

Our Zooniverse project is the third to launch as part of the overarching Engaging Crowds project. You can also find and take part in HMS NHS: The Nautical Health Service from the National Maritime Museum and Scarlets and Blues from The National Archives, both of which also offer transcriber choice through Zooniverse’s new indexing tool.

References and Resources

Towards a National Collection

https://www.nationalcollection.org.uk/

Engaging Crowds Website

https://tanc-ahrc.github.io/EngagingCrowds/

The Zooniverse Indexing Tool Blog Post

https://blog.zooniverse.org/2021/11/03/engaging-crowds-new-options-for-subject-delivery-interaction/

RBGE Zooniverse Project

https://www.zooniverse.org/projects/emhaston/the-rbge-herbarium-exploring-gesneriaceae-the-african-violet-family

The National Archives Zooniverse Project

https://www.zooniverse.org/projects/bogden/scarlets-and-blues

The National Maritime Museum Zooniverse Project

https://www.zooniverse.org/projects/msalmon/hms-nhs-the-nautical-health-service

The RBGE Herbarium Catalogue

https://data.rbge.org.uk/herb

The Global Biodiversity Information Facility

https://www.gbif.org/