Camellia yunnanensis

George Forrest was first sent to China in 1904 by the Regius Keeper Isaac Bayley Balfour. On this and six subsequent expeditions Forrest collected prolifically in NW Yunnan, SW Sichuan, SE Tibet and NE Upper Burma. The result was vast quantities of seed for a variety of British garden owners and firms, and specimens (over 30,000 numbers in multiple sets) for the RBGE herbarium, especially in the genera Rhododendron and Primula.

We have now databased 9,595 of these specimens. They can be found on our website by searching for the collector’s name, which was recorded when they were databased. However, for some aspects of our work we database specimens with only a minimal level of information: the region where the specimen was collected and the name we currently file it under. This fast process allows us to make a large number of our specimens available through our website.

To allow staff and volunteers to find these minimally databased records, we have developed methods which let them pull up a record and view its images, so that further data can be added to the specimen record.

When a specimen is imaged, a copy of the image file is sent to a piece of software which performs Optical Character Recognition (OCR): it looks at the image, reads what it can and saves a copy of any text it finds in a database. We can later search this text to try to pull out collector names, countries and other information which may be present on a label. This is currently limited to printed text, as handwriting recognition is a difficult task and reliable software for it is still under development. Luckily, most Forrest specimens have a standard printed label with Forrest’s name and the general region where the collection was made. By searching the OCR text for the words “George” and “Forrest”, we have been able to find a further 541 specimens which have been imaged but are not currently discoverable under Forrest’s name on our website.
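As a rough illustration of this kind of search, the sketch below filters OCR text for records that mention both words. It is not our production system: the function names, the barcodes and the idea of holding the OCR text in a dictionary keyed by barcode are all assumptions made for the example.

```python
import re


def mentions_forrest(ocr_text):
    """Return True if the OCR text contains both 'George' and 'Forrest'
    as whole words, ignoring case."""
    words = set(re.findall(r"[a-z]+", ocr_text.lower()))
    return {"george", "forrest"} <= words


def find_candidates(ocr_records, already_attributed):
    """Return barcodes whose OCR text mentions George Forrest but which
    are not yet databased under his name.

    ocr_records        -- dict of barcode -> OCR text (hypothetical layout)
    already_attributed -- set of barcodes already recorded with the collector
    """
    return sorted(
        barcode
        for barcode, text in ocr_records.items()
        if barcode not in already_attributed and mentions_forrest(text)
    )


# Hypothetical records, for illustration only.
records = {
    "E001": "Collected by George Forrest. Locality: Yunnan",
    "E002": "Forrest of Dean, coll. J. Smith",
    "E003": "GEORGE FORREST W. China",
    "E004": "George Forrest Tibet",
}
candidates = find_candidates(records, already_attributed={"E004"})
```

Matching whole words rather than substrings avoids some false hits, but a label that merely mentions both words would still be picked up, so the results need checking against the images.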

Word Cloud from OCR output

The ‘word cloud’ created from the OCR output shows which words were picked up by the OCR, with the size of each word representing its frequency. China and Yunnan are the largest locality-related words, but west, eastern and Tibet are also common. Locality and alt come from the labels, which have several pre-printed phrases. Bulley, Ness, Neston and Cheshire come from some of Forrest’s earlier collections, when he was collecting on behalf of Bulley for his garden, Ness.
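The frequencies behind a word cloud like this can be counted in a few lines. The sketch below is only an assumption about how such counts might be produced, with made-up label text and a hypothetical `word_frequencies` helper; it does not draw the cloud itself.

```python
import re
from collections import Counter


def word_frequencies(texts, min_length=3):
    """Count how often each word appears across a set of OCR texts.

    Words are lower-cased, and words shorter than min_length are skipped
    so that stray initials do not dominate the counts.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if len(w) >= min_length)
    return counts


# Made-up label text, for illustration only.
labels = [
    "China Yunnan Locality alt",
    "W. China Yunnan",
    "China Tibet",
]
frequencies = word_frequencies(labels)
```

A word cloud tool would then scale each word by its count, so the most frequent word is drawn largest.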

These specimens are from three regions: 3A (Outer China incl. Tibet), 4 (Inner China, Korea and Taiwan) and 5 (South Asia), with the majority (505 specimens) being from area 4 – as may be expected from our knowledge of where Forrest collected.

Styrax grandiflorus

There are 28 families represented by these specimens, with 27 of them in area 4. The families do not necessarily represent what Forrest was collecting or family size in this region, but are more likely to represent what we have been imaging as part of our on-going digitisation programme.

We are working to populate these and other minimally databased records by exploring different technologies and workflows, including the possibility of making use of Citizen Science programmes.