The RBGE DNA database – how an EDNA number is assigned to a DNA extraction

When people extract DNA in the RBGE molecular lab, we insist that it’s given something we call an EDNA (Edinburgh DNA) number. This links to a database that is internal to RBGE.

The EDNA number is used for all internal molecular lab processes – it’s written on the tube of DNA, used to refer to the sample in lab books, and part of the file name for all DNA sequences that are generated from that sample. Using this standard system across all projects means that we can keep track of what DNA we have, we can store it in a way that makes it relatively easy to retrieve, it can be used in other projects, and critical information like which specimen voucher is linked to a DNA extraction is not lost if people move on from RBGE.

Getting an EDNA number involves filling in a simple Excel spreadsheet with some basic collection information, and uploading it to a database. The Excel spreadsheet is accessible to RBGE lab users on an internal server (DNA, Molecular lab registration forms, EDNA (DNA), EDNA_submission_sheet_v01), and has two sets of fields, required and additional. If anything is missing from the required fields, an EDNA number will not be issued; the additional data is recommended but not essential. However, the more complete the data entry is, the faster it is to generate GenBank submissions and publication voucher tables from it, which justifies spending a little extra time on completing the form.

Two points to remember when filling in the spreadsheet: do not use special characters, and keep entries short, as each field has a maximum number of characters.
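As a rough illustration, both points can be checked on a cell before upload. This is only a sketch: the function name `check_entry` is hypothetical, "special characters" is assumed to mean anything outside printable ASCII, and the limit of 50 characters is a placeholder, since the real limits are set by the database and are not stated here.

```python
def check_entry(value, max_length=50):
    """Return a list of problems found in one spreadsheet cell.

    Assumptions (not taken from the EDNA database itself):
    - "special characters" means anything outside printable ASCII;
    - max_length is a placeholder for the real per-field limit.
    """
    problems = []
    if not value.isascii() or not value.isprintable():
        problems.append("contains special characters")
    if len(value) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems
```

An empty list means the entry passed both checks; anything else is worth fixing in the spreadsheet before loading it.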

REQUIRED INFORMATION

Taxon name: this should not have authority information (Bellis perennis L.), just the genus and specific epithet (Bellis perennis).

Collector name: this cannot begin with an initial (J. Smith) as it will be rejected by the database; either use a full first name (John Smith), or put the surname first (Smith, J.).

Collector number: if there is none, s.n. is accepted.

Country code: two-letter standard codes; when filling in the spreadsheet, there is a tab with all the codes that you can look up (e.g. DE for Germany).

Material type: drop-down menu choices – fresh, frozen, herbarium, seed, silica gel dried.

Extraction type: drop-down menu choices include tissue maceration type, e.g. pestle, or mixer mill, and chemistry used, e.g. CTAB, Plant DNeasy minikits, Qiextractor.
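The rules for the required fields can be sketched as a pre-check to run on each row before upload. This is not the database's own validation: the column keys (`taxon_name`, `collector_name` and so on) and the function `check_required` are hypothetical names, and the two-word test for the taxon name is an assumption that will also flag legitimate names with more than two parts.

```python
import re

# Hypothetical column keys; the real spreadsheet headers may differ.
REQUIRED_FIELDS = [
    "taxon_name", "collector_name", "collector_number",
    "country_code", "material_type", "extraction_type",
]
MATERIAL_TYPES = {"fresh", "frozen", "herbarium", "seed", "silica gel dried"}

# One letter followed by a full stop or a space, e.g. "J. Smith" or "J Smith".
LEADING_INITIAL = re.compile(r"^[A-Za-z][.\s]")


def check_required(row):
    """Return a list of problems found in the required fields of one row."""
    problems = [f"{name}: missing" for name in REQUIRED_FIELDS
                if not row.get(name, "").strip()]

    # Assumption: a name without authority is exactly two words.
    taxon = row.get("taxon_name", "").strip()
    if taxon and len(taxon.split()) != 2:
        problems.append("taxon_name: expected genus and specific epithet only")

    collector = row.get("collector_name", "").strip()
    if LEADING_INITIAL.match(collector):
        problems.append("collector_name: begins with an initial")

    code = row.get("country_code", "").strip()
    if code and not re.fullmatch(r"[A-Z]{2}", code):
        problems.append("country_code: expected a two-letter code")

    material = row.get("material_type", "").strip()
    if material and material.lower() not in MATERIAL_TYPES:
        problems.append("material_type: not one of the drop-down choices")

    return problems
```

A row with "s.n." as the collector number passes, since the field only needs to be non-empty. The extraction type is checked for presence only, because the full list of drop-down choices is not given here.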

ADDITIONAL INFORMATION

User DNA ID: this is the number that was given to the extraction in the lab; it’s extremely useful to have this for various troubleshooting in the lab – it can help match accessions to tubes, sort out issues with sample order, etc.

Extraction Date: entered in standard format year-month-day. Again, this can be useful for later troubleshooting, e.g. for separating batches of extractions by date, in case something went wrong on a particular day.

Herbarium barcode: this is ONLY for RBGE herbarium barcodes, not those from other institutes. If this is available, filling this in will propagate specimen data from the herbarium database. However, the required fields still need to be filled in.

Living Accession Number: this is ONLY for RBGE living accessions, not those from other institutes. If this is available, filling this in will propagate specimen data from the living collection database. However, the required fields still need to be filled in. The qualifier letter should not be filled in here.

Living Qualifier: this field is for any alphabetical character after the Living Accession Number.

Silica Gel Box Number: this field is best left empty unless silica material came from a box numbered in the same format as “SGN12345”.

Sample note: free field, but there is a limit on how many characters are allowed, so should be kept short, and free from special characters. It may be useful to note e.g. if the extraction was from sporophyte versus gametophyte tissue, or flower versus leaf.

Location: free field, but there is a limit on how many characters are allowed, so should be kept short, and free from special characters.

Coordinates: free field, but there is a limit on how many characters are allowed, so should be kept short.

Decimal longitude:

Decimal latitude:

Collection Date Verbatum: this is for dates that cannot be turned into the correct date format, e.g. “Spring 1920”, “October 1976”.

Collection Date: entered in standard format year-month-day. This can be very useful in relation to DNA quality. If this is filled in, there is no point also filling in the Collection Date Verbatum field.

Note: free field, but there is a limit on how many characters are allowed, so should be kept short, and free from special characters.
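Several of the additional fields have a fixed format that is easy to check in advance. The sketch below makes some assumptions: "year-month-day" is taken to mean zero-padded YYYY-MM-DD, a silica gel box number is taken to be "SGN" followed by digits only, and all three function names are hypothetical. The database's own validation may be stricter or looser.

```python
import re
from datetime import datetime


def is_iso_date(value):
    """True for a real calendar date written as YYYY-MM-DD (assumed format)."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Right shape but not a real date, e.g. 1976-02-30.
        return False
    return True


def is_valid_coordinate(latitude, longitude):
    """True if decimal latitude and longitude are within their possible ranges."""
    return -90 <= latitude <= 90 and -180 <= longitude <= 180


def is_silica_box_number(value):
    """True for values shaped like "SGN12345" (assumed: SGN plus digits only)."""
    return re.fullmatch(r"SGN[0-9]+", value) is not None
```

A value such as "October 1976" fails `is_iso_date`, which is the signal to put it in the Collection Date Verbatum field instead of the Collection Date field.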

Once the EDNA form is filled in, it can be uploaded to the EDNA database, which is available to users at RBGE who have a Username and Password.

Once logged on, the ‘Importer’ tab becomes highlighted; at the bottom of the Importer screen is a “Load” button. The information in the Excel sheet should be pasted into the ‘Load data’ window and mapped to the fields. This leaves four fields to fill in manually. Three are required: User (the lab user’s name, available from a drop-down list); Project (again from a drop-down, e.g. MSc, barcoding, Leguminosae); and Contact (a permanent staff member who will take long-term responsibility for the project, chosen from the drop-down list). One is optional: EDBANK Format (how the DNA will be stored long term: Plate, Strip or Tube). For most phylogenetics projects DNA will be stored in individual tubes, while for some population genetics projects it will be stored in strips or plates; check with the molecular lab staff if unsure which format to choose.

After this information is filled in, the “Validate” tab becomes available. The entered data is screened for errors such as collector names that start with initials, and accession numbers, dates, latitudes or longitudes that are in the wrong format. If any are found, they need to be corrected in the Excel spreadsheet, and all the information then needs to be reloaded and re-entered. If there are no validation errors, the “Import to EDNA” button becomes available. At this point, the data will either import successfully, or other errors will be identified (e.g. non-standard characters, or too many characters). Unfortunately, errors identified at this later point only stop EDNA numbers being generated for the individual samples affected rather than for the whole batch, and it is not possible to cancel EDNA numbers that have already been issued. This means that, for example, when entering a plate of 96 DNA extractions to EDNA, it’s quite possible for some samples in the middle of the plate not to be assigned a number. This becomes a sample labelling headache that is best resolved by redoing the entire batch to get consecutive EDNA numbers for all the samples, although this will lead to apparent duplicates of samples in the database. Molecular lab staff should be informed of redundant numbers, so that the duplicates are not also assigned places in the DNA bank.

When the numbers have been generated, they can be downloaded from the database by clicking on the “Tasks” tab, and the “As Spreadsheet” option – this will return all the information that has just been entered, along with the EDNA accession numbers for each sample.
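After a large import, the downloaded spreadsheet can be compared against what was submitted to spot samples that were skipped. A minimal sketch, assuming the download has been read into a mapping from User DNA ID to EDNA number; the function name `samples_without_number` and the example IDs and numbers are hypothetical, and the EDNA numbers are treated as opaque text because their format is not described here.

```python
def samples_without_number(submitted_ids, issued):
    """Return the submitted User DNA IDs that have no EDNA number.

    submitted_ids: User DNA IDs in the order they were submitted.
    issued: mapping of User DNA ID -> EDNA number from the downloaded
            spreadsheet (hypothetical structure; missing or empty
            values count as "no number issued").
    """
    return [sample for sample in submitted_ids if not issued.get(sample)]


# Made-up example: the second of three samples was rejected at import.
submitted = ["JS001", "JS002", "JS003"]
issued = {"JS001": "number-1", "JS003": "number-2"}
skipped = samples_without_number(submitted, issued)
```

If the returned list is not empty, the batch has gaps, and the molecular lab staff can advise on whether to redo it.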
