An algorithm competition for automatic species identification from herbarium specimens

Research output: Contribution to journal › Journal article › Research › peer-review

  • Damon P. Little
  • Melissa Tulig
  • Kiat Chuan Tan
  • Yulong Liu
  • Serge Belongie
  • Christine Kaeser-Chen
  • Fabián A. Michelangeli
  • Kiran Panesar
  • R. V. Guha
  • Barbara A. Ambrose

Premise: Plant biodiversity is threatened, yet many species remain undescribed. It is estimated that >50% of undescribed species have already been collected and are awaiting discovery in herbaria. Robust automatic species identification algorithms using machine learning could accelerate species discovery.

Methods: To encourage the development of an automatic species identification algorithm, we submitted our Herbarium 2019 data set to the Fine-Grained Visual Categorization sub-competition (FGVC6) hosted on the Kaggle platform. We chose to focus on the flowering plant family Melastomataceae because we have a large collection of imaged herbarium specimens (46,469 specimens representing 683 species) and taxonomic expertise in the family. As is common for herbarium collections, some species in this data set are represented by few specimens and others by many.

Results: In less than three months, the FGVC6 Herbarium 2019 Challenge drew 22 teams who entered 254 models for Melastomataceae species identification. The four best algorithms identified species with >88% accuracy.

Discussion: The FGVC competitions provide a unique opportunity for computer vision and machine learning experts to address difficult species-recognition problems. The Herbarium 2019 Challenge brought together a novel combination of collections resources, taxonomic expertise, and collaboration between botanists and computer scientists.

Original language: English
Article number: e11365
Journal: Applications in Plant Sciences
Volume: 8
Issue number: 6
ISSN: 2168-0450
Publication status: Published - 1 Jun 2020
Externally published: Yes

Bibliographical note

Funding Information:
We thank the New York Botanical Garden for support and funding from the National Science Foundation (IAA‐1444192, DEB‐1343612 and DEB‐0818399 to F.A.M.). Special thanks to the staff of the New York Botanical Garden, particularly Kim Watson and Nichole Tiernan for all the specimen digitization work. We also thank the organizers of FGVC, the Kaggle platform, and all the Herbarium 2019 competitors for taking on the challenge of this data set.

Publisher Copyright:
© 2020 Little et al. Applications in Plant Sciences is published by Wiley Periodicals, LLC on behalf of the Botanical Society of America.

    Research areas

  • artificial intelligence, computer vision, FGVC, herbarium specimen, Kaggle, machine learning, Melastomataceae

ID: 301822923