Comprehensive species lists are important tools for taxonomists, field biologists, conservationists, biosecurity officers, policymakers, biodiversity data platforms, amateur naturalists and many others (see the list of open-access papers Towards a global list of accepted species).
Lepidoptera (moths and butterflies) are a hyperdiverse group for which we still lack a high-quality synonymised checklist. Many users and websites rely on LepIndex, a database created from a card index at the Natural History Museum (NHM), London, but this dataset is very incomplete and needs significant curation.
The Catalogue of Life (COL) Checklist has treated LepIndex as its primary resource for Lepidoptera but replaces several families with more complete and current datasets from various sources. I have helped to prepare and continue to maintain such datasets for Gelechiidae, Pterophoridae and Alucitidae.
Over the last few years, a copy of LepIndex has been imported into the TaxonWorks online taxonomic workbench tool as the Global Lepidoptera Index, and significant improvements have been made to some sections (especially Bombycoidea and Geometridae). From June 2022, this TaxonWorks dataset replaces the NHM version of LepIndex in COL. TaxonWorks is a collaborative data management tool which opens the door for a much wider community of taxonomists and other experts to work together on delivering a truly comprehensive and current listing of the world’s butterflies and moths.
Lepidoptera (moths and butterflies) form one of the largest insect orders. Close to 10% of all known species of living organisms are Lepidoptera. They are among the most easily surveyed and monitored insect groups, with more than 75 million occurrence records in the Global Biodiversity Information Facility (GBIF) today (compared with 22 million records for Coleoptera). As a highly diversified group feeding on most plant species and other substrates, sampling and monitoring Lepidoptera can give broad insights into ecosystem complexity, health and dynamics.
In 2008, John Heppner (p. 627 in: Capinera, J.L. Encyclopedia of Entomology) estimated that the Lepidoptera number 255,000 extant species, with around 156,100 currently named. Michael Pogue (Biodiversity of Lepidoptera, pp. 325-355 in Foottit, R.G. & Adler, P.H. (2009) Insect Biodiversity – Science and Society) offered a calculation (based on multiple datasets) showing 155,181 described species. The Catalogue of Life (COL) Annual Checklist 2021 contained 148,897 accepted Lepidoptera species. Changes discussed here have raised this count to 154,344 accepted species.
During the last few years, I have updated and digitised species lists developed by Klaus Sattler at the Natural History Museum, London for the Gelechiidae and by Cees Gielis at Naturalis, Leiden for the Alucitidae (including Tineodidae), Pterophoridae and Macropiratidae. This affords an opportunity to assess the completeness of LepIndex and the accuracy of the estimates in Capinera 2008 and Pogue 2009. The following table compares the counts of accepted species in each of these groups in Pogue 2009 and the current datasets now in Catalogue of Life:
| Group | Pogue 2009 | Current COL dataset | Increase |
|---|---|---|---|
| Alucitidae (including Tineodidae) | 208 | 260 | 25% |
| Pterophoridae (including Macropiratidae) | 1192 | 1562 | 31% |
In each of these groups, at least 25% more species have been described and are currently accepted by taxonomists than are indicated in the published estimates. It therefore seems likely that the total count of currently described Lepidoptera species may be much closer to 200,000.
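The arithmetic behind this estimate can be checked with a short sketch. It recomputes the increases for the two rows shown in the table and then scales Pogue's total of 155,181 by the same ratios. The scaling step rests on an assumption the data cannot confirm: that these groups are representative of the order's overall undercount.

```python
# Accepted-species counts from the table: (Pogue 2009, current COL dataset).
COUNTS = {
    "Alucitidae (including Tineodidae)": (208, 260),
    "Pterophoridae (including Macropiratidae)": (1192, 1562),
}

POGUE_TOTAL = 155_181  # described Lepidoptera species in Pogue (2009)


def percent_increase(old: int, new: int) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


def extrapolate(total: int, ratio: float) -> int:
    """Scale a total by a ratio, ASSUMING the ratio holds across the whole order."""
    return round(total * ratio)


if __name__ == "__main__":
    for group, (old, new) in COUNTS.items():
        print(f"{group}: {percent_increase(old, new):.0f}% increase")
    ratios = [new / old for old, new in COUNTS.values()]
    low, high = min(ratios), max(ratios)
    print(
        f"Extrapolated total: {extrapolate(POGUE_TOTAL, low):,} "
        f"to {extrapolate(POGUE_TOTAL, high):,}"
    )
```

Under that assumption, the range comes out at roughly 194,000 to 203,000 species, which is consistent with a figure close to 200,000.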
However, partly owing to the continuing pace of taxonomic research across the group and partly because of the sheer size of the order, there is still no truly comprehensive and current list of described species for this group. For many years, the Global Lepidoptera Names Index (LepIndex) has served as the reference checklist used by many biodiversity data platforms to organise data on Lepidoptera. LepIndex is a digitised and updated version of an index-card archive of the scientific names of the living and fossil butterflies and moths of the world, produced over many decades by lepidopterists at the Natural History Museum (London). The stated coverage for this dataset is 137,441 species.
The current situation
For many years, LepIndex has provided most of the Lepidoptera names and classification used in the Catalogue of Life (COL) Checklist, although several families have been sourced from other datasets that have been more fully curated by taxonomists familiar with the groups, specifically:
- Alucitidae (including the former Tineodidae) from Catalogue of the Alucitoidea of the World
- Gracillariidae from Global Gracillariidae: Global Taxonomic Database of Gracillariidae
- Lycaenidae, Pieridae and Papilionidae from the Global Butterfly Information System (GloBIS)
- Nepticulidae and Opostegidae from Nepticulidae and Opostegidae of the World
- Pterophoridae and Macropiratidae from Catalogue of the Pterophoroidea of the World
- Tineidae from Global taxonomic database of Tineidae (Lepidoptera)
The remainder of the Lepidoptera coverage (120 families) in COL comes from LepIndex.
The NHM card index was a nomenclatural catalogue rather than a synonymic species list and was never completely curated to reflect all revisions of the order. Coverage of literature from the 1980s onwards is very incomplete and the last edits to the dataset were made in 2018. As a result, LepIndex has the following weaknesses as a resource for organising biodiversity data:
- Large numbers of new names, combinations and revisions are missing, especially from the last 30 years.
- The original generic placement (original combination) for many names is not reliably recorded; at minimum, this makes LepIndex unreliable for determining whether the authorship should be enclosed in parentheses.
- The only combinations that may be provided are the original combination and the one that was current when each card was last edited; in many cases, only one combination is available.
- Many names currently considered synonyms are shown as (provisionally) accepted.
- For the most part, higher classification is limited to family, and these family placements do not map consistently to current family concepts, especially in superfamilies such as Gelechioidea and Noctuoidea.
- A significant number of names were mistranscribed from the original card images, resulting in inaccurate spellings.
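The parentheses problem can be made concrete. Under the International Code of Zoological Nomenclature (Art. 51.3), an author citation is enclosed in parentheses when the species is now combined with a genus other than the one it was originally described in. The sketch below is a simplified illustration of that rule. It ignores the Code's exceptions and edge cases, and `format_authorship` is a hypothetical helper. It shows why a missing original combination blocks the decision entirely.

```python
from typing import Optional


def format_authorship(
    author: str, year: int, original_genus: Optional[str], current_genus: str
) -> Optional[str]:
    """Format a zoological author citation (simplified ICZN Art. 51.3).

    Returns None when the original genus is unknown, because the need
    for parentheses cannot then be determined.
    """
    if original_genus is None:
        return None  # the gap described for many LepIndex records
    citation = f"{author}, {year}"
    if original_genus != current_genus:
        # Species moved from its original genus: parentheses required.
        return f"({citation})"
    return citation


if __name__ == "__main__":
    # Plutella xylostella was described as Phalaena xylostella Linnaeus, 1758.
    print("Plutella xylostella", format_authorship("Linnaeus", 1758, "Phalaena", "Plutella"))
    # Papilio machaon remains in its original genus.
    print("Papilio machaon", format_authorship("Linnaeus", 1758, "Papilio", "Papilio"))
```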
Despite these flaws, LepIndex has remained in use as a reference classification because no other digital resource is as comprehensive. Even Markku Savela’s excellent Lepidoptera and some other life forms site (which accurately handles much of the more recent literature) contains slightly under 117,000 species, and Wikispecies contains under 112,500 pages that include the word “Lepidoptera” (including many that relate to ranks other than species, literature references containing the word, etc.). As platforms such as GBIF have grown in importance, the weaknesses of LepIndex have become clearer and more pressing.
LepIndex has been migrated into the TaxonWorks online taxonomic workbench platform developed and maintained by the Species File Group in Illinois. This is a rich editing environment for nomenclatural and taxonomic datasets and provides many useful tools for editors to contribute updates and corrections.
Many corrections and updates have been applied to the TaxonWorks version of LepIndex, including major revisions to the Bombycoidea and some other families.
The dataset will be published regularly to ChecklistBank as the Global Lepidoptera Index. ChecklistBank is an online platform developed by GBIF and COL to host checklist datasets, and it includes the tools used each month to construct the COL Checklist. ChecklistBank also allows any dataset to be downloaded in multiple formats or accessed through a public API.
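As a rough illustration of API access, the sketch below builds a dataset search query and extracts dataset keys and titles from the JSON response. The base URL `https://api.checklistbank.org`, the `/dataset` endpoint with `q` and `limit` parameters, and the `result`/`key`/`title` response fields are assumptions here. Check them against the ChecklistBank API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMED base URL and endpoint; verify against the ChecklistBank API docs.
API_BASE = "https://api.checklistbank.org"


def dataset_search_url(query: str, limit: int = 10) -> str:
    """Build a dataset search URL (assumed endpoint: /dataset?q=...&limit=...)."""
    return f"{API_BASE}/dataset?{urlencode({'q': query, 'limit': limit})}"


def parse_datasets(payload: dict) -> list[tuple[int, str]]:
    """Extract (key, title) pairs from a paged response (assumed 'result' list)."""
    return [(item["key"], item["title"]) for item in payload.get("result", [])]


def search_datasets(query: str, limit: int = 10) -> list[tuple[int, str]]:
    """Fetch and parse a dataset search; requires network access."""
    with urlopen(dataset_search_url(query, limit), timeout=30) as response:
        return parse_datasets(json.load(response))
```

A call such as `search_datasets("Global Lepidoptera Index")` would then list matching datasets, whose keys could be used for further queries or downloads.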
Additionally, a new family dataset for the Gelechiidae (Catalogue of World Gelechiidae) is now available in ChecklistBank. This is based on the list maintained over many years by Klaus Sattler at NHM but has been updated to include changes made in the literature in the last five years and to serve as a placeholder for a few names that were included as Gelechiidae in LepIndex but that have no current accepted family placement. Associated changes have also been made to the Global Lepidoptera Index to move other species that were previously considered to be Gelechiidae into the currently accepted family.
Now that these datasets are accessible through ChecklistBank, the June 2022 edition of Catalogue of Life includes them in its construction. The following table summarises the current components of the COL Checklist for Lepidoptera.
| Family | Source dataset | Status | Notes |
|---|---|---|---|
| Nepticulidae, Opostegidae | Nepticulidae and Opostegidae of the World | Last updated 2016 | Exploring fresh import |
| Tineidae | Global taxonomic database of Tineidae (Lepidoptera) | Based on data from the late Gaden S. Robinson. Last updated 2011 | Needs full update |
| Gracillariidae | Global Taxonomic Database of Gracillariidae | Updated January 2022 | |
| Gelechiidae | Catalogue of World Gelechiidae | Updated June 2022 | Expected integration into Global Lepidoptera Index |
| Pterophoridae, Macropiratidae | Catalogue of the Pterophoroidea of the World | Updated June 2022 | Expected integration into Global Lepidoptera Index |
| Alucitidae | Catalogue of the Alucitoidea of the World | Updated June 2022 | Expected integration into Global Lepidoptera Index |
| Lycaenidae, Pieridae, Papilionidae | Global Butterfly Information System | Last updated 2013 | New update expected. Lycaenidae coverage is incomplete and must be reviewed |
| All others | Global Lepidoptera Index | Updated June 2022 | |
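The arrangement in this table can be pictured as a base dataset with family-level overrides. The sketch below is a simplified, hypothetical model of that idea and not COL's actual assembly process in ChecklistBank, which is considerably more involved. Each family is assigned one source dataset, everything unlisted falls back to the Global Lepidoptera Index, and records for a family are kept only from the source assigned to it. `FAMILY_SOURCES`, `source_for` and `assemble` are illustrative names.

```python
# Hypothetical family-to-source mapping mirroring the table; unlisted
# families fall back to the base dataset.
BASE_SOURCE = "Global Lepidoptera Index"

FAMILY_SOURCES = {
    "Nepticulidae": "Nepticulidae and Opostegidae of the World",
    "Opostegidae": "Nepticulidae and Opostegidae of the World",
    "Tineidae": "Global taxonomic database of Tineidae (Lepidoptera)",
    "Gracillariidae": "Global Taxonomic Database of Gracillariidae",
    "Gelechiidae": "Catalogue of World Gelechiidae",
    "Pterophoridae": "Catalogue of the Pterophoroidea of the World",
    "Macropiratidae": "Catalogue of the Pterophoroidea of the World",
    "Alucitidae": "Catalogue of the Alucitoidea of the World",
    "Lycaenidae": "Global Butterfly Information System",
    "Pieridae": "Global Butterfly Information System",
    "Papilionidae": "Global Butterfly Information System",
}


def source_for(family: str) -> str:
    """Return the dataset responsible for a family."""
    return FAMILY_SOURCES.get(family, BASE_SOURCE)


def assemble(
    names_by_source: dict[str, list[tuple[str, str]]]
) -> list[tuple[str, str]]:
    """Keep each (family, name) record only from the source assigned to its family.

    Records that the base dataset still holds for an overridden family
    (e.g. older Gelechiidae entries) are dropped in favour of the family dataset.
    """
    merged = []
    for source, records in names_by_source.items():
        for family, name in records:
            if source_for(family) == source:
                merged.append((family, name))
    return merged
```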
I plan to integrate the Gelechiidae, Pterophoridae/Macropiratidae and Alucitidae datasets into the Global Lepidoptera Index dataset. The Nepticulidae/Opostegidae, Gracillariidae and Papilionidae/Pieridae datasets are actively maintained outside TaxonWorks but more regular imports are needed. The Tineidae need more work but are also likely to be merged into the Global Lepidoptera Index.
How to contribute
More work is required on almost all other families. Discussions are underway to bring in copies of well-managed datasets for several other families, but contributors or editors are sought for other components. Contributions may take any of the following forms:
- Regular copies of existing global superfamily/family/subfamily/tribe datasets that are already maintained externally using other tools. Merging efforts around TaxonWorks as a common platform for all lepidopteran groups would bring significant benefits, but the priority is to maintain high-quality checklists for each group.
- A single copy of an existing global superfamily/family/subfamily/tribe dataset that can be shared with COL under a Creative Commons Attribution (CC BY) or Creative Commons Zero (CC0) licence.
- Editors (or teams of editors) ready to assume responsibility for updating and maintaining a group within the Global Lepidoptera Index. We can arrange training in the use of the tools.
In all cases, COL advocates for the approach outlined in Garnett et al. 2020 Principles for creating a single authoritative list of the world’s species. Lists should be developed collaboratively by taxonomists and other experts working on the group. Decision processes should be transparent and aim to secure an appropriate consensus view. As far as possible, there should be no barriers to contribution and participation by relevant taxonomists from any region.
Since an increasing proportion of new and even historical taxonomic literature is being made accessible in structured formats (e.g. Pensoft journals, Plazi TreatmentBank), and since most of these datasets will be accessible through ChecklistBank, there is a great opportunity to automate (or semi-automate) inclusion of new taxa, combinations and synonymy.
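One semi-automated step might be to compare names arriving from structured literature sources against the current checklist and queue the differences for an editor, rather than applying them directly. The sketch below uses a hypothetical, heavily simplified `NameRecord` type. It matches on the exact name string, whereas real matching would also need to handle spelling variants, homonyms and differing authorship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical simplified name record; real datasets carry far more fields."""
    scientific_name: str
    authorship: str
    status: str  # e.g. "accepted" or "synonym"


def candidates_for_review(
    checklist: list[NameRecord], incoming: list[NameRecord]
) -> list[tuple[str, NameRecord]]:
    """Return incoming records that are new or whose status differs.

    Nothing is applied automatically: the output is a review queue for an editor.
    """
    current = {r.scientific_name: r for r in checklist}
    queue = []
    for record in incoming:
        existing = current.get(record.scientific_name)
        if existing is None:
            queue.append(("new name", record))
        elif existing.status != record.status:
            queue.append(("status change", record))
    return queue
```

Keeping an editor in the loop reflects the collaborative, transparent decision process described above.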