Until 2022, Catalogue of Life (COL) and GBIF still relied on the NHM LepIndex dataset for names for almost all Lepidoptera (butterflies and moths). This is now superseded by a revised version of LepIndex maintained in TaxonWorks as the Global Lepidoptera Index (GLI). See this earlier post for more detail.
The concept used in LepIndex for the gelechioid family Elachistidae corresponded to what we now treat as a subfamily Elachistinae. At the time of its last import into COL, LepIndex had 491 scientific names associated with this (sub-)family, organised as follows:
- Family – 1 accepted
- Genus – 35 accepted
- Species – 410 accepted, 1 provisionally accepted, 40 synonyms, 2 ambiguous synonyms
- Subspecies – 2 accepted
In 2019, Lauri Kaila published An annotated catalogue of Elachistinae of the World (Lepidoptera: Gelechioidea: Elachistidae) in Zootaxa. I had already brought GLI up to date for the Australian Elachistinae treated in his 2011 Monographs of Australian Lepidoptera volume, so I decided to take the time also to update the remainder of this subfamily and to include all post-2019 species I could find. This is now completed, and GLI now includes 1284 names for the group. This total comprises names in Kaila 2019, those from newer papers, fossil names from LepIndex and a few nomina dubia that were not in the catalogue but seem plausibly to refer to elachistine moths. I was not rigorous about adding every historical combination for epithets that have passed through multiple genera, but original combinations and current combinations should all be present, as should original combinations for all synonyms. I did not update the micro-references that were already in place for older names, but the newer names link to structured citations.
Totals are now as follows:
- Subfamily – 1 accepted
- Genus – 14 accepted, 50 synonyms
- Species – 819 accepted, 392 synonyms
- Infraspecific taxa – 1 accepted, 7 synonyms
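As a quick sanity check, the two breakdowns above can be tallied against the stated totals of 491 and 1284 names. This is only a minimal sketch: the dictionary layout and the helper name `total_names` are illustrative and are not part of GLI or TaxonWorks; the counts themselves are copied from the two lists.

```python
# Name counts by rank and status, copied from the two lists above.
lepindex = {
    "family": {"accepted": 1},
    "genus": {"accepted": 35},
    "species": {
        "accepted": 410,
        "provisionally accepted": 1,
        "synonym": 40,
        "ambiguous synonym": 2,
    },
    "subspecies": {"accepted": 2},
}

gli = {
    "subfamily": {"accepted": 1},
    "genus": {"accepted": 14, "synonym": 50},
    "species": {"accepted": 819, "synonym": 392},
    "infraspecific": {"accepted": 1, "synonym": 7},
}


def total_names(breakdown):
    """Sum every status count across every rank."""
    return sum(sum(statuses.values()) for statuses in breakdown.values())


print(total_names(lepindex))  # 491
print(total_names(gli))       # 1284
```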
About five genera and around a dozen other species that were under Elachistidae in LepIndex previously have been moved to other families in the Lepidoptera. Many of these cases are discussed by Kaila, although a few represent highly outdated placements in the NHM catalogue that were apparently not even considered worth discussing. Many small genera have been synonymised into Elachista, Perittia or Stephensia. Four fossil genera are not treated by Kaila but are retained from LepIndex.
I fixed multiple misspellings that occurred in LepIndex either because information on the index cards was incorrect or during transcription into digital format. Despite the scale of the publication, I found no obvious misspellings in Kaila 2019.
Based on these raw numbers, it is clear that LepIndex lacked around 50% of the currently expected number of accepted species for the family and that many synonyms were also missing. The actual situation was even more serious than this appears, because many names that were accepted by LepIndex are now considered synonyms, and vice versa.
Here is a summary of results from the largest genus, Elachista. LepIndex had 355 names associated with 327 accepted species in this genus, whereas GLI has 1,046 names for 716 accepted species.
Just 183 (56% of 327) accepted names in LepIndex exactly matched the spelling, authorship and status, and only 9 (32% of 28) synonyms exactly matched the spelling, authorship, status and accepted name offered by GLI. If variation in authorship (mostly missing years and/or parentheses) is ignored, these totals rise to 200 accepted names and 12 synonyms that match the expected species.
81 (25%) of the names accepted for Elachista species by LepIndex are now considered synonyms for other species in the genus. 36 accepted names (11%) now refer to species outside this genus.
6 (21%) of the LepIndex synonyms in this genus are now treated as synonyms for different species.
In other words, of the 365 names that LepIndex associated with species in the genus Elachista, even ignoring issues with authorship strings, just 212 (58%) directed users to the currently accepted name for a species.
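The kind of comparison behind these figures can be sketched roughly as follows. This is not the procedure I actually used in TaxonWorks; it is a minimal illustration that assumes each record carries a name, an authorship string, a status and (for synonyms) an accepted name. The `Name` record, the function names and the sample LepIndex-style records are all hypothetical; only Elachista freyerella (Hübner, 1825) is a real name. Authorship is normalised by dropping parentheses, commas and four-digit years, which covers the commonest variation (missing years and/or parentheses).

```python
import re
from typing import Dict, NamedTuple, Optional


class Name(NamedTuple):
    scientific_name: str
    authorship: str
    status: str                          # "accepted" or "synonym"
    accepted_name: Optional[str] = None  # target name, for synonyms only


def normalise_authorship(authorship: str) -> str:
    """Drop parentheses, commas and four-digit years, then case-fold."""
    text = re.sub(r"[(),]", " ", authorship)
    text = re.sub(r"\b\d{4}\b", " ", text)
    return " ".join(text.split()).casefold()


def compare(old: Name, reference: Dict[str, Name]) -> str:
    """Classify one old record against a reference keyed by scientific name."""
    new = reference.get(old.scientific_name)
    if new is None:
        return "not in reference"
    if (old.status, old.accepted_name) != (new.status, new.accepted_name):
        return "status or accepted name changed"
    if old.authorship == new.authorship:
        return "exact match"
    if normalise_authorship(old.authorship) == normalise_authorship(new.authorship):
        return "match ignoring authorship format"
    return "authorship differs"


# Reference entry uses a real name; the "old" records are invented examples.
reference = {
    "Elachista freyerella": Name("Elachista freyerella", "(Hübner, 1825)", "accepted"),
}
old_records = [
    Name("Elachista freyerella", "(Hübner, 1825)", "accepted"),
    Name("Elachista freyerella", "Hübner", "accepted"),
    Name("Elachista hypothetica", "Author, 1900", "accepted"),  # hypothetical name
]

for record in old_records:
    print(record.scientific_name, record.authorship, "->", compare(record, reference))
```

Keying the reference on the name string alone is a simplification: homonyms and alternative combinations would need a richer key in a real comparison.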
Reviewing this now from the perspective of what the taxonomic community knows and what names are actually in circulation for species in the genus Elachista (again ignoring issues with authorship):
- About 71% (507 of 716) of the currently accepted species names in Elachista were unknown to LepIndex/COL/GBIF a year ago
- 78% (815 of 1,045) of the names now in TaxonWorks for Elachista species were unknown or incorrectly handled a year ago
Elachistinae forms perhaps 0.3-0.4% of the total described Lepidoptera fauna, so these corrections are only a small step towards delivering a comprehensive and reliable catalogue for world Lepidoptera. This subfamily now joins Nepticuloidea, Gracillariidae, Gelechiidae, Lecithoceridae, Alucitidae, Pterophoridae, and Tortricidae as groups that are in good condition in the COL Checklist. Preparations are well under way to bring in some other major family-rank datasets that have been prepared over many years by dedicated groups of taxonomists. Both Geometridae and Bombycoidea are likely to be replaced in the next few months.
The rest of the Lepidoptera is covered by aging datasets. The Global Butterfly Information System dataset (GloBIS/GART), which covers the Pieridae and Papilionidae, may soon be updated. I am working on a refresh for Gaden S. Robinson’s Tineidae dataset, which was last updated in 2011. Even the Nepticuloidea (last updated in 2016) is urgently awaiting a planned update. All the rest comes from LepIndex.
The following table compares accepted species counts for the same taxa in different datasets. This is a crude metric – if large numbers of names that should be treated as synonyms are included as accepted species names, this may inflate numbers. However, these numbers show clearly that effort to clean up LepIndex data always leads to significant increases in record counts.
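[Table not preserved: accepted species counts per group in LepIndex and in the revised datasets. Only the fragment "Total excl." survives.]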
The COL version of LepIndex is missing names for taxa that had been sourced from other datasets prior to 2019. The total count provided for LepIndex uses GLI counts for these taxa – the total is therefore an overestimate, but the mean growth across these groups is at least 27%. Applying the same rate across all other Lepidoptera groups gives an estimate for the order of 181,608 accepted described species. There is reason to consider Geometridae an outlier since significant NHM work on the family preceded the 2011 version of LepIndex. Excluding Geometridae from the calculation raises the estimated percentage growth to 41%, giving an estimated species count of 195,565.
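The extrapolation can be sketched as below. This rests on assumptions: I take "growth" here to be the pooled increase in accepted species across the revised groups, and the order-level estimate to be the revised counts plus the remaining LepIndex count scaled up by that rate. The group names and all numbers in the example are hypothetical placeholders, not the figures from the table, and the function names are illustrative.

```python
def growth_rate(old, new, exclude=()):
    """Pooled proportional growth in accepted species across revised groups.

    `old` and `new` map group name -> accepted species count; groups listed
    in `exclude` are left out of the rate (as Geometridae is in the text).
    """
    groups = [g for g in old if g not in exclude]
    old_total = sum(old[g] for g in groups)
    new_total = sum(new[g] for g in groups)
    return (new_total - old_total) / old_total


def estimate_total(old, new, remaining_old_total, exclude=()):
    """Revised counts for updated groups plus the scaled-up remainder."""
    rate = growth_rate(old, new, exclude)
    revised_total = sum(new[g] for g in old)  # excluded groups still count here
    return revised_total + remaining_old_total * (1 + rate)


# Hypothetical counts, for illustration only.
old = {"Group A": 1000, "Group B": 3000, "Group C": 4000}
new = {"Group A": 1500, "Group B": 4500, "Group C": 4000}
remaining = 10000  # accepted species in groups not yet revised (hypothetical)

print(growth_rate(old, new))                                  # 0.25
print(growth_rate(old, new, exclude=("Group C",)))            # 0.5
print(estimate_total(old, new, remaining))                    # 22500.0
print(estimate_total(old, new, remaining, exclude=("Group C",)))  # 25000.0
```

Excluding a low-growth group from the rate raises the estimate for the unrevised remainder, which is the effect described above for Geometridae.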
Revised versions are as follows: Nepticulidae and Opostegidae of the World (Oct 2016), Global Taxonomic Database of Gracillariidae (Jan 2022), GLI Elachistinae (Mar 2023), Catalogue of World Gelechiidae (Feb 2023), GLI Lecithoceridae (Mar 2023), Catalogue of the Alucitoidea of the World (Nov 2022), Catalogue of the Pterophoroidea of the World (Jan 2023), World Catalogue of the Tortricidae (Tortricid.net, Dec 2018), Geometridae (pending update, 2022), Bombycoidea (pending update, 2022). The last two datasets will be added to COL once associated taxonomic catalogues have been published.
The table shows two calculated estimates for the current total number of described Lepidoptera species. I consider it highly likely that most remaining groups will expand at least 41% as gaps in LepIndex are addressed. Given the large amount of ongoing revisionary work in the Noctuoidea (42,941 species in GLI today), it seems reasonable that this popular group may have gaps as significant as those shown here for Bombycoidea, which would inflate the numbers much further. At a minimum, Catalogue of Life today is likely to be missing 40,000 described Lepidoptera species.
I would note too that, for Elachistinae, I found that LepIndex lacked many 19th-century European and British names. Some of these are significant omissions, for example names from Haworth, Hübner and Herrich-Schäffer, including the currently accepted name for the widespread species Elachista freyerella (Hübner, 1825) (with hundreds of records in GBIF). Although the NHM card index was maintained into the 1990s, modern publications begin to disappear even from early in the 1980s.
I feel even more than before the need to make the scale of the challenge much more public and for COL to become more proactive in finding and promoting new ways for content to be edited. A traffic-light system for coverage and quality for each taxon would be a big step forward.