Categories
Araba Bioscan

Araba Bioscan SLAM 17-24 March 2023

Categories
Araba Bioscan

Araba Bioscan SLAM 10-17 March 2023

Categories
Araba Bioscan

Araba Bioscan SLAM 3-10 March 2023

Categories
Miscellaneous

Digital cameras for wildlife observations

What I photograph

These are notes from my personal history on experiences with capturing images as a tool for recording wildlife. My perspective is limited. I don’t take time to produce really stunning artistic photographs. I use my camera to record what I see as faithfully and as crisply as I can. My targets are usually insects and other invertebrates, birds, mammals, reptiles and smaller numbers of flowering plants and fungi.

My primary goals are to provide reference shots for myself and others to improve identification for less well-recorded species and to provide evidence for mapping and monitoring species distributions and populations.

Over the years, these basic goals have led me to try many different cameras to find one best suited to my needs and to maximising simplicity. Most of this post is a cursory examination of what has driven my choices at each stage and where I find myself today.

TL;DR - My needs seem to be well met by a combination including the Sony RX10 IV camera, the Marumi DHG Achromat 330 (+3) macro lens, and a GODOX MF12 macro flash set.

I share my photographs in a few key places on the web, always under a Creative Commons Attribution (CC BY) licence so that I can maximise the potential value others can gain from my images and so that there’s a way I can be contacted in cases of misidentifications:

  • Flickr – my primary destination for all my photos that are not of irretrievably low-quality.
  • iNaturalist – my primary channel for making these images useful for biodiversity research and monitoring (and to get identifications for organisms outside my expertise).
  • Global Biodiversity Information Facility (GBIF) and Atlas of Living Australia (ALA) – secondary channels, in that I rely on iNaturalist to pass them on, but these are in fact the main reason I take time to photograph wildlife.
  • Wikimedia Commons – secondary channel for long-term reuse, mainly from other Wikipedians taking the time to add my photos from Flickr.
  • Mastodon – cherry-picked photos I feel like talking about on a given day.

I also take a lot of photos under a microscope (mainly again insects and other invertebrates), but this piece is about field photography.

Before digital cameras

In the distant pre-digital past, I tried to photograph birds with various largely unsuitable film cameras. Probably the only useful legacy from those efforts (aside from a few slides) is this photo of the only Little Auk I’ve ever seen, in Southwick (Sussex, UK) in 1995.

Photograph of choppy water in a marina with a tiny black and white bird occupying perhaps 0.25 percent of the image area.
Alle alle, Little Auk or Dovekie, Southwick, UK, 29 November 1995 (some long-forgotten film camera)

Around a year after that photo was taken, I began getting seriously interested in the UK’s moths, mostly with help from the only good guide readily available at the time, Bernard Skinner’s The Colour Identification Guide to Moths of the British Isles (1st edition). The only cameras I had lacked any macro capability, so I started my journey into entomology with a pack of coloured pencils and my appallingly limited artistic skills. The following drawings are sadly representative.

In 1999, we moved to New Zealand and purchased a video camera that wrote to magnetic tape (no idea what brand or model). With some care, it could capture and export still images, so my first insect photos are of relatively large moths attracted to our garden in Auckland, like this ghost moth, Aenetus virescens. These images were only 768 x 576 pixels in size.

Aenetus virescens, Mairangi Bay, Auckland, New Zealand, to MV light, 8/9 November 1999 (some video camera)

Coolpix 4500 and other bridge cameras

Canon PowerShot SX60 HS camera with Raynox DCR-250 macro lens and a collapsible flash diffuser

In 2002, as we returned to Europe (Denmark, during my first stint with GBIF), we bought a real digital camera. Internet discussion groups had identified the Nikon Coolpix 4500 as the camera of choice for reasonably inexpensive macro photography, boasting a 4 megapixel resolution (2272 x 1704 pixels) and the ability to focus as close as 2 cm without serious distortion, so that is what we got. The first insect I photographed with it was a many-plume moth, Alucita hexadactyla in August that year.

Alucita hexadactyla, to MV light, Hellerup, Denmark, 2 August 2002 (Nikon Coolpix 4500)

I went on to use it for thousands of images, mainly of moths I attracted to light but also other wildlife. It always took excellent photos, although it was completely unsuited to photographing birds or any animals at a distance.

Pogona barbata (Cuvier, 1829), Eastern Bearded Dragon, Aranda, ACT, 18 September 2009 (Nikon Coolpix 4500)

Somewhere around 2009, I was feeling constrained by the lack of zoom capability and tempted by the increasingly large sensors of newer cameras. Although I continued to use the Coolpix 4500 for moths, I invested in a series of different bridge cameras to give more flexibility in the field. These included a Nikon Coolpix P6000, a Canon PowerShot SX40 HS, then a Canon PowerShot SX50 HS and finally a Canon PowerShot SX60 HS.

Macropus giganteus Shaw, 1790, Eastern Grey Kangaroo, female with joey, Aranda, ACT, 21 October 2009 (Nikon P6000)
Emmelina monodactyla (Linnaeus, 1758), to Robinson trap, Søborg, Denmark, 1/2 June 2014 (Canon PowerShot SX40 HS)
Notodryas vallata Meyrick, 1897, to actinic light, Blackheath, NSW, 8/9 November 2014 (Canon PowerShot SX50 HS)

I usually carried a Raynox DCR-150 or DCR-250 macro lens that I clipped in front of the camera lens when I needed true macro. With the PowerShot cameras, these lenses gave me the same versatility as the Coolpix 4500 but with larger image sizes and the ability to use the same camera for telephoto shots of birds.

This combination was very effective, but although it brought some improvements on the telephoto side, the PowerShot SX60 was less satisfactory than earlier models for macro use since images were heavily vignetted by the Raynox lens except at the very highest zoom levels. I was able to zoom in and take great pictures of the smallest insects, but I had to remove the lens, change camera settings and rely on non-macro zoom if the insect was larger than around 25 mm. Constantly switching backwards and forwards between Raynox macro and non-Raynox telephoto disrupted the experience of photographing moths at a light sheet (probably my most significant use for a camera).

Canon DSLRs

Canon EOS 7D camera with Canon EF 100 mm f/2.8L Macro IS USM lens and Canon MT-24EX twin macro flash

Eventually, because of my focus on macro imagery, I took the plunge and invested in a Canon EOS 7D DSLR with EF 100mm f/2.8L Macro IS USM lens and (ultimately) the MT-24EX twin macro flash unit. This is a fantastic and very flexible combination. The autofocus is great, and it’s easy to get good images. The flash unit is perfect for illuminating the area in front of the lens, although it can need to rest briefly to recycle. This delay has rarely been a problem for me, except when charge is very low.

So, for several years my standard field equipment has been a Canon DSLR for moths/macro and a bridge camera for other wildlife. The disparity between image quality for moths and birds caused me also to test a telephoto lens (Sigma 150-600 mm f/5-6.3 DG) and even to go into the field with two DSLR bodies (EOS 7D and EOS 6D) so I could switch more quickly between lenses.

Habrosyne pyritoides (Hufnagel, 1766), Buff Arches, to LepiLED and actinic light, Utterslev Mose, Søborg, Denmark, 15 June 2018 (Canon EOS 6D, Canon EF 100 mm)
Chrysopilus cristatus (Fabricius 1775), female, Rude Skov, Denmark, 30 June 2018 (Canon EOS 6D, Canon EF 100 mm)
Nucifraga columbiana (Wilson, 1811), Clark’s Nutcracker, Rocky Mountain National Park, Colorado, USA, 21 September 2016 (Canon EOS 7D, Sigma 150-600 mm)
Ochotona princeps (Richardson, 1828), American Pika, Rocky Mountain National Park, Colorado, USA, 21 September 2016 (Canon EOS 7D, Sigma 150-600 mm)

I was always exceptionally happy with the results from Canon DSLRs, but the bulk and weight are troublesome when carrying or travelling with multiple lenses and bodies. Taking so much equipment was particularly problematic when camping.

So, in recent years, I’ve found myself rarely carrying anything bulkier than the PowerShot SX60 except when I am photographing moths at light. Because of the hassle of doing everything properly, I’ve found myself taking many fewer pictures than formerly.

Sony RX10 IV

Sony RX10 IV camera with Marumi DHG Achromat 330 (+3) macro lens and GODOX MF12 macro flash set

I recently decided to do something about this. I’ve been testing a simpler solution and have been immensely pleased with the results.

The Sony RX10 IV is a bridge camera with a Zeiss 24-600 mm equivalent zoom lens. It covers my desire to photograph birds and mammals much better than my past equipment, including the Sigma lens.

Macropus giganteus Shaw, 1790, Eastern Grey Kangaroo (with cloud of flies), Black Mountain, Canberra, ACT, 22 February 2024 (Sony RX10 IV)

The native macro capability of the camera is good for quick images of flowers, etc., but a couple of additions make it an astoundingly good camera for macro.

First, the Marumi DHG Achromat 330 (+3) macro lens (in the 72 mm size) plays the same role the Raynox DCR-150/DCR-250 lenses did with my older PowerShot cameras, but with virtually no vignetting throughout the zoom range. This means I can go from a field of view around 20 cm wide down to little more than 20 mm simply by zooming. Focusing the camera of course relies on moving backwards and forwards to find the correct distance. This is less optimal than the autofocus offered by the EF 100 mm lens, but the camera does manage to adjust focus within a narrow relevant range.

Secondly, a pair of GODOX MF12 macro flash units and a GODOX XPro II Trigger replace the Canon MT-24EX flash unit but with greater flexibility and an unbelievably short refresh interval. The ring that holds the twin (or up to six) flashes can be attached to the camera itself or to the Marumi lens using a 72 mm screw adaptor.

So far, I’ve attached the flash outside the Marumi lens. The flash ring does very slightly vignette the frame when the zoom is below around 34 mm. I suspect that with some extra step-up/step-down rings I could mount the ring further back or even behind the Marumi lens and avoid all obstruction.

The camera is now giving me macro images that are every bit as good as those I was taking with the Canon setup. The complete Sony combination weighs 1.70 kg compared to 2.25 kg for the Canon. Another advantage is the relatively low profile of the GODOX trigger compared with the very upright trigger component of the MT-24EX. In the past, if I wanted to add light to help locate an insect, I held a torch below the EF 100 mm lens. Now I can use a head torch directed at a slight downward angle. This is much easier. The GODOX flash units also have LED focusing lights that seem very good for this but that clearly place some drain on the flash batteries. It should be noted that the MF12 batteries are not removable. When drained, the flash unit itself must be recharged. With the MT-24EX flash unit, all power comes from AA batteries. Hence the longer recycle times, but that does allow the batteries easily to be switched in the field.

Strepsinoma foveata, to light, Aranda, ACT, Australia, 2/3 March 2024 (Sony RX10 IV)
Stangeia xerodes, to light, Aranda, ACT, Australia, 2/3 March 2024 (Sony RX10 IV)
Christinus marmoratus, to light, Aranda, ACT, Australia, 2/3 March 2024 (Sony RX10 IV)

Conclusion

For now, it seems the Sony RX10 IV serves my needs exceptionally well. I am much more ready just to go out with the camera to see what wildlife I can find and photograph.

As a final comparison, here are two images of an ichneumon wasp taken with the Canon 7D and the Sony RX10 IV. I believe I could have done more to optimise each camera for these shots, so this should not be treated as a realistic comparative test, but these photos again show how well the Sony does in this macro configuration.

Lissonota macqueeni, sample from SLAM trap, Aranda, ACT, Australia, 17-24 February 2023, photographed 2 March 2024 (Canon EOS 7D, Canon EF 100 mm, Canon MT-24EX)
Lissonota macqueeni, sample from SLAM trap, Aranda, ACT, Australia, 17-24 February 2023, photographed 2 March 2024 (Sony RX10 IV, Marumi DHG Achromat 330, twin GODOX MF12)
Categories
Araba Bioscan

Araba Bioscan SLAM 24 February to 3 March 2023

Categories
Araba Bioscan

Araba Bioscan SLAM 17-24 February 2023

Categories
Autonomous Moth Trap

Autonomous Moth Trap Image Pipeline

This is the seventh post in a series.

This post summarises the image/data pipeline currently in place for the image capture and processing from two Autonomous Moth Traps:

  • AMT-2: Raspberry Pi 4.0 unit with Logitech BRIO camera collecting images via motion detection, as in Danish design:
    • Positioned on ground in fixed location for 250 nights between 22 May 2021 and 6 May 2022, collecting 967,511 images
    • Later fitted to a wall, so far for 417 nights between 7 August 2022 and 28 September 2023 (ongoing), collecting 1,124,600 images to date
  • AMT-Alpha: Raspberry Pi Zero unit with Raspberry Pi HQ camera and 6 mm lens collecting images via timelapse:
    • Positioned on ground at various locations for a total of 122 nights between 20 November 2021 and 28 September 2023, collecting 82,833 images

Videos made from the images collected by each trap can be seen at https://vimeo.com/user157042939.

Together these traps have collected 2,174,944 images. The pipeline discussed below has identified and extracted 20,209,207 features of interest (“blobs”). This post documents the current processing, but I am reviewing all steps and expect to begin applying machine learning tools shortly.

Python software and associated YAML configuration files can be accessed from the dhobern / AMT repository in GitHub. The motion detection software is from the Motion-Project / motion repository.

AMT-2 image capture

AMT-2 is triggered daily using the following crontab settings (shown for 29 September 2023):

# m h  dom mon dow   command
# Light Trap Moths
4 19 * * * python /home/pi/lighton.py
5 19 * * * motion
6 19 * * * /home/pi/setCamera.sh
5 3 * * * pkill motion
6 3 * * * python /home/pi/lightoff.py
7 3 * * * /home/pi/backup.sh
0 12 * * * python3 /home/pi/amt_crontab.py sunset+60 sunrise-120 480

These settings indicate that the trap will next execute the following series of actions:

  • 19:04 – turn on lights (high-power LEDs and ring-light)
  • 19:05 – begin motion detection
  • 19:06 – apply camera parameters once the camera is started
  • 03:05 – end motion detection (480 minutes after start)
  • 03:06 – turn off lights
  • 03:07 – copy images, configuration settings (YAML) and crontab and camera settings to staging folder for SFTP access
  • 12:00 – reset crontab times so motion detection on the next date will start an hour after sunset and end two hours before sunrise or after 480 minutes, whichever is earlier
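The rescheduling rule in the last step can be sketched in Python. This is only an illustration of the arithmetic, not the code in amt_crontab.py: the function name capture_window and its parameters are hypothetical, and the sunset and sunrise values (18:04 and 05:46) are those recorded for the night of 28/29 September 2023.

```python
from datetime import datetime, timedelta


def capture_window(sunset, sunrise, start_offset_min=60,
                   end_offset_min=-120, max_minutes=480):
    """Return (start, end) for one night of capture.

    Hypothetical helper: start is sunset plus an offset; end is the earlier
    of sunrise plus an offset (negative means before sunrise) and
    start plus max_minutes.
    """
    start = sunset + timedelta(minutes=start_offset_min)
    end_by_sunrise = sunrise + timedelta(minutes=end_offset_min)
    end_by_duration = start + timedelta(minutes=max_minutes)
    return start, min(end_by_sunrise, end_by_duration)


sunset = datetime(2023, 9, 28, 18, 4)
sunrise = datetime(2023, 9, 29, 5, 46)
start, end = capture_window(sunset, sunrise)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))
```

With these inputs the 480-minute cap is reached before the sunrise limit (03:46), which is consistent with the eight-hour run shown in the crontab.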

The configuration settings for this trap are currently as follows:

event:
  basisofrecord: Machine observation
  coordinateUncertaintyInMeters: 2
  decimalLatitude: -35.264047
  decimalLongitude: 149.083427
  geodeticDatum: WGS84
  recordedby:
    email: dhobern@gmail.com
    name: Donald Hobern
    orcid: 0000-0001-6492-4016
provenance:
  capture:
    camera: Logitech BRIO 4K
    illumination: 10-inch ring light
    imageheight: 2160
    imagewidth: 3840
    mode: Motion
    operatingdistance: 250
    processor: Raspberry Pi 4
    unitname: AMT-2
    uvlight: High-power LED tube - 6 UV, 1 green, 1 blue, 1 white

The camera settings are the output from:

v4l2-ctl -d /dev/video0 --list-ctrls

The folder of images and metadata is automatically transferred over SFTP at 08:30 each day onto a (Windows 11) desktop machine for subsequent processing.

AMT-Alpha image capture

On startup, AMT-Alpha launches a Python script amt_modeselector.py. This awaits triggering via a push button on the outside of the unit. When this button is pushed, the script selects an action based on the position of a rotary switch, which may be in one of four states:

  • Automatic – script does nothing, assuming that a cron job is scheduled to start amt_timelapse.py at a specified time. This mode is for unattended use.
  • Manual – script immediately launches amt_timelapse.py.
  • Transfer – script runs amt_transfer.py. If a USB drive has been inserted in the external USB port, this then reads configuration options from /media/usb/AMT/amt_transfer.yaml and may transfer images, configuration files and logs onto the USB drive and new configuration files or updated software onto the device.
  • Off – script triggers a soft shutdown of the device.

Regardless of whether image capture is triggered manually or via crontab, amt_timelapse.py reads configuration settings specified in amt_settings.yaml (which overrides default values set for the unit in amt_unit.yaml and underlying default values for the software specified in amt_defaults.yaml). If a GPS sensor is attached, the unit inserts coordinates into the configuration metadata via a temporary YAML file amt_location.yaml. The complete final configuration is stored as an output file along with the images captured.
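The layering of configuration files can be pictured as a series of dictionary merges in which later files win. The sketch below is an assumption about how such an override might work, not the code used on the unit: merge_settings is a hypothetical helper, the merge is assumed to be recursive, and the values are invented for illustration (the real settings are loaded from the YAML files).

```python
def merge_settings(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative values only, standing in for the parsed contents of
# amt_defaults.yaml, amt_unit.yaml, amt_settings.yaml and amt_location.yaml.
defaults = {"provenance": {"capture": {"interval": 10, "quality": 50}}}
unit = {"provenance": {"capture": {"unitname": "AMT-alpha"}}}
settings = {"provenance": {"capture": {"interval": 20}}}
location = {"event": {"decimalLatitude": -35.264043}}

config = {}
for layer in (defaults, unit, settings, location):
    config = merge_settings(config, layer)
print(config)
```

Applying the layers in this order means a value in amt_settings.yaml replaces the unit and software defaults, while keys it does not mention are inherited unchanged.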

The following configuration file is from a run of AMT-Alpha on 28 September 2023. This included a 120-second delay before collecting images at 20-second intervals:

_configurationfiles:
- /home/pi/amt_defaults.yaml
- /home/pi/amt_unit.yaml
- /home/pi/amt_settings.yaml
- /home/pi/amt_location.yaml
event:
  basisofrecord: Machine observation
  coordinateTimestamp: '2023-09-28T19:03:11.621919+10:00'
  coordinateUncertaintyInMeters: 1
  decimalLatitude: -35.264043
  decimalLongitude: 149.08358
  geodeticDatum: WGS84
  lunarPhase: Full Moon
  recordedby:
    email: dhobern@gmail.com
    name: Donald Hobern
    orcid: 0000-0001-6492-4016
  sunriseTime: '2023-09-29T05:46:00+10:00'
  sunsetTime: '2023-09-28T18:04:00+10:00'
provenance:
  capture:
    awb_gains:
    - 2.8
    - 1.6
    awb_mode: 'off'
    brightness: 60
    camera: Raspberry Pi HQ + 6mm Wide Angle Lens
    contrast: 35
    envsensor: DHT22
    folder: /home/pi/AMT/
    gpioenvdata: 9
    gpioenvpower: 10
    gpiogpspower: 24
    gpiogreen: 25
    gpiolights: 26
    gpiomanualmode: 22
    gpiomodetrigger: 16
    gpiored: 7
    gpioshutdownmode: 17
    gpiotransfermode: 27
    gpssensor: BN220
    illumination: 10-inch ring light
    imageheight: 3040
    imagewidth: 4056
    initialdelay: 120
    interval: 20
    maximages: 720
    meter_mode: matrix
    mode: TimeLapse
    operatingdistance: 265
    processor: Raspberry Pi Zero W
    program: /home/pi/amt_modeselector.py
    quality: 50
    saturation: 0
    sharpness: 70
    transferimages: true
    trigger: Manual
    unitname: AMT-alpha
    uvlight: High-power LED tube - 4 UV, 1 green, 1 blue
    version: 0.9.2

Images and configuration files may be transferred for processing via a USB drive or SFTP.

Segmenting images

Images from both traps have been processed using SegmentImages.py, initially based on the published Danish code. This uses OpenCV to detect objects of interest (“blobs”) and then applies a cost calculation to determine which blobs are likely to represent the same insect in consecutive images.

The cost calculation is based on costs in five dimensions (calculated in amt_tracker.py).

  • Size – 0 if the two blobs have the same number of pixels, 1 if one blob is at least four times the size of the other, with linear interpolation for intermediate values
  • Distance – 0 if the centroids of the two blobs are within 25 pixels of one another, 0.01 if the two blobs overlap or their centroids are within 100 pixels, 0.02 if they are within 250 pixels, and in all other cases the distance divided by 4405 (as the maximum distance possible on the screen)
  • Colour – crude comparison of similarity of colours in blobs. The pixels in each blob are assigned to one of eight cells in RGB colourspace (intensity less than or greater than 128 for each of the RGB components) identified as K for “black”, R for “red”, G for “green”, B for “blue”, C for “cyan”, M for “magenta”, Y for “yellow” and W for “white”. Blobs are then assigned a colour string including the letters for all cells containing at least 2% of the pixels in the blob. A cost of 1/8 is then assigned for each colour letter associated with one blob and not the other.
  • Direction – 0 if this is interpreted to be the first or second detection of a species, otherwise 0 if the position is exactly aligned with the direction between the last two detections, 1 if the position is in exactly the reverse direction, with linear interpolation for intermediate angles.
  • Age – allowing for insects disappearing and reappearing within five consecutive images. 0 if the blobs are in consecutive images, 1 if last seen five images previously, with linear interpolation for intermediate ages.
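As an illustration of the colour comparison described in the list above, here is a minimal Python sketch. It is not taken from amt_tracker.py: the function names are hypothetical, pixels are assumed to be (R, G, B) tuples, and a channel value of exactly 128 is assumed to fall on the "high" side.

```python
CELL_LETTERS = {
    (False, False, False): "K",  # black
    (True, False, False): "R",   # red
    (False, True, False): "G",   # green
    (False, False, True): "B",   # blue
    (False, True, True): "C",    # cyan
    (True, False, True): "M",    # magenta
    (True, True, False): "Y",    # yellow
    (True, True, True): "W",     # white
}


def colour_string(pixels, min_share=0.02):
    """Return the letters of all cells holding at least min_share of the pixels."""
    counts = {}
    for r, g, b in pixels:
        letter = CELL_LETTERS[(r >= 128, g >= 128, b >= 128)]
        counts[letter] = counts.get(letter, 0) + 1
    total = len(pixels)
    if total == 0:
        return ""
    return "".join(c for c in "KRGBCMYW" if counts.get(c, 0) / total >= min_share)


def colour_cost(string_a, string_b):
    """Cost of 1/8 for each letter present in one colour string but not the other."""
    return len(set(string_a) ^ set(string_b)) / 8


dark_moth = [(20, 20, 20)] * 60 + [(240, 240, 240)] * 40
pale_moth = [(240, 240, 240)] * 99 + [(200, 30, 30)] * 1
print(colour_string(dark_moth), colour_string(pale_moth))
print(colour_cost(colour_string(dark_moth), colour_string(pale_moth)))
```

In the second example the single red pixel makes up only 1% of the blob, so it falls below the 2% threshold and does not contribute a letter.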

These five costs are then assigned weights based on a subjective (slightly tested) assessment of their relative importance. The weights applied have generally been 4 for Size and for Distance, 2 for Direction and Colour and 1 for Age. This means that the weighted cost for assuming two blobs are related is a distance in a hypercube with sides measuring 4, 4, 2, 2 and 1 units, i.e. with a hypotenuse length or maximum weight of sqrt(41). These are then normalised to the range 0 to 1. Only weighted costs below 0.25 are considered plausible redetections.
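Read literally, the weighting amounts to a Euclidean length of the weighted costs, divided by its maximum possible value. The following Python sketch shows that interpretation; the names and the sample cost values are hypothetical and it is not a copy of the calculation in amt_tracker.py.

```python
import math

# Weights as described in the text; the dictionary keys are illustrative names.
WEIGHTS = {"size": 4, "distance": 4, "direction": 2, "colour": 2, "age": 1}
MAX_COST = math.sqrt(sum(w * w for w in WEIGHTS.values()))  # sqrt(41)
THRESHOLD = 0.25


def weighted_cost(costs):
    """Combine per-dimension costs (each in the range 0 to 1) into one value in 0 to 1."""
    length = math.sqrt(sum((WEIGHTS[k] * costs[k]) ** 2 for k in WEIGHTS))
    return length / MAX_COST


# Invented example: a blob slightly different in size and colour, seen one frame late.
example = {"size": 0.1, "distance": 0.01, "direction": 0.0, "colour": 0.125, "age": 0.25}
cost = weighted_cost(example)
print(round(cost, 3), cost < THRESHOLD)
```

Because Size and Distance carry the largest weights, a blob that has quadrupled in size (size cost 1) already scores 4 / sqrt(41), well above the 0.25 threshold, whatever the other costs are.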

Blobs are then assigned to “tracks” (series of locations of the same presumed insect over multiple images) based on an effort to minimise total cost.
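The text does not say which assignment method is used, so the following Python sketch shows just one simple possibility: a greedy pass that links the cheapest plausible track/blob pairs first. The function name assign_blobs and the cost values are hypothetical, and a greedy pass is a simplification that does not guarantee the lowest possible total cost.

```python
def assign_blobs(cost, threshold=0.25):
    """Greedily link tracks to blobs.

    cost[t][b] is the normalised cost of linking track t to blob b.
    Returns a dict {track_index: blob_index}; blobs left out would start new tracks.
    """
    candidates = sorted(
        (c, t, b)
        for t, row in enumerate(cost)
        for b, c in enumerate(row)
        if c < threshold
    )
    assigned = {}
    used_blobs = set()
    for c, t, b in candidates:
        if t not in assigned and b not in used_blobs:
            assigned[t] = b
            used_blobs.add(b)
    return assigned


# Two existing tracks and three blobs in the next image (invented costs).
cost = [
    [0.05, 0.20, 0.90],
    [0.10, 0.08, 0.60],
]
print(assign_blobs(cost))
```

Here the third blob has no link below the threshold to either track, so it would be treated as a new insect.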

The Python code creates a data subfolder containing:

  • amt_image.csv – CSV list of all images captured by the unit, including date and time and associated temperature and humidity if these were collected.
  • amt_blob.csv – CSV list of all blobs, including source image, bounding box, size, cost calculations and other variables. Each record also includes a track identifier and a changed flag indicating whether the blob was new or altered compared to earlier images. A sample is included as the image at the top of this post.
  • blobs – a folder containing segmented JPEG images for all blobs with the changed flag set to True.

This process successfully links many blobs into tracks but is also prone to merge or confuse tracks when insects are very active. The weightings are arbitrary, and tuning the weights might improve the process. Tracks are only a convenience to simplify later stages in the process.

Editing tracks

Another Python program, TrackEditor.py, is used to edit the tracks and associate them with species (or higher taxon identifications). This is a crude Tkinter application that loads data from amt_blob.csv along with the associated blob images and presents these for review and identification. Results are written into amt_track.csv. This lists the tracks and associates them with the name for the associated taxon. The editor allows tracks to be split and merged, so it also rewrites amt_blob.csv with revised track identifiers for the blobs.

The following image shows the TrackEditor window for some of the insects recorded by AMT-2 on the night of 28/29 September 2023. Clearly, several insects have been combined into a single track with id 187 (the 103 in parentheses gives the mean length of the sides of the associated images). Similarly, tracks 220 and 225 are for the same insect and can be joined.

The available operations are:

  • Clicking on the first image in a track joins the track to the previous track.
  • Clicking on any other image splits the track into two tracks, with the second track beginning with the clicked image.
  • Clicking the link icons (to the right of the track identifiers) on any two tracks merges them into a single track.
  • A scientific name can be entered into the text field for each track – a taxon dictionary supports autocompletion.
  • The three letter codes assign common higher taxon names to the track (Insecta, Coleoptera, Diptera, Hymenoptera, Lepidoptera, Trichoptera, Hemiptera, Tortricidae, Oecophoridae, Formicidae and Araneae).
  • The first of the three icons opens a larger image view for the first image in the track with buttons to step through the track.
  • The second icon deletes the track.
  • The final icon opens a dialog allowing one or more images from the track to be selected and submitted as a new observation via the iNaturalist API.

The following image shows the result of clicking on the first image of track 225 to merge it with track 220 and the larger image view for track 194.

The following image shows the result of splitting and organising track 187 and of adding two species identifications.

To date, this editor has been used to label 15,719 tracks containing 369,319 segmented images for approximately 350 taxa. Many images are series with very little inter-frame variation. Many taxa are larger groupings such as Diptera or Larentiinae.

Next steps

Labeling tracks (and hence blobs) with identifications is time-consuming but should allow a rich training set to be prepared with images representing a large proportion of the local fauna.

Sufficient images may already have been tagged to support at least training a model to group insects into broad categories and discard images that do not clearly represent individual insects. The outputs from such a process could then speed preparation of species level training sets.

Categories
Araba Bioscan

Araba Bioscan SLAM 10-17 February 2023

Categories
Biodiversity Informatics Lepidoptera Species Lists

Updating Global Lepidoptera Index for Psychidae

This is a small update to the recent post on updating Global Lepidoptera Index (GLI) for Elachistinae species. I have subsequently reworked GLI for Psychidae, based primarily on:

  • Sobczyk, T. (2013) World Catalogue of Insects Volume 10, Psychidae (Lepidoptera). 1–467 pp.
  • Arnscheid, W.R. & Weidlich, M. (2017) Microlepidoptera of Europe Volume 8, Psychidae. 1–356 pp.
  • Papers known to Google Scholar relating to Psychidae and published since 2012 (many from Zootaxa, smaller numbers from Entomofauna, SHILAP, DEZ, etc.).

Names for Australian Psychidae in GLI were already largely up to date owing to earlier efforts to align with Nielsen, E.S., Edwards, E.D. & Rangsi, T.V. (1996) Checklist of the Lepidoptera of Australia (Monographs on Australian Lepidoptera Volume 4).

However, the coverage for the rest of the family reflected the original digitisation of the NHM card index. The card index itself seems to have been maintained less thoroughly than for many other families. Names for Psychidae in LepIndex reflect very dated concepts for genera and species synonymy.

The recent sources for this family vary to some degree in assignment of genera to subfamilies and tribes and in use of subgenera. GLI now follows Sobczyk 2013 in these respects, but with overrides for European species from Arnscheid & Weidlich 2017.

Following all updates, the number of species known within the family has risen from 1,118 to 1,454. However, the total number of species names (including both accepted names and all synonyms) has more than doubled relative to LepIndex. Much of this is because of changes in generic placement and synonymy, although significant numbers of species and names even from as early as the 1970s were missing from the card index.

Overlap in species names within the family Psychidae between LepIndex and GLI. Names are considered to be a full match if spelling and authorship are identical (including parentheses) and if the two datasets give the same accepted name for the associated species.

Of the 2,938 species names now included in GLI, only 418 exactly match a name in LepIndex and also map to the same accepted species name in both datasets. The vast majority of accepted psychid names in LepIndex are no longer considered correct.

Even with many historical names now synonymised, updating Psychidae in GLI resulted in a 30% growth in the number of accepted species recorded for the family. This is in line with the estimates in the earlier Elachistinae post that between 27% and 41% of all accepted Lepidoptera species are missing from LepIndex and that around 40,000 more species still need to be added to the dataset.

Categories
Biodiversity Informatics Lepidoptera Species Lists

Updating Global Lepidoptera Index for Elachistinae

Background

Until 2022, Catalogue of Life (COL) and GBIF still relied on the NHM LepIndex dataset for names for almost all Lepidoptera (butterflies and moths). This is now superseded by a revised version of LepIndex maintained in TaxonWorks as the Global Lepidoptera Index (GLI). See this earlier post for more detail.

Methods

The concept used in LepIndex for the gelechioid family Elachistidae corresponded to what we now treat as a subfamily Elachistinae. At the time of its last import into COL, LepIndex had 491 scientific names associated with this (sub-)family, organised as follows:

  • Family – 1 accepted
  • Genus – 35 accepted
  • Species – 410 accepted, 1 provisionally accepted, 40 synonyms, 2 ambiguous synonyms
  • Subspecies – 2 accepted

In 2019, Lauri Kaila published An annotated catalogue of Elachistinae of the World (Lepidoptera: Gelechioidea: Elachistidae) in Zootaxa. I had already brought GLI up to date for the Australian Elachistinae treated in his 2011 Monographs of Australian Lepidoptera volume, so I decided to take the time also to update the remainder of this subfamily and to include all post-2019 species I could find. This is now completed, and GLI now includes 1284 names for the group. This total comprises names in Kaila 2019, those from newer papers, fossil names from LepIndex and a few nomina dubia that were not in the catalogue but seem plausibly to refer to elachistine moths. I was not rigorous about adding every historical combination for epithets that have passed through multiple genera, but original combinations and current combinations should all be present, as should original combinations for all synonyms. I did not update the micro-references that were already in place for older names, but the newer names link to structured citations.

Totals are now as follows:

  • Subfamily – 1 accepted
  • Genus – 14 accepted, 50 synonyms
  • Species – 819 accepted, 392 synonyms
  • Infraspecific taxa – 1 accepted, 7 synonyms

About five genera and around a dozen other species that were under Elachistidae in LepIndex previously have been moved to other families in the Lepidoptera. Many of these cases are discussed by Kaila, although a few represent highly outdated placements in the NHM catalogue that were apparently not even considered worth discussing. Many small genera have been synonymised into Elachista, Perittia or Stephensia. Four fossil genera are not treated by Kaila but are retained from LepIndex.

I fixed multiple misspellings that occurred in LepIndex either because information on the index cards was incorrect or during transcription into digital format. Despite the scale of the publication, I found no obvious misspellings in Kaila 2019.

Results

Based on these raw numbers, it is clear that LepIndex lacked around 50% of the currently expected number of accepted species for the family and that many synonyms were also missing. The actual situation was even more serious than this appears, because many names that were accepted by LepIndex are now considered synonyms, and vice versa.

Here is a summary of results from the largest genus, Elachista. LepIndex had 355 names associated with 327 accepted species in this genus, whereas GLI has 1,046 names for 716 accepted species.

Overlap in species names within the genus Elachista between LepIndex and GLI. Names are considered to be a full match if spelling and authorship are identical (including parentheses) and if the two datasets give the same accepted name for the associated species.

Just 183 (56% of 327) accepted names in LepIndex exactly matched the spelling, authorship and status, and only 9 (32% of 28) synonyms exactly matched the spelling, authorship, status and accepted name offered by GLI. If variation in authorship (mostly missing years and/or parentheses) is ignored, these totals rise to 200 accepted names and 12 synonyms that match the expected species.
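The matching rule described above can be sketched in Python. This is only an illustration of the logic, not the code I actually ran: the `NameRecord` structure, the `match_level` function and the way authorship is normalised (dropping parentheses and four-digit years) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical minimal record for one name in a checklist."""
    name: str           # scientific name as spelled in the dataset
    authorship: str     # authorship string, e.g. "(Hübner, 1825)"
    accepted_name: str  # accepted name the dataset assigns to this name


def normalise_authorship(authorship: str) -> str:
    """Drop parentheses and years so only the author names are compared."""
    text = re.sub(r"[()]", "", authorship)
    text = re.sub(r"\b\d{4}\b", "", text)
    return re.sub(r"[,\s]+", " ", text).strip().casefold()


def match_level(a: NameRecord, b: NameRecord) -> str:
    """Return 'full', 'authorship_ignored' or 'none' for two records."""
    if a.name != b.name or a.accepted_name != b.accepted_name:
        return "none"
    if a.authorship == b.authorship:
        return "full"
    if normalise_authorship(a.authorship) == normalise_authorship(b.authorship):
        return "authorship_ignored"
    return "none"


# Elachista freyerella (Hübner, 1825) is a real name; the other values are
# made up purely to exercise each branch.
gli = NameRecord("Elachista freyerella", "(Hübner, 1825)", "Elachista freyerella")
no_year = NameRecord("Elachista freyerella", "Hübner", "Elachista freyerella")
moved = NameRecord("Elachista freyerella", "(Hübner, 1825)", "Elachista exampleana")

levels = [match_level(gli, gli), match_level(no_year, gli), match_level(moved, gli)]
```

Counting the names that reach `full` gives the strict totals, and adding those that reach `authorship_ignored` gives the relaxed totals.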

81 (25%) of the names accepted for Elachista species by LepIndex are now considered synonyms for other species in the genus. 36 accepted names (11%) now refer to species outside this genus.

6 (21%) of the LepIndex synonyms in this genus are now treated as synonyms for different species.

In other words, of the 365 names that LepIndex associated with species in the genus Elachista, even ignoring issues with authorship strings, just 212 (58%) directed users to the currently accepted name for a species.

Reviewing this now from the perspective of what the taxonomic community knows and what names are actually in circulation for species in the genus Elachista (again ignoring issues with authorship):

  • Just over 70% (507 of 716) of the currently accepted species names in Elachista were unknown to LepIndex/COL/GBIF a year ago
  • 78% (815 of 1,045) of the names now in TaxonWorks for Elachista species were unknown or incorrectly handled a year ago

Discussion

Elachistinae forms perhaps 0.3-0.4% of the total described Lepidoptera fauna, so these corrections are only a small step towards delivering a comprehensive and reliable catalogue for world Lepidoptera. This subfamily now joins Nepticuloidea, Gracillariidae, Gelechiidae, Lecithoceridae, Alucitidae, Pterophoridae, and Tortricidae as groups that are in good condition in the COL Checklist. Preparations are well under way to bring in some other major family-rank datasets that have been prepared over many years by dedicated groups of taxonomists. Both Geometridae and Bombycoidea are likely to be replaced in the next few months.

The rest of the Lepidoptera is covered by aging datasets. The Global Butterfly Information System dataset (GloBIS/GART), which covers the Pieridae and Papilionidae, may soon be updated. I am working on a refresh for Gaden S. Robinson’s Tineidae dataset, which was last updated in 2011. Even the Nepticuloidea (last updated in 2016) is urgently awaiting a planned update. All the rest comes from LepIndex.

The following table compares accepted species counts for the same taxa in different datasets. This is a crude metric – if large numbers of names that should be treated as synonyms are included as accepted species names, this may inflate numbers. However, these numbers show clearly that effort to clean up LepIndex data always leads to significant increases in record counts.

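| Taxon                   | LepIndex | GLI    | Revised             | Year | Growth % |
|-------------------------|----------|--------|---------------------|------|----------|
| Nepticuloidea           | NA       | 985    | 1073                | 2016 | 9        |
| Gracillariidae          | NA       | 1746   | 2013                | 2022 | 15       |
| Elachistinae            | 410      | 819    | 819                 | 2023 | 100      |
| Gelechiidae             | 4639     | 4766   | 5799                | 2023 | 25       |
| Lecithoceridae          | 780      | 1518   | 1518                | 2023 | 95       |
| Alucitoidea             | 186      | 246    | 260                 | 2023 | 40       |
| Pterophoroidea          | NA       | 1057   | 1574                | 2023 | 49       |
| Tortricidae             | 8697     | 9485   | 11360               | 2018 | 31       |
| Geometridae             | 21260    | 22497  | 23969               | 2022 | 13       |
| Bombycoidea             | 3463     | 5115   | 6617                | 2022 | 91       |
| Total                   | [43213]  | 48234  | 55002               |      | 27       |
| Total excl. Geometridae | [21953]  | 25537  | 31033               |      | 41       |
| Other Lepidoptera       | 99690    | 108627 | [126606] / [140563] |      | 27 / 41  |
| All Lepidoptera         | [142903] | 156861 | [181608] / [195565] |      | 27 / 41  |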
Comparison of accepted species counts for different Lepidoptera taxa between a) the last version of LepIndex imported into COL, b) Global Lepidoptera Index as of 2023-01-18, and c) versions curated in the last few years (year listed indicates date considered current) and considered nearly complete. Growth is shown as a percentage increase in the number of records since the older of LepIndex or GLI.
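For anyone wanting to check the growth column, here is a minimal Python sketch. It assumes growth is calculated as the revised count relative to the LepIndex count, falling back to the GLI count where LepIndex has no figure (NA), and rounded to the nearest whole percent. The function and variable names are just for illustration.

```python
from typing import Optional

# (taxon, LepIndex, GLI, revised) accepted species counts copied from the table.
# None marks taxa with no LepIndex figure ("NA").
ROWS = [
    ("Nepticuloidea", None, 985, 1073),
    ("Gracillariidae", None, 1746, 2013),
    ("Elachistinae", 410, 819, 819),
    ("Gelechiidae", 4639, 4766, 5799),
    ("Lecithoceridae", 780, 1518, 1518),
    ("Alucitoidea", 186, 246, 260),
    ("Pterophoroidea", None, 1057, 1574),
    ("Tortricidae", 8697, 9485, 11360),
    ("Geometridae", 21260, 22497, 23969),
    ("Bombycoidea", 3463, 5115, 6617),
]


def growth_percent(lepindex: Optional[int], gli: int, revised: int) -> int:
    """Percentage increase from the older available count to the revised count."""
    baseline = lepindex if lepindex is not None else gli
    return round((revised / baseline - 1) * 100)


growth = {taxon: growth_percent(lep, gli, rev) for taxon, lep, gli, rev in ROWS}

for taxon, pct in growth.items():
    print(f"{taxon:<16}{pct:>4}%")
```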

The COL version of LepIndex is missing names for taxa that had been sourced from other datasets prior to 2019. The total count provided for LepIndex uses GLI counts for these taxa – the total is therefore an overestimate, but the mean growth across these groups is at least 27%. Applying the same rate across all other Lepidoptera groups gives an estimate for the order of 181,608 accepted described species. There is reason to consider Geometridae an outlier since significant NHM work on the family preceded the 2011 version of LepIndex. Excluding Geometridae from the calculation raises the estimated percentage growth to 41%, giving an estimated species count of 195,565.
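The arithmetic behind these two estimates can be reproduced with a short Python sketch. The names are illustrative only, and one assumption is needed to arrive at exactly the figures quoted: the growth rate is rounded to a whole percent before it is applied to the remaining groups. Using the unrounded rates would give slightly higher totals.

```python
# Counts taken from the table: bracketed LepIndex totals and revised totals
# for the curated groups, plus the LepIndex count for all other Lepidoptera.
LEPINDEX_CURATED = 43213
REVISED_CURATED = 55002
LEPINDEX_GEOMETRIDAE = 21260
REVISED_GEOMETRIDAE = 23969
LEPINDEX_OTHER = 99690


def growth_pct(baseline: int, revised: int) -> int:
    """Whole-percent growth from the baseline count to the revised count."""
    return round((revised / baseline - 1) * 100)


def scale(count: int, pct: int) -> int:
    """Apply a whole-percent growth rate to a count."""
    return count + round(count * pct / 100)


# Mean growth across all curated groups, and with Geometridae left out.
pct_all = growth_pct(LEPINDEX_CURATED, REVISED_CURATED)
pct_excl = growth_pct(
    LEPINDEX_CURATED - LEPINDEX_GEOMETRIDAE,
    REVISED_CURATED - REVISED_GEOMETRIDAE,
)

# Estimated order totals: revised curated groups plus scaled-up remainder.
estimate_all = REVISED_CURATED + scale(LEPINDEX_OTHER, pct_all)
estimate_excl = REVISED_CURATED + scale(LEPINDEX_OTHER, pct_excl)

print(pct_all, estimate_all)
print(pct_excl, estimate_excl)
```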

Revised versions are as follows: Nepticulidae and Opostegidae of the World (Oct 2016), Global Taxonomic Database of Gracillariidae (Jan 2022), GLI Elachistinae (Mar 2023), Catalogue of World Gelechiidae (Feb 2023), GLI Lecithoceridae (Mar 2023), Catalogue of the Alucitoidea of the World (Nov 2022), Catalogue of the Pterophoroidea of the World (Jan 2023), World Catalogue of the Tortricidae (Tortricid.net, Dec 2018), Geometridae (pending update, 2022), Bombycoidea (pending update, 2022). The last two datasets will be added to COL once associated taxonomic catalogues have been published.

The table shows two calculated estimates for the current total number of described Lepidoptera species. I consider it highly likely that most remaining groups will expand at least 41% as gaps in LepIndex are addressed. Given the large amount of ongoing revisionary work in the Noctuoidea (42,941 species in GLI today), it seems reasonable that this popular group may have gaps as significant as those shown here for Bombycoidea, which would inflate the numbers much further. At a minimum, Catalogue of Life today is likely to be missing 40,000 described Lepidoptera species.

I would note too that, for Elachistinae, I found that LepIndex lacked many 19th-century European and British names. Some of these are significant omissions, for example names from Haworth, Hübner and Herrich-Schäffer, including the currently accepted name for the widespread species Elachista freyerella (Hübner, 1825) (with hundreds of records in GBIF). Although the NHM card index was maintained into the 1990s, modern publications begin to disappear even from early in the 1980s.

I feel even more than before the need to make the scale of the challenge much more public and for COL to become more proactive in finding and promoting new ways for content to be edited. A traffic-light system for coverage and quality for each taxon would be a big step forward.