This is the fifth post in a series:
- Autonomous Moth Trap Project
- Autonomous Moth Trap Hardware Revisions
- Hardware and Software updates to Autonomous Moth Trap
- Alternatives for Autonomous Moth Trap
- Software components for Autonomous Moth Trap
- Time Lapse module for Autonomous Moth Trap
A key goal for the Autonomous Moth Trap project is to make it easier to collect and manage time series data on insects using units that operate in the field.
The Danish project generously shared both the software that was deployed on their Raspberry Pi systems and the Python scripts they have developed for processing the resulting images.
I have used some of the code and many of the concepts from these scripts in my own system. I have created a new GitHub project (github.com/dhobern/AMT) to manage and share my code and other digital assets. I welcome review, bug fixes and reuse.
There are at least eight software components that should form the core for a software-data ecosystem for this trap and that may well apply to other related use cases. I have existing implementations for a number of these, although many improvements are possible. Others can follow as more images are collected.
The figure at the top of this post shows some of the relationships between these components. The two components to the left execute inside the trap (using the Raspberry Pi as the processor). The rest execute on a desktop computer or laptop (Windows/Mac/Unix).
The path indicated by the green arrows reflects my current focus and what I hope to achieve in the near future. This involves automated collection of images, software assistance in deriving a species list and minimum counts (plus a range of metadata) for each species and then publication as a sample event dataset to GBIF or other public platforms.
As the number of identified images increases for a given location or region, it will become possible to execute the path indicated by the orange arrows, building a training set from identified images and then training a machine learning model for image recognition. There is also potential to integrate machine learning into the image segmentation stage to improve classification of interesting and uninteresting objects and to enhance recognition of the same individual in multiple images.
Once a model has been trained and works, it will be possible to activate the path indicated by the purple arrows and automate much more of the process. Quality control will be important and there should probably be other links that verify the identifications and feed more identified images in to retrain the model.
The following are brief notes on each of the components indicated. More detail will be presented in subsequent posts.
Time Lapse Capture
I have a working version of this component, written in Python and controlled with a JSON configuration file. It controls the lights (moth light, ring light for illumination), a temperature/humidity sensor and the camera (interval and number of images, brightness, contrast, saturation, sharpness and JPEG quality). The output is a folder containing a series of images with the timestamp and the temperature/humidity readings in each filename (I also plan to add these readings to the EXIF metadata), along with a timestamped copy of the JSON configuration file, since this contains metadata that may be useful later. My Raspberry Pi Zero unit runs this component as a cron job (or as multiple cron jobs at different times of the night).
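As a rough illustration of the idea, the following minimal sketch parses a JSON configuration and builds a filename that carries the timestamp and sensor readings. The configuration keys, the filename pattern and the function names are all assumptions made for this example and are not taken from the AMT repository.

```python
import json
from datetime import datetime

# Hypothetical configuration text; the key names are illustrative only.
EXAMPLE_CONFIG = """
{
  "interval_seconds": 10,
  "image_count": 360,
  "jpeg_quality": 90,
  "brightness": 50
}
"""


def load_config(text):
    """Parse the JSON configuration text into a dictionary."""
    return json.loads(text)


def image_filename(timestamp, temperature_c, humidity_pct):
    """Build a filename carrying the timestamp and sensor readings.

    The pattern used here is an assumption, not the project's actual format.
    """
    stamp = timestamp.strftime("%Y%m%d%H%M%S")
    return f"{stamp}-{temperature_c:.1f}C-{humidity_pct:.1f}pc.jpg"


config = load_config(EXAMPLE_CONFIG)
name = image_filename(datetime(2020, 8, 1, 22, 15, 0), 18.46, 71.0)
print(config["interval_seconds"], name)
```

Putting the timestamp first in a fixed-width format means that sorting the filenames alphabetically also sorts the images chronologically.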
The version implemented by the Danish team uses the software developed by the Motion project, along with some small Python scripts to control lights, etc., all controlled via cron jobs. I have modified the scripts on my Raspberry Pi 4 unit to add temperature/humidity readings. I expect to expand my Time Lapse Capture component so it uses the Motion software as an alternative mode alongside Time Lapse. This will allow the configuration metadata to be largely identical for both options.
For image segmentation, I have again worked from software developed by the Danish team but rewritten large sections to suit my own needs. My version works on the folder produced by the Pi unit and generates several derived products:
- A CSV file listing all images and associated metadata for each (temperature, humidity, etc.)
- A CSV file listing each “blob” of interest in any of these images, including coordinates, size, significant colours, an identifier for a “track” that represents a presumed repeated capture of the same individual across multiple images, etc.
- A folder containing cropped images for each blob that appears or changes between images
I will also store a timestamped copy of the configuration settings for the image segmentation as part of each output data set.
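To make the notion of a "track" concrete, here is a minimal sketch that links blobs across consecutive images by greedy nearest-neighbour matching on their centre coordinates. This is only one simple way to do it and is an assumption for illustration; the actual segmentation code may also use size, colour or other cues. The function name and the `max_distance` parameter are hypothetical.

```python
import math


def assign_tracks(frames, max_distance=50.0):
    """Assign a track id to every blob in a sequence of frames.

    frames: list of frames, each a list of (x, y) blob centres in pixels.
    A blob reuses the id of the nearest unclaimed blob in the previous
    frame if that blob lies within max_distance; otherwise it starts a
    new track. Returns a parallel list of lists of integer track ids.
    """
    next_id = 0
    previous = []  # (x, y, track_id) for blobs in the previous frame
    result = []
    for blobs in frames:
        available = list(previous)  # previous blobs not yet matched
        current = []
        ids = []
        for x, y in blobs:
            best = None  # (distance, candidate) of the closest match so far
            for candidate in available:
                d = math.hypot(x - candidate[0], y - candidate[1])
                if d <= max_distance and (best is None or d < best[0]):
                    best = (d, candidate)
            if best is not None:
                track_id = best[1][2]
                available.remove(best[1])
            else:
                track_id = next_id
                next_id += 1
            current.append((x, y, track_id))
            ids.append(track_id)
        previous = current
        result.append(ids)
    return result


# Made-up coordinates: one blob stays put, others appear and disappear.
frames = [
    [(100, 100), (400, 300)],
    [(105, 98), (700, 50)],
    [(108, 101)],
]
tracks = assign_tracks(frames)
print(tracks)
```

Because a simple matcher like this will sometimes join different individuals or break one individual into several tracks, the results still need manual correction.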
For the Track Editor, I have written a Python GUI that shows all blobs from each track as thumbnails. It allows tracks to be split or merged, uninteresting tracks to be deleted and a species or higher taxon to be added as an identification for each track. The outputs are a local taxon dictionary (for assisting entry of identifications; this output grows over time) and a CSV file with the identifications for each track. Since the track identifiers are changed by this tool, it also writes an updated version of the blob CSV file.
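The merge and split operations can be pictured as simple relabelling of rows from the blob CSV file. The sketch below is an assumption about how such edits might be expressed, not the tool's actual code; the `image` and `track` field names and both function names are hypothetical.

```python
def merge_tracks(blobs, keep_id, absorb_id):
    """Return a copy of the blob rows with absorb_id relabelled as keep_id."""
    return [
        {**row, "track": keep_id} if row["track"] == absorb_id else dict(row)
        for row in blobs
    ]


def split_track(blobs, track_id, split_image, new_id):
    """Move blobs of track_id from split_image onward into track new_id.

    Assumes image names sort chronologically (for example, because they
    begin with a fixed-width timestamp).
    """
    return [
        {**row, "track": new_id}
        if row["track"] == track_id and row["image"] >= split_image
        else dict(row)
        for row in blobs
    ]


# Made-up rows standing in for the blob CSV file.
blobs = [
    {"image": "img001.jpg", "track": 1},
    {"image": "img002.jpg", "track": 1},
    {"image": "img003.jpg", "track": 2},
    {"image": "img004.jpg", "track": 2},
]
merged = merge_tracks(blobs, keep_id=1, absorb_id=2)
split = split_track(merged, track_id=1, split_image="img003.jpg", new_id=5)
print([row["track"] for row in merged])
print([row["track"] for row in split])
```

Both functions return new rows rather than modifying the input, which makes it easier to write the updated blob file only once the edits have been confirmed.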
I have not yet implemented the Event Reporter. It will take the data from the image, blob and track CSV files and produce a derived CSV file with minimum counts for each species or taxon recorded during the night. It will then package this file, along with all metadata from the configuration files, as a sampling event dataset (Darwin Core Archive or Frictionless Data) ready for publication to GBIF or other biodiversity data platforms.
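One plausible way to derive a minimum count, offered here purely as an assumption since this component does not yet exist, is to take the largest number of distinct tracks with the same identification that appear together in any single image. Individuals seen in the same image must be different animals, so this gives a lower bound. The function name and data shapes below are hypothetical.

```python
from collections import defaultdict


def minimum_counts(blobs, identifications):
    """Estimate a minimum count of individuals for each taxon.

    blobs: iterable of (image, track) pairs from the blob data.
    identifications: dict mapping track id to taxon name.
    The count for a taxon is the maximum number of distinct tracks of
    that taxon present in any one image. Unidentified tracks are ignored.
    """
    per_image = defaultdict(lambda: defaultdict(set))
    for image, track in blobs:
        taxon = identifications.get(track)
        if taxon is not None:
            per_image[image][taxon].add(track)
    counts = defaultdict(int)
    for taxa in per_image.values():
        for taxon, tracks in taxa.items():
            counts[taxon] = max(counts[taxon], len(tracks))
    return dict(counts)


# Made-up records: track 5 has no identification and is skipped.
blobs = [("a", 1), ("a", 2), ("b", 2), ("b", 3), ("b", 4), ("c", 5)]
identifications = {1: "Species A", 2: "Species A", 3: "Species A", 4: "Family B"}
counts = minimum_counts(blobs, identifications)
print(counts)
```

In this example three tracks are identified as Species A, but no image shows more than two of them at once, so the minimum count is two.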
Training Set Manager
Given the outputs from the Track Editor, it will also be possible to build lists of blob images (and associated metadata) for each species identified. These should be managed to allow selection of a good training dataset to build a machine learning model for species identification. As well as the image content, the metadata will include good information on size, movement, time of appearance, etc., which may improve the models.
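A very simple selection policy, sketched below as an assumption rather than a design decision, is to cap the number of images per taxon and to skip taxa with too few examples, so that common species do not dominate the training set. The function name, parameters and thresholds are hypothetical.

```python
import random
from collections import defaultdict


def select_training_set(records, per_taxon=100, min_per_taxon=2, seed=0):
    """Pick up to per_taxon blob images for each sufficiently sampled taxon.

    records: list of (blob_image_path, taxon) pairs.
    Taxa with fewer than min_per_taxon images are left out.
    Returns a dict mapping taxon to a sorted list of chosen paths.
    """
    by_taxon = defaultdict(list)
    for path, taxon in records:
        by_taxon[taxon].append(path)
    rng = random.Random(seed)  # fixed seed keeps the selection repeatable
    selected = {}
    for taxon, paths in sorted(by_taxon.items()):
        if len(paths) < min_per_taxon:
            continue  # too few examples to be useful for training
        chosen = rng.sample(paths, min(per_taxon, len(paths)))
        selected[taxon] = sorted(chosen)
    return selected


# Made-up records: five images of one taxon, one image of another.
records = [(f"blobs/a{i}.jpg", "Species A") for i in range(5)]
records.append(("blobs/b0.jpg", "Species B"))
training = select_training_set(records, per_taxon=3, min_per_taxon=2)
print(training)
```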
The images in the training set can be used to develop a machine learning model to identify the same species in subsequent samples. Metadata from associated configuration files will be captured to assist future interpretation.
The final component will be a module that runs the machine learning model and generates similar data to the Track Editor (but with additional metadata). These results can then be fed directly to the Event Reporter or (more likely, especially in early phases) into a validation process.
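The choice between direct reporting and validation could, for example, depend on the confidence reported by the model. The following minimal sketch assumes each prediction carries a confidence score between 0 and 1; the function name and the threshold value are hypothetical.

```python
def route_predictions(predictions, threshold=0.9):
    """Separate predictions into accepted and needing review.

    predictions: iterable of (track, taxon, confidence) tuples.
    Predictions at or above the threshold are accepted; the rest are
    queued for manual validation. Returns (accepted, review) lists of
    (track, taxon) pairs.
    """
    accepted, review = [], []
    for track, taxon, confidence in predictions:
        target = accepted if confidence >= threshold else review
        target.append((track, taxon))
    return accepted, review


# Made-up predictions for three tracks.
predictions = [
    (1, "Species A", 0.97),
    (2, "Species A", 0.55),
    (3, "Family B", 0.91),
]
accepted, review = route_predictions(predictions)
print(accepted, review)
```

In early phases the threshold could be set so high that nearly everything goes to review, then relaxed as confidence in the model grows.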