The effects of image augmentations when training machine learning models in astronomy

Butterworth, Leon H and Spindler, Ashley (2026) The effects of image augmentations when training machine learning models in astronomy. Monthly Notices of the Royal Astronomical Society, 548 (4): stag773. ISSN 0035-8711

We measure the influence of image augmentations and training data set size when training a deep neural network to classify galaxy morphology. Data augmentation is an integral step when training machine learning models, and astronomers often add augmentations on the assumption that they will always improve the performance of their models. To determine whether this assumption is necessarily true, we train multiple versions of the same pre-existing Zoobot model using different image augmentations and different data set sizes drawn from 230 000 galaxy images from Galaxy Zoo DECaLS. We find that adding image augmentations generally does improve a deep neural network's performance; however, this improvement diminishes significantly as the training data set size increases. The choice of specific augmentations (provided they are sensible) does not seem to be as important as simply having augmentations, since different augmentations result in similar increases in performance. We find that for a model of a given size, there exists a saturation point (when the model's capacity has been filled with data) that cannot be surpassed with data augmentations. We also find that more complex augmentations result in longer training times and might not lead to improved performance. If augmentations are added to the training process (which is recommended), simpler augmentations might be sufficient, depending on the size of the data set and model. We therefore encourage astronomers to consider their use of image augmentations carefully, in an effort to reduce wasted time and computational resources.
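As a rough illustration of the kind of experiment described above, the sketch below draws a random subset of a data set and applies simple augmentations to it. It is only a sketch under stated assumptions: the abstract does not list the augmentations used, so the random flips and 90-degree rotations here are stand-ins for "simpler augmentations", the function names `simple_augment` and `subsample` are hypothetical, and random arrays take the place of galaxy images. It does not use Zoobot.

```python
import numpy as np


def simple_augment(image, rng):
    """Apply a random 90-degree rotation and random flips to an (H, W, C) image.

    Assumption: these are illustrative 'simple' augmentations, not
    necessarily the ones evaluated in the paper.
    """
    k = int(rng.integers(0, 4))              # number of quarter turns: 0 to 3
    out = np.rot90(image, k=k, axes=(0, 1))  # rotate in the image plane
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)           # left-right flip
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)           # up-down flip
    return np.ascontiguousarray(out)


def subsample(n_total, fraction, rng):
    """Pick a random subset of indices to mimic a smaller training set."""
    n_keep = max(1, int(round(n_total * fraction)))
    return rng.choice(n_total, size=n_keep, replace=False)


rng = np.random.default_rng(0)
images = rng.random((8, 16, 16, 3))          # placeholder for square galaxy cutouts
idx = subsample(len(images), 0.5, rng)       # keep half of the data set
augmented = np.stack([simple_augment(images[i], rng) for i in idx])
```

Flips and quarter-turn rotations only rearrange pixels, so they add no interpolation cost, which is one reason such augmentations are cheap compared with more complex transformations.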


stag773.pdf
Published Version
Available under Creative Commons: BY 4.0
