Galaxy morphology plays an important role in the study and understanding of galaxy evolution. Structural components such as bulges, disks, spiral arms and bars form over the course of a galaxy's formation and assembly history. As such, morphology is related to other properties that depend on that history, such as colour, stellar mass and recent star formation rate (SFR). By looking at the relation between mass and SFR, astronomers have been able to distinguish between three different populations. Most star-forming galaxies belong to the main sequence and present morphological features typical of spiral or irregular galaxies. Objects in this population are also called Late-Type Galaxies (LTG). A second population has a much lower SFR and different shapes, mostly elliptical or bulge-dominated morphologies: these are referred to as Early-Type Galaxies (ETG). The transition between ETGs and the main sequence is bridged by an intermediate, less heavily populated region called the “green valley”.
The recent increase in the use of machine learning methods has been beneficial for astronomy research, and is of particular interest for extracting information on the evolutionary paths of galaxies from their morphologies. With the exponential rise in the amount of data from modern surveys, it has become especially important to understand and apply intelligent algorithms able to classify galaxies with the same accuracy as human experts, or even to outperform them.
The algorithms used for image classification typically rely on multiple costly steps, such as Point Spread Function (PSF) deconvolution and the training and application of complex Convolutional Neural Networks (CNNs) with thousands or even millions of parameters. CNNs classify galaxy images by processing different levels of information in each layer, aiming at a progressive recognition of complex features. In this approach, image recognition works well if the objects have clear edges. However, galaxies' outskirts are smooth: even the traditional methods used to measure structural properties, namely 2D parametric fitting and non-parametric analyses, are often prone to inaccuracies because it is difficult to separate the galaxy wings from the background. These boundary effects can be mitigated by using model constraints, but constraints cannot completely prevent inaccurate estimates of the structural parameters. Machine learning techniques are subject to misclassifications for the same reasons, especially with low-resolution images. Another factor to account for when adopting intelligent algorithms is data management and the speed of the analyses. The increasing volume of available images is difficult to manage, and the number of operations processed in CNN models is high. Both training and testing on large image data sets require a lot of time and significant computational resources.
“These limiting factors led us to search for an alternative method, which performs an isophotal analysis of the galaxy light distribution, stores the information in a more manageable data format and performs classification lowering the total computational costs.”— Tarsitano, first author of the study
Thus, Tarsitano and colleagues proposed a new approach that extracts features from galaxy images by analysing the elliptical isophotes in their light distribution and collecting the information in a sequence. The sequences obtained with this method present definite features that allow a direct distinction between galaxy types, as opposed to smooth Sérsic profiles. (Note: the Sérsic profile is the most commonly used model. It is a parametric function whose parameters describe structural properties such as size, magnitude, ellipticity, inclination and the rate at which light intensity falls off with radius, known as the Sérsic index.)
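To make the idea of turning an image into a sequence more concrete, here is a minimal Python sketch. It is not the authors' pipeline: it builds a noiseless synthetic galaxy from a Sérsic profile and then averages the light in concentric elliptical annuli with a fixed, known axis ratio. The function names (`sersic_image`, `isophote_sequence`), the parameter values and the approximation b_n ≈ 2n − 1/3 are all assumptions made for illustration; a real isophotal analysis would also fit the centre, ellipticity and position angle of each isophote.

```python
import numpy as np


def sersic_image(size=101, i_e=1.0, r_e=10.0, n=4.0, q=0.7):
    """Noiseless synthetic galaxy image following a Sersic profile.

    i_e: intensity at the effective radius r_e (pixels)
    n:   Sersic index, q: axis ratio (minor / major)
    """
    # Common approximation for b_n (an assumption, adequate for a sketch)
    b_n = 2.0 * n - 1.0 / 3.0
    y, x = np.indices((size, size)) - (size - 1) / 2.0
    r = np.hypot(x, y / q)  # elliptical radius along the major axis
    return i_e * np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))


def isophote_sequence(image, q=0.7, n_bins=20, r_max=40.0):
    """Mean intensity in concentric elliptical annuli, as a 1D sequence.

    Hypothetical helper: the ellipse geometry is fixed and centred on the
    image, which a real isophote fit would not assume.
    """
    cy, cx = (np.array(image.shape) - 1) / 2.0
    y, x = np.indices(image.shape)
    r = np.hypot(x - cx, (y - cy) / q)
    edges = np.linspace(0.0, r_max, n_bins + 1)
    bin_index = np.digitize(r.ravel(), edges) - 1
    flat = image.ravel()
    return np.array([flat[bin_index == k].mean() for k in range(n_bins)])


image = sersic_image()
sequence = isophote_sequence(image)
print(sequence.shape)  # one value per annulus
```

The point of the sketch is the change of data format: a 101 × 101 image (10,201 pixel values) is reduced to a sequence of 20 numbers, which is the kind of dimensionality reduction the article describes.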
They then trained machine learning algorithms, built with the Modulos AutoML platform, to classify the sequences, and studied how these algorithms optimize the classification task. As a demonstration of the method, they used the second public data release of the Dark Energy Survey (DES DR2).
They showed that, by applying the method to this sample, they are able to successfully distinguish between early-type and late-type galaxies for images with a signal-to-noise ratio greater than 300. This yields an accuracy of 86% for the early-type galaxies and 93% for the late-type galaxies, which is on par with most contemporary automated image classification approaches.
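Per-class figures like these are usually read off a confusion matrix. The short Python sketch below shows one way to do that, assuming that "accuracy for a class" means the fraction of galaxies truly in that class that the model recovers. The labels are made-up toy data, not results from the study, and the helper names are hypothetical. The matrix uses rows for predicted classes and columns for true classes.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, classes):
    """Count matrix with rows = predicted class, columns = true class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for truth, pred in zip(y_true, y_pred):
        matrix[index[pred], index[truth]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class (column) that lies on the diagonal."""
    return np.diag(matrix) / matrix.sum(axis=0)


# Toy labels for illustration only (not data from the paper)
classes = ["ETG", "LTG"]
y_true = ["ETG", "ETG", "ETG", "LTG", "LTG", "LTG", "LTG", "ETG"]
y_pred = ["ETG", "ETG", "LTG", "LTG", "LTG", "LTG", "ETG", "ETG"]

matrix = confusion_matrix(y_true, y_pred, classes)
accuracy = per_class_accuracy(matrix)
for name, value in zip(classes, accuracy):
    print(f"{name}: {value:.2f}")
```

In this toy sample, three of the four true ETGs and three of the four true LTGs are classified correctly, so both classes come out at 0.75.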
Finally, they demonstrated that their method allows galaxy images to be classified accurately and is faster than other approaches. The reduction in data dimensionality also implies a significant lowering of the computational cost. Looking ahead to future data sets from, for example, Euclid and the Vera Rubin Observatory (VRO), their work represents a path towards using a well-tested and widely used industry platform to tackle galaxy classification problems efficiently at the petabyte scale.
“In the future, we will expand upon our promising results by developing a more robust isophotal measurement approach to focus on performance at low S/N, and target higher context features, such as bars, spiral arms and clumps,” concluded the authors of the study.
Featured image: Confusion matrix representing the accuracy achieved in classifying galaxy profiles. The x-axis shows the true categories, while the y-axis shows the predicted categories. The main diagonal shows the correct classifications. The model seems quite robust in classifying the early-type galaxies of the sample. © Tarsitano et al.
Reference: F. Tarsitano, C. Bruderer, K. Schawinski, W. G. Hartley, “Image feature extraction and galaxy classification: a novel and efficient approach with automated machine learning”, arXiv, pp. 1-9, 2021. https://arxiv.org/abs/2105.01070
Note for editors of other websites: To reuse this article fully or partially, kindly give credit to our author S. Aman or provide a link to our article.