Varieties and hybrids of agricultural crops are characterized by a large number of traits: morphological, economically valuable, and biochemical. In a comparative analysis of breeding samples at the initial stage of research, usually only a few traits are used, and each is assessed with univariate criteria. In rapeseed breeding, an integrated approach to assessing and selecting promising samples is also important, one that takes into account the morphological traits that are components of productivity, oil content and quality, and the glucosinolate content of the seeds. Cluster analysis is a multivariate method that groups samples by several traits at once, which makes it possible to identify groups with the best combination of the assessed indicators. The aim of the research was to analyze breeding samples of winter rapeseed at the initial stage of research and to select promising ones suitable for further work, using the "k-means" clustering method. The research material was 125 breeding samples of winter rapeseed. The number of pods on the central branch and the oil and glucosinolate contents of the seeds were determined, and the fatty acid composition of the oil was analyzed (content of palmitic, stearic, oleic, linoleic, linolenic, and erucic acids). The studies were carried out in 2018-2019 in the southern Steppe of Ukraine. Statistical processing and evaluation of the results used a modified "k-means" clustering method, performed using Data Mining. It differs from the classical clustering method in that the optimal number of clusters in the model is selected by the Statistica software package. The material was processed and analyzed in two stages.
At the first stage, cluster analysis by the "k-means" method was applied separately to the economically valuable traits and to the fatty acid composition of the oil, and the clusters of samples with the best combination of the corresponding indicators were identified. At the second stage, the best samples from these clusters were assessed only by oil content and oleic acid content, and a further clustering singled out the group of samples with the maximum values of these indicators. In the cluster analysis of the fatty acid composition, linoleic acid content was excluded because of its high correlation with oleic acid content, and erucic acid content was excluded because its sample distribution deviated from the normal distribution. Before cluster analysis, the data were reduced to dimensionless form by z-score normalization. The cluster analysis distributed the samples into four clusters by economically valuable traits and into two clusters by fatty acid composition of the oil, and the samples forming each cluster were identified. By economically valuable traits, the first cluster comprises 26 samples, the second 33, the third 39, and the fourth 27. By fatty acid composition, the first cluster contains 72 samples and the second 53. The third cluster by economically valuable traits combines the highest seed oil content and number of pods on the central branch with the lowest seed glucosinolate content, and the samples of the second cluster by fatty acid composition have the highest oleic acid content in the oil. Analysis of variance of the clustering results showed that the cluster means of the economically valuable traits and of the fatty acid composition of the oil differ statistically significantly. Thus, "k-means" clustering formed clusters of samples that differ statistically significantly from each other in the studied traits.
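The first-stage procedure (z-score normalization followed by "k-means" clustering) can be illustrated with a minimal sketch. It is not the modified Statistica procedure used in the study: it is a plain Lloyd-style k-means with a fixed number of clusters, the farthest-first initialization is an assumption made only to keep the result reproducible, and the six rows of trait values (pods on the central branch, oil content in %, glucosinolate content) are invented for illustration and are not data from the study.

```python
import math

def zscore_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
           for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration with a fixed k (k is NOT selected automatically)."""
    # Assumption: deterministic farthest-first initialization.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assign each sample to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical samples: [pods on central branch, oil %, glucosinolates]
samples = [
    [48, 44.1, 22.0], [50, 44.8, 21.5], [47, 43.9, 23.1],
    [66, 50.2, 12.4], [68, 51.0, 11.8], [65, 49.7, 13.0],
]
z = zscore_columns(samples)
labels, centroids = kmeans(z, k=2)
print(labels)
```

Standardizing first matters because the traits are measured in different units; without it, the trait with the largest numeric range would dominate the distances.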
Only 15 samples belong both to the third cluster by economically valuable traits and to the second cluster by fatty acid composition of the oil. At the second stage, the best samples from this group were selected for further breeding work by seed oil content and the oleic acid content of the oil. Cluster analysis of this group produced a distribution into four clusters. Finally, for further breeding aimed at a high oleic acid content in the oil, five samples of the first cluster were selected (oleic acid content of the oil 69.4-70.6%, oil content 49.0-52.1%), together with three samples of the second cluster with an oil content of 51.1-51.8%. Thus, the modified "k-means" clustering method was shown to be effective for analyzing a large number of winter rapeseed samples by several traits simultaneously in order to select genotypes with an optimal combination of economically valuable indicators.
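The step that links the two stages, keeping only the samples that fall into the best cluster of both first-stage clusterings, amounts to a set intersection. The sketch below assumes that each clustering is stored as a mapping from sample ID to cluster number; the function name `shortlist`, the sample IDs, and the cluster assignments are hypothetical and serve only as an illustration.

```python
def shortlist(econ_labels, fa_labels, econ_target, fa_target):
    """Return the IDs assigned to the target cluster in BOTH clusterings."""
    econ_best = {s for s, c in econ_labels.items() if c == econ_target}
    fa_best = {s for s, c in fa_labels.items() if c == fa_target}
    return sorted(econ_best & fa_best)

# Hypothetical assignments: sample ID -> cluster number
econ_labels = {"S01": 3, "S02": 1, "S03": 3, "S04": 2, "S05": 3}
fa_labels = {"S01": 2, "S02": 2, "S03": 1, "S04": 2, "S05": 2}

# Third cluster by economically valuable traits, second by fatty acids
candidates = shortlist(econ_labels, fa_labels, econ_target=3, fa_target=2)
print(candidates)
```

The resulting list is the group that would then be clustered again by oil content and oleic acid content.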