Determining the geographical origin of seafood is crucial for regulators and fishing industries seeking to prevent commercial fraud, enforce food safety regulations, and promote high standards of sustainable fisheries management. The cockle, Cerastoderma edule (Linnaeus, 1758), is a key species in estuarine ecosystems and is harvested throughout Europe. Traceability tools based on quick and inexpensive techniques to identify the origin of this bivalve are therefore of paramount importance to support law enforcement. In this work, we explore the potential of Geometric Morphometric (GM) methods to identify the geographical origin of cockle specimens. The approach uses landmarks identified on the shell to trace the origin of specimens collected in nearby aquatic systems (separated by distances ranging from <35 km to <250 km). Specimens were collected in five aquatic systems in Portugal: Ria de Aveiro, the Tagus and Sado estuaries, and the Albufeira and Óbidos coastal lagoons. Shells were digitized, and 16 landmarks were identified on each right valve and analyzed using Generalized Procrustes Superimposition. The discriminating power of 12 statistical and machine learning methods for traceability was assessed on the resulting shape variables using R and Python: Linear Discriminant Analysis (LDA), Canonical Variate Analysis (CVA), Principal Component Analysis (PCA), Between-Group PCA (bgPCA), Partial Least Squares Discriminant Analysis (PLSD), Classification and Regression Tree (CRT), Logistic Regression (LR), Random Forest (RF), Gradient Boosting (GB), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Extreme Gradient Boosting (XGBoost), and Neural Networks (NNET). LDA, CVA, SVM, and NNET achieved the highest accuracy, with F1-scores >80%, even with a small and unbalanced sample size.
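The Procrustes step can be illustrated with a minimal NumPy sketch of generalized Procrustes superimposition for 2D landmarks, assuming each specimen is a (16, 2) array of x, y coordinates for the 16 right-valve landmarks. The function names are hypothetical and this is not the authors' pipeline; dedicated tools (for example `gpagen` in the R package geomorph) add options such as sliding semilandmarks and tangent-space projection that are omitted here. Each configuration is centred, scaled to unit centroid size, and iteratively rotated onto the evolving mean shape; the flattened aligned coordinates are the shape variables passed to the classifiers.

```python
import numpy as np


def center_and_scale(config):
    """Translate a (k, 2) landmark configuration to the origin and scale it to unit centroid size."""
    centered = config - config.mean(axis=0)
    return centered / np.linalg.norm(centered)  # Frobenius norm of centred coordinates = centroid size


def rotate_onto(config, reference):
    """Rotate a centred configuration onto a reference (orthogonal Procrustes, reflections excluded)."""
    u, _, vt = np.linalg.svd(config.T @ reference)
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1  # flip the axis of the smallest singular value to obtain a proper rotation
    return config @ (u @ vt)


def generalized_procrustes(configs, tol=1e-10, max_iter=100):
    """Iteratively align all configurations to their mean shape.

    Returns the aligned shapes, shape (n, k, 2), and the consensus (mean) shape.
    """
    shapes = np.array([center_and_scale(np.asarray(c, dtype=float)) for c in configs])
    mean = shapes[0]  # arbitrary initial reference
    for _ in range(max_iter):
        shapes = np.array([rotate_onto(s, mean) for s in shapes])
        new_mean = center_and_scale(shapes.mean(axis=0))  # re-normalise the consensus
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return shapes, mean


def shape_variables(aligned):
    """Flatten aligned (n, k, 2) coordinates into an (n, 2k) matrix for downstream classifiers."""
    return aligned.reshape(len(aligned), -1)
```

With 16 landmarks, `shape_variables` yields 32 columns per specimen; because translation, scale, and rotation have been removed, only 28 of these dimensions are free, which is one reason methods that tolerate collinear predictors (or a preceding PCA) are commonly used on Procrustes coordinates.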
The highest percentages of correctly assigned individuals were obtained for the Tagus estuary (mean 89%) and the Albufeira lagoon (mean 93%), which were also the systems with the most specimens measured (174 and 59, respectively), whereas the worst results were obtained for the Sado estuary (50%, 56 specimens). In the Albufeira coastal lagoon, the best classification methods reached 100% correct classification. These results also highlight the importance of establishing statistical standards, such as those developed in the current work, to evaluate different methods, as small changes in the procedure may cause substantial differences in results and conclusions. A review of previous studies (summarized in a table) showed correct classification rates often >90% for both bivalves and gastropods, highlighting the potential of these techniques for other mollusks. Our results support the use of landmark-based GM as a reliable tool for bivalve traceability, since it is a quick, simple, and inexpensive approach. Further research should extend these findings to other species and other shape analysis techniques.
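The per-system percentage of correctly assigned individuals is the per-class recall computed from a confusion matrix, and the F1-score combines that recall with precision. The sketch below, assuming true and predicted origins are parallel sequences of site labels (for instance, cross-validated predictions from any of the classifiers listed above), computes both; the function name is hypothetical, and macro-averaging the F1-score is an assumption, since the averaging scheme is not stated here.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, labels):
    """Confusion matrix, per-class recall/precision/F1, and macro-averaged F1."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1  # rows: true origin, columns: predicted origin
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # specimens truly from each site
    assigned = cm.sum(axis=0)   # specimens the classifier assigned to each site
    zeros = np.zeros_like(tp)
    # Recall = fraction of each site's specimens correctly assigned (0 if the site is empty).
    recall = np.divide(tp, support, out=zeros.copy(), where=support > 0)
    precision = np.divide(tp, assigned, out=zeros.copy(), where=assigned > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return cm, recall, precision, f1, f1.mean()
```

For example, `per_class_metrics(y_true, y_pred, labels=["Aveiro", "Tagus", "Sado", "Albufeira", "Óbidos"])` returns recall values that, multiplied by 100, give the percentage of correctly assigned specimens per system. Reporting these per-class values alongside a macro-averaged F1 is useful with unbalanced samples such as 174 versus 56 specimens, because overall accuracy is dominated by the larger groups.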