In population genetics, the information available for an analytical task is governed by the number of individuals sampled and the amount of genetic information measured on each of those individuals. In this work, we assessed the numbers of individual yellowfin tuna (Thunnus albacares) and genetic markers required for ocean-basin scale inferences. We did so for three commonly used analytical tasks: testing for differences between genetic profiles, stock delineation, and assignment of individuals to stocks. For all tasks, we used real (not simulated) data from four sampling locations that span the tropical Pacific Ocean. Although the sites were widely separated, the genetic differences between them were small (maximum Fst ≈ 0.02), which is typical of large pelagic fish. We repeatedly sub-sampled the data, with each sub-sample mimicking a new survey, and repeated the analyses on each. We also assessed false positive rates by re-sampling the data and randomly assigning fish to groups. Varying the sample sizes showed that profile testing required relatively few individuals per sampling location (n ≳ 10) and relatively few single nucleotide polymorphisms (SNPs; m ≳ 256). Stock delineation required more individuals per sampling location (n ≳ 25). Assigning fish to sampling locations required substantially more individuals, more than we had available (n > 50), although this requirement fell to n ≳ 30 when each fish was assumed to belong to one of the sampled groups. These results allow designers of molecular ecological surveys for yellowfin tuna, and users of the resulting data, to assess whether a survey's information content is adequate for the intended inferential task.
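The re-sampling design described above (sub-sample n fish per location and m SNPs, test for a difference in genetic profiles, and estimate false positive rates by randomly assigning fish to groups) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say which test statistic or testing procedure was used, so the Hudson-style Fst statistic, the one-sided permutation test, and all function names (`hudson_fst`, `permutation_pvalue`, `rejection_rate`, `toy_population`) are hypothetical choices. Genotypes are assumed to be diploid, biallelic SNPs coded 0/1/2 with no missing values, and `toy_population` produces synthetic data only so that the sketch can run.

```python
import random
from typing import List, Optional

# Rows are fish, columns are SNPs; entries are alternate-allele counts (0, 1, 2).
# ASSUMPTION: diploid, biallelic SNPs with no missing genotypes.
Genotypes = List[List[int]]


def hudson_fst(pop1: Genotypes, pop2: Genotypes) -> float:
    """Hudson-style Fst between two samples, as a ratio of sums over SNPs."""
    h1, h2 = 2 * len(pop1), 2 * len(pop2)  # gene copies sampled per group
    num = den = 0.0
    for j in range(len(pop1[0])):
        p1 = sum(row[j] for row in pop1) / h1
        p2 = sum(row[j] for row in pop2) / h2
        # Squared frequency difference minus a small-sample correction.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (h1 - 1) - p2 * (1 - p2) / (h2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0


def permutation_pvalue(pop1: Genotypes, pop2: Genotypes, n_perm: int = 199,
                       rng: Optional[random.Random] = None) -> float:
    """One-sided permutation test: is the observed Fst larger than under random labels?"""
    rng = rng if rng is not None else random.Random()
    observed = hudson_fst(pop1, pop2)
    pooled, n1 = pop1 + pop2, len(pop1)  # new list; the inputs are not shuffled
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign fish to the two groups
        if hudson_fst(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def rejection_rate(pop_a: Genotypes, pop_b: Optional[Genotypes], n: int, m: int,
                   reps: int = 20, alpha: float = 0.05, n_perm: int = 99,
                   seed: int = 1) -> float:
    """Fraction of simulated re-surveys (n fish per group, m SNPs) that reject H0.

    With pop_b=None both groups are drawn from pop_a, so the result estimates
    the false positive rate; otherwise it estimates power.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        snps = rng.sample(range(len(pop_a[0])), m)  # random SNP panel of size m
        if pop_b is None:
            fish = rng.sample(pop_a, 2 * n)  # split one location at random
            g1, g2 = fish[:n], fish[n:]
        else:
            g1, g2 = rng.sample(pop_a, n), rng.sample(pop_b, n)
        g1 = [[row[j] for j in snps] for row in g1]
        g2 = [[row[j] for j in snps] for row in g2]
        if permutation_pvalue(g1, g2, n_perm, rng) <= alpha:
            hits += 1
    return hits / reps


def toy_population(n_fish: int, freqs: List[float], rng: random.Random) -> Genotypes:
    """SYNTHETIC genotypes for illustration only (not tuna data): Binomial(2, p) per SNP."""
    return [[int(rng.random() < p) + int(rng.random() < p) for p in freqs]
            for _ in range(n_fish)]
```

To mimic the survey-design question, one would call `rejection_rate(a, b, n, m)` over a grid of n and m for each pair of real sampling locations, and `rejection_rate(a, None, n, m)` to check that the false positive rate stays near `alpha`. Summing numerator and denominator over SNPs before dividing (a ratio of averages) is the usual way to combine a Hudson-style estimator across loci, and the `+1` in the p-value keeps a finite number of permutations from ever reporting p = 0.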