Abstract

Freshwater catfish of the genus Clarias, known as the air-breathing catfish, are widespread and important for food security through small-scale inland fisheries and aquaculture. Limited genomic data are available for this important group of fishes. The bighead catfish (Clarias macrocephalus) is a commercial aquaculture species in Southeast Asia that is threatened in its natural environment by habitat destruction, over-exploitation and competition from other introduced species of Clarias. Despite its commercial importance and the threats to natural populations, public databases do not include any genomic data for C. macrocephalus. We present the first genomic data for the bighead catfish, generated by Illumina sequencing. A total of 128 Gb of sequence data in paired-end 150 bp reads was assembled de novo, generating a final assembly of 883 Mbp contained in 27,833 scaffolds (N50 length: 80.8 kbp), with BUSCO completeness scores of 96.3% and 87.6% based on the metazoan and Actinopterygii ortholog datasets, respectively. Annotation of the genome predicted 21,124 gene sequences, which were assigned putative functions based on homology to existing protein sequences in public databases. Raw FASTQ reads and the final version of the genome assembly have been deposited in NCBI (BioProject: PRJNA604477, WGS: JAAGKR000000000, SRA: SRR11188453). The complete C. macrocephalus mitochondrial genome was also recovered from the same sequence read dataset and is available on NCBI (accession: MT109097), representing the first mitogenome for this species. Lastly, we find an expansion of the mb and ora1 genes, which are thought to be associated with adaptations to air-breathing and a semi-terrestrial lifestyle in this genus of catfish.
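
As a worked illustration of two figures quoted above, the following minimal Python sketch shows how an N50 value is computed from a list of scaffold lengths and how an approximate mean sequencing depth follows from the total read output. The function names are hypothetical, the toy scaffold lengths are invented for illustration, and the depth calculation assumes the 883 Mbp assembly size as a stand-in for the true genome size; this is not the authors' assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_depth(total_bases, genome_size):
    """Approximate mean depth as total sequenced bases / genome size."""
    return total_bases / genome_size


# Hypothetical toy scaffold lengths (not from the C. macrocephalus assembly):
# total = 300; 100 + 80 = 180 >= 150, so N50 = 80
toy = [100, 80, 50, 40, 30]
print(n50(toy))

# Abstract figures: 128 Gb of reads; 883 Mbp assembly assumed as genome size
depth = mean_depth(128e9, 883e6)
print(f"approximately {depth:.0f}x coverage")
```

With these inputs, 128 Gb over 883 Mbp corresponds to roughly 145x coverage; the true depth depends on the estimated genome size, which is not reported in this section.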

Highlights

  • Dataset for genome sequencing and de novo assembly of the Vietnamese bighead catfish (Clarias macrocephalus Günther, 1864)

  • Subject area: Biology (Genomics). Data types: sequencing raw reads, assembly, table, figure. Instrument: Illumina NovaSeq. Data formats: raw reads, assembly, protein and transcript sequences. DNA from a white muscle tissue sample of an adult catfish specimen was used for library preparation and sequencing

  • The tissue was preserved in ethanol, and approximately 50 mg was used for genomic DNA extraction using a modified SDS-chloroform method (Sokolov, 2000)


Summary

Data accessibility

The mitochondrial genome is available on NCBI under accession number MT109097 (https://www.ncbi.nlm.nih.gov/nuccore/MT109097). Raw data and the final assembled contigs were deposited in the NCBI database under BioProject: PRJNA604477 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA604477), WGS: JAAGKR000000000 (https://www.ncbi.nlm.nih.gov/nuccore/1821738013), SRA: SRR11188453 (https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=604477).

Value of the Data

  • First genomic dataset for the wild bighead catfish.

  • The high BUSCO completeness and an assembly size close to the estimated genome size indicate that the dataset can support selective breeding as well as population and conservation genetic studies of this native Vietnamese commercial species.

  • The data will facilitate genetic management for the genetic improvement and conservation of bighead catfish populations, including those facing competition from introduced non-native species of Clarias.

  • The data add to the limited genomic data available for the highly diverse catfish lineage [1,2,3,4].

Data description
DNA library construction and sequencing
Read pre-processing
Whole genome assembly and annotation
Enriched genes in Clarias macrocephalus
Findings
Mitochondrial genome assembly and annotation
