Abstract: [Objectives] The genus Paedocypris comprises the smallest known fish species, which are among the smallest vertebrates, with adult sizes of 8–10 mm. Previous draft genome assembly and comparative genomic analysis of P. carbunculus and P. micromegethes indicated that the loss of Hox gene family members and the reduction of intron length and repetitive sequences might contribute to the miniaturization of Paedocypris species and the simplification of their genomes. This study aims to characterize the full-length transcriptomes of P. carbunculus and P. micromegethes and to identify potential regulatory changes that could contribute to their miniaturized stature, with a particular focus on the Hox family and on developmental gene expression patterns and gene structure. [Methods] All samples were purchased from an aquarium market in Guangzhou and comprised seven P. carbunculus and eleven P. micromegethes live specimens (Fig. 1). RNA was extracted from whole fish, and its purity, concentration, and integrity were tested. RNA from the seven P. carbunculus specimens and from the eleven P. micromegethes specimens was pooled by species and used to construct a full-length transcriptome library for each species. Sequencing was performed on the PacBio Sequel platform. Quality control of the full-length transcriptome sequencing data from the two Paedocypris species was performed. We then described the data quality, mapping statistics, annotation statistics, and transcript classification of the full-length transcriptome data. Using different software and pipelines, we annotated gene structure, transcription factors, alternative splicing sites, long non-coding RNAs (lncRNAs), and gene fusions. 
To characterize the expression of Hox family genes in Paedocypris, we downloaded 81 zebrafish Hox family coding sequences belonging to 48 different Hox genes and aligned them to the full-length transcriptome sequences of Paedocypris and four other species using Blastn with an e-value threshold of < 10^-20. To explore the expression of genes in important developmental pathways in Paedocypris, we selected seven key GO developmental pathways encompassing 1 581 non-redundant zebrafish genes and aligned these genes to the full-length transcriptome data of Paedocypris and the four other species. To test whether the developmental-related genes mapped in Paedocypris differ structurally from other genes in the Paedocypris genome, we used the GO annotation of the Paedocypris reference genome and compared the mapped developmental-related genes against all annotated genes in the reference genome with t-tests for alternative splicing, polyadenylation, exon count, and fusion genes. [Results] We conducted full-length transcriptome sequencing on whole-body samples of P. carbunculus and P. micromegethes (Tables 1–4), obtaining the first full-length transcriptome library for Paedocypris. After correction and deduplication, 38 651 and 23 165 corrected consensus sequences were obtained, with average lengths of 2 413 bp and 2 460 bp, respectively. Of these sequences, 35 883 (92.84%) and 20 381 (87.98%) were mapped to the Paedocypris reference genome (Table 2). Classification and characterization of the full-length transcriptome data for the two species yielded 19 352 and 11 139 full-length transcripts, respectively (Tables 4 and 5). In the full-length transcriptome data of P. carbunculus and P. micromegethes, 9 404 and 6 068 alternative splicing events were identified, respectively, predominantly of the A5 type (Table 6). 
We also observed that the majority of genes (74.55% and 84.75%, respectively) had only one alternative polyadenylation site. Annotation and classification of transcription factors revealed that the most abundant transcription factor family in Paedocypris was zf-C2H2, followed by ZBTB. Additionally, we predicted 1 382 and 1 649 lncRNAs in the two Paedocypris species, respectively (Fig. 2). Comparative analysis of the full-length transcriptome data of Paedocypris and of carp, silver carp, salmon, and rainbow trout revealed that the number of expressed Hox family genes in Paedocypris was significantly lower than in the other four fish species (Fig. 3). We also found that Paedocypris expressed more developmental genes belonging to the seven GO terms than the other four fish species (Fig. 3). These developmental-related genes had fewer gene fusion events than other genes in the full-length transcriptome of Paedocypris (Fig. 4). [Conclusion] We provide a full-length transcriptome landscape of the genus Paedocypris based on full-length transcriptome sequencing. Data quality, mapping statistics, annotation statistics, and transcript classification of the full-length transcriptome data of Paedocypris are described. The analysis highlights that the loss of Hox genes at the expression level may be one of the functional reasons for the miniaturization of Paedocypris. This study provides new insights into the molecular mechanisms underlying the miniaturization of Paedocypris.
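
The Hox screening step described in the Methods (Blastn alignment with an e-value threshold of < 10^-20, followed by counting how many distinct Hox genes are detected) can be illustrated with a minimal sketch. It assumes the Blastn results were written in the standard 12-column tabular format (`-outfmt 6`), in which the e-value is the 11th column; the abstract does not state the output format actually used. The function name `expressed_genes`, the query-to-gene mapping, and all sequence identifiers and hit values below are hypothetical toy data, not results from the study.

```python
EVALUE_CUTOFF = 1e-20  # threshold stated in the abstract


def expressed_genes(blast_lines, query_to_gene, cutoff=EVALUE_CUTOFF):
    """Return the set of genes with at least one Blastn hit below the cutoff.

    blast_lines   : iterable of tab-separated lines (assumed -outfmt 6)
    query_to_gene : dict mapping each query CDS identifier to its gene name,
                    so that several CDS of one gene are counted once
    """
    genes = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        evalue = float(fields[10])  # 11th column of -outfmt 6
        if evalue < cutoff:
            genes.add(query_to_gene.get(qseqid, qseqid))
    return genes


# Hypothetical toy input: two CDS of one gene plus one weak hit.
query_to_gene = {
    "hoxa1a_cds1": "hoxa1a",
    "hoxa1a_cds2": "hoxa1a",
    "hoxb4a_cds1": "hoxb4a",
}
toy_hits = [
    "hoxa1a_cds1\ttranscript_1\t91.2\t600\t50\t3\t1\t600\t10\t609\t1e-150\t540",
    "hoxa1a_cds2\ttranscript_1\t90.0\t580\t55\t3\t1\t580\t10\t589\t2e-140\t510",
    "hoxb4a_cds1\ttranscript_7\t85.0\t60\t9\t0\t1\t60\t5\t64\t1e-05\t50",
]

detected = expressed_genes(toy_hits, query_to_gene)
print(sorted(detected), len(detected))
```

In this toy input the two hoxa1a coding sequences collapse to a single gene, and the hoxb4a hit is discarded because its e-value is above the cutoff. Counting genes rather than raw hits is a design choice of this sketch, made because the abstract reports 81 coding sequences belonging to 48 Hox genes.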