Intragenomic variations of multicopy ITS2 marker in Agrodiaetus blue butterflies (Lepidoptera, Lycaenidae)

Abstract The eukaryotic ribosomal DNA cluster consists of multiple copies of three genes, 18S, 5. 8S and 28S rRNAs, separated by multiple copies of two internal transcribed spacers, ITS1 and ITS2. It is an important, frequently used marker in both molecular cytogenetic and molecular phylogenetic studies. Despite this, little is known about intragenomic variations within the copies of eukaryotic ribosomal DNA genes and spacers. Here we present data on intraindividual variations of ITS2 spacer in three species of Agrodiaetus Hübner, 1822 blue butterflies revealed by cloning technique. We demonstrate that a distinctly different intragenomic ITS2 pattern exists for every individual analysed. ITS2 sequences of these species show significant intragenomic variation (up to 3.68% divergence), setting them apart from each other on inferred phylogenetic tree. This variation is enough to obscure phylogenetic relationships at the species level.


Introduction
The eukaryotic ribosomal DNA (rDNA) cluster consists of three genes, 18S, 5.8S and 28S rRNAs, separated by two internal transcribed spacers, ITS1 and ITS2. This array forms a transcription unit, which is are typically represented in a genome by several hundred tandemly repeated copies (Long andDavid 1980, Gerbi 1985). The number of rDNA sequence variants can vary within a wide range both at the species and individual level. For example, different species of Drosophila Linnaeus, 1758 are estimated to have three to 18 variants of rDNA sequences (Stage and Eickbush 2007). The genome of sea sponge Amphimedon queenslandica Hooper & van Soest, 2006 was found to contain approximately 14.5 copies of rDNA sequences per haploid complement (Srivastava et al. 2010). Furthermore, individuals of the same species can have very different numbers of rDNA copies because the clusters display both meiotic rearrangements and somatic mosaicism. It has been shown that in humans for example, the number of rDNA sequences even within a single cluster can vary in an enormous extent, from one repeat unit up to 140 repeats (Stults et al. 2008).
Ribosomal RNA genes have been widely used in taxonomy, biogeographic, phylogenetic analyses, and molecular cytogenetic studies (Hillis and Davis 1986, Mindell and Honeycutt 1990, Wesson et al. 1993, Vogler and DeSalle 1994. In particular, more detailed and precise karyotypes studies became available since fluorescence in situ hybridization (FISH) technology was applied to the chromosomal physical mapping. FISH mapping identifies useful chromosomal markers that can be applied to studies of genome organization and species evolution and can also identify specific chromosomes, homologous chromosomes, chromosome rearrangements and sex chromosomes, among others (Nakajima et al. 2012). Ribosomal RNA genes are among the most mapped sequences in chromosomes in many animal groups including insects (Cabrero and Camacho 2008, Grozeva et al. 2010, Nguyen et al. 2010, Kuznetsova et al. 2012, Maryańska-Nadachowska et al. 2013, Gokhman et al. 2014, Kuznetsova et al. 2015, Vershinina et al. 2015. Accordingly, rDNA can be excellent source of cytogenetic markers for comparative genomic studies, evolutionary studies as well as the genetic identification of species (Mantovani et al. 2005, Pedrosa-Harand et al. 2006, Cabral-de-Mello et al. 2011).
At the nucleotide sequence level coding regions and spacers can reveal phylogenetic relationships ranging from the level of major phyla of living organisms to the population level, because they differ widely in their rate of evolution (Hillis and Dixon 1991, Wesson et al. 1992, Kuperus and Chapco 1994, Muccio et al. 2000, Wiegmann et al. 2000. 18S and 28S rDNA genes are reported to be highly informative to reconstruct higher-level phylogenies in plants and animals (see e.g. Soltis et al. 2000, Mukha et al. 2002.
Unlike highly conserved rRNA genes, non-coding fast evolving transcribed spacers have high level of interspecific variability. Therefore, the internal transcribed spacers are considered to be useful phylogenetic markers, specifically for low-level phylogenetic analyses. ITS1 and ITS2 have been used extensively in phylogenetic reconstruc-tion of closely related species and cryptic species complexes (Wilkerson et al. 2004). For instance, ITS have become the standard barcode of choice in most investigations for plants and fungi (Stoeckle 2003, Kress et al. 2005, Sass et al. 2007, Bellemain et al. 2010, Hollingsworth et al. 2011, Schocha et al. 2012, Li et al. 2015.
During PCR all variants of ITS sequences presented in genome are amplified, therefore, direct sequencing could lead to inaccurate or erroneous phylogenetic reconstructions. Accordingly identifying and examination levels of intragenomic and intraspecific variation among ITS sequences are of real importance.
Agrodiaetus is a species-rich subgenus within the Palearctic genus Polyommatus (Talavera et al. 2013). The subgenus includes ca. 130 described species (see Vila et al. 2010, Lukhtanov et al. 2008, 2015a, Vershinina and Lukhtanov 2010, Przybyłowicz et al. 2014, Lukhtanov and Tikhonov 2015. The subgenus was estimated to have originated only about three million years ago (Kandul et al. 2004). Nowadays this rapidly radiated group of butterflies is a model system in studies of speciation (Lukhtanov et al. 2005, Lukhtanov et al. 2015b, and rapid karyotype evolution (Kandul et al. 2007). Several molecular phylogenetic studies have been conducted on Agrodiaetus, also based on ITS2 molecular marker (Wiemers 2003, Wiemers et al. 2009, Wiemers et al. 2010, Lukhtanov et al. 2015a). However, until now rate of ITS2 intragenomic variations in this rapidly evolved group have never been analyzed.
This paper addresses a more detailed analysis of intraindividual variability of ITS2 region in three Polyommatus  (Lukhtanov et al. 2015b). Direct sequencing of ITS2 give ambiguous results; thus, we sought to clone and sequence ITS2 from these species to quantify the prevalence of intragenomic ITS2 variation and determine its effect on phylogenetic reconstructions.

Material and methods
Butterflies (only males) were collected in NW Iran (Zagros mt., Kordestan provience) in 2007-2014. Bodies were placed in 2 ml plastic vials with 100% ethanol for DNA analysis. Wings were stored in glassine envelopes for morphological study. All samples are stored at Zoological Institute, St Petersburg, Russia.
The PCR amplifications were performed either in 50 µl reaction volume containing ca. 10-20 ng genomic DNA and 0.5 mM of each primer, using 26 PCR Master Mix (Fermentas, Lithuania). The temperature profile was as follows: initial denaturation at 94 °C for 1 min, followed by 30 cycles of denaturation at 94 °C for 45 s, annealing at 50 °C for 45 s, and extension at 72 °C for 1 min with a final extension at 72 °C for 10 min.
Amplified fragments were purified using GeneJET Gel Extraction Kit (Fermentas, Lithuania). Purification was carried out according to the manufacturer's protocol. The success of PCR amplification and purification was evaluated by electrophoresis of the products in 1% agarose gel. Purified PCR product was used for direct sequencing or subsequent cloning.
ITS2 PCR products were cloned into blunt-end cloning vector pJET1.2 (Fermentas, Lithuania) according to the manufacturer's protocol for 10 minutes at room temperature. The pJet1.2 plasmid selects successful ligations through the disruption of an otherwise lethal gene, eco47IR, which enables positive selection of the recombinants. Before ligation, a 3'-A overhang were removed from the PCR products by treating the PCR product with a proofreading DNA polymerase. For transformation 5 µl of the ligation mixture reaction were added to 50 µl of chemo-competent E. coli DH101B cells an incubated for 10 min. on ice. After incubation transformation mixture were pipetted onto pre-warmed LB Anp IPTG agar plate and spread by using inoculation loop. Agar plates with competent E. coli were incubated overnight at 37 °C.
For each cloning, more than 500 clones were obtained. To check if the cloning procedures were successful, PCR with ITS2-speciffic primers were conducted for 20 colonies per cloning reaction. GeneJET Plasmid Miniprep Kit (Fermentas, Lithuania) was used for preparation of plasmid DNA from recombinant E. coli culture. A single colony from a freshly streaked selective plate were picked to inoculate 1-5 mL of LB medium supplemented with ampicillin and incubated for 12-16 hours at 37 °C while shaking at 200-250 rpm. The bacterial culture was harvested by centrifugation at 8000 rpm (6800 × g) in a microcentrifuge for 2 min at room temperature. The supernatant was decanted and all remaining medium was removed. The pelleted cells were resuspended and subjected to SDS/alkaline lysis to liberate the plasmid DNA. The resulting lysate was neutralized to create appropriate conditions for binding of plasmid DNA on the silica membrane in the spin column. Cell debris and SDS precipitate were pelleted by centrifugation, and the plasmid DNA were washed to remove contaminants and eluted.
Se quencing was car ried out using 3500xL analyzer (Applied Biosystems). Not less than 300 ng of plasmid DNA template was used for sequencing procedure. Cloned fragments were analyzed edited and aligned in Bioedit Software.
A Bayesian approach for estimating phylogeny was used. Bayesian trees were inferred using partitioned models: GTR for nucleotide substitutions and standard model for indels as implemented in MRBAYES v. 3.2 (Ronquist and Huelsenbeck 2012). Each gap (indel) was treated as a single character regardless of the length of the gap, under the assumption that a given gap is a result from one mutational event (Simmons and Ochoterena 2000).

Results
Sequenced region contained 3` end of 5.8S gene, ITS2, and the 5` end of the 28S gene. Direct sequencing of amplicons of 30 individuals (10 individuals per each species) displayed intra-individual heterogeneities in all specimens analyzed. There are two kinds of heterogeneities: single nucleotide substitutions and mono, bi-and multi-nucleotide insertions/deletions. The presence of heterogeneities was indicated by double peaks in substitution positions, and by a series of mixed peaks in case of indel events, both positioned after a sequence of good quality. The examples of heterogeneities revealed by direct sequencing are displayed in Figure 1.
To elucidate the visible heterogeneity, the amplicons for 2 specimens of P.(A.) peilei, 2 specimens of P.(A.) karindus and one specimen of P.(A.) morgani were cloned and 10 clones per specimen were sequenced. The summary of the heterogeneities in the ITS region displayed by the clones is depicted in Table 1. Partial sequences of 5,8S and 28S genes were cropped from further analysis. Total length of ITS2 varied from 477 bp up to 512 bp depending on the presence of insertions\deletions. Uncorrected "p" pairwise distances for all clones are given in Table 2.

W127 P.(A.) morgani #01 A A C -C G T ---T C A C A C G T T T T T T T T ----A A C G ---A A A A G G W127 P.(A.) morgani #02 A A C -C G T ---C C A C A C G T T T T T T T T T T T T A A C G ---A A T A G G W127 P.(A.) morgani #03 A A C -A T C C G C T G ---------------------G C A A G A G G T W127 P.(A.) morgani #04 A G C G A T T C G C T G ---------------------G C A A G A G G G W127 P.(A.) morgani #05 G G C G C G T ---T C A C A C G T T C T T T T T T T T -A A C G ---A A A A G G W127 P.(A.) morgani #06 A A C -C G T ---T C A C A C G T T T T T T T T T T --A A C G ---A A A A A G W127 P.(A.) morgani #07 A A C -C G T C G C T G ---------------------G C A A G A G G G W127 P.(A.) morgani #08 A A C G A T T C G C T C ---------------------G C A -A A A G T W127 P.(A.) morgani #09 A A T G A T T C G C T G ---------------------G C A A G A G G G W127 P.(A.) morgani #10 A A C -C G T ---T G ---------------------G C A A G A G G G
Clones of P.(A.) morgani ITS2 showed greater diversity than the other 2 species. For instance, the genetic distance between V127#05 clone and V127#03 was 3.68%. The average intragenomic genetic distance was also significantly higher for this species -1.77% In Bayesian analysis 50 cloned amplicons from P. (A.) peilei, P.(A.) karindus, P.(A.) morgani and ITS2 sequences from all Agrodiaetus species available in the GenBank were included, giving a total of 127 sequences. Since Polyommatus icarus (Rottemburg, 1775) was earlier inferred as sister clade to the subgenus Agrodi aetus (Talavera et al. 2013), we used one specimen (GenBank accession number AY556732) as outgroup to root the phylogeny. Fragment of consensus Bayesian tree, showing clusterization of cloned sequences is given in Figure 2. The complete tree is given online in the Suppl. material 1.

Discussion
Despite the popularity of the ITS2 nuclear rDNA marker in systematics of different groups of animals and plants, its variability on intraspecific and intraindividual level is still poorly known. The occurrence of multiple ITS2 copies within a single genome should be accounted for before rDNA is used for phylogenetic or population studies. Furthermore, investigation of rates of intra-individual polymorphism can lighten addressing questions regarding speciation, species hybridization end evolutionary history. It is generally considered that multigene families, such as rDNA maintain homogeneity of all copies as a result of concerted evolution (processes of gene conversion and unequal crossing over) (Zimmer et al. 1980, Dover 1982. Mutations rapidly spread to all members of the gene family even if there are arrays located on different chromosomes (Dover 1982, Arnheim1983, Gerbi 1985, Tautz et al. 1988). The efficiency of homogenization of rDNA is usually high (Liao 1999). Concerted evolution of noncoding sequences, such as internal transcribed spacers, can result in fixed interspecific differences and intraspecific homogeneity. Despite this assumption, our results show, that intraindividual variability can be maintained, when mutation rates are higher than rates of homogenization. This can lead to erroneous phylogenetic reconstructions and species misidentification.
Here we contribute with the first insight into the intraspecific ITS2 diversity in the blue butterflies of subgenus Agrodiaetus.
The ITS2 of all specimens of three Agrodiaetus species -(P.(A.) peilei, P.(A.) karindus and P.(A.) morgani) were intragenomically variable. There were a number of indels and base substitutions accounting for both the length and sequence variabilities. Numerous indels lead to length variation (477-512 bp) of studied sequences. Bayesian phylogenetic reconstruction revealed that cloned sequences of certain individuals did not form a monophyletic unanimity, but the majority of clones clustered together within species borders. In particular, clones of P. ) karindus (Z704_#08) also was recovered as sister taxa to abovementioned clade. Finally, W136_#08 clone of P. (A.) peilei is found to be more genetically distant from other clones of this individual than the great number of other species of the subgenus Agrodiaetus (Figure 2).
Recent works showed that tandem arrays of rRNA genes in most Lepidoteran species form one or two so-called rDNA clusters, although some exceptions in cluster number exist (Nguen et al. 2010). Data on the number and distribution of rDNA clusters in genomes of lycaenid butterflies are very scarce. Previous investigation by Vershinina et. al. (2015) examined ribosomal clusters in seven blue butterflies of the genus Polyommatus and showed the presence of two different variants of the location of major rDNA clusters in Polyommatus species: with one or two rDNA-carrying chromosomes in haploid karyotype (Vershinina et al. 2015). P.(A.) peilei, P.(A.) karindus and P.(A.) morgani were among studied species, which bear a single rDNA cluster. Thus, all intragenomic ITS2 patterns for every individual analysed, belong to a single rDNA cluster, which means that examined level of intragenomic variability not caused by sequencing ITS2 copies located on different chromosomes.
To conclude, our study demonstrates that the results of direct sequencing may not describe the actual and entire set of sequence variants. Level of divergence between clones of one individual can be comparable to interspecific genetic differences variations or even exceed them. Hence, cloning and subsequent intraindividual haplotypes handling are required for reliable phylogenetic reconstructions.