Research Article
Corresponding author: Ilya Kirov (kirovez@gmail.com)
Academic editor: Gennady Karlov
© 2018 Ilya Kirov, Marina Gilyok, Andrey Knyazev, Igor Fesenko.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Kirov I, Gilyok M, Knyazev A, Fesenko I (2018) Pilot satellitome analysis of the model plant, Physcomitrella patens, revealed a transcribed and high-copy IGS related tandem repeat. Comparative Cytogenetics 12(4): 493-513. https://doi.org/10.3897/CompCytogen.v12i4.31015
Satellite DNA (satDNA) constitutes a substantial part of eukaryotic genomes. In the last decade, it has been shown that satDNA is not an inert part of the genome and that its function extends beyond the nuclear membrane. However, few model plant species are suitable for studying these novel aspects of satDNA functionality. Here, we explored the satellitome of the model “basal” plant, Physcomitrella patens (Hedwig, 1801) Bruch & Schimper, 1849 (moss), which has a number of advantages for deep functional and evolutionary research. Using a newly developed pyTanFinder pipeline (https://github.com/Kirovez/pyTanFinder) coupled with fluorescence in situ hybridization (FISH), we identified five high copy number tandem repeats (TRs) occupying long DNA arrays in the moss genome. The nuclear organization study revealed that two TRs had distinct locations in the moss genome, concentrating in the heterochromatin and in knob-rDNA-like chromatin bodies. Further genomic, epigenetic and transcriptomic analysis showed that one TR, named PpNATR76, was located in the intergenic spacer (IGS) region and transcribed into long non-coding RNAs (lncRNAs). Several specific features of PpNATR76 lncRNAs make them very similar to recently discovered human lncRNAs, raising a number of questions for future studies. This work provides new resources for functional studies of the satellitome in plants using the model organism P. patens, and describes a list of tandem repeats for further analysis.
Physcomitrella patens, Bryophyta, satellite DNA, chromosomes, fluorescence in situ hybridization, long non-coding RNAs, rDNA
A substantial part of eukaryotic genomes is composed of different families of repetitive elements (REs). Some REs are ancient viruses (e.g., mobile elements), whereas others are de novo generated sequences without a specific structure. The latter include satellites, or tandem repeats (TRs), dispersed repeats and other repeat groups. TRs are the main components of heterochromatin, centromeres and telomeres (
The rapid evolution and high intra-monomer identity of TRs significantly hamper their study at the genome level. TRs are often collapsed or placed into an unassembled portion of the genome (e.g. Chr0, (
The latest discoveries, including the specific transcription of some TRs as satRNAs and lncRNAs, which play important roles in regulatory processes, have moved satellite DNA biology from structural genomics to functional genomics. Satellite DNA annotation has been performed for a long list of plant species, but there are only a few model plants that are suitable for deep functional studies of TRs. In addition, no model basal plants are present on this list, although they could facilitate the study of the TR evolution mechanisms on a long timescale. Here, we performed a pilot satellitome analysis of the model basal plant, the moss Physcomitrella patens (Hedwig, 1801) Bruch & Schimper, 1849. It is a widely used model plant for molecular and developmental biology, evolution and biochemistry studies (
pyTanFinder was written in Python v3.6 using Biopython (
For chromosome and nucleus preparation, the Gransden strain of P. patens was grown in Knop medium with 500 mg/L ammonium tartrate and 1.5% agar (Helicon, Moscow, Russian Federation) in a Sanyo Plant Growth Incubator MLR-352H (Panasonic, Osaka, Japan) with a 16-hour photoperiod at 24 °C and 61 μmol m⁻² s⁻¹. Gametophores at different stages (green – light green sporophyte colors) were used for analyses. Chromosome preparation was performed according to the “SteamDrop” protocol (
Isolated DNA was used for next-generation sequencing (NGS). A sequencing library was prepared with the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, UK). After preparation of the samples, the libraries were analyzed using Qubit (Invitrogen) and a 2100 Bioanalyzer (Agilent Technologies). Amplification of the samples was performed according to the protocol (Illumina) using MiSeq. Raw Illumina FASTQ files were de-multiplexed, quality filtered and analyzed using FastQC (
FISH was performed as previously described (
ID | Sequence |
---|---|
17_50 | (TAMRA)-AACCTTCTAGAAGAGAAGTTT |
21_215 | (TAMRA)-ACTTCCAGAGAGCATCGGCAA |
602_86 | (TAMRA)-AAGTGATGAACAAAATTTCTC |
04_78 | (TAMRA)-AACTTGCATTCTTCATTTTCA |
592_108 | (TAMRA)-ATTTCTTAGAAAATACGTTCT |
20_76 | (TAMRA)-AGTCCCGTCGCGAGTCCCGGA |
19_95 | (TAMRA)-ATAATTCTATCGGTTATGTTT |
05_92 | (TAMRA)-AATAATAGTAAAAGTTATAGC |
21_43 | (TAMRA)-ACCTTCAAGTGGACCTTAGTA |
01_31 | (TAMRA)-AATCAGCTCGAGTCGAGCTGA |
08_44 | (TAMRA)-AGCTGATGGCAGGTAAGGGAG |
02_27 | (TAMRA)-CTTCCGTCTTGGATCCGGAAT |
08_217 | (TAMRA)-AAAGTAGATCTAAAAATAAAA |
05_178 | (TAMRA)-ACACGAAACTCACAACTTACT |
21_43 | (TAMRA)-ACCTTAGTGGAGAAGTTCTGC |
18_62 | (TAMRA)-AGGGGAGTTTTCAAGTTTTTG |
10_116 | (TAMRA)-ATTGGAGAAGTATCATTGTAA |
16_64 | (TAMRA)-ATCGAAGAGCTAGCTTCAAGC |
1004_43 | (TAMRA)-AGAGAAGTTCTGTCCTTGCCT |
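As an illustration of how the probes in the table relate to the repeats they target, the following minimal Python sketch checks that an oligonucleotide occurs within a tandem array built from its monomer. The monomer sequences are copied from the table of identified tandem repeats in the Results. The function name `probe_in_tandem_array` and the choice of three concatenated copies are assumptions made for this example and are not part of pyTanFinder.

```python
# Monomers copied from the table of identified tandem repeats (three examples).
MONOMERS = {
    "Pp01_31": "AATCAGCTCGAGTCGAGCTGATTTGCTTCTC",
    "Pp02_27": "CTTCCGTCTTGGATCCGGAATTGGCTC",
    "Pp20_76": ("TCCCAGTCCCGTCGCGAGTCCCGGACTTCCTCCTCCTCTTCCTTGTCCCG"
                "CCGCGACTCCCTAGTCCCGGCGCGAG"),
}

# TAMRA probes for the same repeats (fluorophore label omitted).
PROBES = {
    "Pp01_31": "AATCAGCTCGAGTCGAGCTGA",
    "Pp02_27": "CTTCCGTCTTGGATCCGGAAT",
    "Pp20_76": "AGTCCCGTCGCGAGTCCCGGA",
}


def probe_in_tandem_array(probe, monomer, copies=3):
    """Return True if the probe occurs in `copies` head-to-tail monomer copies.

    Concatenating copies (hypothetical default: 3) lets a probe that spans
    the junction between two monomers still be found.
    """
    array = monomer.upper() * copies
    return probe.upper() in array


results = {name: probe_in_tandem_array(PROBES[name], MONOMERS[name])
           for name in PROBES}
for name, found in results.items():
    print(name, found)
```

A check of this kind only confirms that a probe matches the listed monomer on the same strand; it says nothing about probe specificity elsewhere in the genome.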
Total RNA from protonemata tissue was isolated according to Cove et al. 2000. The RNA quality and quantity were evaluated by electrophoresis in an agarose gel with ethidium bromide staining. The exact concentration was measured using the Quant-iT RNA Assay Kit, 5–100 ng on a Qubit 3.0 (Invitrogen, US). The cDNA for RT-PCR was synthesized using the MMLV RT Kit (Evrogen, Russia). Primers (Table
Gene id | Forward | Reverse |
---|---|---|
Pp3c20_303V3.1 | ATGGAGCGGGACAAGAGG | GAGTCCCGACCTCTGGCG |
Pp3c20_283V3.1 | CCCCCGCCAAAAATGGTTAC | CGGGACAAGGAAGAGGAGGA |
Pp3c19_9271V3.1 | ACTGGGCTCAAAGAAGGCAG | AGGAGGAAGAGGAGGAAGGC |
Pp3c14_12290V3.2 | CCCTAGCCTTTGGTTGCGTT | ACTCTCCCTTGCAATGGTCG |
Pp3c4_8299V3.1 | GTGTCGGGGTTAGGAAGTGG | TAGCTCTTGGAACTCGCTGC |
To find the TRs in the P. patens genome, we used the Tandem Repeats Finder tool (TRF, (
The monomer length of the TRs ranged from 27 to 217 bp (Fig.
General information about identified tandem repeats used for FISH analysis.
ID | Monomer length, bp | RepeatExplorer cluster | Abundance, bp | Sequence |
---|---|---|---|---|
Pp17_50 | 50 | 10 | 285023 | GAACCTTCTAGAAGAGAAGTTTCTAGAACCTTCTAGAAAAGAAGCCTCTG |
Pp21_215 | 215 | 309 | 156974 | CACTTCCAGAGAGCATCGGCAATTTGAACTCTCTTGTGGAGTTGAATTTGTATAGATGTCGATCCTTGAAGGCACTTCCAGAGAGCATCGGCAATTTGAACTCTCTTGTGGAGTTGAATTTGTATGGATGTCGATCCTTGAAGGCACTTCCAGAGAGCATCGGCAATTTGAACTCTCTTGTGAAGTTGAATTTGGTAGATGTCGATCCTTGAAGG |
Pp602_86 | 86 | 2626 | 60915 | AAGTGATGAACAAAATTTCTCATTTTGCCAAGTGATGAACAAAATTTCTCATTTGCCAAGTGATGAACAAAATTTCTCATTTTGCC |
Pp04_78 | 78 | 340 | 38748 | CAACTTGCATTCTTCATTTTCATGCTCAACTTACATTCTCTATTTCCATGCTCAACTTGCATTCTCTATTTCCATGCT |
Pp592_108 | 108 | 1758 | 34258 | ATTTCTTAGAAAATACGTTCTAAATGCAAAGATACAATTTCTTAGAAAATACGTTCTAAATGCAAAGATACAATTTCTTAGAAAATACGTTCTAAATGCAAAGATACA |
Pp20_76 | 76 | 226 | 22386 | TCCCAGTCCCGTCGCGAGTCCCGGACTTCCTCCTCCTCTTCCTTGTCCCGCCGCGACTCCCTAGTCCCGGCGCGAG |
Pp19_95 | 95 | 363 | 18717 | ATAATTCTATCGGTTATGTTTAAGGTATTCAAGATATTATCATATACCAATGAATGAATAATGTGCCATTGCCCACCCAAATATTGGAGTTTACC |
Pp05_92 | 92 | 209 | 13907 | CCTCTAATAATAGTAAAAGTTATAGCAATAAATAATAATTATCAGACTTCCAATAATAGTAAAATTTATAGCAATAAATAATAATTATCGGA |
Pp21_43 | 43 | 1161 | 10324 | CCTTGCCTTCACCTTCAAGTGGACCTTAGTAGAGAAGTTTTGT |
Pp01_31 | 31 | 178 | 5381 | AATCAGCTCGAGTCGAGCTGATTTGCTTCTC |
Pp08_44 | 44 | 193 | 3978 | AGCTGATGGCAGGTAAGGGAGATTGCATGAATCAGCTCGAGTCG |
Pp02_27 | 27 | 118 | 3648 | CTTCCGTCTTGGATCCGGAATTGGCTC |
Pp08_217 | 217 | 227 | 3472 | TTTCTTAAAGTAGATCTAAAAATAAAAGTTTTGTCAAAAAAGTAGGCTTTGCTAAGTGATGACTAGAAGTGATTTCTATGTTTGAAGATGCAAAGCTCCTCTTGTTTGTTGTTAAGAAGTATAATTTACTAAAATAAGTTATTAAATAAACAGGAAAATCAAGACGTAAGATTCCTCACAAGATTTGGGATTTACTTCAGAAAACCAACAATTCAAG |
Pp05_178 | 178 | 2110 | 2848 | CACACGAAACTCACAACTTACTCCGCACACAACTGATCGTCGACAACGTCGTAAAGCAAGGCAACATCAGTGACAACAACGGGGAATCCTACAGTTTTGTGTCCACAACCTTCTCCTCACAAGTGAGATGAGGAACCCATCCGATATCTTTGTGAGGGAGTGATGATACCGGAGGAAT |
Pp21_43 | 43 | 1161 | 2648 | GTGGACCTTAGTGGAGAAGTTCTGCCCTTGCCTTCACCTTCAA |
Pp18_62 | 62 | 13 | 2608 | AGGGGAGTTTTCAAGTTTTTGCAAGGTTACTAGTTCGGTTTCATTGGAGGTTTTTGAAGATC |
Pp10_116 | 116 | 115 | 1619 | ATTGGAGAAGTATCATTGTAAAGCAAGACTATGGAGGTATAAAAAGGGAGGTACATTTACAAGATATAGATGCCTTTGATTTAAGTTTTATTAAAAAAAAAAAAAAAAAAAAAAAA |
Pp16_64 | 64 | 116 | 1572 | GGGGTTTTTTGGATCGAAGAGCTAGCTTCAAGCTCTTTTCAAGGTCACTAGGTTGGTTTCATTA |
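The abundance column can be split into high-copy (hcTR) and low-copy (lcTR) classes using the cut-offs applied in this study (>18000 bp and <15000 bp). The sketch below does this for the rows listed in the table above. The function name `classify_abundance` and the "unclassified" label for values between the two cut-offs are assumptions introduced for the example.

```python
# (ID, abundance in bp) pairs copied from the table above.
# A list of tuples is used because the ID Pp21_43 appears twice.
ABUNDANCE = [
    ("Pp17_50", 285023), ("Pp21_215", 156974), ("Pp602_86", 60915),
    ("Pp04_78", 38748), ("Pp592_108", 34258), ("Pp20_76", 22386),
    ("Pp19_95", 18717), ("Pp05_92", 13907), ("Pp21_43", 10324),
    ("Pp01_31", 5381), ("Pp08_44", 3978), ("Pp02_27", 3648),
    ("Pp08_217", 3472), ("Pp05_178", 2848), ("Pp21_43", 2648),
    ("Pp18_62", 2608), ("Pp10_116", 1619), ("Pp16_64", 1572),
]

HIGH_CUTOFF = 18000  # bp, hcTRs lie above this value
LOW_CUTOFF = 15000   # bp, lcTRs lie below this value


def classify_abundance(bp):
    """Assign a repeat to a copy-number class from its total abundance."""
    if bp > HIGH_CUTOFF:
        return "hcTR"
    if bp < LOW_CUTOFF:
        return "lcTR"
    return "unclassified"  # hypothetical label; no listed repeat falls here


classes = [(tr_id, classify_abundance(bp)) for tr_id, bp in ABUNDANCE]
n_high = sum(1 for _, c in classes if c == "hcTR")
n_low = sum(1 for _, c in classes if c == "lcTR")
print(f"hcTRs: {n_high}, lcTRs: {n_low}")
```

Keeping the two cut-offs separate, rather than using a single threshold, mirrors the gap between 15000 and 18000 bp in the stated class definitions.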
According to the pyTanFinder results, 7 (37%) TRs have high (>18000 bp, hcTRs) and 12 (63%) TRs have low (<15000 bp, lcTRs) total abundance. We were able to design primers for 5 hcTRs and obtained ladder-like or smear PCR products (Fig.
We used FISH to determine whether the identified TRs occupy large clusters in the moss genome. Visualizing DNA sequence loci on chromosomes and nuclei with a molecular cytogenetic approach is challenging for bryophytes (
Mitotic and meiotic chromosomes of P. patens after DAPI staining. Anaphase (A), 1n ((B) protonema, n=27) and 2n ((D) sporophyte, 2n=54) metaphases and pachytene (C) stages. Scale bar: 5 µm.
We designed 19 TAMRA oligonucleotide probes to perform a nuclei-FISH assay. To validate that the obtained slides were suitable for FISH experiments, we used known tandemly organized sequences, Arabidopsis-type telomeric repeat ((TTTAGGG)n) and 45S rDNA, as positive controls. FISH experiments revealed many dot-like (Fig.
Results of FISH with labeled probes designed on Arabidopsis-type telomere repeat (A), 45S rDNA (B) and 5 identified TRs: Pp602_86 (C), Pp21_215 (D), Pp20_76 (E), Pp19_95 (F) and Pp592_108 (G).
Three repeats (Pp602_86, Pp21_215, Pp592_108) gave several signals that occupied two distinct territories in the nucleus. FISH signals from one TR, Pp19_95 (95 bp monomer size), were associated with heterochromatin regions of the nucleus (Fig.
Thus, nuclei-FISH analysis of the 19 TRs identified by the pyTanFinder pipeline showed 5 TRs with pronounced signals. Moreover, one of the repeats (Pp19_95) was associated with heterochromatin structures, while another (Pp20_76) was associated with perinucleolar bodies. These 5 TRs were used for further analysis.
Nuclear organization of Pp19_95 (A, C) and Pp20_76 (B, D) TRs. The A and B picture series show fluorescence in the DAPI and TAMRA channels and the merged pictures. C RGB profile of the nucleus; blue and red lines show the pixel intensity for two Pp19_95 FISH signals and DAPI staining, respectively. D Digitally zoomed-in part of the nucleus with red Pp20_76 FISH signals. nc marks the nucleolus. Scale bar: 5 µm.
To integrate our data with the P. patens genome sequence, we mapped 5 TRs back to the assembled P. patens genome sequence and estimated the genomic distribution of the TRs. Up to 45% (for Pp19_95) of BLAST hits belonged to the sequences that were not included in any assembled chromosomes (scaffolds), suggesting a challenge in the assembly of the genomic regions carrying the TRs (Fig.
To further verify the results of nuclei-FISH and bioinformatics mapping, we performed FISH on moss chromosomes using two probes, Pp602_86 (single locus) and Pp20_76 (multiple loci). Although the chromosome preparation protocol needs to be further improved for P. patens, we were able to identify FISH signals from Pp20_76, located at the ends of two chromosome pairs, and from Pp602_86, located in the proximal positions of one chromosome pair (Fig.
Chromosome location of 5 TRs. A Bar plot showing the number of BLAST hits derived from scaffolds and chromosome sequences B Circos plot: the inner layer corresponds to the bar plot showing the number of BLAST hits of the TRs on the chromosomes; FISH localization of Pp20_76 (C) and Pp602_86 (D). Scale bar: 5 µm.
Because of the special location of Pp20_76 in the nucleus (near the nucleolus) and the detected nuclear bodies enriched in this TR, we named this TR PpNATR76 (76 bp P. patens periNucleolar Associated Tandem Repeat) and analyzed it further. The alignment of 200 PpNATR76 sequences found in the moss genome showed a high conservation level between monomers. In addition, sequence analysis of the consensus PpNATR76 monomer revealed a long polypyrimidine tract ((CCT)n motif). To determine why PpNATR76 DNA was located proximal to the nucleolus, we mapped the 45S rDNA to the moss genome. Using the A. thaliana 45S rDNA gene (GenBank: X52320.1), we found two minor rDNA loci in the moss genome located on chromosomes 18 and 26 and one major rDNA locus on chromosome 20. The chromosomal locations of 45S rDNA and PpNATR76 were identical on chromosomes 20 and 26, where they occupied c. 250 kb and 16 kb regions, respectively. Moreover, a detailed analysis of the loci revealed that PpNATR76 was located between 45S rDNA genes, in the IGS regions (Fig.
Genomic organization and epigenetic landscape of the 45S rDNA/PpNATR76 locus. The top panel is a snapshot of the CoGe GBrowser for P. patens (https://genomevolution.org/). A logo picture from the multiple alignment of 200 PpNATR76 monomers is shown at the bottom.
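The polypyrimidine tract noted in the PpNATR76 monomer can be located with a simple scan for the longest uninterrupted run of C and T. The sketch below uses the Pp20_76 monomer sequence from the table of identified repeats, which is assumed here to stand in for the consensus monomer. The function name `longest_pyrimidine_run` and the use of two concatenated copies (to catch a run spanning a monomer junction) are choices made for this example.

```python
import re

# Pp20_76 (PpNATR76) monomer as listed in the table of identified repeats.
PPNATR76 = ("TCCCAGTCCCGTCGCGAGTCCCGGACTTCCTCCTCCTCTTCCTTGTCCCG"
            "CCGCGACTCCCTAGTCCCGGCGCGAG")


def longest_pyrimidine_run(monomer, copies=2):
    """Return the longest run of C/T in a head-to-tail array of the monomer."""
    array = monomer.upper() * copies
    runs = re.findall(r"[CT]+", array)
    return max(runs, key=len)


tract = longest_pyrimidine_run(PPNATR76)
print(tract, len(tract))
print("contains (CCT)3:", "CCT" * 3 in tract)
```

This scans one strand only; the complementary strand carries the corresponding polypurine tract.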
Because of the transcription activity of the PpNATR76-occupying region, our next aim was to find P. patens transcripts possessing the PpNATR76 TR. This analysis revealed 16 transcripts whose genes were located on 5 chromosomes (Chr20, Chr19, Chr4, Chr17, Chr14). Only 4 of the transcripts possessed annotated canonical ORFs (Pp3c19_9270V3.1, Pp3c19_9271V3.1, Pp3c4_8299V3.1 and Pp3c14_12290V3.1). Pp3c14_12290V3.1 was the only transcript that had an ORF with homology to known proteins and was annotated as NADH:ubiquinone reductase, whereas predicted proteins from the other PpNATR76-possessing transcripts did not show any homology to known proteins. These data suggested that the PpNATR76 transcripts mostly belonged to lncRNAs. To assess the robustness of the results, we performed a quantitative RT-PCR (qRT-PCR) validation of 5 PpNATR76 transcript genes (Pp3c20_303V3.1, Pp3c19_9271V3.1, Pp3c20_283V3.1, Pp3c14_12290V3.2, Pp3c4_8299V3.1) using protonemata RNA samples. For this experiment, DNA was taken as a positive control, whereas extracted RNA and MQ were negative controls. We then calculated the difference (delta) between the Cq values of pure RNA (DNA contamination control) and cDNA-specific amplification. The results of qRT-PCR showed that all transcripts were expressed at detectable levels, with a delta of > 5. In addition, for 3 out of 5 genes, sense as well as antisense transcription was observed, whereas for two genes (Pp3c20_283, Pp3c14_12290) only unidirectional transcription was detected. Collectively, these data confirmed the existence of PpNATR76 transcripts in somatic cells and strongly suggested that PpNATR76 was transcribed as part of both protein-coding RNAs and lncRNAs.
TRs with different monomer sizes are integral parts of most eukaryotic genomes, in which they are involved in diverse biological processes. Although many efforts have been made to understand the genomic organization, structure and evolution of TRs, their functions in a cell are still poorly understood. Here, we performed a pioneering identification and FISH verification of satellite repeats forming long arrays in the genome of the model plant, P. patens. We developed a pipeline, pyTanFinder, and identified 19 TRs, of which 5 produced FISH signals. We found both heterochromatin-associated and transcribed TRs. Genomic and transcriptomic analyses identified an IGS-associated moss TR, PpNATR76, which was sequestered in the perinucleolar space and transcribed as a part of lncRNAs.
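The Cq comparison described above can be written out as a short calculation. The sketch below computes the difference between the Cq of the RNA-only control and the Cq of the cDNA reaction, and flags a transcript as detectable when that difference exceeds 5. Reading "> 5 delta" as a Cq difference greater than 5 cycles is an assumption, and all Cq values, gene labels and function names in the example are hypothetical and are not measurements from this study.

```python
DETECTION_THRESHOLD = 5.0  # cycles; assumed reading of "> 5 delta"


def delta_cq(cq_rna_control, cq_cdna):
    """Cq of the RNA-only (DNA contamination) control minus Cq of the cDNA.

    A large positive value means the cDNA amplified many cycles earlier
    than any residual genomic DNA in the RNA sample.
    """
    return cq_rna_control - cq_cdna


def is_detectable(cq_rna_control, cq_cdna, threshold=DETECTION_THRESHOLD):
    """Return True when the Cq difference exceeds the threshold."""
    return delta_cq(cq_rna_control, cq_cdna) > threshold


# Hypothetical Cq values for illustration only: (RNA control, cDNA).
example_cq = {
    "gene_A": (33.0, 24.5),
    "gene_B": (30.0, 27.0),
}

for gene, (cq_rna, cq_cdna) in example_cq.items():
    d = delta_cq(cq_rna, cq_cdna)
    print(f"{gene}: delta Cq = {d:.1f}, detectable = {is_detectable(cq_rna, cq_cdna)}")
```

Because each cycle corresponds to roughly a doubling of product under ideal amplification efficiency, a difference of 5 cycles implies that the cDNA-derived signal exceeds the contamination background by about 2^5 = 32-fold.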
Advances in genome sequencing and bioinformatics approaches in the last decades have driven progress in satellite repeat isolation (reviewed by (
We found one IGS-related satellite repeat, named PpNATR76, that had several distinguishing features at the genome and transcriptome levels: 1) its DNA occupied distinct perinucleolar-associated chromatin bodies and most of its copies were located in the 45S rDNA IGS; 2) its DNA was hypomethylated and the associated histones were enriched in ‘active’ chromatin marks; and 3) it was transcribed into lncRNAs. The number of PpNATR76 FISH signals (four per diploid nucleus used in this study) was in agreement with the previously observed 1–2 rDNA loci in moss and other bryophytes (
Satellite DNA repeats frequently originate in plant IGS DNA and have similar organization between closely related species (
In this study, we extended the list of model plant species for TR studies with a well-known model “basal” plant, P. patens, and provided a set of new FISH-verified TRs for further functional and evolutionary analysis in moss. We described a new pipeline, pyTanFinder, for the identification of TRs in fragmented genome sequences and showed that the expression of IGS-related TRs as lncRNAs is conserved between humans and early-diverged land plants. The results of our work will accelerate further studies of TR biology and function in a plant cell using the model “basal” plant P. patens.
This work was supported by the Russian Science Foundation (project No. 17-14-01189). We thank Dr. Igor Mozhaiko for his help in moss propagation and Anna Philippova for her technical assistance in manuscript preparation.