Combination of Various Sequencing, Assembly and Mapping Approaches
Creating a Reliable Genome Assembly for Emerging Aquatic Crops
- © Jack Moreh, freerangestock.com
- Fig. 1: S. polyrhiza growing on nutrient medium (left) and its 40 DAPI-stained chromosomes (right).
- Fig. 2: Localization of pseudomolecules 08 and 04 on S. polyrhiza chromosomes. Pooled BAC probes of Ψ 04 (red) and Ψ 08 (yellow) labeled two different chromosome pairs (not one pair, as suggested by the BioNano optical map) in seven investigated S. polyrhiza clones (the white number in the left corner of each panel indicates the clone ID). Probes were labeled with Cy3 (yellow) and Texas Red (red); the chromosomes were counterstained with DAPI. Scale bars = 5 µm.
- Fig. 3: The complete karyotype of S. polyrhiza clone 9509. Multicolor FISH with 20 chromosome-specific probes shows the predicted pseudomolecule linkage for each chromosome on the same metaphase plate. The right image shows signals for (18S+26S) ribosomal DNA (red) on ChrS 01 and for 5S rDNA (yellow) on ChrS 13. The chromosomes were counterstained with DAPI. Scale bar = 5 µm.
With the improvement of next-generation sequencing (NGS) techniques, the whole genome of an organism of interest can be sequenced within one day. However, each sequencing technique (for instance Illumina, PacBio or Oxford Nanopore) has its own characteristics and challenges for de novo assembly into a high-quality genome sequence map. By combining various sequencing, assembly and mapping approaches, a “gold-standard” genome sequence map can be achieved even when no genetic linkage map is available, as exemplified here for the Greater Duckweed Spirodela polyrhiza, an emerging aquatic crop that propagates mainly asexually.
Duckweeds are small, free-floating aquatic plants with the fastest growth rate among flowering plants and with highly reduced, miniaturized organs. The duckweed family (Lemnaceae) comprises 37 species in five genera: Spirodela (2 species), Landoltia (1), Lemna (13), Wolffiella (10) and Wolffia (11), with Spirodela as the most ancestral and Wolffia as the most derived genus. Evolutionarily younger duckweeds show a decrease in size from 1.5 cm to less than 1 mm in diameter and an expansion of their genomes from 160 Mbp to 2,203 Mbp.
Duckweed with Small Genome Size
The first duckweed species chosen for whole-genome sequencing was the mainly asexual S. polyrhiza (fig. 1), owing to its ancestral phylogenetic position, its economic potential and its small genome size (160 Mbp), which indicates a low content of repetitive DNA. The genome of clone 7498 was assembled by integrating sequence reads from the Roche/454 and Sanger ABI 3730xl platforms, BAC and fosmid paired-end sequences and 24 completely sequenced fosmids, together with DNA fingerprinting of the BAC library. The assembly yielded 32 pseudomolecules of at least 1 Mbp each, comprising 90% of the estimated genome size. Subsequently, all 32 pseudomolecules were validated and assigned to the 20 S. polyrhiza chromosomes by consecutive multicolor fluorescence in situ hybridization (mcFISH) experiments with 96 anchored BACs.
Three pseudomolecules turned out to be chimeric, revealing three mis-assemblies.
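The logic of this cytogenetic check can be sketched in a few lines of Python. The sketch assumes two hypothetical inputs: the position of each anchored BAC on its pseudomolecule, and the chromosome on which that BAC produced a FISH signal. A pseudomolecule whose BACs label more than one chromosome is flagged as chimeric, and the pair of neighboring BACs between which the assignment switches brackets the likely mis-join. The function name, data layout and all BAC and chromosome identifiers are illustrative only and are not taken from the original studies.

```python
from collections import defaultdict


def find_chimeras(bac_anchors, fish_chromosome):
    """Flag pseudomolecules whose anchored BACs label more than one chromosome.

    bac_anchors: dict mapping BAC -> (pseudomolecule, position in bp)
    fish_chromosome: dict mapping BAC -> chromosome on which it gave a FISH signal
    Returns a dict mapping each chimeric pseudomolecule to the list of
    (left_bac, right_bac) neighbor pairs between which the chromosome changes.
    """
    by_pseudo = defaultdict(list)
    for bac, (pseudo, pos) in bac_anchors.items():
        if bac in fish_chromosome:  # only BACs with a FISH result are informative
            by_pseudo[pseudo].append((pos, bac))

    chimeras = {}
    for pseudo, anchored in by_pseudo.items():
        anchored.sort()  # order BACs along the pseudomolecule
        switches = [
            (left, right)
            for (_, left), (_, right) in zip(anchored, anchored[1:])
            if fish_chromosome[left] != fish_chromosome[right]
        ]
        if switches:
            chimeras[pseudo] = switches
    return chimeras


# Hypothetical example: BAC_C lies on Psi_A but labels a different chromosome.
anchors = {
    "BAC_A": ("Psi_A", 100_000),
    "BAC_B": ("Psi_A", 900_000),
    "BAC_C": ("Psi_A", 2_000_000),
    "BAC_D": ("Psi_B", 50_000),
    "BAC_E": ("Psi_B", 700_000),
}
fish = {"BAC_A": "Chr1", "BAC_B": "Chr1", "BAC_C": "Chr7", "BAC_D": "Chr2", "BAC_E": "Chr2"}
print(find_chimeras(anchors, fish))  # {'Psi_A': [('BAC_B', 'BAC_C')]}
```

The denser the BAC coverage along a pseudomolecule, the narrower the interval in which a mis-join can be placed; a pseudomolecule carrying only one informative BAC can never be flagged this way.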
The genome assembly of another S. polyrhiza clone (9509) was established by combining high-depth short-read Illumina sequencing (95x coverage, generated from 180 bp and 500 bp paired-end as well as 2 kb, 5 kb and 20 kb mate-pair libraries) with high-throughput optical genome mapping on the BioNano Genomics Irys System. In this approach, high-molecular-weight DNA is treated with a nicking endonuclease, the resulting single-strand nicks are filled in with fluorescently labeled nucleotides, and the labeled molecules are stretched in nanochannels. Their images are captured and converted by dedicated algorithms into digital representations of the nick-site label pattern, which are then aligned de novo to build a whole-genome optical map.
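To compare a sequence assembly with an optical map, the expected label pattern can be predicted in silico from the assembled sequence. The Python sketch below finds the recognition motif of a nicking enzyme on both strands and returns the label positions and the distances between neighboring labels. It assumes the motif GCTCTTC (the site recognized by Nt.BspQI, an enzyme commonly used for Irys nick labeling); the enzyme actually used for clone 9509 is not stated here, and the function names are hypothetical.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def nick_label_positions(seq: str, motif: str = "GCTCTTC") -> list[int]:
    """0-based start positions of the motif on either strand of seq.

    The default motif is an assumption (the Nt.BspQI site); labels are placed
    at the motif start, a few bases from the actual nick.
    """
    seq, motif = seq.upper(), motif.upper()
    positions = set()
    for site in {motif, reverse_complement(motif)}:  # set: palindromic motifs counted once
        # Zero-width lookahead also finds overlapping occurrences.
        for match in re.finditer(f"(?={re.escape(site)})", seq):
            positions.add(match.start())
    return sorted(positions)


def label_intervals(positions: list[int]) -> list[int]:
    """Distances between neighboring labels, the pattern compared with optical maps."""
    return [b - a for a, b in zip(positions, positions[1:])]


contig = "AAGCTCTTCAAAAAAGAAGAGCTT"  # toy sequence with one site per strand
labels = nick_label_positions(contig)
print(labels, label_intervals(labels))  # [2, 15] [13]
```

Placing each label at the motif start rather than at the exact nick shifts positions by only a few bases, which is far below the resolution of an optical map.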
Because of several discrepancies between the cytogenetic map of clone 7498 and the optical map of clone 9509, the chromosomal assignment of pseudomolecules and, consequently, the chromosome numbering differed for 18 of the 20 chromosomes. These discrepancies could be due to clone-specific chromosome rearrangements, to mis-assembly of either genome and/or to insufficient DNA marker coverage in the cytogenetic study.
Serial mcFISH experiments with 106 fingerprinted BACs, used to assign sequences to the chromosomes of seven S. polyrhiza clones (including both 7498 and 9509), resolved all discrepancies between the previous maps. These discrepancies turned out to be due to errors or incompleteness of the former studies rather than to chromosomal rearrangements between clones (see, for instance, fig. 2), and four further chimeric pseudomolecules were revealed. In addition, the integration of high-depth Oxford Nanopore assemblies, which are based on much longer reads than those obtained with the sequencing strategies applied before, supported and extended the mcFISH results. Thus, combining the results of five different approaches (454, Illumina and Oxford Nanopore sequence assemblies, the BioNano optical map and the cytogenetic map) made it possible to correct and revise several linkages that had been misinterpreted or mis-assembled by single approaches. Furthermore, FISH with rDNA probes confirmed the two 5S rDNA loci in S. polyrhiza, which differ in copy number (60 vs. 12 copies), as predicted from the Oxford Nanopore sequence assembly.
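One way to make such a comparison explicit is to record, for every proposed linkage between two sequence segments, which approaches support it. The Python sketch below does this and flags linkages backed by only a single approach as candidates for re-examination. It is a toy illustration under assumed inputs (segment names and join lists are hypothetical), not necessarily the procedure used in the study, where the cytogenetic map served as a direct, independent control rather than as one vote among several.

```python
from collections import defaultdict


def join_support(evidence):
    """Map each proposed join to the set of approaches that support it.

    evidence: dict mapping approach name -> list of (segment, segment) joins.
    frozenset makes A-B and B-A the same join (orientation is ignored here).
    """
    support = defaultdict(set)
    for approach, joins in evidence.items():
        for left, right in joins:
            support[frozenset((left, right))].add(approach)
    return dict(support)


def single_approach_joins(evidence):
    """Joins backed by only one approach: candidates for re-examination."""
    return {join for join, approaches in join_support(evidence).items() if len(approaches) == 1}


# Hypothetical evidence; segment names are placeholders.
evidence = {
    "Illumina": [("segA", "segB"), ("segC", "segD")],
    "BioNano": [("segA", "segB")],
    "Nanopore": [("segC", "segE")],
    "FISH": [("segE", "segC")],
}
for join, approaches in join_support(evidence).items():
    print(sorted(join), sorted(approaches))
print(single_approach_joins(evidence))
```

Keeping the supporting approaches as a set, rather than a bare count, shows at a glance whether a linkage rests only on sequence-based evidence or has also been checked cytogenetically.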
Finally, mcFISH with distinct pools of the 101 BACs anchored to the 38 pseudomolecules or pseudomolecule fragments made it possible to label all 20 chromosome pairs specifically on a single metaphase plate (fig. 3). Together, these results improved and updated the genome sequence map of the S. polyrhiza chromosomes as a reference for comparative genome mapping of other duckweed species, and demonstrated the utility of cytogenetic and long-read sequencing approaches for validating and correcting genome assembly results.
Overall, short-read DNA sequencing techniques (e.g. the Illumina platform) produce high-quality reads for draft genome assembly, while long-read techniques (e.g. the PacBio or Oxford Nanopore platforms) provide less accurate reads but longer contigs with fewer gaps; together they reduce mis-assembly in a draft genome. The cytogenetic approach (FISH) spans the largest distances, is independent of the repetitive sequences that often break the contiguity of assemblies, and is unique in enabling a direct (microscopic) and independent check of assembly and optical mapping results at the chromosome level. Although mcFISH is laborious and time-consuming and requires an end-sequenced BAC library, it plays an important role in validating genome assemblies, especially for species for which comprehensive genetic maps are difficult to obtain because they propagate mainly asexually, such as S. polyrhiza and other Lemnaceae. In most cases, creating a “gold-standard” reference genome at chromosome-level resolution will require more than two independent approaches to generate high-confidence genome sequence maps.
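Contiguity, the main advantage of long reads mentioned above, is usually summarized by the N50 value; the share of the estimated genome held in large sequences (such as the 90% in pseudomolecules of at least 1 Mbp reported for clone 7498) is a related measure. A minimal Python sketch of both metrics follows; the function names and the toy assemblies are made up for illustration.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def fraction_in_large(lengths, genome_size, min_len=1_000_000):
    """Share of the estimated genome size contained in sequences >= min_len."""
    return sum(n for n in lengths if n >= min_len) / genome_size


# Toy assemblies (lengths in bp) for a genome estimated at 10 Mbp.
fragmented = [800_000, 700_000, 600_000] + [100_000] * 70
contiguous = [3_000_000, 2_500_000, 2_000_000, 1_500_000]
for name, asm in [("fragmented", fragmented), ("contiguous", contiguous)]:
    print(name, n50(asm), round(fraction_in_large(asm, 10_000_000), 2))
# fragmented 100000 0.0
# contiguous 2500000 0.9
```

Note that both toy assemblies contain a similar number of bases; only their distribution over sequences differs, which is exactly what N50 captures and total assembly size does not.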
Phuong Thi Nhu Hoang¹ and Ingo Schubert¹
¹Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Seeland, Germany
Prof. Dr. Ingo Schubert
Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)
Tippery, N. P., Les, D. H. & Crawford, D. J. Evaluation of phylogenetic relationships in Lemnaceae using nuclear ribosomal data. Plant biology 17 Suppl 1, 50-58, doi:10.1111/plb.12203 (2015).
Hoang, P. T. N., Schubert, V., Meister, A., Fuchs, J. & Schubert, I. Variation in genome size, cell and nucleus volume, chromosome number and rDNA loci between duckweed genera. (submitted).
Wang, W. et al. The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nature communications 5, 3311, doi:10.1038/ncomms4311 (2014).
Cao, H. X. et al. The map-based genome sequence of Spirodela polyrhiza aligned with its chromosomes, a reference for karyotype evolution. The New phytologist 209, 354-363, doi:10.1111/nph.13592 (2016).
Michael, T. P. et al. Comprehensive Definition of Genome Features in Spirodela polyrhiza by High-Depth Physical Mapping and Short-Read DNA Sequencing Strategies. The Plant journal : for cell and molecular biology, doi:10.1111/tpj.13400 (2017).
Hoang, P. T. N. & Schubert, I. Reconstruction of chromosome rearrangements between the two most ancestral duckweed species Spirodela polyrhiza and S. intermedia. Chromosoma 126, 729-739, doi:10.1007/s00412-017-0636-7 (2017).
Hoang, P. T. N. et al. Generating a high-confidence reference genome map of the Greater Duckweed by integration of cytogenomic, optical mapping and Oxford Nanopore technologies. The Plant journal : for cell and molecular biology, doi:10.1111/tpj.14049 (2018).