Non-coding RNAs: Old Molecules, New Functions
Shifting Paradigms in Genomics?
- Fig. 1: Genomic location of the non-coding RNA DD3 (also known as PCA3) according to the Human Genome Browser (http://genome.ucsc.edu/). DD3 is mapped to human chromosome 9 as an anti-sense transcript to the PRUNE2 gene. The blue box indicates the orientation of DD3 and exon/intron distribution. The red arrow indicates an alternative splicing event for this gene.
- Fig. 2: Schematic representation of different types of ncRNAs and their location in eukaryotic genomes. Examples of anti-sense ncRNAs, promoter-associated ncRNAs and ncRNAs in repetitive elements are shown. Bent arrows represent expression orientation. ncRNA, non-coding RNA; gDNA, genomic DNA.
- Fabricio F. Costa, PhD, Researcher at Children’s Memorial Research Center and Northwestern University, Chicago, USA and Genomic Enterprise CEO and Founder
Recent advances in DNA sequencing technologies are allowing deeper analyses of transcriptomes. The whole set of transcripts in eukaryotic cells (defined as the transcriptome) has been surveyed at higher resolution, showing that the majority of the genomic DNA is transcribed in yeast, mice and humans. These new transcripts might function as RNA and were named non-coding RNAs (ncRNAs) mainly because they do not code for proteins.
ncRNAs are not new in eukaryotic biology, as exemplified by ribosomal RNAs, transfer RNAs and others. However, new ncRNA classes have recently been described with diverse functions that range from control of gene expression to structural roles. ncRNAs also play a role in different pathologies and have been considered as molecular biomarkers and drug targets.
Contemporary Non-coding RNAs - A New Generation
ncRNAs are defined as transcripts or "genes" with a very low protein-coding capacity. In general they do not produce a protein product, although a few exceptions have been described. Deep transcriptome analyses have shown that these molecules are more common than expected in eukaryotic genomes [for review see 1]. Several examples have a defined function or have been linked to a specific phenotype or disease [2,3]. They can be divided into different groups: microRNAs, which control gene expression by base-pairing with target messenger RNAs; small RNAs such as promoter-associated RNAs, which might function in gene regulation; and long ncRNAs, which range from a few hundred nucleotides to many kilobases in length and are implicated in epigenetic mechanisms and other processes. For example, HOTAIR is a long ncRNA that regulates genes in trans, that is, by physically acting at loci on a different chromosome. This ncRNA is mapped to human chromosome 12 in the HOX gene cluster, which is very important for proper embryo development. Interestingly, HOTAIR expression is regulated by epigenetic mechanisms that include chromatin-modifying complexes, and it was recently associated with cancer metastasis. Another example of a long ncRNA is DD3 (also known as PCA3).
This ncRNA was identified as specifically over-expressed in prostate cancer and has been suggested as a molecular biomarker and drug target for this type of cancer. As shown in figure 1, DD3 is an anti-sense transcript to another gene mapped to human chromosome 9. DD3 also has splice variants, indicating that it might have a specific function; some lines of evidence indicate that this ncRNA is responsive to hormones such as estrogen. Several long ncRNAs are anti-sense to other genes (mostly protein-coding genes), and anti-sense transcription has emerged as a common event in eukaryotic genomes. Importantly, long ncRNAs have been associated with a wide range of diseases, from neurological syndromes to cancer [for review see 10]. Thus, recent evidence indicates that these new classes of ncRNAs are becoming important players in different aspects of eukaryotic biology.
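The anti-sense relationship described above (two transcripts on the same chromosome that overlap but are transcribed from opposite strands) can be illustrated with a minimal Python sketch. The `Transcript` type, the function names and all coordinates below are hypothetical and chosen only for illustration; they are not the real annotations of DD3 or PRUNE2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlap_length(a: Transcript, b: Transcript) -> int:
    """Number of bases shared by two transcripts (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_antisense_pair(a: Transcript, b: Transcript) -> bool:
    """True if the transcripts overlap and lie on opposite strands."""
    return a.strand != b.strand and overlap_length(a, b) > 0


# Hypothetical coordinates, not real genome annotations.
host_gene = Transcript("host_gene", "chr9", 1_000, 9_000, "-")
ncrna = Transcript("antisense_ncRNA", "chr9", 4_000, 6_500, "+")
same_strand = Transcript("same_strand_ncRNA", "chr9", 4_000, 6_500, "-")
other_chrom = Transcript("trans_ncRNA", "chr12", 4_000, 6_500, "+")

for candidate in (ncrna, same_strand, other_chrom):
    print(candidate.name, is_antisense_pair(host_gene, candidate))
```

Only the first candidate is reported as anti-sense: the second overlaps the host gene but shares its strand, and the third sits on a different chromosome. A real analysis would read coordinates from an annotation file and would usually compare exon structures as well.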
Eukaryotic Genomes Work as RNA Machines
Recent reports have shown that, depending on the cell type, up to 90% of eukaryotic genomes can be transcribed, with the great majority of the transcripts being ncRNAs. Transcription can occur in regions that were not previously associated with gene activity, such as repetitive sequences that were thought to be inactive. Compared to protein-coding mRNA sequences, the amount of RNA with low protein-coding potential generated from the genome is surprising (see figure 2 for examples of ncRNAs and their location in the genome). It has also been proposed that ncRNAs could represent a molecular advance for higher organisms on the evolutionary scale. To gain new insights into ncRNA expression, our group has been applying different approaches based on new-generation DNA sequencing technologies. We have been able to produce extensive information from the transcriptome of normal and tumor samples from the brain. With a new methodology, whole transcriptomes were surveyed and we were able to generate ~5 gigabases of sequence from different tissues. The main objective is to identify new ncRNAs that could be used as molecular markers for brain tumors. This approach is also useful for identifying transcribed repetitive elements and fusion transcripts in cancer cells, since the sensitivity of new-generation DNA sequencing is much higher than that of older technologies.
We are now facing an important paradigm shift in the genomics field, with ncRNAs becoming the rule instead of the exception in eukaryotes (it is noteworthy that prokaryotes also produce ncRNAs, but in smaller numbers than eukaryotic organisms). The number of "genes" in eukaryotic genomes is probably underestimated if we take into account that most of the RNAs transcribed might be somehow functional. The next steps in this new field will be to catalogue all ncRNAs produced by eukaryotic genomes, especially the human genome, in normal and pathological conditions. Both the biotechnology and academic sectors will have a lot of transcription-derived information to explore in the years to come, bringing new opportunities in biomarker identification and drug development for specific diseases.
In conclusion, it is becoming clear that eukaryotic transcripts can be divided into two distinct categories: protein-coding and non-coding. However, recent studies have shown that the proportion of non-coding transcripts (~70-90%) is much higher than that of protein-coding transcripts (~3%). It is still too soon to affirm that all non-coding transcripts are functional, but these new discoveries are clearly changing the "status quo" in genomic science. Why do eukaryotic cells transcribe all this information without coding potential? This and other questions still remain unanswered. However, the future of ncRNAs is promising and we can expect more surprises in this young, evolving field of molecular genetics.
[1] Costa F.F.: Gene 386 (1-2), 1-10 (2007)
[2] Bond A.M. et al.: Nat Neurosci 12 (8), 1020-1027 (2009)
[3] Perez D.S. et al.: Hum Mol Genet 17 (5), 642-655 (2008)
[4] Bartel D.P.: Cell 136 (2), 215-233 (2009)
[5] Seila A.C. et al.: Science 322 (5909), 1849-1851 (2008)
[6] Khalil A.M. et al.: Proc Natl Acad Sci U S A 106 (28), 11667-11672 (2009)
[7] Rinn J.L. et al.: Cell 129 (7), 1311-1323 (2007)
[8] Gupta R.A. et al.: Nature 464 (7291), 1071-1076 (2010)
[9] de Kok J.B. et al.: Cancer Res 62 (9), 2695-2698 (2002)
[10] Costa F.F.: Gene 357 (2), 83-94 (2005)
[11] Wilhelm B.T. et al.: Nature 453, 1239-1243 (2008)
[12] Costa F.F.: Drug Discov Today 14 (9-10), 446-452 (2009)