Most of our genomics tools were built for DNA; however, RNA-level biology carries rich structure with transformational potential for the practice and translation of medicine.1 Imagine every mRNA as a shipping order leaving the nucleus: the protein you eventually get is determined not only by how many orders we print (mRNA abundance) but also by whether each order actually gets fulfilled on the factory floor (translation).2
The central dogma (DNA to RNA to protein) has multiple choke points: cells regulate protein output at many steps (transcription, RNA processing, export, translation and decay). In seminal work, Navickas et al. show that metastatic cancer cells change their protein output in ways that cannot be captured by mRNA expression alone, and that translation is a major control layer in metastasis.3 Traditional data infrastructure for genomics does not capture this information sufficiently. The goal of the Human Genome Project was to build a cohesive picture of the genetic drivers of disease, yet to truly understand human disease (e.g. metastatic breast cancer), regulatory behavior, translational efficiency and mRNA decay dynamics must also be captured, not just DNA-level driver mutations.
Navickas et al. demonstrate that metastatic cells change not only what they transcribe but also what they allow to be translated, and that this control starts in the nucleus via alternative polyadenylation.4 Meirona's single cell foundry captures this regulatory information as a first-class citizen.
Today, scientists "look at" gene expression as a flat spreadsheet of gene-level counts (RNA-seq DEGs) plus a separate pile of papers and assays. Data in this form is not conducive to the ground-breaking discoveries that groups like the Arc Institute are working toward. For a finding like that of Navickas et al., "static" gene expression data makes the mechanistic flow from HNRNPC to APA (alternative polyadenylation) to TE (translational efficiency) to metastasis difficult to see when starting from a blank slate, because this causal chain exists in structure (3' ends, motifs and binding sites) and flow (nucleus to cytoplasm) but not in gene-level abundance.
Meirona treats poly(A) site usage, 3' UTR isoforms, RBP binding events, translational efficiency and phenotypic events as first-class objects in our data model.
{
Poly(A) site usage: (gene, polyA_site_id, proximal/distal, usage_delta, confidence),
3' UTR isoform: (transcript_end_isoform_id, length, gained/lost motifs, gained/lost miRNA sites),
RBP binding event: (RBP, peak_coord, target_isoform, binding_strength, CLIP_evidence),
Translational efficiency event: (isoform_id or gene_id, TE_delta, ribo_footprints, RNA_counts),
Phenotype event: (metastasis_metric, delta, assay, model)
}
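As a rough illustration, the objects listed above could be sketched as Python dataclasses. This is not Meirona's actual schema: the class names, field types, the split of "gained/lost" into separate fields, and the `position` field name are assumptions made for the example. The helper at the end follows the footnoted definition of translational efficiency (ribosome footprints divided by mRNA abundance); expressing `TE_delta` as a log2 ratio between two conditions, and the optional pseudocount, are also assumptions.

```python
from dataclasses import dataclass
from math import log2
from typing import Literal, Tuple


@dataclass(frozen=True)
class PolyASiteUsage:
    gene: str
    polyA_site_id: str
    position: Literal["proximal", "distal"]  # assumed field name
    usage_delta: float
    confidence: float


@dataclass(frozen=True)
class UTRIsoform:
    transcript_end_isoform_id: str
    length: int
    gained_motifs: Tuple[str, ...] = ()
    lost_motifs: Tuple[str, ...] = ()
    gained_mirna_sites: Tuple[str, ...] = ()
    lost_mirna_sites: Tuple[str, ...] = ()


@dataclass(frozen=True)
class RBPBindingEvent:
    rbp: str
    peak_coord: str
    target_isoform: str
    binding_strength: float
    clip_evidence: str


@dataclass(frozen=True)
class TranslationalEfficiencyEvent:
    feature_id: str  # isoform_id or gene_id
    te_delta: float
    ribo_footprints: float
    rna_counts: float


@dataclass(frozen=True)
class PhenotypeEvent:
    metastasis_metric: str
    delta: float
    assay: str
    model: str


def translational_efficiency(ribo_footprints: float, rna_counts: float,
                             pseudocount: float = 0.0) -> float:
    """TE as the ratio of ribosome footprints to mRNA abundance."""
    denominator = rna_counts + pseudocount
    if denominator <= 0:
        raise ValueError("rna_counts + pseudocount must be positive")
    return (ribo_footprints + pseudocount) / denominator


def log2_te_delta(case_ribo: float, case_rna: float,
                  control_ribo: float, control_rna: float,
                  pseudocount: float = 0.0) -> float:
    """Assumed definition of TE_delta: log2(TE_case / TE_control)."""
    case_te = translational_efficiency(case_ribo, case_rna, pseudocount)
    control_te = translational_efficiency(control_ribo, control_rna, pseudocount)
    if case_te <= 0 or control_te <= 0:
        raise ValueError("TE must be positive to take a log ratio")
    return log2(case_te / control_te)


# Placeholder values for illustration only; not measured data.
example_event = TranslationalEfficiencyEvent(
    feature_id="GENE_X",  # hypothetical identifier
    te_delta=log2_te_delta(40.0, 10.0, 20.0, 10.0),
    ribo_footprints=40.0,
    rna_counts=10.0,
)
```

Keeping each event type as its own immutable record, rather than as columns in a gene-level count table, is what would let a poly(A) site, the isoform it produces, and the resulting TE change be linked by their identifiers.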
To us, fundamental mechanisms such as the one Navickas et al. found with HNRNPC cannot be discovered from static gene expression data; they require causal data ontologies that capture how APA site selection and 3' UTR structure propagate into translational efficiency. Without this, the mechanistic insights that drive the next generation of therapeutics will be smeared out.
References and footnotes
- Navickas, A., Asgharian, H., Winkler, J. et al. An mRNA processing pathway suppresses metastasis by governing translational control from the nucleus. Nat Cell Biol 25, 892–903 (2023). https://doi.org/10.1038/s41556-023-01141-9
- Culbertson, B., Garcia, K., Markett, D. et al. A sense-antisense RNA interaction promotes breast cancer metastasis via regulation of NQO1 expression. Nat Cancer 4, 682–698 (2023). https://doi.org/10.1038/s43018-023-00554-7
- Translational efficiency (TE) measures the "protein produced per mRNA", calculated as the ratio of ribosome footprints (from Ribo-seq: how many ribosomes sit on each mRNA) to mRNA abundance (from RNA-seq: how many mRNA molecules exist). There is broad translational reprogramming (TE discrepancy) between poorly metastatic and highly metastatic breast cancer cells.
- Alternative polyadenylation (APA) is a key gene regulation process in which a single gene produces multiple mRNA transcripts with different 3' end locations. The resulting 3' untranslated regions (3' UTRs) vary in ways that affect mRNA stability, translation, and localization, which matters for cell differentiation, proliferation, and disease states like cancer. APA isoforms can have shorter 3' UTRs (often linked to cancer) or longer 3' UTRs (common in neurons).