
Cancer could be considered an accelerated version of natural selection. This fast version of evolution allows us to peer into how cells navigate a world that wasn't built for them and yet thrive. It also leads to a cornucopia of cell types and spatial distributions with various mutations, complicating diagnosis and therapy development. The heterogeneity of cells, i.e. the distribution of cells with different *omics content, makes analysis, causal inference, and forecasting particularly difficult, since most of our measurements in biology come from an ensemble of cells. Yet pathological symptoms or phenotypic changes may arise from only a distinct subset of cells undergoing specific genomic, transcriptomic, or proteomic changes.
In contrast, agricultural practices often involve cultivating large spatial regions with genetically uniform plants exhibiting similar *omics profiles. This is done for consistency in supply forecasting and to take advantage of economies of scale. Over the millennia, we've learned the associated issues and challenges and introduced heterogeneity to the spatial locations through crop rotation, cover crops, or additional inputs. However, large monoculture fields continue to face major challenges from disease, pests, and other potentially persistent ecological problems. The homogeneity of the plants allows a single disease or pest occurrence to pass quickly from plant to plant, turning an isolated occurrence into a field-wide epidemic.
Just as cancer cells diversify and adapt through evolutionary forces over time, agricultural practices are built upon centuries of evolutionary trial and error. This agricultural evolution has honed the resilience and productivity of crop varieties across diverse environmental conditions. Yet, contemporary farming practices often involve sowing monocultures of select high-yielding cultivars across extensive tracts of land to optimize productivity and ensure uniformity in crop quality. However, even within these genetically identical monocultures, significant phenotypic diversity can arise as a result of micro-scale variations in environmental factors such as soil nutrient composition, microclimatic conditions, pest infestations, and disease prevalence. A central aim of precision agriculture is to unravel and manage this phenotypic diversity using technologies such as remote sensing, geographic information systems (GIS), and precision machinery. Doing so can achieve a more sustainable balance between high crop productivity and environmental stewardship.
As an interesting aside, most plants do not get cancers as we do; their cell walls create an additional barrier to tumorous growth.
A plethora of cutting-edge tools and analytical methodologies, particularly single-cell RNA sequencing (scRNA-seq), are enabling unprecedented insights into cellular heterogeneity. The wealth of information gathered by scRNA-seq can be overwhelming, making it difficult to settle on one plan of attack, yet it does allow us to distill actionable insights. It is therefore no wonder that, for at least two years, transcriptomics has been part of the Nature Methods "Method of the Year" [1, 2]. (We could even count the 2022 Method of the Year, long-read sequencing [3], as part of this trend, making it three.)
In this review, we delve into RNA-seq technology, the challenges presented by cellular heterogeneity, the intricacies of handling RNA-seq data, and the application of machine learning (ML) techniques to harness these technologies for advancements in life sciences and agriculture. Our primary focus will be on the analytical techniques that allow us to clarify and tackle this problem. Therefore, the biology will be covered only at a high level.
RNA-seq
RNA sequencing is a technique used to quantify the presence of specific RNA sequences in cells. RNA abundance serves as a proxy for how much of each protein is expressed in cells. RNA sequencing is performed in one of three modes:
Bulk, where one performs RNA sequencing on the pooled contents of a group of cells, yielding a single aggregate measurement for the group. This is the most common method due to its cost and speed. Comparisons are made among samples.
Single-cell, where we perform RNA sequencing on each cell individually. This requires separating each cell's contents, which can be done chemically or physically.
Spatial, where we take a thin slice of tissue and perform RNA sequencing in situ. This allows visualization of the RNA quantification, its location relative to other cells, and the overall cellular organization within the tissue slide. Note that the resolution of these methods tends to be coarser than a single cell.
Figure 4. Researchers developed an approach in which fixed, stained tissue is imaged and permeabilized, and the mRNAs attach to an array of barcoded oligos, allowing for spatially resolved transcriptomic information. Credit: Marx, V. Method of the Year: spatially resolved transcriptomics. Nat Methods 18, 9–14 (2021).
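To make the difference between the bulk and single-cell views concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the single hypothetical gene, the 5% rare subpopulation, and the Poisson expression levels are not taken from any dataset. It shows how one averaged value can hide a small group of cells that expresses a gene very differently.

```python
import numpy as np


def simulate_cells(n_cells=1000, rare_fraction=0.05, seed=0):
    """Simulate counts of one hypothetical gene in a mixed cell population."""
    rng = np.random.default_rng(seed)
    n_rare = int(n_cells * rare_fraction)
    n_common = n_cells - n_rare
    # Assumed expression levels: low in the majority, high in the rare subset.
    common = rng.poisson(lam=2.0, size=n_common)
    rare = rng.poisson(lam=50.0, size=n_rare)
    counts = np.concatenate([common, rare])
    is_rare = np.concatenate([np.zeros(n_common, dtype=bool),
                              np.ones(n_rare, dtype=bool)])
    return counts, is_rare


counts, is_rare = simulate_cells()

# Bulk view: a single number per sample, averaged over every cell.
bulk_mean = counts.mean()

# Single-cell view: each cell keeps its own value, so subgroups can be compared.
common_mean = counts[~is_rare].mean()
rare_mean = counts[is_rare].mean()

print(f"bulk mean:            {bulk_mean:.2f}")
print(f"majority-cell mean:   {common_mean:.2f}")
print(f"rare-cell mean:       {rare_mean:.2f}")
```

In this toy setting the bulk value sits between the two subgroup means and resembles neither, which is the situation described earlier: a phenotype driven by a small subset of cells can be diluted in an ensemble measurement.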
It is important to note that only a small percentage of the DNA codes for RNA that generates proteins. Transcribed RNA comes in a variety of types, and only one of these types, messenger RNA (mRNA), is translated into protein. Other functional RNA types include tRNA, rRNA, miRNA, piRNA (Piwi-interacting RNA), and 22G and 26G RNAs. These can regulate other genes, proteins, and RNAs, often in complex ways.
Sources of Cell Heterogeneity
Unlike the examples and images in your biology textbook, there is no such thing as an average cell. Each cell in your body expresses its genes in its own way. But what are the factors that drive this diversity? While there are many individual factors, they are usually grouped into four categories: Developmental Stage, Cell Cycle, Spatial (the location relative to other cells), and Environmental.
Developmental Stage
Cells respond to their environment, developing further and activating or deactivating genes as they settle into their place and role within the tissue or organ. Even when they reach a similar developmental stage, no two cells in your organs are the same; statistically, however, they behave more similarly and have similar activation levels. See Figure 5 for the developmental stages of a type of white blood cell called a B cell. The proteins and their distribution change across the developmental stages.
Cell Life Cycle
Depending on where a cell is in its life cycle, proteins and RNA can be expressed in different quantities. Some processes need to happen during mitosis that do not happen during normal function. This leads to further differences in the expression of different genes.
Spatial Location
Cells that make up the barrier between organs, such as those in an epithelial layer (think surface), run different processes and express genes in different quantities because of their location within the organ tissue. See Figure 4 for an example.
Environmental
Overlapping with the above, cells can have processes initiated or controlled by the chemical environment through signaling, receptor responses, or diffusion of new proteins and chemicals into the cells themselves, leading to specific genes being expressed in different ways. For example, in Figure 5, some of the proteins changing in the diagram are receptors. Receptors allow the cell to interact with external variables, triggering a cascade of processes that may further alter the cell's protein and RNA content.
All these factors, along with random mutation and occasional errors in RNA transcription and protein translation, lead initially genetically identical cells to become heterogeneous. Evolution has exploited this heterogeneity, leading to the explosion in multicellular life we see today. It allows change to occur without much change in the underlying genomic content, enabling fast adaptation to events that stress a species.
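The way these sources compound can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not a biological model: it treats each source of heterogeneity as a hypothetical multiplier on a shared baseline expression level for one gene, and every multiplier value and probability is invented for the example. It compares the cell-to-cell variability (coefficient of variation) with and without those factors.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 500
baseline = 10.0  # shared expression level implied by an identical genome

# Hypothetical multipliers, one per source of heterogeneity named above.
stage = rng.choice([0.5, 1.0, 2.0], size=n_cells)               # developmental stage
cycle = rng.choice([1.0, 1.5], size=n_cells, p=[0.8, 0.2])      # normal function vs. mitosis
location = rng.choice([1.0, 1.3], size=n_cells)                 # interior vs. surface layer
environment = rng.lognormal(mean=0.0, sigma=0.2, size=n_cells)  # local chemical signals

# Expected expression per cell, then random sampling noise on top of it.
expected = baseline * stage * cycle * location * environment
observed = rng.poisson(expected)

# Reference population: the same genome with sampling noise only.
noise_only = rng.poisson(baseline, size=n_cells)


def cv(x):
    """Coefficient of variation: standard deviation relative to the mean."""
    return x.std() / x.mean()


print(f"CV with noise only:       {cv(noise_only):.2f}")
print(f"CV with all four factors: {cv(observed):.2f}")
```

Under these assumptions, cells that start from the same baseline end up with a visibly wider spread of expression once stage, cycle, location, and environment are layered on, without any change to the underlying genome.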
Precision Solutions
In both the domains of oncology and agriculture, the overarching objective is precision. In oncology, this involves tailoring therapeutic interventions based on a deep understanding of the individual tumor's unique cellular and molecular heterogeneity. In agriculture, crop management strategies must be customized, driven by high-resolution omics data and spatially explicit environmental variables, to address the unique genomic and phenotypic landscape of individual plants or crop varieties. We call these, respectively, precision medicine and precision agriculture.
In the next part, we will go over how we obtain and process this data from animals and plants. We will then go through the state-of-the-art machine learning methods for turning this information into decisions for researchers, doctors, patients, and growers.
1. Marx, V. Method of the Year: spatially resolved transcriptomics. Nat Methods 18, 9–14 (2021). https://doi.org/10.1038/s41592-020-01033-y
2. Method of the Year 2019: Single-cell multimodal omics. Nat Methods 17, 1 (2020). https://doi.org/10.1038/s41592-019-0703-5
3. Marx, V. Method of the Year: long-read sequencing. Nat Methods 20, 6–11 (2023). https://doi.org/10.1038/s41592-022-01730-w