Inference | cGRN Lab

Gene regulatory networks (GRNs) underpin cell identity and function. The ability to infer GRNs at scale opens up exciting opportunities to understand gene and cell functions at the systems level and discover potential therapeutic targets. This effort traces back to the microarray era, when it was led by pioneers in the field. We began by exploring approaches grounded in causal inference and Mendelian randomization using genetic and bulk RNA-seq data [1-3]. However, bulk technologies suffer from limited cell-type resolution and high cost, inherently constraining what biology GRN inference could reveal.

Single-cell technology has revolutionized this landscape by allowing for cost-effective measurements of individual cells. We were fortunate to be among the early contributors to this shift. We likely produced the first genome-scale causal GRNs from Perturb-seq [4] and population-scale scRNA-seq [5], as well as dynamic GRNs that rewire continuously along differentiation trajectories using scRNA-seq and scATAC-seq [6]. These projects were based on different frameworks, including instrumental variables [4,5] and stochastic differential equations [6] (see more in Causality). Our current focus [5,6] is primary cells under natural perturbations, which yield more biologically relevant insights than many Perturb-seq experiments based on artificial cell lines and strong loss-of-function perturbations. We have also occasionally explored related topics such as cell-type-specific co-expression networks [4].

Our work brings a distinctive blend of tastes. We emphasize computational efficiency in both algorithmic design and implementation, which often makes our methods orders of magnitude faster, a difference that is crucial for GRN inference at genome scale [1,4,5]. We pair this with comprehensive benchmarks using real and simulated datasets, which might take decades to run with existing tools but only minutes with ours. Our process typically starts with rapid prototyping of statistical or machine learning models, followed by iterative comparison and refinement using these benchmarks. We also value thinking outside the box and building ab initio models that don’t necessarily follow trends or norms. While some research groups may share one or two of these qualities, we’ve found that the whole combination, particularly computational efficiency and unconventional thinking, is rare in this field.

Looking ahead, we will continue to illustrate the unique conceptual advantages of single-cell multiomics in GRN inference. We are actively developing computational methods to efficiently harness state-of-the-art data modalities and designing comprehensive benchmarks to compare them with alternative approaches. We also release open-source software to empower individual research labs to understand the GRNs in their own datasets.

[1] Wang, L. & Michoel, T. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data. PLOS Computational Biology 13, e1005703 (2017).

[2] Wang, L. & Michoel, T. Controlling false discoveries in Bayesian gene networks with lasso regression p-values. arXiv:1701.07011 (2017).

[3] Wang, L., Audenaert, P. & Michoel, T. High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering. Front. Genet. 10, (2019).

[4] Wang, L. Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr. Nat Commun 12, 6395 (2021).

[5] Wang, L. Airqtl dissects cell state-specific causal gene regulatory networks with efficient single-cell eQTL mapping. bioRxiv: 2025.01.15.633041.

[6] Wang, L. et al. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nat Methods 1–11 (2023).