scVelo - RNA velocity generalized through dynamical modeling¶

scVelo is a scalable toolkit for RNA velocity analysis in single cells, based on Bergen et al. (Nature Biotech, 2020).
RNA velocity enables the recovery of directed dynamic information by leveraging splicing kinetics. scVelo generalizes the concept of RNA velocity (La Manno et al., Nature, 2018) by relaxing previously made assumptions with a stochastic and a dynamical model that solves the full transcriptional dynamics. It thereby adapts RNA velocity to widely varying specifications such as non-stationary populations.
scVelo is compatible with scanpy and hosts efficient implementations of all RNA velocity models.
scVelo’s key applications¶
estimate RNA velocity to study cellular dynamics.
identify putative driver genes and regimes of regulatory changes.
infer a latent time to reconstruct the temporal sequence of transcriptomic events.
estimate reaction rates of transcription, splicing and degradation.
use statistical tests, e.g., to detect different kinetics regimes.
scVelo has, for instance, recently been used to study immune response in COVID-19 patients and dynamic processes in human lung regeneration. Find out more in this list of application examples.
Latest news¶
Aug/2021: Perspectives paper out in MSB
Feb/2021: scVelo goes multi-core
Dec/2020: Cover of Nature Biotechnology
Nov/2020: Talk at Single Cell Biology
Oct/2020: Helmholtz Best Paper Award
Oct/2020: Map cell fates with CellRank
Sep/2020: Talk at Single Cell Omics
Aug/2020: scVelo out in Nature Biotech
References¶
La Manno et al. (2018), RNA velocity of single cells, Nature.
Bergen et al. (2020), Generalizing RNA velocity to transient cell states through dynamical modeling, Nature Biotech.
Bergen et al. (2021), RNA velocity - current challenges and future perspectives, Molecular Systems Biology.
Support¶
Found a bug or would like to see a feature implemented? Feel free to submit an issue. Have a question or would like to start a new discussion? Head over to GitHub discussions. In either case, you can also always send us an email. Your help to improve scVelo is highly appreciated. For further information visit scvelo.org.
About scVelo¶
Measuring gene activity in individual cells requires destroying these cells to read out their content, making it challenging to study dynamic processes and to learn about cellular decision making. The introduction of RNA velocity by La Manno et al. (Nature, 2018) has enabled the recovery of directed dynamic information by leveraging the fact that newly transcribed, unspliced pre-mRNAs and mature, spliced mRNAs can be distinguished in common single-cell RNA-seq protocols, the former detectable by the presence of introns. This concept of measuring not only gene activity, but also their changes in individual cells (RNA velocity), has opened up new ways of studying cellular differentiation. The originally proposed framework obtains velocities as the deviation of the observed ratio of spliced and unspliced mRNA from an inferred steady state. Errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated.
With scVelo, developed by Bergen et al. (Nature Biotechnology, 2020), these restrictions are addressed by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to a wide variety of systems comprising transient cell states, which are common in development and in response to perturbations. Further, scVelo infers gene-specific rates of transcription, splicing and degradation, and recovers the latent time of the underlying cellular processes. This latent time represents the cell’s internal clock and approximates the real time experienced by cells as they differentiate, based only on its transcriptional dynamics. Moreover, scVelo identifies regimes of regulatory changes such as stages of cell fate commitment and, therein, systematically detects putative driver genes.
RNA velocity models¶
With RNA velocity, inference of directional trajectories is explored by connecting measurements to the underlying mRNA splicing kinetics: Transcriptional induction for a particular gene results in an increase of (newly transcribed) precursor unspliced mRNAs while, conversely, repression or absence of transcription results in a decrease of unspliced mRNAs. Hence, by distinguishing unspliced from spliced mRNA, the change of mRNA abundance (RNA velocity) can be approximated. The combination of velocities across mRNAs can then be used to estimate the future state of an individual cell.
RNA velocity estimation can currently be tackled with three existing approaches:
steady-state / deterministic model (using steady-state residuals)
stochastic model (using second-order moments),
dynamical model (using a likelihood-based framework).
The steady-state / deterministic model, as being used in velocyto, estimates velocities as follows: Under the assumption that transcriptional phases (induction and repression) last sufficiently long to reach a steady-state equilibrium (active and inactive), velocities are quantified as the deviation of the observed ratio from its steady-state ratio. The equilibrium mRNA levels are approximated with a linear regression on the presumed steady states in the lower and upper quantiles. This simplification makes two fundamental assumptions: a common splicing rate across genes and steady-state mRNA levels to be reflected in the data. It can lead to errors in velocity estimates and cellular states as the assumptions are often violated, in particular when a population comprises multiple heterogeneous subpopulation dynamics.
The stochastic model aims to better capture the steady states. By treating transcription, splicing and degradation as probabilistic events, the resulting Markov process is approximated by moment equations. By including second-order moments, it exploits not only the balance of unspliced to spliced mRNA levels but also their covariation. It has been demonstrated on the endocrine pancreas that stochasticity adds valuable information, overall yielding higher consistency than the deterministic model, while remaining as efficient in computation time.
The dynamical model (most powerful while computationally most expensive) solves the full dynamics of splicing kinetics for each gene. It thereby adapts RNA velocity to widely varying specifications such as non-stationary populations, as does not rely on the restrictions of a common splicing rate or steady states to be sampled.
The splicing dynamics
is solved in a likelihood-based expectation-maximization framework, by iteratively estimating the parameters of reaction rates and latent cell-specific variables, i.e. transcriptional state k and cell-internal latent time t.
It thereby aims to learn the unspliced/spliced phase trajectory. Four transcriptional states are modeled to account for all possible configurations of gene activity: two dynamic transient states (induction and repression) and two steady states (active and inactive) potentially reached after each dynamic transition.
In the expectation step, for a given model estimate of the unspliced/spliced phase trajectory, a latent time is assigned to an observed mRNA value by minimizing its distance to the phase trajectory. The transcriptional states are then assigned by associating a likelihood to the respective segments on the phase trajectory (induction, repression, active and inactive steady states). In the maximization step, the overall likelihood is then optimized by updating the parameters of reaction rates.
The model yields more consistent velocity estimates and better identification of transcriptional states. It further enables the systematic identification of dynamics-driving genes in a likelihood-based way, thereby finding the key drivers that govern cell fate transitions. Moreover, the dynamical model infers a universal cell-internal latent time shared across genes that enables relating genes and identifying regimes of transcriptional changes.
For best results and above-described additional insights, we recommend using the dynamical model. If runtime is of importance, the stochastic model is advised to be used as it very efficiently approximates the dynamical model, taking few minutes on 30k cells. The dynamical yet can take up to one hour, however, enhancing efficiency is in progress.
See Bergen et al. (2020) for a detailed exposition of the methods.
Installation¶
scVelo requires Python 3.6 or later. We recommend to use Miniconda.
PyPI¶
Install scVelo from PyPI using:
pip install -U scvelo
-U
is short for --upgrade
.
If you get a Permission denied
error, use pip install -U scvelo --user
instead.
Development Version¶
To work with the latest development version, install from GitHub using:
pip install git+https://github.com/theislab/scvelo@develop
or:
git clone https://github.com/theislab/scvelo && cd scvelo
git checkout --track origin/develop
pip install -e .
-e
is short for --editable
and links the package to the original cloned
location such that pulled changes are also reflected in the environment.
To contribute to scVelo, cd
into the cloned directory and
install the latest packages required for development together with the pre-commit hooks:
pip install -r requirements-dev.txt
pre-commit install
Dependencies¶
Parts of scVelo (directed PAGA and Louvain modularity) require (optional):
pip install python-igraph louvain
Using fast neighbor search via hnswlib further requires (optional):
pip install pybind11 hnswlib
Jupyter Notebook¶
To run the tutorials in a notebook locally, please install:
conda install notebook
and run jupyter notebook
in the terminal. If you get the error Not a directory: 'xdg-settings'
,
use jupyter notebook --no-browser
instead and open the url manually (or use this
bugfix).
If you run into issues, do not hesitate to approach us or raise a GitHub issue.
scvelo - RNA velocity generalized through dynamical modeling
API¶
Import scVelo as:
import scvelo as scv
After reading the data (scv.read
) or loading an in-built dataset (scv.datasets.*
),
the typical workflow consists of subsequent calls of
preprocessing (scv.pp.*
), analysis tools (scv.tl.*
) and plotting (scv.pl.*
).
Further, several utilities (scv.utils.*
) are provided to facilitate data analysis.
Read / Load¶
|
Read file and return |
|
Read .loom-formatted hdf5 file. |
Preprocessing (pp)¶
Basic preprocessing (gene selection and normalization)
|
Filter genes based on number of cells or counts. |
|
Extract highly variable genes. |
|
Normalize each cell by total counts over all genes. |
|
Logarithmize the data matrix. |
|
Filtering, normalization and log transform |
Moments (across nearest neighbors in PCA space)
|
Principal component analysis [Pedregosa11]. |
|
Compute a neighborhood graph of observations. |
|
Computes moments for velocity estimation. |
Tools (tl)¶
Clustering and embedding (more at scanpy-docs)
|
Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. |
|
Embed the neighborhood graph using UMAP [McInnes18]. |
Velocity estimation
|
Estimates velocities in a gene-specific manner. |
|
Computes velocity graph based on cosine similarities. |
|
Projects the single cell velocities into any embedding. |
Dynamical modeling
|
Recovers the full splicing kinetics of specified genes. |
|
Test to detect cell types / lineages with different kinetics. |
Dynamical genes
|
Rank genes for velocity characterizing groups. |
|
Rank genes by likelihoods per cluster/regime. |
Pseudotime and trajectory inference
|
Computes terminal states (root and end points). |
|
Computes a pseudotime based on the velocity graph. |
|
Computes a gene-shared latent time. |
|
PAGA graph with velocity-directed edges. |
Further tools
|
Computes velocity clusters via louvain on velocities. |
|
Computes confidences of velocities. |
|
Score cell cycle genes. |
Plotting (pl)¶
Base scatter plot
|
Scatter plot along observations or variables axes. |
Velocity embeddings
|
Scatter plot of velocities on the embedding. |
|
Scatter plot of velocities on a grid. |
|
Stream plot of velocities on the embedding. |
Velocity graph
|
Phase and velocity plot for set of genes. |
|
Plot of the velocity graph. |
|
Plot PAGA graph with velocity-directed edges. |
Further plotting
|
Plot pie chart of spliced/unspliced proprtions. |
|
Plot time series for genes as heatmap. |
|
Plot a histogram. |
Datasets¶
Pancreatic endocrinogenesis. |
|
|
Dentate Gyrus neurogenesis. |
Developing human forebrain. |
|
Dentate Gyrus neurogenesis. |
|
Mouse gastrulation. |
|
Mouse gastrulation subset to E7.5. |
|
Mouse gastrulation subset to erythroid lineage. |
|
Human bone marrow. |
|
Peripheral blood mononuclear cells. |
|
|
Simulation of mRNA splicing kinetics. |
Utils¶
Get data by key
|
Get dataframe for a specified adata key. |
Get gene info
|
Retrieve gene information from biothings client. |
Data preparation
|
Delete not needed attributes. |
|
Clean up the obs_names. |
|
Merge two annotated data matrices. |
|
Proportions of abundances of modalities in layers. |
Getters
|
Computes moments for a specified layer. |
|
Computes cell-to-cell transition probabilities |
|
Simulate cell transitions |
|
Get extrapolated cell state. |
Converters
|
Retrieve ensembl IDs from a list of gene names. |
|
Retrieve gene names from ensembl IDs. |
Least squares and correlation
|
Solves least squares X*b=Y for b. |
|
Pearsons/Spearmans correlation coefficients. |
|
Test for bimodal distribution. |
Settings¶
|
Set resolution/size, styling and format of figures. |
Release Notes¶
Version 0.2.4 Aug 26, 2021¶
Perspectives:
Landing page and two notebooks accompanying the perspectives manuscript at MSB.
New datasets: Gastrulation, bone marrow, and PBMCs.
New capabilities:
Added vignettes accompanying the NBT manuscript.
Kinetic simulations with time-dependent rates.
New arguments for tl.velocity_embedding_stream (PR 492).
Introduced automated code formatting flake8 and isort (PR 360, PR 374).
tl.velocity_graph parallelized (PR 392).
legend_align_text parameter in pl.scatter for smart placing of labels without overlapping.
Save option for pl.proportions.
Bugfixes:
Pinned sphinx<4.0 and nbsphinx<0.8.7.
Fix IPython import at CLI.
Version 0.2.3 Feb 13, 2021¶
tl.recover_dynamics: Multicore implementation thanks to M Klein, Y Schaelte, P Weiler
CI now runs on GitHub Actions
New utility functions:
utils.gene_info: Retrieve gene information from biothings client.
utils.convert_to_ensembl and utils.convert_to_gene_names: Converting ensembl IDs into gene names and vice versa.
Version 0.2.2 July 22, 2020¶
tl.paga: PAGA graph with velocity-directed edges.
Black code style
Version 0.2.1 May 28, 2020¶
Bugfixes:
Correct identification of root cells in tl.latent_time thanks to M Lange
Correct usage of latent_time prior in tl.paga thanks to G Lubatti
Version 0.2.0 May 12, 2020¶
New vignettes:
RNA velocity basics
Dynamical Modeling
Differential Kinetics
Tools:
tl.differential_kinetic_test: introduced a statistical test to detect different kinetic regimes.
tl.rank_dynamical_genes: introduced a gene ranking by cluster-wise likelihoods.
tl.paga: introduced directed PAGA graph
Plotting:
pl.scatter enhancements: linear and polynomical fits, gradient coloring
pl.proportions: Pie and bar chart of spliced/unspliced proprtions.
GridSpec: multiplot environment.
Version 0.1.20 Sep 5, 2019¶
Tools:
tl.recover_dynamics: introduced a dynamical model inferring the full splicing kinetics, thereby identifying all kinetic rates of transcription, splicing and degradation.
tl.recover_latent_time: infers a shared latent time across all genes based on the learned splicing dynamics.
Plotting:
pl.scatter enhancements: multiplots, rugplot, linear and polynomial fits, density plots, etc.
pl.heatmap: heatmap / clustermap of genes along time coordinate sorted by expression along dynamics.
Preprocessing:
New attributes in pp.filter_genes: min_shared_counts and min_shared_genes.
Added fast neighbor search method: Hierarchical Navigable Small World graphs (HNSW)
Version 0.1.14 Dec 7, 2018¶
Plotting:
New attriutes arrow_length and arrow_size for flexible adjustment of embedded velocities.
pl.velocity_graph: Scatter plot of embedding with cell-to-cell transition connectivities.
pl.velocity_embedding_stream: Streamplot visualization of velocities.
Improve visualization of embedded single cell velocities (autosize, colors etc.)
Tools:
tl.cell_fate: compute cell-specific terminal state likelihood
New attribute approx=True in tl.velocity_graph to enable approximate graph computation by performing cosine correlations on PCA space.
Preprocessing:
Automatically detect whether data is already preprocessed.
Version 0.1.11 Oct 27, 2018¶
Plotting:
settings.set_figure_params(): adjust matplotlib defaults for beautified plots
improved default point and arrow sizes; improved quiver autoscale
enable direct plotting of
Tools:
tl.velocity_confidence: Added two confidence measures ‘velocity_confidence’ and ‘velocity_confidence_transition’.
tl.rank_velocity_genes: Added functionality to rank genes for velocity characterizing groups using a t-test.
New attribute perc in tl.velocity enables extreme quantile fit, e.g. set perc=95.
New attribute groups in tl.velocity enables velocity estimation only on a subset of the data.
Improved tl.transition_matrix by incorporating self-loops via self_transitions=True and state changes that have negative correlation with velocity (opposite direction) via use_negative_cosines=True
Utils:
utils.merge to merge to AnnData objects such as already existing AnnData and newly generated Loom File.
Version 0.1.8 Sep 12, 2018¶
Plotting:
support saving plots as pdf, png etc.
support multiple colors and layers
quiver autoscaling for velocity plots
attributes added: figsize and dpi
Preprocessing:
filter_and_normalize() instead of recipe_velocity()
normalization of layers is done automatically when computing moments
Tools:
terminal_states: computes root and end points via eigenvalue decomposition thanks to M Lange
Version 0.1.5 Sep 4, 2018¶
Support writing loom files
Support both dense and sparse layers
Plotting bugfixes
Added pp.recipe_velocity()
Version 0.1.2 Aug 21, 2018¶
First alpha release of scvelo.
References¶
- Bergen20
Bergen et al. (2020), Generalizing RNA velocity to transient cell states through dynamical modeling, Nature Biotech.
- Manno18
La Manno et al. (2018), RNA velocity of single cells, Nature.
- Wolf18
Wolf et al. (2018), Scanpy: large-scale single-cell gene expression data analysis, Genome Biology.
- Wolf19
Wolf et al. (2019), PAGA: graph abstraction reconciles clustering with trajectory inference, Genome Biology.
Getting Started¶
Here, you will be briefly guided through the basics of how to use scVelo. Once you are set, the following tutorials go straight into analysis of RNA velocity, latent time, driver identification and many more.
First of all, the input data for scVelo are two count matrices of pre-mature (unspliced) and mature (spliced) abundances, which can be obtained from standard sequencing protocols, using the velocyto or loompy/kallisto counting pipeline.
scVelo workflow at a glance¶
Import scvelo as:
import scvelo as scv
For beautified visualization you can change the matplotlib settings to our defaults with:
scv.set_figure_params()
Read your data¶
Read your data file (loom, h5ad, csv, …) using:
adata = scv.read(filename, cache=True)
which stores the data matrix (adata.X
),
annotation of cells / observations (adata.obs
) and genes / variables (adata.var
), unstructured annotation such
as graphs (adata.uns
) and additional data layers where spliced and unspliced counts are stored (adata.layers
) .
If you already have an existing preprocessed adata object you can simply merge the spliced/unspliced counts via:
ldata = scv.read(filename.loom, cache=True)
adata = scv.utils.merge(adata, ldata)
If you do not have a datasets yet, you can still play around using one of the in-built datasets, e.g.:
adata = scv.datasets.pancreas()
The typical workflow consists of subsequent calls of preprocessing (scv.pp.*
), analysis tools (scv.tl.*
) and plotting (scv.pl.*
).
Basic preprocessing¶
After basic preprocessing (gene selection and normalization), we compute the first- and second-order moments (means and uncentered variances) for velocity estimation:
scv.pp.filter_and_normalize(adata, **params)
scv.pp.moments(adata, **params)
Velocity Tools¶
The core of the software is the efficient and robust estimation of velocities, obtained with:
scv.tl.velocity(adata, mode='stochastic', **params)
The velocities are vectors in gene expression space obtained by solving a stochastic model of transcriptional dynamics.
The solution to the deterministic model is obtained by setting mode='deterministic'
.
The solution to the dynamical model is obtained by setting mode='dynamical'
, which requires to run
scv.tl.recover_dynamics(adata, **params)
beforehand.
The velocities are stored in adata.layers
just like the count matrices.
The velocities are projected into a lower-dimensional embedding by translating them into likely cell transitions. That is, for each velocity vector we find the likely cell transitions that are in accordance with that direction. The probabilities of one cell transitioning into another cell are computed using cosine correlation (between the potential cell transition and the velocity vector) and are stored in a matrix denoted as velocity graph:
scv.tl.velocity_graph(adata, **params)
Visualization¶
Finally, the velocities can be projected and visualized in any embedding (e.g. UMAP) on single cell level, as gridlines, or as streamlines:
scv.pl.velocity_embedding(adata, basis='umap', **params)
scv.pl.velocity_embedding_grid(adata, basis='umap', **params)
scv.pl.velocity_embedding_stream(adata, basis='umap', **params)
For every tool module there is a plotting counterpart, which allows you to examine your results in detail, e.g.:
scv.pl.velocity(adata, var_names=['gene_A', 'gene_B'], **params)
scv.pl.velocity_graph(adata, **params)
RNA Velocity Basics¶
Here you will learn the basics of RNA velocity analysis.
For illustration, it is applied to endocrine development in the pancreas, with lineage commitment to four major fates: α, β, δ and ε-cells. See here for more details. It can be applied to your own data along the same lines.
The notebook is also available at Google Colab and nbviewer.
[ ]:
# update to the latest version, if not done yet.
!pip install scvelo --upgrade --quiet
[1]:
import scvelo as scv
scv.logging.print_version()
Running scvelo 0.2.0 (python 3.8.2) on 2020-05-16 12:55.
[2]:
scv.settings.verbosity = 3 # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True # set max width size for presenter view
scv.set_figure_params('scvelo') # for beautified visualization
Load the Data¶
The analysis is based on the in-built pancreas data. To run velocity analysis on your own data, read your file (loom, h5ad, csv …) to an AnnData object with adata = scv.read('path/file.loom', cache=True)
. If you want to merge your loom file into an already existing AnnData object, use scv.utils.merge(adata, adata_loom)
.
[3]:
adata = scv.datasets.pancreas()
adata
[3]:
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
var: 'highly_variable_genes'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
scVelo is based on adata
, an object that stores a data matrix adata.X
, annotation of observations adata.obs
, variables adata.var
, and unstructured annotations adata.uns
. Names of observations and variables can be accessed via adata.obs_names
and adata.var_names
, respectively. AnnData objects can be sliced like dataframes, for example, adata_subset = adata[:, list_of_gene_names]
. For more details, see the anndata docs.
[4]:
scv.pl.proportions(adata)

Here, the proportions of spliced/unspliced counts are displayed. Depending on the protocol used (Drop-Seq, Smart-Seq), we typically have between 10%-25% of unspliced molecules containing intronic sequences. We also advice you to examine the variations on cluster level to verify consistency in splicing efficiency. Here, we find variations as expected, with slightly lower unspliced proportions at cycling ductal cells, then higher proportion at cell fate commitment in Ngn3-high and Pre-endocrine cells where many genes start to be transcribed.
Preprocess the Data¶
Preprocessing requisites consist of gene selection by detection (with a minimum number of counts) and high variability (dispersion), normalizing every cell by its total size and logarithmizing X. Filtering and normalization is applied in the same vein to spliced/unspliced counts and X. Logarithmizing is only applied to X. If X is already preprocessed from former analysis, it will not be touched.
All of this is summarized in a single function scv.pp.filter_and_normalize
, which essentially runs the following:
scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=2000)
scv.pp.log1p(adata)
Further, we need the first and second order moments (means and uncentered variances) computed among nearest neighbors in PCA space, summarized in scv.pp.moments
, which internally computes scv.pp.pca
and scv.pp.neighbors
. First order is needed for deterministic velocity estimation, while stochastic estimation also requires second order moments.
[5]:
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
Filtered out 20801 genes that are detected in less than 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
finished (0:00:03) --> added
'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
finished (0:00:00) --> added
'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)
Further preprocessing (such as batch effect correction) may be used to remove unwanted sources of variability. See the best practices for further details. Note, that any additional preprocessing step only affects X and is not applied to spliced/unspliced counts.
Estimate RNA velocity¶
Velocities are vectors in gene expression space and represent the direction and speed of movement of the individual cells. The velocities are obtained by modeling transcriptional dynamics of splicing kinetics, either stochastically (default) or deterministically (by setting mode='deterministic'
). For each gene, a steady-state-ratio of pre-mature (unspliced) and mature (spliced) mRNA counts is fitted, which constitutes a constant transcriptional state. Velocities are then obtained as
residuals from this ratio. Positive velocity indicates that a gene is up-regulated, which occurs for cells that show higher abundance of unspliced mRNA for that gene than expected in steady state. Conversely, negative velocity indicates that a gene is down-regulated.
The solution to the full dynamical model is obtained by setting mode='dynamical'
, which requires to run scv.tl.recover_dynamics(adata)
beforehand. We will elaborate more on the dynamical model in the next tutorial.
[6]:
scv.tl.velocity(adata)
computing velocities
finished (0:00:01) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
The computed velocities are stored in adata.layers
just like the count matrices.
The combination of velocities across genes can then be used to estimate the future state of an individual cell. In order to project the velocities into a lower-dimensional embedding, transition probabilities of cell-to-cell transitions are estimated. That is, for each velocity vector we find the likely cell transitions that are accordance with that direction. The transition probabilities are computed using cosine correlation between the potential cell-to-cell transitions and the velocity vector,
and are stored in a matrix denoted as velocity graph. The resulting velocity graph has dimension \(n_{obs} \times n_{obs}\) and summarizes the possible cell state changes that are well explained through the velocity vectors (for runtime speedup it can also be computed on reduced PCA space by setting approx=True
).
[7]:
scv.tl.velocity_graph(adata)
computing velocity graph
finished (0:00:10) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
For a variety of applications, the velocity graph can be converted to a transition matrix by applying a Gaussian kernel to transform the cosine correlations into actual transition probabilities. You can access the Markov transition matrix via scv.utils.get_transition_matrix
.
As mentioned, it is internally used to project the velocities into a low-dimensional embedding by applying the mean transition with respect to the transition probabilities, obtained with scv.tl.velocity_embedding
. Further, we can trace cells along the Markov chain to their origins and potential fates, thereby getting root cells and end points within a trajectory, obtained via scv.tl.terminal_states
.
Project the velocities¶
Finally, the velocities are projected onto any embedding, specified by basis
, and visualized in one of these ways: - on cellular level with scv.pl.velocity_embedding
, - as gridlines with scv.pl.velocity_embedding_grid
, - or as streamlines with scv.pl.velocity_embedding_stream
.
Note, that the data has an already pre-computed UMAP embedding, and annotated clusters. When applying to your own data, these can be obtained with scv.tl.umap
and scv.tl.louvain
. For more details, see the scanpy tutorial. Further, all plotting functions are defaulted to using basis='umap'
and color='clusters'
, which you can set accordingly.
[8]:
scv.pl.velocity_embedding_stream(adata, basis='umap')
computing velocity embedding
finished (0:00:00) --> added
'velocity_umap', embedded velocity vectors (adata.obsm)

The velocity vector field displayed as streamlines yields fine-grained insights into the developmental processes. It accurately delineates the cycling population of ductal cells and endocrine progenitors. Further, it illuminates cell states of lineage commitment, cell-cycle exit, and endocrine cell differentiation.
The most fine-grained resolution of the velocity vector field we get at single-cell level, with each arrow showing the direction and speed of movement of an individual cell. That reveals, e.g., the early endocrine commitment of Ngn3-cells (yellow) and a clear-cut difference between near-terminal α-cells (blue) and transient β-cells (green).
[9]:
scv.pl.velocity_embedding(adata, arrow_length=3, arrow_size=2, dpi=120)

Interprete the velocities¶
This is perhaps the most important part as we advise the user not to limit biological conclusions to the projected velocities, but to examine individual gene dynamics via phase portraits to understand how inferred directions are supported by particular genes.
See the gif here to get an idea of how to interpret a spliced vs. unspliced phase portrait. Gene activity is orchestrated by transcriptional regulation. Transcriptional induction for a particular gene results in an increase of (newly transcribed) precursor unspliced mRNAs while, conversely, repression or absence of transcription results in a decrease of unspliced mRNAs. Spliced mRNAs is produced from unspliced mRNA and follows the same trend with a time lag. Time is a hidden/latent variable. Thus, the dynamics needs to be inferred from what is actually measured: spliced and unspliced mRNAs as displayed in the phase portrait.
Now, let us examine the phase portraits of some marker genes, visualized with scv.pl.velocity(adata, gene_names)
or scv.pl.scatter(adata, gene_names)
.
[10]:
scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2', 'Adk'], ncols=2)

The black line corresponds to the estimated ‘steady-state’ ratio, i.e. the ratio of unspliced to spliced mRNA abundance which is in a constant transcriptional state. RNA velocity for a particular gene is determined as the residual, i.e. how much an observation deviates from that steady-state line. Positive velocity indicates that a gene is up-regulated, which occurs for cells that show higher abundance of unspliced mRNA for that gene than expected in steady state. Conversely, negative velocity indicates that a gene is down-regulated.
For instance Cpe explains the directionality in the up-regulated Ngn3 (yellow) to Pre-endocrine (orange) to β-cells (green), while Adk explains the directionality in the down-regulated Ductal (dark green) to Ngn3 (yellow) to the remaining endocrine cells.
[11]:
scv.pl.scatter(adata, 'Cpe', color=['clusters', 'velocity'],
add_outline='Ngn3 high EP, Pre-endocrine, Beta')

Identify important genes¶
We need a systematic way to identify genes that may help explain the resulting vector field and inferred lineages. To do so, we can test which genes have cluster-specific differential velocity expression, being siginificantly higher/lower compared to the remaining population. The module scv.tl.rank_velocity_genes
runs a differential velocity t-test and outpus a gene ranking for each cluster. Thresholds can be set (e.g. min_corr
) to restrict the test on a selection of gene candidates.
[12]:
scv.tl.rank_velocity_genes(adata, groupby='clusters', min_corr=.3)
df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
df.head()
ranking velocity genes
finished (0:00:02) --> added
'rank_velocity_genes', sorted scores by group ids (adata.uns)
'spearmans_score', spearmans correlation scores (adata.var)
[12]:
Ductal | Ngn3 low EP | Ngn3 high EP | Pre-endocrine | Beta | Alpha | Delta | Epsilon | |
---|---|---|---|---|---|---|---|---|
0 | Notch2 | Ptpn3 | Pde1c | Pam | Pax6 | Zcchc16 | Zdbf2 | Tmcc3 |
1 | Sox5 | Hacd1 | Ptprs | Sdk1 | Unc5c | Nlgn1 | Spock3 | Heg1 |
2 | Krt19 | Hspa8 | Pclo | Baiap3 | Nnat | Nell1 | Akr1c19 | Gpr179 |
3 | Hspa8 | Gm8113 | Rap1gap2 | Abcc8 | Tmem108 | Prune2 | Ptprt | Ica1 |
4 | Ano6 | Kcnq1 | Ttyh2 | Gnas | Ptprt | Ksr2 | Snap25 | Ncoa7 |
[13]:
kwargs = dict(frameon=False, size=10, linewidth=1.5,
add_outline='Ngn3 high EP, Pre-endocrine, Beta')
scv.pl.scatter(adata, df['Ngn3 high EP'][:5], ylabel='Ngn3 high EP', **kwargs)
scv.pl.scatter(adata, df['Pre-endocrine'][:5], ylabel='Pre-endocrine', **kwargs)


The genes Ptprs, Pclo, Pam, Abcc8, Gnas, for instance, support the directionality from Ngn3 high EP (yellow) to Pre-endocrine (orange) to Beta (green).
Velocities in cycling progenitors¶
The cell cycle detected by RNA velocity, is biologically affirmed by cell cycle scores (standardized scores of mean expression levels of phase marker genes).
[14]:
scv.tl.score_genes_cell_cycle(adata)
scv.pl.scatter(adata, color_gradients=['S_score', 'G2M_score'], smooth=True, perc=[5, 95])
calculating cell cycle phase
--> 'S_score' and 'G2M_score', scores of cell cycle phases (adata.obs)

For the cycling Ductal cells, we may screen through S and G2M phase markers. The previous module also computed a spearmans correlation score, which we can use to rank/sort the phase marker genes to then display their phase portraits.
[15]:
s_genes, g2m_genes = scv.utils.get_phase_marker_genes(adata)
s_genes = scv.get_df(adata[:, s_genes], 'spearmans_score', sort_values=True).index
g2m_genes = scv.get_df(adata[:, g2m_genes], 'spearmans_score', sort_values=True).index
kwargs = dict(frameon=False, ylabel='cell cycle genes')
scv.pl.scatter(adata, list(s_genes[:2]) + list(g2m_genes[:3]), **kwargs)

Particularly Hells and Top2a are well-suited to explain the vector field in the cycling progenitors. Top2a gets assigned a high velocity shortly before it actually peaks in the G2M phase. There, the negative velocity then perfectly matches the immediately following down-regulation.
[16]:
scv.pl.velocity(adata, ['Hells', 'Top2a'], ncols=2, add_outline=True)

Speed and coherence¶
Two more useful stats: - The speed or rate of differentiation is given by the length of the velocity vector. - The coherence of the vector field (i.e., how a velocity vector correlates with its neighboring velocities) provides a measure of confidence.
[17]:
scv.tl.velocity_confidence(adata)
keys = 'velocity_length', 'velocity_confidence'
scv.pl.scatter(adata, c=keys, cmap='coolwarm', perc=[5, 95])
--> added 'velocity_length' (adata.obs)
--> added 'velocity_confidence' (adata.obs)
--> added 'velocity_confidence_transition' (adata.obs)

These provide insights where cells differentiate at a slower/faster pace, and where the direction is un-/determined.
On cluster-level, we find that differentiation substantially speeds up after cell cycle exit (Ngn3 low EP), keeping the pace during Beta cell production while slowing down during Alpha cell production.
[18]:
df = adata.obs.groupby('clusters')[keys].mean().T
df.style.background_gradient(cmap='coolwarm', axis=1)
[18]:
clusters | Ductal | Ngn3 low EP | Ngn3 high EP | Pre-endocrine | Beta | Alpha | Delta | Epsilon |
---|---|---|---|---|---|---|---|---|
velocity_length | 5.707904 | 6.227023 | 13.257741 | 13.429206 | 14.219882 | 10.491913 | 7.623143 | 10.477606 |
velocity_confidence | 0.775042 | 0.756474 | 0.896944 | 0.830438 | 0.725973 | 0.752715 | 0.691188 | 0.784467 |
Velocity graph and pseudotime¶
We can visualize the velocity graph to portray all velocity-inferred cell-to-cell connections/transitions. It can be confined to high-probability transitions by setting a threshold
. The graph, for instance, indicates two phases of Epsilon cell production, coming from early and late Pre-endocrine cells.
[19]:
scv.pl.velocity_graph(adata, threshold=.1)

Further, the graph can be used to draw descendents/anscestors coming from a specified cell. Here, a pre-endocrine cell is traced to its potential fate.
[20]:
x, y = scv.utils.get_cell_transitions(adata, basis='umap', starting_cell=70)
ax = scv.pl.velocity_graph(adata, c='lightgrey', edge_width=.05, show=False)
ax = scv.pl.scatter(adata, x=x, y=y, s=120, c='ascending', cmap='gnuplot', ax=ax)

Finally, based on the velocity graph, a velocity pseudotime can be computed. After inferring a distribution over root cells from the graph, it measures the average number of steps it takes to reach a cell after walking along the graph starting from the root cells.
Contrarily to diffusion pseudotime, it implicitly infers the root cells and is based on the directed velocity graph instead of the similarity-based diffusion kernel.
[21]:
scv.tl.velocity_pseudotime(adata)
scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
computing terminal states
identified 2 regions of root cells and 1 region of end points
finished (0:00:00) --> added
'root_cells', root cells of Markov diffusion process (adata.obs)
'end_points', end points of Markov diffusion process (adata.obs)

PAGA velocity graph¶
PAGA graph abstraction has benchmarked as top-performing method for trajectory inference. It provides a graph-like map of the data topology with weighted edges corresponding to the connectivity between two clusters. Here, PAGA is extended by velocity-inferred directionality.
[ ]:
# PAGA requires to install igraph, if not done yet.
!pip install python-igraph --upgrade --quiet
[22]:
# this is needed due to a current bug - bugfix is coming soon.
adata.uns['neighbors']['distances'] = adata.obsp['distances']
adata.uns['neighbors']['connectivities'] = adata.obsp['connectivities']
scv.tl.paga(adata, groups='clusters')
df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
df.style.background_gradient(cmap='Blues').format('{:.2g}')
running PAGA
finished (0:00:01) --> added
'paga/transitions_confidence', connectivities adjacency (adata.uns)
'paga/connectivities', connectivities adjacency (adata.uns)
'paga/connectivities_tree', connectivities subtree (adata.uns)
[22]:
Ductal | Ngn3 low EP | Ngn3 high EP | Pre-endocrine | Beta | Alpha | Delta | Epsilon | |
---|---|---|---|---|---|---|---|---|
Ductal | 0 | 0.15 | 0 | 0 | 0 | 0 | 0 | 0 |
Ngn3 low EP | 0 | 0 | 0.24 | 0 | 0 | 0 | 0 | 0 |
Ngn3 high EP | 0 | 0 | 0 | 0.22 | 0 | 0 | 0 | 0 |
Pre-endocrine | 0 | 0 | 0 | 0 | 0.48 | 0.12 | 0.21 | 0.11 |
Beta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Alpha | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Delta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Epsilon | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
This reads from left/row to right/column, thus e.g. assigning a confident transition from Ductal to Ngn3 low EP.
This table can be summarized by a directed graph superimposed onto the UMAP embedding.
[23]:
scv.pl.paga(adata, basis='umap', size=50, alpha=.1,
min_edge_width=2, node_size_scale=1.5)

Dynamical Modeling¶
Here, we use the generalized dynamical model to solve the full transcriptional dynamics.
That yields several additional insights such as latent time and identification of putative driver genes.
As in the previous tutorial, it is illustratively applied to endocrine development in the pancreas.
[ ]:
# update to the latest version, if not done yet.
!pip install scvelo --upgrade --quiet
[1]:
import scvelo as scv
scv.logging.print_version()
Running scvelo 0.2.0 (python 3.8.2) on 2020-05-15 00:27.
[2]:
scv.settings.verbosity = 3 # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True # set max width size for presenter view
scv.settings.set_figure_params('scvelo') # for beautified visualization
Prepare the Data¶
Processing consists of gene selection, normalizing by total size, logarithmizing X, and computing moments for velocity estimation. See the previous tutorial for further explanation.
[3]:
adata = scv.datasets.pancreas()
[4]:
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
Filtered out 20801 genes that are detected in less than 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
finished (0:00:03) --> added
'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
finished (0:00:00) --> added
'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)
Dynamical Model¶
We run the dynamical model to learn the full transcriptional dynamics of splicing kinetics.
It is solved in a likelihood-based expectation-maximization framework, by iteratively estimating the parameters of reaction rates and latent cell-specific variables, i.e. transcriptional state and cell-internal latent time. It thereby aims to learn the unspliced/spliced phase trajectory for each gene.
[5]:
scv.tl.recover_dynamics(adata)
recovering dynamics
finished (0:13:31) --> added
'fit_pars', fitted parameters for splicing dynamics (adata.var)
[6]:
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
computing velocities
finished (0:00:04) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
finished (0:00:08) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
Running the dynamical model can take a while. Hence, you may want to store the results for re-use, with adata.write('data/pancreas.h5ad'
, which can later be read with adata = scv.read('data/pancreas.h5ad')
.
[7]:
#adata.write('data/pancreas.h5ad', compression='gzip')
#adata = scv.read('data/pancreas.h5ad')
[8]:
scv.pl.velocity_embedding_stream(adata, basis='umap')
computing velocity embedding
finished (0:00:00) --> added
'velocity_umap', embedded velocity vectors (adata.obsm)

Kinetic rate paramters¶
The rates of RNA transcription, splicing and degradation are estimated without the need of any experimental data.
They can be useful to better understand the cell identity and phenotypic heterogeneity.
[9]:
df = adata.var
df = df[(df['fit_likelihood'] > .1) & df['velocity_genes'] == True]
kwargs = dict(xscale='log', fontsize=16)
with scv.GridSpec(ncols=3) as pl:
pl.hist(df['fit_alpha'], xlabel='transcription rate', **kwargs)
pl.hist(df['fit_beta'] * df['fit_scaling'], xlabel='splicing rate', xticks=[.1, .4, 1], **kwargs)
pl.hist(df['fit_gamma'], xlabel='degradation rate', xticks=[.1, .4, 1], **kwargs)
scv.get_df(adata, 'fit*', dropna=True).head()

[9]:
fit_r2 | fit_alpha | fit_beta | fit_gamma | fit_t_ | fit_scaling | fit_std_u | fit_std_s | fit_likelihood | fit_u0 | fit_s0 | fit_pval_steady | fit_steady_u | fit_steady_s | fit_variance | fit_alignment_scaling | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
index | ||||||||||||||||
Sntg1 | 0.401981 | 0.015726 | 0.005592 | 0.088792 | 23.404254 | 42.849447 | 1.029644 | 0.030838 | 0.406523 | 0.0 | 0.0 | 0.159472 | 2.470675 | 0.094304 | 0.149138 | 5.355590 |
Sbspon | 0.624803 | 0.464865 | 2.437113 | 0.379645 | 3.785993 | 0.154771 | 0.058587 | 0.178859 | 0.252135 | 0.0 | 0.0 | 0.182088 | 0.164805 | 0.430623 | 0.674312 | 1.193015 |
Mcm3 | 0.292389 | 3.096367 | 39.995796 | 0.638543 | 2.049463 | 0.013943 | 0.016253 | 0.673142 | 0.228207 | 0.0 | 0.0 | 0.467683 | 0.051432 | 1.927742 | 0.687468 | 0.887607 |
Fam135a | 0.384662 | 0.172335 | 0.118088 | 0.204538 | 11.239574 | 1.124040 | 0.356525 | 0.149868 | 0.283343 | 0.0 | 0.0 | 0.387921 | 1.345830 | 0.393197 | 0.671096 | 3.390194 |
Adgrb3 | 0.384552 | 0.046828 | 0.006750 | 0.196856 | 6.992542 | 71.850736 | 2.153206 | 0.030417 | 0.250195 | 0.0 | 0.0 | 0.068851 | 5.214500 | 0.093570 | 0.556878 | 1.893389 |
The estimated gene-specific parameters comprise rates of transription (fit_alpha
), splicing (fit_beta
), degradation (fit_gamma
), switching time point (fit_t_
), a scaling parameter to adjust for under-represented unspliced reads (fit_scaling
), standard deviation of unspliced and spliced reads (fit_std_u
, fit_std_s
), the gene likelihood (fit_likelihood
), inferred steady-state levels (fit_steady_u
, fit_steady_s
) with their corresponding p-values
(fit_pval_steady_u
, fit_pval_steady_s
), the overall model variance (fit_variance
), and a scaling factor to align the gene-wise latent times to a universal, gene-shared latent time (fit_alignment_scaling
).
Latent time¶
The dynamical model recovers the latent time of the underlying cellular processes. This latent time represents the cell’s internal clock and approximates the real time experienced by cells as they differentiate, based only on its transcriptional dynamics.
[10]:
scv.tl.latent_time(adata)
scv.pl.scatter(adata, color='latent_time', color_map='gnuplot', size=80)
computing terminal states
identified 2 regions of root cells and 1 region of end points
finished (0:00:00) --> added
'root_cells', root cells of Markov diffusion process (adata.obs)
'end_points', end points of Markov diffusion process (adata.obs)
computing latent time
finished (0:00:02) --> added
'latent_time', shared time (adata.obs)

[11]:
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.heatmap(adata, var_names=top_genes, sortby='latent_time', col_color='clusters', n_convolve=100)

Top-likelihood genes¶
Driver genes display pronounced dynamic behavior and are systematically detected via their characterization by high likelihoods in the dynamic model.
[12]:
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index
scv.pl.scatter(adata, basis=top_genes[:15], ncols=5, frameon=False)

[13]:
var_names = ['Actn4', 'Ppp3ca', 'Cpe', 'Nnat']
scv.pl.scatter(adata, var_names, frameon=False)
scv.pl.scatter(adata, x='latent_time', y=var_names, frameon=False)


Cluster-specific top-likelihood genes¶
Moreover, partial gene likelihoods can be computed for a each cluster of cells to enable cluster-specific identification of potential drivers.
[14]:
scv.tl.rank_dynamical_genes(adata, groupby='clusters')
df = scv.get_df(adata, 'rank_dynamical_genes/names')
df.head(5)
ranking genes by cluster-specific likelihoods
finished (0:00:03) --> added
'rank_dynamical_genes', sorted scores by group ids (adata.uns)
[14]:
Ductal | Ngn3 low EP | Ngn3 high EP | Pre-endocrine | Beta | Alpha | Delta | Epsilon | |
---|---|---|---|---|---|---|---|---|
0 | Dcdc2a | Dcdc2a | Rbfox3 | Abcc8 | Pcsk2 | Cpe | Pcsk2 | Tox3 |
1 | Top2a | Adk | Mapre3 | Tmem163 | Ank | Gnao1 | Rap1b | Rnf130 |
2 | Nfib | Mki67 | Btbd17 | Gnao1 | Tmem163 | Pak3 | Pak3 | Meis2 |
3 | Wfdc15b | Rap1gap2 | Sulf2 | Ank | Tspan7 | Pim2 | Abcc8 | Adk |
4 | Cdk1 | Top2a | Tcp11 | Tspan7 | Map1b | Map1b | Klhl32 | Rap1gap2 |
[15]:
for cluster in ['Ductal', 'Ngn3 high EP', 'Pre-endocrine', 'Beta']:
scv.pl.scatter(adata, df[cluster][:5], ylabel=cluster, frameon=False)




Differential Kinetics¶
One important concern is dealing with systems that represent multiple lineages and processes, where genes are likely to show different kinetic regimes across subpopulations. Distinct cell states and lineages are typically governed by different variations in the gene regulatory networks and may hence exhibit different splicing kinetics. This gives rise to genes that display multiple trajectories in phase space.
To address this, the dynamical model can be used to perform a likelihood-ratio test for differential kinetics. This way, we can detect clusters that display kinetic behavior that cannot be well explained by a single model of the overall dynamics. Clustering cell types into their different kinetic regimes then allows fitting each regime separately.
For illustration, we apply differential kinetic analysis to dentate gyrus neurogenesis, which comprises multiple heterogeneous subpopulations.
[ ]:
# update to the latest version, if not done yet.
!pip install scvelo --upgrade --quiet
[1]:
import scvelo as scv
scv.logging.print_version()
Running scvelo 0.2.0 (python 3.8.2) on 2020-05-15 00:57.
[2]:
scv.settings.verbosity = 3 # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True # set max width size for presenter view
scv.settings.set_figure_params('scvelo') # for beautified visualization
Prepare the Data¶
Processing consists of gene selection, log-normalizing, and computing moments. See the previous tutorials for further explanation.
[3]:
adata = scv.datasets.dentategyrus()
[4]:
scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
Filtered out 11019 genes that are detected in less than 30 counts (shared).
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
finished (0:00:02) --> added
'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
finished (0:00:00) --> added
'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)
Basic Velocity Estimation¶
[5]:
scv.tl.velocity(adata)
scv.tl.velocity_graph(adata)
computing velocities
finished (0:00:00) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
finished (0:00:05) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
[6]:
scv.pl.velocity_embedding_stream(adata, basis='umap')
computing velocity embedding
finished (0:00:00) --> added
'velocity_umap', embedded velocity vectors (adata.obsm)

Differential Kinetic Test¶
Distinct cell types and lineages may exhibit different kinetics regimes as these can be governed by a different network structure. Even if cell types or lineages are related, kinetics can be differential due to alternative splicing, alternative polyadenylation and modulations in degradation.
The dynamical model allows us to address this issue with a likelihood ratio test for differential kinetics to detect clusters/lineages that display kinetic behavior that cannot be sufficiently explained by a single model for the overall dynamics. Each cell type is tested whether an independent fit yields a significantly improved likelihood.
The likelihood ratio, following an asymptotic chi-squared distribution, can be tested for significance. Note that for efficiency reasons, by default an orthogonal regression is used instead of a full phase trajectory to test whether a cluster is well explained by the overall kinetic or exhibits a different kinetic.
[7]:
var_names = ['Tmsb10', 'Fam155a', 'Hn1', 'Rpl6']
scv.tl.differential_kinetic_test(adata, var_names=var_names, groupby='clusters')
recovering dynamics
finished (0:00:02) --> added
'fit_pars', fitted parameters for splicing dynamics (adata.var)
outputs model fit of gene: Rpl6
testing for differential kinetics
finished (0:00:00) --> added
'fit_diff_kinetics', clusters displaying differential kinetics (adata.var)
'fit_pval_kinetics', p-values of differential kinetics (adata.var)
outputs model fit of gene: Rpl6
[7]:
<scvelo.tools.dynamical_model.DynamicsRecovery at 0x12390f4a8>
[8]:
scv.get_df(adata[:, var_names], ['fit_diff_kinetics', 'fit_pval_kinetics'], precision=2)
[8]:
fit_diff_kinetics | fit_pval_kinetics | |
---|---|---|
index | ||
Tmsb10 | Endothelial | 6.02e-16 |
Fam155a | Cajal Retzius | 8.35e-161 |
Hn1 | Microglia | 3.02e-03 |
Rpl6 | Microglia | 6.27e-15 |
[9]:
kwargs = dict(linewidth=2, add_linfit=True, frameon=False)
scv.pl.scatter(adata, basis=var_names, add_outline='fit_diff_kinetics', **kwargs)

In Tmsb10, for instance, Endothelial display a kinetic behaviour (outlined, with the black line fitted through), that cannot be well explained by the overall dynamics (purple curve).
[10]:
diff_clusters=list(adata[:, var_names].var['fit_diff_kinetics'])
scv.pl.scatter(adata, legend_loc='right', size=60, title='diff kinetics',
add_outline=diff_clusters, outline_width=(.8, .2))

Testing top-likelihood genes¶
Screening through the top-likelihood genes, we find some gene-wise dynamics that display multiple kinetic regimes.
[11]:
scv.tl.recover_dynamics(adata)
#adata.write('data/pancreas.h5ad', compression='gzip')
#adata = scv.read('data/pancreas.h5ad')
recovering dynamics
finished (0:06:39) --> added
'fit_pars', fitted parameters for splicing dynamics (adata.var)
[12]:
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:100]
scv.tl.differential_kinetic_test(adata, var_names=top_genes, groupby='clusters')
testing for differential kinetics
finished (0:00:21) --> added
'fit_diff_kinetics', clusters displaying differential kinetics (adata.var)
'fit_pval_kinetics', p-values of differential kinetics (adata.var)
Particularly, cell types that are distinct from the main granule - such as Cck/Tox, GABA, Endothelial, and Microglia - occur frequently.
[13]:
scv.pl.scatter(adata, basis=top_genes[:15], ncols=5, add_outline='fit_diff_kinetics', **kwargs)

[14]:
scv.pl.scatter(adata, basis=top_genes[15:30], ncols=5, add_outline='fit_diff_kinetics', **kwargs)

Recompute velocities¶
Finally, velocities can be recomputed leveraging the information of multiple competing kinetic regimes.
[15]:
scv.tl.velocity(adata, diff_kinetics=True)
scv.tl.velocity_graph(adata)
computing velocities
finished (0:00:00) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
finished (0:00:05) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
[16]:
scv.pl.velocity_embedding(adata, dpi=120, arrow_size=2, arrow_length=2)
computing velocity embedding
finished (0:00:00) --> added
'velocity_umap', embedded velocity vectors (adata.obsm)

Challenges and Perspectives¶
This page complements our manuscript Bergen et al. (MSB, 2021) RNA velocity - Current challenges and future perspectives
We provide several examples to discuss potential pitfalls of current RNA velocity modeling approaches, and provide guidance on how the ensuing challenges may be addressed. Our aspiration is to suggest promising future directions and to stimulate a communities effort on further model extensions.
In the following, you find two vignettes with several use cases, as well as an in-depth analysis of time-dependent kinetic rate parameters.
Potential pitfalls¶

This notebook reproduces Fig. 2 with several use cases, including multiple kinetics in Dentate Gyrus, transcriptional boost in erythroid lineage, and misleading arrow projections in mature PBMCs.
Notebook: Perspectives
Kinetic parameter analysis¶

This notebook reproduces Fig. 3, where we demonstrate how time-variable kinetic rates shape the curvature patterns of gene activation.
Notebook: Kinetic parameter analysis