docs(manuscript): add preprocessing methods

cameronraysmith · cameronraysmith · commit 0a76b8bbdd23 · 2024-08-21T01:07:40.000-04:00
Signed-off-by: Cameron Smith <cameron.ray.smith@gmail.com>
diff --git a/reproducibility/manuscript/manuscript.qmd b/reproducibility/manuscript/manuscript.qmd
@@ -501,6 +501,85 @@ differential equations proposed in velocyto [@La_Manno2018-lj] and scVelo
   &= \beta_g u\left(\tau^{\left(k_{cg}\right)}\right)-\gamma_g s\left(\tau^{\left(k_{cg}\right)}\right). \label{eq-dsdt}
 \end{align}
 
+## Single-cell data preprocessing {#sec-methods-preprocessing}
+
+We used scanpy and scVelo for data input and output; thus, both h5ad and loom
+files generated by velocyto and kallisto [@Melsted2021-ap] are supported. The
+fully mature PBMC dataset was processed following the procedure described in a
+review paper [@Bergen2021-qz]
+(<https://scvelo.readthedocs.io/perspectives/Perspectives/>). We reproduced this
+procedure using the scVelo package and the raw read counts of the same three
+dynamical genes with the highest likelihoods: NKG7, IGHM, and GNLY. The pancreas
+dataset was processed with scVelo using the following options:
+
+```python
+scvelo.pp.filter_and_normalize(
+  data=adata,
+  min_shared_counts=30,
+  n_top_genes=2000,
+)
+scvelo.pp.moments(
+  data=adata,
+  n_pcs=30,
+  n_neighbors=30,
+)
+```
+
+The same top variable genes with raw spliced and unspliced read counts were used
+as input for the Pyro&thinsp;-Velocity model. The original LARRY dataset of in
+vitro hematopoiesis, containing $130,887$ cells, was first filtered to remove
+cells without a LARRY barcode, which left $49,302$ cells with at least one
+barcode. For simplicity, we refer to this filtered dataset, which contains
+multiple cell fates (multi-fate), as the full dataset. From this dataset, we
+created two datasets with uni-fate progression toward monocytes or neutrophils
+based on the LARRY lineage barcodes and time information. Namely, we selected
+sets of cells that share a single LARRY barcode, span three time points (days 2,
+4, and 6), and whose cells at the last time point (day 6) all belong to a single
+cell type (either monocyte or neutrophil). The two uni-fate datasets were
+combined to form the bi-fate LARRY dataset. The multi-fate full dataset was
+processed using the same options as the pancreas dataset; the uni-fate and
+bi-fate datasets were processed using the following parameters:
+
+```python
+scvelo.pp.filter_and_normalize(
+  data=adata,
+  n_top_genes=2000,
+  min_shared_counts=20,
+)
+scvelo.pp.moments(adata)
+```
+
+## scVelo model {#sec-methods-scvelo}
+
+We benchmarked the dynamical RNA velocity model implemented in scVelo `v0.2.4`
+on the pancreas and the four LARRY datasets using the same user options:
+
+```python
+scvelo.tl.recover_dynamics(
+  data=adata,
+  n_jobs=30,
+)
+scvelo.tl.velocity(
+  data=adata,
+  mode="dynamical",
+)
+```
+
+We then varied a set of user options, including the number of neighboring cells
+and the number of top variable genes, on the pancreas dataset to explore the
+stability of the scVelo dynamical model. For the fully mature PBMC dataset, we
+followed the notebook provided by the original authors
+(<https://scvelo.readthedocs.io/perspectives/Perspectives/>), i.e., we used the
+stochastic RNA velocity model implemented in scVelo with the three genes with
+the highest likelihoods. The latent time from scVelo was computed using the
+function it provides:
+
+```python
+scvelo.tl.latent_time(
+  data=adata,
+)
+```
+
 ## Trajectory inference {#sec-methods-trajectory-inference}
 
 ### Velocity vector field
@@ -517,10 +596,10 @@ scvelo.tl.velocity_embedding(
 ```
 
 We used the default options for projecting the vector fields from scVelo models.
-Unlike scVelo, Pyro-Velocity uses statistics derived from posterior samples of
+Unlike scVelo, Pyro&thinsp;-Velocity uses statistics derived from posterior samples of
 the denoised spliced gene expression and posterior samples of the velocity
 estimation for building the cell state transition matrix estimates using cosine
-similarity. Pyro-Velocity uses the same projection method as scVelo for
+similarity. Pyro&thinsp;-Velocity uses the same projection method as scVelo for
 projecting the transition matrix into the two-dimensional vector field on the
 user-provided embedding space.