To upload a new dataset, select a data file containing a gene expression matrix. This file can be a sparse matrix file in Matrix Market format (.mtx, as produced by scipy.io.mmwrite) or a dense matrix as a space or tab-delimited file (as produced by np.savetxt). The file should not contain any headers. Make sure to indicate whether the file is genes-by-cells or cells-by-genes.
After updating the data, you will eventually be redirected to a view that looks like the one above. On the top, there are two plots. The plot on the left shows the distribution of total read counts per cell. The plot on the right shows the relationship between mean and variance for all genes.
- Number of clusters: If the number of cell types is known, use the number of cell types here. Otherwise, this can be set to 0, which automatically infers the number of clusters using a gap score metric. Alternatively, setting this to between 10 and 20 usually produces reasonable results (clusters can be merged or split later on).
- Visualization method: How to create a scatterplot of cells using the output of Uncurl. One of MDS, PCA, tSNE, or UMAP. Default: tSNE.
- Baseline visualization method: How to create a scatterplot of cells without using Uncurl. One of PCA, tSNE, or UMAP. Default: tSNE.
- Remove cells with read counts greater/less than: This should be self-explanatory. Cells removed this way will not be used in the downstream processing. The default is to remove the top and bottom 5% of cells by total read count. If you want to include all cells, set the first number to 0 and the second number to a very high number.
- Distribution: For UMI or count-valued data, Poisson usually is best. For normalized counts (FPKM, RPKM,...), you might want to use log-normal. If the data has already been log-transformed, Gaussian might be preferable.
- Fraction of genes to include: As a preprocessing step, Uncurl selects a subset of genes for inference using variance-based selection. By default, this is 0.2.
- Fraction of cells to include for visualization: If there is a large number of cells, then the number of cells can be subsampled for the dimensionality reduction, to reduce computational and data transfer requirements. By default, this sets it so that the number of cells retained is at most about 1500. Set this to 1.0 to include all cells.
- Normalize cells by read count: If this is set to true, then for all cells, the read counts will be divided by the total read count for that cell, and then multiplied by the median read count for all cells.
After Uncurl has finished running, you will be redirected to a page that looks like the one above. The graph on the left is a scatterplot that shows a dimensionally reduced view of the cells or cluster means. The graph on the right shows the top genes for the selected cluster, or relationships between clusters.
To change the scatterplot view, click on the radio buttons above the plot, circled in red. The default view shows the cluster means. The "Processed cells" option shows a scatterplot with all cells.
To change the cluster being shown on the barplot, click on any cell or cluster mean.
To change the color scheme, use the "Label scheme" dropdown. This also allows you to upload a new color scheme for visualization.
Interacting with the plot
Double click on a cluster name to see only that cluster. Single click on a cluster name to toggle its visibility.
Mousing over the top right of the plot will show a control panel. This allows you to zoom in/out, select cells, or save the plot.
Gene set databases
UNCURL-App currently contains interfaces to three gene set databases: Enrichr, CellMarker, and CellMeSH. These databases can be used to aid in identifying cell types corresponding to clusters.
This is an interface to the Enrichr tool (http://amp.pharm.mssm.edu/Enrichr/). This does not include all gene set libraries present in Enrichr, just the ones that might be helpful in identifying cell types.
New cell labels
New labels for each cell can be uploaded by selecting "New color track" from the "Label scheme" dropdown.
Merging, splitting, re-analysis
Using the scatterplot, the user can merge or split existing clusters, or create a new cluster from selected cells.