COMPOSITE (COMpound POiSson multIplet deTEction model) is a computational tool for multiplet detection in both single-cell single-omics and multiomics settings. It has been implemented as an automated pipeline and is available as both a cloud-based application with a user-friendly interface and a Python package.
COMPOSITE accepts up to three mtx matrix files as input (rows are features and columns are cells), one for each modality. The RNA.mtx and ADT.mtx matrices are simple raw count matrices. The ATAC.mtx matrix, however, requires some preprocessing. The details are outlined below.
As noted above, RNA.mtx and ADT.mtx are raw count matrices. Here is a simple example that saves the count matrix of a Scanpy object to an mtx file.
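import scipy.io
import scipy.sparse

# adata is a scanpy.AnnData object; adata.X holds the raw counts.
# Note: AnnData stores cells as rows. Transpose (adata.X.T) if your matrix
# needs to have cells as columns, as described above.
sparse_X = scipy.sparse.coo_matrix(adata.X)
scipy.io.mmwrite('PATH_OF_YOUR_MTX_FILE', sparse_X)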
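Because the expected layout is rows as features and columns as cells, it can help to check the orientation of the file before running COMPOSITE. The following is a minimal sketch, not part of the package: a small synthetic matrix stands in for adata.X (assumed to be in the usual AnnData layout, with cells as rows), and the output path is a temporary, illustrative one.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse

# Synthetic stand-in for adata.X: 5 cells (rows) x 3 features (columns).
cells_by_features = scipy.sparse.csr_matrix(
    np.arange(15, dtype=np.int64).reshape(5, 3)
)

# Transpose so that rows are features and columns are cells.
features_by_cells = scipy.sparse.coo_matrix(cells_by_features.T)

# Write to a temporary Matrix Market file (the path is illustrative only).
out_path = os.path.join(tempfile.mkdtemp(), "RNA.mtx")
scipy.io.mmwrite(out_path, features_by_cells)

# Read the file back and check the orientation: (features, cells).
loaded = scipy.io.mmread(out_path)
print(loaded.shape)
```

If the printed shape shows the number of cells first, the matrix was most likely written without the transpose.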
For ATAC.mtx, we rely on the GeneActivity function from Signac for preprocessing. Signac is written in R, but we provide an easy-to-use script, GeneActivity.R. Run:
Rscript GeneActivity.R TARGET_PATH PATH_TO_ATAC_H5_FILE PATH_TO_ATAC_METADATA PATH_TO_FRAGMENT
Note that the script takes four path arguments. The preprocessed ATAC.mtx will be written under TARGET_PATH.
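For users who drive the whole workflow from Python, the Rscript call can be assembled programmatically so that the four path arguments always appear in the expected order. This is only a sketch: the helper name build_gene_activity_command and the example file names are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_gene_activity_command(target_path, atac_h5, atac_metadata,
                                fragment_path, script="GeneActivity.R"):
    """Return the Rscript argument list in the order shown above.

    Hypothetical helper for illustration; it is not part of sccomposite.
    """
    return ["Rscript", script, target_path, atac_h5, atac_metadata, fragment_path]


# Placeholder file names; replace them with your own paths.
cmd = build_gene_activity_command(
    "out_dir", "atac.h5", "metadata.csv", "fragments.tsv.gz"
)
print(shlex.join(cmd))

# To actually run the script (requires R and Signac to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```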
To prepare the data from a Seurat object, see: Preparing data for COMPOSITE.
We also provide readily available demo datasets that can be used directly as input: RNA.mtx, ADT.mtx, and ATAC.mtx.
Users may directly upload the data files to the COMPOSITE cloud-based app. The results will be sent to the provided email address as a .csv file.
Installation:
pip install sccomposite==1.0.0
Store the RNA, ADT, and ATAC data as "RNA.mtx", "ADT.mtx", and "ATAC.mtx", respectively, in the working directory. Then import the sccomposite package.
import sccomposite
from sccomposite import RNA_modality
from sccomposite import ADT_modality
from sccomposite import ATAC_modality
from sccomposite import Multiomics
We recommend that users keep the default parameter settings when running COMPOSITE. COMPOSITE is a robust statistical model, and the default parameters are suitable for most cases. All the results in our manuscript were generated under the default parameter settings. We also recommend using all available data modalities as input.
When only one modality of data is available:
# RNA modality only
multiplet_classification, consistency = RNA_modality.composite_rna("RNA.mtx")
# ADT modality only
multiplet_classification, consistency = ADT_modality.composite_adt("ADT.mtx")
# ATAC modality only
multiplet_classification, consistency = ATAC_modality.composite_atac("ATAC.mtx")
The multiplet_classification variable contains the predicted multiplet label for each droplet, with "1" representing multiplet and "0" representing singlet.
The consistency variable contains the droplet-specific modality consistency. A higher consistency value indicates that the data in the corresponding modality are less noisy for the given droplet, resulting in a more reliable multiplet prediction for that droplet.
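As an illustration of how the two outputs might be inspected together, the sketch below computes the predicted multiplet rate and lists droplets whose consistency falls below a cutoff. It rests on several assumptions: both outputs are taken to be array-like with one entry per droplet, the values are synthetic, and the cutoff of 0.5 (and the 0 to 1 scale) is hypothetical rather than a COMPOSITE recommendation.

```python
import numpy as np

# Synthetic stand-ins for the two COMPOSITE outputs (assumed array-like,
# one entry per droplet); real values come from composite_rna and friends.
multiplet_classification = np.array([0, 1, 0, 0, 1, 0])
consistency = np.array([0.9, 0.4, 0.8, 0.2, 0.95, 0.7])

# Fraction of droplets predicted to be multiplets.
multiplet_rate = multiplet_classification.mean()

# Hypothetical cutoff, chosen only for illustration.
cutoff = 0.5
low_consistency_droplets = np.flatnonzero(consistency < cutoff)

print(f"multiplet rate: {multiplet_rate:.2f}")
print("low-consistency droplets:", low_consistency_droplets.tolist())
```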
When multiomics data are available:
# RNA+ADT
multiplet_classification, multiplet_probability = Multiomics.composite_multiomics(RNA = "RNA.mtx", ADT = "ADT.mtx")
# RNA+ATAC
multiplet_classification, multiplet_probability = Multiomics.composite_multiomics(RNA = "RNA.mtx", ATAC = "ATAC.mtx")
# RNA+ADT+ATAC
multiplet_classification, multiplet_probability = Multiomics.composite_multiomics(RNA = "RNA.mtx", ADT = "ADT.mtx", ATAC = "ATAC.mtx")
The multiplet_classification variable contains the predicted multiplet label for each droplet, with "1" representing multiplet and "0" representing singlet.
The multiplet_probability variable contains the predicted probability that each droplet is a multiplet, leveraging the information across all the provided modalities. It quantifies the uncertainty of the multiplet prediction.
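Since multiplet_probability quantifies uncertainty, one possible use is to flag droplets whose probability is close to 0.5 for manual review. The sketch below uses synthetic values and assumes both outputs are array-like with one entry per droplet; the band of 0.3 to 0.7 is a hypothetical choice, not a COMPOSITE default.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the multiomics outputs (one entry per droplet).
multiplet_classification = np.array([0, 1, 0, 1, 0])
multiplet_probability = np.array([0.02, 0.97, 0.45, 0.60, 0.10])

summary = pd.DataFrame({
    "multiplet_classification": multiplet_classification,
    "multiplet_probability": multiplet_probability,
})

# Hypothetical band around 0.5 marking calls worth reviewing (inclusive bounds).
lower, upper = 0.3, 0.7
summary["uncertain"] = summary["multiplet_probability"].between(lower, upper)

print(summary)
```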
To save the multiplet classification result:
import pandas as pd
data = {'multiplet_classification': multiplet_classification}
data_file = pd.DataFrame(data)
data_file.index.name = 'index'
data_file.reset_index(inplace=True)
data_file.to_csv("Multiplet_prediction.csv",index=False)
We demonstrate how to use the COMPOSITE output to remove the predicted multiplets from the Seurat object: Eliminating multiplets.
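For users who stay in Python rather than Seurat, a comparable filtering step might look like the sketch below. It is not the linked tutorial's method: it uses a synthetic SciPy sparse matrix in the features-by-cells layout and assumes that the order of the labels matches the column order of the input matrix.

```python
import numpy as np
import scipy.sparse

# Synthetic features-by-cells count matrix: 3 features x 5 droplets.
counts = scipy.sparse.csc_matrix(np.arange(15).reshape(3, 5))

# Synthetic stand-in for the COMPOSITE labels (1 = multiplet, 0 = singlet).
multiplet_classification = np.array([0, 1, 0, 0, 1])

# Keep only the columns (droplets) predicted to be singlets.
singlet_mask = multiplet_classification == 0
singlet_counts = counts[:, singlet_mask]

print(singlet_counts.shape)
```

With an AnnData object, where cells are rows, the same boolean mask would instead be applied along the first axis.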