COMPOSITE

COMPOSITE (COMpound POiSson multIplet deTEction model) is a computational tool for multiplet detection in both single-cell single-omics and multiomics settings. It has been implemented as an automated pipeline and is available as both a cloud-based application with a user-friendly interface and a Python package.

Data preparation

COMPOSITE accepts up to three Matrix Market (.mtx) files as input, one per modality. In each matrix, rows are features and columns are cells. RNA.mtx and ADT.mtx are raw count matrices; ATAC.mtx requires some preprocessing, as outlined below.

For Python users

As noted above, RNA.mtx and ADT.mtx are raw count matrices. The following example saves the count matrix of a Scanpy object to an .mtx file.

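import scipy.io
import scipy.sparse

# adata is a scanpy.AnnData object.
# adata.X is cells x features; the input format described above is
# features x cells, so check the orientation before writing.
sparse_X = scipy.sparse.coo_matrix(adata.X)
scipy.io.mmwrite('PATH_OF_YOUR_MTX_FILE', sparse_X)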
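As a hedged illustration of the orientation requirement (rows are features, columns are cells), the sketch below writes a small synthetic count matrix and reads it back to confirm its shape on disk. It assumes the starting matrix is cells x features, as in AnnData, and therefore transposes before writing. The helper name `write_features_by_cells` and the synthetic counts are assumptions for illustration and are not part of COMPOSITE.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse


def write_features_by_cells(cells_by_features, path):
    """Write a cells x features count matrix as a features x cells .mtx file."""
    # Transpose so that rows are features and columns are cells.
    features_by_cells = scipy.sparse.coo_matrix(cells_by_features).T
    scipy.io.mmwrite(path, features_by_cells)
    return features_by_cells.shape


# Synthetic counts standing in for adata.X: 5 cells x 3 features.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(5, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    mtx_path = os.path.join(tmp_dir, "RNA.mtx")
    written_shape = write_features_by_cells(counts, mtx_path)
    # Read the file back to confirm the orientation on disk.
    read_back = scipy.io.mmread(mtx_path)
    read_shape = read_back.shape
    round_trip_ok = bool((read_back.toarray() == counts.T).all())

print("Shape on disk (features, cells):", read_shape)
```

Reading the file back with `scipy.io.mmread` is a cheap way to catch a transposed matrix before running the model.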

ATAC.mtx is preprocessed with the GeneActivity function from Signac. Signac is an R package, so we provide a ready-to-use script, GeneActivity.R. Run:

Rscript GeneActivity.R TARGET_PATH PATH_TO_ATAC_H5_FILE PATH_TO_ATAC_METADATA PATH_TO_FRAGMENT

The script takes four path arguments, in the order shown. The preprocessed ATAC.mtx is written under TARGET_PATH.

For R (Seurat) users

To prepare the data from a Seurat object, see the tutorial "Preparing data for COMPOSITE".

Demo datasets

We also provide readily available demo datasets that can be used directly as input: RNA.mtx, ADT.mtx, and ATAC.mtx.

Running COMPOSITE

Option 1: Cloud-based web app

Users may directly upload the data files to the COMPOSITE cloud-based app. The results will be sent to the provided email address as a .csv file.

Option 2: Install the Python package

Installation:

pip install sccomposite==1.0.0

Store the RNA, ADT, and ATAC data as "RNA.mtx", "ADT.mtx", and "ATAC.mtx", respectively, in the working directory. Then import the sccomposite package.

import sccomposite
from sccomposite import RNA_modality
from sccomposite import ADT_modality
from sccomposite import ATAC_modality
from sccomposite import Multiomics

We recommend running COMPOSITE with the default parameter settings. COMPOSITE is a robust statistical model, and the defaults are suitable for most cases; all results in our manuscript were generated with them. We also recommend using all available data modalities as input.
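To see which modalities are available before choosing a function, a small helper along the following lines may be useful. This is a minimal sketch that only checks for the three file names used in this README; `find_available_modalities` is a hypothetical helper, not part of the sccomposite package.

```python
from pathlib import Path


def find_available_modalities(working_dir="."):
    """Return a dict mapping modality name to the path of its .mtx file, if present."""
    expected = {"RNA": "RNA.mtx", "ADT": "ADT.mtx", "ATAC": "ATAC.mtx"}
    found = {}
    for modality, filename in expected.items():
        candidate = Path(working_dir) / filename
        if candidate.is_file():
            found[modality] = str(candidate)
    return found


available = find_available_modalities()
print("Modalities found:", sorted(available))
```

The keys match the keyword names used in the multiomics examples in this README (RNA, ADT, ATAC), so the result shows at a glance which of the documented combinations applies to your data.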

When only one modality of data is available:

# RNA modality only
multiplet_classification, consistency = RNA_modality.composite_rna("RNA.mtx")

# ADT modality only
multiplet_classification, consistency = ADT_modality.composite_adt("ADT.mtx")

# ATAC modality only
multiplet_classification, consistency = ATAC_modality.composite_atac("ATAC.mtx")

The multiplet_classification variable contains the predicted multiplet label for each droplet, with "1" representing multiplet and "0" representing singlet.

The consistency variable contains the droplet-specific modality consistency. A higher value of consistency indicates the data in the corresponding modality are less noisy for the given droplet, resulting in a more reliable multiplet prediction result for the droplet.
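The two outputs can be summarized with a few lines of NumPy. The sketch below uses placeholder values in place of real COMPOSITE output and assumes both outputs can be converted with `np.asarray`. The lowest-quartile cutoff is an arbitrary choice for illustration, not a threshold recommended by COMPOSITE.

```python
import numpy as np

# Placeholder values standing in for the outputs of composite_rna();
# real outputs have one entry per droplet.
multiplet_classification = np.array([0, 1, 0, 0, 1, 0])
consistency = np.array([0.92, 0.41, 0.88, 0.35, 0.79, 0.95])

labels = np.asarray(multiplet_classification).astype(int)
scores = np.asarray(consistency, dtype=float)

# Fraction of droplets predicted to be multiplets.
multiplet_rate = labels.mean()
print(f"Predicted multiplets: {labels.sum()} of {labels.size} ({multiplet_rate:.1%})")

# Hypothetical cutoff: flag droplets whose consistency is in the lowest quartile.
cutoff = np.quantile(scores, 0.25)
low_consistency = np.flatnonzero(scores <= cutoff)
print("Droplets with the least reliable predictions:", low_consistency.tolist())
```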

When multiomics data are available:

# RNA+ADT
multiplet_classification, multiplet_probability = Multiomics.composite_multiomics(RNA="RNA.mtx", ADT="ADT.mtx")

# RNA+ATAC
multiplet_classification, multiplet_probability = Multiomics.composite_multiomics(RNA="RNA.mtx", ATAC="ATAC.mtx")

# RNA+ADT+ATAC
multiplet_classification, multiplet_probability = Multiomics.composite_multiomics(RNA="RNA.mtx", ADT="ADT.mtx", ATAC="ATAC.mtx")

The multiplet_classification variable contains the predicted multiplet label for each droplet, with "1" representing multiplet and "0" representing singlet.

The multiplet_probability variable contains the predicted probability for each droplet to be multiplet, leveraging the information across all the provided modalities. It quantifies the uncertainty of multiplet prediction results.
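One possible use of the probabilities is to single out droplets whose prediction is uncertain. The sketch below uses placeholder values and a hypothetical margin of 0.1 around 0.5; it does not reproduce how COMPOSITE derives multiplet_classification and is not a replacement for it.

```python
import numpy as np

# Placeholder probabilities standing in for the output of composite_multiomics().
multiplet_probability = np.array([0.02, 0.97, 0.48, 0.10, 0.55, 0.99])
probs = np.asarray(multiplet_probability, dtype=float)

# Hypothetical margin: treat predictions within 0.1 of 0.5 as uncertain.
margin = 0.1
uncertain = np.flatnonzero(np.abs(probs - 0.5) <= margin)

# Droplets predicted to be multiplets with probability clearly above 0.5.
confident_multiplets = np.flatnonzero(probs >= 0.5 + margin)

print("Uncertain droplets:", uncertain.tolist())
print("Confident multiplets:", confident_multiplets.tolist())
```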

To save the multiplet classification result:

import pandas as pd
data = {'multiplet_classification': multiplet_classification}

data_file = pd.DataFrame(data)
data_file.index.name = 'index'
data_file.reset_index(inplace=True)
data_file.to_csv("Multiplet_prediction.csv",index=False)
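If cell barcodes are available, it can be convenient to store them alongside the predictions so that singlets can be selected by name later. The following is a sketch under assumptions: the barcode list, its ordering (the same as the columns of the input matrix), the placeholder values, and the output file name are all hypothetical.

```python
import pandas as pd

# Placeholder inputs; in practice these come from COMPOSITE and from the
# barcode list that accompanies the count matrix (same order as its columns).
barcodes = ["AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"]
multiplet_classification = [0, 1, 0, 0]
multiplet_probability = [0.03, 0.96, 0.12, 0.40]

results = pd.DataFrame(
    {
        "barcode": barcodes,
        "multiplet_classification": multiplet_classification,
        "multiplet_probability": multiplet_probability,
    }
)
results.to_csv("Multiplet_prediction_with_barcodes.csv", index=False)

# Keep only the barcodes of droplets predicted to be singlets (label 0).
is_singlet = results["multiplet_classification"] == 0
singlet_barcodes = results.loc[is_singlet, "barcode"].tolist()
print(f"{len(singlet_barcodes)} singlets retained")
```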

Using COMPOSITE output in R

To remove the predicted multiplets from a Seurat object using the COMPOSITE output, see the tutorial "Eliminating multiplets".
