Contrastive Representation Learning for Single Cell Phenotyping in Whole Slide Imaging of Enrichment-free Liquid Biopsy
Documentation: https://csi-cancer.github.io/deep_phenotyping/
This work develops a deep contrastive learning framework for identifying and stratifying single cells in whole slide immunofluorescence microscopy images derived from liquid biopsies.
Tumor-associated cells obtained from liquid biopsies hold promise for cancer detection, diagnosis, prognosis, and monitoring. However, their rarity, heterogeneity, and plasticity pose challenges for precise identification and characterization, particularly in clinical contexts.
*Figure: Overview of the work and a gallery of cells found in enrichment-free liquid biopsies.*

*Figure: Model training schema.*
- **Robust Cell Identification:** Introduces a deep contrastive learning framework to robustly identify and classify circulating cells from whole slide images (a generic loss sketch follows this list).
- **High Classification Accuracy:** Demonstrates high accuracy (92.64%) in classifying diverse cell phenotypes.
- **Enhanced Downstream Performance:** Improves the performance of downstream tasks, including outlier detection and clustering.
- **Automated Rare Cell Identification:** Enables automated identification and enumeration of distinct rare cell phenotypes, achieving:
  - An average F1-score of 0.93 across cell lines mimicking circulating tumor cells and endothelial cells.
  - An average F1-score of 0.858 across circulating tumor cell (CTC) phenotypes in clinical samples.
- **Scalable Analysis Pipeline:** Provides a scalable analysis pipeline for tumor-associated cellular biomarkers, facilitating clinical prognosis and personalized treatment strategies.
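For readers new to contrastive learning, the sketch below shows a standard NT-Xent (normalized temperature-scaled cross-entropy) loss of the kind commonly used in such frameworks. It is a minimal, generic illustration, not the exact objective implemented in this repository.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Generic NT-Xent contrastive loss (illustrative, not this repo's code).

    z1, z2: (N, D) embeddings of two augmented views of the same N cells.
    Each view's positive is its counterpart; all other samples are negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # never match a sample to itself
    # The positive for row i is its other view at index (i + N) mod 2N.
    targets = torch.arange(2 * n, device=z.device).roll(n)
    return F.cross_entropy(sim, targets)
```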
**Installation**

1. Clone this repository.
2. Create the conda environment and install the dependencies:

```bash
conda create -n deep_phenotyping python=3.9.7
conda activate deep_phenotyping
pip install -r requirements.txt
```
**Known installation error:** `datrie` (required by `snakemake`) fails to build on some Linux systems that lack a C compiler (gcc). Fix:

```bash
sudo apt update
sudo apt install build-essential python3-dev
```

Then repeat the installation steps above.
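As a quick sanity check that the build issue is resolved, import `snakemake` from inside the activated environment; a clean import implies its `datrie` dependency compiled successfully:

```python
# Post-install sanity check: importing snakemake exercises its datrie
# dependency, so a clean import confirms the compiler fix worked.
import snakemake
print("snakemake", snakemake.__version__)
```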
**Usage**

a. Run model training (`train_cl.py`):

```bash
python train_cl.py --config config/config.yml --sweep_config config/sweep_config.yml
```
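Both files are ordinary YAML, so you can inspect or tweak the settings before launching a run. A minimal sketch, assuming PyYAML is installed; the actual key names are repository-specific:

```python
# Inspect the training and sweep configurations before launching a run.
# Assumes PyYAML; the top-level keys printed depend on this repo's configs.
import yaml

with open("config/config.yml") as f:
    config = yaml.safe_load(f)
with open("config/sweep_config.yml") as f:
    sweep_config = yaml.safe_load(f)

print(sorted(config))        # e.g. model, optimizer, augmentation settings
print(sorted(sweep_config))  # hyperparameter ranges for the sweep
```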
b. Run the pipeline from the terminal:

```bash
pipeline/run.sh
```
This code was developed on Linux and has only been tested thoroughly on Ubuntu 22.04.
**Repository Structure**

```
project_root/
├── figures/      # Notebooks and scripts for figure generation and analysis
├── pipeline/     # Pipeline for running the model on whole slide images
├── src/          # Core code for representation learning and classification
├── train_data/   # Datasets used for training the models
└── README.md     # Project documentation
```
**figures/**

Contains subdirectories with notebooks and scripts to generate the figures for the manuscript. Each subdirectory corresponds to a figure and may contain:

- Data folders: processed data used to generate specific plots (available upon request).
- Jupyter notebooks: scripts to generate the figures.
- Output files: PDF and image files of the figures.
- `figure2/`: Classification performance (e.g., confusion matrices, precision-recall curves).
- `figure3_blur/`: Blur robustness analysis.
- `figure4_clustering/`: Clustering performance visualization.
- `figure5_outlierdetection/`: Outlier detection evaluation.
- `figure6_spikein/`: Spike-in experiments and analysis.
- `figure7_patient/`: Patient-specific analysis and figures.
- `segmentation_test/`: Testing segmentation algorithms.
**pipeline/**

Contains scripts and configurations to process whole slide images through the deep learning model.

- `config/`: Configuration files for model and pipeline settings.
- `metadata/`: Input data descriptions and sample metadata, created when running the utility scripts.
- `model_weights/`: Pretrained model weights (available upon request).
- `output/`: Output of pipeline runs, separated by patient and spike-in experiments; created when the pipeline runs.
- `src/`: Scripts to execute the pipeline (e.g., `pipeline.py`, `combo_pipeline.py`).
- `utils_scripts/`: Auxiliary scripts, such as for collecting slide metadata.
- `run.sh`, `runcombo.sh`, `run_wbc.sh`: Scripts to execute the pipeline with various configurations.
- `Snakefile`, `SnakefileCombo`, `SnakefileWBC`: Snakemake workflows for different data types (a generic rule sketch follows this list).
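To show the shape of such a workflow, here is a minimal Snakemake rule. It is purely illustrative: the rule name, file paths, and command-line flags are hypothetical and are not taken from the repository's actual Snakefiles.

```
# Hypothetical Snakemake rule, for illustration only. The rule name, paths,
# and flags are invented; see the repository's Snakefiles for the real rules.
rule classify_slide:
    input:
        "metadata/{slide_id}.csv"
    output:
        "output/{slide_id}/predictions.csv"
    shell:
        "python src/pipeline.py --input {input} --output {output}"
```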
**src/**

This directory houses the core scripts for both representation learning and leukocyte classification.

- `leukocyte_classifier/`: Contains scripts for classifying leukocytes and related cell types.
  - Scripts:
    - `train_wbc.py`: Trains the leukocyte classifier.
    - `wbc_classifier.py`: Main classifier model architecture.
    - `data_loader.py`: Data handling and loading functions.
    - `cl_transforms.py`: Data augmentation and transformation functions for contrastive learning (a generic example follows this list).
  - Configuration:
    - `config/config.yml`: Main configuration file for contrastive learning.
    - `config/sweep_config.yml`: Configuration for hyperparameter sweeps.
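Contrastive transforms typically produce two independently augmented views of each image. The wrapper below is a generic sketch of that pattern, not the actual code in `cl_transforms.py`; the crop size and flip choices are assumptions for small single-cell crops.

```python
# Generic two-view augmentation wrapper for contrastive learning.
# Illustrative only; not the actual transforms in cl_transforms.py.
import torchvision.transforms as T

class TwoCropTransform:
    """Return two independently augmented views of the same cell image."""
    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, image):
        return self.base_transform(image), self.base_transform(image)

base = T.Compose([
    T.RandomResizedCrop(64),   # crop size is a placeholder for small cell crops
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),    # orientation is arbitrary in microscopy images
    T.ToTensor(),
])
two_view = TwoCropTransform(base)  # pass as a dataset's transform
```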
- `utils/`: Shared utility functions.

**train_data/**

Contains the datasets used for training the models, including representation learning and leukocyte classification. This data is available upon request.

- `representation_learning/`: Training data for the contrastive learning model.
- `wbc_classifier/`: Data specific to white blood cell classification.
This project is licensed under the Apache License. See the LICENSE file for details.
For questions or collaboration, please contact the project maintainers through the GitHub repository's issue tracker.