TreeHarmonizer is a utility that places called variants onto a pre-existing phylogenetic tree, allowing variant trajectories and evolutionary progression to be visualized. It was developed with single nucleotide (SNV), structural (SV), and copy number (CNA) variants in mind, and it supports placement of each of these variant types.
As of version 0.1.0, TreeHarmonizer is provided as a Jupyter notebook designed for the paper "Long-read sequencing of single cell-derived melanoma subclones reveals divergent and parallel genomic and epigenomic evolutionary trajectories", by Liu & Goretsky, et al.
The preprint is available on bioRxiv.
Unaligned BAM files are available on NCBI SRA.
Further project files are available on Zenodo.
TreeHarmonizer is under active development, with the goal of becoming a standalone utility that works with a range of variant-calling inputs and trees.
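To make the idea of "placing a variant on a tree" concrete, here is a minimal sketch in plain Python. It assumes one simple placement rule: a variant is assigned to the deepest node that is an ancestor of every sample carrying it. This rule, the toy tree, the sample names, and the `place_variant` helper are all illustrative assumptions; they are not taken from the TreeHarmonizer notebook and may differ from what it actually does.

```python
# Hypothetical toy tree stored as child -> parent (None marks the root).
PARENT = {
    "root": None,
    "n1": "root",
    "n2": "root",
    "A": "n1",
    "B": "n1",
    "C": "n2",
    "D": "n2",
}


def path_to_root(node, parent):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def place_variant(carriers, parent):
    """Return the deepest node that is an ancestor of (or equal to) every carrier."""
    carriers = list(carriers)
    if not carriers:
        raise ValueError("a variant needs at least one carrier sample")
    first_path = path_to_root(carriers[0], parent)
    shared = set(first_path)
    for leaf in carriers[1:]:
        shared &= set(path_to_root(leaf, parent))
    # first_path runs leaf -> root, so the first shared node is the deepest one.
    for node in first_path:
        if node in shared:
            return node


# Hypothetical variant calls: variant id -> samples in which it was called.
calls = {
    "snv_1": ["A", "B"],
    "sv_1": ["C"],
    "cna_1": ["A", "D"],
}
placements = {vid: place_variant(samples, PARENT) for vid, samples in calls.items()}
print(placements)
```

In this toy example, a variant shared by sister samples lands on their common parent, a private variant stays on its leaf, and a variant seen across both clades falls back to the root.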
The easiest way to install TreeHarmonizer is with the provided conda environment YAML files.
For the notebook version, use notebook_environment.yml:
git clone git@github.com:KolmogorovLab/TreeHarmonizer.git
cd TreeHarmonizer
conda env create -f notebook_environment.yml -n tree_harmonizer_nb
conda activate tree_harmonizer_nb
If you would like to create the environment manually, TreeHarmonizer requires a Python 3.6.15 environment with the following packages:
- pandas (v1.1.5)
- intervaltree (v3.1.0) (https://pypi.org/project/intervaltree/)
- ete3 (v3.1.3) (Can be installed with conda - https://etetoolkit.org/download/)
- ete_toolchain (v3.0.0)
- jupyter
The versions listed are those that were tested together and found to be stable.
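If you build the environment by hand, a quick check like the one below can confirm that the installed versions match the ones listed above. This is a minimal sketch and not part of TreeHarmonizer: the `installed_version` and `check_versions` helpers are hypothetical names, and the check assumes the distribution names are `pandas`, `intervaltree`, and `ete3`. It skips `ete_toolchain` and `jupyter`, since the former may not be visible as a Python distribution and the latter has no pinned version.

```python
# Versions listed in this README for a manual install.
EXPECTED = {
    "pandas": "1.1.5",
    "intervaltree": "3.1.0",
    "ete3": "3.1.3",
}


def installed_version(name):
    """Return the installed version of distribution `name`, or None if missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Fallback for Python 3.6/3.7; pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    import sys

    print("Python", sys.version.split()[0])
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print("{}: expected {}, found {}".format(name, wanted, found or "not installed"))
```

Run it inside the activated environment; no output after the Python version line means every checked package matches.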
An example dataset is provided, and the notebook's input parameters are pre-populated with its relative paths. This dataset contains only chromosome 1 for the 23 sublines used in "Long-read sequencing of single cell-derived melanoma subclones reveals divergent and parallel genomic and epigenomic evolutionary trajectories", by Liu & Goretsky, et al. The full dataset can be found on Zenodo. All cells can be executed without any changes.
- Expected runtime on the chr1 example dataset: ~2 minutes
- Expected runtime on the entire mouse melanoma cell line dataset: ~10 minutes

Actual runtimes are likely to vary.
- Planned: a standalone package version of TreeHarmonizer.
- Planned: TreeHarmonizer is being updated to work with ete4, which will allow Python versions newer than 3.6 when used with Jupyter.
For bug reports, help, and advice, please submit an [issue]. You may also contact the primary developer at [[email protected]].