artic-network/beast-nf


BEAST-NF Pipeline

A Nextflow pipeline for Bayesian phylogenetic analysis with BEAST X, using an exponential growth coalescent model and producing a time-scaled tree visualization.

Overview

This pipeline performs:

  1. XML Generation: Creates the BEAST X input XML from an aligned FASTA file using beastgen with a user-specified template
  2. BEAST Analysis: Runs Bayesian MCMC phylogenetic inference
  3. Tree Annotation: Summarizes the posterior tree distribution using TreeAnnotator
  4. Visualization: Renders the time-scaled tree using Python and the Baltic library
  5. Report Generation: Creates an HTML report with the analysis results

Requirements

Software

  • Nextflow (≥21.04.0)
  • BEAST X (with beastgen and loganalyser utilities)
  • Python 3.9+
  • Biopython
  • Baltic
  • Matplotlib

Installation

Using Conda

conda create -n beast-nf -c conda-forge -c bioconda python=3.9 nextflow matplotlib biopython
conda activate beast-nf
pip install baltic
# Install BEAST X separately from https://beast.community/

Using Docker

The pipeline includes Docker profile support. Docker images will be pulled automatically.

Input Requirements

FASTA File Format

  • Aligned sequences in FASTA format
  • Sequence names must contain dates in one of these formats:
    • name|YYYY-MM-DD or name_YYYY-MM-DD (full date)
    • name|YYYY or name_YYYY (year only)

Example:

>sample1|2023-01-15
ATCGATCGATCG...
>sample2|2023-03-20
ATCGATCGATCG...
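
A quick pre-flight check can catch undated sequence names before a long run. The following is a minimal sketch that tests names against the four formats listed above; the function names are hypothetical, and beastgen's own date parsing may be more or less permissive than this pattern.

```python
import re
from datetime import date
from typing import Optional

# A '|' or '_' separator followed by YYYY-MM-DD or YYYY at the end of the name.
DATE_SUFFIX = re.compile(r"[|_](\d{4})(?:-(\d{2})-(\d{2}))?$")


def extract_date(name: str) -> Optional[str]:
    """Return the date suffix of a sequence name, or None if unsupported."""
    match = DATE_SUFFIX.search(name)
    if match is None:
        return None
    year, month, day = match.groups()
    if month is None:
        return year  # year-only format
    try:
        date(int(year), int(month), int(day))  # rejects e.g. 2023-13-40
    except ValueError:
        return None
    return f"{year}-{month}-{day}"


def undated_names(fasta_text: str) -> list[str]:
    """List FASTA header names that carry no recognisable date."""
    names = [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]
    return [name for name in names if extract_date(name) is None]
```

Calling undated_names() on the contents of the input file returns the headers that would need renaming; an empty list means every name matches one of the documented formats.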

Template File

  • BEAST XML template file for use with beastgen
  • Template should include variable placeholders using $(variable=default) syntax
  • See templates/exponential_growth.xml for an example

Available template variables:

  • chain_length - MCMC chain length
  • log_every - Logging frequency
  • screen_every - Screen output frequency
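
To make the $(variable=default) syntax concrete, here is a small sketch of how such placeholders could resolve: a supplied value wins, otherwise the default after the equals sign is used. This is an illustration only, not beastgen's implementation, and the XML attribute names in the usage example are made up.

```python
import re

# Matches $(name=default). beastgen's real parser may accept forms
# that this simple pattern does not.
PLACEHOLDER = re.compile(r"\$\((\w+)=([^)]*)\)")


def fill_template(text: str, values: dict[str, str]) -> str:
    """Replace each $(name=default) with values[name], else the default."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(2))), text)


# Hypothetical template fragment: only chain_length is overridden.
snippet = '<mcmc chainLength="$(chain_length=10000000)" logEvery="$(log_every=1000)"/>'
print(fill_template(snippet, {"chain_length": "50000000"}))
```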

Usage

Basic Usage

nextflow run main.nf \
    --input aligned_sequences.fasta \
    --template templates/exponential_growth.xml

With Custom Parameters

nextflow run main.nf \
    --input aligned_sequences.fasta \
    --template templates/exponential_growth.xml \
    --outdir results \
    --prefix my_analysis \
    --chain_length 50000000 \
    --burnin 10

Using Docker

nextflow run main.nf \
    --input aligned_sequences.fasta \
    --template templates/exponential_growth.xml \
    -profile docker

Using Conda

nextflow run main.nf \
    --input aligned_sequences.fasta \
    --template templates/exponential_growth.xml \
    -profile conda

On HPC with SLURM

nextflow run main.nf \
    --input aligned_sequences.fasta \
    --template templates/exponential_growth.xml \
    -profile slurm
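
When launching many runs from a script, the commands above can be assembled programmatically. The sketch below builds the argument list from a dictionary using only the flags shown in this README; build_command is a hypothetical helper and not part of the pipeline.

```python
import shlex
from typing import Optional


def build_command(params: dict[str, object], profile: Optional[str] = None) -> list[str]:
    """Assemble the argument list for `nextflow run main.nf`."""
    for required in ("input", "template"):
        if required not in params:
            raise ValueError(f"--{required} is required")
    command = ["nextflow", "run", "main.nf"]
    for name, value in params.items():
        command += [f"--{name}", str(value)]  # pipeline parameters use two dashes
    if profile is not None:
        command += ["-profile", profile]      # Nextflow options use one dash
    return command


command = build_command(
    {
        "input": "aligned_sequences.fasta",
        "template": "templates/exponential_growth.xml",
        "chain_length": 50000000,
    },
    profile="docker",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to start the run.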

Parameters

| Parameter | Default | Description |
|---|---|---|
| --input | (required) | Path to aligned FASTA file |
| --template | (required) | Path to BEAST XML template file |
| --outdir | results | Output directory |
| --prefix | beast_analysis | Prefix for output files |
| --chain_length | 10000000 | MCMC chain length |
| --log_every | 1000 | Logging interval |
| --screen_every | 10000 | Screen output interval |
| --burnin | 10 | Burnin percentage for TreeAnnotator |
| --max_cpus | 4 | Maximum CPUs for BEAST |
| --max_memory | 8.GB | Maximum memory for BEAST |
| --max_time | 48.h | Maximum runtime for BEAST |
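
The chain length, logging interval and burn-in together determine how many samples remain for the summary. The arithmetic can be sketched as follows, assuming that state 0 is logged, that trees are logged at the same interval as parameters, and that the burn-in percentage is rounded down; none of these details is confirmed by this README.

```python
def logged_samples(chain_length: int, log_every: int) -> int:
    """Number of logged states, counting the initial state 0 (assumption)."""
    return chain_length // log_every + 1


def burnin_samples(n_samples: int, burnin_percent: float) -> int:
    """Samples discarded for a percentage burn-in, rounded down (assumption)."""
    return int(n_samples * burnin_percent / 100)


total = logged_samples(10_000_000, 1_000)  # defaults for --chain_length and --log_every
discarded = burnin_samples(total, 10)      # default --burnin
retained = total - discarded
print(total, discarded, retained)
```

With the defaults this gives roughly 10,000 logged samples, of which about 9,000 are retained after burn-in.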

Model Configuration

The pipeline uses beastgen to generate BEAST XML files from templates. The provided template (templates/exponential_growth.xml) includes:

  • Substitution Model: HKY with estimated frequencies
  • Clock Model: Strict molecular clock
  • Tree Prior: Exponential growth coalescent
  • Tip Dates: Automatically parsed from sequence names by beastgen

Priors (in default template)

  • Population Size: 1/x prior
  • Growth Rate: Laplace distribution (μ=0, scale=30.7)
  • Kappa: Log-normal (mean=1.0, SD=1.25)
  • Clock Rate: Uniform (0, 1)

Creating Custom Templates

You can create your own BEAST XML templates for different models. Templates should:

  1. Use $(variable=default) syntax for replaceable parameters
  2. Include <data id="alignment".../> for sequence data
  3. Include a tip dates trait if needed

See the BEAST X documentation for template format details.

Output Structure

results/
├── beast_analysis_report.html          # Comprehensive HTML report
├── xml/
│   └── beast_analysis.xml              # BEAST input XML
├── beast/
│   ├── beast_analysis.log              # Parameter log
│   ├── beast_analysis.trees            # Sampled trees
│   └── beast_analysis.*.log            # Additional logs
├── trees/
│   └── beast_analysis.mcc.tree         # Maximum clade credibility tree
├── figures/
│   ├── beast_analysis_timetree.png     # Time tree visualization
│   └── beast_analysis_timetree.svg     # SVG version
├── pipeline_report.html                # Pipeline execution report
├── timeline.html                       # Execution timeline
├── trace.txt                           # Resource usage trace
└── dag.svg                             # Pipeline DAG
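
After a run, a short script can confirm that the main outputs in the tree above were produced. This is a sketch only: it assumes the --prefix value replaces beast_analysis in every file name, which the layout above suggests but does not state, and missing_outputs is a hypothetical helper.

```python
from pathlib import Path


def expected_outputs(prefix: str = "beast_analysis") -> list[str]:
    """Relative paths from the output tree, with the prefix substituted."""
    return [
        f"{prefix}_report.html",
        f"xml/{prefix}.xml",
        f"beast/{prefix}.log",
        f"beast/{prefix}.trees",
        f"trees/{prefix}.mcc.tree",
        f"figures/{prefix}_timetree.png",
        f"figures/{prefix}_timetree.svg",
    ]


def missing_outputs(outdir: str = "results", prefix: str = "beast_analysis") -> list[str]:
    """Return the expected files that do not exist under outdir."""
    root = Path(outdir)
    return [rel for rel in expected_outputs(prefix) if not (root / rel).is_file()]


print(missing_outputs())  # an empty list means every expected file is present
```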

HTML Report

The pipeline generates a comprehensive HTML report (beast_analysis_report.html) that includes:

  • Input Data Summary: Number of taxa, sequence length, template used
  • Taxa Table: List of all taxa with sampling dates (shown only for datasets with fewer than 50 taxa)
  • Analysis Details: Chain length, logging frequency, burn-in, runtime
  • Parameter Estimates: Complete table from loganalyser with:
    • Mean, standard error, median
    • 95% HPD intervals
    • ESS values with quality indicators (Good/Fair/Low)
  • Tree Visualization: Embedded SVG of the time-scaled MCC tree

Open the report in any web browser to view all results in one place.

Workflow

graph LR
    A[FASTA File] --> B[Generate XML]
    C[Template] --> B
    B --> D[Run BEAST]
    D --> E[TreeAnnotator]
    E --> F[Visualize Tree]
    D --> G[Generate Report]
    F --> G
    A --> G
    C --> G
    G --> H[HTML Report]
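
The same dependencies can be written as a mapping and sorted to show one valid execution order. This sketch uses Python's standard graphlib module (Python 3.9+) and only illustrates the graph above; Nextflow derives the actual order from the channels in main.nf.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the steps or inputs it depends on (edges from the graph above).
DEPENDENCIES = {
    "Generate XML": {"FASTA File", "Template"},
    "Run BEAST": {"Generate XML"},
    "TreeAnnotator": {"Run BEAST"},
    "Visualize Tree": {"TreeAnnotator"},
    "Generate Report": {"Run BEAST", "Visualize Tree", "FASTA File", "Template"},
    "HTML Report": {"Generate Report"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(" -> ".join(order))
```

Because the report depends on both the BEAST run and the tree figure, it always comes after tree visualization in any valid order.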

Example Analysis

1. Prepare your data

# Your aligned sequences with dates in names
head aligned_sequences.fasta
>virus1|2023-01-15
ATCGATCG...
>virus2|2023-02-20
ATCGATCG...

2. Run the pipeline

nextflow run main.nf \
    --input aligned_sequences.fasta \
    --template templates/exponential_growth.xml \
    --chain_length 50000000

3. Check results

# View HTML report in browser (open is the macOS command; use xdg-open on Linux)
open results/beast_analysis_report.html

# Or view individual files
cat results/beast/beast_analysis.log
cat results/trees/beast_analysis.mcc.tree
open results/figures/beast_analysis_timetree.png
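
For a quick look at the parameter log without opening Tracer, the file can be read with the standard library. The sketch below assumes the usual BEAST log layout (tab-delimited, with comment lines starting with #) and uses a hypothetical column name; check the header of your own log for the real names.

```python
import csv
import io
from statistics import mean


def read_log(handle) -> dict[str, list[float]]:
    """Read a tab-delimited log into columns, skipping '#' comments and blank lines."""
    lines = (line for line in handle if line.strip() and not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    columns: dict[str, list[float]] = {name: [] for name in header}
    for row in rows:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns


def posterior_mean(values: list[float], burnin_percent: float = 10) -> float:
    """Mean after discarding the first burnin_percent of the samples."""
    start = int(len(values) * burnin_percent / 100)
    return mean(values[start:])


# Tiny in-memory example; "clock.rate" is a hypothetical column name.
example = io.StringIO(
    "# comment lines like this are skipped\n"
    "state\tclock.rate\n"
    "0\t0.004\n1000\t0.002\n2000\t0.003\n3000\t0.001\n"
)
columns = read_log(example)
print(posterior_mean(columns["clock.rate"], burnin_percent=25))
```

For a real run, replace the in-memory example with open("results/beast/beast_analysis.log").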

Monitoring

View pipeline progress:

# In terminal
tail -f .nextflow.log

# After completion
open results/pipeline_report.html

Troubleshooting

Date Parsing Issues

If dates aren't recognized, check sequence names match supported formats:

  • name|YYYY-MM-DD
  • name_YYYY-MM-DD
  • name|YYYY
  • name_YYYY

Memory Issues

Increase memory for BEAST:

nextflow run main.nf \
    --input data.fasta \
    --template templates/exponential_growth.xml \
    --max_memory 16.GB

Long Runtime

For faster testing, reduce chain length:

nextflow run main.nf \
    --input data.fasta \
    --template templates/exponential_growth.xml \
    --chain_length 1000000

BEAST X or beastgen Not Found

Ensure BEAST X is installed and tools are in PATH:

beastgen -version
beast -version
treeannotator -version
loganalyser -version

Low ESS Values in Report

If the HTML report shows low ESS (Effective Sample Size) values:

  • Increase chain length: --chain_length 50000000
  • Check for convergence issues in Tracer
  • Consider adjusting operators in the template
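
As a rough guide to how much longer a chain may need to be, ESS can be treated as growing in proportion to chain length once the chain has converged. The sketch below applies that heuristic. The 200 and 100 thresholds are assumptions chosen for illustration (200 is a commonly used rule of thumb); the report's actual Good/Fair/Low cut-offs are not documented here.

```python
import math

# Assumed thresholds, not taken from the pipeline.
GOOD_ESS = 200.0
FAIR_ESS = 100.0


def ess_quality(ess: float) -> str:
    """Label an ESS value using the assumed thresholds above."""
    if ess >= GOOD_ESS:
        return "Good"
    if ess >= FAIR_ESS:
        return "Fair"
    return "Low"


def suggested_chain_length(current_length: int, current_ess: float, target: float = GOOD_ESS) -> int:
    """Scale the chain length linearly so the ESS would reach the target."""
    if current_ess >= target:
        return current_length
    return math.ceil(current_length * target / current_ess)


print(ess_quality(50), suggested_chain_length(10_000_000, 50))
```

If the suggested length is impractically long, poor mixing is the more likely cause, and the operator and convergence checks listed above are the better next step.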

Advanced Usage

Custom Templates

Create your own BEAST XML template with different models:

# Copy and modify the example template
cp templates/exponential_growth.xml templates/my_model.xml
# Edit my_model.xml to change substitution model, tree prior, etc.

# Run with custom template
nextflow run main.nf \
    --input data.fasta \
    --template templates/my_model.xml

Template Variables

Templates can use these variables (passed via beastgen):

  • $(chain_length=10000000) - MCMC chain length
  • $(log_every=1000) - Logging frequency
  • $(screen_every=10000) - Screen output frequency

Custom BEAST Parameters

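To pass additional parameters to beastgen, modify the GENERATE_XML process in main.nf to add more -D flags. Here my_parameter is a placeholder name, and the doubled backslashes are string escapes inside the Nextflow script block that become single shell line continuations: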

beastgen \\
    -D chain_length=${params.chain_length} \\
    -D my_parameter=${params.my_parameter} \\
    ${template} \\
    ${fasta} \\
    ${params.prefix}.xml

Custom Visualization

Edit the visualization script (bin/visualize_tree.py) to customize:

  • Tree layout
  • Color schemes
  • Node annotations
  • Figure dimensions

Citation

If you use this pipeline, please cite:

  • BEAST: Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7:214.
  • BEAST X: Suchard MA, et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution 4(1): vey016.
  • Nextflow: Di Tommaso et al. (2017) Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319.
  • Baltic: https://github.com/evogytis/baltic

License

MIT License

Support

For issues and questions:

  • Create an issue on GitHub
  • Contact: ARTIC Network

Acknowledgments

Developed for the ARTIC Network phylogenetic analysis workflows.
