A Nextflow pipeline for Bayesian phylogenetic analysis using BEAST X with an exponential growth coalescent model and time-scaled tree visualization.
This pipeline performs:
- XML Generation: Creates BEAST X input XML from aligned FASTA using beastgen with a user-specified template
- BEAST Analysis: Runs Bayesian MCMC phylogenetic inference
- Tree Annotation: Summarizes posterior tree distribution using TreeAnnotator
- Visualization: Renders a time-scaled tree using Python and the Baltic library
- Report Generation: Creates comprehensive HTML report with analysis results
- Nextflow (≥21.04.0)
- BEAST X (with beastgen and loganalyser utilities)
- Python 3.9+
- Biopython
- Baltic
- Matplotlib
conda create -n beast-nf -c conda-forge -c bioconda python=3.9 nextflow matplotlib biopython
conda activate beast-nf
pip install baltic
# Install BEAST X separately from https://www.beast2.org/

The pipeline includes Docker profile support. Docker images will be pulled automatically.
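For a local install without Docker, one way to confirm that the required command-line tools are reachable is a short PATH check. The sketch below is illustrative only: the tool names come from this README, and `find_missing_tools` is a hypothetical helper that is not part of the pipeline.

```python
import shutil

# Tool names taken from the requirements and troubleshooting commands in this README.
REQUIRED_TOOLS = ["nextflow", "beastgen", "beast", "treeannotator", "loganalyser"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

An empty result means every tool resolved to an executable; it does not verify versions.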
- Aligned sequences in FASTA format
- Sequence names must contain dates in one of these formats:
  - name|YYYY-MM-DD or name_YYYY-MM-DD (full date)
  - name|YYYY or name_YYYY (year only)
Example:
>sample1|2023-01-15
ATCGATCGATCG...
>sample2|2023-03-20
ATCGATCGATCG...
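To check sequence names before a run, the supported formats can be approximated with a regular expression. This is a minimal sketch, not beastgen's own date parser: it assumes the date is the last field of the name, and `extract_date` is a hypothetical helper. It checks only the shape of the date, not whether it is a valid calendar day.

```python
import re
from typing import Optional

# Matches a trailing |YYYY-MM-DD, _YYYY-MM-DD, |YYYY or _YYYY.
# Assumption: the date is the final field of the sequence name.
DATE_SUFFIX = re.compile(r"[|_](\d{4})(?:-(\d{2})-(\d{2}))?$")


def extract_date(name: str) -> Optional[str]:
    """Return the date string found at the end of a sequence name, or None."""
    match = DATE_SUFFIX.search(name.strip().lstrip(">"))
    if match is None:
        return None
    year, month, day = match.groups()
    # Full dates keep all three parts; year-only names return just the year.
    return f"{year}-{month}-{day}" if month else year
```

For example, `extract_date(">sample1|2023-01-15")` gives `"2023-01-15"`, and a name without a recognizable date gives `None`.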
- BEAST XML template file for use with beastgen
- Template should include variable placeholders using $(variable=default) syntax
- See templates/exponential_growth.xml for an example
Available template variables:
- chain_length: MCMC chain length
- log_every: Logging frequency
- screen_every: Screen output frequency
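The `$(variable=default)` placeholder behaviour described above can be illustrated in a few lines of Python. This is a sketch of the semantics only (use the supplied value, otherwise fall back to the default). It is not how beastgen is implemented, and both `fill_template` and the sample fragment are made up for the example.

```python
import re

# $(name=default) as described in this README; illustration only.
PLACEHOLDER = re.compile(r"\$\((\w+)=([^)]*)\)")


def fill_template(text: str, overrides: dict) -> str:
    """Replace each $(name=default) with overrides[name], or with the default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        return str(overrides.get(name, default))
    return PLACEHOLDER.sub(replace, text)


# Hypothetical template fragment, not copied from exponential_growth.xml.
snippet = 'chainLength="$(chain_length=10000000)" logEvery="$(log_every=1000)"'
print(fill_template(snippet, {"chain_length": 50000000}))
```

Here `chain_length` is overridden while `log_every` keeps its default of 1000.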
nextflow run main.nf \
--input aligned_sequences.fasta \
--template templates/exponential_growth.xml

nextflow run main.nf \
--input aligned_sequences.fasta \
--template templates/exponential_growth.xml \
--outdir results \
--prefix my_analysis \
--chain_length 50000000 \
--burnin 10

nextflow run main.nf \
--input aligned_sequences.fasta \
--template templates/exponential_growth.xml \
-profile docker

nextflow run main.nf \
--input aligned_sequences.fasta \
--template templates/exponential_growth.xml \
-profile conda

nextflow run main.nf \
--input aligned_sequences.fasta \
--template templates/exponential_growth.xml \
-profile slurm

| Parameter | Default | Description |
|---|---|---|
| --input | (required) | Path to aligned FASTA file |
| --template | (required) | Path to BEAST XML template file |
| --outdir | results | Output directory |
| --prefix | beast_analysis | Prefix for output files |
| --chain_length | 10000000 | MCMC chain length |
| --log_every | 1000 | Logging interval |
| --screen_every | 10000 | Screen output interval |
| --burnin | 10 | Burn-in percentage for TreeAnnotator |
| --max_cpus | 4 | Maximum CPUs for BEAST |
| --max_memory | 8.GB | Maximum memory for BEAST |
| --max_time | 48.h | Maximum runtime for BEAST |

The pipeline uses beastgen to generate BEAST XML files from templates. The provided template (templates/exponential_growth.xml) includes:
- Substitution Model: HKY with estimated frequencies
- Clock Model: Strict molecular clock
- Tree Prior: Exponential growth coalescent
- Tip Dates: Automatically parsed from sequence names by beastgen
- Population Size: 1/x prior
- Growth Rate: Laplace distribution (μ=0, scale=30.7)
- Kappa: Log-normal (mean=1.0, SD=1.25)
- Clock Rate: Uniform (0, 1)
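
For intuition about the tree prior and the growth-rate prior, the sketch below evaluates the standard exponential growth demographic function and the Laplace log-density with the parameters listed above. It assumes the usual coalescent parameterization, N(t) = N0 · exp(-r·t) with t measured backwards from the most recent sample. The function names are hypothetical and nothing here is read from the template itself.

```python
import math


def laplace_log_density(x: float, mu: float = 0.0, scale: float = 30.7) -> float:
    """Log-density of a Laplace(mu, scale) distribution at x."""
    return -math.log(2.0 * scale) - abs(x - mu) / scale


def population_size(t: float, n0: float, growth_rate: float) -> float:
    """Effective population size t time units before the most recent sample.

    Assumes N(t) = N0 * exp(-r * t): a positive growth rate means the
    population was smaller in the past.
    """
    return n0 * math.exp(-growth_rate * t)


# A growth rate of 0 is the prior mode; density falls off symmetrically around it.
print(laplace_log_density(0.0), laplace_log_density(30.7))
print(population_size(2.0, n0=100.0, growth_rate=0.5))
```

The wide scale (30.7) keeps the prior diffuse over both growing and shrinking populations.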
You can create your own BEAST XML templates for different models. Templates should:
- Use $(variable=default) syntax for replaceable parameters
- Include <data id="alignment".../> for sequence data
- Include tip dates trait if needed
- See BEAST X documentation for template format details
results/
├── beast_analysis_report.html # Comprehensive HTML report
├── xml/
│ └── beast_analysis.xml # BEAST input XML
├── beast/
│ ├── beast_analysis.log # Parameter log
│ ├── beast_analysis.trees # Sampled trees
│ └── beast_analysis.*.log # Additional logs
├── trees/
│ └── beast_analysis.mcc.tree # Maximum clade credibility tree
├── figures/
│ ├── beast_analysis_timetree.png # Time tree visualization
│ └── beast_analysis_timetree.svg # SVG version
├── pipeline_report.html # Pipeline execution report
├── timeline.html # Execution timeline
├── trace.txt # Resource usage trace
└── dag.svg # Pipeline DAG
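
After a run, the layout above can be checked programmatically. The sketch below builds the main expected paths from `--outdir` and `--prefix` and reports any that are missing. It assumes output file names follow the prefix exactly as shown in the tree; `expected_outputs` and `missing_outputs` are hypothetical helpers, not part of the pipeline.

```python
from pathlib import Path


def expected_outputs(outdir: str = "results", prefix: str = "beast_analysis"):
    """Main analysis outputs, following the directory tree in this README."""
    root = Path(outdir)
    return [
        root / f"{prefix}_report.html",
        root / "xml" / f"{prefix}.xml",
        root / "beast" / f"{prefix}.log",
        root / "beast" / f"{prefix}.trees",
        root / "trees" / f"{prefix}.mcc.tree",
        root / "figures" / f"{prefix}_timetree.png",
        root / "figures" / f"{prefix}_timetree.svg",
    ]


def missing_outputs(outdir: str = "results", prefix: str = "beast_analysis"):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_outputs(outdir, prefix) if not path.exists()]


if __name__ == "__main__":
    for path in missing_outputs():
        print(f"missing: {path}")
```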
The pipeline generates a comprehensive HTML report (beast_analysis_report.html) that includes:
- Input Data Summary: Number of taxa, sequence length, template used
- Taxa Table: List of all taxa with sampling dates (if < 50 taxa)
- Analysis Details: Chain length, logging frequency, burn-in, runtime
- Parameter Estimates: Complete table from loganalyser with:
  - Mean, standard error, median
  - 95% HPD intervals
  - ESS values with quality indicators (Good/Fair/Low)
- Tree Visualization: Embedded SVG of the time-scaled MCC tree
Open the report in any web browser to view all results in one place.
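
The Good/Fair/Low ESS indicators can be thought of as simple thresholds. The cut-offs in this sketch (200 and 100) are assumptions based on common practice and may differ from what the report actually uses; `ess_quality` is a hypothetical helper.

```python
def ess_quality(ess: float, good: float = 200.0, fair: float = 100.0) -> str:
    """Classify an ESS value. Thresholds are assumed, not taken from the pipeline."""
    if ess >= good:
        return "Good"
    if ess >= fair:
        return "Fair"
    return "Low"


for value in (850.0, 150.0, 42.0):
    print(value, ess_quality(value))
```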
graph LR
A[FASTA File] --> B[Generate XML]
C[Template] --> B
B --> D[Run BEAST]
D --> E[TreeAnnotator]
E --> F[Visualize Tree]
D --> G[Generate Report]
F --> G
A --> G
C --> G
G --> H[HTML Report]
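
The edges in the graph above imply an execution order. As a rough illustration (Nextflow resolves this itself from channel dependencies), the same graph can be ordered with Python's standard `graphlib` module, available from Python 3.9. The node names are copied from the diagram labels.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of nodes it depends on, mirroring the diagram's edges.
DEPENDENCIES = {
    "Generate XML": {"FASTA File", "Template"},
    "Run BEAST": {"Generate XML"},
    "TreeAnnotator": {"Run BEAST"},
    "Visualize Tree": {"TreeAnnotator"},
    "Generate Report": {"Run BEAST", "Visualize Tree", "FASTA File", "Template"},
    "HTML Report": {"Generate Report"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(" -> ".join(order))
```

Report generation waits on both the BEAST run and the tree figure, so it cannot start until the visualization step has finished.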
# Your aligned sequences with dates in names
head aligned_sequences.fasta
>virus1|2023-01-15
ATCGATCG...
>virus2|2023-02-20
ATCGATCG...

nextflow run main.nf \
--input aligned_sequences.fasta \
--template templates/exponential_growth.xml \
--chain_length 50000000

# View HTML report in browser
open results/beast_analysis_report.html
# Or view individual files
cat results/beast/beast_analysis.log
cat results/trees/beast_analysis.mcc.tree
open results/figures/beast_analysis_timetree.png

View pipeline progress:
# In terminal
tail -f .nextflow.log
# After completion
open results/pipeline_report.html

If dates aren't recognized, check that sequence names match one of the supported formats:
- name|YYYY-MM-DD
- name_YYYY-MM-DD
- name|YYYY
- name_YYYY

Increase memory for BEAST:
nextflow run main.nf \
--input data.fasta \
--template templates/exponential_growth.xml \
--max_memory 16.GB

For faster testing, reduce chain length:
nextflow run main.nf \
--input data.fasta \
--template templates/exponential_growth.xml \
--chain_length 1000000

Ensure BEAST X is installed and tools are in PATH:
beastgen -version
beast -version
treeannotator -version
loganalyser -version

If the HTML report shows low ESS (Effective Sample Size) values:
- Increase chain length: --chain_length 50000000
- Check for convergence issues in Tracer
- Consider adjusting operators in the template
Create your own BEAST XML template with different models:
# Copy and modify the example template
cp templates/exponential_growth.xml templates/my_model.xml
# Edit my_model.xml to change substitution model, tree prior, etc.
# Run with custom template
nextflow run main.nf \
--input data.fasta \
--template templates/my_model.xml

Templates can use these variables (passed via beastgen):
- $(chain_length=10000000): MCMC chain length
- $(log_every=1000): Logging frequency
- $(screen_every=10000): Screen output frequency

To pass additional parameters to beastgen, modify the GENERATE_XML process in main.nf to add more -D flags:
beastgen \\
-D chain_length=${params.chain_length} \\
-D my_parameter=${params.my_parameter} \\
${template} \\
${fasta} \\
${params.prefix}.xml

Edit the visualization script (bin/visualize_tree.py) to customize:
- Tree layout
- Color schemes
- Node annotations
- Figure dimensions
If you use this pipeline, please cite:
- BEAST: Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7:214.
- BEAST X: Suchard MA, et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution 4(1): vey016.
- Nextflow: Di Tommaso et al. (2017) Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319.
- Baltic: https://github.com/evogytis/baltic
MIT License
For issues and questions:
- Create an issue on GitHub
- Contact: ARTIC Network
Developed for the ARTIC Network phylogenetic analysis workflows.