A modular framework for running and documenting database benchmarks, with a focus on comparing Exasol with other database systems. This repository provides reusable building blocks to launch benchmark environments, collect detailed system information, run benchmark workloads, and generate reports documenting the results.
- Modular Architecture: Fine-grained templates for setup, execution, and reporting
- Multi-Cloud Support: AWS infrastructure automation with separate instances per database
- Benchmark Workloads: TPC-H with support for custom workloads
- Self-Contained Reports: Generate reproducible reports with all attachments
- Extensible: Easy to add new systems, workloads, and cloud providers
- Rich Visualizations: Automated generation of performance plots and tables
- Result Verification: Validate query correctness against expected outputs
```shell
# Clone the repository
git clone <repository-url>
cd benchkit

# Install dependencies
python -m pip install -e .

# Run a sample benchmark
make all CFG=configs/exa_vs_ch_1g.yaml
```

This will:
- Provision cloud infrastructure (if configured)
- Probe system information
- Run Exasol vs ClickHouse TPC-H benchmark
- Generate a complete report with results and reproducibility instructions
See the Getting Started Guide for detailed installation and usage instructions.
The framework provides 9 commands for complete benchmark lifecycle management:
```shell
# System information collection
benchkit probe --config configs/my_benchmark.yaml

# Run benchmarks
benchkit run --config configs/my_benchmark.yaml [--systems exasol] [--queries Q01,Q06]

# Generate reports
benchkit report --config configs/my_benchmark.yaml

# Manage infrastructure
benchkit infra apply --provider aws --config configs/my_benchmark.yaml

# Other commands: execute, status, package, verify, cleanup
```

The `status` command provides comprehensive project insights:
- Overview of all projects (probe, benchmark, report status)
- Detailed status for specific configs (system info, infrastructure, timing)
- Cloud infrastructure details (IPs, connection strings)
- Multiple config support and smart project lookup
See the Getting Started Guide for comprehensive CLI documentation and examples.
```
benchkit/
├── benchkit/           # Core framework
│   ├── cli.py          # Command-line interface (9 commands)
│   ├── systems/        # Database system implementations
│   ├── workloads/      # Benchmark workloads (TPC-H)
│   ├── gather/         # System information collection
│   ├── run/            # Benchmark execution
│   ├── report/         # Report generation
│   ├── infra/          # Cloud infrastructure management
│   ├── package/        # Minimal package creation
│   └── verify/         # Result verification
├── templates/          # Jinja2 templates for reports
├── configs/            # Benchmark configurations
├── infra/aws/          # AWS Terraform modules
├── workloads/tpch/     # TPC-H queries and schemas
└── results/            # Generated results (auto-created)
```
```yaml
project_id: "exasol_vs_clickhouse_tpch"
title: "Exasol vs ClickHouse Performance on TPC-H"

env:
  mode: "aws"
  region: "eu-west-1"
  instances:
    exasol:
      instance_type: "m7i.4xlarge"
    clickhouse:
      instance_type: "m7i.4xlarge"

systems:
  - name: "exasol"
    kind: "exasol"
    version: "2025.1.0"
    setup:
      method: "installer"
      extra:
        dbram: "32g"
  - name: "clickhouse"
    kind: "clickhouse"
    version: "24.12"
    setup:
      method: "native"
      extra:
        memory_limit: "32g"

workload:
  name: "tpch"
  scale_factor: 1
  queries:
    include: ["Q01", "Q03", "Q06", "Q13"]
  runs_per_query: 3
  warmup_runs: 1
```

See the Getting Started Guide for more configuration examples.
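As a sanity check, the run counts implied by this example can be computed directly. The numbers below come from the config; the per-system arithmetic is illustrative, assuming each system executes every included query once per warmup and once per timed run:

```python
# Values taken from the example config above.
queries = ["Q01", "Q03", "Q06", "Q13"]
runs_per_query = 3
warmup_runs = 1
systems = ["exasol", "clickhouse"]

# Each system runs every query for the warmups plus the timed runs.
executions_per_system = len(queries) * (warmup_runs + runs_per_query)
total_executions = executions_per_system * len(systems)
print(executions_per_system, total_executions)  # 16 32
```

At scale factor 1 this finishes quickly; larger scale factors multiply the time per execution, not the count.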
- Python 3.10+
- Terraform (for cloud infrastructure) - Installation Guide
- At least 16GB RAM (32GB+ recommended for larger benchmarks)
- SSD storage recommended
For cloud deployments, configure AWS credentials:
```shell
# Create .env file (recommended)
cat > .env << EOF
AWS_PROFILE=default-mfa
AWS_REGION=eu-west-1
EOF
```

Required AWS permissions: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones`
See the Getting Started Guide for detailed cloud setup instructions.
The framework is designed for easy extension:
- Create `benchkit/systems/newsystem.py`:

```python
from .base import SystemUnderTest

class NewSystem(SystemUnderTest):
    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return ["newsystem-driver>=1.0.0"]

    def execute_query(self, query: str, query_name: str | None = None):
        # Use the native Python driver for universal connectivity
        pass

    # ... implement other required methods
```

- Register in `benchkit/systems/__init__.py`:

```python
SYSTEM_IMPLEMENTATIONS = {
    "exasol": "ExasolSystem",
    "clickhouse": "ClickHouseSystem",
    "newsystem": "NewSystem",  # Add this line
}
```

See Extending the Framework for comprehensive guides on:
- Adding new database systems
- Creating custom workloads
- Adding cloud providers
- Customizing reports and visualizations
- Implementing result verification
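A registry of class names like the one above is usually paired with lazy imports, so a driver module is only loaded when its system is actually benchmarked. The sketch below illustrates that pattern; the module layout (`benchkit.systems.<kind>`) and loader name are assumptions, not the framework's actual code:

```python
import importlib

# Illustrative registry, mirroring benchkit/systems/__init__.py above.
SYSTEM_IMPLEMENTATIONS = {
    "exasol": "ExasolSystem",
    "clickhouse": "ClickHouseSystem",
}

def resolve_system(kind: str):
    """Look up the class name, then import its module only on first use."""
    try:
        class_name = SYSTEM_IMPLEMENTATIONS[kind]
    except KeyError:
        raise ValueError(f"Unknown system kind: {kind!r}") from None
    # Hypothetical layout: one module per system under benchkit.systems.
    module = importlib.import_module(f"benchkit.systems.{kind}")
    return getattr(module, class_name)
```

With this shape, registering a new system really is a one-line change: unknown kinds fail fast with a clear error, and no driver is imported for systems that never appear in a config.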
Every report is a complete directory with:
- All result data as attachments
- Exact configuration files
- Minimal reproduction package
- Complete setup commands
Uses official Python drivers for universal database connectivity:
- Exasol: `pyexasol` - works with Docker, native, cloud, preinstalled
- ClickHouse: `clickhouse-connect` - works with any deployment
Each system defines its own dependencies via `get_python_dependencies()`. Packages include only the drivers for databases actually benchmarked.
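A minimal sketch of how that per-system dependency collection might work (the class names mirror the systems above, but the version pins and helper are illustrative assumptions):

```python
# Hypothetical base class and registry; version pins are illustrative.
class SystemUnderTest:
    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return []

class ExasolSystem(SystemUnderTest):
    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return ["pyexasol>=0.25"]

class ClickHouseSystem(SystemUnderTest):
    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return ["clickhouse-connect>=0.7"]

REGISTRY = {"exasol": ExasolSystem, "clickhouse": ClickHouseSystem}

def required_drivers(system_kinds: list[str]) -> list[str]:
    """Deduplicated union of driver requirements, only for systems in the config."""
    deps: list[str] = []
    for kind in system_kinds:
        for dep in REGISTRY[kind].get_python_dependencies():
            if dep not in deps:
                deps.append(dep)
    return deps
```

A config that benchmarks only Exasol would then package `pyexasol` and never pull in the ClickHouse driver.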
Templates work everywhere - AWS, GCP, Azure, local, on-premises. All tuning parameters documented as copy-pasteable commands.
- Getting Started Guide - Installation, usage, and examples
- Extending the Framework - Adding systems, workloads, and features
Core dependencies (automatically installed):
- `typer` - CLI framework
- `jinja2` - Template rendering
- `pyyaml` - Configuration parsing
- `pandas` - Data manipulation
- `matplotlib` - Plotting
- `rich` - CLI formatting
- `boto3` - AWS integration (optional)
- `python-dotenv` - .env file support (optional)
Database-specific drivers are loaded dynamically based on the systems used.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
- Database credentials and licenses should not be committed to the repository
- Use environment variables or a `.env` file for sensitive data
- The framework includes basic security practices but should be reviewed for production use
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ for reproducible database benchmarking.