An educational Portable Executable (PE) packer and training data generator for security research/malware detection. Built in Rust with Python bindings.

codeamt/rust-python-pe-packer

PE-Packer

An educational PE (Portable Executable) laboratory built in Rust with Python bindings for research and training on PE formats, safe unpacking workflows, and defensive ML evaluation.

⚠️ Educational Purpose Only: This tool is designed for security research, malware analysis training, and building robust ML detection systems. Use only with legitimate, authorized benign samples.

Quick links: Security Policy · License Appendix (Educational Use)

Features

  • Multiple Encryption Algorithms: XOR, AES, or no encryption
  • Anti-Debugging Techniques: Optional anti-debug code injection
  • Section Randomization: Randomize PE section names to vary packed output
  • Benign Metadata Injection: Add legitimate-looking metadata for evasion research
  • Stub Variation: Generate diverse decryption stubs for dataset diversity
  • Python CLI: Modern command-line interface with Typer
  • Training Dataset Generation: Batch generate variants of PE files with configurable parameters
  • Metadata Tracking: Complete metadata for each packed sample for ML training
  • Cross-platform: Runs on Linux, macOS, and Windows

Installation

Prerequisites

  • Rust 1.56+ (for building from source)
  • Python 3.8+
  • pip or poetry

From PyPI (educational stub)

pip install pe-packer-educational

This PyPI package is a non-functional educational stub that prints guidance and links to the source. It does not ship packing code or native modules.

From Source

# Clone the repository
git clone https://github.com/codeamt/rust-python-pe-packer
cd rust-python-pe-packer

# Using uv (recommended)
uv sync --group dev --group training
uv run maturin develop --features python

# Or using pip
pip install -e ".[dev,training]"
maturin develop

Quick Start

Pack a Single File

# Safety gate: actual packing requires BOTH
#   PE_PACKER_ALLOW_PACKING=1  and  --force
# Otherwise, the CLI runs in dry-run mode and prints analysis only.

# Dry-run (no env, no --force)
pe-packer pack malware.exe packed.exe

# Actual packing (requires both env and --force)
PE_PACKER_ALLOW_PACKING=1 pe-packer pack malware.exe packed.exe --force

# Dry-run with AES encryption and anti-debugging selected (no env, no --force)
pe-packer pack malware.exe packed.exe --encryption aes --anti-debug

# Randomize sections and add benign metadata (with gates)
PE_PACKER_ALLOW_PACKING=1 pe-packer pack malware.exe packed.exe --force \
  --encryption aes \
  --anti-debug \
  --random-sections \
  --benign-metadata
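
The two-part safety gate described in the comments above can be illustrated with a minimal Python sketch. This is not the project's actual implementation: the function names `packing_allowed` and `describe_mode` are hypothetical, and the sketch assumes the environment variable must equal exactly "1", as in the examples.

```python
import os
from typing import Mapping, Optional

ENV_GATE = "PE_PACKER_ALLOW_PACKING"  # environment variable named in this README


def packing_allowed(force: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when both gates are open: env var equals "1" AND --force was given."""
    env = os.environ if env is None else env
    return force and env.get(ENV_GATE) == "1"


def describe_mode(force: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: report which mode an invocation would run in."""
    return "pack" if packing_allowed(force, env) else "dry-run"
```

Requiring both an environment variable and a flag means that a copied command line alone, or a leftover environment setting alone, falls back to dry-run.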

Generate Training Dataset

# Generate 20 variants per file with all configurations
pe-packer generate-training-data \
  ./benign_samples \
  ./training_data \
  --variants 20

# Generate with specific encryption methods
pe-packer generate-training-data \
  ./benign_samples \
  ./training_data \
  --variants 15 \
  --encryption xor,aes

Testing Dataset Generation (Checklist)

  • Ensure you have a local folder with benign PE files, for example: ./benign_samples/
  • Run a small test to validate the pipeline end-to-end:
# Minimal smoke test: 1 variant, analysis-focused
pe-packer generate-training-data ./benign_samples ./training_data --variants 1

# Analyze the produced metadata
pe-packer analyze-dataset ./training_data/dataset_metadata.json
  • To generate packed binaries rather than analysis output only, enable the safety gates deliberately when invoking direct packing commands (see the Pack a Single File section). Dataset generation itself focuses on safe educational workflows and metadata.

Analyze Generated Dataset

pe-packer analyze-dataset training_data/dataset_metadata.json

Validate PE Files

pe-packer validate suspicious.exe

Commands

pack

Pack a single PE file with specified packing options.

pe-packer pack INPUT OUTPUT [OPTIONS]

Options:

  • --encryption, -e: Encryption algorithm (xor, aes, none) [default: xor]
  • --key, -k: Encryption key as hex string (auto-generated if not provided)
  • --anti-debug: Enable anti-debugging techniques
  • --random-sections: Randomize section names
  • --benign-metadata: Add benign-looking metadata
  • --stub-variation, -s: Stub variation identifier (1-32) [default: 1]
  • --verbose, -v: Enable verbose logging
  • --force: Required along with PE_PACKER_ALLOW_PACKING=1 to perform actual packing

Example:

PE_PACKER_ALLOW_PACKING=1 pe-packer pack sample.exe packed.exe --force \
  --encryption aes \
  --key deadbeef \
  --anti-debug \
  --stub-variation 3
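
As a rough illustration of the --key behavior (a hex string, auto-generated when omitted), the sketch below decodes a hex key or falls back to random bytes. The helper name `resolve_key` and the default length of 16 bytes are assumptions; the README does not state the generated key length or how a short key such as deadbeef is mapped onto an AES key.

```python
import secrets
from typing import Optional


def resolve_key(hex_key: Optional[str] = None, length: int = 16) -> bytes:
    """Decode a hex key string, or generate `length` random bytes when none is given."""
    if hex_key is None:
        # Assumed default: a cryptographically random key of `length` bytes.
        return secrets.token_bytes(length)
    # bytes.fromhex raises ValueError on malformed or odd-length hex input.
    return bytes.fromhex(hex_key)
```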

generate-training-data

Generate training dataset with packed samples.

pe-packer generate-training-data INPUT_DIR OUTPUT_DIR [OPTIONS]

Options:

  • --variants, -n: Variants per file (1-100) [default: 10]
  • --encryption, -e: Comma-separated encryption methods [default: xor,aes,none]
  • --anti-debug / --no-anti-debug: Enable anti-debugging [default: enabled]
  • --random-sections / --no-random-sections: Randomize sections [default: enabled]
  • --benign-metadata / --no-benign-metadata: Add benign metadata [default: enabled]
  • --stub-variations, -s: Number of stub variations [default: 5]
  • --verbose, -v: Enable verbose logging

Example:

pe-packer generate-training-data \
  ./benign_samples \
  ./training_data \
  --variants 50 \
  --encryption xor,aes \
  --stub-variations 10

analyze-dataset

Analyze a generated training dataset.

pe-packer analyze-dataset METADATA_FILE [OPTIONS]

Options:

  • --verbose, -v: Enable verbose logging

Example:

pe-packer analyze-dataset training_data/dataset_metadata.json

validate

Validate a PE file format.

pe-packer validate FILE_PATH [OPTIONS]

Options:

  • --verbose, -v: Enable verbose logging

Example:

pe-packer validate sample.exe
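
The README does not say which checks validate performs. As a minimal sketch of what a format check can look like, the following tests only the two signatures every PE file carries: the DOS "MZ" magic and the "PE\0\0" signature at the offset stored in e_lfanew (offset 0x3C). The function name `looks_like_pe` is hypothetical, and a real validator would inspect far more of the headers.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check the DOS 'MZ' magic and the PE signature that e_lfanew points to."""
    # The DOS header is 0x40 bytes long and starts with the ASCII bytes "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields a short slice, which compares unequal.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```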

Python API

Use PE-Packer programmatically in your Python code.

Basic Usage

from pe_packer import PEPacker, PackerConfig, EncryptionAlgorithm

# Create a configuration
config = PackerConfig(
    encryption=EncryptionAlgorithm.AES,
    add_anti_debug=True,
    randomize_sections=True,
    add_benign_metadata=True,
)

# Create a packer instance
packer = PEPacker(config)

# Pack a file
packer.pack_file("input.exe", "output.exe")

# Or pack from bytes
with open("input.exe", "rb") as f:
    data = f.read()
packed_data = packer.pack_bytes(data)

Training Data Generation

from pe_packer.training import DatasetGenerator, TrainingConfig
from pathlib import Path

# Configure dataset generation
config = TrainingConfig(
    variants_per_file=20,
    encryption_methods=["xor", "aes", "none"],
    enable_anti_debug=[True, False],
    stub_variations=5,
)

# Generate dataset
generator = DatasetGenerator(
    input_dir=Path("./benign_samples"),
    output_dir=Path("./training_data"),
    config=config,
)

dataset_info = generator.generate()
print(f"Generated {dataset_info['total_samples']} samples")

Dataset Analysis

from pe_packer.training import MetadataManager
from pathlib import Path

# Load and analyze dataset
manager = MetadataManager(Path("training_data/dataset_metadata.json"))

# Get statistics
stats = manager.get_statistics()
print(f"Total samples: {stats['total_variants']}")
print(f"Encryption distribution: {stats['encryption_distribution']}")
print(f"Anti-debug coverage: {stats['anti_debug_percentage']:.1f}%")

# Query specific samples
aes_samples = manager.get_samples_by_encryption("aes")
anti_debug_samples = manager.get_samples_with_anti_debug()

Architecture

Rust Backend (src/)

  • packer/: Core packing logic with encryption and stub generation
  • pe/: PE file parsing, building, and structure handling
  • python/: PyO3 bindings for Python integration
  • utils/: Error handling, logging, and utilities

Python Layer (python/)

  • core.py: High-level Python API for packing
  • cli.py: Modern Typer CLI interface
  • training/: Dataset generation and metadata management
  • utils/: File validation, entropy calculation, and helpers
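
The entropy calculation mentioned under utils/ is commonly the Shannon entropy of the byte distribution, which tends to be high for encrypted or compressed sections. The sketch below shows that standard formula; it is not taken from the project's source, and the function name `shannon_entropy` is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    # H = -sum(p * log2(p)) over the byte values that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A buffer of identical bytes scores 0 bits per byte, while a buffer in which all 256 byte values are equally frequent scores the maximum of 8.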

Configuration

Default Configuration

# config/default.toml
encryption = "xor"
add_anti_debug = false
randomize_sections = false
add_benign_metadata = false
stub_variation = 1

Training Configuration

# config/training.toml
variants_per_file = 10
encryption_methods = ["xor", "aes", "none"]
enable_anti_debug = [true, false]
enable_random_sections = [true, false]
enable_benign_metadata = [true, false]
stub_variations = 5
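
To get a feel for how large the option space in this training configuration is, the sketch below enumerates every combination of the listed values. How the generator actually selects variants_per_file variants from this space is not documented here; the helper `option_grid` is hypothetical, and numbering stub variations from 1 is an assumption based on the pack command's 1-32 range.

```python
from itertools import product

# Values copied from config/training.toml above.
training = {
    "encryption_methods": ["xor", "aes", "none"],
    "enable_anti_debug": [True, False],
    "enable_random_sections": [True, False],
    "enable_benign_metadata": [True, False],
    "stub_variations": 5,
}


def option_grid(cfg: dict) -> list:
    """Hypothetical helper: list every distinct option combination the config allows."""
    stubs = range(1, cfg["stub_variations"] + 1)  # assumed 1-based identifiers
    return [
        {"encryption": enc, "add_anti_debug": dbg, "randomize_sections": rnd,
         "add_benign_metadata": meta, "stub_variation": stub}
        for enc, dbg, rnd, meta, stub in product(
            cfg["encryption_methods"], cfg["enable_anti_debug"],
            cfg["enable_random_sections"], cfg["enable_benign_metadata"], stubs)
    ]


grid = option_grid(training)
print(len(grid))  # 3 * 2 * 2 * 2 * 5 = 120 combinations
```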

Production Configuration

# config/production.toml
encryption = "aes"
add_anti_debug = true
randomize_sections = true
add_benign_metadata = true
stub_variation = 32

Output Format

Packed PE File

The output is a valid PE executable with:

  • Modified section headers and names
  • Encrypted original code sections
  • Decryption stub as entry point
  • Proper PE alignment and structure

Dataset Metadata

Generated as dataset_metadata.json:

{
  "total_samples": 100,
  "input_dir": "/path/to/samples",
  "output_dir": "/path/to/output",
  "config": {
    "variants_per_file": 10,
    "encryption_methods": ["xor", "aes", "none"],
    "enable_anti_debug": [true, false]
  },
  "files": [
    {
      "original_file": "sample.exe",
      "original_path": "/path/to/sample.exe",
      "variants": [
        {
          "variant_id": 0,
          "output_file": "sample_packed_000.exe",
          "config": {
            "encryption": "xor",
            "add_anti_debug": false
          },
          "file_size": 45056
        }
      ]
    }
  ]
}
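
Because the metadata is plain JSON, it can also be inspected without the pe_packer package. The sketch below tallies variants per encryption method using only the fields shown in the example above; the function name `encryption_distribution` is hypothetical, and the embedded document is a trimmed copy of that example.

```python
import json
from collections import Counter


def encryption_distribution(metadata: dict) -> Counter:
    """Count variants per encryption method across all files in the metadata."""
    return Counter(
        variant["config"]["encryption"]
        for entry in metadata.get("files", [])
        for variant in entry.get("variants", [])
    )


# A trimmed copy of the example document above.
example = json.loads("""
{
  "total_samples": 1,
  "files": [
    {"original_file": "sample.exe",
     "variants": [
       {"variant_id": 0, "output_file": "sample_packed_000.exe",
        "config": {"encryption": "xor", "add_anti_debug": false},
        "file_size": 45056}
     ]}
  ]
}
""")
print(encryption_distribution(example))  # Counter({'xor': 1})
```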

Testing

Run Rust Tests

cargo test --verbose

Run Python Tests

pip install pytest pytest-benchmark
pytest python/tests/ -v

Benchmark

# Rust benchmarks
cargo bench

# Python performance tests
python benchmarks/python_performance.py

Building Wheels

Build distributable Python wheels:

pip install maturin
maturin build --release

# Or for local development
maturin develop

Contributing

Contributions are welcome! Please ensure:

  1. All tests pass: cargo test && pytest python/tests/
  2. Code is formatted: cargo fmt && black python/
  3. Linting passes: cargo clippy && ruff check python/
  4. Documentation is updated

License

This project is licensed under Apache 2.0, with an additional educational-use appendix. See the LICENSE and LICENSE-APPENDIX.md files for details.

Security Disclaimer

This tool is provided for educational and authorized security research purposes only. Users are responsible for ensuring they have proper authorization before using this tool on any files or systems. Unauthorized modification or distribution of executables may violate laws in your jurisdiction.

Citation

If you use PE-Packer in your research, please cite:

@software{pe_packer,
  title={PE-Packer: Educational PE Packer for Malware Detection Training},
  author={AnnMargaret Tutu},
  year={2025},
  url={https://github.com/codeamt/rust-python-pe-packer}
}

Support

For issues, questions, or suggestions:

  • Open an issue on GitHub
  • Check existing documentation in docs/
  • Review examples in examples/

Acknowledgments

  • Built with Rust, goblin, and PyO3
  • CLI built with Typer
  • Inspired by educational packing techniques and malware analysis research

Last Updated: 2025
Version: 0.1.0
