NHP Data

A comprehensive data processing pipeline for the New Hospital Programme (NHP) model. This project orchestrates the extraction, transformation, and preparation of data required for the model.

Built to run with PySpark on Databricks. It works with Hospital Episode Statistics (HES) and related data:

Admitted Patient Care (APC)
Outpatient Appointments (OPA)
Emergency Care Data Set (ECDS) and, for historical trends, Accident and Emergency (AAE)
ONS Population Projections
NHS Reference Data

Architecture

The project uses Databricks Asset Bundles to manage deployment. All processing is orchestrated through Databricks workflows that can run independently, or as part of the main pipeline.

graph TD
    ref[Reference Data]
    inputs[Inputs Data]
    ecds[ECDS Data]
    ip[Inpatient Data] 
    op[Outpatient Data]
    model[Model Data Extraction]

    ref --> ecds
    ref --> ip
    ref --> op
    
    ecds --> inputs
    ip --> inputs
    op --> inputs
    
    inputs --> model
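
The dependency graph above implies a run order for the main pipeline. The following is a minimal sketch of that ordering, not the project's own orchestration code (which is handled by Databricks workflows). The node names are copied from the diagram; the DEPENDENCIES mapping and the run_order helper are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Each key depends on the workflows in its value set.
# Edges are read directly from the diagram (e.g. "ref --> ecds").
DEPENDENCIES = {
    "ecds": {"ref"},
    "ip": {"ref"},
    "op": {"ref"},
    "inputs": {"ecds", "ip", "op"},
    "model": {"inputs"},
}


def run_order(dependencies):
    """Return one valid execution order for the workflows (hypothetical helper)."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print(order)
```

Reference data always comes first and model data extraction last; the three dataset workflows in between do not depend on each other, so their relative order is not fixed.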

The workflows are built into a Python package, with all of the code in the src/ folder. Each task in a workflow is defined as an entry point in pyproject.toml and, by convention, is a main() function that takes no arguments; parameters are passed in via sys.argv.
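
As an illustration of the entry point convention, here is a minimal sketch of what such a task module could look like. The task name example_task, the run_task helper, and the two parameters (data_version, save_path) are hypothetical and are not taken from the project's code.

```python
import sys


def run_task(data_version: str, save_path: str) -> str:
    """Placeholder for the real work (hypothetical).

    A real task would read and write tables with PySpark here.
    """
    return f"processed {data_version} -> {save_path}"


def main() -> None:
    """Entry point: takes no arguments and reads its parameters from sys.argv."""
    # sys.argv[0] is the program name; the task parameters follow it.
    if len(sys.argv) != 3:
        raise SystemExit("usage: example_task <data_version> <save_path>")
    data_version, save_path = sys.argv[1], sys.argv[2]
    print(run_task(data_version, save_path))
```

Keeping main() free of arguments means the same function can be registered as an entry point and called by a workflow task, with the task's parameters arriving as command-line arguments.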

Getting Started

Prerequisites

Access to Databricks workspace
Appropriate permissions for access to the data
Python 3.11+
uv

Installation

The project is packaged as a Python wheel and deployed via Databricks bundles:

# Build the package
uv build

# Deploy to development
databricks bundle deploy --target dev

Deployment to the prod target is handled by GitHub Actions and should not be done manually.

Running Workflows

Run the complete data pipeline:

databricks jobs run --job-name "Generate NHP Data"

Run individual components:

# Process reference data only
databricks jobs run --job-name "Generate NHP Data (Reference Data)"

# Process emergency care data
databricks jobs run --job-name "Generate NHP Data (AAE/ECDS)"

# Extract data for modeling containers
databricks jobs run --job-name "Extract NHP for containers"
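
The same runs could also be triggered from Python. The sketch below assumes the Databricks SDK for Python (databricks-sdk) is installed and that the client passed in is a databricks.sdk.WorkspaceClient; the helpers find_job_id and trigger_job are hypothetical and not part of this project. It looks a job up by name, then starts a run by job ID.

```python
def find_job_id(client, job_name: str) -> int:
    """Return the ID of the single job whose name matches job_name exactly.

    `client` is assumed to be a databricks.sdk.WorkspaceClient (or any
    object exposing the same `jobs.list` / `jobs.run_now` methods).
    """
    matches = [
        job.job_id
        for job in client.jobs.list(name=job_name)
        if job.settings.name == job_name
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected one job named {job_name!r}, found {len(matches)}"
        )
    return matches[0]


def trigger_job(client, job_name: str):
    """Start a run of the named job and return whatever run_now returns."""
    return client.jobs.run_now(job_id=find_job_id(client, job_name))


# Example usage (requires workspace credentials, so it is left commented out):
# from databricks.sdk import WorkspaceClient
# trigger_job(WorkspaceClient(), "Generate NHP Data")
```

Refusing to proceed unless exactly one job matches guards against starting the wrong job when several deployments of the bundle share a name.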
