Open
74 commits
8bdd558
code: setup mypy with initial configuration
Apr 3, 2025
2b4f699
code: resolve type errors flagged by mypy static analysis
Apr 3, 2025
0691b65
code: local development ignore folders
Apr 3, 2025
069d399
code: Add three more hooks
Apr 5, 2025
e61323c
code: Auto fixes as a result of the new hooks
Apr 5, 2025
eec320d
code: add github action for mypy test
Apr 7, 2025
ab60e48
Merge branch 'PyPSA:master' into refactor_code_quality
cdgaete Apr 8, 2025
71a85fb
Merge branch 'PyPSA:master' into refactor_code_quality
cdgaete Apr 17, 2025
21d5f1a
Running case with little optimizations
Apr 23, 2025
af87478
Merge branch 'PyPSA:master' into feature_osm_perf_improvements
cdgaete Apr 28, 2025
3c15f76
Merge branch 'PyPSA:master' into feature_osm_perf_improvements
cdgaete Apr 28, 2025
e1e3148
Feature: Modular implementation. Debugging...
Apr 29, 2025
bd769de
Merge branch 'feature_osm_perf_improvements' of https://github.com/cd…
Apr 29, 2025
7a2212f
code clean and debug
May 5, 2025
3c79de9
Merge branch 'PyPSA:master' into feature_osm_perf_improvements
cdgaete May 5, 2025
7921317
Merge branch 'PyPSA:master' into feature_osm_perf_improvements
cdgaete May 5, 2025
f889def
Merge branch 'feature_osm_perf_improvements' of https://github.com/cd…
May 5, 2025
2855d48
Rejection tracker implementation
May 7, 2025
a646522
Merge branch 'PyPSA:master' into feature_osm_perf_improvements
cdgaete May 7, 2025
4488527
Merge branch 'PyPSA:master' into feature_osm_perf_improvements
cdgaete May 7, 2025
8149d68
Merge branch 'feature_osm_perf_improvements' of https://github.com/cd…
May 7, 2025
dbbff7f
Debugging workflow and tags parsing
May 8, 2025
923af74
Fine tune data parsing
May 8, 2025
7178d73
Clean-up and optimization
May 10, 2025
322b314
Several enhancements and examples scripts
May 12, 2025
3120c21
feat: add Units collection class and enhanced rejection tracking with…
May 30, 2025
5821de2
Fix No config filter. To be implemented
May 30, 2025
4771635
Fix Linear plot added
May 30, 2025
6e53f6a
Fix typo
May 30, 2025
450d93c
Feat: add plant reconstruction from incomplete OSM data
Jun 17, 2025
9c366ad
Fix: Enabled config_filter in pandas pipe in the OSM function in data.py
Jun 17, 2025
0787867
Fix unformated date from raise to warning
Jun 18, 2025
95e6574
feat: add regional download functionality for OSM power plant data
Jun 19, 2025
502cf91
feat: add progress tracking for OSM downloads
Jun 19, 2025
eb02651
fix: Replace exceptions with warnings for invalid countries/regions
Jun 19, 2025
54434fa
test: Add country validation to OSM example scripts
Jun 19, 2025
d11ad5c
Add OSM cache coverage analysis and population utilities
Jun 19, 2025
4c734b0
fix: import due to ruff complaint of relative import
Jun 19, 2025
6682303
refactor(osm): split element processing, add PlantGeometry, improve t…
Jun 22, 2025
c86b330
Update config.yaml
diazr-david Jun 23, 2025
7b64c3f
Merge branch 'feature_osm_perf_improvements' of https://github.com/op…
diazr-david Jun 23, 2025
62f32fa
feat(osm): expand source/tech mappings, improve rejection tracking, e…
Jun 25, 2025
e492f88
Update OSM config
Jun 25, 2025
896e80d
Merge remote-tracking branch 'origin/feature_osm_perf_improvements' i…
Jun 25, 2025
d0968c6
Fix: remove docstring and comments
Jun 25, 2025
c3bae64
Reorganize OSM module structure and add comprehensive tutorials
Jun 27, 2025
e82728c
docs: Add comprehensive NumPy-style docstrings to OSM module
Jun 27, 2025
1f0edf4
feat: standardize country codes to full names in output
Jun 30, 2025
3d44fe5
Create run_osm_pipeline.py
diazr-david Jul 2, 2025
b76e94c
Update osm-module.rst
diazr-david Jul 2, 2025
f1bab2f
Update osm-module.rst
diazr-david Jul 2, 2025
b2e5b78
Update osm-module.rst
diazr-david Jul 2, 2025
7f34115
Create ppm_osm_pipeline.png
diazr-david Jul 2, 2025
ba28a6f
Update osm-module.rst
diazr-david Jul 2, 2025
9566c7d
Update osm-module.rst
diazr-david Jul 2, 2025
310bfe9
Update ppm_osm_pipeline.png
diazr-david Jul 3, 2025
17a917f
Merge branch 'feature_osm_perf_improvements' of https://github.com/op…
diazr-david Jul 3, 2025
ac179f3
Merge branch 'master' into feature_osm_perf_improvements
FabianHofmann Jul 10, 2025
83f3234
Update osm-module.rst
diazr-david Jul 10, 2025
4a3558b
Update osm-module.rst
diazr-david Jul 10, 2025
8e62ce8
Update osm-module.rst
diazr-david Jul 10, 2025
21d9b85
Update osm-module.rst
diazr-david Jul 10, 2025
6b3ba39
Update osm-module.rst
diazr-david Jul 10, 2025
b996223
Update osm-module.rst
diazr-david Jul 10, 2025
e97b48a
Update osm-module.rst
diazr-david Jul 10, 2025
cd7b074
Update osm-module.rst
diazr-david Jul 10, 2025
a46a541
Update osm-module.rst
diazr-david Jul 10, 2025
4900a3c
Update osm-module.rst
diazr-david Jul 10, 2025
1fadabb
docs: Add OSM module to release notes
Jul 10, 2025
614713e
Merge branch 'feature_osm_perf_improvements' of https://github.com/op…
Jul 10, 2025
749e3fc
Drop Python 3.9 support and remove mypy type checking
Jul 29, 2025
2ed62b6
Update type hints to Python 3.10+ union syntax
Jul 29, 2025
9ee8253
Fixes: address PR review feedback - improve code organization and con…
Sep 26, 2025
ef01699
Fix: Correct number of EU countries in config
Sep 26, 2025
2 changes: 2 additions & 0 deletions .codespell.ignore
@@ -3,3 +3,5 @@ ue
gud
hel
BU
Nam
FO
30 changes: 0 additions & 30 deletions .github/workflows/type-checking.yml

This file was deleted.

3 changes: 3 additions & 0 deletions .gitignore
@@ -96,3 +96,6 @@ test.ipynb
# temporary
.devcontainer/
.repoai/

uv.lock
output/
183 changes: 183 additions & 0 deletions analysis/1_osm_basics.py
@@ -0,0 +1,183 @@
#!/usr/bin/env python3
"""
OSM Tutorial Part 1: Data Loading and Configuration
==================================================

Learn how to:
1. Load OSM power plant data
2. Configure data processing options
3. Handle data quality settings
"""

from powerplantmatching.core import get_config
from powerplantmatching.data import OSM

# Understanding the OSM() function
# ================================
# The OSM() function is a high-level interface that automatically:
# - Downloads or loads cached OpenStreetMap data
# - Processes raw OSM elements into power plants
# - Applies quality filters and validation
# - Estimates missing capacities (if enabled)
# - Reconstructs plants from generators (if enabled)
# - Returns a clean pandas DataFrame ready for analysis
#
# For more control over these steps, see tutorials 2 & 3

# Example 1: Basic data loading
# =============================
config = get_config()
config["target_countries"] = ["Luxembourg"]

# Load with default settings
df = OSM(config=config)
print(f"Loaded {len(df)} power plants from Luxembourg\n")


# Example 2: Configure data quality requirements
# ==============================================
# The OSM module can filter data based on completeness

# First, get baseline count with permissive settings
config_baseline = get_config()
config_baseline["target_countries"] = ["Luxembourg"]
config_baseline["OSM"]["missing_name_allowed"] = True # Allow unnamed plants
df_baseline = OSM(config=config_baseline)

# Now apply strict requirements
config["OSM"]["missing_name_allowed"] = False # Reject unnamed plants
config["OSM"]["missing_technology_allowed"] = True # Allow missing technology
config["OSM"]["missing_start_date_allowed"] = True # Allow missing start dates

# This will return fewer plants due to stricter requirements
df_strict = OSM(config=config)
print(f"With strict name requirement: {len(df_strict)} plants")
print(f"Filtered out: {len(df_baseline) - len(df_strict)} plants without names\n")


# Example 3: Control data processing features
# ===========================================
config["OSM"]["capacity_extraction"]["enabled"] = True # Extract capacity from tags
config["OSM"]["capacity_estimation"]["enabled"] = True # Estimate missing capacities
config["OSM"]["units_clustering"]["enabled"] = False # Don't cluster nearby generators
config["OSM"]["units_reconstruction"]["enabled"] = True  # Reconstruct plants from generators

df_processed = OSM(config=config)
print(f"With extraction, estimation and reconstruction: {len(df_processed)} plants\n")

# Note: capacity_extraction vs capacity_estimation
# - Extraction: Reads capacity from OSM tags (plant:output:electricity=10 MW)
# - Estimation: Calculates capacity when missing (e.g., from area for solar)
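
To make the extraction step concrete, here is a minimal sketch of how a capacity tag value such as `10 MW` could be normalized to megawatts. The helper `parse_capacity_mw` and its unit table are hypothetical and written only for illustration; they are not powerplantmatching's actual parser, which may accept other formats.

```python
import re
from typing import Optional

# Hypothetical unit table (illustration only): conversion factors to MW.
_UNIT_TO_MW = {"w": 1e-6, "kw": 1e-3, "mw": 1.0, "gw": 1e3}

# A number (decimal point or decimal comma) followed by a unit, e.g. "10 MW".
_PATTERN = re.compile(r"^\s*([0-9]+(?:[.,][0-9]+)?)\s*([a-zA-Z]+)\s*$")


def parse_capacity_mw(raw: str) -> Optional[float]:
    """Return the capacity in MW, or None if the value cannot be parsed."""
    match = _PATTERN.match(raw)
    if match is None:
        return None
    number, unit = match.groups()
    factor = _UNIT_TO_MW.get(unit.lower())
    if factor is None:
        return None  # unknown unit: leave the decision to the caller
    return float(number.replace(",", ".")) * factor


for value in ["10 MW", "500 kW", "yes"]:
    print(value, "->", parse_capacity_mw(value))
```

Values that fail to parse return `None` rather than raising, so a later estimation step (assumed here) can decide what to do with them.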


# Example 4: Cache behavior - force_refresh vs update
# ===================================================
# Two parameters control how OSM handles cached data:
# - force_refresh (config option): whether the OSM module re-downloads from
#   OpenStreetMap instead of reading its own cache
# - update (function argument): whether powerplantmatching's high-level cache
#   is rebuilt

# Case 1: Use all caches (fastest)
config["OSM"]["force_refresh"] = False # Use OSM's cache
df_cached = OSM(config=config, update=False) # Use PPM's cache

# Case 2: Update PPM cache from OSM cache
config["OSM"]["force_refresh"] = False # Use OSM's cache
df_updated = OSM(config=config, update=True) # Refresh PPM's cache

# Case 3: Full refresh from OpenStreetMap (slowest)
# config["OSM"]["force_refresh"] = True # Download from OSM
# df_fresh = OSM(config=config, update=True) # Update PPM's cache

# Summary:
# - force_refresh=False, update=False: Use all cached data
# - force_refresh=False, update=True: Refresh PPM cache from OSM cache
# - force_refresh=True, update=True: Download fresh from OpenStreetMap


# Example 5: Load multiple countries efficiently
# ==============================================
config["target_countries"] = ["Luxembourg", "Malta", "Cyprus"]
config["OSM"]["plants_only"] = True # Only load plants, not generators

df_multi = OSM(config=config)
print(f"Loaded {len(df_multi)} plants from 3 countries")

# The module handles each country separately for memory efficiency


# Example 6: Custom cache directory
# ================================================
# The OSM module supports custom cache directories via config
# This is useful for managing large caches or sharing between projects

# Method 1: Set in config.yaml
# OSM:
#   cache_dir: ~/osm_caches/project1  # Custom location
#   fn: osm_data.csv                  # CSV filename (stored IN cache_dir)
Review comment (Contributor):
I would rather have the osm_cache in a hidden folder .osm_cache? Or even .powerplantmatching_osm_cache, so that people know where this comes from


# Method 2: Set programmatically
config["OSM"]["cache_dir"] = "~/osm_caches/europe" # Will be expanded
df_custom_cache = OSM(config=config)

# Benefits:
# - Keep large caches (6GB for 249 countries) separate from project
# - Share cache across multiple projects
# - Use faster/larger storage for cache
# - Separate test/dev/prod caches
# - All OSM data in one place (CSV + API caches)

# The cache_dir path can be:
# - Absolute: /data/osm_cache
# - Relative: ./cache/osm (relative to data directory)
# - With ~: ~/osm_caches/global (expands to home directory)
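
The three path forms listed above can be sketched with `pathlib`. The function `resolve_cache_dir` and the `data_dir` argument are hypothetical names chosen for this illustration; the module's own resolution logic is not shown in this diff and may differ.

```python
from pathlib import Path


def resolve_cache_dir(cache_dir: str, data_dir: str) -> Path:
    """Resolve cache_dir following the three cases above (hypothetical helper)."""
    path = Path(cache_dir).expanduser()  # "~" expands to the home directory
    if not path.is_absolute():
        # Relative paths are interpreted relative to the data directory.
        path = Path(data_dir) / path
    return path


# "/tmp/ppm_data" is a placeholder data directory for the demonstration.
for candidate in ["/data/osm_cache", "./cache/osm", "~/osm_caches/global"]:
    print(candidate, "->", resolve_cache_dir(candidate, "/tmp/ppm_data"))
```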

# The CSV cache file (osm_data.csv) is stored INSIDE cache_dir
# Structure:
# cache_dir/
# ├── osm_data.csv # CSV cache (all countries)
# ├── plants/ # API cache
# ├── generators/ # API cache
# └── units/ # API cache

print(f"✓ Example 6 complete! Loaded {len(df_custom_cache)} plants")


# Example 7: Understanding source and technology mapping
# ======================================================
# OSM data uses various tags that are mapped to standard categories
# This ensures consistency across different tagging conventions

# The mapping is defined in config.yaml under OSM section:
# - source_mapping: Maps OSM generator:source tags to standard fuel types
# - technology_mapping: Maps OSM generator:method tags to standard technologies

# Standard fuel types (see powerplantmatching.CONSTANT_FUELTYPE):
# ['Bioenergy', 'Geothermal', 'Hard Coal', 'Hydro', 'Lignite',
# 'Natural Gas', 'Nuclear', 'Oil', 'Other', 'Solar', 'Wind']

# Standard technologies (see powerplantmatching documentation):
# ['CCGT', 'OCGT', 'Steam Turbine', 'Combustion Engine',
# 'Run-Of-River', 'Reservoir', 'Pumped Storage',
# 'Onshore', 'Offshore', 'PV', 'CSP']

# Example mappings from config.yaml:
# source_mapping:
#   Solar: [solar, photovoltaic, solar_thermal, pv]
#   Wind: [wind, wind_power, wind_turbine]
#   Natural Gas: [gas, natural_gas, lng]

# This means:
# - generator:source=solar → Fueltype="Solar"
# - generator:source=gas → Fueltype="Natural Gas"
# - generator:method=photovoltaic → Technology="PV"

# You can extend mappings for regional variations:
config["OSM"]["source_mapping"]["Solar"].append("sonnenkraft") # German
config["OSM"]["technology_mapping"]["PV"].append("fotovoltaico") # Spanish

# Reload with extended mappings
df_extended = OSM(config=config, update=True)

print("\n✓ Mapping example complete!")
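
One way to picture the mapping step in Example 7 is as an inverted dictionary from raw tag value to standard category. The sketch below uses the `source_mapping` entries quoted in the tutorial; `build_lookup` and `map_source` are hypothetical helpers, and how the module itself treats unmapped values is not stated in this diff.

```python
# Entries copied from the config.yaml excerpt quoted in the tutorial.
source_mapping = {
    "Solar": ["solar", "photovoltaic", "solar_thermal", "pv"],
    "Wind": ["wind", "wind_power", "wind_turbine"],
    "Natural Gas": ["gas", "natural_gas", "lng"],
}


def build_lookup(mapping):
    """Invert {category: [tag values]} into {tag value: category}."""
    lookup = {}
    for category, tag_values in mapping.items():
        for value in tag_values:
            lookup[value.lower()] = category
    return lookup


def map_source(tag_value, lookup):
    """Return the standard fuel type for a tag value, or None if unmapped."""
    return lookup.get(tag_value.strip().lower())


lookup = build_lookup(source_mapping)
print(map_source("gas", lookup))
print(map_source("sonnenkraft", lookup))

# Extending the mapping, as in the tutorial, requires rebuilding the lookup.
source_mapping["Solar"].append("sonnenkraft")
lookup = build_lookup(source_mapping)
print(map_source("sonnenkraft", lookup))
```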
151 changes: 151 additions & 0 deletions analysis/2_osm_cache_and_quality.py
@@ -0,0 +1,151 @@
#!/usr/bin/env python3
"""
OSM Tutorial Part 2: Cache Management and Data Quality
=====================================================

Learn how to:
1. Manage the OSM cache system
2. Track data quality and rejections
3. Download data for new countries

Cache Structure:
    The OSM module uses a unified cache directory containing:
    - osm_data.csv: Combined CSV cache for all countries
    - plants/: Raw plant data from OpenStreetMap
    - generators/: Generator data from OpenStreetMap
    - units/: Processed unit data

You can set a custom cache location in config.yaml:
    OSM:
      cache_dir: ~/osm_caches/global
      fn: osm_data.csv
"""

from powerplantmatching.core import get_config
from powerplantmatching.osm import (
    find_outdated_caches,
    get_country_coverage_data,
    populate_cache,
    print_coverage_report,
)

# Example 1: Check what's in the cache
# ====================================
print("=== Current Cache Status ===")

# You can specify a custom cache directory
# If not specified, it uses the value from config.yaml
# or defaults to ./osm_cache
data = get_country_coverage_data(
    cache_dir=None,  # Uses config value or default
    check_live_counts=False,  # Don't query live OSM data
)

print_coverage_report(
    coverage_data=data,
    show_missing=False,
    check_live_counts=False,
    show_outdated_only=False,
)

# Using a specific cache directory:
# get_country_coverage_data(cache_dir="~/osm_caches/europe")

# Note: check_live_counts=True would:
# - Query the Overpass API for current element counts
# - Compare cached vs. live data to identify outdated caches
# - Show which countries have new power plants since last download
# - This is slower as it makes API calls for each country


# Example 2: Find outdated caches
# ===============================
# Identify countries where OSM has new data since last download

print("\n=== Checking for Outdated Data ===")
outdated = find_outdated_caches(
    threshold=0.95,  # Flag if cache has <95% of current OSM data
    check_specific_countries=["Germany", "France", "Spain"],
)

if outdated:
    print(f"Found {len(outdated)} countries with outdated data:")
    for country in outdated[:3]:  # Show first 3
        print(f" {country['name']}: {country['total_missing']} new elements")
else:
    print("All checked countries are up to date!")
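
The `threshold=0.95` comment suggests a ratio test between cached and live element counts. The sketch below shows that reading; `is_outdated` is a hypothetical function, and the exact comparison inside `find_outdated_caches` is an assumption, since its implementation is not part of this excerpt.

```python
def is_outdated(cached_count: int, live_count: int, threshold: float = 0.95) -> bool:
    """Flag a cache that holds less than `threshold` of the live element count."""
    if live_count == 0:
        return False  # nothing upstream, so nothing can be missing
    return cached_count / live_count < threshold


# Made-up counts for illustration: (cached elements, live elements).
examples = {"A": (90, 100), "B": (99, 100), "C": (0, 0)}
for name, (cached, live) in examples.items():
    missing = max(live - cached, 0)
    print(name, "outdated" if is_outdated(cached, live) else "ok", f"({missing} missing)")
```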


# Example 3: Populate cache for new countries
# ===========================================
print("\n=== Downloading New Data ===")

# Download data for small countries
result = populate_cache(
    countries=["Liechtenstein", "Monaco"],
    cache_dir=None,  # Uses config value or default ./osm_cache
    force_refresh=False,  # Skip if already cached
    show_progress=True,  # Show download progress
)

# Or use a custom cache directory:
# result = populate_cache(
#     countries=["Kenya", "Uganda"],
#     cache_dir="~/osm_caches/africa",
#     force_refresh=False,
#     show_progress=True,
# )

print("\nResults:")
print(f" Successfully downloaded: {result['succeeded']}")
print(f" Already cached: {result['skipped']}")
print(f" Failed: {result['failed']}")


# Example 4: Understanding rejections
# ===================================
# See why some OSM elements were rejected during processing

from powerplantmatching.osm import OverpassAPIClient, RejectionTracker, Units, Workflow

config = get_config()["OSM"]
config["missing_name_allowed"] = False # Strict: require names

# Process with rejection tracking
rejection_tracker = RejectionTracker()
units = Units()

with OverpassAPIClient(cache_dir=None) as client:  # Uses config value
    workflow = Workflow(client, rejection_tracker, units, config)
    workflow.process_country_data("Malta")  # Use Malta instead of Kenya

# Analyze rejections
print("\n=== Data Quality Report for Malta ===")
print(f"Valid power plants: {len(units)}")
print(f"Rejected elements: {rejection_tracker.get_total_count()}")

if rejection_tracker.get_total_count() > 0:
    print("\nTop rejection reasons:")
    for reason, count in list(rejection_tracker.get_summary().items())[:3]:
        print(f" {reason}: {count}")

    # Save detailed rejection report
    import os

    os.makedirs("output", exist_ok=True)
    rejection_tracker.generate_report().to_csv(
        "output/malta_rejections.csv", index=False
    )
    print("\nDetailed rejection report saved to output/malta_rejections.csv")
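
Slicing the first three items of `get_summary()` yields the most frequent reasons only if the summary is already ordered by count, which this excerpt does not show. A minimal sketch of an explicit top-N selection follows; the reason strings are invented for the example and `top_reasons` is a hypothetical helper, not part of the `RejectionTracker` API.

```python
from collections import Counter


def top_reasons(reasons, n=3):
    """Return the n most frequent rejection reasons as (reason, count) pairs."""
    return Counter(reasons).most_common(n)


# Invented reason labels, one entry per rejected element.
rejected = [
    "missing_name",
    "missing_capacity",
    "missing_name",
    "invalid_date",
    "missing_name",
    "missing_capacity",
    "unknown_source",
]

for reason, count in top_reasons(rejected):
    print(f"{reason}: {count}")
```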


# Example 5: Force refresh specific countries
# ==========================================
# Update cache for countries with significant changes

# This would re-download even if cached
# result = populate_cache(
#     countries=["South Africa"],
#     force_refresh=True,  # Force new download
#     show_progress=True,
# )