@aminghadersohi

SUMMARY

This PR completes the MCP service implementation by adding instance-level metadata and AI guidance features. It introduces a comprehensive instance information tool, system prompts for user onboarding, and resource templates for chart configurations.

New Features Added:

Tools (1):

  • get_superset_instance_info: Comprehensive instance statistics including dashboards, charts, datasets, activity metrics, database breakdowns, and popular content

Prompts (2):

  • superset_quickstart: Interactive onboarding guide tailored by user type (analyst, executive, developer) and focus area
  • create_chart_guided: AI-powered chart creation workflow that guides users through business context and visualization selection

Resources (2):

  • chart_configs: Example chart configuration templates for common visualization types (bar, line, pie, scatter, etc.)
  • instance_metadata: Real-time instance metadata and statistics accessible without tool calls

Infrastructure Added:

  • InstanceInfoCore class (173 lines in mcp_core.py):

    • Configurable base class for comprehensive instance reporting
    • Custom metric calculators via dependency injection
    • Flexible breakdown calculations for dashboards, databases, and popular content
    • Extensible design for adding new metrics without modifying core logic
  • System schemas (85 lines added to system/schemas.py):

    • GetSupersetInstanceInfoRequest - Request schema with optional metric filters
    • InstanceInfo - Complete instance information response
    • InstanceSummary - High-level counts and totals
    • RecentActivity - User activity metrics (last 7/30 days)
    • DashboardBreakdown - Dashboard categorization (published vs draft)
    • DatabaseBreakdown - Database usage statistics
    • PopularContent - Most viewed/edited content

Statistics:

  • Files Changed: 16 (13 new, 3 modified)
  • Lines Added: ~1,461
    • get_superset_instance_info.py: 268 lines
    • InstanceInfoCore in mcp_core.py: 173 lines
    • Prompts: 289 lines (quickstart: 94, create_chart_guided: 195)
    • Resources: 469 lines (chart_configs: 362, instance_metadata: 107)
    • Schemas: 85 lines (7 new schema classes)
    • Placeholder __init__.py files: 126 lines (dashboard/dataset prompts/resources)
  • All pre-commit hooks passing

Builds on:

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A - This is a backend API enhancement for the MCP service.

TESTING INSTRUCTIONS

Prerequisites

  1. Ensure you have a running Superset instance with the MCP service enabled
  2. Ensure you have the development environment set up (Python venv activated)
  3. Have some dashboards, charts, and datasets created in your Superset instance for meaningful statistics

Manual Testing with MCP Client

If you have an MCP client configured (Claude Desktop, Cline, etc.), you can test the new features:

1. Test Instance Info Tool

# Via MCP client: Request instance information
Use the get_superset_instance_info tool

# Expected response includes:
# - Summary counts (total dashboards, charts, datasets, databases)
# - Recent activity (user counts, chart views, edits)
# - Dashboard breakdown (published vs draft)
# - Database breakdown (usage by database)
# - Popular content (most viewed/edited items)

2. Test Prompts

# Via MCP client: Access the quickstart prompt
Request the "superset_quickstart" prompt with:
  user_type: "analyst"
  focus_area: "sales"

# Expected: Personalized onboarding guide with:
# - Welcome message tailored to analysts
# - Sales-focused example queries
# - Step-by-step visualization creation
# - Dashboard building guidance

# Try the chart creation prompt
Request the "create_chart_guided" prompt

# Expected: Interactive chart creation guide with:
# - Business context questions
# - Available dataset suggestions
# - Chart type recommendations
# - Configuration examples

3. Test Resources

# Via MCP client: Access chart configuration examples
Request the "chart_configs" resource

# Expected: Template configurations for:
# - Bar charts
# - Line charts
# - Pie charts
# - Scatter plots
# - Time series
# And more...

# Request instance metadata
Request the "instance_metadata" resource

# Expected: Current instance stats including:
# - Available databases
# - Dataset count
# - Chart type distribution
# - User activity

Pre-Commit Validation

# Stage all files
git add -A

# Run pre-commit hooks on changed files
pre-commit run --files \
  superset/mcp_service/system/tool/get_superset_instance_info.py \
  superset/mcp_service/system/prompts/quickstart.py \
  superset/mcp_service/chart/prompts/create_chart_guided.py \
  superset/mcp_service/chart/resources/chart_configs.py \
  superset/mcp_service/system/resources/instance_metadata.py \
  superset/mcp_service/mcp_core.py \
  superset/mcp_service/system/schemas.py

# Expected: All hooks pass (mypy, ruff, pylint, etc.)

Verification Checklist

  • Instance info tool returns valid statistics
  • Quickstart prompt generates personalized guidance
  • Create chart prompt provides helpful workflow
  • Chart configs resource shows template examples
  • Instance metadata resource shows current stats
  • All schemas validate correctly
  • InstanceInfoCore class allows custom metric calculators
  • Pre-commit hooks pass

ADDITIONAL INFORMATION

  • Has associated issue: No
  • Required feature flags: No
  • Changes UI: No
  • Includes DB Migration: No
  • Introduces new feature or API: Yes (1 tool, 2 prompts, 2 resources)
  • Removes existing feature or API: No

Completes MCP Service Implementation:

This PR represents the final phase of the core MCP service implementation, adding:

  1. Instance-level introspection - AI agents can see what data and content are available
  2. User guidance - Prompts give AI agents structured workflows for onboarding and chart creation
  3. Reference templates - Resources provide example configurations for common use cases

Files Changed Summary:

New Tool (1 file):

  • superset/mcp_service/system/tool/get_superset_instance_info.py (268 lines)

New Prompts (4 files):

  • superset/mcp_service/system/prompts/__init__.py (21 lines)
  • superset/mcp_service/system/prompts/quickstart.py (94 lines)
  • superset/mcp_service/chart/prompts/__init__.py (21 lines)
  • superset/mcp_service/chart/prompts/create_chart_guided.py (195 lines)

New Resources (4 files):

  • superset/mcp_service/system/resources/__init__.py (21 lines)
  • superset/mcp_service/system/resources/instance_metadata.py (107 lines)
  • superset/mcp_service/chart/resources/__init__.py (21 lines)
  • superset/mcp_service/chart/resources/chart_configs.py (362 lines)

Placeholder Files (4 files):

  • superset/mcp_service/dashboard/prompts/__init__.py (21 lines)
  • superset/mcp_service/dashboard/resources/__init__.py (21 lines)
  • superset/mcp_service/dataset/prompts/__init__.py (21 lines)
  • superset/mcp_service/dataset/resources/__init__.py (21 lines)

Updated Files (3 files):

  • superset/mcp_service/mcp_core.py (+173 lines for InstanceInfoCore)
  • superset/mcp_service/system/schemas.py (+85 lines, 7 new schemas)
  • superset/mcp_service/system/tool/__init__.py (export new tool)

Key Features:

  1. get_superset_instance_info:

    • Comprehensive instance statistics
    • Dashboard breakdown (published vs draft)
    • Database usage metrics
    • Recent activity tracking (7 and 30 days)
    • Popular content identification
    • Configurable via InstanceInfoCore base class
  2. superset_quickstart prompt:

    • Personalized by user type (analyst, executive, developer)
    • Focused by area of interest (sales, marketing, operations, general)
    • Interactive onboarding workflow
    • Dataset discovery guidance
    • First visualization creation
    • Dashboard building basics
  3. create_chart_guided prompt:

    • Business context gathering
    • Dataset recommendation
    • Chart type suggestion based on data and goals
    • Configuration examples
    • Best practices guidance
  4. chart_configs resource:

    • Template configurations for common chart types
    • Example form_data structures
    • Visualization-specific options
    • Quick reference for AI agents
  5. instance_metadata resource:

    • Real-time instance statistics
    • Available database connections
    • Dataset summaries
    • User activity metrics
    • No tool call required (static resource)

Architecture Highlights:

The InstanceInfoCore class uses dependency injection for metric calculation, allowing custom implementations:

# Default implementation
instance_info_core = InstanceInfoCore()
info = instance_info_core.get_instance_info()

# Custom metric calculators
custom_core = InstanceInfoCore(
    dashboard_calculator=my_custom_dashboard_metrics,
    database_calculator=my_custom_db_metrics
)

This design enables:

  • Easy testing with mock calculators
  • Custom metrics for different deployment scenarios
  • Extension without modifying core logic
  • Separation of concerns between data access and formatting
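The injection pattern described above can be sketched as follows. This is a simplified stand-in, not the actual InstanceInfoCore: the class name `InstanceInfoSketch`, the calculator signature (a zero-argument callable returning a dict), and the default values are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical calculator signature: no arguments, returns a metrics dict.
Calculator = Callable[[], Dict[str, Any]]


def default_dashboard_metrics() -> Dict[str, Any]:
    # Placeholder values; a real calculator would query the metadata database.
    return {"published": 0, "draft": 0}


def default_database_metrics() -> Dict[str, Any]:
    # Placeholder values, for illustration only.
    return {"databases": 0}


class InstanceInfoSketch:
    """Illustrative stand-in for InstanceInfoCore, not the real class."""

    def __init__(
        self,
        dashboard_calculator: Calculator = default_dashboard_metrics,
        database_calculator: Calculator = default_database_metrics,
    ) -> None:
        self.dashboard_calculator = dashboard_calculator
        self.database_calculator = database_calculator

    def get_instance_info(self) -> Dict[str, Any]:
        # The core only orchestrates; each calculator owns its own data access.
        return {
            "dashboard_breakdown": self.dashboard_calculator(),
            "database_breakdown": self.database_calculator(),
        }


# A test can swap in a stub calculator without touching the core logic.
stub_core = InstanceInfoSketch(
    dashboard_calculator=lambda: {"published": 3, "draft": 1}
)
info = stub_core.get_instance_info()
```

Because the calculators are plain callables, a unit test needs no database fixture: it passes a lambda and asserts on the assembled dict.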

Pattern Compliance:

This PR continues the MINIMAL cherry-pick pattern established in previous PRs:

  • ✅ Copied ONLY the required tool, prompt, and resource files
  • ✅ NO development utilities or extra infrastructure
  • ✅ Updated exports in __init__.py files
  • ✅ Added comprehensive schemas for type safety
  • ✅ All pre-commit hooks passing
  • ✅ Clean separation of concerns

Next Steps:

The MCP service is now feature-complete for the initial release. Future enhancements may include:

  • Additional prompts for dashboard and dataset modules
  • Custom resources for database-specific examples
  • Advanced instance metrics (query performance, cache hit rates)
  • Multi-tenant instance information
  • Prompt testing framework for validating LLM guidance quality

@korbit-ai

korbit-ai bot commented Oct 25, 2025

Korbit doesn't automatically review large (3000+ lines changed) pull requests such as this one. If you want me to review anyway, use /korbit-review.

@bito-code-review

bito-code-review bot commented Oct 25, 2025

Bito Automatic Review Skipped - Large PR

Bito didn't auto-review this change because the pull request exceeded the line limit. No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.

@dosubot dosubot bot added the api Related to the REST API label Oct 25, 2025
@aminghadersohi aminghadersohi force-pushed the feat/mcp_service_pr6_misc_tools_prompts_resources branch from 24db572 to 2deb19a Compare October 25, 2025 00:36
@github-actions github-actions bot removed the api Related to the REST API label Oct 25, 2025
except Exception as e:
    from flask import jsonify

    return jsonify({"error": str(e), "status": "error"}), 500

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.
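One common way to address this class of warning is to log the exception on the server and return only a generic message to the client. The sketch below is framework-free and is not the fix applied in this PR; the function name `error_response` and the message text are assumptions.

```python
import logging
from typing import Any, Dict, Tuple

logger = logging.getLogger(__name__)


def error_response(exc: Exception) -> Tuple[Dict[str, Any], int]:
    """Build a client-safe error payload (hypothetical helper)."""
    # Full details, including the traceback, stay in server-side logs.
    logger.error("Unhandled error in MCP endpoint", exc_info=exc)
    # The client only sees a generic message, never str(exc).
    return {"error": "Internal server error", "status": "error"}, 500
```

In a Flask handler the returned dict would be passed through `jsonify` before being sent.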
@codecov

codecov bot commented Oct 25, 2025

Codecov Report

❌ Patch coverage is 0% with 1398 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.96%. Comparing base (6e27bee) to head (b08de55).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/mcp_service/middleware.py | 0.00% | 269 Missing ⚠️ |
| superset/mcp_service/screenshot/webdriver_pool.py | 0.00% | 218 Missing ⚠️ |
| ...perset/mcp_service/screenshot/pooled_screenshot.py | 0.00% | 172 Missing ⚠️ |
| superset/mcp_service/utils/retry_utils.py | 0.00% | 113 Missing ⚠️ |
| superset/mcp_service/utils/error_builder.py | 0.00% | 101 Missing ⚠️ |
| superset/mcp_service/sql_lab/sql_lab_utils.py | 0.00% | 86 Missing ⚠️ |
| superset/mcp_service/mcp_core.py | 0.00% | 84 Missing ⚠️ |
| superset/mcp_service/sql_lab/execute_sql_core.py | 0.00% | 58 Missing ⚠️ |
| superset/mcp_service/utils/url_utils.py | 0.00% | 42 Missing ⚠️ |
| superset/mcp_service/sql_lab/schemas.py | 0.00% | 41 Missing ⚠️ |

... and 12 more
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #35839       +/-   ##
===========================================
+ Coverage        0   68.96%   +68.96%     
===========================================
  Files           0      621      +621     
  Lines           0    45565    +45565     
  Branches        0     4942     +4942     
===========================================
+ Hits            0    31422    +31422     
- Misses          0    12898    +12898     
- Partials        0     1245     +1245     
| Flag | Coverage Δ |
|---|---|
| hive | 44.35% <0.00%> (?) |
| mysql | 68.03% <0.00%> (?) |
| postgres | 68.08% <0.00%> (?) |
| presto | 47.90% <0.00%> (?) |
| python | 68.92% <0.00%> (?) |
| sqlite | 67.69% <0.00%> (?) |
| unit | 100.00% <ø> (?) |


@aminghadersohi aminghadersohi force-pushed the feat/mcp_service_pr6_misc_tools_prompts_resources branch 3 times, most recently from e2835f0 to 66daf9a Compare October 25, 2025 01:27
@github-actions github-actions bot added the github_actions Pull requests that update GitHub Actions code label Oct 25, 2025
@aminghadersohi aminghadersohi force-pushed the feat/mcp_service_pr6_misc_tools_prompts_resources branch 5 times, most recently from b3b5b0c to cbe0cdf Compare October 25, 2025 03:36
@github-actions github-actions bot removed the github_actions Pull requests that update GitHub Actions code label Oct 25, 2025
@aminghadersohi aminghadersohi force-pushed the feat/mcp_service_pr6_misc_tools_prompts_resources branch from cbe0cdf to c07c574 Compare October 25, 2025 05:14
@aminghadersohi aminghadersohi force-pushed the feat/mcp_service_pr6_misc_tools_prompts_resources branch 4 times, most recently from 33de34d to 4b10322 Compare October 29, 2025 05:15
Add 6 new MCP tools for dashboard and dataset management, expanding the
MCP service to cover Superset's core resources.

Dashboard Tools (3):
- list_dashboards: List dashboards with filtering, search, and sorting
  - Filter by: dashboard_title, published, favorite
  - Sort by: id, dashboard_title, slug, published, changed_on, created_on
  - Supports UUID and slug lookups
- get_dashboard_info: Retrieve dashboard details by ID, UUID, or slug
  - Returns charts, layout, position JSON, owners, roles
- get_dashboard_available_filters: List filterable columns and operators
  - Returns DashboardFilter metadata for query building

Dataset Tools (3):
- list_datasets: List datasets with filtering, search, and sorting
  - Filter by: table_name, schema, owner, favorite
  - Sort by: id, table_name, schema, changed_on, created_on
  - Returns columns and metrics for each dataset
- get_dataset_info: Retrieve dataset details by ID or UUID
  - Returns complete column definitions with types
  - Returns metrics with SQL expressions
- get_dataset_available_filters: List filterable columns and operators
  - Returns DatasetFilter metadata for query building

Core Infrastructure:
- ModelGetAvailableFiltersCore: Generic base class for *_available_filters tools
- Enhanced schemas with TYPE_CHECKING imports for better type safety
- Retry utilities in utils/retry_utils.py (340 lines)
- CLAUDE.md guide (431 lines) for LLM agents

Schemas (1,107 lines):
- dashboard/schemas.py: DashboardInfo, DashboardList, DashboardFilter, etc. (418 lines)
- dataset/schemas.py: DatasetInfo, DatasetList, DatasetFilter, etc. (349 lines)
- Enhanced chart/schemas.py: Added chart serialization helpers (793 lines total)
- Enhanced system/schemas.py: Expanded with more type safety (111 lines total)

Testing (1,804 lines):
- test_dashboard_tools.py: 15 tests for dashboard listing/retrieval (573 lines)
- test_dataset_tools.py: 20 tests for dataset listing/retrieval (1,231 lines)
- All 62 MCP tests passing (11 chart + 15 dashboard + 20 dataset + 16 core)

Integration:
- app.py: Added dashboard and dataset tool imports (13 new lines)
- All tools follow ModelListCore/ModelGetInfoCore patterns
- Consistent Pydantic schemas with Field descriptions
- All tools use @mcp_auth_hook for security

This PR completes Phase 2 of the MCP service rollout, providing comprehensive
read access to Superset's dashboards and datasets. Chart creation tools will
follow in the next PR.

24 files changed, 5,293 insertions(+), 20 deletions(-)

fix(mcp): use cryptographically secure random for retry jitter

Replace random.uniform() with secrets.SystemRandom().uniform() to address
CodeQL security warning about weak random number generation.

While jitter for retry delays doesn't require cryptographic randomness,
using secrets.SystemRandom() eliminates the CodeQL alert and has minimal
performance impact.
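A minimal sketch of the jitter change, assuming an exponential backoff scheme. The function name `backoff_delay`, the base, the cap, and the 25% jitter range are illustrative and not taken from retry_utils.py.

```python
import secrets

# SystemRandom draws from the OS entropy source instead of the default
# Mersenne Twister generator, which is what the CodeQL alert flags.
_rng = secrets.SystemRandom()


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Return an exponential backoff delay with +/-25% jitter (assumed values)."""
    delay = min(cap, base * (2 ** attempt))
    jitter = _rng.uniform(-0.25 * delay, 0.25 * delay)
    return delay + jitter
```

`secrets.SystemRandom` exposes the same `uniform()` method as the `random` module, so the call site changes by one line.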

fix(mcp): replace ReDoS-vulnerable regex with safer alternatives

Replace regex patterns that could cause catastrophic backtracking with
safer alternatives to address CodeQL security warnings:

1. **Length checks first**: Add explicit length validation before any
   regex matching to bound the maximum input size

2. **Substring checks instead of complex regex**: Replace patterns like
   `r"<script[^>]*>.*?</script>"` with simple substring checks like
   `"<script" in v.lower()` - this is just as effective for detecting
   attacks but has O(n) complexity instead of potential exponential
   backtracking

3. **Simplified regex patterns**: For patterns that must use regex,
   remove backtracking opportunities:
   - `r"/\*.*?\*/"` → `r"/\*"` (just check for comment start)
   - Keep only bounded patterns with no `.*` quantifiers

**Affected validators:**
- ColumnRef.sanitize_name (column names)
- ColumnRef.sanitize_label (display labels)
- FilterConfig.sanitize_column (filter columns)
- FilterConfig.sanitize_value (filter values)
- UpdateChartRequest.sanitize_chart_name (chart names)

These changes maintain the same security protections while eliminating
ReDoS vulnerabilities. The substring approach is actually more robust
for XSS/injection detection since it catches partial tags/patterns.
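The length-check-then-substring approach described above might look like the following. This is a sketch, not the validators' actual code: the limit, the fragment list, and the function name `sanitize_name` are assumptions.

```python
MAX_NAME_LENGTH = 255  # assumed limit, not taken from the PR
FORBIDDEN_FRAGMENTS = ("<script", "</script", "/*")  # illustrative list


def sanitize_name(value: str) -> str:
    """Reject suspicious names using O(n) checks only (hypothetical helper)."""
    # 1. Bound the input size before any other work.
    if len(value) > MAX_NAME_LENGTH:
        raise ValueError("name too long")
    # 2. Plain substring checks: linear time, no backtracking possible.
    lowered = value.lower()
    for fragment in FORBIDDEN_FRAGMENTS:
        if fragment in lowered:
            raise ValueError("name contains disallowed content")
    return value.strip()
```

Checking for `"<script"` alone also rejects an unterminated tag, which the original `<script[^>]*>.*?</script>` pattern would have let through.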
Add 11 new MCP tools for chart creation/modification, data access, SQL execution,
and explore link generation. This is the largest MCP PR, adding comprehensive
write capabilities and advanced features.

Chart Creation & Modification Tools (6):
- generate_chart: Create new charts with AI-powered configuration (454 lines)
  - Supports table and XY chart types (line, bar, area, scatter)
  - 5-layer validation pipeline (schema, business logic, dataset, Superset, runtime)
  - XSS/SQL injection prevention with whitelist validation
  - Column existence validation with fuzzy match suggestions
  - Auto-generates chart names and handles duplicate labels
  - Returns chart_id, preview URLs, and explore URL

- update_chart: Update existing chart configuration (213 lines)
  - Modify saved charts by ID or UUID
  - Full validation pipeline like generate_chart
  - Updates chart metadata and regenerates previews

- update_chart_preview: Update cached preview without saving (158 lines)
  - Modifies temporary form_data from generate_chart (save_chart=False)
  - Returns new form_data_key and updated preview
  - Useful for iterative chart design without cluttering database

- get_chart_preview: Get visual preview of any chart (2,082 lines)
  - Multiple formats: url (PNG screenshot), ascii, table, vega_lite, base64
  - Screenshot generation with Selenium WebDriver pool
  - ASCII art charts for terminal-friendly output
  - Tabular data representation
  - VegaLite JSON for interactive visualizations

- get_chart_data: Get underlying chart data (649 lines)
  - Export formats: json, csv, excel
  - Pagination support with configurable limits
  - Cache control (use_cache, force_refresh, cache_timeout)
  - Returns raw data for LLM analysis without rendering

- get_chart_available_filters: List filterable chart columns (50 lines)
  - Returns available filter fields and operators
  - Helps LLMs discover what chart fields can be filtered
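The fuzzy match suggestions mentioned for generate_chart could be produced with the standard library's `difflib`. This is a sketch under that assumption; the PR does not state which matching technique it uses, and `suggest_columns` is a hypothetical name.

```python
from difflib import get_close_matches
from typing import List


def suggest_columns(requested: str, available: List[str], limit: int = 3) -> List[str]:
    """Return existing column names that resemble a missing one."""
    # Compare case-insensitively, but return names in their original casing.
    by_lower = {name.lower(): name for name in available}
    matches = get_close_matches(
        requested.lower(), list(by_lower), n=limit, cutoff=0.6
    )
    return [by_lower[m] for m in matches]


suggestions = suggest_columns("order_dat", ["order_date", "customer_id", "amount"])
```

A validation error can then include the suggestions so the caller can retry with a corrected column name.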

SQL Lab Tools (3):
- execute_sql: Execute SQL queries against databases (94 lines)
  - Security validation and timeout protection
  - DML permission enforcement
  - Support for parameterized queries
  - Configurable result limits (max 10,000 rows)
  - Returns query results in structured format

- open_sql_lab_with_context: Generate SQL Lab URL (118 lines)
  - Pre-populate SQL editor with query and context
  - Set database connection and schema
  - Provide dataset context for exploration
  - Returns URL for direct navigation

Explore & Analysis Tools (1):
- generate_explore_link: Create pre-configured explore URL (95 lines)
  - Generate Superset explore URLs with dataset/metrics/filters
  - Supports table and XY chart configurations
  - Form data caching for seamless user experience
  - Returns explore_url for interactive chart building

Chart Infrastructure (3,085 lines):
- chart_utils.py: Chart creation and update logic (484 lines)
  - Handles both saved charts and cached previews
  - Manages form_data lifecycle and expiration
  - Integrates with ChartDAO for persistence

- preview_utils.py: Preview generation from form_data (561 lines)
  - Converts configs to ASCII art, tables, VegaLite
  - Handles chart data extraction and formatting

- validation/ (5-layer pipeline, 2,040 lines):
  - schema_validator.py: Pydantic schema validation (307 lines)
  - dataset_validator.py: Column/metric existence checks (329 lines)
  - pipeline.py: Orchestrates validation layers (293 lines)
  - runtime/chart_type_suggester.py: Smart chart type recommendations (437 lines)
  - runtime/cardinality_validator.py: High-cardinality warnings (195 lines)
  - runtime/format_validator.py: Axis format validation (225 lines)

Screenshot Infrastructure (1,090 lines):
- pooled_screenshot.py: Manages WebDriver pool for chart screenshots (483 lines)
  - Concurrent screenshot generation with connection pooling
  - Automatic retry on transient failures
  - Configurable dimensions and timeouts

- webdriver_pool.py: WebDriver connection pool (433 lines)
  - Reuses Selenium connections for performance
  - Health checks and automatic recovery
  - Thread-safe pool management

- webdriver_config.py: WebDriver configuration (139 lines)
  - Chrome/Firefox support with headless mode
  - Reads from WEBDRIVER_* config variables

SQL Lab Infrastructure (589 lines):
- execute_sql_core.py: Core SQL execution logic (221 lines)
  - Query execution with error handling
  - DML permission checks
  - Result formatting and pagination

- sql_lab_utils.py: SQL Lab URL generation (243 lines)
  - Constructs SQL Lab URLs with context
  - Form data encoding for URL parameters

- schemas.py: Request/response models (109 lines)
  - ExecuteSqlRequest, ExecuteSqlResponse
  - OpenSqlLabRequest, OpenSqlLabResponse

Enhanced Middleware (740 lines, +666 lines):
- Request/response logging with timing
- Error response formatting with structured schemas
- Flask context integration for all tools
- Cache control header management
- Form data cleanup for expired entries

Enhanced Authentication (auth.py, +38 lines):
- has_dataset_access(): Dataset permission checking
- Integration with Superset's security_manager
- Used by all chart creation/modification tools

Common Utilities (738 lines):
- error_schemas.py: Structured error responses (103 lines)
  - BaseError, ValidationError, ChartError, etc.
  - Consistent error format across all tools

- cache_utils.py: Cache control helpers (143 lines)
  - CacheControl schema for unified caching
  - use_cache, force_refresh, cache_timeout flags

- error_builder.py: Error construction utilities (364 lines)
  - Builds structured error responses
  - Validation error formatting
  - Suggestion generation for typos

- url_utils.py: URL generation helpers (128 lines)
  - get_superset_base_url() for external URLs
  - Constructs chart/explore/sqllab URLs

Commands (33 lines):
- create_form_data.py: Form data creation command
  - Integrates with Superset's CreateFormDataCommand
  - Used by chart preview tools

Testing (2,554 lines, 105 new tests):
- test_generate_chart.py: 268 lines, chart creation tests
- test_update_chart.py: 385 lines, chart update tests
- test_update_chart_preview.py: 474 lines, preview update tests
- test_get_chart_preview.py: 290 lines, preview generation tests
- test_chart_utils.py: 460 lines, chart utilities tests
- test_execute_sql.py: 497 lines, SQL execution tests
- test_execute_sql_helper.py: 64 lines, SQL helper tests
- test_generate_explore_link.py: 580 lines, explore link tests

Total: 167 tests (139 passing, 28 SQL Lab tests need integration fixes)

Integration (app.py updates):
All new tools imported and auto-registered via @mcp.tool decorators:
- 6 chart tools (generate_chart, update_chart, etc.)
- 3 SQL Lab tools (execute_sql, open_sql_lab_with_context)
- 1 explore tool (generate_explore_link)
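Decorator-based registration of this kind happens as a side effect of importing the module that defines the tool, which is why the tool modules must be imported somewhere. A minimal sketch, with `tool` and `TOOL_REGISTRY` as hypothetical stand-ins for `@mcp.tool` and the MCP instance's internal registry:

```python
from typing import Callable, Dict

# Hypothetical registry; the real MCP instance keeps its own bookkeeping.
TOOL_REGISTRY: Dict[str, Callable[..., object]] = {}


def tool(func: Callable[..., object]) -> Callable[..., object]:
    """Stand-in for a registering decorator such as @mcp.tool."""
    # Registration runs when the decorated module is imported.
    TOOL_REGISTRY[func.__name__] = func
    return func


@tool
def example_tool() -> str:
    # Placeholder body, for illustration only.
    return "ok"
```

Importing the same tool modules from a second place repeats those side effects, so keeping the imports in one module avoids surprises.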

This PR completes Phase 3 of the MCP service rollout, adding full CRUD
capabilities for charts and enabling SQL-based data exploration. Dashboard
creation tools will follow in the next PR.

53 files changed, 13,318 insertions(+), 37 deletions(-)
Replace complex regex pattern with substring checks in error sanitization:
- r"<script[^>]*>.*?</script>" → "<script" in str_lower
- More efficient and avoids catastrophic backtracking
- Still catches all XSS attempts in error messages
Replace ReDoS-vulnerable regex patterns in _sanitize_validation_error():
- Remove r"SELECT\s+.*?\s+FROM" pattern (catastrophic backtracking)
- Remove r"WHERE\s+.*?(\s+ORDER|\s+GROUP|\s+LIMIT|$)" pattern

Security improvements:
- Length check FIRST to bound input size
- Substring checks for SQL fragment detection
- String slicing instead of regex for content redaction
- Maintains same security protection without backtracking

Refactoring to reduce complexity:
- Extract _redact_sql_select helper
- Extract _redact_sql_where helper
- Extract _get_generic_error_message helper
- Main function now has complexity 4 (well under limit of 10)

This completes the ReDoS fixes across all MCP service validation and
sanitization functions.
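Redaction by string slicing, as described for the extracted helpers, can be sketched like this. The body is an assumption about what a helper such as _redact_sql_select might do, not a copy of it; the length bound and placeholder text are illustrative.

```python
MAX_ERROR_LENGTH = 500  # assumed bound


def redact_sql_select(message: str) -> str:
    """Replace the text between SELECT and FROM with a placeholder."""
    # Length check first, so every later step runs on bounded input.
    message = message[:MAX_ERROR_LENGTH]
    upper = message.upper()
    if len(upper) != len(message):
        # Rare Unicode case-mapping changes length; fall back to a generic message.
        return "Validation error"
    start = upper.find("SELECT ")
    if start == -1:
        return message
    end = upper.find(" FROM", start)
    if end == -1:
        return message
    # Slice around the column list instead of matching it with `.*?`.
    return message[: start + len("SELECT ")] + "[REDACTED]" + message[end:]
```

`str.find` performs a single linear scan, so the worst case is proportional to the bounded message length.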
Replace unbounded regex patterns in _sanitize_error_for_logging():
- r"postgresql://[^@]+@[^/]+/" → bounded {1,100} quantifiers
- r"mysql://[^@]+@[^/]+/" → bounded {1,100} quantifiers
- r'[Aa]pi[_-]?[Kk]ey[:\s]*[^\s\'"]+' → bounded {0,5} and {1,100}
- r'[Tt]oken[:\s]*[^\s\'"]+' → bounded {0,5} and {1,100}
- r"/[a-zA-Z0-9_\-/.]+/superset/" → bounded {1,200}

Security improvements:
- Length check FIRST to bound input size
- Substring checks before applying regex (avoid unnecessary work)
- All patterns now have explicit quantifier bounds
- Prevents catastrophic backtracking on malicious input

This completes all ReDoS fixes in the MCP service.
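A sketch of the bounded-quantifier version of the API key pattern listed above, combined with the length check and substring pre-check. The function name, length limit, and replacement text are assumptions; only the quantifier bounds come from the commit message.

```python
import re

MAX_LOG_LENGTH = 1000  # assumed bound

# Same shape as the original pattern, with {0,5} and {1,100} in place of * and +.
API_KEY_PATTERN = re.compile(r"[Aa]pi[_-]?[Kk]ey[:\s]{0,5}[^\s'\"]{1,100}")


def sanitize_for_logging(message: str) -> str:
    """Redact API keys from a log message (hypothetical helper)."""
    # Length check first to bound the input size.
    message = message[:MAX_LOG_LENGTH]
    # Cheap substring check so the regex only runs when it could match.
    if "key" in message.lower():
        message = API_KEY_PATTERN.sub("api_key=[REDACTED]", message)
    return message
```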
Remove reference to deleted codeql-config.yml file.
CodeQL will now run with default configuration and detect
all security issues without suppressions.
CodeQL detected incomplete URL substring sanitization where database URLs
could be partially matched at arbitrary positions.

Changes:
- Remove substring position checks (e.g., "postgresql://" in str)
- Use word boundaries (\b) to match complete URLs only
- Add whitespace exclusions ([^@\s], [^/\s]) to prevent cross-boundary matches
- Apply case-insensitive matching with re.IGNORECASE flag
- Ensure full URL path is redacted with [^\s]{0,100} for path component

This ensures URLs are completely sanitized regardless of position in error string.
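The changes listed above can be combined into a single pattern along these lines. The exact regex in the PR is not shown, so this is a reconstruction from the bullet points; `redact_db_urls` and the placeholder text are assumptions.

```python
import re

DB_URL_PATTERN = re.compile(
    # \b anchors the scheme at a word boundary; the character classes exclude
    # whitespace so a match cannot run across token boundaries.
    r"\b(?:postgresql|mysql)://[^@\s]{1,100}@[^/\s]{1,100}/[^\s]{0,100}",
    re.IGNORECASE,
)


def redact_db_urls(message: str) -> str:
    """Replace complete database URLs, wherever they appear in the message."""
    return DB_URL_PATTERN.sub("[REDACTED_DB_URL]", message)
```

Because the path component is part of the match, the database name is redacted along with the credentials and host.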
CodeQL detected incomplete URL substring sanitization where URL schemes
(javascript:, vbscript:, data:) could match at arbitrary positions.

Changes:
- Replace substring checks for URL schemes with regex word boundaries
- Use \b(javascript|vbscript|data): to match only actual URL schemes
- Keep simple substring checks for HTML tags (<script, </script>)
- Consolidate event handler check into same pattern list

This ensures we only filter actual URL schemes, not arbitrary text
containing these substrings.
CodeQL detected incomplete URL substring sanitization in multiple validators.
URL schemes (javascript:, vbscript:, data:) could match at arbitrary positions.

Fixed in 5 validators:
- ColumnRef.sanitize_name
- ColumnRef.sanitize_label
- FilterConfig.sanitize_column
- FilterConfig.sanitize_value
- UpdateChartRequest.sanitize_chart_name

Changes:
- Separate HTML tag checks (substring) from URL scheme checks (regex)
- Use \b(javascript|vbscript|data): to match only actual URL schemes
- Extract _validate_string_value helper to reduce complexity
- Prevents false positives on text containing these substrings

All validators now properly distinguish between HTML tags (safe substring
checks) and URL schemes (word boundary regex).
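The split between substring checks for HTML tags and word-boundary regex for URL schemes can be sketched as below. The helper name `is_suspicious` and the fragment list are illustrative; the `\b(javascript|vbscript|data):` pattern is taken from the commit message.

```python
import re

HTML_FRAGMENTS = ("<script", "</script>")
URL_SCHEME_PATTERN = re.compile(r"\b(javascript|vbscript|data):", re.IGNORECASE)


def is_suspicious(value: str) -> bool:
    """Flag values containing script tags or dangerous URL schemes."""
    lowered = value.lower()
    # HTML tags: a plain substring check is enough.
    if any(fragment in lowered for fragment in HTML_FRAGMENTS):
        return True
    # URL schemes: require a word boundary so "metadata:" is not flagged.
    return URL_SCHEME_PATTERN.search(value) is not None
```

With a bare substring check, a label such as "metadata: total" would be rejected because it contains "data:"; the word boundary prevents that.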
Add two new MCP tools for dashboard management:
- generate_dashboard: Creates dashboards with charts in grid layout
- add_chart_to_existing_dashboard: Adds charts to existing dashboards

Changes:
- New tool: generate_dashboard.py (236 lines)
- New tool: add_chart_to_existing_dashboard.py (282 lines)
- Schema additions: 4 new classes in dashboard/schemas.py (+55 lines)
- Tests: 11 comprehensive tests (450 lines, all passing)
- Updated dashboard/tool/__init__.py exports

All pre-commit hooks passing. All tests passing (11/11).

fix(mcp): apply ruff auto-fixes for import ordering

- Remove unused imports in update_chart.py
- Sort imports alphabetically in dashboard/tool/__init__.py

These fixes resolve CI ruff errors.

fix(mcp): resolve schema issues from rebase

- Remove duplicate class definitions in chart/schemas.py (ChartCapabilities, ChartSemantics, PerformanceMetadata, AccessibilityMetadata, VersionedResponse)
- Replace `Optional[T]` with modern `T | None` syntax in dashboard/schemas.py
- Fix Pydantic forward reference issue by moving GenerateChartResponse after ChartPreviewContent definition
- Address mypy type checking errors
- Fix ruff formatting issues

These issues were introduced during the PR5 rebase and prevented CI from passing.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

fix: apply ruff formatting and remove unused mypy ignore

- Remove extra blank line in mcp_core.py
- Add missing blank line after import in superset_config_docker_light.py
- Remove unused type: ignore comment in superset_config_docker_light.py

fix(mcp): add type ignore for CELERY_CONFIG override in docker light config

Fixes mypy errors:
- Cannot assign multiple types to name "CELERY_CONFIG" without an explicit "Type[...]" annotation
- Incompatible types in assignment (expression has type "None", variable has type "type[CeleryConfig]")

The docker light config intentionally overrides CELERY_CONFIG to None to disable Celery.
Added type: ignore comment to suppress the expected type mismatch.

revert: remove unrelated changes to docker config file

This file is not part of the MCP service work. Removing the blank line
formatting change to restore it to the base branch state.

fix(mcp): restore DAO files and add tool registration to fix failing tests

Fixed all 64 failing MCP service unit tests by:

1. **Restored accidentally deleted DAO files**:
   - superset/mcp_service/dao/base.py
   - superset/mcp_service/dao/__init__.py
   These files were accidentally deleted in commit adbdb9b

2. **Added tool imports to server.py**:
   - Added dashboard tool imports (generate_dashboard, add_chart_to_existing_dashboard, etc.)
   - Added dataset tool imports (list_datasets, get_dataset_info, etc.)
   - Added explore tool imports (generate_explore_link)
   - Added sql_lab tool imports (execute_sql, open_sql_lab_with_context)

3. **Created conftest.py for test discovery**:
   - Added tests/unit_tests/mcp_service/conftest.py
   - Imports all tools so they register with MCP instance before tests run
   - Essential for pytest to discover and run tools in test context

All 178 MCP service unit tests now pass.

refactor(mcp): remove unused DAO Protocol

The custom DAO Protocol in superset/mcp_service/dao/ is unused.
All code uses BaseDAO from superset.daos.base instead.

The Protocol was originally created as a placeholder with the comment
"To be replaced with one from superset core" and has since been replaced.

All 178 MCP service tests still pass after removal.

update imports

fix(mcp): remove tool imports from conftest to prevent test pollution

Removed tool imports from tests/unit_tests/mcp_service/conftest.py
to prevent side effects during test discovery that were causing
integration test failures.

Tools are already registered in app.py when the MCP instance is created,
so importing them again in conftest.py was unnecessary and causing
database constraint violations in unrelated integration tests.

Add comprehensive MCP features for instance metadata, user guidance, and resource templates:

**New Tools (1):**
- get_superset_instance_info: Detailed instance statistics with dashboards, charts, datasets, activity metrics, and database breakdowns

**New Prompts (2):**
- superset_quickstart: Interactive onboarding guide for new users
- create_chart_guided: AI-powered chart creation workflow with business context

**New Resources (2):**
- chart_configs: Example chart configuration templates for common visualizations
- instance_metadata: Real-time instance metadata and statistics

**Infrastructure Added:**
- InstanceInfoCore: Configurable base class for comprehensive instance reporting with custom metric calculators
- Added 7 new schemas to system/schemas.py: InstanceInfo, InstanceSummary, RecentActivity, DashboardBreakdown, DatabaseBreakdown, PopularContent, GetSupersetInstanceInfoRequest

Changes:
- Added get_superset_instance_info.py tool (268 lines) with detailed breakdown calculations
- Added InstanceInfoCore class to mcp_core.py (173 lines)
- Added quickstart.py prompt (94 lines) for user onboarding
- Added create_chart_guided.py prompt (195 lines) for AI-assisted chart creation
- Added chart_configs.py resource (362 lines) with example configurations
- Added instance_metadata.py resource (107 lines) for real-time instance stats
- Updated system/tool/__init__.py to export new instance info tool
- Added placeholder __init__.py files for dashboard/dataset prompts and resources
- Added 7 new Pydantic schemas for instance information responses

Statistics:
- 17 files changed (16 new, 2 modified)
- ~1,200 lines of production code added
- All pre-commit hooks passing
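
The idea of injecting custom metric calculators into InstanceInfoCore can be sketched roughly as follows. `InstanceInfoSketch`, its constructor arguments, and the sample metrics are all hypothetical and do not reflect the actual class in mcp_core.py. Catching errors per metric is likewise an assumption made for this example.

```python
from typing import Any, Callable, Dict

# A metric calculator takes no arguments and returns a JSON-friendly value.
MetricCalculator = Callable[[], Any]


class InstanceInfoSketch:
    """Hypothetical stand-in for InstanceInfoCore; not the real class."""

    def __init__(self, metric_calculators: Dict[str, MetricCalculator]) -> None:
        # Calculators are injected, so adding a metric needs no change here.
        self._calculators = dict(metric_calculators)

    def run(self) -> Dict[str, Any]:
        report: Dict[str, Any] = {}
        for name, calculate in self._calculators.items():
            try:
                report[name] = calculate()
            except Exception as exc:
                # One failing metric should not hide the others.
                report[name] = {"error": str(exc)}
        return report


def failing_metric() -> Any:
    raise RuntimeError("database unavailable")


info = InstanceInfoSketch(
    {
        "instance_summary": lambda: {"total_dashboards": 3, "total_charts": 12},
        "dashboard_breakdown": lambda: {"published": 2, "draft": 1},
        "popular_content": failing_metric,
    }
)
report = info.run()
```

A new metric is added by passing one more entry in the dictionary; the `run` loop stays unchanged.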

fix(mcp): convert timestamp to ISO string in InstanceInfo schema

Fixed Pydantic validation error where timestamp field was receiving a datetime object instead of the expected string type. The InstanceInfo schema requires timestamp as a string, so now we convert it using .isoformat().
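
A minimal sketch of the fix, using only the standard library. `InstanceInfoRecord` is a hypothetical dataclass standing in for the real Pydantic `InstanceInfo` schema, and the type check in `__post_init__` only imitates the schema's string requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class InstanceInfoRecord:
    """Hypothetical stand-in for the InstanceInfo schema."""

    timestamp: str

    def __post_init__(self) -> None:
        # Imitate the schema's requirement that timestamp is a string.
        if not isinstance(self.timestamp, str):
            raise TypeError("timestamp must be a str")


now = datetime(2025, 10, 31, 0, 30, tzinfo=timezone.utc)

# Passing the datetime object directly is rejected, as in the original bug.
try:
    InstanceInfoRecord(timestamp=now)  # type: ignore[arg-type]
    rejected = False
except TypeError:
    rejected = True

# Converting with .isoformat() yields the expected string form.
info = InstanceInfoRecord(timestamp=now.isoformat())
```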

fix(mcp): use BaseDAO instead of DAO Protocol in instance_metadata

Replace DAO Protocol casts with BaseDAO[Any] to match InstanceInfoCore type signature.
This fixes mypy type checking errors.

feat(mcp): register prompts and resources in app.py and document in CLAUDE.md

Add imports for chart and system prompts/resources to ensure they are registered
with the MCP instance on startup. Update CLAUDE.md with comprehensive instructions
for adding new prompts and resources.

Changes to app.py:
- Import chart.prompts module to register create_chart_guided prompt
- Import chart.resources module to register chart_configs resource
- Import system.prompts module to register superset_quickstart prompt
- Import system.resources module to register instance_metadata resource
- Add clear comment explaining prompt/resource registration pattern

Changes to CLAUDE.md:
- Update section title to include prompts and resources
- Add "How to Add a New Prompt" section with complete workflow
- Add "How to Add a New Resource" section with complete workflow
- Add Quick Checklist for New Prompts
- Add Quick Checklist for New Resources
- Update line number references (210-242 for tools, 244-253 for prompts/resources)
- Explain two-level import pattern (module __init__.py + app.py)

This ensures all prompts and resources are available to MCP clients and provides
clear documentation for future contributors.
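
The registration-on-import pattern described above can be illustrated with a small sketch. The `PROMPTS` registry and the `prompt` decorator are hypothetical stand-ins for the MCP instance and its decorator, not the actual API.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the MCP instance.
PROMPTS: Dict[str, Callable[..., str]] = {}


def prompt(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Register the decorated function under `name` when it is defined."""

    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        PROMPTS[name] = func
        return func

    return decorator


# In the layout described above, this definition would live in its own
# module, re-exported by the package __init__.py and imported by app.py,
# so the decorator runs on startup. It is kept in one file here for brevity.
@prompt("superset_quickstart")
def superset_quickstart(user_type: str = "analyst") -> str:
    return f"Quickstart guide for {user_type}"
```

Because registration happens as a side effect of the import, a module that is never imported never registers its prompt, which is why both import levels matter.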
@aminghadersohi force-pushed the feat/mcp_service_pr6_misc_tools_prompts_resources branch from 4b10322 to b08de55 on October 31, 2025 00:30
@aminghadersohi
Contributor Author

closing as this stack is too hard to maintain. i am just gonna keep the last one. PR10
