feat(mcp): PR6 - add instance info tool, prompts, and resources #35839
Bito Automatic Review Skipped - Large PR
force-pushed 24db572 to 2deb19a
except Exception as e:
    from flask import jsonify

    return jsonify({"error": str(e), "status": "error"}), 500
Check warning: Code scanning / CodeQL
Information exposure through an exception (Medium)
Stack trace information
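CodeQL flags the snippet above because `str(e)` can carry stack-trace or internal details back to the client. A common remediation is to log the exception server-side and return only a generic message with a correlation ID. The following is a minimal sketch under that assumption. The helper name `safe_error_response` and the `error_id` field are hypothetical and are not taken from the PR.

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def safe_error_response(exc: Exception) -> tuple[dict[str, str], int]:
    """Build a client-facing error payload that never echoes exception text.

    Hypothetical helper: the full exception goes to the server log, and the
    client only sees a generic message plus an opaque ID for log correlation.
    """
    error_id = uuid.uuid4().hex[:12]
    # exc_info accepts an exception instance, so its traceback is logged here.
    logger.error("Unhandled MCP error [%s]", error_id, exc_info=exc)
    payload = {
        "error": "An internal error occurred",
        "error_id": error_id,
        "status": "error",
    }
    return payload, 500
```

In a Flask view, the handler would then return `jsonify(payload), status` instead of `jsonify({"error": str(e), ...})`.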
Codecov Report
❌ Patch coverage is …

@@             Coverage Diff             @@
##           master   #35839       +/-   ##
===========================================
+ Coverage        0   68.96%   +68.96%
===========================================
  Files           0      621      +621
  Lines           0    45565    +45565
  Branches        0     4942     +4942
===========================================
+ Hits            0    31422    +31422
- Misses          0    12898    +12898
- Partials        0     1245     +1245

Flags with carried forward coverage won't be shown.
force-pushed e2835f0 to 66daf9a
force-pushed b3b5b0c to cbe0cdf
force-pushed cbe0cdf to c07c574
force-pushed 33de34d to 4b10322
Add 6 new MCP tools for dashboard and dataset management, expanding the MCP service to cover Superset's core resources.

Dashboard Tools (3):
- list_dashboards: List dashboards with filtering, search, and sorting
  - Filter by: dashboard_title, published, favorite
  - Sort by: id, dashboard_title, slug, published, changed_on, created_on
  - Supports UUID and slug lookups
- get_dashboard_info: Retrieve dashboard details by ID, UUID, or slug
  - Returns charts, layout, position JSON, owners, roles
- get_dashboard_available_filters: List filterable columns and operators
  - Returns DashboardFilter metadata for query building

Dataset Tools (3):
- list_datasets: List datasets with filtering, search, and sorting
  - Filter by: table_name, schema, owner, favorite
  - Sort by: id, table_name, schema, changed_on, created_on
  - Returns columns and metrics for each dataset
- get_dataset_info: Retrieve dataset details by ID or UUID
  - Returns complete column definitions with types
  - Returns metrics with SQL expressions
- get_dataset_available_filters: List filterable columns and operators
  - Returns DatasetFilter metadata for query building

Core Infrastructure:
- ModelGetAvailableFiltersCore: Generic base class for *_available_filters tools
- Enhanced schemas with TYPE_CHECKING imports for better type safety
- Retry utilities in utils/retry_utils.py (340 lines)
- CLAUDE.md guide (431 lines) for LLM agents

Schemas (1,107 lines):
- dashboard/schemas.py: DashboardInfo, DashboardList, DashboardFilter, etc. (418 lines)
- dataset/schemas.py: DatasetInfo, DatasetList, DatasetFilter, etc. (349 lines)
- Enhanced chart/schemas.py: Added chart serialization helpers (793 lines total)
- Enhanced system/schemas.py: Expanded with more type safety (111 lines total)

Testing (1,804 lines):
- test_dashboard_tools.py: 15 tests for dashboard listing/retrieval (573 lines)
- test_dataset_tools.py: 20 tests for dataset listing/retrieval (1,231 lines)
- All 62 MCP tests passing (11 chart + 15 dashboard + 20 dataset + 16 core)

Integration:
- app.py: Added dashboard and dataset tool imports (13 new lines)
- All tools follow ModelListCore/ModelGetInfoCore patterns
- Consistent Pydantic schemas with Field descriptions
- All tools use @mcp_auth_hook for security

This PR completes Phase 2 of the MCP service rollout, providing comprehensive read access to Superset's dashboards and datasets. Chart creation tools will follow in the next PR.

24 files changed, 5,293 insertions(+), 20 deletions(-)

fix(mcp): use cryptographically secure random for retry jitter

Replace random.uniform() with secrets.SystemRandom().uniform() to address the CodeQL security warning about weak random number generation. While jitter for retry delays doesn't require cryptographic randomness, using secrets.SystemRandom() eliminates the CodeQL alert and has minimal performance impact.

fix(mcp): replace ReDoS-vulnerable regex with safer alternatives

Replace regex patterns that could cause catastrophic backtracking with safer alternatives to address CodeQL security warnings:

1. **Length checks first**: Add explicit length validation before any regex matching to bound the maximum input size
2. **Substring checks instead of complex regex**: Replace patterns like `r"<script[^>]*>.*?</script>"` with simple substring checks like `"<script" in v.lower()`. This is just as effective for detecting attacks but has O(n) complexity instead of potential exponential backtracking
3. **Simplified regex patterns**: For patterns that must use regex, remove backtracking opportunities:
   - `r"/\*.*?\*/"` → `r"/\*"` (just check for comment start)
   - Keep only bounded patterns with no `.*` quantifiers

**Affected validators:**
- ColumnRef.sanitize_name (column names)
- ColumnRef.sanitize_label (display labels)
- FilterConfig.sanitize_column (filter columns)
- FilterConfig.sanitize_value (filter values)
- UpdateChartRequest.sanitize_chart_name (chart names)

These changes maintain the same security protections while eliminating ReDoS vulnerabilities. The substring approach is actually more robust for XSS/injection detection since it catches partial tags/patterns.
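The commit above changes only the random source for retry jitter. The backoff algorithm in utils/retry_utils.py is not described here, so the sketch below assumes exponential backoff with "full jitter". Every name in it (`backoff_delay`, `retry`, the defaults) is hypothetical. The only detail taken from the commit is drawing the delay with `secrets.SystemRandom().uniform()`.

```python
import secrets
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# SystemRandom draws from os.urandom(). Jitter doesn't need that strength,
# but using it avoids CodeQL's weak-PRNG alert at negligible cost.
_rng = secrets.SystemRandom()


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 10.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return _rng.uniform(0.0, ceiling)


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except retry_on:
            sleep(backoff_delay(attempt))
    # Final attempt: any exception now propagates to the caller.
    return fn()
```

Injecting `sleep` keeps the helper testable without real delays.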
Add 11 new MCP tools for chart creation/modification, data access, SQL execution, and explore link generation. This is the largest MCP PR, adding comprehensive write capabilities and advanced features.

Chart Creation & Modification Tools (6):
- generate_chart: Create new charts with AI-powered configuration (454 lines)
  - Supports table and XY chart types (line, bar, area, scatter)
  - 5-layer validation pipeline (schema, business logic, dataset, Superset, runtime)
  - XSS/SQL injection prevention with whitelist validation
  - Column existence validation with fuzzy match suggestions
  - Auto-generates chart names and handles duplicate labels
  - Returns chart_id, preview URLs, and explore URL
- update_chart: Update existing chart configuration (213 lines)
  - Modify saved charts by ID or UUID
  - Full validation pipeline like generate_chart
  - Updates chart metadata and regenerates previews
- update_chart_preview: Update cached preview without saving (158 lines)
  - Modifies temporary form_data from generate_chart (save_chart=False)
  - Returns new form_data_key and updated preview
  - Useful for iterative chart design without cluttering the database
- get_chart_preview: Get visual preview of any chart (2,082 lines)
  - Multiple formats: url (PNG screenshot), ascii, table, vega_lite, base64
  - Screenshot generation with Selenium WebDriver pool
  - ASCII art charts for terminal-friendly output
  - Tabular data representation
  - VegaLite JSON for interactive visualizations
- get_chart_data: Get underlying chart data (649 lines)
  - Export formats: json, csv, excel
  - Pagination support with configurable limits
  - Cache control (use_cache, force_refresh, cache_timeout)
  - Returns raw data for LLM analysis without rendering
- get_chart_available_filters: List filterable chart columns (50 lines)
  - Returns available filter fields and operators
  - Helps LLMs discover what chart fields can be filtered

SQL Lab Tools (3):
- execute_sql: Execute SQL queries against databases (94 lines)
  - Security validation and timeout protection
  - DML permission enforcement
  - Support for parameterized queries
  - Configurable result limits (max 10,000 rows)
  - Returns query results in structured format
- open_sql_lab_with_context: Generate SQL Lab URL (118 lines)
  - Pre-populate SQL editor with query and context
  - Set database connection and schema
  - Provide dataset context for exploration
  - Returns URL for direct navigation

Explore & Analysis Tools (1):
- generate_explore_link: Create pre-configured explore URL (95 lines)
  - Generate Superset explore URLs with dataset/metrics/filters
  - Supports table and XY chart configurations
  - Form data caching for seamless user experience
  - Returns explore_url for interactive chart building

Chart Infrastructure (3,085 lines):
- chart_utils.py: Chart creation and update logic (484 lines)
  - Handles both saved charts and cached previews
  - Manages form_data lifecycle and expiration
  - Integrates with ChartDAO for persistence
- preview_utils.py: Preview generation from form_data (561 lines)
  - Converts configs to ASCII art, tables, VegaLite
  - Handles chart data extraction and formatting
- validation/ (5-layer pipeline, 2,040 lines):
  - schema_validator.py: Pydantic schema validation (307 lines)
  - dataset_validator.py: Column/metric existence checks (329 lines)
  - pipeline.py: Orchestrates validation layers (293 lines)
  - runtime/chart_type_suggester.py: Smart chart type recommendations (437 lines)
  - runtime/cardinality_validator.py: High-cardinality warnings (195 lines)
  - runtime/format_validator.py: Axis format validation (225 lines)

Screenshot Infrastructure (1,090 lines):
- pooled_screenshot.py: Manages WebDriver pool for chart screenshots (483 lines)
  - Concurrent screenshot generation with connection pooling
  - Automatic retry on transient failures
  - Configurable dimensions and timeouts
- webdriver_pool.py: WebDriver connection pool (433 lines)
  - Reuses Selenium connections for performance
  - Health checks and automatic recovery
  - Thread-safe pool management
- webdriver_config.py: WebDriver configuration (139 lines)
  - Chrome/Firefox support with headless mode
  - Reads from WEBDRIVER_* config variables

SQL Lab Infrastructure (589 lines):
- execute_sql_core.py: Core SQL execution logic (221 lines)
  - Query execution with error handling
  - DML permission checks
  - Result formatting and pagination
- sql_lab_utils.py: SQL Lab URL generation (243 lines)
  - Constructs SQL Lab URLs with context
  - Form data encoding for URL parameters
- schemas.py: Request/response models (109 lines)
  - ExecuteSqlRequest, ExecuteSqlResponse
  - OpenSqlLabRequest, OpenSqlLabResponse

Enhanced Middleware (740 lines, +666 lines):
- Request/response logging with timing
- Error response formatting with structured schemas
- Flask context integration for all tools
- Cache control header management
- Form data cleanup for expired entries

Enhanced Authentication (auth.py, +38 lines):
- has_dataset_access(): Dataset permission checking
- Integration with Superset's security_manager
- Used by all chart creation/modification tools

Common Utilities (738 lines):
- error_schemas.py: Structured error responses (103 lines)
  - BaseError, ValidationError, ChartError, etc.
  - Consistent error format across all tools
- cache_utils.py: Cache control helpers (143 lines)
  - CacheControl schema for unified caching
  - use_cache, force_refresh, cache_timeout flags
- error_builder.py: Error construction utilities (364 lines)
  - Builds structured error responses
  - Validation error formatting
  - Suggestion generation for typos
- url_utils.py: URL generation helpers (128 lines)
  - get_superset_base_url() for external URLs
  - Constructs chart/explore/sqllab URLs

Commands (33 lines):
- create_form_data.py: Form data creation command
  - Integrates with Superset's CreateFormDataCommand
  - Used by chart preview tools

Testing (2,554 lines, 105 new tests):
- test_generate_chart.py: 268 lines, chart creation tests
- test_update_chart.py: 385 lines, chart update tests
- test_update_chart_preview.py: 474 lines, preview update tests
- test_get_chart_preview.py: 290 lines, preview generation tests
- test_chart_utils.py: 460 lines, chart utilities tests
- test_execute_sql.py: 497 lines, SQL execution tests
- test_execute_sql_helper.py: 64 lines, SQL helper tests
- test_generate_explore_link.py: 580 lines, explore link tests

Total: 167 tests (139 passing, 28 SQL Lab tests need integration fixes)

Integration (app.py updates): All new tools imported and auto-registered via @mcp.tool decorators:
- 6 chart tools (generate_chart, update_chart, etc.)
- 3 SQL Lab tools (execute_sql, open_sql_lab_with_context)
- 1 explore tool (generate_explore_link)

This PR completes Phase 3 of the MCP service rollout, adding full CRUD capabilities for charts and enabling SQL-based data exploration. Dashboard creation tools will follow in the next PR.

53 files changed, 13,318 insertions(+), 37 deletions(-)
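The "column existence validation with fuzzy match suggestions" step can be illustrated with the standard library's `difflib`. This is a minimal sketch, assuming suggestions come from edit similarity with a case-insensitive exact check first. The real dataset_validator.py may use a different strategy. The function name, the cutoff of 0.6, and the limit of 3 suggestions are hypothetical.

```python
import difflib


def suggest_missing_columns(requested: list[str], available: list[str]) -> dict[str, list[str]]:
    """Map each requested column not in the dataset to up to 3 close matches."""
    by_lower = {name.lower(): name for name in available}
    problems: dict[str, list[str]] = {}
    for col in requested:
        if col in available:
            continue  # exact match: nothing to report
        if col.lower() in by_lower:
            # Only the casing differs, so suggest the dataset's own spelling.
            problems[col] = [by_lower[col.lower()]]
            continue
        # get_close_matches ranks candidates by SequenceMatcher ratio (>= cutoff).
        close = difflib.get_close_matches(col.lower(), list(by_lower), n=3, cutoff=0.6)
        problems[col] = [by_lower[c] for c in close]
    return problems
```

An empty dict means every requested column exists. A key mapped to an empty list means the column is missing and no plausible suggestion was found.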
Replace complex regex pattern with substring checks in error sanitization:
- r"<script[^>]*>.*?</script>" → "<script" in str_lower
- More efficient and avoids catastrophic backtracking
- Still flags every <script> tag the old pattern matched, plus unclosed tags it missed
Replace ReDoS-vulnerable regex patterns in _sanitize_validation_error():
- Remove r"SELECT\s+.*?\s+FROM" pattern (catastrophic backtracking)
- Remove r"WHERE\s+.*?(\s+ORDER|\s+GROUP|\s+LIMIT|$)" pattern

Security improvements:
- Length check FIRST to bound input size
- Substring checks for SQL fragment detection
- String slicing instead of regex for content redaction
- Maintains same security protection without backtracking

Refactoring to reduce complexity:
- Extract _redact_sql_select helper
- Extract _redact_sql_where helper
- Extract _get_generic_error_message helper
- Main function now has complexity 4 (well under limit of 10)

This completes the ReDoS fixes across all MCP service validation and sanitization functions.
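The find-and-slice approach above can be sketched as follows. The helper names `_redact_sql_select` and `_redact_sql_where` come from the commit, but their bodies here are hypothetical, as are `sanitize_validation_error`, the 1,000-character cap, and the `[REDACTED]` placeholder. Each helper does a linear `str.find` plus slicing, so no regex backtracking is involved. Unlike the old `\s+` patterns, this sketch only matches single spaces around keywords.

```python
import string

_MAX_LEN = 1000  # hypothetical cap; the commit only says "length check FIRST"
_WHERE_TERMINATORS = (" order ", " group ", " limit ")
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def _ascii_lower(text: str) -> str:
    # Length-preserving lowercase: str.lower() can change the length of some
    # non-ASCII strings, which would misalign the slice indices below.
    return text.translate(_ASCII_LOWER)


def _redact_sql_select(text: str) -> str:
    """Replace the column list between SELECT and FROM with a placeholder."""
    lowered = _ascii_lower(text)
    start = lowered.find("select ")
    if start == -1:
        return text
    end = lowered.find(" from ", start)
    if end == -1:
        return text
    return text[: start + len("select ")] + "[REDACTED]" + text[end:]


def _redact_sql_where(text: str) -> str:
    """Replace a WHERE body up to ORDER/GROUP/LIMIT, or to the end of the string."""
    lowered = _ascii_lower(text)
    start = lowered.find(" where ")
    if start == -1:
        return text
    body = start + len(" where ")
    ends = [i for i in (lowered.find(t, body) for t in _WHERE_TERMINATORS) if i != -1]
    end = min(ends) if ends else len(text)
    return text[:body] + "[REDACTED]" + text[end:]


def sanitize_validation_error(message: str) -> str:
    text = message[:_MAX_LEN]  # bound the input before any other work
    return _redact_sql_where(_redact_sql_select(text))
```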
Replace unbounded regex patterns in _sanitize_error_for_logging():
- r"postgresql://[^@]+@[^/]+/" → bounded {1,100} quantifiers
- r"mysql://[^@]+@[^/]+/" → bounded {1,100} quantifiers
- r'[Aa]pi[_-]?[Kk]ey[:\s]*[^\s\'"]+' → bounded {0,5} and {1,100}
- r'[Tt]oken[:\s]*[^\s\'"]+' → bounded {0,5} and {1,100}
- r"/[a-zA-Z0-9_\-/.]+/superset/" → bounded {1,200}
Security improvements:
- Length check FIRST to bound input size
- Substring checks before applying regex (avoid unnecessary work)
- All patterns now have explicit quantifier bounds
- Prevents catastrophic backtracking on malicious input
This completes all ReDoS fixes in the MCP service.
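The API-key and token rules above can be sketched with the commit's bounded quantifiers. This is a minimal illustration only. `sanitize_for_logging`, the 2,000-character cap, and the `api_key=[REDACTED]` / `token=[REDACTED]` replacements are hypothetical and are not the actual `_sanitize_error_for_logging()` code.

```python
import re

_MAX_LEN = 2000  # hypothetical cap; the commit only says "length check FIRST"

# Same shapes as the commit's patterns, with an explicit upper bound on every quantifier.
_API_KEY_RE = re.compile(r"[Aa]pi[_-]?[Kk]ey[:\s]{0,5}[^\s'\"]{1,100}")
_TOKEN_RE = re.compile(r"[Tt]oken[:\s]{0,5}[^\s'\"]{1,100}")


def sanitize_for_logging(message: str) -> str:
    text = message[:_MAX_LEN]  # bound input size before any regex work
    lowered = text.lower()
    # Cheap substring gates: skip a regex entirely when it cannot match.
    if "key" in lowered:
        text = _API_KEY_RE.sub("api_key=[REDACTED]", text)
    if "token" in lowered:
        text = _TOKEN_RE.sub("token=[REDACTED]", text)
    return text
```

Bounding both the input length and each quantifier caps the worst-case work the backtracking `re` engine can do on a single message.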
Remove reference to deleted codeql-config.yml file. CodeQL now runs with its default configuration, so all of its alerts are reported without suppressions.
CodeQL detected incomplete URL substring sanitization where database URLs
could be partially matched at arbitrary positions.
Changes:
- Remove substring position checks (e.g., "postgresql://" in str)
- Use word boundaries (\b) to match complete URLs only
- Add whitespace exclusions ([^@\s], [^/\s]) to prevent cross-boundary matches
- Apply case-insensitive matching with re.IGNORECASE flag
- Ensure full URL path is redacted with [^\s]{0,100} for path component
This ensures URLs are completely sanitized regardless of position in error string.
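The database-URL rules above (word boundary, whitespace exclusions, bounded quantifiers, case-insensitive matching) can be combined into one compiled pattern. This sketch assumes only the `postgresql` and `mysql` schemes named in the earlier commit. `redact_db_urls`, the 2,000-character cap, and the `scheme://[REDACTED]` replacement are hypothetical.

```python
import re

# \b: the scheme must start a word, not sit in the middle of a token.
# [^@\s] / [^/\s]: credentials and host cannot span whitespace.
# {1,100} / {0,100}: explicit bounds keep backtracking cheap.
_DB_URL_RE = re.compile(
    r"\b(postgresql|mysql)://[^@\s]{1,100}@[^/\s]{1,100}/[^\s]{0,100}",
    re.IGNORECASE,
)


def redact_db_urls(message: str, max_len: int = 2000) -> str:
    text = message[:max_len]
    # Keep the scheme for debugging, and drop credentials, host, and path.
    return _DB_URL_RE.sub(lambda m: f"{m.group(1).lower()}://[REDACTED]", text)
```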
CodeQL detected incomplete URL substring sanitization where URL schemes (javascript:, vbscript:, data:) could match at arbitrary positions.

Changes:
- Replace substring checks for URL schemes with regex word boundaries
- Use \b(javascript|vbscript|data): to match only actual URL schemes
- Keep simple substring checks for HTML tags (<script, </script>)
- Consolidate event handler check into same pattern list

This ensures we only filter actual URL schemes, not arbitrary text containing these substrings.
CodeQL detected incomplete URL substring sanitization in multiple validators. URL schemes (javascript:, vbscript:, data:) could match at arbitrary positions.

Fixed in 5 validators:
- ColumnRef.sanitize_name
- ColumnRef.sanitize_label
- FilterConfig.sanitize_column
- FilterConfig.sanitize_value
- UpdateChartRequest.sanitize_chart_name

Changes:
- Separate HTML tag checks (substring) from URL scheme checks (regex)
- Use \b(javascript|vbscript|data): to match only actual URL schemes
- Extract _validate_string_value helper to reduce complexity
- Prevents false positives on text containing these substrings

All validators now properly distinguish between HTML tags (safe substring checks) and URL schemes (word boundary regex).
Add two new MCP tools for dashboard management:
- generate_dashboard: Creates dashboards with charts in grid layout
- add_chart_to_existing_dashboard: Adds charts to existing dashboards

Changes:
- New tool: generate_dashboard.py (236 lines)
- New tool: add_chart_to_existing_dashboard.py (282 lines)
- Schema additions: 4 new classes in dashboard/schemas.py (+55 lines)
- Tests: 11 comprehensive tests (450 lines, all passing)
- Updated dashboard/tool/__init__.py exports

All pre-commit hooks passing. All tests passing (11/11).

fix(mcp): apply ruff auto-fixes for import ordering
- Remove unused imports in update_chart.py
- Sort imports alphabetically in dashboard/tool/__init__.py

These fixes resolve CI ruff errors.

fix(mcp): resolve schema issues from rebase
- Remove duplicate class definitions in chart/schemas.py (ChartCapabilities, ChartSemantics, PerformanceMetadata, AccessibilityMetadata, VersionedResponse)
- Replace `Optional[T]` with modern `T | None` syntax in dashboard/schemas.py
- Fix Pydantic forward reference issue by moving GenerateChartResponse after ChartPreviewContent definition
- Address mypy type checking errors
- Fix ruff formatting issues

These issues were introduced during the PR5 rebase and prevented CI from passing.

Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude

fix: apply ruff formatting and remove unused mypy ignore
- Remove extra blank line in mcp_core.py
- Add missing blank line after import in superset_config_docker_light.py
- Remove unused type: ignore comment in superset_config_docker_light.py

fix(mcp): add type ignore for CELERY_CONFIG override in docker light config

Fixes mypy errors:
- Cannot assign multiple types to name "CELERY_CONFIG" without an explicit "Type[...]" annotation
- Incompatible types in assignment (expression has type "None", variable has type "type[CeleryConfig]")

The docker light config intentionally overrides CELERY_CONFIG to None to disable Celery. Added a type: ignore comment to suppress the expected type mismatch.

revert: remove unrelated changes to docker config file

This file is not part of the MCP service work. Removing the blank line formatting change restores it to the base branch state.

fix(mcp): restore DAO files and add tool registration to fix failing tests

Fixed all 64 failing MCP service unit tests by:

1. **Restored accidentally deleted DAO files**:
   - superset/mcp_service/dao/base.py
   - superset/mcp_service/dao/__init__.py
   These files were accidentally deleted in commit adbdb9b.
2. **Added tool imports to server.py**:
   - Added dashboard tool imports (generate_dashboard, add_chart_to_existing_dashboard, etc.)
   - Added dataset tool imports (list_datasets, get_dataset_info, etc.)
   - Added explore tool imports (generate_explore_link)
   - Added sql_lab tool imports (execute_sql, open_sql_lab_with_context)
3. **Created conftest.py for test discovery**:
   - Added tests/unit_tests/mcp_service/conftest.py
   - Imports all tools so they register with the MCP instance before tests run
   - Essential for pytest to discover and run tools in test context

All 178 MCP service unit tests now pass.

refactor(mcp): remove unused DAO Protocol

The custom DAO Protocol in superset/mcp_service/dao/ is unused. All code uses BaseDAO from superset.daos.base instead. The Protocol was originally created as a placeholder with the comment "To be replaced with one from superset core" and has since been replaced.

All 178 MCP service tests still pass after removal.

update imports

fix(mcp): remove tool imports from conftest to prevent test pollution

Removed tool imports from tests/unit_tests/mcp_service/conftest.py to prevent side effects during test discovery that were causing integration test failures. Tools are already registered in app.py when the MCP instance is created, so importing them again in conftest.py was unnecessary and caused database constraint violations in unrelated integration tests.
Add comprehensive MCP features for instance metadata, user guidance, and resource templates:

**New Tools (1):**
- get_superset_instance_info: Detailed instance statistics with dashboards, charts, datasets, activity metrics, and database breakdowns

**New Prompts (2):**
- superset_quickstart: Interactive onboarding guide for new users
- create_chart_guided: AI-powered chart creation workflow with business context

**New Resources (2):**
- chart_configs: Example chart configuration templates for common visualizations
- instance_metadata: Real-time instance metadata and statistics

**Infrastructure Added:**
- InstanceInfoCore: Configurable base class for comprehensive instance reporting with custom metric calculators
- Added 7 new schemas to system/schemas.py: InstanceInfo, InstanceSummary, RecentActivity, DashboardBreakdown, DatabaseBreakdown, PopularContent, GetSupersetInstanceInfoRequest

Changes:
- Added get_superset_instance_info.py tool (268 lines) with detailed breakdown calculations
- Added InstanceInfoCore class to mcp_core.py (173 lines)
- Added quickstart.py prompt (94 lines) for user onboarding
- Added create_chart_guided.py prompt (195 lines) for AI-assisted chart creation
- Added chart_configs.py resource (362 lines) with example configurations
- Added instance_metadata.py resource (107 lines) for real-time instance stats
- Updated system/tool/__init__.py to export new instance info tool
- Added placeholder __init__.py files for dashboard/dataset prompts and resources
- Added 7 new Pydantic schemas for instance information responses

Statistics:
- 17 files changed (16 new, 2 modified)
- ~1,200 lines of production code added
- All pre-commit hooks passing

fix(mcp): convert timestamp to ISO string in InstanceInfo schema

Fixed Pydantic validation error where the timestamp field was receiving a datetime object instead of the expected string type. The InstanceInfo schema requires timestamp as a string, so now we convert it using .isoformat().
fix(mcp): use BaseDAO instead of DAO Protocol in instance_metadata

Replace DAO Protocol casts with BaseDAO[Any] to match the InstanceInfoCore type signature. This fixes mypy type checking errors.

feat(mcp): register prompts and resources in app.py and document in CLAUDE.md

Add imports for chart and system prompts/resources to ensure they are registered with the MCP instance on startup. Update CLAUDE.md with comprehensive instructions for adding new prompts and resources.

Changes to app.py:
- Import chart.prompts module to register create_chart_guided prompt
- Import chart.resources module to register chart_configs resource
- Import system.prompts module to register superset_quickstart prompt
- Import system.resources module to register instance_metadata resource
- Add clear comment explaining prompt/resource registration pattern

Changes to CLAUDE.md:
- Update section title to include prompts and resources
- Add "How to Add a New Prompt" section with complete workflow
- Add "How to Add a New Resource" section with complete workflow
- Add Quick Checklist for New Prompts
- Add Quick Checklist for New Resources
- Update line number references (210-242 for tools, 244-253 for prompts/resources)
- Explain two-level import pattern (module __init__.py + app.py)

This ensures all prompts and resources are available to MCP clients and provides clear documentation for future contributors.
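The two-level import pattern works because decorator-based registration is a side effect of importing the module that defines the function. The toy registry below illustrates that mechanism in a single file. `ToyMCPRegistry` is hypothetical and is not the FastMCP API. The resource URI `superset://instance/metadata` and the function bodies are also invented for illustration.

```python
from typing import Callable

PromptFn = Callable[[], str]


class ToyMCPRegistry:
    """Hypothetical stand-in for the MCP server object (not the FastMCP API)."""

    def __init__(self) -> None:
        self.prompts: dict[str, PromptFn] = {}
        self.resources: dict[str, PromptFn] = {}

    def prompt(self, fn: PromptFn) -> PromptFn:
        # Registration happens when the decorated function is defined,
        # that is, when its module is first imported.
        self.prompts[fn.__name__] = fn
        return fn

    def resource(self, uri: str) -> Callable[[PromptFn], PromptFn]:
        def decorator(fn: PromptFn) -> PromptFn:
            self.resources[uri] = fn
            return fn

        return decorator


mcp = ToyMCPRegistry()


# Imagine this lives in system/prompts/quickstart.py. It only runs if
# system/prompts/__init__.py imports quickstart AND app.py imports system.prompts.
@mcp.prompt
def superset_quickstart() -> str:
    return "Welcome! Start by listing dashboards or datasets."


# Imagine this lives in system/resources/instance_metadata.py.
@mcp.resource("superset://instance/metadata")  # hypothetical URI
def instance_metadata() -> str:
    return '{"dashboards": 0}'
```

Under this model, a prompt module that nothing imports is never registered. The app.py imports exist to prevent exactly that.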
force-pushed 4b10322 to b08de55
Closing, as this stack is too hard to maintain. I'm just going to keep the last one, PR10.
SUMMARY
This PR completes the MCP service implementation by adding instance-level metadata and AI guidance features. It introduces a comprehensive instance information tool, system prompts for user onboarding, and resource templates for chart configurations.
New Features Added:
Tools (1):
Prompts (2):
Resources (2):
Infrastructure Added:
InstanceInfoCore class (173 lines in mcp_core.py):
System schemas (85 lines added to system/schemas.py):
- GetSupersetInstanceInfoRequest - Request schema with optional metric filters
- InstanceInfo - Complete instance information response
- InstanceSummary - High-level counts and totals
- RecentActivity - User activity metrics (last 7/30 days)
- DashboardBreakdown - Dashboard categorization (published vs draft)
- DatabaseBreakdown - Database usage statistics
- PopularContent - Most viewed/edited content

Statistics:
Builds on:
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A - This is a backend API enhancement for the MCP service.
TESTING INSTRUCTIONS
Prerequisites
Manual Testing with MCP Client
If you have an MCP client configured (Claude Desktop, Cline, etc.), you can test the new features:
1. Test Instance Info Tool
2. Test Prompts
3. Test Resources
Pre-Commit Validation
Verification Checklist
ADDITIONAL INFORMATION
Completes MCP Service Implementation:
This PR represents the final phase of the core MCP service implementation, adding:
Files Changed Summary:
New Tool (1 file):
New Prompts (3 files):
New Resources (3 files):
Placeholder Files (4 files):
Updated Files (2 files):
Key Features:
get_superset_instance_info:
superset_quickstart prompt:
create_chart_guided prompt:
chart_configs resource:
instance_metadata resource:
Architecture Highlights:
The InstanceInfoCore class uses dependency injection for metric calculation, allowing custom implementations:
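The class itself is not reproduced in this description, so the following is only a hypothetical sketch of constructor-injected metric calculators. `InstanceInfoSketch`, `base_counts`, `calculators`, `dashboard_breakdown`, and the count keys are invented for illustration and do not reflect the actual `InstanceInfoCore` signature. The timestamp is serialized with `.isoformat()`, matching the schema fix described in the commit history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

Counts = dict[str, int]
MetricCalculator = Callable[[Counts], dict[str, Any]]


@dataclass
class InstanceInfoSketch:
    # Injected dependency returning raw counts (in Superset this would query DAOs).
    base_counts: Callable[[], Counts]
    # Injected, pluggable calculators: report section name -> function of the counts.
    calculators: dict[str, MetricCalculator] = field(default_factory=dict)

    def run(self) -> dict[str, Any]:
        counts = self.base_counts()
        report: dict[str, Any] = {
            "summary": counts,
            # Serialize as an ISO 8601 string rather than a datetime object.
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        for section, calculate in self.calculators.items():
            report[section] = calculate(counts)
        return report


def dashboard_breakdown(counts: Counts) -> dict[str, Any]:
    published = counts.get("published_dashboards", 0)
    return {"published": published, "draft": counts.get("dashboards", 0) - published}


core = InstanceInfoSketch(
    base_counts=lambda: {"dashboards": 12, "published_dashboards": 9, "charts": 80},
    calculators={"dashboard_breakdown": dashboard_breakdown},
)
report = core.run()
```

Adding a new report section, such as a database breakdown, means passing another calculator. The class itself does not change.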
This design enables:
Pattern Compliance:
This PR continues the MINIMAL cherry-pick pattern established in previous PRs:
__init__.py files
Next Steps:
The MCP service is now feature-complete for the initial release. Future enhancements may include: