Developer Insights Analytics Dashboard 🏴‍☠️📊

Yarr! Welcome to the Developer Insights Analytics Dashboard, a comprehensive data analysis treasure chest that helps data analysts explore and visualize developer survey data! This be a flexible, full-stack application built with modern data analysis practices in mind.

🎓 Plus: Complete GitHub Copilot Training Materials! This project includes comprehensive exercises from beginner to advanced level, teaching you to master AI-assisted development with real-world scenarios.

🗺️ Project Structure

python-fullstack/
│
├── .gitignore
├── README.md
├── requirements.txt
├── pyproject.toml
│
├── data/
│   └── kaggle_so_2023_data.zip        # Stack Overflow 2023 survey data (auto-extracted on first run)
│
├── app/
│   ├── __init__.py
│   ├── main.py                        # Main FastAPI application
│   ├── data_config.py                 # Data source configuration & analysis
│   └── templates/
│       └── index.html                 # Analytics dashboard frontend
│
├── docs/
│   └── specifications/
│       └── project_specs.md           # Technical specifications
│
├── exercises/                         # 🎓 GitHub Copilot Training Materials
│   ├── copilot-cheatsheet.md          # Quick reference guide
│   ├── copilot-beginner-exercises.md  # Foundation skills (2-3 hrs)
│   ├── copilot-advanced-exercises.md  # Mode selection guide
│   ├── copilot-advanced-mode-a.md     # Hands-on track (8-12 hrs)
│   ├── copilot-advanced-mode-b.md     # Agent-driven track (12-18 hrs)
│   └── copilot-ci-exercise.md         # DevOps with Act
│
└── tests/
    ├── __init__.py
    └── test_main.py                   # Comprehensive test suite

⚓ Technology Stack

  • Backend: Python 3.10+ with FastAPI
  • Data Analysis: Pandas with flexible data source management
  • Web Server: Uvicorn with auto-reload
  • Frontend: HTML5, JavaScript (ES6+), Chart.js with interactive controls
  • API Design: RESTful with Pydantic models and comprehensive error handling
  • Testing: Pytest with full API coverage

🔍 Analytics Features

This application is designed specifically for data analysts who need:

📊 Flexible Data Analysis

  • Multiple Technology Categories: Languages, Databases, Platforms, Web Frameworks
  • Configurable Results: Choose top 10, 15, 20, or 25 results
  • Real-time Analysis: Interactive dashboard with instant results
  • Comparison Views: "Have Worked With" vs "Want to Work With" analysis

🔌 Extensible Data Sources

  • Modular Design: Easy to add new data sources
  • Schema Validation: Built-in data validation and error handling
  • Multiple Format Support: CSV with automatic schema detection
  • Data Quality Insights: Response counts and unique technology metrics

πŸ΄β€β˜ οΈ Setup Instructions

1. Data Setup (Smart Zip Management System)

The application features an intelligent data management system designed for data analysts who work with multiple datasets:

🤖 Automatic Data Source Detection

  • Zero Configuration: Drop any survey data zip file into data/ folder
  • Auto-Extraction: Zip files are automatically extracted on application startup (see the sketch after this list)
  • Smart Detection: CSV files are automatically discovered and configured
  • Technology Analysis: Columns with semicolon-separated tech lists are auto-detected
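
A minimal sketch of what this startup extraction could look like (illustrative only; the real implementation in the app may differ):

import zipfile
from pathlib import Path

def extract_pending_archives(data_dir: Path = Path("data")) -> None:
    """Extract every zip in data/ whose target folder doesn't exist yet."""
    for archive in data_dir.glob("*.zip"):
        target = data_dir / archive.stem           # e.g. data/kaggle_so_2023_data/
        if not target.exists():                    # already extracted on a previous run
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)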

📦 Current Data Sources

  • Stack Overflow 2023: kaggle_so_2023_data.zip (20MB compressed → 151MB extracted)
    • Extracts to kaggle_so_2023_data/ folder on first application startup
    • Contains survey_results_public.csv with 89,000+ developer responses
    • Includes survey_results_schema.csv with column definitions
    • Pre-configured with 8 technology analysis categories

➕ Adding New Data Sources (Open-Ended Design)

Perfect for data analysts working with multiple survey datasets:

  1. Prepare Your Data:

    your_survey_data/
    ├── main_survey_responses.csv     # Main data (any CSV name works)
    ├── schema_definitions.csv        # Optional (detected by "schema" in name)
    └── documentation.txt             # Additional files (ignored)
    
  2. Create Zip Archive:

    zip -r your_survey_2024.zip your_survey_data/
  3. Deploy to Application:

    cp your_survey_2024.zip /path/to/project/data/
    # Application auto-detects and configures on next startup
  4. Automatic Configuration:

    • Main data file detected (largest CSV or one with "survey"/"results" in name; see the sketch after this list)
    • Schema file detected (contains "schema" in filename)
    • Technology columns identified (contain "language", "database", "platform", etc.)
    • New data source registered and available in dashboard
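
The detection heuristics above might look roughly like this (function names are hypothetical, not the project's actual API):

from pathlib import Path

def find_main_csv(folder: Path) -> Path:
    """Prefer CSVs named like survey results; otherwise take the largest one."""
    csvs = [p for p in folder.glob("*.csv") if "schema" not in p.name.lower()]
    named = [p for p in csvs if "survey" in p.name.lower() or "results" in p.name.lower()]
    return max(named or csvs, key=lambda p: p.stat().st_size)  # assumes at least one CSV

def find_schema_csv(folder: Path) -> Path | None:
    """A schema file is any CSV with "schema" in its filename."""
    return next((p for p in folder.glob("*.csv") if "schema" in p.name.lower()), None)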

πŸ” Data Format Requirements

  • Primary Format: CSV files with semicolon-separated technology lists (parsing shown in the sketch below)
  • Column Detection: Automatic detection of technology-related columns
  • Schema Support: Optional schema files for column descriptions
  • Size Limit: Zip files should be under GitHub's 100MB limit
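
For reference, here be how the semicolon-separated format can be tallied with pandas (file and column names follow the Stack Overflow source described above):

import pandas as pd

df = pd.read_csv("data/kaggle_so_2023_data/survey_results_public.csv")

top_n = 10
counts = (
    df["LanguageHaveWorkedWith"]
    .dropna()           # skip respondents who left the question blank
    .str.split(";")     # "Python;SQL;Rust" -> ["Python", "SQL", "Rust"]
    .explode()          # one row per technology mention
    .value_counts()     # tally mentions across all respondents
    .head(top_n)
)
print(counts)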

📊 Example Multi-Source Setup

data/
├── kaggle_so_2023_data.zip         # Stack Overflow 2023
├── kaggle_so_2023_data/            # Auto-extracted
├── github_dev_survey_2024.zip      # Your GitHub survey
├── github_dev_survey_2024/         # Auto-extracted
├── company_internal_survey.zip     # Internal survey
├── company_internal_survey/        # Auto-extracted
└── .gitignore                      # Excludes CSV files, includes zips

Each data source becomes automatically available in the dashboard with detected technology categories!

Data Contents (Stack Overflow 2023):

  • survey_results_public.csv - Main survey responses (151MB)
  • survey_results_schema.csv - Data schema and column descriptions
  • so_survey_2023.pdf - Survey documentation
  • README_2023.txt - Additional information

2. Install Dependencies

Make sure ye have Python 3.10+ installed, then install the required packages:

# Create the virtual environment if ye haven't already
python -m venv venv

# Activate it
source venv/bin/activate  # On macOS/Linux
# or
venv\Scripts\activate     # On Windows

# Install the treasure chest of dependencies
pip install -r requirements.txt

3. Run the Application

Start the FastAPI server like hoisting the main sail:

# Run the application with auto-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

4. Access the Analytics Dashboard

Once the server be running, open yer browser and navigate to http://localhost:8000. FastAPI also serves interactive API documentation at http://localhost:8000/docs by default.

🧪 Running Tests

To run the comprehensive test suite:

# Run all tests with verbose output
pytest -v

# Run tests with coverage report
pytest --cov=app --cov-report=html

# Run specific test categories
pytest tests/test_main.py::test_technology_analysis_endpoint -v
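
A representative test in this style uses FastAPI's TestClient (the assertions in the real tests/test_main.py may differ, and the "results" field name is an assumption):

from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_technology_analysis_endpoint():
    response = client.get(
        "/api/analysis/technology-usage",
        params={"column": "LanguageHaveWorkedWith", "top_n": 5},
    )
    assert response.status_code == 200
    assert len(response.json().get("results", [])) <= 5  # field name assumed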

📊 API Endpoints

GET /api/data-sources

  • Description: Lists all available data sources and their analysis capabilities
  • Response: Array of data source information with available columns

GET /api/analysis/technology-usage

  • Description: Flexible technology usage analysis with multiple parameters
  • Parameters:
    • source: Data source name (default: "stackoverflow_2023")
    • column: Technology category to analyze (default: "LanguageHaveWorkedWith")
    • top_n: Number of results to return (1-50, default: 10)
  • Response: Comprehensive analysis results with metadata (example request below)
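
An example call from a script or notebook; the column name follows the Stack Overflow survey's naming, and since the exact response shape is not documented here, treat the printed fields as illustrative:

import requests

resp = requests.get(
    "http://localhost:8000/api/analysis/technology-usage",
    params={
        "source": "stackoverflow_2023",
        "column": "DatabaseWantToWorkWith",  # any detected technology column works
        "top_n": 15,
    },
)
resp.raise_for_status()
print(resp.json())  # analysis results plus metadata (response counts, source, column)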

GET /api/schema/{source_name}

  • Description: Returns schema information for a data source
  • Response: Data structure and column definitions

GET /api/languages/popular (Legacy)

  • Description: Backward-compatible endpoint for original specification
  • Response: Top 10 programming languages in legacy format

GET /

  • Description: Interactive analytics dashboard
  • Response: Full-featured HTML dashboard with controls

🎯 Data Analyst Features

🔧 Interactive Analysis Controls

  • Data Source Selection: Choose from available datasets
  • Technology Categories: 8+ different analysis dimensions
    • Programming Languages (Used/Wanted)
    • Databases (Used/Wanted)
    • Platforms (Used/Wanted)
    • Web Frameworks (Used/Wanted)
  • Result Customization: Adjustable result counts
  • Real-time Updates: Instant analysis with loading indicators

📈 Rich Visualizations

  • Interactive Bar Charts: Hover details with percentages
  • Color-coded Categories: Professional color schemes
  • Responsive Design: Works on all screen sizes
  • Export Ready: High-quality charts suitable for presentations

📊 Analysis Metadata

  • Response Counts: Total survey responses analyzed
  • Technology Coverage: Number of unique technologies found
  • Data Quality: Insights into data completeness
  • Source Attribution: Clear data provenance

🚀 Future Enhancements for Data Analysts

This application be designed with extensibility in mind! Future versions could include:

📊 Advanced Analytics

  • Cross-tabulation Analysis: Technology combinations and correlations
  • Trend Analysis: Year-over-year comparisons when historical data is available
  • Demographic Breakdowns: Analysis by experience level, company size, location
  • Salary Analysis: Compensation trends by technology stack

🔄 Data Pipeline Features

  • Multiple Data Sources: Support for different survey years and sources
  • Data Refresh Automation: Scheduled data updates and processing
  • Data Quality Monitoring: Automated validation and completeness checks
  • Custom Data Uploads: Allow analysts to upload their own datasets

📈 Enhanced Visualizations

  • Multiple Chart Types: Scatter plots, heatmaps, time series
  • Interactive Filtering: Dynamic data exploration with multiple dimensions
  • Export Capabilities: PDF reports, CSV exports, chart images
  • Dashboard Customization: Save and share custom analysis configurations

🔒 Enterprise Features

  • User Authentication: Multi-user support with role-based access
  • API Rate Limiting: Production-ready API with proper throttling
  • Database Integration: PostgreSQL/MongoDB for larger datasets
  • Caching Layer: Redis for improved performance with large datasets

👥 For Data Analysts

This application follows data analysis best practices:

  • Reproducible Analysis: All analysis parameters are configurable and documented
  • Data Validation: Built-in checks for data quality and completeness
  • Error Handling: Graceful handling of missing data and edge cases
  • Performance Optimization: Efficient data processing for large datasets
  • API-First Design: Easy integration with other analysis tools and notebooks
  • Comprehensive Testing: Full test coverage ensures reliability

πŸ΄β€β˜ οΈ Development Notes

  • Modular Architecture: Easy to extend with new data sources and analysis types
  • Clean Code Principles: Well-documented, maintainable codebase
  • Type Safety: Pydantic models for API contract enforcement (illustrated after this list)
  • Async Support: Built for high-performance concurrent requests
  • Docker Ready: Easy containerization for deployment
  • All code be commented in proper pirate fashion, yarr!
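
As an illustration of the kind of Pydantic models meant here (hypothetical shapes; the real models in app/main.py may differ):

from pydantic import BaseModel

class TechnologyCount(BaseModel):
    name: str
    count: int
    percentage: float

class AnalysisResponse(BaseModel):
    source: str
    column: str
    total_responses: int
    results: list[TechnologyCount]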

📝 License

This treasure be open source - use it freely for yer data analysis adventures, but remember to give credit where it be due!


Built with ❤️ and ⚓ by data analyst pirates who love clean code, robust analysis, and beautiful visualizations

Perfect for:

  • Data Analysis: Survey data exploration, technology trend analysis, multi-source data integration
  • Learning: Modern full-stack development with Python/FastAPI and data science applications
  • GitHub Copilot Training: Comprehensive exercises from beginner to advanced AI-assisted development
  • DevOps: Local CI/CD workflows and development optimization

🎓 GitHub Copilot Training Exercises

| Exercise | Link | Description | Requirements |
| --- | --- | --- | --- |
| Cheatsheet | copilot-cheatsheet.md | Quick reference for context variables and commands | None |
| Beginner | copilot-beginner-exercises.md | Foundation Copilot skills | Basic Python, VS Code + Copilot |
| Advanced Mode A | copilot-advanced-mode-a.md | Hands-on development track | Intermediate Python/FastAPI |
| Advanced Mode B | copilot-advanced-mode-b.md | Agent-driven development track | Advanced Python, architecture knowledge |
| CI/CD with Act | copilot-ci-exercise.md | Local GitHub Actions workflow | Docker Desktop, Act installation |

Choose based on your experience level and preferred learning style. Start with the Cheatsheet for quick reference or Beginner exercises if new to Copilot.

