Developer Insights Analytics Dashboard 🏴‍☠️📊

Yarr! Welcome to the Developer Insights Analytics Dashboard, a comprehensive data analysis treasure chest that helps data analysts explore and visualize developer survey data! This be a flexible, full-stack application built with modern data analysis practices in mind.

🎓 Plus: Complete GitHub Copilot Training Materials! This project includes comprehensive exercises from beginner to advanced level, teaching you to master AI-assisted development with real-world scenarios.

🗺️ Project Structure

python-fullstack/
│
├── .gitignore
├── README.md
├── requirements.txt
├── pyproject.toml
│
├── data/
│   └── kaggle_so_2023_data.zip        # Stack Overflow 2023 survey data (auto-extracted on first run)
│
├── app/
│   ├── __init__.py
│   ├── main.py                        # Main FastAPI application
│   ├── data_config.py                 # Data source configuration & analysis
│   └── templates/
│       └── index.html                 # Analytics dashboard frontend
│
├── docs/
│   └── specifications/
│       └── project_specs.md           # Technical specifications
│
├── exercises/                         # 🎓 GitHub Copilot Training Materials
│   ├── copilot-cheatsheet.md          # Quick reference guide
│   ├── copilot-beginner-exercises.md  # Foundation skills (2-3 hrs)
│   ├── copilot-advanced-exercises.md  # Mode selection guide
│   ├── copilot-advanced-mode-a.md     # Hands-on track (8-12 hrs)
│   ├── copilot-advanced-mode-b.md     # Agent-driven track (12-18 hrs)
│   └── copilot-ci-exercise.md         # DevOps with Act
│
└── tests/
    ├── __init__.py
    └── test_main.py                   # Comprehensive test suite

⚓ Technology Stack

  • Backend: Python 3.10+ with FastAPI
  • Data Analysis: Pandas with flexible data source management
  • Web Server: Uvicorn with auto-reload
  • Frontend: HTML5, JavaScript (ES6+), Chart.js with interactive controls
  • API Design: RESTful with Pydantic models and comprehensive error handling
  • Testing: Pytest with full API coverage

🔍 Analytics Features

This application is designed specifically for data analysts who need:

📊 Flexible Data Analysis

  • Multiple Technology Categories: Languages, Databases, Platforms, Web Frameworks
  • Configurable Results: Choose top 10, 15, 20, or 25 results
  • Real-time Analysis: Interactive dashboard with instant results
  • Comparison Views: "Have Worked With" vs "Want to Work With" analysis

🔌 Extensible Data Sources

  • Modular Design: Easy to add new data sources
  • Schema Validation: Built-in data validation and error handling
  • Multiple Format Support: CSV with automatic schema detection
  • Data Quality Insights: Response counts and unique technology metrics

πŸ΄β€β˜ οΈ Setup Instructions

1. Data Setup (Smart Zip Management System)

The application features an intelligent data management system designed for data analysts who work with multiple datasets:

🤖 Automatic Data Source Detection

  • Zero Configuration: Drop any survey data zip file into data/ folder
  • Auto-Extraction: Zip files are automatically extracted on application startup (see the sketch after this list)
  • Smart Detection: CSV files are automatically discovered and configured
  • Technology Analysis: Columns with semicolon-separated tech lists are auto-detected
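
A minimal sketch of what this startup extraction could look like (illustrative only; the real implementation in the app may differ):

import zipfile
from pathlib import Path

def extract_pending_archives(data_dir: Path = Path("data")) -> None:
    """Extract every zip in data/ whose target folder doesn't exist yet."""
    for archive in data_dir.glob("*.zip"):
        target = data_dir / archive.stem           # e.g. data/kaggle_so_2023_data/
        if not target.exists():                    # already extracted on a previous run
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)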

📦 Current Data Sources

  • Stack Overflow 2023: kaggle_so_2023_data.zip (20MB compressed → 151MB extracted)
    • Extracts to kaggle_so_2023_data/ folder on first application startup
    • Contains survey_results_public.csv with 89,000+ developer responses
    • Includes survey_results_schema.csv with column definitions
    • Pre-configured with 8 technology analysis categories

➕ Adding New Data Sources (Open-Ended Design)

Perfect for data analysts working with multiple survey datasets:

  1. Prepare Your Data:

    your_survey_data/
    ├── main_survey_responses.csv     # Main data (any CSV name works)
    ├── schema_definitions.csv        # Optional (detected by "schema" in name)
    └── documentation.txt             # Additional files (ignored)
    
  2. Create Zip Archive:

    zip -r your_survey_2024.zip your_survey_data/
  3. Deploy to Application:

    cp your_survey_2024.zip /path/to/project/data/
    # Application auto-detects and configures on next startup
  4. Automatic Configuration:

    • Main data file detected (largest CSV or one with "survey"/"results" in name; see the sketch after this list)
    • Schema file detected (contains "schema" in filename)
    • Technology columns identified (contain "language", "database", "platform", etc.)
    • New data source registered and available in dashboard
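
The detection heuristics above might look roughly like this (function names are hypothetical, not the project's actual API):

from pathlib import Path

def find_main_csv(folder: Path) -> Path:
    """Prefer CSVs named like survey results; otherwise take the largest one."""
    csvs = [p for p in folder.glob("*.csv") if "schema" not in p.name.lower()]
    named = [p for p in csvs if "survey" in p.name.lower() or "results" in p.name.lower()]
    return max(named or csvs, key=lambda p: p.stat().st_size)  # assumes at least one CSV

def find_schema_csv(folder: Path) -> Path | None:
    """A schema file is any CSV with "schema" in its filename."""
    return next((p for p in folder.glob("*.csv") if "schema" in p.name.lower()), None)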

πŸ” Data Format Requirements

  • Primary Format: CSV files with semicolon-separated technology lists (parsing shown in the sketch below)
  • Column Detection: Automatic detection of technology-related columns
  • Schema Support: Optional schema files for column descriptions
  • Size Limit: Zip files should be under GitHub's 100MB limit
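
For reference, here be how the semicolon-separated format can be tallied with pandas (file and column names follow the Stack Overflow source described above):

import pandas as pd

df = pd.read_csv("data/kaggle_so_2023_data/survey_results_public.csv")

top_n = 10
counts = (
    df["LanguageHaveWorkedWith"]
    .dropna()           # skip respondents who left the question blank
    .str.split(";")     # "Python;SQL;Rust" -> ["Python", "SQL", "Rust"]
    .explode()          # one row per technology mention
    .value_counts()     # tally mentions across all respondents
    .head(top_n)
)
print(counts)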

📊 Example Multi-Source Setup

data/
├── kaggle_so_2023_data.zip         # Stack Overflow 2023
├── kaggle_so_2023_data/            # Auto-extracted
├── github_dev_survey_2024.zip      # Your GitHub survey
├── github_dev_survey_2024/         # Auto-extracted
├── company_internal_survey.zip     # Internal survey
├── company_internal_survey/        # Auto-extracted
└── .gitignore                      # Excludes CSV files, includes zips

Each data source becomes automatically available in the dashboard with detected technology categories!

Data Contents (Stack Overflow 2023):

  • survey_results_public.csv - Main survey responses (151MB)
  • survey_results_schema.csv - Data schema and column descriptions
  • so_survey_2023.pdf - Survey documentation
  • README_2023.txt - Additional information

2. Install Dependencies

Make sure ye have Python 3.10+ installed, then install the required packages:

# Create the virtual environment if ye haven't already
python -m venv venv

# Activate it
source venv/bin/activate  # On macOS/Linux
# or
venv\Scripts\activate     # On Windows

# Install the treasure chest of dependencies
pip install -r requirements.txt

3. Run the Application

Start the FastAPI server like hoisting the main sail:

# Run the application with auto-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

4. Access the Analytics Dashboard

Once the server be running, open yer browser and navigate to http://localhost:8000. FastAPI also serves interactive API documentation at http://localhost:8000/docs by default.

🧪 Running Tests

To run the comprehensive test suite:

# Run all tests with verbose output
pytest -v

# Run tests with coverage report
pytest --cov=app --cov-report=html

# Run specific test categories
pytest tests/test_main.py::test_technology_analysis_endpoint -v
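
A representative test in this style uses FastAPI's TestClient (the assertions in the real tests/test_main.py may differ, and the "results" field name is an assumption):

from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_technology_analysis_endpoint():
    response = client.get(
        "/api/analysis/technology-usage",
        params={"column": "LanguageHaveWorkedWith", "top_n": 5},
    )
    assert response.status_code == 200
    assert len(response.json().get("results", [])) <= 5  # field name assumed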

📊 API Endpoints

GET /api/data-sources

  • Description: Lists all available data sources and their analysis capabilities
  • Response: Array of data source information with available columns

GET /api/analysis/technology-usage

  • Description: Flexible technology usage analysis with multiple parameters
  • Parameters:
    • source: Data source name (default: "stackoverflow_2023")
    • column: Technology category to analyze (default: "LanguageHaveWorkedWith")
    • top_n: Number of results to return (1-50, default: 10)
  • Response: Comprehensive analysis results with metadata (example request below)
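
An example call from a script or notebook; the column name follows the Stack Overflow survey's naming, and since the exact response shape is not documented here, treat the printed fields as illustrative:

import requests

resp = requests.get(
    "http://localhost:8000/api/analysis/technology-usage",
    params={
        "source": "stackoverflow_2023",
        "column": "DatabaseWantToWorkWith",  # any detected technology column works
        "top_n": 15,
    },
)
resp.raise_for_status()
print(resp.json())  # analysis results plus metadata (response counts, source, column)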

GET /api/schema/{source_name}

  • Description: Returns schema information for a data source
  • Response: Data structure and column definitions

GET /api/languages/popular (Legacy)

  • Description: Backward-compatible endpoint for original specification
  • Response: Top 10 programming languages in legacy format

GET /

  • Description: Interactive analytics dashboard
  • Response: Full-featured HTML dashboard with controls

🎯 Data Analyst Features

🔧 Interactive Analysis Controls

  • Data Source Selection: Choose from available datasets
  • Technology Categories: 8+ different analysis dimensions
    • Programming Languages (Used/Wanted)
    • Databases (Used/Wanted)
    • Platforms (Used/Wanted)
    • Web Frameworks (Used/Wanted)
  • Result Customization: Adjustable result counts
  • Real-time Updates: Instant analysis with loading indicators

📈 Rich Visualizations

  • Interactive Bar Charts: Hover details with percentages
  • Color-coded Categories: Professional color schemes
  • Responsive Design: Works on all screen sizes
  • Export Ready: High-quality charts suitable for presentations

📊 Analysis Metadata

  • Response Counts: Total survey responses analyzed
  • Technology Coverage: Number of unique technologies found
  • Data Quality: Insights into data completeness
  • Source Attribution: Clear data provenance

🚀 Future Enhancements for Data Analysts

This application be designed with extensibility in mind! Future versions could include:

📊 Advanced Analytics

  • Cross-tabulation Analysis: Technology combinations and correlations
  • Trend Analysis: Year-over-year comparisons when historical data is available
  • Demographic Breakdowns: Analysis by experience level, company size, location
  • Salary Analysis: Compensation trends by technology stack

🔄 Data Pipeline Features

  • Multiple Data Sources: Support for different survey years and sources
  • Data Refresh Automation: Scheduled data updates and processing
  • Data Quality Monitoring: Automated validation and completeness checks
  • Custom Data Uploads: Allow analysts to upload their own datasets

📈 Enhanced Visualizations

  • Multiple Chart Types: Scatter plots, heatmaps, time series
  • Interactive Filtering: Dynamic data exploration with multiple dimensions
  • Export Capabilities: PDF reports, CSV exports, chart images
  • Dashboard Customization: Save and share custom analysis configurations

🔒 Enterprise Features

  • User Authentication: Multi-user support with role-based access
  • API Rate Limiting: Production-ready API with proper throttling
  • Database Integration: PostgreSQL/MongoDB for larger datasets
  • Caching Layer: Redis for improved performance with large datasets

👥 For Data Analysts

This application follows data analysis best practices:

  • Reproducible Analysis: All analysis parameters are configurable and documented
  • Data Validation: Built-in checks for data quality and completeness
  • Error Handling: Graceful handling of missing data and edge cases
  • Performance Optimization: Efficient data processing for large datasets
  • API-First Design: Easy integration with other analysis tools and notebooks
  • Comprehensive Testing: Full test coverage ensures reliability

πŸ΄β€β˜ οΈ Development Notes

  • Modular Architecture: Easy to extend with new data sources and analysis types
  • Clean Code Principles: Well-documented, maintainable codebase
  • Type Safety: Pydantic models for API contract enforcement (illustrated after this list)
  • Async Support: Built for high-performance concurrent requests
  • Docker Ready: Easy containerization for deployment
  • All code be commented in proper pirate fashion, yarr!
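
As an illustration of the kind of Pydantic models meant here (hypothetical shapes; the real models in app/main.py may differ):

from pydantic import BaseModel

class TechnologyCount(BaseModel):
    name: str
    count: int
    percentage: float

class AnalysisResponse(BaseModel):
    source: str
    column: str
    total_responses: int
    results: list[TechnologyCount]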

📝 License

This treasure be open source - use it freely for yer data analysis adventures, but remember to give credit where it be due!


Built with ❤️ and ⚓ by data analyst pirates who love clean code, robust analysis, and beautiful visualizations

Perfect for:

  • Data Analysis: Survey data exploration, technology trend analysis, multi-source data integration
  • Learning: Modern full-stack development with Python/FastAPI and data science applications
  • GitHub Copilot Training: Comprehensive exercises from beginner to advanced AI-assisted development
  • DevOps: Local CI/CD workflows and development optimization

🎓 GitHub Copilot Training Exercises

| Exercise | Link | Description | Requirements |
| --- | --- | --- | --- |
| Cheatsheet | copilot-cheatsheet.md | Quick reference for context variables and commands | None |
| Beginner | copilot-beginner-exercises.md | Foundation Copilot skills | Basic Python, VS Code + Copilot |
| Advanced Mode A | copilot-advanced-mode-a.md | Hands-on development track | Intermediate Python/FastAPI |
| Advanced Mode B | copilot-advanced-mode-b.md | Agent-driven development track | Advanced Python, architecture knowledge |
| CI/CD with Act | copilot-ci-exercise.md | Local GitHub Actions workflow | Docker Desktop, Act installation |

Choose based on your experience level and preferred learning style. Start with the Cheatsheet for quick reference or Beginner exercises if new to Copilot.

