SQL Challenge for Monthly/Daily Active User Analysis and Visualization
- Python 3.12 or higher
- pipenv (Python package manager)
- Install dependencies:
pipenv install
The project includes a script to generate test login data. To generate test data:
pipenv run generate-logins <output_file> <start_date> <end_date>
Example:
pipenv run generate-logins new_logins.csv 2025-01-01 2025-04-15
This will generate a CSV file containing login records with:
- Timestamps between the specified dates
- User IDs from users.csv
- Login frequency driven by each user's login propensity
- Occasional IP and user agent changes
- Simulated failed login attempts
NOTE: This can generate large volumes of test data. Try not to push files larger than 10 GB to GitHub.
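The generator's internals aren't reproduced here, but propensity-based simulation can be sketched roughly as a daily Bernoulli trial per user. The function name, parameters, and failure rate below are illustrative assumptions, not the script's actual API:

```python
import random
from datetime import date, timedelta

def simulate_logins(user_id, propensity, start, end, fail_rate=0.05, seed=None):
    """Illustrative sketch: on each day in [start, end], the user attempts a
    login with probability `propensity`; a small fraction of attempts fail."""
    rng = random.Random(seed)
    rows = []
    day = start
    while day <= end:
        if rng.random() < propensity:
            status = "failure" if rng.random() < fail_rate else "success"
            rows.append((day.isoformat(), user_id, status))
        day += timedelta(days=1)
    return rows

# A user with propensity 1.0 logs in every day of the range
rows = simulate_logins("u1", 1.0, date(2025, 1, 1), date(2025, 1, 31), seed=42)
print(len(rows))  # 31
```

The real script also varies IP addresses and user agents per attempt; the same per-day loop just emits extra columns.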
Modern identity management systems provide event streams of login attempts, which are crucial for measuring user engagement through metrics like Daily Active Users (DAU) and Monthly Active Users (MAU). This challenge focuses on analyzing login data to create meaningful visualizations of user engagement patterns.
The project includes two main data files:
- Contains user information including:
- user_id: Unique identifier for each user
- user_name: Username
- street: User's street address
- city: User's city
- state: User's state
- zip: User's zip code
- login_propensity: Probability (0-1) of user logging in on any given day
- ip_address: Default IP address for the user
- user_agent: Default user agent string for the user
You can generate logins using the pipenv run generate-logins command described above.
- Contains login event data including:
- timestamp: Login attempt timestamp (UTC)
- user_id: Reference to users.csv
- login_status: Success/failure status
- Additional metadata (IP, user agent, etc.)
- Create a visualization showing DAU trends over the generated date range
- Create a visualization showing MAU trends, using a moving 30-day window so each point reflects distinct active users over the prior 30 days
- Ensure proper handling of multiple logins per user per day/month (no double counting)
- Accurate DAU/MAU calculations
- Clear, readable visualizations
- Proper handling of edge cases (time zones, missing data)
- Documentation of your approach and findings
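The key to avoiding double counting is aggregating distinct user IDs per calendar day rather than counting raw login rows. A minimal sketch using an in-memory SQLite table (column names follow logins.csv as described above; adjust the query for your actual schema and database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (timestamp TEXT, user_id TEXT, login_status TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?)",
    [
        ("2025-01-01T08:00:00Z", "u1", "success"),
        ("2025-01-01T20:00:00Z", "u1", "success"),  # same user, same day: counted once
        ("2025-01-01T09:00:00Z", "u2", "success"),
        ("2025-01-01T09:05:00Z", "u3", "failure"),  # failed attempts excluded
        ("2025-01-02T10:00:00Z", "u2", "success"),
    ],
)

# DISTINCT collapses repeat logins; the date prefix of the ISO-8601
# timestamp buckets each attempt into its (UTC) calendar day.
dau = conn.execute("""
    SELECT substr(timestamp, 1, 10) AS day,
           COUNT(DISTINCT user_id) AS dau
    FROM logins
    WHERE login_status = 'success'
    GROUP BY day
    ORDER BY day
""").fetchall()
print(dau)  # [('2025-01-01', 2), ('2025-01-02', 1)]
```

Note this buckets days in UTC; if you report DAU in a local timezone, convert the timestamp before truncating to a date.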
Geographic Analysis
- MAU by State visualization
- User distribution heat map
- Regional engagement patterns
Engagement Analysis
- DAU/MAU ratio trends
- User retention analysis
- Login time pattern analysis (morning vs evening users)
Advanced Metrics
- User session duration analysis
- Login frequency patterns
- User churn prediction
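The rolling 30-day MAU required above is trickier than DAU because a plain GROUP BY month can't produce a distinct-user count per sliding window. One stdlib-only sketch (a real solution might instead use SQL window functions or pandas; the quadratic day-by-window loop is fine at this data volume):

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_mau(logins, window_days=30):
    """logins: iterable of (day, user_id) pairs for successful logins,
    where day is a datetime.date. Returns {day: distinct users seen in
    the window_days-day window ending on that day}."""
    by_day = defaultdict(set)
    for day, uid in logins:
        by_day[day].add(uid)  # sets drop duplicate same-day logins
    if not by_day:
        return {}
    first, last = min(by_day), max(by_day)
    result = {}
    day = first
    while day <= last:
        active = set()
        for back in range(window_days):  # union the daily user sets in the window
            active |= by_day.get(day - timedelta(days=back), set())
        result[day] = len(active)
        day += timedelta(days=1)
    return result

logins = [
    (date(2025, 1, 1), "a"),
    (date(2025, 1, 1), "a"),   # duplicate same-day login: counted once
    (date(2025, 1, 2), "b"),
    (date(2025, 2, 15), "a"),
]
mau = rolling_mau(logins)
print(mau[date(2025, 1, 2)])   # 2: both a and b active in the prior 30 days
print(mau[date(2025, 2, 15)])  # 1: only a is active in the window ending Feb 15
```

With DAU and rolling MAU in hand, the DAU/MAU "stickiness" ratio from the Engagement Analysis section is just an element-wise division of the two series.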
You are free to choose any tools, languages, or frameworks to complete this challenge. Here are some example approaches, but feel free to use any method you prefer:
Probably the simplest possible approach would be to:
- Connect to your database using DBeaver
- Run your DAU/MAU queries
- Export results to CSV
- Open in your preferred spreadsheet software (Excel, Google Sheets, etc.)
- Create visualizations using the built-in charting tools
If you wanted to exercise your Python skills, a nice approach would be to use pandas and matplotlib. Your code might look something like this:
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3

# Load the query results into a DataFrame, parsing the date column
conn = sqlite3.connect("example.db")
df = pd.read_sql_query("SELECT date, value FROM timeseries", conn, parse_dates=["date"])

# Plot the time series as a line graph
plt.plot(df['date'], df['value'])
plt.title("Line Graph")
plt.xlabel("Date")
plt.ylabel("Value")
plt.show()
If you want to take your skills to the next level, create a live interactive dashboard using open source observability tools that are commonly used in IT operations. You might want to run Grafana and your database in containers; the Docker Compose file might look something like this:
version: '3.8'
services:
  postgres:
    image: postgres:15
    container_name: postgres
    environment:
      POSTGRES_USER: grafana
      POSTGRES_PASSWORD: grafana
      POSTGRES_DB: metrics
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    depends_on:
      - postgres
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana

volumes:
  pgdata:
  grafana-storage:
Create your own unique approach using any combination of tools and techniques.
- Create a branch of this repository for your work
- Update this README with information about your solution
- Include instructions on how to duplicate your solution
- Provide any additional analysis or insights
- Consider timezone handling in your calculations
- Think about how to handle edge cases
- Focus on creating clear, meaningful visualizations
- Document your approach and reasoning
- Consider scalability and performance
- Think about how your solution could be extended