diff --git a/lab-dw-aggregating.ipynb b/lab-dw-aggregating.ipynb index fadd718..964ad20 100644 --- a/lab-dw-aggregating.ipynb +++ b/lab-dw-aggregating.ipynb @@ -1,165 +1,444 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "31969215-2a90-4d8b-ac36-646a7ae13744", - "metadata": { - "id": "31969215-2a90-4d8b-ac36-646a7ae13744" - }, - "source": [ - "# Lab | Data Aggregation and Filtering" - ] - }, - { - "cell_type": "markdown", - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", - "metadata": { - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" - }, - "source": [ - "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", - "\n", - "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", - "\n", - "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." - ] - }, - { - "cell_type": "markdown", - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", - "metadata": { - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" - }, - "source": [ - "1. Create a new DataFrame that only includes customers who:\n", - " - have a **low total_claim_amount** (e.g., below $1,000),\n", - " - have a response \"Yes\" to the last marketing campaign." - ] - }, - { - "cell_type": "markdown", - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", - "metadata": { - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" - }, - "source": [ - "2. Using the original Dataframe, analyze:\n", - " - the average `monthly_premium` and/or customer lifetime value by `policy_type` and `gender` for customers who responded \"Yes\", and\n", - " - compare these insights to `total_claim_amount` patterns, and discuss which segments appear most profitable or low-risk for the company." - ] - }, - { - "cell_type": "markdown", - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", - "metadata": { - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" - }, - "source": [ - "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." - ] - }, - { - "cell_type": "markdown", - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", - "metadata": { - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" - }, - "source": [ - "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." - ] - }, - { - "cell_type": "markdown", - "id": "b42999f9-311f-481e-ae63-40a5577072c5", - "metadata": { - "id": "b42999f9-311f-481e-ae63-40a5577072c5" - }, - "source": [ - "## Bonus" - ] - }, - { - "cell_type": "markdown", - "id": "81ff02c5-6584-4f21-a358-b918697c6432", - "metadata": { - "id": "81ff02c5-6584-4f21-a358-b918697c6432" - }, - "source": [ - "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." - ] - }, - { - "cell_type": "markdown", - "id": "b6aec097-c633-4017-a125-e77a97259cda", - "metadata": { - "id": "b6aec097-c633-4017-a125-e77a97259cda" - }, - "source": [ - "6. 
Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", - "\n", - "*Hint:*\n", - "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", - "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", - "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" - ] - }, - { - "cell_type": "markdown", - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", - "metadata": { - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" - }, - "source": [ - "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", - "\n", - "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." - ] - }, - { - "cell_type": "markdown", - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", - "metadata": { - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" - }, - "source": [ - "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" - ] - }, + "cells": [ + { + "cell_type": "markdown", + "id": "31969215-2a90-4d8b-ac36-646a7ae13744", + "metadata": { + "id": "31969215-2a90-4d8b-ac36-646a7ae13744" + }, + "source": [ + "# Lab | Data Aggregation and Filtering" + ] + }, + { + "cell_type": "markdown", + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", + "metadata": { + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" + }, + "source": [ + "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", + "\n", + "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", + "\n", + "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." + ] + }, + { + "cell_type": "markdown", + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", + "metadata": { + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" + }, + "source": [ + "1. Create a new DataFrame that only includes customers who:\n", + " - have a **low total_claim_amount** (e.g., below $1,000),\n", + " - have a response \"Yes\" to the last marketing campaign." + ] + }, + { + "cell_type": "markdown", + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", + "metadata": { + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" + }, + "source": [ + "2. Using the original Dataframe, analyze:\n", + " - the average `monthly_premium` and/or customer lifetime value by `policy_type` and `gender` for customers who responded \"Yes\", and\n", + " - compare these insights to `total_claim_amount` patterns, and discuss which segments appear most profitable or low-risk for the company." + ] + }, + { + "cell_type": "markdown", + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", + "metadata": { + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" + }, + "source": [ + "3. 
Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + ] + }, + { + "cell_type": "markdown", + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", + "metadata": { + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" + }, + "source": [ + "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + ] + }, + { + "cell_type": "markdown", + "id": "b42999f9-311f-481e-ae63-40a5577072c5", + "metadata": { + "id": "b42999f9-311f-481e-ae63-40a5577072c5" + }, + "source": [ + "## Bonus" + ] + }, + { + "cell_type": "markdown", + "id": "81ff02c5-6584-4f21-a358-b918697c6432", + "metadata": { + "id": "81ff02c5-6584-4f21-a358-b918697c6432" + }, + "source": [ + "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + ] + }, + { + "cell_type": "markdown", + "id": "b6aec097-c633-4017-a125-e77a97259cda", + "metadata": { + "id": "b6aec097-c633-4017-a125-e77a97259cda" + }, + "source": [ + "6. Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", + "\n", + "*Hint:*\n", + "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", + "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", + "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + ] + }, + { + "cell_type": "markdown", + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", + "metadata": { + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" + }, + "source": [ + "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", + "\n", + "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", + "metadata": { + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" + }, + "source": [ + "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "449513f4-0459-46a0-a18d-9398d974c9ad", + "metadata": { + "id": "449513f4-0459-46a0-a18d-9398d974c9ad" + }, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "id": "449513f4-0459-46a0-a18d-9398d974c9ad", - "metadata": { - "id": "449513f4-0459-46a0-a18d-9398d974c9ad" - }, - "outputs": [], - "source": [ - "# your code goes here" - ] - } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "-------- Detected columns --------\n", + "{'total_claim': None, 'clv': None, 'response': 'Response', 'policy_type': 'Policy Type', 'gender': 'Gender', 'monthly_premium': 'Monthly Premium Auto', 'state': 'State', 'date': None, 'sales_channel': 'Sales Channel', 'policy_identifier': 'Months Since Policy Inception'}\n", + "\n", + "Task 1: required columns missing (total_claim or response).\n", + "\n", + "-------- Task 2: Average metrics (responded 'Yes') by policy_type and gender --------\n", + " Monthly Premium Auto\n", + "Policy Type Gender \n", + "Corporate Auto F 94.30\n", + " M 92.19\n", + "Personal Auto F 99.00\n", + " M 91.09\n", + "Special Auto F 92.31\n", + " M 86.34\n", + "\n", + "-------- Task 3: Customers per state (top rows) --------\n", + "State\n", + "California 3552\n", + "Oregon 2909\n", + "Arizona 1937\n", + "Nevada 993\n", + "Washington 888\n", + "nan 631\n", + "Name: num_customers, dtype: int64\n", + "\n", + "-------- Task 3: States with more than 500 customers --------\n", + "State\n", + "California 3552\n", + "Oregon 2909\n", + "Arizona 1937\n", + "Nevada 993\n", + "Washington 888\n", + "nan 631\n", + "Name: num_customers, dtype: int64\n", + "\n", + "Task 4: required columns missing (clv, education, or gender).\n", + "\n", + "-------- Task 5: policies sold by state (rows) x month (columns) - sample --------\n", + "month Unknown\n", + "State \n", + "Arizona 1937.0\n", + "California 3552.0\n", + "Nevada 993.0\n", + "Oregon 2909.0\n", + "Washington 888.0\n", + "\n", + "-------- Task 6: Top 3 states by total policies --------\n", + "State\n", + "California 3552\n", + "Oregon 2909\n", + "Arizona 1937\n", + "Nevada 993\n", + "Washington 888\n", + "nan 631\n", + "Name: total_policies, dtype: int64\n", + "\n", + "-------- Task 6: policies sold by month for top 3 states --------\n", + "month Unknown\n", + "State \n", + "Arizona 1937.0\n", + "California 3552.0\n", + "Oregon 2909.0\n", + "\n", + "-------- Task 7: Response rate by sales channel --------\n", + "Response response_rate\n", + "Sales Channel \n", + "Agent 0.180\n", + "Branch 0.108\n", + "Call Center 0.103\n", + "Web 0.109\n", + "\n", + "Done: all tasks attempted.\n" + ] } + ], + "source": [ + "# your code goes here\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "def find_col(df, candidates):\n", + " \"\"\"Return first column 
name in df.columns that matches any candidate substring (case-insensitive).\"\"\"\n", + " cols = list(df.columns)\n", + " for c in candidates:\n", + " for col in cols:\n", + " if c.lower() in col.lower():\n", + " return col\n", + " return None\n", + "\n", + "def safe_print(title, obj):\n", + " print(\"\\n\" + \"-\"*8 + \" \" + title + \" \" + \"-\"*8)\n", + " print(obj)\n", + "\n", + "url = \"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\"\n", + "df = pd.read_csv(url)\n", + "\n", + "#safe copy and normalize column names for lookups\n", + "df_orig = df.copy()\n", + "df.columns = [c.strip() for c in df.columns]\n", + "\n", + "#find likely column names (include space-separated variants so columns like Total Claim Amount are detected)\n", + "col_total_claim = find_col(df, [\"total_claim\", \"total claim\", \"total_claim_amount\", \"claim_amount\", \"total_claims\"])\n", + "col_clv = find_col(df, [\"customer_lifetime_value\", \"customer lifetime\", \"clv\", \"customer_lifetime\"])\n", + "col_response = find_col(df, [\"response\"])\n", + "col_policy_type = find_col(df, [\"policy_type\", \"policy type\"])\n", + "col_gender = find_col(df, [\"gender\"])\n", + "col_monthly_premium = find_col(df, [\"monthly_premium\", \"monthly_premium_auto\", \"monthly premium\"])\n", + "col_state = find_col(df, [\"state\"])\n", + "col_effective_date = find_col(df, [\"effective_to_date\", \"effective to date\", \"effective date\", \"effective_to\", \"effective\"])\n", + "col_sales_channel = find_col(df, [\"sales_channel\", \"sales channel\", \"channel\", \"saleschannel\"])\n", + "col_policy_id = find_col(df, [\"policy\", \"policy_id\", \"policy id\", \"customer_id\"]) # for counting policies\n", + "\n", + "#print detected columns\n", + "safe_print(\"Detected columns\", {\n", + " \"total_claim\": col_total_claim,\n", + " \"clv\": col_clv,\n", + " \"response\": col_response,\n", + " \"policy_type\": col_policy_type,\n", + " \"gender\": col_gender,\n", + " \"monthly_premium\": col_monthly_premium,\n", + " \"state\": col_state,\n", + " \"date\": col_effective_date,\n", + " \"sales_channel\": col_sales_channel,\n", + " \"policy_identifier\": col_policy_id\n", + "})\n", + "\n", + "#convert date column and extract month\n", + "if col_effective_date:\n", + " df[col_effective_date] = pd.to_datetime(df[col_effective_date], errors=\"coerce\")\n", + " df[\"month\"] = df[col_effective_date].dt.month_name().fillna(\"Unknown\")\n", + " df[\"month_num\"] = df[col_effective_date].dt.month.fillna(0).astype(int)\n", + "else:\n", + " #try to parse a column named month or create Unknown\n", + " if \"month\" in df.columns:\n", + " df[\"month\"] = df[\"month\"].astype(str)\n", + " df[\"month_num\"] = 0\n", + " else:\n", + " df[\"month\"] = \"Unknown\"\n", + " df[\"month_num\"] = 0\n", + "\n", + "#make sure that numeric columns are numeric\n", + "if col_total_claim:\n", + " df[col_total_claim] = pd.to_numeric(df[col_total_claim], errors=\"coerce\")\n", + "if col_clv:\n", + " # some CLV values might have % or commas; try to clean\n", + " df[col_clv] = df[col_clv].astype(str).str.replace(\"%\",\"\", regex=False).str.replace(\",\",\"\", regex=False)\n", + " df[col_clv] = pd.to_numeric(df[col_clv], errors=\"coerce\")\n", + "if col_monthly_premium:\n", + " df[col_monthly_premium] = pd.to_numeric(df[col_monthly_premium], errors=\"coerce\")\n", + "\n", + "#normalize categorical columns\n", + "for c in [col_response, col_policy_type, col_gender, col_state, col_sales_channel]:\n", + " if c:\n", + " df[c] = df[c].astype(str).str.strip()\n", + "\n", + "#(1) \n", + "if col_total_claim and col_response:\n", + " subset1 = df[(df[col_total_claim] < 
1000) & (df[col_response].str.lower() == \"yes\")]\n", + " safe_print(\"Task 1: Low total_claim_amount (<1000) & responded 'Yes' - sample and count\",\n", + " f\"Rows: {len(subset1)}\\nPreview:\\n{subset1.head()}\")\n", + "else:\n", + " print(\"\\nTask 1: required columns missing (total_claim or response).\")\n", + "\n", + "#(2)\n", + "if col_response and col_policy_type and col_gender:\n", + " responded_yes = df[df[col_response].str.lower() == \"yes\"]\n", + "\n", + " group_cols = [col_policy_type, col_gender]\n", + " agg_cols = {}\n", + " if col_monthly_premium:\n", + " agg_cols[col_monthly_premium] = \"mean\"\n", + " if col_clv:\n", + " agg_cols[col_clv] = \"mean\"\n", + " if col_total_claim:\n", + " agg_cols[col_total_claim] = \"mean\"\n", + "\n", + " if agg_cols:\n", + " pivot = responded_yes.groupby(group_cols).agg(agg_cols).round(2)\n", + " safe_print(\"Task 2: Average metrics (responded 'Yes') by policy_type and gender\", pivot)\n", + "#compare which segments have lower total_claim and higher CLV?\n", + " if col_clv and col_total_claim:\n", + "#compute ratio CLV / total_claim to judge profitability\n", + " pivot = pivot.copy()\n", + "# guard divide by zero\n", + " pivot[\"clv_over_claims\"] = (pivot[col_clv] / pivot[col_total_claim]).replace([np.inf, -np.inf], np.nan).round(2)\n", + " safe_print(\"Task 2: add clv_over_claims (higher = better)\", pivot.sort_values(\"clv_over_claims\", ascending=False))\n", + " else:\n", + " print(\"\\nTask 2: No numeric columns (monthly_premium or clv or total_claim) found to aggregate.\")\n", + "else:\n", + " print(\"\\nTask 2: required columns missing (response, policy_type, or gender).\")\n", + "\n", + "#(3)\n", + "if col_state:\n", + " counts_by_state = df.groupby(col_state).size().rename(\"num_customers\").sort_values(ascending=False)\n", + " safe_print(\"Task 3: Customers per state (top rows)\", counts_by_state.head(20))\n", + " states_gt_500 = counts_by_state[counts_by_state > 500]\n", + " safe_print(\"Task 3: States with more than 500 customers\", states_gt_500)\n", + "else:\n", + " print(\"\\nTask 3: state column not found.\")\n", + "\n", + "#(4)\n", + "col_education = find_col(df, [\"education\"])\n", + "if col_clv and col_education and col_gender:\n", + " stats = df.groupby([col_education, col_gender])[col_clv].agg([\"max\", \"min\", \"median\"]).round(2)\n", + " safe_print(\"Task 4: CLV stats by education and gender\", stats)\n", + " # Some short conclusions\n", + " print(\"\\nTask 4: Conclusions (example):\")\n", + " print(\"Look for education/gender groups with high median CLV (good long-term value).\")\n", + " print(\"Look for groups with low median and high max (high variance) to inspect risk.\")\n", + "else:\n", + " print(\"\\nTask 4: required columns missing (clv, education, or gender).\")\n", + "\n", + "#(5)\n", + "if col_state:\n", + " grouped = df.groupby([col_state, \"month\"]).size().reset_index(name=\"policies_sold\")\n", + " pivot_state_month = grouped.pivot_table(index=col_state, columns=\"month\", values=\"policies_sold\", fill_value=0)\n", + "#put months by calendar order if month names exist\n", + " month_order = [\"January\",\"February\",\"March\",\"April\",\"May\",\"June\",\"July\",\"August\",\"September\",\"October\",\"November\",\"December\"]\n", + "#keep only months that appear in columns, but in calendar order\n", + " existing_months = [m for m in month_order if m in pivot_state_month.columns]\n", + " other_months = [c for c in pivot_state_month.columns if c not in existing_months]\n", + " pivot_state_month = 
pivot_state_month[existing_months + other_months]\n", + " safe_print(\"Task 5: policies sold by state (rows) x month (columns) - sample\", pivot_state_month.head())\n", + "else:\n", + " print(\"\\nTask 5: state column not found; cannot compute state x month table.\")\n", + "\n", + "#(6)\n", + "if col_state:\n", + " total_by_state = df.groupby(col_state).size().rename(\"total_policies\").sort_values(ascending=False)\n", + " top3 = total_by_state.head(3).index.tolist()\n", + " safe_print(\"Task 6: Top 3 states by total policies\", total_by_state.head(10))\n", + "#filter grouped table from above for top3\n", + " top3_df = grouped[grouped[col_state].isin(top3)].pivot_table(index=col_state, columns=\"month\", values=\"policies_sold\", fill_value=0)\n", + "#put months like before\n", + " existing_months_top3 = [m for m in month_order if m in top3_df.columns]\n", + " other_months_top3 = [c for c in top3_df.columns if c not in existing_months_top3]\n", + " top3_df = top3_df[existing_months_top3 + other_months_top3]\n", + " safe_print(\"Task 6: policies sold by month for top 3 states\", top3_df)\n", + "else:\n", + " print(\"\\nTask 6: state column not found; cannot compute top 3 states per month.\")\n", + "\n", + "#(7)\n", + "if col_sales_channel and col_response:\n", + " channel_counts = df.groupby(col_sales_channel)[col_response].value_counts().unstack(fill_value=0)\n", + " if \"Yes\" in channel_counts.columns or \"yes\" in channel_counts.columns:\n", + " yes_col = \"Yes\" if \"Yes\" in channel_counts.columns else \"yes\"\n", + " channel_counts[\"response_rate\"] = (channel_counts[yes_col] / channel_counts.sum(axis=1)).round(3)\n", + " else:\n", + " channel_counts[\"response_rate\"] = (channel_counts.filter(regex='(?i)^yes$').sum(axis=1) / channel_counts.sum(axis=1)).round(3)\n", + " safe_print(\"Task 7: Response rate by sales channel\", channel_counts[[\"response_rate\"]] if \"response_rate\" in channel_counts.columns else channel_counts)\n", + "else:\n", + " channel_cols = [c for c in df.columns if any(k in c.lower() for k in [\"channel\",\"branch\",\"web\",\"mail\",\"call\",\"online\"])]\n", + " channel_cols = [c for c in channel_cols if c not in [col_sales_channel, col_state, col_policy_type, col_gender, col_response]]\n", + " if channel_cols and col_response:\n", + " melted = df.melt(id_vars=[col_response], value_vars=channel_cols, var_name=\"channel\", value_name=\"channel_value\")\n", + " melted[\"channel_active\"] = melted[\"channel_value\"].astype(str).str.lower().isin([\"1\",\"true\",\"yes\",\"y\"])\n", + " channel_stats = melted.groupby(\"channel\").apply(\n", + " lambda g: pd.Series({\n", + " \"n_customers\": g.shape[0],\n", + " \"n_responded_yes\": g.loc[g[col_response].str.lower()==\"yes\", \"channel_active\"].sum(),\n", + " \"response_rate\": (g.loc[g[col_response].str.lower()==\"yes\", \"channel_active\"].sum() / g[\"channel_active\"].sum()) if g[\"channel_active\"].sum()>0 else np.nan\n", + " })\n", + " ).sort_values(\"response_rate\", ascending=False)\n", + " safe_print(\"Task 7 (melt path): channel response stats\", channel_stats)\n", + " else:\n", + " print(\"\\nTask 7: Could not detect sales channel structure. 
Please check column names for sales channel info.\")\n", + "\n", + "print(\"\\nDone: all tasks attempted.\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd667256-bd4d-4d8b-b6c2-d6b8e7220cba", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python [conda env:base] *", + "language": "python", + "name": "conda-base-py" }, - "nbformat": 4, - "nbformat_minor": 5 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 }
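Note on the task 7 hint: the committed cell computes the response rate with groupby/value_counts and only falls back to melt when no single sales-channel column is found. A minimal sketch of the melt-based route the hint describes, assuming the raw column names Sales Channel and Response shown in the detection printout:

import pandas as pd

url = "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv"
df = pd.read_csv(url)

# wide table: one row per sales channel, one column per response value ("No"/"Yes")
counts = pd.crosstab(df["Sales Channel"], df["Response"])

# unpivot with melt: one row per (channel, response) pair
long_df = counts.reset_index().melt(id_vars="Sales Channel", var_name="Response", value_name="n")

# response rate per channel = "Yes" count divided by total count for that channel
yes = long_df[long_df["Response"] == "Yes"].set_index("Sales Channel")["n"]
total = long_df.groupby("Sales Channel")["n"].sum()
print((yes / total).round(3).sort_values(ascending=False))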