diff --git a/lab-dw-aggregating.ipynb b/lab-dw-aggregating.ipynb index fadd718..8b1254b 100644 --- a/lab-dw-aggregating.ipynb +++ b/lab-dw-aggregating.ipynb @@ -1,165 +1,1908 @@ { - "cells": [ + "cells": [ + { + "cell_type": "markdown", + "id": "31969215-2a90-4d8b-ac36-646a7ae13744", + "metadata": { + "id": "31969215-2a90-4d8b-ac36-646a7ae13744" + }, + "source": [ + "# Lab | Data Aggregation and Filtering" + ] + }, + { + "cell_type": "markdown", + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", + "metadata": { + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" + }, + "source": [ + "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", + "\n", + "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", + "\n", + "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." + ] + }, + { + "cell_type": "markdown", + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", + "metadata": { + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" + }, + "source": [ + "1. Create a new DataFrame that only includes customers who:\n", + " - have a **low total_claim_amount** (e.g., below $1,000),\n", + " - have a response \"Yes\" to the last marketing campaign." + ] + }, + { + "cell_type": "markdown", + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", + "metadata": { + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" + }, + "source": [ + "2. 
Using the original DataFrame, analyze:\n", + " - the average `monthly_premium` and/or customer lifetime value by `policy_type` and `gender` for customers who responded \"Yes\", and\n", + " - compare these insights to `total_claim_amount` patterns, and discuss which segments appear most profitable or low-risk for the company." + ] + }, + { + "cell_type": "markdown", + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", + "metadata": { + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" + }, + "source": [ + "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + ] + }, + { + "cell_type": "markdown", + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", + "metadata": { + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" + }, + "source": [ + "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + ] + }, + { + "cell_type": "markdown", + "id": "b42999f9-311f-481e-ae63-40a5577072c5", + "metadata": { + "id": "b42999f9-311f-481e-ae63-40a5577072c5" + }, + "source": [ + "## Bonus" + ] + }, + { + "cell_type": "markdown", + "id": "81ff02c5-6584-4f21-a358-b918697c6432", + "metadata": { + "id": "81ff02c5-6584-4f21-a358-b918697c6432" + }, + "source": [ + "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + ] + }, + { + "cell_type": "markdown", + "id": "b6aec097-c633-4017-a125-e77a97259cda", + "metadata": { + "id": "b6aec097-c633-4017-a125-e77a97259cda" + }, + "source": [ + "6. 
Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", + "\n", + "*Hint:*\n", + "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", + "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", + "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + ] + }, + { + "cell_type": "markdown", + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", + "metadata": { + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" + }, + "source": [ + "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", + "\n", + "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." + ] + }, + { + "cell_type": "markdown", + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", + "metadata": { + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" + }, + "source": [ + "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "449513f4-0459-46a0-a18d-9398d974c9ad", + "metadata": { + "id": "449513f4-0459-46a0-a18d-9398d974c9ad" + }, + "outputs": [ { - "cell_type": "markdown", - "id": "31969215-2a90-4d8b-ac36-646a7ae13744", - "metadata": { - "id": "31969215-2a90-4d8b-ac36-646a7ae13744" - }, - "source": [ - "# Lab | Data Aggregation and Filtering" + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer State Customer Lifetime Value Response \\\n", + "0 0 DK49336 Arizona 4809.216960 No \n", + "1 1 KX64629 California 2228.525238 No \n", + "2 2 LZ68649 Washington 14947.917300 No \n", + "3 3 XL78013 Oregon 22332.439460 Yes \n", + "4 4 QA50777 Oregon 9025.067525 No \n", + "... ... ... ... ... ... \n", + "10905 10905 FE99816 Nevada 15563.369440 No \n", + "10906 10906 KX53892 Oregon 5259.444853 No \n", + "10907 10907 TL39050 Arizona 23893.304100 No \n", + "10908 10908 WA60547 California 11971.977650 No \n", + "10909 10909 IV32877 NaN 6857.519928 NaN \n", + "\n", + " Coverage Education Effective To Date EmploymentStatus Gender ... \\\n", + "0 Basic College 2/18/11 Employed M ... \n", + "1 Basic College 1/18/11 Unemployed F ... \n", + "2 Basic Bachelor 2/10/11 Employed M ... \n", + "3 Extended College 1/11/11 Employed M ... \n", + "4 Premium Bachelor 1/17/11 Medical Leave F ... \n", + "... ... ... ... ... ... ... \n", + "10905 Premium Bachelor 1/19/11 Unemployed F ... \n", + "10906 Basic College 1/6/11 Employed F ... \n", + "10907 Extended Bachelor 2/6/11 Employed F ... \n", + "10908 Premium College 2/13/11 Employed F ... \n", + "10909 Basic Bachelor 1/8/11 Unemployed M ... \n", + "\n", + " Number of Open Complaints Number of Policies Policy Type \\\n", + "0 0.0 9 Corporate Auto \n", + "1 0.0 1 Personal Auto \n", + "2 0.0 2 Personal Auto \n", + "3 0.0 2 Corporate Auto \n", + "4 NaN 7 Personal Auto \n", + "... ... ... ... \n", + "10905 NaN 7 Personal Auto \n", + "10906 0.0 6 Personal Auto \n", + "10907 0.0 2 Corporate Auto \n", + "10908 4.0 6 Personal Auto \n", + "10909 0.0 3 Personal Auto \n", + "\n", + " Policy Renew Offer Type Sales Channel Total Claim Amount \\\n", + "0 Corporate L3 Offer3 Agent 292.800000 \n", + "1 Personal L3 Offer4 Call Center 744.924331 \n", + "2 Personal L3 Offer3 Call Center 480.000000 \n", + "3 Corporate L3 Offer2 Branch 484.013411 \n", + "4 Personal L2 Offer1 Branch 707.925645 \n", + "... ... ... 
... ... \n", + "10905 Personal L1 Offer3 Web 1214.400000 \n", + "10906 Personal L3 Offer2 Branch 273.018929 \n", + "10907 Corporate L3 Offer1 Web 381.306996 \n", + "10908 Personal L1 Offer1 Branch 618.288849 \n", + "10909 Personal L1 Offer4 Web 1021.719397 \n", + "\n", + " Vehicle Class Vehicle Size Vehicle Type \n", + "0 Four-Door Car Medsize NaN \n", + "1 Four-Door Car Medsize NaN \n", + "2 SUV Medsize A \n", + "3 Four-Door Car Medsize A \n", + "4 Four-Door Car Medsize NaN \n", + "... ... ... ... \n", + "10905 Luxury Car Medsize A \n", + "10906 Four-Door Car Medsize A \n", + "10907 Luxury SUV Medsize NaN \n", + "10908 SUV Medsize A \n", + "10909 SUV Medsize NaN \n", + "\n", + "[10910 rows x 26 columns]" ] - }, + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code goes here\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "file_path = \"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\"\n", + "\n", + "df = pd.read_csv(file_path)\n", + "df\n", + "# df.info()\n", + "# df.dtypes\n", + "# df.nunique()\n", + "# df.describe()\n", + "# display(df.isna().sum())\n", + "# display((df.isna().sum()/df.shape[0])*100)\n", + "# df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "b340f67e-2d77-4f09-b134-66f3d05c5ed0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 10910 entries, 0 to 10909\n", + "Data columns (total 26 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Unnamed: 0 10910 non-null int64 \n", + " 1 Customer 10910 non-null object \n", + " 2 State 10279 non-null object \n", + " 3 Customer Lifetime Value 10910 non-null float64\n", + " 4 Response 10279 non-null object \n", + " 5 Coverage 10910 non-null object \n", + " 6 Education 
10910 non-null object \n", + " 7 Effective To Date 10910 non-null object \n", + " 8 EmploymentStatus 10910 non-null object \n", + " 9 Gender 10910 non-null object \n", + " 10 Income 10910 non-null int64 \n", + " 11 Location Code 10910 non-null object \n", + " 12 Marital Status 10910 non-null object \n", + " 13 Monthly Premium Auto 10910 non-null int64 \n", + " 14 Months Since Last Claim 10277 non-null float64\n", + " 15 Months Since Policy Inception 10910 non-null int64 \n", + " 16 Number of Open Complaints 10277 non-null float64\n", + " 17 Number of Policies 10910 non-null int64 \n", + " 18 Policy Type 10910 non-null object \n", + " 19 Policy 10910 non-null object \n", + " 20 Renew Offer Type 10910 non-null object \n", + " 21 Sales Channel 10910 non-null object \n", + " 22 Total Claim Amount 10910 non-null float64\n", + " 23 Vehicle Class 10288 non-null object \n", + " 24 Vehicle Size 10288 non-null object \n", + " 25 Vehicle Type 5428 non-null object \n", + "dtypes: float64(4), int64(5), object(17)\n", + "memory usage: 2.2+ MB\n" + ] + } + ], + "source": [ + "df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "f3b778ba-e6ca-4004-b4de-bf783e193bb5", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", - "metadata": { - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" - }, - "source": [ - "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", - "\n", - "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", - "\n", - "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." 
+ "data": { + "text/plain": [ + "Unnamed: 0 10910\n", + "Customer 9134\n", + "State 5\n", + "Customer Lifetime Value 8041\n", + "Response 2\n", + "Coverage 3\n", + "Education 5\n", + "Effective To Date 59\n", + "EmploymentStatus 5\n", + "Gender 2\n", + "Income 5694\n", + "Location Code 3\n", + "Marital Status 3\n", + "Monthly Premium Auto 202\n", + "Months Since Last Claim 36\n", + "Months Since Policy Inception 100\n", + "Number of Open Complaints 6\n", + "Number of Policies 9\n", + "Policy Type 3\n", + "Policy 9\n", + "Renew Offer Type 4\n", + "Sales Channel 4\n", + "Total Claim Amount 5106\n", + "Vehicle Class 6\n", + "Vehicle Size 3\n", + "Vehicle Type 1\n", + "Month 2\n", + "dtype: int64" ] - }, + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.nunique()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d4566702-f6a9-414a-a72a-48b880a215f3", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", - "metadata": { - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" - }, - "source": [ - "1. Create a new DataFrame that only includes customers who:\n", - " - have a **low total_claim_amount** (e.g., below $1,000),\n", - " - have a response \"Yes\" to the last marketing campaign." 
+ "data": { + "text/plain": [ + "Unnamed: 0 int64\n", + "Customer object\n", + "State object\n", + "Customer Lifetime Value float64\n", + "Response object\n", + "Coverage object\n", + "Education object\n", + "Effective To Date object\n", + "EmploymentStatus object\n", + "Gender object\n", + "Income int64\n", + "Location Code object\n", + "Marital Status object\n", + "Monthly Premium Auto int64\n", + "Months Since Last Claim float64\n", + "Months Since Policy Inception int64\n", + "Number of Open Complaints float64\n", + "Number of Policies int64\n", + "Policy Type object\n", + "Policy object\n", + "Renew Offer Type object\n", + "Sales Channel object\n", + "Total Claim Amount float64\n", + "Vehicle Class object\n", + "Vehicle Size object\n", + "Vehicle Type object\n", + "dtype: object" ] - }, + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e5712570-100d-482f-b987-5527707dd9e2", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", - "metadata": { - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" - }, - "source": [ - "2. Using the original Dataframe, analyze:\n", - " - the average `monthly_premium` and/or customer lifetime value by `policy_type` and `gender` for customers who responded \"Yes\", and\n", - " - compare these insights to `total_claim_amount` patterns, and discuss which segments appear most profitable or low-risk for the company." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer Lifetime Value Income \\\n", + "count 10910.000000 10910.000000 10910.000000 \n", + "mean 5454.500000 8018.241094 37536.284785 \n", + "std 3149.590053 6885.081434 30359.195670 \n", + "min 0.000000 1898.007675 0.000000 \n", + "25% 2727.250000 4014.453113 0.000000 \n", + "50% 5454.500000 5771.147235 33813.500000 \n", + "75% 8181.750000 8992.779137 62250.750000 \n", + "max 10909.000000 83325.381190 99981.000000 \n", + "\n", + " Monthly Premium Auto Months Since Last Claim \\\n", + "count 10910.000000 10277.000000 \n", + "mean 93.196059 15.149071 \n", + "std 34.442532 10.080349 \n", + "min 61.000000 0.000000 \n", + "25% 68.000000 6.000000 \n", + "50% 83.000000 14.000000 \n", + "75% 109.000000 23.000000 \n", + "max 298.000000 35.000000 \n", + "\n", + " Months Since Policy Inception Number of Open Complaints \\\n", + "count 10910.000000 10277.000000 \n", + "mean 48.091934 0.384256 \n", + "std 27.940675 0.912457 \n", + "min 0.000000 0.000000 \n", + "25% 24.000000 0.000000 \n", + "50% 48.000000 0.000000 \n", + "75% 71.000000 0.000000 \n", + "max 99.000000 5.000000 \n", + "\n", + " Number of Policies Total Claim Amount \n", + "count 10910.000000 10910.000000 \n", + "mean 2.979193 434.888330 \n", + "std 2.399359 292.180556 \n", + "min 1.000000 0.099007 \n", + "25% 1.000000 271.082527 \n", + "50% 2.000000 382.564630 \n", + "75% 4.000000 547.200000 \n", + "max 9.000000 2893.239678 " ] - }, + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "d5e8b4c8-0f90-4ee7-b19e-02c484101bc6", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", - "metadata": { - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" - }, - "source": [ - "3. 
Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer Lifetime Value Income \\\n", + "count 1399.000000 1399.000000 1399.000000 \n", + "mean 5369.293066 7709.166530 38483.350250 \n", + "std 3138.693825 6261.525485 28191.664861 \n", + "min 3.000000 2004.350666 0.000000 \n", + "25% 2627.000000 3936.410182 18495.000000 \n", + "50% 5423.000000 5548.031892 33190.000000 \n", + "75% 7933.500000 9031.214859 60920.000000 \n", + "max 10897.000000 41787.903430 99845.000000 \n", + "\n", + " Monthly Premium Auto Months Since Last Claim \\\n", + "count 1399.000000 1324.000000 \n", + "mean 88.720515 14.640483 \n", + "std 22.642518 9.781315 \n", + "min 61.000000 0.000000 \n", + "25% 67.000000 6.000000 \n", + "50% 84.000000 14.000000 \n", + "75% 107.000000 22.000000 \n", + "max 154.000000 35.000000 \n", + "\n", + " Months Since Policy Inception Number of Open Complaints \\\n", + "count 1399.000000 1324.000000 \n", + "mean 47.423159 0.364048 \n", + "std 26.797704 0.927667 \n", + "min 0.000000 0.000000 \n", + "25% 25.000000 0.000000 \n", + "50% 50.000000 0.000000 \n", + "75% 68.000000 0.000000 \n", + "max 99.000000 5.000000 \n", + "\n", + " Number of Policies Total Claim Amount \n", + "count 1399.000000 1399.000000 \n", + "mean 2.894210 412.134087 \n", + "std 2.424591 180.460971 \n", + "min 1.000000 7.345946 \n", + "25% 1.000000 312.000000 \n", + "50% 2.000000 398.502948 \n", + "75% 3.000000 528.200860 \n", + "max 9.000000 960.115399 " ] - }, + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# DataFrame for customers with low total claim and \"Yes\" response\n", + "# Filter for total claim amount < 1000 and Response == \"Yes\"\n", + "filtered_df = df[(df[\"Total Claim Amount\"] < 1000) & (df[\"Response\"] == \"Yes\")]\n", + "display(filtered_df.describe())" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a1124b90-411d-4d64-a067-c2f1f3b9f49e", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": 
"b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", - "metadata": { - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" - }, - "source": [ - "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Policy Type Gender Monthly Premium Auto Customer Lifetime Value \\\n", + "0 Corporate Auto F 94.301775 7712.628736 \n", + "1 Corporate Auto M 92.188312 7944.465414 \n", + "2 Personal Auto F 98.998148 8339.791842 \n", + "3 Personal Auto M 91.085821 7448.383281 \n", + "4 Special Auto F 92.314286 7691.584111 \n", + "5 Special Auto M 86.343750 8247.088702 \n", + "\n", + " Total Claim Amount \n", + "0 433.738499 \n", + "1 408.582459 \n", + "2 452.965929 \n", + "3 457.010178 \n", + "4 453.280164 \n", + "5 429.527942 " ] + }, + "metadata": {}, + "output_type": "display_data" }, { - "cell_type": "markdown", - "id": "b42999f9-311f-481e-ae63-40a5577072c5", - "metadata": { - "id": "b42999f9-311f-481e-ae63-40a5577072c5" - }, - "source": [ - "## Bonus" + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " mean median \\\n", + " Total Claim Amount Total Claim Amount \n", + "Policy Type Gender \n", + "Corporate Auto F 433.738499 420.041981 \n", + " M 408.582459 364.800000 \n", + "Personal Auto F 452.965929 424.330166 \n", + " M 457.010178 412.800000 \n", + "Special Auto F 453.280164 420.041981 \n", + " M 429.527942 345.600000 \n", + "\n", + " min max \n", + " Total Claim Amount Total Claim Amount \n", + "Policy Type Gender \n", + "Corporate Auto F 7.345946 1358.400000 \n", + " M 57.712985 1324.800000 \n", + "Personal Auto F 7.345946 1358.400000 \n", + " M 57.712985 1324.800000 \n", + "Special Auto F 56.603330 1358.400000 \n", + " M 86.461582 1027.000029 " ] - }, + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Subset for customers who responded \"Yes\"\n", + "yes_df = df[df[\"Response\"] == \"Yes\"]\n", + "\n", + "# Group by Policy Type and Gender, calculate means\n", + "agg_results = yes_df.groupby([\"Policy Type\", \"Gender\"]).agg({\n", + " \"Monthly Premium Auto\": \"mean\",\n", + " \"Customer Lifetime Value\": \"mean\",\n", + " \"Total Claim Amount\": \"mean\"\n", + "}).reset_index()\n", + "\n", + "display(agg_results)\n", + "\n", + "# Optional: Compare claim amount patterns\n", + "pivot_claims = yes_df.pivot_table(index=[\"Policy Type\", \"Gender\"], values=\"Total Claim Amount\", aggfunc=[\"mean\", \"median\", \"min\", \"max\"])\n", + "display(pivot_claims)\n", + "\n", + "# Discussion:\n", + "# - Segments with higher average lifetime value and lower claim amounts are most profitable/low-risk." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9fa29403-98ba-4cb4-835a-325b2006ef24", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "81ff02c5-6584-4f21-a358-b918697c6432", - "metadata": { - "id": "81ff02c5-6584-4f21-a358-b918697c6432" - }, - "source": [ - "5. The marketing team wants to analyze the number of policies sold by state and month. 
Present the data in a table where the months are arranged as columns and the states are arranged as rows." + "data": { + "text/plain": [ + "State\n", + "California 3552\n", + "Oregon 2909\n", + "Arizona 1937\n", + "Nevada 993\n", + "Washington 888\n", + "Name: count, dtype: int64" ] + }, + "metadata": {}, + "output_type": "display_data" }, { - "cell_type": "markdown", - "id": "b6aec097-c633-4017-a125-e77a97259cda", - "metadata": { - "id": "b6aec097-c633-4017-a125-e77a97259cda" - }, - "source": [ - "6. Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", - "\n", - "*Hint:*\n", - "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", - "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", - "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" - ] + "name": "stdout", + "output_type": "stream", + "text": [ + "---------------------------\n" + ] }, { - "cell_type": "markdown", - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", - "metadata": { - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" - }, - "source": [ - "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", - "\n", - "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." 
+ "data": { + "text/plain": [ + "State\n", + "California 3552\n", + "Oregon 2909\n", + "Arizona 1937\n", + "Nevada 993\n", + "Washington 888\n", + "Name: count, dtype: int64" ] + }, + "metadata": {}, + "output_type": "display_data" }, { - "cell_type": "markdown", - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", - "metadata": { - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" - }, - "source": [ - "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + "name": "stdout", + "output_type": "stream", + "text": [ + "---------------------------\n" + ] + } + ], + "source": [ + "# Number of policies per state, only states with >500 customers\n", + "# Count policies by state\n", + "state_counts = df[\"State\"].value_counts()\n", + "display(state_counts)\n", + "print('---------------------------')\n", + "states_over_500 = state_counts[state_counts > 500]\n", + "display(states_over_500)\n", + "print('---------------------------')\n", + "# Optionally, show filtered DataFrame\n", + "# df_states_500 = df[df[\"State\"].isin(states_over_500.index)]\n", + "# display(df_states_500[\"State\"].value_counts())" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "caecdc93-d1e3-4592-b913-be416156f7a5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
EducationGendermaxminmedian
0BachelorF73225.956521904.0008525640.505303
1BachelorM67907.270501898.0076755548.031892
2CollegeF61850.188031898.6836865623.611187
3CollegeM61134.683071918.1197006005.847375
4DoctorF44856.113972395.5700005332.462694
5DoctorM32677.342842267.6040385577.669457
6High School or BelowF55277.445892144.9215356039.553187
7High School or BelowM83325.381191940.9812216286.731006
8MasterF51016.067042417.7770325729.855012
9MasterM50568.259122272.3073105579.099207
\n", + "
" + ], + "text/plain": [ + " Education Gender max min median\n", + "0 Bachelor F 73225.95652 1904.000852 5640.505303\n", + "1 Bachelor M 67907.27050 1898.007675 5548.031892\n", + "2 College F 61850.18803 1898.683686 5623.611187\n", + "3 College M 61134.68307 1918.119700 6005.847375\n", + "4 Doctor F 44856.11397 2395.570000 5332.462694\n", + "5 Doctor M 32677.34284 2267.604038 5577.669457\n", + "6 High School or Below F 55277.44589 2144.921535 6039.553187\n", + "7 High School or Below M 83325.38119 1940.981221 6286.731006\n", + "8 Master F 51016.06704 2417.777032 5729.855012\n", + "9 Master M 50568.25912 2272.307310 5579.099207" ] - }, + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "clv_stats = df.groupby([\"Education\", \"Gender\"])[\"Customer Lifetime Value\"].agg([\"max\", \"min\", \"median\"]).reset_index()\n", + "display(clv_stats)\n", + "\n", + "# Conclusion:\n", + "# - \"Bachelor\" educated females have the highest max CLV.\n", + "# - The median CLV is generally higher for females across most education levels, indicating better long-term value.\n", + "# - Segments with high median and low minimum CLV can be targeted for premium policies." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "2804c565-f758-4225-a5eb-3273afae83d1", + "metadata": {}, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "id": "449513f4-0459-46a0-a18d-9398d974c9ad", - "metadata": { - "id": "449513f4-0459-46a0-a18d-9398d974c9ad" - }, - "outputs": [], - "source": [ - "# your code goes here" + "data": { + "image/png": "", + "text/plain": [ + "
" ] + }, + "metadata": {}, + "output_type": "display_data" } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "\n", + "plt.figure(figsize=(12, 6))\n", + "sns.barplot(x=\"Education\", y=\"median\", hue=\"Gender\", data=clv_stats)\n", + "plt.title('Median Customer Lifetime Value by Education Level and Gender')\n", + "plt.ylabel('Median CLV')\n", + "plt.xlabel('Education Level')\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "06b4a600-4199-47c7-aa52-af4b5a5154a9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
MonthFebruaryJanuary
State
Arizona9291008
California16341918
Nevada442551
Oregon13441565
Washington425463
\n", + "
" + ], + "text/plain": [ + "Month February January\n", + "State \n", + "Arizona 929 1008\n", + "California 1634 1918\n", + "Nevada 442 551\n", + "Oregon 1344 1565\n", + "Washington 425 463" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Policies Sold by State and Month (Pivot Table)\n", + "# Ensure Effective To Date is datetime type\n", + "df[\"Effective To Date\"] = pd.to_datetime(df[\"Effective To Date\"], format=\"%d %m %Y\", errors='coerce')\n", + "df[\"Month\"] = df[\"Effective To Date\"].dt.month_name()\n", + "\n", + "# Group by State and Month, count number of policies sold\n", + "policies_by_state_month = df.groupby([\"State\", \"Month\"])[\"Policy\"].count().reset_index()\n", + "# display(policies_by_state_month)\n", + "\n", + "# Pivot: States as rows, Months as columns, values = number of policies\n", + "pivot_table = policies_by_state_month.pivot(index=\"State\", columns=\"Month\", values=\"Policy\").fillna(0)\n", + "display(pivot_table)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "5694b98c-189d-486e-84b5-1f3aa91c5b13", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
MonthFebruaryJanuary
State
Arizona9291008
California16341918
Oregon13441565
\n", + "
" + ], + "text/plain": [ + "Month February January\n", + "State \n", + "Arizona 929 1008\n", + "California 1634 1918\n", + "Oregon 1344 1565" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Policies Sold by Month for Top 3 States\n", + "# Total policies sold by state\n", + "state_totals = df.groupby(\"State\")[\"Policy\"].count().sort_values(ascending=False)\n", + "top_3_states = state_totals.head(3).index\n", + "# display(top_3_states)\n", + "\n", + "# Filter original group for only top 3 states\n", + "top3_policies = policies_by_state_month[policies_by_state_month[\"State\"].isin(top_3_states)]\n", + "# display(top3_policies )\n", + "\n", + "# Pivot for top 3 states only\n", + "top3_pivot = top3_policies.pivot(index=\"State\", columns=\"Month\", values=\"Policy\").fillna(0)\n", + "display(top3_pivot)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "cdaf043f-4b67-4726-9926-09ba10d32631", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Sales ChannelResponse Rate Yes
0Agent0.190746
1Branch0.113787
2Call Center0.109786
3Web0.117141
\n", + "
" + ], + "text/plain": [ + " Sales Channel Response Rate Yes\n", + "0 Agent 0.190746\n", + "1 Branch 0.113787\n", + "2 Call Center 0.109786\n", + "3 Web 0.117141" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Customer Response Rate by Marketing Channel\n", + "# Calculate response rate (Yes) by Sales Channel\n", + "response_rate = df.groupby(\"Sales Channel\")[\"Response\"].value_counts(normalize=True).unstack().fillna(0)\n", + "response_rate_yes = response_rate.get(\"Yes\", pd.Series())\n", + "response_rate_yes = response_rate_yes.reset_index().rename(columns={ \"Yes\": \"Response Rate Yes\" })\n", + "\n", + "display(response_rate_yes)\n", + "\n", + "# Or, melt for visualization\n", + "# melted = df[df[\"Response\"] == \"Yes\"].groupby(\"Sales Channel\").size().reset_index(name=\"Yes Responses\")\n", + "# total_by_channel = df.groupby(\"Sales Channel\").size().reset_index(name=\"Total\")\n", + "# merged = pd.merge(melted, total_by_channel, on=\"Sales Channel\")\n", + "# merged[\"Response Rate Yes\"] = merged[\"Yes Responses\"] / merged[\"Total\"]\n", + "\n", + "# display(merged[[\"Sales Channel\", \"Response Rate Yes\"]])" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "34ed0be6-edb0-4845-9bd1-a54e4e821b96", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" } + ], + "source": [ + "plt.figure(figsize=(10,6))\n", + "sns.heatmap(top3_pivot, annot=True, fmt=\".0f\", cmap=\"Blues\")\n", + "plt.title(\"Policies Sold by Month for Top 3 States\")\n", + "plt.ylabel(\"State\")\n", + "plt.xlabel(\"Month\")\n", + "plt.show()\n", + "\n", + "plt.figure(figsize=(8,5))\n", + "sns.barplot(x=\"Sales Channel\", y=\"Response Rate Yes\", data=merged)\n", + "plt.title(\"Customer Response Rate by Marketing Channel\")\n", + "plt.ylabel(\"Response Rate (Yes)\")\n", + "plt.xlabel(\"Sales Channel\")\n", + "plt.ylim(0,1)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c68df5c1-7e3d-4c2d-b4b6-a113f43abfd2", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" }, - "nbformat": 4, - "nbformat_minor": 5 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 }