diff --git a/lab-dw-aggregating.ipynb b/lab-dw-aggregating.ipynb index fadd718..93fcb36 100644 --- a/lab-dw-aggregating.ipynb +++ b/lab-dw-aggregating.ipynb @@ -1,165 +1,2491 @@ { - "cells": [ + "cells": [ + { + "cell_type": "markdown", + "id": "31969215-2a90-4d8b-ac36-646a7ae13744", + "metadata": { + "id": "31969215-2a90-4d8b-ac36-646a7ae13744" + }, + "source": [ + "# Lab | Data Aggregation and Filtering" + ] + }, + { + "cell_type": "markdown", + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", + "metadata": { + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" + }, + "source": [ + "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", + "\n", + "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", + "\n", + "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." + ] + }, + { + "cell_type": "markdown", + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", + "metadata": { + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" + }, + "source": [ + "1. Create a new DataFrame that only includes customers who:\n", + " - have a **low total_claim_amount** (e.g., below $1,000),\n", + " - have a response \"Yes\" to the last marketing campaign." + ] + }, + { + "cell_type": "markdown", + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", + "metadata": { + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" + }, + "source": [ + "2. 
Using the original Dataframe, analyze:\n", + " - the average `monthly_premium` and/or customer lifetime value by `policy_type` and `gender` for customers who responded \"Yes\", and\n", + " - compare these insights to `total_claim_amount` patterns, and discuss which segments appear most profitable or low-risk for the company." + ] + }, + { + "cell_type": "markdown", + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", + "metadata": { + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" + }, + "source": [ + "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + ] + }, + { + "cell_type": "markdown", + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", + "metadata": { + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" + }, + "source": [ + "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + ] + }, + { + "cell_type": "markdown", + "id": "b42999f9-311f-481e-ae63-40a5577072c5", + "metadata": { + "id": "b42999f9-311f-481e-ae63-40a5577072c5" + }, + "source": [ + "## Bonus" + ] + }, + { + "cell_type": "markdown", + "id": "81ff02c5-6584-4f21-a358-b918697c6432", + "metadata": { + "id": "81ff02c5-6584-4f21-a358-b918697c6432" + }, + "source": [ + "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + ] + }, + { + "cell_type": "markdown", + "id": "b6aec097-c633-4017-a125-e77a97259cda", + "metadata": { + "id": "b6aec097-c633-4017-a125-e77a97259cda" + }, + "source": [ + "6. 
Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", + "\n", + "*Hint:*\n", + "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", + "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", + "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + ] + }, + { + "cell_type": "markdown", + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", + "metadata": { + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" + }, + "source": [ + "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", + "\n", + "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." + ] + }, + { + "cell_type": "markdown", + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", + "metadata": { + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" + }, + "source": [ + "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "449513f4-0459-46a0-a18d-9398d974c9ad", + "metadata": { + "id": "449513f4-0459-46a0-a18d-9398d974c9ad" + }, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "453bb8b0-5abf-4cbd-a9a3-80449cbd69ca", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "31969215-2a90-4d8b-ac36-646a7ae13744", - "metadata": { - "id": "31969215-2a90-4d8b-ac36-646a7ae13744" - }, - "source": [ - "# Lab | Data Aggregation and Filtering" + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer State Customer Lifetime Value Response \\\n", + "0 0 DK49336 Arizona 4809.216960 No \n", + "1 1 KX64629 California 2228.525238 No \n", + "2 2 LZ68649 Washington 14947.917300 No \n", + "3 3 XL78013 Oregon 22332.439460 Yes \n", + "4 4 QA50777 Oregon 9025.067525 No \n", + "\n", + " Coverage Education Effective To Date EmploymentStatus Gender ... \\\n", + "0 Basic College 2/18/11 Employed M ... \n", + "1 Basic College 1/18/11 Unemployed F ... \n", + "2 Basic Bachelor 2/10/11 Employed M ... \n", + "3 Extended College 1/11/11 Employed M ... \n", + "4 Premium Bachelor 1/17/11 Medical Leave F ... \n", + "\n", + " Number of Open Complaints Number of Policies Policy Type Policy \\\n", + "0 0.0 9 Corporate Auto Corporate L3 \n", + "1 0.0 1 Personal Auto Personal L3 \n", + "2 0.0 2 Personal Auto Personal L3 \n", + "3 0.0 2 Corporate Auto Corporate L3 \n", + "4 NaN 7 Personal Auto Personal L2 \n", + "\n", + " Renew Offer Type Sales Channel Total Claim Amount Vehicle Class \\\n", + "0 Offer3 Agent 292.800000 Four-Door Car \n", + "1 Offer4 Call Center 744.924331 Four-Door Car \n", + "2 Offer3 Call Center 480.000000 SUV \n", + "3 Offer2 Branch 484.013411 Four-Door Car \n", + "4 Offer1 Branch 707.925645 Four-Door Car \n", + "\n", + " Vehicle Size Vehicle Type \n", + "0 Medsize NaN \n", + "1 Medsize NaN \n", + "2 Medsize A \n", + "3 Medsize A \n", + "4 Medsize NaN \n", + "\n", + "[5 rows x 26 columns]" ] - }, + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "url = \"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\"\n", + "df = pd.read_csv(url)\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "d394381a-9844-464e-90ea-c59fbd8e43f1", + "metadata": {}, + "outputs": [], + "source": [ + "# 1. 
Create a new DataFrame that only includes customers who:\n", + "#have a low total_claim_amount (e.g., below $1,000),\n", + "#have a response \"Yes\" to the last marketing campaign." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "7d086474-204e-495c-8e2f-18c098150c0f", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", - "metadata": { - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" - }, - "source": [ - "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", - "\n", - "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", - "\n", - "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." 
+ "data": { + "text/plain": [ + "Index(['Unnamed: 0', 'Customer', 'State', 'Customer Lifetime Value',\n", + " 'Response', 'Coverage', 'Education', 'Effective To Date',\n", + " 'EmploymentStatus', 'Gender', 'Income', 'Location Code',\n", + " 'Marital Status', 'Monthly Premium Auto', 'Months Since Last Claim',\n", + " 'Months Since Policy Inception', 'Number of Open Complaints',\n", + " 'Number of Policies', 'Policy Type', 'Policy', 'Renew Offer Type',\n", + " 'Sales Channel', 'Total Claim Amount', 'Vehicle Class', 'Vehicle Size',\n", + " 'Vehicle Type'],\n", + " dtype='object')" ] - }, + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "52527928-33a1-4ce7-b3a2-e86b2eadd2c2", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", - "metadata": { - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" - }, - "source": [ - "1. Create a new DataFrame that only includes customers who:\n", - " - have a **low total_claim_amount** (e.g., below $1,000),\n", - " - have a response \"Yes\" to the last marketing campaign." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Response Total Claim Amount\n", + "0 No 292.800000\n", + "1 No 744.924331\n", + "2 No 480.000000\n", + "3 Yes 484.013411\n", + "4 No 707.925645\n", + "... ... ...\n", + "10905 No 1214.400000\n", + "10906 No 273.018929\n", + "10907 No 381.306996\n", + "10908 No 618.288849\n", + "10909 NaN 1021.719397\n", + "\n", + "[10910 rows x 2 columns]" ] - }, + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_reponse_complaints = df[[\"Response\",\"Total Claim Amount\"]].copy()\n", + "df_reponse_complaints" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "e00af658-7e73-40d2-a44e-fdbb92f94bc4", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", - "metadata": { - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" - }, - "source": [ - "2. Using the original Dataframe, analyze:\n", - " - the average `monthly_premium` and/or customer lifetime value by `policy_type` and `gender` for customers who responded \"Yes\", and\n", - " - compare these insights to `total_claim_amount` patterns, and discuss which segments appear most profitable or low-risk for the company." + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 10910 entries, 0 to 10909\n", + "Data columns (total 2 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Response 10279 non-null object \n", + " 1 Total Claim Amount 10910 non-null float64\n", + "dtypes: float64(1), object(1)\n", + "memory usage: 170.6+ KB\n" + ] + } + ], + "source": [ + "df_reponse_complaints.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "07355d6d-1182-48b5-ae35-d1fe3486e3a1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Response Total Claim Amount\n", + "3 Yes 484.013411\n", + "8 Yes 739.200000\n", + "15 Yes 547.200000\n", + "19 Yes 19.575683\n", + "27 Yes 60.036683\n", + "... ... ...\n", + "10844 Yes 547.200000\n", + "10852 Yes 791.878042\n", + "10872 Yes 547.200000\n", + "10887 Yes 528.200860\n", + "10897 Yes 158.077504\n", + "\n", + "[1466 rows x 2 columns]" ] - }, + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_reponse_complaints[df_reponse_complaints.Response == 'Yes']" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "id": "e915dc34-f3aa-4fcf-9432-c2563adef407", + "metadata": {}, + "outputs": [], + "source": [ + "df_filter = df_reponse_complaints[(df_reponse_complaints[\"Total Claim Amount\"] < 1000) & (df_reponse_complaints.Response == 'Yes')].reset_index(drop=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "id": "289b2490-5ffb-49b2-896e-86ba3aea4021", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", - "metadata": { - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" - }, - "source": [ - "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Response Total Claim Amount\n", + "0 Yes 484.013411\n", + "1 Yes 739.200000\n", + "2 Yes 547.200000\n", + "3 Yes 19.575683\n", + "4 Yes 60.036683\n", + "... ... ...\n", + "1394 Yes 547.200000\n", + "1395 Yes 791.878042\n", + "1396 Yes 547.200000\n", + "1397 Yes 528.200860\n", + "1398 Yes 158.077504\n", + "\n", + "[1399 rows x 2 columns]" ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_filter" + ] + }, + { + "cell_type": "raw", + "id": "856907ed-c837-46f2-ba81-288c468b616a", + "metadata": {}, + "source": [ + "2. Using the original DataFrame, analyze:\n", + "the average monthly_premium and/or customer lifetime value by policy_type and gender for customers who responded \"Yes\", and\n", + "compare these insights to total_claim_amount patterns, and discuss which segments appear most profitable or low-risk for the company.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7c520800-e806-4b1c-931c-bb7ab0086a72", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# average monthly_premium, customer lifetime value" + ] + }, + { + "cell_type": "code", + "execution_count": 105, + "id": "69a37d12-f2eb-4b30-92b3-bf97238ccf7e", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/var/folders/f1/77pps9rs2l9ffb8_p5tj2yrr0000gp/T/ipykernel_58644/3304530285.py:2: FutureWarning: The provided callable is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n", + " pd.pivot_table(data = df[df.Response == 'Yes'],\n" + ] }, { - "cell_type": "markdown", - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", - "metadata": { - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" - }, - "source": [ - "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." 
+ "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Customer Lifetime Value Monthly Premium Auto \\\n", + "Policy Type Gender \n", + "Corporate Auto F 7712.628736 94.301775 \n", + " M 7944.465414 92.188312 \n", + "Personal Auto F 8339.791842 98.998148 \n", + " M 7448.383281 91.085821 \n", + "Special Auto F 7691.584111 92.314286 \n", + " M 8247.088702 86.343750 \n", + "\n", + " Total Claim Amount \n", + "Policy Type Gender \n", + "Corporate Auto F 433.738499 \n", + " M 408.582459 \n", + "Personal Auto F 452.965929 \n", + " M 457.010178 \n", + "Special Auto F 453.280164 \n", + " M 429.527942 " ] - }, + }, + "execution_count": 105, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "pd.pivot_table(data = df[df.Response == 'Yes'],\n", + " values = [\"Monthly Premium Auto\", 'Customer Lifetime Value', 'Total Claim Amount'],\n", + " index = [\"Policy Type\",\"Gender\"],\n", + " aggfunc = np.mean)" + ] + }, + { + "cell_type": "code", + "execution_count": 106, + "id": "781b466a-49bb-47d1-9e69-0be6862382fb", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b42999f9-311f-481e-ae63-40a5577072c5", - "metadata": { - "id": "b42999f9-311f-481e-ae63-40a5577072c5" - }, - "source": [ - "## Bonus" + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Monthly Premium Auto Customer Lifetime Value \\\n", + "Policy Type Gender \n", + "Corporate Auto F 94.301775 7712.628736 \n", + " M 92.188312 7944.465414 \n", + "Personal Auto F 98.998148 8339.791842 \n", + " M 91.085821 7448.383281 \n", + "Special Auto F 92.314286 7691.584111 \n", + " M 86.343750 8247.088702 \n", + "\n", + " Total Claim Amount \n", + "Policy Type Gender \n", + "Corporate Auto F 433.738499 \n", + " M 408.582459 \n", + "Personal Auto F 452.965929 \n", + " M 457.010178 \n", + "Special Auto F 453.280164 \n", + " M 429.527942 " ] - }, + }, + "execution_count": 106, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#df_filter2 = df[df.Response == 'Yes']\n", + "df[df.Response == 'Yes'].groupby([\"Policy Type\", \"Gender\"])[['Monthly Premium Auto', 'Customer Lifetime Value','Total Claim Amount']].mean()\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 96, + "id": "b772abb7-c832-4b5f-872e-61d0e17f11a6", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "81ff02c5-6584-4f21-a358-b918697c6432", - "metadata": { - "id": "81ff02c5-6584-4f21-a358-b918697c6432" - }, - "source": [ - "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Monthly Premium Auto Customer Lifetime Value\n", + "Policy Type Gender \n", + "Corporate Auto F 91.384615 7980.306825\n", + " M 94.764249 7750.741082\n", + "Personal Auto F 93.153179 8074.660516\n", + " M 93.301056 7971.386285\n", + "Special Auto F 93.563025 8460.398042\n", + " M 93.197044 9010.601583" ] - }, + }, + "execution_count": 96, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "df.groupby([\"Policy Type\", \"Gender\"]).agg({'Monthly Premium Auto': 'mean', 'Customer Lifetime Value': 'mean'})" + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "id": "02c26780-688b-467d-ba55-000220619902", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b6aec097-c633-4017-a125-e77a97259cda", - "metadata": { - "id": "b6aec097-c633-4017-a125-e77a97259cda" - }, - "source": [ - "6. Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", - "\n", - "*Hint:*\n", - "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. 
Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", - "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", - "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + "data": { + "text/plain": [ + "Index(['Unnamed: 0', 'Customer', 'State', 'Customer Lifetime Value',\n", + " 'Response', 'Coverage', 'Education', 'Effective To Date',\n", + " 'EmploymentStatus', 'Gender', 'Income', 'Location Code',\n", + " 'Marital Status', 'Monthly Premium Auto', 'Months Since Last Claim',\n", + " 'Months Since Policy Inception', 'Number of Open Complaints',\n", + " 'Number of Policies', 'Policy Type', 'Policy', 'Renew Offer Type',\n", + " 'Sales Channel', 'Total Claim Amount', 'Vehicle Class', 'Vehicle Size',\n", + " 'Vehicle Type'],\n", + " dtype='object')" ] - }, + }, + "execution_count": 136, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns\n" + ] + }, + { + "cell_type": "raw", + "id": "df46c1be-5324-4a10-8afb-0d3d4184bcd5", + "metadata": {}, + "source": [ + "Female Personal Auto customers are highly valuable.\n", + "They show the highest Customer Lifetime Value (~8340) and the highest monthly premium (~99), suggesting strong long-term profitability despite moderate claim amounts.\n", + "\n", + "Male Corporate Auto customers are low-risk.\n", + "Their claim amounts (~409) are the lowest across all groups, making them a stable and less costly segment for the company.\n", + "\n", + "Male Personal Auto customers appear riskier.\n", + "They combine the lowest Customer Lifetime Value (~7448) with the highest claim amounts (~457), signaling a segment that may be less profitable.\n", + "\n", + "Special Auto customers show mixed patterns.\n", + "Male Special Auto customers have a high Customer Lifetime Value (~8247) with relatively low premiums (~86), suggesting loyalty and potential for 
upselling, while female Special Auto customers are more average in both value and claims." + ] + }, + { + "cell_type": "raw", + "id": "3544dd01-58f6-4f8a-86ff-057c920486e5", + "metadata": {}, + "source": [ + "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "id": "02ac3184-cc66-4b24-8eb9-87c6b499f410", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", - "metadata": { - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" - }, - "source": [ - "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", - "\n", - "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." + "data": { + "text/plain": [ + "Index(['Unnamed: 0', 'Customer', 'State', 'Customer Lifetime Value',\n", + " 'Response', 'Coverage', 'Education', 'Effective To Date',\n", + " 'EmploymentStatus', 'Gender', 'Income', 'Location Code',\n", + " 'Marital Status', 'Monthly Premium Auto', 'Months Since Last Claim',\n", + " 'Months Since Policy Inception', 'Number of Open Complaints',\n", + " 'Number of Policies', 'Policy Type', 'Policy', 'Renew Offer Type',\n", + " 'Sales Channel', 'Total Claim Amount', 'Vehicle Class', 'Vehicle Size',\n", + " 'Vehicle Type'],\n", + " dtype='object')" ] - }, + }, + "execution_count": 107, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 111, + "id": "bf8daf47-0734-411e-a14a-87b9815cd17d", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", - "metadata": { - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" - }, - "source": [ - "External Resources 
for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + "data": { + "text/plain": [ + "array(['Arizona', 'California', 'Washington', 'Oregon', nan, 'Nevada'],\n", + " dtype=object)" ] - }, + }, + "execution_count": 111, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.State.unique()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "id": "d324e72e-93eb-4679-a2c4-39ad0220aafa", + "metadata": {}, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "id": "449513f4-0459-46a0-a18d-9398d974c9ad", - "metadata": { - "id": "449513f4-0459-46a0-a18d-9398d974c9ad" - }, - "outputs": [], - "source": [ - "# your code goes here" - ] - } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" - } - }, - "nbformat": 4, - "nbformat_minor": 5 + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Customer\n", + "State \n", + "Arizona 1937\n", + "California 3552\n", + "Nevada 993\n", + "Oregon 2909\n", + "Washington 888" + ] + }, + "execution_count": 121, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.pivot_table(data = df,\n", + " values = 'Customer',\n", + " index = [\"State\"],\n", + " aggfunc = 'count')" + ] + }, + { + "cell_type": "raw", + "id": "6622b8b9-2732-46ef-a197-0208747b1f4d", + "metadata": {}, + "source": [ + "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + ] + }, + { + "cell_type": "code", + "execution_count": 165, + "id": "961cdaf8-31dc-4416-aec1-e7a33011034d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
GenderFM
State
Arizona30462870
California52365366
Nevada13181453
Oregon44414225
Washington12551328
\n", + "
" + ], + "text/plain": [ + "Gender F M\n", + "State \n", + "Arizona 3046 2870\n", + "California 5236 5366\n", + "Nevada 1318 1453\n", + "Oregon 4441 4225\n", + "Washington 1255 1328" + ] + }, + "execution_count": 165, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "pd.pivot_table(data = df,\n", + " columns = 'Gender',\n", + " values = 'Number of Policies',\n", + " index = [\"State\"],\n", + " aggfunc = 'sum')" + ] + }, + { + "cell_type": "code", + "execution_count": 167, + "id": "77673282-6273-4184-a685-f8df4619b8c5", + "metadata": {}, + "outputs": [], + "source": [ + "df.head().to_clipboard()" + ] + }, + { + "cell_type": "code", + "execution_count": 129, + "id": "a1e38843-e1a5-479c-8fe2-53a63c80fb67", + "metadata": {}, + "outputs": [], + "source": [ + "df_count = df.groupby([\"State\"]).agg({'Customer': 'count'}).reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": 130, + "id": "f7236266-d725-4035-887c-55e6ae45ade1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
StateCustomer
0Arizona1937
1California3552
2Nevada993
3Oregon2909
4Washington888
\n", + "
" + ], + "text/plain": [ + " State Customer\n", + "0 Arizona 1937\n", + "1 California 3552\n", + "2 Nevada 993\n", + "3 Oregon 2909\n", + "4 Washington 888" + ] + }, + "execution_count": 130, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_count " + ] + }, + { + "cell_type": "code", + "execution_count": 131, + "id": "7239a447-56d4-4830-a8cb-2ba727131012", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
StateCustomer
0Arizona1937
1California3552
2Nevada993
3Oregon2909
4Washington888
\n", + "
" + ], + "text/plain": [ + " State Customer\n", + "0 Arizona 1937\n", + "1 California 3552\n", + "2 Nevada 993\n", + "3 Oregon 2909\n", + "4 Washington 888" + ] + }, + "execution_count": 131, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# FILTROOOOOO : df[df.Response == 'Yes]\n", + "df_count[df_count.Customer >500]" + ] + }, + { + "cell_type": "code", + "execution_count": 132, + "id": "e5fc2852-7306-401b-a33f-fa59c93a72af", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
StateCustomer
0Arizona1937
1California3552
3Oregon2909
\n", + "
" + ], + "text/plain": [ + " State Customer\n", + "0 Arizona 1937\n", + "1 California 3552\n", + "3 Oregon 2909" + ] + }, + "execution_count": 132, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_count[df_count.Customer >1000]" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "id": "7b90628d-75b2-41fa-b167-51bca180b823", + "metadata": {}, + "outputs": [], + "source": [ + "df_count = df.value_counts(['State'])" + ] + }, + { + "cell_type": "code", + "execution_count": 126, + "id": "acd8b34e-a9bd-4dd5-b5a5-d390146dc7ce", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "State \n", + "California 3552\n", + "Oregon 2909\n", + "Arizona 1937\n", + "Nevada 993\n", + "Washington 888\n", + "Name: count, dtype: int64" + ] + }, + "execution_count": 126, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_count" + ] + }, + { + "cell_type": "raw", + "id": "b37ec6ff-2d9d-45c4-bef5-2c17fe49e83b", + "metadata": {}, + "source": [ + "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 134, + "id": "4bb78fa7-9e03-4638-a579-51990a7b11c1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 10910 entries, 0 to 10909\n", + "Data columns (total 26 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Unnamed: 0 10910 non-null int64 \n", + " 1 Customer 10910 non-null object \n", + " 2 State 10279 non-null object \n", + " 3 Customer Lifetime Value 10910 non-null float64\n", + " 4 Response 10279 non-null object \n", + " 5 Coverage 10910 non-null object \n", + " 6 Education 10910 non-null object \n", + " 7 Effective To Date 10910 non-null object \n", + " 8 EmploymentStatus 10910 non-null object \n", + " 9 Gender 10910 non-null object \n", + " 10 Income 10910 non-null int64 \n", + " 11 Location Code 10910 non-null object \n", + " 12 Marital Status 10910 non-null object \n", + " 13 Monthly Premium Auto 10910 non-null int64 \n", + " 14 Months Since Last Claim 10277 non-null float64\n", + " 15 Months Since Policy Inception 10910 non-null int64 \n", + " 16 Number of Open Complaints 10277 non-null float64\n", + " 17 Number of Policies 10910 non-null int64 \n", + " 18 Policy Type 10910 non-null object \n", + " 19 Policy 10910 non-null object \n", + " 20 Renew Offer Type 10910 non-null object \n", + " 21 Sales Channel 10910 non-null object \n", + " 22 Total Claim Amount 10910 non-null float64\n", + " 23 Vehicle Class 10288 non-null object \n", + " 24 Vehicle Size 10288 non-null object \n", + " 25 Vehicle Type 5428 non-null object \n", + "dtypes: float64(4), int64(5), object(17)\n", + "memory usage: 2.2+ MB\n" + ] + } + ], + "source": [ + "df.info()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 141, + "id": "b5f2d049-9d0b-4d34-9129-9ecb37e4c5b1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Customer Lifetime Value
maximumminimummedian
EducationGender
BachelorF73225.961904.005640.51
M67907.271898.015548.03
CollegeF61850.191898.685623.61
M61134.681918.126005.85
DoctorF44856.112395.575332.46
M32677.342267.605577.67
High School or BelowF55277.452144.926039.55
M83325.381940.986286.73
MasterF51016.072417.785729.86
M50568.262272.315579.10
\n", + "
" + ], + "text/plain": [ + " Customer Lifetime Value \n", + " maximum minimum median\n", + "Education Gender \n", + "Bachelor F 73225.96 1904.00 5640.51\n", + " M 67907.27 1898.01 5548.03\n", + "College F 61850.19 1898.68 5623.61\n", + " M 61134.68 1918.12 6005.85\n", + "Doctor F 44856.11 2395.57 5332.46\n", + " M 32677.34 2267.60 5577.67\n", + "High School or Below F 55277.45 2144.92 6039.55\n", + " M 83325.38 1940.98 6286.73\n", + "Master F 51016.07 2417.78 5729.86\n", + " M 50568.26 2272.31 5579.10" + ] + }, + "execution_count": 141, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby(['Education', 'Gender']).agg({\n", + " 'Customer Lifetime Value':[('maximum', 'max'),\n", + " ('minimum', 'min'),\n", + " ('median', 'median')]}).round (2)" + ] + }, + { + "cell_type": "code", + "execution_count": 153, + "id": "9cbd18b8-c4ec-496b-ac6b-4d37dd19b0ad", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
EducationGendermaxminmedian
Customer Lifetime ValueCustomer Lifetime ValueCustomer Lifetime Value
0BachelorF73225.961904.005640.51
1BachelorM67907.271898.015548.03
2CollegeF61850.191898.685623.61
3CollegeM61134.681918.126005.85
4DoctorF44856.112395.575332.46
5DoctorM32677.342267.605577.67
6High School or BelowF55277.452144.926039.55
7High School or BelowM83325.381940.986286.73
8MasterF51016.072417.785729.86
9MasterM50568.262272.315579.10
\n", + "
" + ], + "text/plain": [ + " Education Gender max \\\n", + " Customer Lifetime Value \n", + "0 Bachelor F 73225.96 \n", + "1 Bachelor M 67907.27 \n", + "2 College F 61850.19 \n", + "3 College M 61134.68 \n", + "4 Doctor F 44856.11 \n", + "5 Doctor M 32677.34 \n", + "6 High School or Below F 55277.45 \n", + "7 High School or Below M 83325.38 \n", + "8 Master F 51016.07 \n", + "9 Master M 50568.26 \n", + "\n", + " min median \n", + " Customer Lifetime Value Customer Lifetime Value \n", + "0 1904.00 5640.51 \n", + "1 1898.01 5548.03 \n", + "2 1898.68 5623.61 \n", + "3 1918.12 6005.85 \n", + "4 2395.57 5332.46 \n", + "5 2267.60 5577.67 \n", + "6 2144.92 6039.55 \n", + "7 1940.98 6286.73 \n", + "8 2417.78 5729.86 \n", + "9 2272.31 5579.10 " + ] + }, + "execution_count": 153, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.pivot_table(data=df,\n", + " values = ['Customer Lifetime Value'],\n", + " index = [\"Education\",\"Gender\"],\n", + " aggfunc = ['max','min', 'median'] ).round(2).reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": 154, + "id": "3aae0c7b-dd04-4d12-86fe-f759ff46617c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
maxminmedian
Customer Lifetime ValueCustomer Lifetime ValueCustomer Lifetime Value
EducationGender
BachelorF73225.961904.005640.51
M67907.271898.015548.03
CollegeF61850.191898.685623.61
M61134.681918.126005.85
DoctorF44856.112395.575332.46
M32677.342267.605577.67
High School or BelowF55277.452144.926039.55
M83325.381940.986286.73
MasterF51016.072417.785729.86
M50568.262272.315579.10
\n", + "
" + ], + "text/plain": [ + " max min \\\n", + " Customer Lifetime Value Customer Lifetime Value \n", + "Education Gender \n", + "Bachelor F 73225.96 1904.00 \n", + " M 67907.27 1898.01 \n", + "College F 61850.19 1898.68 \n", + " M 61134.68 1918.12 \n", + "Doctor F 44856.11 2395.57 \n", + " M 32677.34 2267.60 \n", + "High School or Below F 55277.45 2144.92 \n", + " M 83325.38 1940.98 \n", + "Master F 51016.07 2417.78 \n", + " M 50568.26 2272.31 \n", + "\n", + " median \n", + " Customer Lifetime Value \n", + "Education Gender \n", + "Bachelor F 5640.51 \n", + " M 5548.03 \n", + "College F 5623.61 \n", + " M 6005.85 \n", + "Doctor F 5332.46 \n", + " M 5577.67 \n", + "High School or Below F 6039.55 \n", + " M 6286.73 \n", + "Master F 5729.86 \n", + " M 5579.10 " + ] + }, + "execution_count": 154, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.pivot_table(data=df,\n", + " values = ['Customer Lifetime Value'],\n", + " index = [\"Education\",\"Gender\"],\n", + " aggfunc = ['max','min', 'median'] ).round(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 150, + "id": "533f02c8-89e3-47ee-ac89-5834ffe9d207", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
EducationBachelorCollegeDoctorHigh School or BelowMaster
GenderFMFMFMFMFM
maxCustomer Lifetime Value73225.9667907.2761850.1961134.6844856.1132677.3455277.4583325.3851016.0750568.26
minCustomer Lifetime Value1904.001898.011898.681918.122395.572267.602144.921940.982417.782272.31
medianCustomer Lifetime Value5640.515548.035623.616005.855332.465577.676039.556286.735729.865579.10
\n", + "
" + ], + "text/plain": [ + "Education Bachelor College \\\n", + "Gender F M F M \n", + "max Customer Lifetime Value 73225.96 67907.27 61850.19 61134.68 \n", + "min Customer Lifetime Value 1904.00 1898.01 1898.68 1918.12 \n", + "median Customer Lifetime Value 5640.51 5548.03 5623.61 6005.85 \n", + "\n", + "Education Doctor High School or Below \\\n", + "Gender F M F \n", + "max Customer Lifetime Value 44856.11 32677.34 55277.45 \n", + "min Customer Lifetime Value 2395.57 2267.60 2144.92 \n", + "median Customer Lifetime Value 5332.46 5577.67 6039.55 \n", + "\n", + "Education Master \n", + "Gender M F M \n", + "max Customer Lifetime Value 83325.38 51016.07 50568.26 \n", + "min Customer Lifetime Value 1940.98 2417.78 2272.31 \n", + "median Customer Lifetime Value 6286.73 5729.86 5579.10 " + ] + }, + "execution_count": 150, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.pivot_table(data=df,\n", + " values = ['Customer Lifetime Value'],\n", + " index = [\"Education\",\"Gender\"],\n", + " aggfunc = ['max','min', 'median'] ).round(2).T" + ] + }, + { + "cell_type": "raw", + "id": "50f41e93-9875-4786-9944-e57140942f0e", + "metadata": {}, + "source": [ + "The highest CLV overall is for High School or Below (Male) customers at $83,325.38, which is higher than even Doctorates or Masters.This suggests that having less formal education doesn’t necessarily limit long-term customer value; other factors might drive loyalty or spending.\n", + "\n", + "The lowest CLVs are fairly consistent across education and gender, all hovering just under $2,000.This indicates that at the low end, education level and gender do not make a large difference.\n", + "\n", + "Medians are much closer to each other than maximums.\n", + "Most groups sit between $5,300 and $6,300.\n", + "This suggests that, for the average customer, education and gender have only modest impact on CLV.\n", + "The highest median is for High School or Below (Male) at $6,286.73, while the lowest 
median is for Doctorate (Female) at $5,332.46.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 156, + "id": "1f478a09-ca30-45fd-9ef0-fd249dc47a76", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['Unnamed: 0', 'Customer', 'State', 'Customer Lifetime Value',\n", + " 'Response', 'Coverage', 'Education', 'Effective To Date',\n", + " 'EmploymentStatus', 'Gender', 'Income', 'Location Code',\n", + " 'Marital Status', 'Monthly Premium Auto', 'Months Since Last Claim',\n", + " 'Months Since Policy Inception', 'Number of Open Complaints',\n", + " 'Number of Policies', 'Policy Type', 'Policy', 'Renew Offer Type',\n", + " 'Sales Channel', 'Total Claim Amount', 'Vehicle Class', 'Vehicle Size',\n", + " 'Vehicle Type'],\n", + " dtype='object')" + ] + }, + "execution_count": 156, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 159, + "id": "9ffba624-394f-4cbc-bc14-bd2334747db6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([52, 26, 31, 3, 73, 99, 45, 24, 8, 29, 32, 25, 28, 87, 10, 74, 1,\n", + " 38, 58, 37, 7, 80, 95, 78, 63, 27, 97, 39, 11, 59, 46, 62, 13, 54,\n", + " 51, 22, 82, 91, 44, 43, 76, 48, 84, 6, 92, 12, 61, 4, 18, 66, 70,\n", + " 16, 75, 34, 35, 64, 9, 89, 0, 60, 71, 23, 55, 93, 2, 67, 81, 40,\n", + " 57, 86, 19, 72, 69, 33, 47, 42, 17, 49, 21, 83, 94, 30, 15, 50, 53,\n", + " 77, 41, 90, 5, 79, 56, 98, 20, 88, 65, 14, 85, 96, 36, 68])" + ] + }, + "execution_count": 159, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['Months Since Policy Inception'].unique()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "affcbaed-7609-421e-b9e0-90f0b35ace18", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python [conda env:base] *", + "language": "python", + 
"name": "conda-base-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 }