diff --git a/.DS_Store b/.DS_Store
deleted file mode 100644
index 01222e6..0000000
Binary files a/.DS_Store and /dev/null differ
diff --git a/README.md b/README.md
deleted file mode 100644
index b19cdde..0000000
--- a/README.md
+++ /dev/null
@@ -1,179 +0,0 @@
-
-
-# LAB | Data Structuring and Combining Data
-
-
-
Learning Goals
-
-
- This lab allows you to practice and apply the concepts and techniques taught in class.
-
- Upon completion of this lab, you will be able to:
-
-- Apply Python programming to modify the structure of data by pivoting, stacking/unstacking, or melting dataframes.
-- Combine and integrate data from multiple sources using merging, concatenating, or joining techniques to generate more comprehensive and meaningful datasets for analysis.
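As a rough illustration of the combining goal above, the sketch below concatenates two small frames whose column names differ only in spelling (`ST` vs `State`). The frames and their values are invented for the example, and renaming before `pd.concat` is just one possible approach, not the lab's required solution.

```python
import pandas as pd

# Hypothetical toy frames; the values are invented for illustration only.
df_a = pd.DataFrame({"Customer": ["A1", "A2"], "ST": ["Arizona", "Nevada"]})
df_b = pd.DataFrame({"Customer": ["B1"], "State": ["Oregon"]})

# A naive concat keeps both spellings as separate columns padded with NaN.
naive = pd.concat([df_a, df_b], ignore_index=True)

# Harmonising the column names first yields a single, fully populated column.
combined = pd.concat(
    [df_a.rename(columns={"ST": "State"}), df_b],
    ignore_index=True,
)
```

Renaming up front avoids having to merge two half-empty columns after the fact.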
-
-
-
-
-
-
-
-
-
Prerequisites
-
-
-Before starting this lab, you should have learnt about:
-
-- Python Programming
-- Introduction to Pandas DataFrames and Series
-- Data Cleaning: handling null values and duplicates
-- Data Formatting: dealing with strings, dates, renaming columns, using map, apply and applymap methods
-- Data structuring and combining data: methods such as pivot, stack/unstack or melt for data structuring and merge, concat or join for combining data.
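A minimal sketch of the structuring methods listed above, assuming a made-up sales table (the column names `channel`, `month` and `revenue` are hypothetical and not taken from the lab dataset). It pivots long data into a wide summary and then melts it back into long format.

```python
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
sales = pd.DataFrame({
    "channel": ["Web", "Web", "Branch", "Branch"],
    "month": [1, 2, 1, 2],
    "revenue": [100.0, 150.0, 200.0, 50.0],
})

# Wide format: one row per channel, one column per month.
wide_df = sales.pivot_table(
    index="channel", columns="month", values="revenue", aggfunc="sum"
)

# Back to long format: one row per (channel, month) observation.
long_df = wide_df.reset_index().melt(
    id_vars="channel", var_name="month", value_name="revenue"
)
```

The wide table is easier to read at a glance, while the long table is usually the more convenient input for grouping and plotting.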
-
-
-
-
-
-
-
-## Introduction
-
-Welcome to this lab on data structuring and combining data!
-
-In this lab, you will practice how to integrate data from different datasets and generate more comprehensive datasets for analysis.
-
-Once you have explored data combining techniques, we will move on to modifying and reorganizing datasets using techniques such as pivoting dataframes.
-
-
-By the end of this lab, you will have a strong understanding of how to manipulate and combine datasets to create more meaningful and efficient data structures. These skills are essential for anyone working with complex datasets, and will help you to become a more effective data analyst.
-
-
-
-**Happy coding!** :heart:
-
-## Important Notes
-
-This lab is built on top of the `data cleaning and formatting` lab. If you couldn't complete the `data cleaning and formatting` lab, ask your LT for the `data cleaning and formatting` lab solution so you can do this lab on top of it.
-
-## About the dataset
-
-### Context
-This is customer data with their vehicle insurance policies. It has the same data as the dataset we were using before, but has a couple of new features.
-
-### New Data Description
-
-- Customer - Customer ID
-
-- ST - State where customers live
-
-- Customer Lifetime Value - Customer lifetime value (CLV) is the total revenue the client will derive from their entire relationship with a customer. In other words, it is the predicted or calculated value of a customer over their entire duration as a policyholder with the insurance company. It is an estimation of the net profit that the insurance company expects to generate from a customer throughout their relationship with the company. Customer Lifetime Value takes into account factors such as the duration of the customer's policy, premium payments, claim history, renewal likelihood, and potential additional services or products the customer may purchase. It helps insurers assess the long-term profitability and value associated with retaining a particular customer.
-
-- Response - Whether the customer responded to a marketing campaign (yes or no)
-
-- Coverage - The type of coverage the customer has (e.g., basic, extended, premium)
-
-- Education - Background education of customers
-
-- Effective To Date - The date when the policy becomes effective
-
-- EmploymentStatus - The employment status of the customer
-
-- Gender - Gender of the customer
-
-- Income - Customer's income
-
-- Location Code - indicates if the customer lives in Rural, Suburban, or Urban location
-
-- Marital Status - The marital status of the customer
-
-- Monthly Premium Auto - Amount of money the customer pays on a monthly basis as a premium for their auto insurance coverage. It represents the recurring cost that the insured person must pay to maintain their insurance policy and receive coverage for potential damages, accidents, or other covered events related to their vehicle
-
-- Months Since Last Claim - The number of months since the customer's last claim
-
-- Months Since Policy Inception - The number of months since the policy was initiated
-
-- Number of Open Complaints - Number of complaints the customer opened
-
-- Number of Policies - The number of policies the customer holds
-
-- Policy Type - There are three types of policies in car insurance (Corporate Auto, Personal Auto, and Special Auto)
-
-- Policy - The specific policy identifier. There are three different policies for each policy type (Corporate L3, Corporate L2, Corporate L1, Personal L3, Personal L2, Personal L1, Special L3, Special L2, Special L1)
-
-- Renew Offer Type - The type of offer provided to the customer for policy renewal
-
-- Sales Channel - The channel through which the policy was sold.
-
-- Total Claim Amount - the sum of all claims made by the customer. It represents the total monetary value of all approved claims for incidents such as accidents, theft, vandalism, or other covered events.
-
-- Vehicle Class - The class of the customer's vehicle: Two-Door Car, Four-Door Car, SUV, Luxury SUV, Sports Car, or Luxury Car
-
-- Vehicle Size - The size category of the insured vehicle (e.g., small, midsize, large)
-- Vehicle Type - The type of vehicle insured (e.g., car, truck, motorcycle)
-
-## Requirements
-
-- Fork this repo
-- Clone it to your machine
-
-## Getting Started
-
-Complete the challenges in the `Jupyter Notebook` file. Follow the instructions and add your code and explanations as necessary.
-
-## Submission
-
-- Upon completion, run the following commands:
-
-```bash
-git add .
-git commit -m "Solved lab"
-git push origin master
-```
-
-- Paste the link of your lab in Student Portal.
-
-
-## FAQs
-
- I am stuck in the exercise and don't know how to solve the problem or where to start.
-
-
- If you are stuck in your code and don't know how to solve the problem or where to start, you should take a step back and try to form a clear question about the specific issue you are facing. This will help you narrow down the problem and come up with potential solutions.
-
-
- For example, is it a concept that you don't understand, or are you receiving an error message that you don't know how to fix? It is usually helpful to try to state the problem as clearly as possible, including any error messages you are receiving. This can help you communicate the issue to others and potentially get help from classmates or online resources.
-
-
- Once you have a clear understanding of the problem, you will be able to start working toward the solution.
-
- [Back to top](#faqs)
-
-
-
-
-
- I am unable to push changes to the repository. What should I do?
-
-
-There are a couple of possible reasons why you may be unable to *push* changes to a Git repository:
-
-1. **You have not committed your changes:** Before you can push your changes to the repository, you need to commit them using the `git commit` command. Make sure you have committed your changes and try pushing again. To do this, run the following terminal commands from the project folder:
- ```bash
- git add .
- git commit -m "Your commit message"
- git push
- ```
-2. **You do not have permission to push to the repository:** If you have cloned the repository directly from the main Ironhack repository without making a *Fork* first, you do not have write access to the repository.
-To check which remote repository you have cloned, run the following terminal command from the project folder:
- ```bash
- git remote -v
- ```
-If the link shown is the same as the main Ironhack repository, you will need to fork the repository to your GitHub account first and then clone your fork to your local machine to be able to push the changes.
-
-**Note**: You should make a copy of your local code to avoid losing it in the process.
-
- [Back to top](#faqs)
-
-
-
diff --git a/lab-dw-data-structuring-and-combining.ipynb b/lab-dw-data-structuring-and-combining.ipynb
index ec4e3f9..0917b95 100644
--- a/lab-dw-data-structuring-and-combining.ipynb
+++ b/lab-dw-data-structuring-and-combining.ipynb
@@ -1,168 +1 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "25d7736c-ba17-4aff-b6bb-66eba20fbf4e",
- "metadata": {
- "id": "25d7736c-ba17-4aff-b6bb-66eba20fbf4e"
- },
- "source": [
- "# Lab | Data Structuring and Combining Data"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a2cdfc70-44c8-478c-81e7-2bc43fdf4986",
- "metadata": {
- "id": "a2cdfc70-44c8-478c-81e7-2bc43fdf4986"
- },
- "source": [
- "## Challenge 1: Combining & Cleaning Data\n",
- "\n",
- "In this challenge, we will be working with the customer data from an insurance company, as we did in the two previous labs. The data can be found here:\n",
- "- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file1.csv\n",
- "\n",
- "But this time, we got new data, which can be found in the following 2 CSV files located at the links below.\n",
- "\n",
- "- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file2.csv\n",
- "- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file3.csv\n",
- "\n",
- "Note that you'll need to clean and format the new data.\n",
- "\n",
- "Observation:\n",
- "- One option is to first combine the three datasets and then apply the cleaning function to the new combined dataset\n",
- "- Another option would be to read the clean file you saved in the previous lab, and just clean the two new files and concatenate the three clean datasets"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "492d06e3-92c7-4105-ac72-536db98d3244",
- "metadata": {
- "id": "492d06e3-92c7-4105-ac72-536db98d3244"
- },
- "outputs": [],
- "source": [
- "# Your code goes here"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "31b8a9e7-7db9-4604-991b-ef6771603e57",
- "metadata": {
- "id": "31b8a9e7-7db9-4604-991b-ef6771603e57"
- },
- "source": [
- "# Challenge 2: Structuring Data"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a877fd6d-7a0c-46d2-9657-f25036e4ca4b",
- "metadata": {
- "id": "a877fd6d-7a0c-46d2-9657-f25036e4ca4b"
- },
- "source": [
- "In this challenge, we will continue to work with customer data from an insurance company, but we will use a dataset with more columns, called marketing_customer_analysis.csv, which can be found at the following link:\n",
- "\n",
- "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis_clean.csv\n",
- "\n",
- "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by performing data cleaning, formatting, and structuring."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "aa10d9b0-1c27-4d3f-a8e4-db6ab73bfd26",
- "metadata": {
- "id": "aa10d9b0-1c27-4d3f-a8e4-db6ab73bfd26"
- },
- "outputs": [],
- "source": [
- "# Your code goes here"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "df35fd0d-513e-4e77-867e-429da10a9cc7",
- "metadata": {
- "id": "df35fd0d-513e-4e77-867e-429da10a9cc7"
- },
- "source": [
- "1. You work at the marketing department and you want to know which sales channel brought the most sales in terms of total revenue. Using pivot, create a summary table showing the total revenue for each sales channel (branch, call center, web, and mail).\n",
- "Round the total revenue to 2 decimal points. Analyze the resulting table to draw insights."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "640993b2-a291-436c-a34d-a551144f8196",
- "metadata": {
- "id": "640993b2-a291-436c-a34d-a551144f8196"
- },
- "source": [
- "2. Create a pivot table that shows the average customer lifetime value per gender and education level. Analyze the resulting table to draw insights."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "32c7f2e5-3d90-43e5-be33-9781b6069198",
- "metadata": {
- "id": "32c7f2e5-3d90-43e5-be33-9781b6069198"
- },
- "source": [
- "## Bonus\n",
- "\n",
- "You work at the customer service department and you want to know which months had the highest number of complaints by policy type category. Create a summary table showing the number of complaints by policy type and month.\n",
- "Show it in a long format table."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e3d09a8f-953c-448a-a5f8-2e5a8cca7291",
- "metadata": {
- "id": "e3d09a8f-953c-448a-a5f8-2e5a8cca7291"
- },
- "source": [
- "*In data analysis, a long format table is a way of structuring data in which each observation or measurement is stored in a separate row of the table. The key characteristic of a long format table is that each column represents a single variable, and each row represents a single observation of that variable.*\n",
- "\n",
- "*More information about long and wide format tables here: https://www.statology.org/long-vs-wide-data/*"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "3a069e0b-b400-470e-904d-d17582191be4",
- "metadata": {
- "id": "3a069e0b-b400-470e-904d-d17582191be4"
- },
- "outputs": [],
- "source": [
- "# Your code goes here"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "provenance": []
- },
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.13"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
+{"cells":[{"cell_type":"markdown","id":"25d7736c-ba17-4aff-b6bb-66eba20fbf4e","metadata":{"id":"25d7736c-ba17-4aff-b6bb-66eba20fbf4e"},"source":["# Lab | Data Structuring and Combining Data"]},{"cell_type":"markdown","id":"a2cdfc70-44c8-478c-81e7-2bc43fdf4986","metadata":{"id":"a2cdfc70-44c8-478c-81e7-2bc43fdf4986"},"source":["## Challenge 1: Combining & Cleaning Data\n","\n","In this challenge, we will be working with the customer data from an insurance company, as we did in the two previous labs. The data can be found here:\n","- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file1.csv\n","\n","But this time, we got new data, which can be found in the following 2 CSV files located at the links below.\n","\n","- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file2.csv\n","- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file3.csv\n","\n","Note that you'll need to clean and format the new data.\n","\n","Observation:\n","- One option is to first combine the three datasets and then apply the cleaning function to the new combined dataset\n","- Another option would be to read the clean file you saved in the previous lab, and just clean the two new files and concatenate the three clean datasets"]},{"cell_type":"code","execution_count":26,"id":"492d06e3-92c7-4105-ac72-536db98d3244","metadata":{"id":"492d06e3-92c7-4105-ac72-536db98d3244","executionInfo":{"status":"ok","timestamp":1754235103475,"user_tz":-120,"elapsed":226,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}}},"outputs":[],"source":["# Your code goes here\n","import pandas as pd\n","\n","url_1 = 'https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file1.csv'\n","url_2 = 'https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file2.csv'\n","url_3 = 'https://raw.githubusercontent.com/data-bootcamp-v4/data/main/file3.csv'\n","\n","df_1 = pd.read_csv(url_1)\n","df_2 = pd.read_csv(url_2)\n","df_3 = 
pd.read_csv(url_3)\n","\n"]},{"cell_type":"code","source":["\n","df_definitive = pd.concat([df_1,df_2,df_3], ignore_index= True)\n","\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":424},"id":"Dj7Ea-IAj3V4","executionInfo":{"status":"ok","timestamp":1754235898430,"user_tz":-120,"elapsed":119,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"81b698ed-aeff-40b6-9caf-7e8e14df68dc"},"id":"Dj7Ea-IAj3V4","execution_count":61,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Customer ST GENDER Education \\\n","0 RB50392 Washington NaN Master \n","1 QZ44356 Arizona F Bachelor \n","2 AI49188 Nevada F Bachelor \n","3 WW63253 California M Bachelor \n","4 GA49547 Washington M High School or Below \n","... ... ... ... ... \n","12069 LA72316 NaN NaN Bachelor \n","12070 PK87824 NaN NaN College \n","12071 TD14365 NaN NaN Bachelor \n","12072 UP19263 NaN NaN College \n","12073 Y167826 NaN NaN College \n","\n"," Customer Lifetime Value Income Monthly Premium Auto \\\n","0 NaN 0.0 1000.0 \n","1 697953.59% 0.0 94.0 \n","2 1288743.17% 48767.0 108.0 \n","3 764586.18% 0.0 106.0 \n","4 536307.65% 36357.0 68.0 \n","... ... ... ... \n","12069 23405.98798 71941.0 73.0 \n","12070 3096.511217 21604.0 79.0 \n","12071 8163.890428 0.0 85.0 \n","12072 7524.442436 21941.0 96.0 \n","12073 2611.836866 0.0 77.0 \n","\n"," Number of Open Complaints Policy Type Vehicle Class \\\n","0 1/0/00 Personal Auto Four-Door Car \n","1 1/0/00 Personal Auto Four-Door Car \n","2 1/0/00 Personal Auto Two-Door Car \n","3 1/0/00 Corporate Auto SUV \n","4 1/0/00 Personal Auto Four-Door Car \n","... ... ... ... 
\n","12069 0 Personal Auto Four-Door Car \n","12070 0 Corporate Auto Four-Door Car \n","12071 3 Corporate Auto Four-Door Car \n","12072 0 Personal Auto Four-Door Car \n","12073 0 Corporate Auto Two-Door Car \n","\n"," Total Claim Amount State Gender \n","0 2.704934 NaN NaN \n","1 1131.464935 NaN NaN \n","2 566.472247 NaN NaN \n","3 529.881344 NaN NaN \n","4 17.269323 NaN NaN \n","... ... ... ... \n","12069 198.234764 California M \n","12070 379.200000 California F \n","12071 790.784983 California M \n","12072 691.200000 California M \n","12073 369.600000 California M \n","\n","[12074 rows x 13 columns]"],"text/html":["\n","
"]},"metadata":{}},{"output_type":"stream","name":"stderr","text":["/tmp/ipython-input-2362286921.py:11: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n","The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n","\n","For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n","\n","\n"," df_definitive['GENDER'].fillna('Unknown',inplace=True)\n"]}]},{"cell_type":"code","source":["df_definitive['Customer Lifetime Value'].fillna(df_definitive['Customer Lifetime Value'].mean(),inplace=True)\n","df_definitive['Customer Lifetime Value'] = round(df_definitive['Customer Lifetime Value'],2,)\n","\n","df_definitive"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":555},"id":"wMs-5m9hoQ0A","executionInfo":{"status":"ok","timestamp":1754236507543,"user_tz":-120,"elapsed":115,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"7f84f85e-ae15-4225-a35a-fc1160682a54"},"id":"wMs-5m9hoQ0A","execution_count":74,"outputs":[{"output_type":"stream","name":"stderr","text":["/tmp/ipython-input-856863045.py:1: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n","The behavior will change in pandas 3.0. 
This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n","\n","For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n","\n","\n"," df_definitive['Customer Lifetime Value'].fillna(df_definitive['Customer Lifetime Value'].mean(),inplace=True)\n"]},{"output_type":"execute_result","data":{"text/plain":[" Customer ST GENDER Education \\\n","0 RB50392 Washington Unknown Master \n","1 QZ44356 Arizona F Bachelor \n","2 AI49188 Nevada F Bachelor \n","3 WW63253 California M Bachelor \n","4 GA49547 Washington M High School or Below \n","... ... ... ... ... \n","12069 LA72316 NaN Unknown Bachelor \n","12070 PK87824 NaN Unknown College \n","12071 TD14365 NaN Unknown Bachelor \n","12072 UP19263 NaN Unknown College \n","12073 Y167826 NaN Unknown College \n","\n"," Customer Lifetime Value Income Monthly Premium Auto \\\n","0 780264.02 0.0 1000.0 \n","1 697953.59 0.0 94.0 \n","2 1288743.17 48767.0 108.0 \n","3 764586.18 0.0 106.0 \n","4 536307.65 36357.0 68.0 \n","... ... ... ... \n","12069 780264.02 71941.0 73.0 \n","12070 780264.02 21604.0 79.0 \n","12071 780264.02 0.0 85.0 \n","12072 780264.02 21941.0 96.0 \n","12073 780264.02 0.0 77.0 \n","\n"," Number of Open Complaints Policy Type Vehicle Class \\\n","0 1/0/00 Personal Auto Four-Door Car \n","1 1/0/00 Personal Auto Four-Door Car \n","2 1/0/00 Personal Auto Two-Door Car \n","3 1/0/00 Corporate Auto SUV \n","4 1/0/00 Personal Auto Four-Door Car \n","... ... ... ... 
\n","12069 0 Personal Auto Four-Door Car \n","12070 0 Corporate Auto Four-Door Car \n","12071 3 Corporate Auto Four-Door Car \n","12072 0 Personal Auto Four-Door Car \n","12073 0 Corporate Auto Two-Door Car \n","\n"," Total Claim Amount State Gender \n","0 2.704934 NaN NaN \n","1 1131.464935 NaN NaN \n","2 566.472247 NaN NaN \n","3 529.881344 NaN NaN \n","4 17.269323 NaN NaN \n","... ... ... ... \n","12069 198.234764 California M \n","12070 379.200000 California F \n","12071 790.784983 California M \n","12072 691.200000 California M \n","12073 369.600000 California M \n","\n","[9135 rows x 13 columns]"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_definitive","summary":"{\n \"name\": \"df_definitive\",\n \"rows\": 9135,\n \"fields\": [\n {\n \"column\": \"Customer\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 9056,\n \"samples\": [\n \"UV89077\",\n \"XE96798\",\n \"MT78037\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ST\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 8,\n \"samples\": [\n \"Arizona\",\n \"Cali\",\n \"Washington\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"GENDER\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Unknown\",\n \"F\",\n \"female\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Education\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Master\",\n \"Bachelor\",\n \"Doctor\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Customer Lifetime Value\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 300471.4340568492,\n \"min\": 200435.07,\n \"max\": 5816655.35,\n \"num_unique_values\": 1924,\n \"samples\": [\n 688368.66,\n 531983.87,\n 325676.64\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Income\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 30359.23293287194,\n \"min\": 0.0,\n \"max\": 99981.0,\n \"num_unique_values\": 5655,\n \"samples\": [\n 40639.0,\n 90252.0,\n 34398.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Monthly Premium Auto\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 581.4714606817221,\n \"min\": 61.0,\n \"max\": 35354.0,\n \"num_unique_values\": 209,\n \"samples\": [\n 103.0,\n 232.0,\n 132.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n 
}\n },\n {\n \"column\": \"Number of Open Complaints\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 12,\n \"samples\": [\n 5,\n 1,\n \"1/0/00\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Policy Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Personal Auto\",\n \"Corporate Auto\",\n \"Special Auto\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Vehicle Class\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Four-Door Car\",\n \"Two-Door Car\",\n \"Luxury Car\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total Claim Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 289.6179854607152,\n \"min\": 0.099007,\n \"max\": 2893.239678,\n \"num_unique_values\": 5070,\n \"samples\": [\n 358.643521,\n 355.818306,\n 534.653787\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"State\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Arizona\",\n \"Oregon\",\n \"Nevada\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Gender\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"F\",\n \"M\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":74}]},{"cell_type":"code","source":["\n","df_definitive.dropna(inplace=True)\n"],"metadata":{"id":"V_xi8S8ikAbp","executionInfo":{"status":"ok","timestamp":1754237601571,"user_tz":-120,"elapsed":11,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}}},"id":"V_xi8S8ikAbp","execution_count":77,"outputs":[]},{"cell_type":"markdown","id":"31b8a9e7-7db9-4604-991b-ef6771603e57","metadata":{"id":"31b8a9e7-7db9-4604-991b-ef6771603e57"},"source":["# 
Challenge 2: Structuring Data"]},{"cell_type":"markdown","id":"a877fd6d-7a0c-46d2-9657-f25036e4ca4b","metadata":{"id":"a877fd6d-7a0c-46d2-9657-f25036e4ca4b"},"source":["In this challenge, we will continue to work with customer data from an insurance company, but we will use a dataset with more columns, called marketing_customer_analysis.csv, which can be found at the following link:\n","\n","https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis_clean.csv\n","\n","This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by performing data cleaning, formatting, and structuring."]},{"cell_type":"code","execution_count":81,"id":"aa10d9b0-1c27-4d3f-a8e4-db6ab73bfd26","metadata":{"id":"aa10d9b0-1c27-4d3f-a8e4-db6ab73bfd26","colab":{"base_uri":"https://localhost:8080/","height":617},"executionInfo":{"status":"ok","timestamp":1754237950313,"user_tz":-120,"elapsed":173,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"3865b87d-8831-4756-db2c-cb736909f86f"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" unnamed:_0 customer state customer_lifetime_value response \\\n","0 0 DK49336 Arizona 4809.216960 No \n","1 1 KX64629 California 2228.525238 No \n","2 2 LZ68649 Washington 14947.917300 No \n","3 3 XL78013 Oregon 22332.439460 Yes \n","4 4 QA50777 Oregon 9025.067525 No \n","... ... ... ... ... ... 
\n","10905 10905 FE99816 Nevada 15563.369440 No \n","10906 10906 KX53892 Oregon 5259.444853 No \n","10907 10907 TL39050 Arizona 23893.304100 No \n","10908 10908 WA60547 California 11971.977650 No \n","10909 10909 IV32877 California 6857.519928 No \n","\n"," coverage education effective_to_date employmentstatus gender income \\\n","0 Basic College 2011-02-18 Employed M 48029 \n","1 Basic College 2011-01-18 Unemployed F 0 \n","2 Basic Bachelor 2011-02-10 Employed M 22139 \n","3 Extended College 2011-01-11 Employed M 49078 \n","4 Premium Bachelor 2011-01-17 Medical Leave F 23675 \n","... ... ... ... ... ... ... \n","10905 Premium Bachelor 2011-01-19 Unemployed F 0 \n","10906 Basic College 2011-01-06 Employed F 61146 \n","10907 Extended Bachelor 2011-02-06 Employed F 39837 \n","10908 Premium College 2011-02-13 Employed F 64195 \n","10909 Basic Bachelor 2011-01-08 Unemployed M 0 \n","\n"," location_code marital_status monthly_premium_auto \\\n","0 Suburban Married 61 \n","1 Suburban Single 64 \n","2 Suburban Single 100 \n","3 Suburban Single 97 \n","4 Suburban Married 117 \n","... ... ... ... \n","10905 Suburban Married 253 \n","10906 Urban Married 65 \n","10907 Rural Married 201 \n","10908 Urban Divorced 158 \n","10909 Suburban Single 101 \n","\n"," months_since_last_claim months_since_policy_inception \\\n","0 7.000000 52 \n","1 3.000000 26 \n","2 34.000000 31 \n","3 10.000000 3 \n","4 15.149071 31 \n","... ... ... \n","10905 15.149071 40 \n","10906 7.000000 68 \n","10907 11.000000 63 \n","10908 0.000000 27 \n","10909 31.000000 1 \n","\n"," number_of_open_complaints number_of_policies policy_type \\\n","0 0.000000 9 Corporate Auto \n","1 0.000000 1 Personal Auto \n","2 0.000000 2 Personal Auto \n","3 0.000000 2 Corporate Auto \n","4 0.384256 7 Personal Auto \n","... ... ... ... 
\n","10905 0.384256 7 Personal Auto \n","10906 0.000000 6 Personal Auto \n","10907 0.000000 2 Corporate Auto \n","10908 4.000000 6 Personal Auto \n","10909 0.000000 3 Personal Auto \n","\n"," policy renew_offer_type sales_channel total_claim_amount \\\n","0 Corporate L3 Offer3 Agent 292.800000 \n","1 Personal L3 Offer4 Call Center 744.924331 \n","2 Personal L3 Offer3 Call Center 480.000000 \n","3 Corporate L3 Offer2 Branch 484.013411 \n","4 Personal L2 Offer1 Branch 707.925645 \n","... ... ... ... ... \n","10905 Personal L1 Offer3 Web 1214.400000 \n","10906 Personal L3 Offer2 Branch 273.018929 \n","10907 Corporate L3 Offer1 Web 381.306996 \n","10908 Personal L1 Offer1 Branch 618.288849 \n","10909 Personal L1 Offer4 Web 1021.719397 \n","\n"," vehicle_class vehicle_size vehicle_type month \n","0 Four-Door Car Medsize A 2 \n","1 Four-Door Car Medsize A 1 \n","2 SUV Medsize A 2 \n","3 Four-Door Car Medsize A 1 \n","4 Four-Door Car Medsize A 1 \n","... ... ... ... ... \n","10905 Luxury Car Medsize A 1 \n","10906 Four-Door Car Medsize A 1 \n","10907 Luxury SUV Medsize A 2 \n","10908 SUV Medsize A 2 \n","10909 SUV Medsize A 1 \n","\n","[10910 rows x 27 columns]"],"text/html":["\n","
\n","
\n","\n","
\n"," \n","
\n","
\n","
unnamed:_0
\n","
customer
\n","
state
\n","
customer_lifetime_value
\n","
response
\n","
coverage
\n","
education
\n","
effective_to_date
\n","
employmentstatus
\n","
gender
\n","
income
\n","
location_code
\n","
marital_status
\n","
monthly_premium_auto
\n","
months_since_last_claim
\n","
months_since_policy_inception
\n","
number_of_open_complaints
\n","
number_of_policies
\n","
policy_type
\n","
policy
\n","
renew_offer_type
\n","
sales_channel
\n","
total_claim_amount
\n","
vehicle_class
\n","
vehicle_size
\n","
vehicle_type
\n","
month
\n","
\n"," \n"," \n","
\n","
0
\n","
0
\n","
DK49336
\n","
Arizona
\n","
4809.216960
\n","
No
\n","
Basic
\n","
College
\n","
2011-02-18
\n","
Employed
\n","
M
\n","
48029
\n","
Suburban
\n","
Married
\n","
61
\n","
7.000000
\n","
52
\n","
0.000000
\n","
9
\n","
Corporate Auto
\n","
Corporate L3
\n","
Offer3
\n","
Agent
\n","
292.800000
\n","
Four-Door Car
\n","
Medsize
\n","
A
\n","
2
\n","
\n","
\n","
1
\n","
1
\n","
KX64629
\n","
California
\n","
2228.525238
\n","
No
\n","
Basic
\n","
College
\n","
2011-01-18
\n","
Unemployed
\n","
F
\n","
0
\n","
Suburban
\n","
Single
\n","
64
\n","
3.000000
\n","
26
\n","
0.000000
\n","
1
\n","
Personal Auto
\n","
Personal L3
\n","
Offer4
\n","
Call Center
\n","
744.924331
\n","
Four-Door Car
\n","
Medsize
\n","
A
\n","
1
\n","
\n","
\n","
2
\n","
2
\n","
LZ68649
\n","
Washington
\n","
14947.917300
\n","
No
\n","
Basic
\n","
Bachelor
\n","
2011-02-10
\n","
Employed
\n","
M
\n","
22139
\n","
Suburban
\n","
Single
\n","
100
\n","
34.000000
\n","
31
\n","
0.000000
\n","
2
\n","
Personal Auto
\n","
Personal L3
\n","
Offer3
\n","
Call Center
\n","
480.000000
\n","
SUV
\n","
Medsize
\n","
A
\n","
2
\n","
\n","
\n","
3
\n","
3
\n","
XL78013
\n","
Oregon
\n","
22332.439460
\n","
Yes
\n","
Extended
\n","
College
\n","
2011-01-11
\n","
Employed
\n","
M
\n","
49078
\n","
Suburban
\n","
Single
\n","
97
\n","
10.000000
\n","
3
\n","
0.000000
\n","
2
\n","
Corporate Auto
\n","
Corporate L3
\n","
Offer2
\n","
Branch
\n","
484.013411
\n","
Four-Door Car
\n","
Medsize
\n","
A
\n","
1
\n","
\n","
\n","
4
\n","
4
\n","
QA50777
\n","
Oregon
\n","
9025.067525
\n","
No
\n","
Premium
\n","
Bachelor
\n","
2011-01-17
\n","
Medical Leave
\n","
F
\n","
23675
\n","
Suburban
\n","
Married
\n","
117
\n","
15.149071
\n","
31
\n","
0.384256
\n","
7
\n","
Personal Auto
\n","
Personal L2
\n","
Offer1
\n","
Branch
\n","
707.925645
\n","
Four-Door Car
\n","
Medsize
\n","
A
\n","
1
\n","
\n","
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
...
\n","
\n","
\n","
10905
\n","
10905
\n","
FE99816
\n","
Nevada
\n","
15563.369440
\n","
No
\n","
Premium
\n","
Bachelor
\n","
2011-01-19
\n","
Unemployed
\n","
F
\n","
0
\n","
Suburban
\n","
Married
\n","
253
\n","
15.149071
\n","
40
\n","
0.384256
\n","
7
\n","
Personal Auto
\n","
Personal L1
\n","
Offer3
\n","
Web
\n","
1214.400000
\n","
Luxury Car
\n","
Medsize
\n","
A
\n","
1
\n","
\n","
\n","
10906
\n","
10906
\n","
KX53892
\n","
Oregon
\n","
5259.444853
\n","
No
\n","
Basic
\n","
College
\n","
2011-01-06
\n","
Employed
\n","
F
\n","
61146
\n","
Urban
\n","
Married
\n","
65
\n","
7.000000
\n","
68
\n","
0.000000
\n","
6
\n","
Personal Auto
\n","
Personal L3
\n","
Offer2
\n","
Branch
\n","
273.018929
\n","
Four-Door Car
\n","
Medsize
\n","
A
\n","
1
\n","
\n","
\n","
10907
\n","
10907
\n","
TL39050
\n","
Arizona
\n","
23893.304100
\n","
No
\n","
Extended
\n","
Bachelor
\n","
2011-02-06
\n","
Employed
\n","
F
\n","
39837
\n","
Rural
\n","
Married
\n","
201
\n","
11.000000
\n","
63
\n","
0.000000
\n","
2
\n","
Corporate Auto
\n","
Corporate L3
\n","
Offer1
\n","
Web
\n","
381.306996
\n","
Luxury SUV
\n","
Medsize
\n","
A
\n","
2
\n","
\n","
\n","
10908
\n","
10908
\n","
WA60547
\n","
California
\n","
11971.977650
\n","
No
\n","
Premium
\n","
College
\n","
2011-02-13
\n","
Employed
\n","
F
\n","
64195
\n","
Urban
\n","
Divorced
\n","
158
\n","
0.000000
\n","
27
\n","
4.000000
\n","
6
\n","
Personal Auto
\n","
Personal L1
\n","
Offer1
\n","
Branch
\n","
618.288849
\n","
SUV
\n","
Medsize
\n","
A
\n","
2
\n","
\n","
\n","
10909
\n","
10909
\n","
IV32877
\n","
California
\n","
6857.519928
\n","
No
\n","
Basic
\n","
Bachelor
\n","
2011-01-08
\n","
Unemployed
\n","
M
\n","
0
\n","
Suburban
\n","
Single
\n","
101
\n","
31.000000
\n","
1
\n","
0.000000
\n","
3
\n","
Personal Auto
\n","
Personal L1
\n","
Offer4
\n","
Web
\n","
1021.719397
\n","
SUV
\n","
Medsize
\n","
A
\n","
1
\n","
\n"," \n","
\n","
10910 rows × 27 columns
\n","
\n","
\n","\n","
\n"," \n","\n"," \n","\n"," \n","
\n","\n","\n","
\n"," \n","\n","\n","\n"," \n","
\n","\n","
\n"," \n"," \n"," \n","
\n","\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df"}},"metadata":{},"execution_count":81}],"source":["# Your code goes here\n","url = 'https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis_clean.csv'\n","\n","df = pd.read_csv(url)\n","\n","pd.set_option('display.max_columns', None)\n","\n","\n","df"]},{"cell_type":"code","source":["#1\n","\n","df.pivot_table(index='sales_channel', values='income', aggfunc='sum').round(2)\n","\n","# Conclusions: the \"Agent\" channel has the highest total (about 152.5 million), followed by Branch (113.8 million), Call Center (81.1 million) and Web (62.2 million).\n","# Note that `income` is the customers' income (unemployed customers show 0), so this ranks channels by the combined income of their customers; it is only a rough proxy for revenue."],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"aQ6bQSmbtsb8","executionInfo":{"status":"ok","timestamp":1754238483772,"user_tz":-120,"elapsed":26,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"33d651af-70ad-468c-899a-65b8c1c1cf28"},"id":"aQ6bQSmbtsb8","execution_count":85,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" income\n","sales_channel \n","Agent 152490152\n","Branch 113775608\n","Call Center 81055004\n","Web 62200103"],"text/html":["\n","
\n","
\n","\n","
\n"," \n","
\n","
\n","
income
\n","
\n","
\n","
sales_channel
\n","
\n","
\n"," \n"," \n","
\n","
Agent
\n","
152490152
\n","
\n","
\n","
Branch
\n","
113775608
\n","
\n","
\n","
Call Center
\n","
81055004
\n","
\n","
\n","
Web
\n","
62200103
\n","
\n"," \n","
\n","
\n","
\n","\n","
\n"," \n","\n"," \n","\n"," \n","
\n","\n","\n","
\n"," \n","\n","\n","\n"," \n","
\n","\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"sales_channel\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Branch\",\n \"Web\",\n \"Agent\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"income\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 39623506,\n \"min\": 62200103,\n \"max\": 152490152,\n \"num_unique_values\": 4,\n \"samples\": [\n 113775608,\n 62200103,\n 152490152\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":85}]},{"cell_type":"code","source":["#2\n","\n","df.pivot_table(index='education', columns='gender', values='customer_lifetime_value', aggfunc='mean').round(2)\n","\n","# Conclusions:\n","# Among women, \"High School or Below\" has the highest average lifetime value (8675.22); among men, Master (8168.83) and \"High School or Below\" (8149.69) are nearly tied at the top.\n","# The gender gap is small and mixed: women average higher for Bachelor and High School or Below, men for College, Doctor and Master.\n","# The lowest averages for both genders are observed among those with doctoral degrees."],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":238},"id":"f5huYbDxxH8h","executionInfo":{"status":"ok","timestamp":1754238977510,"user_tz":-120,"elapsed":42,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"343f8ac5-f3a0-4aff-e7ab-ce6eb2f3b9a5"},"id":"f5huYbDxxH8h","execution_count":91,"outputs":[{"output_type":"execute_result","data":{"text/plain":["gender F M\n","education \n","Bachelor 7874.27 7703.60\n","College 7748.82 8052.46\n","Doctor 7328.51 7415.33\n","High School or Below 8675.22 8149.69\n","Master 8157.05 8168.83"],"text/html":["\n","
\n","
\n","\n","
\n"," \n","
\n","
gender
\n","
F
\n","
M
\n","
\n","
\n","
education
\n","
\n","
\n","
\n"," \n"," \n","
\n","
Bachelor
\n","
7874.27
\n","
7703.60
\n","
\n","
\n","
College
\n","
7748.82
\n","
8052.46
\n","
\n","
\n","
Doctor
\n","
7328.51
\n","
7415.33
\n","
\n","
\n","
High School or Below
\n","
8675.22
\n","
8149.69
\n","
\n","
\n","
Master
\n","
8157.05
\n","
8168.83
\n","
\n"," \n","
\n","
\n","
\n","\n","
\n"," \n","\n"," \n","\n"," \n","
\n","\n","\n","
\n"," \n","\n","\n","\n"," \n","
\n","\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"df\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"education\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"College\",\n \"Master\",\n \"Doctor\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"F\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 500.2605723320595,\n \"min\": 7328.51,\n \"max\": 8675.22,\n \"num_unique_values\": 5,\n \"samples\": [\n 7748.82,\n 8157.05,\n 7328.51\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"M\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 328.3733865129754,\n \"min\": 7415.33,\n \"max\": 8168.83,\n \"num_unique_values\": 5,\n \"samples\": [\n 8052.46,\n 8168.83,\n 7415.33\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":91}]},{"cell_type":"markdown","id":"df35fd0d-513e-4e77-867e-429da10a9cc7","metadata":{"id":"df35fd0d-513e-4e77-867e-429da10a9cc7"},"source":["1. You work in the marketing department and want to know which sales channel brought the most sales in terms of total revenue. Using pivot, create a summary table showing the total revenue for each sales channel (agent, branch, call center, and web).\n","Round the total revenue to 2 decimal places. Analyze the resulting table to draw insights."]},{"cell_type":"markdown","id":"640993b2-a291-436c-a34d-a551144f8196","metadata":{"id":"640993b2-a291-436c-a34d-a551144f8196"},"source":["2. Create a pivot table that shows the average customer lifetime value per gender and education level. 
Analyze the resulting table to draw insights."]},{"cell_type":"markdown","id":"32c7f2e5-3d90-43e5-be33-9781b6069198","metadata":{"id":"32c7f2e5-3d90-43e5-be33-9781b6069198"},"source":["## Bonus\n","\n","You work in the customer service department and want to know which months had the highest number of complaints for each policy type. Create a summary table showing the number of complaints by policy type and month.\n","Show it as a long-format table."]},{"cell_type":"markdown","id":"e3d09a8f-953c-448a-a5f8-2e5a8cca7291","metadata":{"id":"e3d09a8f-953c-448a-a5f8-2e5a8cca7291"},"source":["*In data analysis, a long format table is a way of structuring data in which each observation or measurement is stored in a separate row of the table. The key characteristic of a long format table is that each column represents a single variable, and each row represents a single observation of that variable.*\n","\n","*More information about long and wide format tables here: https://www.statology.org/long-vs-wide-data/*"]},{"cell_type":"code","execution_count":94,"id":"3a069e0b-b400-470e-904d-d17582191be4","metadata":{"id":"3a069e0b-b400-470e-904d-d17582191be4","colab":{"base_uri":"https://localhost:8080/","height":238},"executionInfo":{"status":"ok","timestamp":1754240012981,"user_tz":-120,"elapsed":23,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"e03c212a-c270-4de5-9249-11d1a7f0a839"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" policy_type month number_of_open_complaints\n","0 Corporate Auto 1 443.434952\n","1 Corporate Auto 2 385.208135\n","2 Personal Auto 1 1727.605722\n","3 Personal Auto 2 1453.684441\n","4 Special Auto 1 87.074049\n","5 Special Auto 2 95.226817"],"text/html":["\n","