- Project Owner: @dark-teal-coder
- First Published Date: 2022-12-19
- Title: Python Data Analysis of Tech Gadget Sales with Pandas
- Difficulty:
- Beginner
- Intermediate
- Advanced
- Scale:
- Small
- Medium
- Large
This repository contains a Jupyter notebook that demonstrates how to analyze tech gadget sales in the US in 2019. We use the Python Pandas and Matplotlib libraries to answer business questions about 12 months' worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
We walk through the following Pandas and Matplotlib techniques:
- Concatenating multiple CSVs together to create a new DataFrame (`pd.concat()`)
- Adding columns
- Parsing cells as strings to make new columns (`.str`)
- Using the `apply()` method
- Using `groupby()` to perform aggregate analysis
- Plotting bar charts and line graphs to visualize our results
- Labeling our graphs
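The techniques listed above can be sketched on a tiny made-up dataset. This is a minimal illustration, not the notebook's actual code: the column names (`Product`, `Quantity Ordered`, `Price Each`, `Order Date`, `Purchase Address`), the sample rows, and the in-memory CSVs are all assumptions chosen for the example.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Two tiny in-memory "monthly" CSVs stand in for the real files (made-up rows).
header = "Product,Quantity Ordered,Price Each,Order Date,Purchase Address\n"
jan_csv = io.StringIO(
    header
    + 'USB-C Cable,2,10.0,01/05/19 10:30,"1 Main St, Boston, MA 02215"\n'
    + 'Monitor,1,150.0,01/09/19 14:00,"9 Lake St, Austin, TX 73301"\n'
)
feb_csv = io.StringIO(
    header
    + 'USB-C Cable,1,10.0,02/11/19 09:15,"5 Hill St, Boston, MA 02215"\n'
)

# Concatenate the monthly frames into one DataFrame.
sales = pd.concat([pd.read_csv(f) for f in (jan_csv, feb_csv)], ignore_index=True)

# Add columns: the order month and the revenue of each row.
sales["Order Date"] = pd.to_datetime(sales["Order Date"], format="%m/%d/%y %H:%M")
sales["Month"] = sales["Order Date"].dt.month
sales["Sales"] = sales["Quantity Ordered"] * sales["Price Each"]

# Parse cells as strings with the .str accessor to extract the city.
sales["City"] = sales["Purchase Address"].str.split(",").str[1].str.strip()

# Use apply() with a plain Python function to extract the state code.
sales["State"] = sales["Purchase Address"].apply(
    lambda address: address.split(",")[2].split()[0]
)

# Aggregate with groupby().
monthly_sales = sales.groupby("Month")["Sales"].sum()
city_sales = sales.groupby("City")["Sales"].sum()

# Plot a labeled bar chart of sales per month.
fig, ax = plt.subplots()
ax.bar(monthly_sales.index, monthly_sales.values)
ax.set_xticks(list(monthly_sales.index))
ax.set_xlabel("Month")
ax.set_ylabel("Sales (USD)")
ax.set_title("Sales by month")
```

In a Jupyter notebook the figure is typically rendered when the cell finishes; in a plain script you would call `plt.show()` or `fig.savefig(...)` afterwards.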
- Python 3
- Python Package Installer/Manager `pip`
  - If you installed Python from python.org, you should already have `pip`. If it is not installed, you can use the command `py -m ensurepip --default-pip` to bootstrap it from the standard library. If you are using Linux, you will have to install the package manager separately. You can find out more about the `pip` tool in the pip documentation.
- Text Editor and Integrated Development Environment (IDE)
- Command-line interface (CLI)
  - You can install the open-source PowerShell on Windows, Linux and macOS if you do not have or want to use a pre-installed CLI on your local machine.
- Check whether Python is installed by running `python --version` (or the short form, `python -V`) in the CLI.
- Clone the project repository from GitHub to your local machine.
- Use the command `py -m pip install package_name` to install the necessary Python libraries. Check out the pip documentation to learn more about `pip install`.
- Check the top of the `.py` script file for the list of required libraries. For example, you may need the `requests` and `beautifulsoup4` libraries if you see the following lines at the top of the script file:
import requests
from bs4 import BeautifulSoup
- If `pip` fails to locate the relevant packages, you may find them on the Python Package Index (PyPI).
- Use `python file_name.py` to run the script in a CLI. Alternatively, use an IDE such as VS Code to run the script. There will usually be a [Run] button in the top right corner of the opened script file.
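The step of reading a script's imports to work out what needs installing can also be automated. The following is a rough sketch, assuming only the standard library; the helper names `top_level_imports` and `missing_modules` are hypothetical and not part of this repository.

```python
import ast
import importlib.util


def top_level_imports(source):
    """Return the top-level module names imported anywhere in the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Skip relative imports (level > 0); they refer to local modules.
            modules.add(node.module.split(".")[0])
    return modules


def missing_modules(source):
    """Return the imported module names that cannot be found locally, sorted."""
    return sorted(
        name
        for name in top_level_imports(source)
        if importlib.util.find_spec(name) is None
    )


# The two import lines used as the example in the steps above.
example_source = "import requests\nfrom bs4 import BeautifulSoup\n"
print("Imported modules:", sorted(top_level_imports(example_source)))
print("Not installed:", missing_modules(example_source))
```

Keep in mind that the import name and the PyPI package name can differ: `bs4` is installed with `pip install beautifulsoup4`, so a missing module name is a hint for what to search for rather than something to pass to `pip` verbatim.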
- Click [Code]
- Click [Download ZIP]
- Extract the .zip file to the working directory
To access all of the files, fork this repo and then clone it locally.
For more information, please refer to Fork a repo.
- Open a command-line interface
- Type `pip install pandas`
- Press [Enter]
For more information, please refer to Installing Pandas.
Prerequisite: Python[^1]
- Run `pip3 install --upgrade pip` to upgrade to the latest version of `pip`
- Run `pip3 install jupyter` to install Jupyter Notebook
For more information, please refer to Installing the Classic Jupyter Notebook Interface.
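After installing, it can be handy to confirm from Python that the packages are actually visible to the interpreter you are using. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the package list is just the two packages installed in the steps above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


for name in ("pandas", "jupyter"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

If a package shows as not installed even though `pip install` succeeded, `pip` may have installed it for a different Python interpreter; running `python -m pip install package_name` ties the install to the interpreter you launch.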
- pandas.DataFrame.any documentation
- pandas.DataFrame.dropna documentation
- pandas.to_numeric documentation
- pandas.to_datetime documentation
- pandas.Series.dt.month documentation
- pandas.DataFrame.groupby documentation
- matplotlib.pyplot.plot documentation
- matplotlib.pyplot.grid documentation
- pandas.DataFrame.duplicated documentation
- pandas.DataFrame.transform documentation
- itertools.combinations documentation
- itertools.combinations() in Python
- collections.Counter documentation
- Python's Counter: The Pythonic Way to Count Objects
- Update Method Of Counter Class
- matplotlib.pyplot.subplots documentation
- matplotlib.axes.Axes.twinx documentation
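Two of the references above, `itertools.combinations` and `collections.Counter`, are often combined to count which items appear together. The sketch below shows that general pattern on made-up orders; the product names and the list-of-lists layout are assumptions for illustration and may not match how the notebook structures its data.

```python
from collections import Counter
from itertools import combinations

# Each inner list is the set of products bought in one (made-up) order.
orders = [
    ["Phone", "Charging Cable"],
    ["Phone", "Charging Cable", "Headphones"],
    ["Monitor"],
]

pair_counts = Counter()
for products in orders:
    # Sort first so that (A, B) and (B, A) are counted as the same pair.
    pair_counts.update(combinations(sorted(products), 2))

print(pair_counts.most_common(1))
# [(('Charging Cable', 'Phone'), 2)]
```

Orders with a single product contribute no pairs, because `combinations` yields nothing when fewer than two items are available.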
1st Completion Date: Dec 20, 2022
[^1]: Python is a requirement for installing the Jupyter Notebook.