From a6059751b9c4ba7e4410246151b82ee93d3fc9f7 Mon Sep 17 00:00:00 2001 From: Julia Date: Thu, 18 Sep 2025 14:13:35 +0100 Subject: [PATCH] Solved Lab --- .../README.md | 113 ++++++++++++++++++ .../answers.sql | 87 ++++++++++++++ 2 files changed, 200 insertions(+) create mode 100644 lab-sql-aggregation-and-transformation/README.md create mode 100644 lab-sql-aggregation-and-transformation/answers.sql diff --git a/lab-sql-aggregation-and-transformation/README.md b/lab-sql-aggregation-and-transformation/README.md new file mode 100644 index 0000000..61abd5c --- /dev/null +++ b/lab-sql-aggregation-and-transformation/README.md @@ -0,0 +1,113 @@ +![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png) + +# Lab | SQL Data Aggregation and Transformation + +
+ +

Learning Goals

+
+ + This lab allows you to practice and apply the concepts and techniques taught in class. + + Upon completion of this lab, you will be able to: + +- Use SQL built-in functions such as COUNT, MAX, MIN, AVG to aggregate and summarize data, and use GROUP BY to group data by specific columns. Use the HAVING clause to filter data based on aggregate functions. +- Use SQL to clean, transform, and prepare data for analysis by handling duplicates, null values, renaming columns, and converting data types. Use functions like ROUND, DATE_DIFF, CONCAT, and SUBSTRING to manipulate data and generate insights. +- Use conditional expressions for creating new columns. + + +
+
+ +
+ +
+ +

Prerequisites

+
+ +Before this starting this lab, you should have learnt about: + +- SELECT, FROM, ORDER BY, LIMIT, WHERE, GROUP BY, and HAVING clauses. +- DISTINCT keyword to return only unique values, AS keyword for using aliases. +- Built-in SQL functions such as COUNT, MAX, MIN, AVG, ROUND, DATEDIFF, or DATE_FORMAT. +- CASE statement for conditional logic. +
+
+ +
+ + +## Introduction + +Welcome to the SQL Data Aggregation and Transformation lab! + +In this lab, you will practice how to use SQL queries to extract insights from the [Sakila](https://dev.mysql.com/doc/sakila/en/) database which contains information about movie rentals. + +You will build on your SQL skills by practicing how to use the `GROUP BY` and `HAVING` clauses to group data and filter results based on aggregate values. You will also practice how to handle null values, rename columns, and use built-in functions like `MAX`, `MIN`, `ROUND`, `DATE_DIFF`, `CONCAT`, and `SUBSTRING` to manipulate and transform data for generating insights. + +Throughout the lab, you will work with two SQL query files: `sakila-schema.sql`, which creates the database schema, and `sakila-data.sql`, which inserts the data into the database. You can download the necessary files locally by following the steps listed in [Sakila sample database - installation](https://dev.mysql.com/doc/sakila/en/sakila-installation.html). + +You can also refer to the Entity Relationship Diagram (ERD) of the database to guide your analysis: + +
+ +![DB schema](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/database-sakila-schema.png) + +

+ +Imagine you work at a movie rental company as an analyst. By using SQL in the challenges below, you are required to gain insights into different elements of its business operations. + +## Challenge 1 + +1. You need to use SQL built-in functions to gain insights relating to the duration of movies: + - 1.1 Determine the **shortest and longest movie durations** and name the values as `max_duration` and `min_duration`. + - 1.2. Express the **average movie duration in hours and minutes**. Don't use decimals. + - *Hint: Look for floor and round functions.* +2. You need to gain insights related to rental dates: + - 2.1 Calculate the **number of days that the company has been operating**. + - *Hint: To do this, use the `rental` table, and the `DATEDIFF()` function to subtract the earliest date in the `rental_date` column from the latest date.* + - 2.2 Retrieve rental information and add two additional columns to show the **month and weekday of the rental**. Return 20 rows of results. + - 2.3 *Bonus: Retrieve rental information and add an additional column called `DAY_TYPE` with values **'weekend' or 'workday'**, depending on the day of the week.* + - *Hint: use a conditional expression.* +3. You need to ensure that customers can easily access information about the movie collection. To achieve this, retrieve the **film titles and their rental duration**. If any rental duration value is **NULL, replace** it with the string **'Not Available'**. Sort the results of the film title in ascending order. + - *Please note that even if there are currently no null values in the rental duration column, the query should still be written to handle such cases in the future.* + - *Hint: Look for the `IFNULL()` function.* + +4. *Bonus: The marketing team for the movie rental company now needs to create a personalized email campaign for customers. To achieve this, you need to retrieve the **concatenated first and last names of customers**, along with the **first 3 characters of their email** address, so that you can address them by their first name and use their email address to send personalized recommendations. The results should be ordered by last name in ascending order to make it easier to use the data.* + +## Challenge 2 + +1. Next, you need to analyze the films in the collection to gain some more insights. Using the `film` table, determine: + - 1.1 The **total number of films** that have been released. + - 1.2 The **number of films for each rating**. + - 1.3 The **number of films for each rating, sorting** the results in descending order of the number of films. + This will help you to better understand the popularity of different film ratings and adjust purchasing decisions accordingly. +2. Using the `film` table, determine: + - 2.1 The **mean film duration for each rating**, and sort the results in descending order of the mean duration. Round off the average lengths to two decimal places. This will help identify popular movie lengths for each category. + - 2.2 Identify **which ratings have a mean duration of over two hours** in order to help select films for customers who prefer longer movies. +3. *Bonus: determine which last names are not repeated in the table `actor`.* + + + +## Requirements + +- Fork this repo +- Clone it to your machine + + +## Getting Started + +Complete the challenges in this readme in a `.sql`file. + +## Submission + +- Upon completion, run the following commands: + +```bash +git add . +git commit -m "Solved lab" +git push origin master +``` + +- Paste the link of your lab in Student Portal. diff --git a/lab-sql-aggregation-and-transformation/answers.sql b/lab-sql-aggregation-and-transformation/answers.sql new file mode 100644 index 0000000..ef79249 --- /dev/null +++ b/lab-sql-aggregation-and-transformation/answers.sql @@ -0,0 +1,87 @@ +USE SAKILA; + +#Challenge 1.1.1 +SELECT MAX(length) +AS max_duration +FROM film; + +SELECT MIN(length) +AS min_duration +FROM film; + +#Challenge 1.1.2 +SELECT ROUND(AVG(length)) +FROM film; + +#Challenge 1.2.1 +SELECT DATEDIFF( + (SELECT MAX(rental_date) FROM rental), + (SELECT MIN(rental_date) FROM rental) + ) AS operating_days; + +#Challenge 1.2.2 +SELECT *, +MONTH(rental_date) AS rental_month, +DAYNAME(rental_date) AS rental_weekday +FROM rental +LIMIT 20; + +#Challenge 1.2.3 +SELECT *, +DAYNAME(rental_date) AS rental_weekday, +CASE + WHEN DAYOFWEEK(rental_date) IN (1, 7) THEN 'weekend' + ELSE 'workday' + END AS day_type +FROM rental; + +#Challenge 1.3 +SELECT + title, + IFNULL(rental_duration, 'Not Available') AS rental_duration +FROM film +ORDER BY title ASC; + +#Challenge 1.4 +SELECT + CONCAT(first_name, ' ', last_name) AS full_name, + LEFT(email, 3) AS email_prefix +FROM customer +ORDER BY last_name ASC; + +#Challenge 2.1.1 +SELECT COUNT(*) +FROM film; + +#Challenge 2.1.2 +SELECT COUNT(*), rating +FROM film +GROUP BY rating; + +#Challenge 2.1.3 +SELECT COUNT(*) AS number_films_rating, rating +FROM film +GROUP BY rating +ORDER BY number_films_rating ASC; + +#Challenge 2.2.1 +SELECT + ROUND(AVG(length),2) AS avg_lenght_films, + rating +FROM film +GROUP BY rating +ORDER BY avg_lenght_films DESC; + +#Challenge 2.2.2 +SELECT + ROUND(AVG(length),2) AS avg_lenght_films, + rating +FROM film +GROUP BY rating +HAVING AVG(length) > 120 +ORDER BY avg_lenght_films DESC; + +#Challenge 2.3 +SELECT DISTINCT + last_name AS unique_last_name +FROM actor; \ No newline at end of file