diff --git a/lab-web-scraping.ipynb b/lab-web-scraping.ipynb index e552783..b3b5581 100644 --- a/lab-web-scraping.ipynb +++ b/lab-web-scraping.ipynb @@ -1,148 +1,472 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "7e7a1ab8-2599-417d-9a65-25ef07f3a786", - "metadata": { - "id": "7e7a1ab8-2599-417d-9a65-25ef07f3a786" - }, - "source": [ - "# Lab | Web Scraping" - ] - }, - { - "cell_type": "markdown", - "id": "ce8882fc-4815-4567-92fa-b4816358ba7d", - "metadata": { - "id": "ce8882fc-4815-4567-92fa-b4816358ba7d" - }, - "source": [ - "Welcome to the \"Books to Scrape\" Web Scraping Adventure Lab!\n", - "\n", - "**Objective**\n", - "\n", - "In this lab, we will embark on a mission to unearth valuable insights from the data available on Books to Scrape, an online platform showcasing a wide variety of books. As data analyst, you have been tasked with scraping a specific subset of book data from Books to Scrape to assist publishing companies in understanding the landscape of highly-rated books across different genres. Your insights will help shape future book marketing strategies and publishing decisions.\n", - "\n", - "**Background**\n", - "\n", - "In a world where data has become the new currency, businesses are leveraging big data to make informed decisions that drive success and profitability. The publishing industry, much like others, utilizes data analytics to understand market trends, reader preferences, and the performance of books based on factors such as genre, author, and ratings. Books to Scrape serves as a rich source of such data, offering detailed information about a diverse range of books, making it an ideal platform for extracting insights to aid in informed decision-making within the literary world.\n", - "\n", - "**Task**\n", - "\n", - "Your task is to create a Python script using BeautifulSoup and pandas to scrape Books to Scrape book data, focusing on book ratings and genres. 
The script should be able to filter books with ratings above a certain threshold and in specific genres. Additionally, the script should structure the scraped data in a tabular format using pandas for further analysis.\n", - "\n", - "**Expected Outcome**\n", - "\n", - "A function named `scrape_books` that takes two parameters: `min_rating` and `max_price`. The function should scrape book data from the \"Books to Scrape\" website and return a `pandas` DataFrame with the following columns:\n", - "\n", - "**Expected Outcome**\n", - "\n", - "- A function named `scrape_books` that takes two parameters: `min_rating` and `max_price`.\n", - "- The function should return a DataFrame with the following columns:\n", - " - **UPC**: The Universal Product Code (UPC) of the book.\n", - " - **Title**: The title of the book.\n", - " - **Price (£)**: The price of the book in pounds.\n", - " - **Rating**: The rating of the book (1-5 stars).\n", - " - **Genre**: The genre of the book.\n", - " - **Availability**: Whether the book is in stock or not.\n", - " - **Description**: A brief description or product description of the book (if available).\n", - " \n", - "You will execute this script to scrape data for books with a minimum rating of `4.0 and above` and a maximum price of `£20`. \n", - "\n", - "Remember to experiment with different ratings and prices to ensure your code is versatile and can handle various searches effectively!\n", - "\n", - "**Resources**\n", - "\n", - "- [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)\n", - "- [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/index.html)\n", - "- [Books to Scrape](https://books.toscrape.com/)\n" - ] - }, - { - "cell_type": "markdown", - "id": "3519921d-5890-445b-9a33-934ed8ee378c", - "metadata": { - "id": "3519921d-5890-445b-9a33-934ed8ee378c" - }, - "source": [ - "**Hint**\n", - "\n", - "Your first mission is to familiarize yourself with the **Books to Scrape** website. 
Navigate to [Books to Scrape](http://books.toscrape.com/) and explore the available books to understand their layout and structure. \n", - "\n", - "Next, think about how you can set parameters for your data extraction:\n", - "\n", - "- **Minimum Rating**: Focus on books with a rating of 4.0 and above.\n", - "- **Maximum Price**: Filter for books priced up to £20.\n", - "\n", - "After reviewing the site, you can construct a plan for scraping relevant data. Pay attention to the details displayed for each book, including the title, price, rating, and availability. This will help you identify the correct HTML elements to target with your scraping script.\n", - "\n", - "Make sure to build your scraping URL and logic based on the patterns you observe in the HTML structure of the book listings!" - ] - }, + "cells": [ + { + "cell_type": "markdown", + "id": "7e7a1ab8-2599-417d-9a65-25ef07f3a786", + "metadata": { + "id": "7e7a1ab8-2599-417d-9a65-25ef07f3a786" + }, + "source": [ + "# Lab | Web Scraping" + ] + }, + { + "cell_type": "markdown", + "id": "ce8882fc-4815-4567-92fa-b4816358ba7d", + "metadata": { + "id": "ce8882fc-4815-4567-92fa-b4816358ba7d" + }, + "source": [ + "Welcome to the \"Books to Scrape\" Web Scraping Adventure Lab!\n", + "\n", + "**Objective**\n", + "\n", + "In this lab, we will embark on a mission to unearth valuable insights from the data available on Books to Scrape, an online platform showcasing a wide variety of books. As data analyst, you have been tasked with scraping a specific subset of book data from Books to Scrape to assist publishing companies in understanding the landscape of highly-rated books across different genres. Your insights will help shape future book marketing strategies and publishing decisions.\n", + "\n", + "**Background**\n", + "\n", + "In a world where data has become the new currency, businesses are leveraging big data to make informed decisions that drive success and profitability. 
The publishing industry, much like others, utilizes data analytics to understand market trends, reader preferences, and the performance of books based on factors such as genre, author, and ratings. Books to Scrape serves as a rich source of such data, offering detailed information about a diverse range of books, making it an ideal platform for extracting insights to aid in informed decision-making within the literary world.\n", + "\n", + "**Task**\n", + "\n", + "Your task is to create a Python script using BeautifulSoup and pandas to scrape Books to Scrape book data, focusing on book ratings and genres. The script should be able to filter books with ratings above a certain threshold and in specific genres. Additionally, the script should structure the scraped data in a tabular format using pandas for further analysis.\n", + "\n", + "**Expected Outcome**\n", + "\n", + "A function named `scrape_books` that takes two parameters: `min_rating` and `max_price`. The function should scrape book data from the \"Books to Scrape\" website and return a `pandas` DataFrame with the following columns:\n", + "\n", + "**Expected Outcome**\n", + "\n", + "- A function named `scrape_books` that takes two parameters: `min_rating` and `max_price`.\n", + "- The function should return a DataFrame with the following columns:\n", + " - **UPC**: The Universal Product Code (UPC) of the book.\n", + " - **Title**: The title of the book.\n", + " - **Price (£)**: The price of the book in pounds.\n", + " - **Rating**: The rating of the book (1-5 stars).\n", + " - **Genre**: The genre of the book.\n", + " - **Availability**: Whether the book is in stock or not.\n", + " - **Description**: A brief description or product description of the book (if available).\n", + " \n", + "You will execute this script to scrape data for books with a minimum rating of `4.0 and above` and a maximum price of `£20`. 
\n", + "\n", + "Remember to experiment with different ratings and prices to ensure your code is versatile and can handle various searches effectively!\n", + "\n", + "**Resources**\n", + "\n", + "- [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)\n", + "- [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/index.html)\n", + "- [Books to Scrape](https://books.toscrape.com/)\n" + ] + }, + { + "cell_type": "markdown", + "id": "3519921d-5890-445b-9a33-934ed8ee378c", + "metadata": { + "id": "3519921d-5890-445b-9a33-934ed8ee378c" + }, + "source": [ + "**Hint**\n", + "\n", + "Your first mission is to familiarize yourself with the **Books to Scrape** website. Navigate to [Books to Scrape](http://books.toscrape.com/) and explore the available books to understand their layout and structure. \n", + "\n", + "Next, think about how you can set parameters for your data extraction:\n", + "\n", + "- **Minimum Rating**: Focus on books with a rating of 4.0 and above.\n", + "- **Maximum Price**: Filter for books priced up to £20.\n", + "\n", + "After reviewing the site, you can construct a plan for scraping relevant data. Pay attention to the details displayed for each book, including the title, price, rating, and availability. This will help you identify the correct HTML elements to target with your scraping script.\n", + "\n", + "Make sure to build your scraping URL and logic based on the patterns you observe in the HTML structure of the book listings!" + ] + }, + { + "cell_type": "markdown", + "id": "25a83a0d-a742-49f6-985e-e27887cbf922", + "metadata": { + "id": "25a83a0d-a742-49f6-985e-e27887cbf922" + }, + "source": [ + "\n", + "---\n", + "\n", + "**Best of luck! 
Immerse yourself in the world of books, and may the data be with you!**" + ] + }, + { + "cell_type": "markdown", + "id": "7b75cf0d-9afa-4eec-a9e2-befeac68b2a0", + "metadata": { + "id": "7b75cf0d-9afa-4eec-a9e2-befeac68b2a0" + }, + "source": [ + "**Important Note**:\n", + "\n", + "In the fast-changing online world, websites often update and change their structures. When you try this lab, the **Books to Scrape** website might differ from what you expect.\n", + "\n", + "If you encounter issues due to these changes, like new rules or obstacles preventing data extraction, don’t worry! Get creative.\n", + "\n", + "You can choose another website that interests you and is suitable for scraping data. Options like Wikipedia, The New York Times, or even library databases are great alternatives. The main goal remains the same: extract useful data and enhance your web scraping skills while exploring a source of information you enjoy. This is your opportunity to practice and adapt to different web environments!" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "40359eee-9cd7-4884-bfa4-83344c222305", + "metadata": { + "id": "40359eee-9cd7-4884-bfa4-83344c222305" + }, + "outputs": [ { - "cell_type": "markdown", - "id": "25a83a0d-a742-49f6-985e-e27887cbf922", - "metadata": { - "id": "25a83a0d-a742-49f6-985e-e27887cbf922" - }, - "source": [ - "\n", - "---\n", - "\n", - "**Best of luck! 
Immerse yourself in the world of books, and may the data be with you!**" + "data": { + "text/plain": [ + "" ] - }, + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import requests\n", + "\n", + "from bs4 import BeautifulSoup\n", + "\n", + "url = \"https://books.toscrape.com/index.html\"\n", + "response = requests.get(url)\n", + "soup = BeautifulSoup(response.content, \"html.parser\")\n", + "response\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "45364471-5c97-44a1-98cd-8dace00fa66b", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "7b75cf0d-9afa-4eec-a9e2-befeac68b2a0", - "metadata": { - "id": "7b75cf0d-9afa-4eec-a9e2-befeac68b2a0" - }, - "source": [ - "**Important Note**:\n", - "\n", - "In the fast-changing online world, websites often update and change their structures. When you try this lab, the **Books to Scrape** website might differ from what you expect.\n", - "\n", - "If you encounter issues due to these changes, like new rules or obstacles preventing data extraction, don’t worry! Get creative.\n", - "\n", - "You can choose another website that interests you and is suitable for scraping data. Options like Wikipedia, The New York Times, or even library databases are great alternatives. The main goal remains the same: extract useful data and enhance your web scraping skills while exploring a source of information you enjoy. This is your opportunity to practice and adapt to different web environments!" 
+ "data": { + "text/plain": [ + "{'Date': 'Thu, 11 Sep 2025 18:55:47 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Last-Modified': 'Wed, 08 Feb 2023 21:02:32 GMT', 'ETag': 'W/\"63e40de8-c85e\"', 'Strict-Transport-Security': 'max-age=0; includeSubDomains; preload', 'Content-Encoding': 'br'}" ] - }, + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "response.headers\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "4239adfb-bc21-4e4b-af15-5a469ecc2584", + "metadata": {}, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "id": "40359eee-9cd7-4884-bfa4-83344c222305", - "metadata": { - "id": "40359eee-9cd7-4884-bfa4-83344c222305" - }, - "outputs": [], - "source": [ - "# Your solution goes here" - ] + "name": "stdout", + "output_type": "stream", + "text": [ + "text/html\n" + ] } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" + ], + "source": [ + "print(response.headers['Content-Type'])" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "dea4cc41-b004-4421-a9b0-b8396eecab3f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting the scraping process...\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-2.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-3.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-4.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-5.html\n", + "Navigating to 
next page: https://books.toscrape.com/catalogue/page-6.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-7.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-8.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-9.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-10.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-11.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-12.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-13.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-14.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-15.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-16.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-17.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-18.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-19.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-20.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-21.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-22.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-23.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-24.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-25.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-26.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-27.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-28.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-29.html\n", + "Navigating to next page: 
https://books.toscrape.com/catalogue/page-30.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-31.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-32.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-33.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-34.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-35.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-36.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-37.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-38.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-39.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-40.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-41.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-42.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-43.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-44.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-45.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-46.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-47.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-48.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-49.html\n", + "Navigating to next page: https://books.toscrape.com/catalogue/page-50.html\n", + "Scraping process finished.\n", + "\n", + "Found 75 books with a rating of 4+ stars and price under £20.00\n", + "\n", + " UPC \\\n", + "0 ce6396b0f23f6ecc \n", + "1 6258a1f6a6dcfe50 \n", + "2 6be3beb0793a53e7 \n", + "3 657fe5ead67a7767 \n", + "4 51653ef291ab7ddc \n", + ".. ... 
\n", + "70 9c96cd1329fbd82d \n", + "71 b78deb463531d078 \n", + "72 4280ac3eab57aa5d \n", + "73 29fc016c459aeb14 \n", + "74 19fec36a1dfb4c16 \n", + "\n", + " Title Price (£) \\\n", + "0 Set Me Free 17.46 \n", + "1 The Four Agreements: A Practical Guide to Personal Freedom 17.66 \n", + "2 Sophie's World 15.94 \n", + "3 Untitled Collection: Sabbath Poems 2014 14.27 \n", + "4 This One Summer 19.49 \n", + ".. ... ... \n", + "70 The Zombie Room 19.69 \n", + "71 The Silent Wife 12.34 \n", + "72 The Girl You Lost 12.29 \n", + "73 The Edge of Reason (Bridget Jones #2) 19.18 \n", + "74 A Spy's Devotion (The Regency Spies of London #1) 16.97 \n", + "\n", + " Rating Genre Availability \\\n", + "0 5 Young Adult In Stock \n", + "1 5 Spirituality In Stock \n", + "2 5 Philosophy In Stock \n", + "3 4 Poetry In Stock \n", + "4 4 Sequential Art In Stock \n", + ".. ... ... ... \n", + "70 5 Default In Stock \n", + "71 5 Fiction In Stock \n", + "72 5 Mystery In Stock \n", + "73 4 Womens Fiction In Stock \n", + "74 5 Historical Fiction In Stock \n", + "\n", + " Description \n", + "0 Aaron Ledbetter’s future had been planned out for him since before he was born. Each year, the Ledbetter family vacation on Tybee Island gave Aaron a chance to briefly free himself from his family’s expectations. When he meets Jonas “Lucky” Luckett, a caricature artist in town with the traveling carnival, he must choose between the life that’s been mapped out for him, and Aaron Ledbetter’s future had been planned out for him since before he was born. Each year, the Ledbetter family vacation on Tybee Island gave Aaron a chance to briefly free himself from his family’s expectations. When he meets Jonas “Lucky” Luckett, a caricature artist in town with the traveling carnival, he must choose between the life that’s been mapped out for him, and the chance at true love. 
...more \n", + "1 In The Four Agreements, don Miguel Ruiz reveals the source of self-limiting beliefs that rob us of joy and create needless suffering. Based on ancient Toltec wisdom, the Four Agreements offer a powerful code of conduct that can rapidly transform our lives to a new experience of freedom, true happiness, and love. The Four Agreements are: Be Impeccable With Your Word, Don't In The Four Agreements, don Miguel Ruiz reveals the source of self-limiting beliefs that rob us of joy and create needless suffering. Based on ancient Toltec wisdom, the Four Agreements offer a powerful code of conduct that can rapidly transform our lives to a new experience of freedom, true happiness, and love. The Four Agreements are: Be Impeccable With Your Word, Don't Take Anything Personally, Don't Make Assumptions, Always Do Your Best. ...more \n", + "2 A page-turning novel that is also an exploration of the great philosophical concepts of Western thought, Sophie’s World has fired the imagination of readers all over the world, with more than twenty million copies in print.One day fourteen-year-old Sophie Amundsen comes home from school to find in her mailbox two notes, with one question on each: “Who are you?” and “Where A page-turning novel that is also an exploration of the great philosophical concepts of Western thought, Sophie’s World has fired the imagination of readers all over the world, with more than twenty million copies in print.One day fourteen-year-old Sophie Amundsen comes home from school to find in her mailbox two notes, with one question on each: “Who are you?” and “Where does the world come from?” From that irresistible beginning, Sophie becomes obsessed with questions that take her far beyond what she knows of her Norwegian village. Through those letters, she enrolls in a kind of correspondence course, covering Socrates to Sartre, with a mysterious philosopher, while receiving letters addressed to another girl. Who is Hilde? 
And why does her mail keep turning up? To unravel this riddle, Sophie must use the philosophy she is learning—but the truth turns out to be far more complicated than she could have imagined. ...more \n", + "3 More than thirty-five years ago, when the weather allowed, Wendell Berry began spending his sabbaths outdoors, walking and wandering around familiar territory, seeking a deep intimacy only time could provide. These walks arranged themselves into poems and each year since he has completed a sequence dated by the year of its composition. Last year we collected the lot into a More than thirty-five years ago, when the weather allowed, Wendell Berry began spending his sabbaths outdoors, walking and wandering around familiar territory, seeking a deep intimacy only time could provide. These walks arranged themselves into poems and each year since he has completed a sequence dated by the year of its composition. Last year we collected the lot into a collection, This Day, the Sabbath Poems 1979-2013. This new sequence for the following year is one of the richest yet. This group provides a virtual syllabus for all of Mr. Berry’s cultural and agricultural work in concentrated form. Many of these poems are drawn from the view from a small porch in the woods, a place of stillness and reflection, a vantage point “of the one/life of the forest composed/of uncountable lives in countless/years each life coherent itself within/ the coherence, the great composure,/of all.” A new collection of Wendell Berry poems is always an occasion of joyful celebration and this one is especially so. ...more \n", + "4 Every summer, Rose goes with her mom and dad to a lake house in Awago Beach. It's their getaway, their refuge. Rosie's friend Windy is always there, too, like the little sister she never had. But this summer is different. Rose's mom and dad won't stop fighting, and when Rose and Windy seek a distraction from the drama, they find themselves with a whole new set of problems. 
Every summer, Rose goes with her mom and dad to a lake house in Awago Beach. It's their getaway, their refuge. Rosie's friend Windy is always there, too, like the little sister she never had. But this summer is different. Rose's mom and dad won't stop fighting, and when Rose and Windy seek a distraction from the drama, they find themselves with a whole new set of problems. It's a summer of secrets and sorrow and growing up, and it's a good thing Rose and Windy have each other.In This One Summer two stellar creators redefine the teen graphic novel. Cousins Mariko and Jillian Tamaki, the team behind Skim, have collaborated on this gorgeous, heartbreaking, and ultimately hopeful story about a girl on the cusp of her teen age — a story of renewal and revelation. ...more \n", + ".. ... \n", + "70 An unlikely bond is forged between three men from very different backgrounds when they serve time together in prison. A series of wrong turns and disastrous life choices has led to their incarceration. Following their release, Mangle, Decker and Tazeem stick together as they return to a life of crime, embarking on a lucrative scam. But when they stumble upon a sophisticate An unlikely bond is forged between three men from very different backgrounds when they serve time together in prison. A series of wrong turns and disastrous life choices has led to their incarceration. Following their release, Mangle, Decker and Tazeem stick together as they return to a life of crime, embarking on a lucrative scam. But when they stumble upon a sophisticated sex-trafficking operation, they soon realise that they are in mortal danger. The disappearance of a family member and the murder of a dear friend lead the three to delve deeper into a world of violence and deception. In their quest for retribution and justice, they put their lives on the line. Their paths cross with that of Tatiana, who has left her home country for a better life in the West - or so she thinks. 
She soon realises she is in the hands of ruthless, violent people, who run an operation supplying girls to meet the most deviant desires of rich and powerful men. Will she survive the horrors of The Zombie Room? Are Mangle, Decker and Tazeem brave enough to follow her there, in an attempt to set her free? ...more \n", + "71 A chilling psychological thriller about a marriage, a way of life, and how far one woman will go to keep what is rightfully hersJodi and Todd are at a bad place in their marriage. Much is at stake, including the affluent life they lead in their beautiful waterfront condo in Chicago, as she, the killer, and he, the victim, rush haplessly toward the main event. He is a commi A chilling psychological thriller about a marriage, a way of life, and how far one woman will go to keep what is rightfully hersJodi and Todd are at a bad place in their marriage. Much is at stake, including the affluent life they lead in their beautiful waterfront condo in Chicago, as she, the killer, and he, the victim, rush haplessly toward the main event. He is a committed cheater. She lives and breathes denial. He exists in dual worlds. She likes to settle scores. He decides to play for keeps. She has nothing left to lose. Told in alternating voices, The Silent Wife is about a marriage in the throes of dissolution, a couple headed for catastrophe, concessions that can’t be made, and promises that won’t be kept. Expertly plotted and reminiscent of Gone Girl and These Things Hidden, The Silent Wife ensnares the reader from page one and does not let go. ...more \n", + "72 Eighteen years ago your baby daughter was snatched. Today, she came back. A sinister and darkly compelling psychological thriller from the No.1 bestselling author of The Girl With No Past. Eighteen years ago, Simone Porter’s six-month-old daughter, Helena, was abducted. 
Simone and husband, Matt, have slowly rebuilt their shattered lives, but the pain at losing their child Eighteen years ago your baby daughter was snatched. Today, she came back. A sinister and darkly compelling psychological thriller from the No.1 bestselling author of The Girl With No Past. Eighteen years ago, Simone Porter’s six-month-old daughter, Helena, was abducted. Simone and husband, Matt, have slowly rebuilt their shattered lives, but the pain at losing their child has never left them. Then a young woman, Grace, appears out of the blue and tells Simone she has information about her stolen baby. But just who is Grace – and can Simone trust her? When Grace herself disappears, Simone becomes embroiled in a desperate search for her daughter and the woman who has vital clues about her whereabouts. Simone is inching closer to the truth but it’ll take her into dangerous and disturbing territory. Simone lost her baby. Will she lose her life trying to find her? Read what people are saying about the Number One Bestseller, The Girl With No Past: ‘I read this in a day and found myself totally engaged with the plot. Kathryn Croft has pulled off a very accessible mystery, that exceeded my expectations and shows her talent. The ending was just right! Worth a read if you fancy a well paced mystery, on these dark autumnal nights.’ Northern Crime ‘Kept the tension and mystery going right until the end … An intense read that keeps you turning page after page.’ Crime Book Club ‘Wow! This book grabbed me from the very beginning! … To say this book is a page turner would be an understatement!’ Chat About Books ‘It kept me up all night and cost me my beauty sleep! I will get it out of the way immediately and tell you that this is one of the best thrillers I have read this year and it is fully deserving of my 5-star rating.’ Books Are Man’s Other Best Friend ‘BLIMEY. 
This book is GRIPPY - I sat and read it over the course of a day and a night, purely because I couldn't put it down.’ Reading Room with a View ‘The reader is kept guessing until the end. It's perfection for a thriller and the author does amazingly to keep our intrigue.’ Chic Toronto ‘Gripping, a real page turner… Excellent plot, and gripping stuff, that keeps the reader guessing until the end … raced to the end to find out what was happening and how it would all end … what a storyteller Kathryn Croft is!’ Emma’s Book Reviews ‘The concept of this book, and the story itself is phenomenal - honestly one of the best i have read, it really does outshine all other thrillers i have read.’ Afternoon Bookery ‘I really enjoyed this from start to finish. It's one of these books where you just HAVE to read JUST a couple more pages to see what is going to happen next. The author is very good at building suspense and revealing details bit by bit. It was totally unpredictable and had some very good twists. ...more \n", + "73 Monday 27 January“7:15 a.m. Hurrah! The wilderness years are over. For four weeks and five days now have been in functional relationship with adult male, thereby proving am not love pariah as recently feared.”Lurching from the cappuccino bars of Notting Hill to the blissed-out shores of Thailand, Bridget Jones searches for The Truth in spite of pathetically unevolved men, Monday 27 January“7:15 a.m. Hurrah! The wilderness years are over. For four weeks and five days now have been in functional relationship with adult male, thereby proving am not love pariah as recently feared.”Lurching from the cappuccino bars of Notting Hill to the blissed-out shores of Thailand, Bridget Jones searches for The Truth in spite of pathetically unevolved men, insane dating theories, and Smug Married advice (\"I'm just calling to say in the potty! In the potty! Well, do it in Daddy's hand then!\"). 
She experiences a zeitgeist-esque Spiritual Epiphany somewhere between the pages of How to Find the Love You Want Without Seeking It (\"can self-help books really help self?\"), protective custody, and a lightly chilled Chardonnay.Wednesday 5 March“7:08 p.m. Am assured, receptive, responsive woman of substance. My sense of self comes not from other people but... from... myself? That can’t be right.”With another devastatingly hilarious, ridiculous, unnervingly accurate take on modern womanhood, Bridget Jones is back. ...more \n", + "74 In England’s Regency era, manners and elegance reign in public life—but behind closed doors treason and tawdriness thrive. Nicholas Langdon is no stranger to reserved civility or bloody barbarity. After suffering a battlefield injury, the wealthy, well-connected British officer returns home to heal—and to fulfill a dying soldier’s last wish by delivering his coded diary.At In England’s Regency era, manners and elegance reign in public life—but behind closed doors treason and tawdriness thrive. Nicholas Langdon is no stranger to reserved civility or bloody barbarity. After suffering a battlefield injury, the wealthy, well-connected British officer returns home to heal—and to fulfill a dying soldier’s last wish by delivering his coded diary.At the home of the Wilherns, one of England’s most powerful families, Langdon attends a lavish ball where he meets their beautiful and intelligent ward, Julia Grey. Determined to maintain propriety, he keeps his distance—until the diary is stolen and all clues lead to Julia’s guardian. As Langdon traces an evil plot that could be the nation’s undoing, he grows ever more intrigued by the lovely young woman. And when Julia realizes that England—and the man she is falling in love with—need her help, she finds herself caught in the fray. Will the two succumb to their attraction while fighting to save their country? 
...more \n", + "\n", + "[75 rows x 7 columns]\n" + ] } + ], + "source": [ + "import requests\n", + "from bs4 import BeautifulSoup\n", + "import pandas as pd\n", + "from urllib.parse import urljoin\n", + "\n", + "def scrape_books(min_rating, max_price):\n", + " \"\"\"\n", + " Scrapes book data from the \"Books to Scrape\" website based on minimum rating and maximum price.\n", + "\n", + " Args:\n", + " min_rating (int): The minimum rating (1-5) of books to include.\n", + " max_price (float): The maximum price of books to include.\n", + "\n", + " Returns:\n", + " pandas.DataFrame: A DataFrame containing the scraped book data with columns:\n", + " 'UPC', 'Title', 'Price (£)', 'Rating', 'Genre',\n", + " 'Availability', and 'Description'.\n", + " \"\"\"\n", + " base_url = \"https://books.toscrape.com/catalogue/\"\n", + " current_url = urljoin(base_url, 'page-1.html')\n", + " scraped_books = []\n", + " \n", + " # Map star ratings from text to integers\n", + " rating_map = {\"One\": 1, \"Two\": 2, \"Three\": 3, \"Four\": 4, \"Five\": 5}\n", + "\n", + " print(\"Starting the scraping process...\")\n", + "\n", + " while current_url:\n", + " try:\n", + " response = requests.get(current_url)\n", + " response.raise_for_status() # Raise an exception for bad status codes\n", + " soup = BeautifulSoup(response.content, 'html.parser')\n", + "\n", + " # Find all book articles on the current page\n", + " books_on_page = soup.find_all('article', class_='product_pod')\n", + "\n", + " for book in books_on_page:\n", + " # Get the link to the book's detail page\n", + " book_link = book.find('h3').find('a')['href']\n", + " book_url = urljoin(base_url, book_link)\n", + "\n", + " # Scrape the individual book page\n", + " try:\n", + " book_response = requests.get(book_url)\n", + " book_response.raise_for_status()\n", + " book_soup = BeautifulSoup(book_response.content, 'html.parser')\n", + "\n", + " # --- Initial Extraction for Filtering ---\n", + " # Extract rating\n", + " rating_p = 
book_soup.find('p', class_='star-rating')\n", + " rating_text = rating_p['class'][1] if rating_p else 'Zero'\n", + " rating = rating_map.get(rating_text, 0)\n", + " \n", + " # Extract price\n", + " price_str = book_soup.find('p', class_='price_color').text.replace('£', '')\n", + " price = float(price_str)\n", + "\n", + " # --- Filtering ---\n", + " if rating >= min_rating and price <= max_price:\n", + " # --- Detailed Extraction for Filtered Books ---\n", + " # Extract title\n", + " title = book_soup.find('h1').text\n", + "\n", + " # Extract product information table data\n", + " table = book_soup.find('table', class_='table-striped')\n", + " rows = table.find_all('tr')\n", + " upc = rows[0].find('td').text\n", + " availability_str = rows[5].find('td').text\n", + " availability = 'In stock' in availability_str\n", + " \n", + " # Extract genre from breadcrumbs\n", + " genre = book_soup.find('ul', class_='breadcrumb').find_all('li')[2].find('a').text\n", + " \n", + " # Extract description\n", + " description_tag = book_soup.find('div', id='product_description')\n", + " description = description_tag.find_next_sibling('p').text if description_tag else \"No description available.\"\n", + "\n", + " scraped_books.append({\n", + " \"UPC\": upc,\n", + " \"Title\": title,\n", + " \"Price (£)\": price,\n", + " \"Rating\": rating,\n", + " \"Genre\": genre,\n", + " \"Availability\": \"In Stock\" if availability else \"Out of Stock\",\n", + " \"Description\": description\n", + " })\n", + " \n", + " except requests.exceptions.RequestException as e:\n", + " print(f\"Could not process book URL {book_url}: {e}\")\n", + " except (AttributeError, KeyError, IndexError) as e:\n", + " print(f\"Error parsing details for a book at {book_url}: {e}\")\n", + "\n", + " # Find the next page link\n", + " next_page_tag = soup.find('li', class_='next')\n", + " if next_page_tag:\n", + " next_page_link = next_page_tag.find('a')['href']\n", + " current_url = urljoin(base_url, next_page_link)\n", + " 
print(f\"Navigating to next page: {current_url}\")\n", + " else:\n", + " current_url = None # No more pages\n", + "\n", + " except requests.exceptions.RequestException as e:\n", + " print(f\"Could not retrieve page {current_url}: {e}\")\n", + " break\n", + " \n", + " print(\"Scraping process finished.\")\n", + " \n", + " # Create a pandas DataFrame\n", + " df = pd.DataFrame(scraped_books, columns=[\n", + " \"UPC\", \"Title\", \"Price (£)\", \"Rating\", \"Genre\", \n", + " \"Availability\", \"Description\"\n", + " ])\n", + " \n", + " return df\n", + "\n", + "# --- Main execution block ---\n", + "if __name__ == '__main__':\n", + " # Set the desired filter criteria\n", + " MINIMUM_RATING = 4\n", + " MAXIMUM_PRICE = 20.00\n", + " \n", + " # Execute the scraping function\n", + " filtered_books_df = scrape_books(min_rating=MINIMUM_RATING, max_price=MAXIMUM_PRICE)\n", + " \n", + " # Display the results\n", + " print(f\"\\nFound {len(filtered_books_df)} books with a rating of {MINIMUM_RATING}+ stars and price under £{MAXIMUM_PRICE:.2f}\\n\")\n", + " \n", + " if not filtered_books_df.empty:\n", + " # To display all columns without truncation\n", + " pd.set_option('display.max_columns', None)\n", + " # To display the full description without truncation\n", + " pd.set_option('display.max_colwidth', None)\n", + " print(filtered_books_df)\n", + " else:\n", + " print(\"No books found matching the specified criteria.\")\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da200c40-edb1-4190-ba6f-8cf38cd1d029", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python [conda env:base] *", + "language": "python", + "name": "conda-base-py" }, - "nbformat": 4, - "nbformat_minor": 5 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + 
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 }