From 6eb92d730f81c366658b0594e55682fbf91d23dd Mon Sep 17 00:00:00 2001
From: Rodrigo Mendes <rodrigorangelmendes@gmail.com>
Date: Tue, 16 Sep 2025 20:34:13 +0100
Subject: [PATCH] Complete map/filter/reduce challenges in main.ipynb

---
 your-code/main.ipynb | 288 +++++--------------------------------------
 1 file changed, 31 insertions(+), 257 deletions(-)
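
Note: this patch defines `word_filter_case` but builds `prophet_filter` with the
case-sensitive `word_filter`. The sketch below runs the same map, flatten,
filter, and reduce steps on a small made-up word list, using the
case-insensitive variant. The `sample` list and the names `cleaned`, `flat`,
`kept`, and `result` are assumptions for illustration only; they are not part
of the notebook or the book text.

```python
from functools import reduce

# Hypothetical sample standing in for the split book text (not real data).
sample = ['The{1}', 'prophet\nAlmustafa,', 'and', 'a', 'chosen', 'AND', 'beloved{2}']

def reference(x):
    # Keep only the text before the first '{' (drops markers like '{7}').
    return x.split('{')[0]

def line_break(x):
    # Split on line breaks; always returns a list of strings.
    return x.split('\n')

def word_filter_case(x):
    # True for words to keep; the comparison ignores case.
    word_list = ['and', 'the', 'a', 'an']
    return x.lower() not in word_list

def concat_space(a, b):
    # Join two strings with a single space.
    return a + ' ' + b

cleaned = list(map(reference, sample))
flat = [w for parts in map(line_break, cleaned) for w in parts]
kept = list(filter(word_filter_case, flat))
result = reduce(concat_space, kept)
print(result)
```

With the case-sensitive `word_filter`, 'The' and 'AND' would stay in the output.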

diff --git a/your-code/main.ipynb b/your-code/main.ipynb
index 9f0e67b..dc60fff 100644
--- a/your-code/main.ipynb
+++ b/your-code/main.ipynb
@@ -12,14 +12,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [
     "# Import reduce from functools, numpy and pandas\n",
     "from functools import reduce\n",
-    "import numpy\n",
-    "import pandas"
+    "import numpy as np\n",
+    "import pandas as pd"
    ]
   },
   {
@@ -29,187 +29,48 @@
     "# Challenge 1 - Mapping\n",
     "\n",
     "#### We will use the map function to clean up words in a book.\n",
-    "\n",
     "In the following cell, we will read a text file containing the book The Prophet by Khalil Gibran."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Run this code:\n",
-    "\n",
     "location = '../data/58585-0.txt'\n",
-    "with open(location, 'r', encoding=\"utf8\") as f:\n",
-    "    prophet = f.read().split(' ')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "len(prophet)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Let's remove the first 568 words since they contain information about the book but are not part of the book itself. \n",
-    "\n",
-    "Do this by removing from `prophet` elements 0 through 567 of the list (you can also do this by keeping elements 568 through the last element)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "If you look through the words, you will find that many words have a reference attached to them. For example, let's look at words 1 through 10."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### The next step is to create a function that will remove references. \n",
+    "with open(location, 'r', encoding='utf8') as f:\n",
+    "    prophet = f.read().split(' ')\n",
     "\n",
-    "We will do this by splitting the string on the `{` character and keeping only the part before this character. Write your function below."
+    "# Remove the first 568 words (book metadata, not part of the text)\n",
+    "prophet = prophet[568:]"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Strip reference markers such as '{7}' by keeping the text before '{'\n",
     "def reference(x):\n",
-    "    '''\n",
-    "    Input: A string\n",
-    "    Output: The string with references removed\n",
-    "    \n",
-    "    Example:\n",
-    "    Input: 'the{7}'\n",
-    "    Output: 'the'\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now that we have our function, use the `map()` function to apply this function to our book, The Prophet. Return the resulting list to a new list called `prophet_reference`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Another thing you may have noticed is that some words contain a line break. Let's write a function to split those words. Our function will return the string split on the character `\\n`. Write your function in the cell below."
+    "    return x.split('{')[0]\n",
+    "\n",
+    "prophet_reference = list(map(reference, prophet))"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Split a word on line breaks and return the pieces as a list\n",
     "def line_break(x):\n",
-    "    '''\n",
-    "    Input: A string\n",
-    "    Output: A list of strings split on the line break (\\n) character\n",
-    "        \n",
-    "    Example:\n",
-    "    Input: 'the\\nbeloved'\n",
-    "    Output: ['the', 'beloved']\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Apply the `line_break` function to the `prophet_reference` list. Name the new list `prophet_line`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "If you look at the elements of `prophet_line`, you will see that the function returned lists and not strings. Our list is now a list of lists. Flatten the list using list comprehension. Assign this new list to `prophet_flat`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "prophet_flat = [i for sub in prophet_line for i in sub]\n",
-    "prophet_flat"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
+    "    return x.split('\\n')\n",
+    "\n",
+    "prophet_line = list(map(line_break, prophet_reference))\n",
+    "prophet_flat = [i for sub in prophet_line for i in sub]"
    ]
   },
   {
@@ -217,74 +78,32 @@
    "metadata": {},
    "source": [
     "# Challenge 2 - Filtering\n",
-    "\n",
-    "When printing out a few words from the book, we see that there are words that we may not want to keep if we choose to analyze the corpus of text. Below is a list of words that we would like to get rid of. Create a function that will return false if it contains a word from the list of words specified and true otherwise."
+    "Filter the stop words 'and', 'the', 'a' and 'an' out of the flattened word list."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [],
    "source": [
     "def word_filter(x):\n",
-    "    '''\n",
-    "    Input: A string\n",
-    "    Output: True if the word is not in the specified list \n",
-    "    and False if the word is in the list.\n",
-    "        \n",
-    "    Example:\n",
-    "    word list = ['and', 'the']\n",
-    "    Input: 'and'\n",
-    "    Output: False\n",
-    "    \n",
-    "    Input: 'John'\n",
-    "    Output: True\n",
-    "    '''\n",
-    "    \n",
     "    word_list = ['and', 'the', 'a', 'an']\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Use the `filter()` function to filter out the words speficied in the `word_filter()` function. Store the filtered list in the variable `prophet_filter`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Bonus Challenge\n",
+    "    return x not in word_list\n",
     "\n",
-    "Rewrite the `word_filter` function above to not be case sensitive."
+    "prophet_filter = list(filter(word_filter, prophet_flat))"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Bonus: case-insensitive version of word_filter\n",
     "def word_filter_case(x):\n",
-    "   \n",
     "    word_list = ['and', 'the', 'a', 'an']\n",
-    "    \n",
-    "    # your code here"
+    "    return x.lower() not in word_list"
    ]
   },
   {
@@ -292,56 +111,19 @@
    "metadata": {},
    "source": [
     "# Challenge 3 - Reducing\n",
-    "\n",
-    "#### Now that we have significantly cleaned up our text corpus, let's use the `reduce()` function to put the words back together into one long string separated by spaces. \n",
-    "\n",
-    "We will start by writing a function that takes two strings and concatenates them together with a space between the two strings."
+    "Use `reduce()` to join the filtered words into a single string separated by spaces."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 7,
    "metadata": {},
    "outputs": [],
    "source": [
     "def concat_space(a, b):\n",
-    "    '''\n",
-    "    Input:Two strings\n",
-    "    Output: A single string separated by a space\n",
-    "        \n",
-    "    Example:\n",
-    "    Input: 'John', 'Smith'\n",
-    "    Output: 'John Smith'\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Use the function above to reduce the text corpus in the list `prophet_filter` into a single string. Assign this new string to the variable `prophet_string`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
+    "    return a + ' ' + b\n",
+    "\n",
+    "prophet_string = reduce(concat_space, prophet_filter)"
    ]
   }
  ],
@@ -352,15 +134,7 @@
    "name": "python3"
   },
   "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
    "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
    "version": "3.9.13"
   }
  },