diff --git a/lab-hypothesis-testing (1).ipynb b/lab-hypothesis-testing (1).ipynb new file mode 100644 index 0000000..24f4d15 --- /dev/null +++ b/lab-hypothesis-testing (1).ipynb @@ -0,0 +1 @@ +{"cells":[{"cell_type":"markdown","metadata":{"id":"9yKDqwpZIB4b"},"source":["# Lab | Hypothesis Testing"]},{"cell_type":"markdown","metadata":{"id":"ubGOQIPaIB4d"},"source":["**Objective**\n","\n","Welcome to the Hypothesis Testing Lab, where we embark on an enlightening journey through the realm of statistical decision-making! In this laboratory, we delve into various scenarios, applying the powerful tools of hypothesis testing to scrutinize and interpret data.\n","\n","From testing the mean of a single sample (One Sample T-Test), to investigating differences between independent groups (Two Sample T-Test), and exploring relationships within dependent samples (Paired Sample T-Test), our exploration knows no bounds. Furthermore, we'll venture into the realm of Analysis of Variance (ANOVA), unraveling the complexities of comparing means across multiple groups.\n","\n","So, grab your statistical tools, prepare your hypotheses, and let's embark on this fascinating journey of exploration and discovery in the world of hypothesis testing!"]},{"cell_type":"markdown","metadata":{"id":"AvXjTD9fIB4f"},"source":["**Challenge 1**"]},{"cell_type":"markdown","metadata":{"id":"Nf4jTOWTIB4f"},"source":["In this challenge, we will be working with pokemon data. 
The data can be found here:\n","\n","- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/pokemon.csv"]},{"cell_type":"code","execution_count":5,"metadata":{"id":"6y2L200kIB4g","executionInfo":{"status":"ok","timestamp":1756140778028,"user_tz":-120,"elapsed":1044,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}}},"outputs":[],"source":["#libraries\n","import pandas as pd\n","import scipy.stats as st\n","import numpy as np\n","\n"]},{"cell_type":"code","execution_count":6,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":424},"id":"ZXrDfQTdIB4i","executionInfo":{"status":"ok","timestamp":1756140779623,"user_tz":-120,"elapsed":247,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"b7035cb0-6657-42be-b015-29b941310c58"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def \\\n","0 Bulbasaur Grass Poison 45 49 49 65 65 \n","1 Ivysaur Grass Poison 60 62 63 80 80 \n","2 Venusaur Grass Poison 80 82 83 100 100 \n","3 Mega Venusaur Grass Poison 80 100 123 122 120 \n","4 Charmander Fire NaN 39 52 43 60 50 \n",".. ... ... ... .. ... ... ... ... \n","795 Diancie Rock Fairy 50 100 150 100 150 \n","796 Mega Diancie Rock Fairy 50 160 110 160 110 \n","797 Hoopa Confined Psychic Ghost 80 110 60 150 130 \n","798 Hoopa Unbound Psychic Dark 80 160 60 170 130 \n","799 Volcanion Fire Water 80 110 120 130 90 \n","\n"," Speed Generation Legendary \n","0 45 1 False \n","1 60 1 False \n","2 80 1 False \n","3 80 1 False \n","4 65 1 False \n",".. ... ... ... \n","795 50 6 True \n","796 110 6 True \n","797 70 6 True \n","798 80 6 True \n","799 70 6 True \n","\n","[800 rows x 11 columns]"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n \"name\": \"df\",\n \"rows\": 800,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 799,\n \"samples\": [\n \"Hydreigon\",\n \"Beheeyem\",\n \"Arcanine\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 1\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Grass\",\n \"Fire\",\n \"Fairy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 2\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Poison\",\n \"Flying\",\n \"Steel\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HP\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 25,\n \"min\": 1,\n \"max\": 255,\n \"num_unique_values\": 94,\n \"samples\": [\n 106,\n 81,\n 170\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Attack\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 5,\n \"max\": 190,\n \"num_unique_values\": 111,\n \"samples\": [\n 79,\n 63,\n 52\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Defense\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31,\n \"min\": 5,\n \"max\": 230,\n \"num_unique_values\": 103,\n \"samples\": [\n 20,\n 88,\n 23\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Atk\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 10,\n \"max\": 194,\n \"num_unique_values\": 105,\n \"samples\": [\n 58,\n 150,\n 160\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. 
Def\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 27,\n \"min\": 20,\n \"max\": 230,\n \"num_unique_values\": 92,\n \"samples\": [\n 154,\n 45,\n 44\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Speed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 29,\n \"min\": 5,\n \"max\": 180,\n \"num_unique_values\": 108,\n \"samples\": [\n 113,\n 50,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Generation\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 6,\n \"num_unique_values\": 6,\n \"samples\": [\n 1,\n 2,\n 6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Legendary\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":6}],"source":["df = pd.read_csv(\"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/pokemon.csv\")\n","df"]},{"cell_type":"markdown","metadata":{"id":"vljVDHeZIB4j"},"source":["- We posit that Pokemons of type Dragon have, on average, more HP stats than Grass. 
Choose the proper test and, with 5% significance, comment on your findings."]},{"cell_type":"code","execution_count":7,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"1SKZjl2PIB4j","executionInfo":{"status":"ok","timestamp":1756140781207,"user_tz":-120,"elapsed":43,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"1e3fe284-d2ee-4fa6-ede4-f45f1009904f"},"outputs":[{"output_type":"stream","name":"stdout","text":["t-statistic: 3.3349632905124063\n","p-value (bilateral): 0.0015987219490841199\n","p-value (unilateral Dragon > Grass): 0.0007993609745420599\n","We reject the null hypothesis: Dragons tend to have more HP than Grass.\n"]}],"source":["# One-sided Welch's t-test on mean HP (primary type only, 'Type 1')\n","# H0: mean HP of Dragon <= mean HP of Grass\n","# H1: mean HP of Dragon > mean HP of Grass\n","\n","alpha = 0.05\n","\n","dragon_hp = df[df['Type 1'] == 'Dragon']['HP']\n","grass_hp = df[df['Type 1'] == 'Grass']['HP']\n","\n","# equal_var=False selects Welch's test, which does not assume equal variances\n","stat, p_value = st.ttest_ind(dragon_hp, grass_hp, equal_var=False)\n","\n","print(\"t-statistic:\", stat)\n","print(\"p-value (bilateral):\", p_value)\n","\n","# Convert the two-sided p-value to the one-sided p-value for H1: Dragon > Grass\n","p_value_one_sided = p_value / 2 if stat > 0 else 1 - (p_value / 2)\n","print(\"p-value (unilateral Dragon > Grass):\", p_value_one_sided)\n","\n","if p_value_one_sided < alpha:\n","    print(\"We reject the null hypothesis: Dragons tend to have more HP than Grass.\")\n","else:\n","    print(\"We are not able to reject the null hypothesis.\")"]},{"cell_type":"markdown","metadata":{"id":"ne2LGNxhIB4k"},"source":["- We posit that Legendary Pokemons have different stats (HP, Attack, Defense, Sp.Atk, Sp.Def, Speed) when compared with Non-Legendary. 
Choose the proper test and, with 5% significance, comment on your findings.\n"]},{"cell_type":"code","execution_count":8,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"U1I9gBhGIB4l","executionInfo":{"status":"ok","timestamp":1756140816326,"user_tz":-120,"elapsed":46,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"bd5de97b-d489-4323-c925-c692dab96d5c"},"outputs":[{"output_type":"stream","name":"stdout","text":["HP: t = 8.98, p = 0.0000\n"," → Reject H0: Significant difference.\n","\n","Attack: t = 10.44, p = 0.0000\n"," → Reject H0: Significant difference.\n","\n","Defense: t = 7.64, p = 0.0000\n"," → Reject H0: Significant difference.\n","\n","Sp. Atk: t = 13.42, p = 0.0000\n"," → Reject H0: Significant difference.\n","\n","Sp. Def: t = 10.02, p = 0.0000\n"," → Reject H0: Significant difference.\n","\n","Speed: t = 11.48, p = 0.0000\n"," → Reject H0: Significant difference.\n","\n"]}],"source":["# Two-sided Welch's t-test for each stat\n","# H0: Legendary and Non-Legendary have the same mean; H1: the means differ\n","stats_cols = [\"HP\", \"Attack\", \"Defense\", \"Sp. Atk\", \"Sp. Def\", \"Speed\"]\n","\n","legendary = df[df['Legendary']]\n","non_legendary = df[~df['Legendary']]\n","\n","alpha = 0.05\n","\n","for col in stats_cols:\n","    stat, p_value = st.ttest_ind(\n","        legendary[col],\n","        non_legendary[col],\n","        equal_var=False\n","    )\n","    print(f\"{col}: t = {stat:.2f}, p = {p_value:.4f}\")\n","    if p_value < alpha:\n","        print(\" → Reject H0: Significant difference.\\n\")\n","    else:\n","        print(\" → Fail to reject H0: No significant difference.\\n\")"]},{"cell_type":"markdown","metadata":{"id":"vnFngSMSIB4l"},"source":["**Challenge 2**"]},{"cell_type":"markdown","metadata":{"id":"_sJdAdhdIB4l"},"source":["In this challenge, we will be working with california-housing data. 
The data can be found here:\n","- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/california_housing.csv"]},{"cell_type":"code","execution_count":9,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"IswnQug2IB4m","executionInfo":{"status":"ok","timestamp":1756140827891,"user_tz":-120,"elapsed":409,"user":{"displayName":"Santiago Larrea","userId":"01956628674237458372"}},"outputId":"b9e747c4-af76-43cb-92bd-a39b88c34ff7"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" longitude latitude housing_median_age total_rooms total_bedrooms \\\n","0 -114.31 34.19 15.0 5612.0 1283.0 \n","1 -114.47 34.40 19.0 7650.0 1901.0 \n","2 -114.56 33.69 17.0 720.0 174.0 \n","3 -114.57 33.64 14.0 1501.0 337.0 \n","4 -114.57 33.57 20.0 1454.0 326.0 \n","\n"," population households median_income median_house_value \n","0 1015.0 472.0 1.4936 66900.0 \n","1 1129.0 463.0 1.8200 80100.0 \n","2 333.0 117.0 1.6509 85700.0 \n","3 515.0 226.0 3.1917 73400.0 \n","4 624.0 262.0 1.9250 65500.0 "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n \"name\": \"df\",\n \"rows\": 17000,\n \"fields\": [\n {\n \"column\": \"longitude\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2.005166408426173,\n \"min\": -124.35,\n \"max\": -114.31,\n \"num_unique_values\": 827,\n \"samples\": [\n -117.56,\n -123.32,\n -118.26\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"latitude\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2.1373397946570734,\n \"min\": 32.54,\n \"max\": 41.95,\n \"num_unique_values\": 840,\n \"samples\": [\n 38.44,\n 40.79,\n 32.69\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"housing_median_age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.586936981660335,\n \"min\": 1.0,\n \"max\": 52.0,\n \"num_unique_values\": 52,\n \"samples\": [\n 23.0,\n 52.0,\n 47.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_rooms\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2179.947071452768,\n \"min\": 2.0,\n \"max\": 37937.0,\n \"num_unique_values\": 5533,\n \"samples\": [\n 3564.0,\n 6955.0,\n 5451.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_bedrooms\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 421.49945157986514,\n \"min\": 1.0,\n \"max\": 6445.0,\n \"num_unique_values\": 1848,\n \"samples\": [\n 729.0,\n 719.0,\n 2075.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"population\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1147.852959159525,\n \"min\": 3.0,\n \"max\": 35682.0,\n \"num_unique_values\": 3683,\n \"samples\": [\n 249.0,\n 1735.0,\n 235.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"households\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 384.52084085590013,\n 
\"min\": 1.0,\n \"max\": 6082.0,\n \"num_unique_values\": 1740,\n \"samples\": [\n 390.0,\n 1089.0,\n 1351.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"median_income\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.908156518379093,\n \"min\": 0.4999,\n \"max\": 15.0001,\n \"num_unique_values\": 11175,\n \"samples\": [\n 7.2655,\n 5.6293,\n 4.2262\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"median_house_value\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 115983.76438720913,\n \"min\": 14999.0,\n \"max\": 500001.0,\n \"num_unique_values\": 3694,\n \"samples\": [\n 162300.0,\n 346800.0,\n 116700.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":9}],"source":["df = pd.read_csv(\"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/california_housing.csv\")\n","df.head()"]},{"cell_type":"markdown","metadata":{"id":"cTbI47gxIB4m"},"source":["**We posit that houses close to either a school or a hospital are more expensive.**\n","\n","- School coordinates (-118, 34)\n","- Hospital coordinates (-122, 37)\n","\n","We consider a house (neighborhood) to be close to a school or hospital if the distance is less than 0.50.\n","\n","Hint:\n","- Write a function to calculate the Euclidean distance from each house (neighborhood) to the school and to the hospital.\n","- Divide your dataset into houses close to and far from either a hospital or school.\n","- Choose the proper test and, with 5% significance, comment on your findings.\n",""]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VVdwx7f5IB4m"},"outputs":[],"source":["def euclidean_distance(lon1, lat1, lon2, lat2):\n","    # Straight-line distance in raw coordinate (degree) units, not kilometres\n","    return np.sqrt((lon1 - lon2)**2 + (lat1 - lat2)**2)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"d-gt-0xgIB4m"},"outputs":[],"source":["school = (-118, 34)\n","hospital = (-122, 37)\n","\n","# 
Distances\n","df['dist_school'] = euclidean_distance(df['longitude'], df['latitude'], school[0], school[1])\n","df['dist_hospital'] = euclidean_distance(df['longitude'], df['latitude'], hospital[0], hospital[1])\n","\n","# A house is \"close\" if it is within 0.5 of either the school or the hospital\n","df['close'] = ((df['dist_school'] < 0.5) | (df['dist_hospital'] < 0.5))\n","\n","close_prices = df[df['close']]['median_house_value']\n","far_prices = df[~df['close']]['median_house_value']\n","\n","# One-sided Welch's t-test. H0: mean(close) <= mean(far); H1: mean(close) > mean(far)\n","stat, p_value = st.ttest_ind(close_prices, far_prices, equal_var=False, alternative='greater')\n","\n","print(\"t-statistic:\", stat)\n","print(\"p-value (one-sided):\", p_value)\n","\n","alpha = 0.05\n","if p_value < alpha:\n","    print(\"Reject H0: Houses close to school/hospital are significantly more expensive.\")\n","else:\n","    print(\"Fail to reject H0: No evidence that close houses are more expensive.\")"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.9"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0} \ No newline at end of file diff --git a/lab-hypothesis-testing.ipynb b/lab-hypothesis-testing.ipynb deleted file mode 100644 index 0cc26d5..0000000 --- a/lab-hypothesis-testing.ipynb +++ /dev/null @@ -1,520 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Lab | Hypothesis Testing" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Objective**\n", - "\n", - "Welcome to the Hypothesis Testing Lab, where we embark on an enlightening journey through the realm of statistical decision-making! 
In this laboratory, we delve into various scenarios, applying the powerful tools of hypothesis testing to scrutinize and interpret data.\n", - "\n", - "From testing the mean of a single sample (One Sample T-Test), to investigating differences between independent groups (Two Sample T-Test), and exploring relationships within dependent samples (Paired Sample T-Test), our exploration knows no bounds. Furthermore, we'll venture into the realm of Analysis of Variance (ANOVA), unraveling the complexities of comparing means across multiple groups.\n", - "\n", - "So, grab your statistical tools, prepare your hypotheses, and let's embark on this fascinating journey of exploration and discovery in the world of hypothesis testing!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Challenge 1**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this challenge, we will be working with pokemon data. The data can be found here:\n", - "\n", - "- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/pokemon.csv" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "#libraries\n", - "import pandas as pd\n", - "import scipy.stats as st\n", - "import numpy as np\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def \\\n", - "0 Bulbasaur Grass Poison 45 49 49 65 65 \n", - "1 Ivysaur Grass Poison 60 62 63 80 80 \n", - "2 Venusaur Grass Poison 80 82 83 100 100 \n", - "3 Mega Venusaur Grass Poison 80 100 123 122 120 \n", - "4 Charmander Fire NaN 39 52 43 60 50 \n", - ".. ... ... ... .. ... ... ... ... \n", - "795 Diancie Rock Fairy 50 100 150 100 150 \n", - "796 Mega Diancie Rock Fairy 50 160 110 160 110 \n", - "797 Hoopa Confined Psychic Ghost 80 110 60 150 130 \n", - "798 Hoopa Unbound Psychic Dark 80 160 60 170 130 \n", - "799 Volcanion Fire Water 80 110 120 130 90 \n", - "\n", - " Speed Generation Legendary \n", - "0 45 1 False \n", - "1 60 1 False \n", - "2 80 1 False \n", - "3 80 1 False \n", - "4 65 1 False \n", - ".. ... ... ... \n", - "795 50 6 True \n", - "796 110 6 True \n", - "797 70 6 True \n", - "798 80 6 True \n", - "799 70 6 True \n", - "\n", - "[800 rows x 11 columns]" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv(\"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/pokemon.csv\")\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- We posit that Pokemons of type Dragon have, on average, more HP stats than Grass. Choose the propper test and, with 5% significance, comment your findings." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "#code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- We posit that Legendary Pokemons have different stats (HP, Attack, Defense, Sp.Atk, Sp.Def, Speed) when comparing with Non-Legendary. 
Choose the propper test and, with 5% significance, comment your findings.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "#code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Challenge 2**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this challenge, we will be working with california-housing data. The data can be found here:\n", - "- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/california_housing.csv" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " longitude latitude housing_median_age total_rooms total_bedrooms \\\n", - "0 -114.31 34.19 15.0 5612.0 1283.0 \n", - "1 -114.47 34.40 19.0 7650.0 1901.0 \n", - "2 -114.56 33.69 17.0 720.0 174.0 \n", - "3 -114.57 33.64 14.0 1501.0 337.0 \n", - "4 -114.57 33.57 20.0 1454.0 326.0 \n", - "\n", - " population households median_income median_house_value \n", - "0 1015.0 472.0 1.4936 66900.0 \n", - "1 1129.0 463.0 1.8200 80100.0 \n", - "2 333.0 117.0 1.6509 85700.0 \n", - "3 515.0 226.0 3.1917 73400.0 \n", - "4 624.0 262.0 1.9250 65500.0 " - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv(\"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/california_housing.csv\")\n", - "df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**We posit that houses close to either a school or a hospital are more expensive.**\n", - "\n", - "- School coordinates (-118, 34)\n", - "- Hospital coordinates (-122, 37)\n", - "\n", - "We consider a house (neighborhood) to be close to a school or hospital if the distance is lower than 0.50.\n", - "\n", - "Hint:\n", - "- Write a function to calculate euclidean distance from each house (neighborhood) to the school and to the hospital.\n", - "- Divide your dataset into houses close and far from either a hospital or school.\n", - "- Choose the propper test and, with 5% significance, comment your findings.\n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": 
"python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -}