ALIYA2Group/Mod20_Segment_3

Mod20_Segment_3

During this segment, all of the individual pieces we've been building will come together. Once we have assembled these pieces, then we can start putting the final touches on the repository. Does the README.md fully describe the project and its purpose? Is the repository ready to be added to your portfolio? Details like these are part of "plugging it in" because each piece is vital to the final presentation.

Purpose

Third Segment: Plug It In: Connect your final database to your model, continue to train your model, and create your dashboard and presentation.

Presentation

Selected topic

We wanted to see if we could predict the melting of Arctic sea ice using SARIMAX, a time-series predictive model, in Python.


Reason topic was selected

We selected this topic because we wanted to predict at what point in time the Arctic sea ice would shrink. It is a topic that all the team members felt passionate about and wanted to explore.

Description of the source of data

We reviewed data from the following resources and identified the most useful features to be used in our analysis:

  1. National Snow and Ice Data Center (NSIDC)
  2. Climate Data Store
  3. Visualize Arctic and Antarctic Sea Ice

The question the team hopes to answer with the data

Based on past scientific observations and calculations of melting and atmospheric change, we extract, transform, and load the selected trends to show the sea ice diminishing from 2003 to 2020, and use them to predict at what point in the future these factors will impact the melting of the Arctic sea ice.

Description of the data exploration phase of the project

  • Extract

Through data exploration, we determined that we wanted to use Arctic data only, as it is melting faster than the Antarctic.


Description of the analysis phase of the project and data processing

  • Transform

We had to understand the science behind the data in order to know which features of the data set to use. After transforming the applicable features of the data set, we reviewed the trends.


  • Technologies, languages, tools, and algorithms used throughout the project

Google Slides

GitHub

Main Branch

All code in the main branch is production-ready.


  • All code necessary to perform exploratory analysis
  • All code necessary to complete the machine learning portion of the project

Commits per Segment

  1. Segment 1 - Commits
  2. Segment 2 - Commits
  3. Segment 3 - Commits

Machine Learning Model

Data pre-processing included importing the dataset from AWS using SQLAlchemy, dropping unwanted columns, and setting the date as the index.
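The pre-processing steps can be sketched as follows. The table and column names are illustrative placeholders, and an in-memory SQLite database stands in for the project's AWS database:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the project's AWS database;
# the table and column names are illustrative, not from the source.
engine = create_engine("sqlite://")
pd.DataFrame({
    "date": pd.date_range("2003-01-01", periods=4, freq="MS"),
    "extent": [13.9, 14.6, 14.8, 13.7],
    "unused_note": ["a", "b", "c", "d"],
}).to_sql("arctic_sea_ice", engine, index=False)

# Import the dataset, drop unwanted columns, and set the date as the index
df = pd.read_sql_table("arctic_sea_ice", engine, parse_dates=["date"])
df = df.drop(columns=["unused_note"]).set_index("date").sort_index()
```

Setting the date as the index is what lets pandas and statsmodels treat the table as a time series downstream.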

Description of feature engineering and the feature selection, including the decision-making process

The selected features were visualized as time series and plotted against the target (extent) to understand the correlations.


Description of how data was split into training and testing sets

Data was split into training and testing sets using a 70-30 ratio and the scikit-learn library.
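A minimal sketch of the split on a toy series; `shuffle=False` preserves chronological order, which matters for time-series data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy monthly series standing in for the project's feature table
df = pd.DataFrame({"extent": np.arange(100.0)},
                  index=pd.date_range("2003-01-01", periods=100, freq="MS"))

# 70-30 split; shuffle=False keeps chronological order so the model
# trains on the past and is tested on the future
train, test = train_test_split(df, test_size=0.3, shuffle=False)
```

With 100 rows this yields 70 training and 30 testing observations, with every training date earlier than every testing date.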


Description of how we have trained the model thus far, and any additional training that will take place

The model was trained using SARIMAX. After splitting the data into training and testing sets:

1- Decomposed the time series into several components: trend, seasonality, and random noise


2- Checked for data stationarity using the Augmented Dickey-Fuller (ADF) test. If we make the data stationary, the model can make predictions on the assumption that the mean and variance will remain the same in the future, and a stationary series is easier to predict. Series that were not stationary were differenced to make them stationary.

3- Plotted ACF and PACF bar charts. The ACF is a plot of the correlation coefficients between a time series and its lags and helps determine the value of q, the MA term, while the PACF is a plot of the partial correlation coefficients between the series and its lags and helps determine the value of p, the AR term. Both p and q are required input parameters for the SARIMAX model.


4- Ran the SARIMAX model to forecast the extent, using the order obtained from the ARIMA model and the training-set features as the exogenous variables

5- Fitted the model; the trained and tested data were put into a dataframe (converted back to the original scale)


Description of the current error score

The root mean square error (RMSE): 0.08529

This low RMSE indicates that the model performs well.
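RMSE can be computed as in the following sketch (the values are illustrative, not the project's):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative actual vs. predicted extents on the original scale
actual = np.array([13.2, 12.9, 12.7, 12.8])
predicted = np.array([13.1, 13.0, 12.6, 12.9])

# RMSE: the square root of the mean squared error
rmse = np.sqrt(mean_squared_error(actual, predicted))  # ≈ 0.1
```

Because RMSE is on the same scale as the target, a value of 0.085 is small relative to sea-ice extents in the double digits.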


Future Prediction:

We successfully completed a prediction model to forecast the features of the Arctic sea ice melt:

  1. A univariate time-series model was applied to each of the features to estimate its future values, which were put into a dataframe


  2. Using the predicted values of the features, we used the model to predict the values of Y (extent):


Explanation of model choice, including limitations and benefits

Given the nature of the data and the question we are trying to answer, we used the time-series prediction model SARIMAX (Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors). It predicts future values using auto-regression and moving averages, while also accounting for seasonality.

ARIMA Model - Using an ARIMA model, we can forecast a time series from its past values. We built an optimal ARIMA model from scratch and extended it to the Seasonal ARIMA (SARIMA) and SARIMAX models.

SARIMAX is the seasonal version of the ARIMA model family, extended with exogenous variables.

Explanation of changes in model choice

A number of different models were tried and tested; the changes between Segment 2 and Segment 3 were as follows:

  1. Method using vector autoregression (VAR)

  2. Method using time-series forecasting with TensorFlow, including convolutional and recurrent neural networks (CNN and RNN)

Database Integration

There are no deliverables for the database integration section of the project for this segment.

Dashboard

The dashboard plan includes the following:

  • Using Beautiful Soup and Splinter to scrape the news from the NSIDC website (interactive element)
  • The scraping script runs as a Google App Engine cron task, so the scraping happens automatically every day (interactive element)
  • Store the data in MongoDB
  • Deploy the web page to Google App Engine
  • The website uses Flask and PyMongo to read and display the data from MongoDB
  • Images and databases available to view and download
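The scraping step can be sketched with Beautiful Soup; the HTML here is a hard-coded stand-in for the page Splinter would fetch, and the MongoDB insert is shown only as a comment:

```python
from bs4 import BeautifulSoup

# Hard-coded HTML standing in for the page Splinter would fetch
html = """
<div class="news">
  <h3>Arctic sea ice news</h3>
  <p>Sea ice extent update.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
headline = soup.find("h3").get_text(strip=True)
summary = soup.find("p").get_text(strip=True)

# In the deployed pipeline each record would go to MongoDB, e.g.:
# collection.insert_one({"headline": headline, "summary": summary})
```

Running this daily via a cron task keeps the MongoDB collection that the Flask site reads from up to date.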


Preview of Final Project Website:
