ALIYA2Group/Mod20_Segment_3

Mod20_Segment_3

During this segment, all of the individual pieces we've been building will come together. Once we have assembled these pieces, then we can start putting the final touches on the repository. Does the README.md fully describe the project and its purpose? Is the repository ready to be added to your portfolio? Details like these are part of "plugging it in" because each piece is vital to the final presentation.

Purpose

Third Segment: Plug It In: Connect your final database to your model, continue to train your model, and create your dashboard and presentation.

Presentation

Selected topic

We wanted to see if we could predict the melting of Arctic sea ice using SARIMAX, a time-series predictive model, in Python.


Reason topic was selected

We selected this topic because we wanted to predict at what point in time the Arctic sea ice would shrink. It is a topic that all the team members felt passionate about and wanted to explore.

Description of the source of data

We reviewed data from the following resources and identified the most useful features to be used in our analysis:

  1. National Snow and Ice Data Center (NSIDC)
  2. Climate Data Store
  3. Visualize Arctic and Antarctic Sea Ice

The question the team hopes to answer with the data

Based on past scientific observations and calculations of melting and atmospheric change, we extract, transform, and load the selected trends to show the sea ice diminishing from 2003 to 2020, and use them to predict at what point in the future these factors will impact the melting of the Arctic sea ice.

Description of the data exploration phase of the project

  • Extract

Through data exploration, we determined that we wanted to use Arctic data only, as it is melting faster than the Antarctic.


Description of the analysis phase of the project and data processing

  • Transform

We had to understand the science behind the data in order to know which features of the data set to use. After transforming the applicable features of the data set, we reviewed the trends.


  • Technologies, languages, tools, and algorithms used throughout the project

Google Slides

GitHub

Main Branch

All code in the main branch is production-ready.


  • All code necessary to perform exploratory analysis
  • All code necessary to complete the machine learning portion of the project

Commits per Segment

  1. Segment 1 - Commits
  2. Segment 2 - Commits
  3. Segment 3 - Commits

Machine Learning Model

Data pre-processing included importing the dataset from AWS using SQLAlchemy, dropping unwanted columns, and setting the date as the index.
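The pre-processing steps can be sketched as follows. The table and column names are illustrative placeholders, and an in-memory SQLite database stands in for the project's AWS database:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the project's AWS database;
# the table and column names are illustrative, not from the source.
engine = create_engine("sqlite://")
pd.DataFrame({
    "date": pd.date_range("2003-01-01", periods=4, freq="MS"),
    "extent": [13.9, 14.6, 14.8, 13.7],
    "unused_note": ["a", "b", "c", "d"],
}).to_sql("arctic_sea_ice", engine, index=False)

# Import the dataset, drop unwanted columns, and set the date as the index
df = pd.read_sql_table("arctic_sea_ice", engine, parse_dates=["date"])
df = df.drop(columns=["unused_note"]).set_index("date").sort_index()
```

Setting the date as the index is what lets pandas and statsmodels treat the table as a time series downstream.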

Description of feature engineering and the feature selection, including the decision-making process

The selected features were visualized as time series and plotted against the target (extent) to understand the correlations.


Description of how data was split into training and testing sets

Data was split into training and testing sets using a 70-30 ratio and the scikit-learn library.
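A minimal sketch of the split on a toy series; `shuffle=False` preserves chronological order, which matters for time-series data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy monthly series standing in for the project's feature table
df = pd.DataFrame({"extent": np.arange(100.0)},
                  index=pd.date_range("2003-01-01", periods=100, freq="MS"))

# 70-30 split; shuffle=False keeps chronological order so the model
# trains on the past and is tested on the future
train, test = train_test_split(df, test_size=0.3, shuffle=False)
```

With 100 rows this yields 70 training and 30 testing observations, with every training date earlier than every testing date.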


Description of how we have trained the model thus far, and any additional training that will take place

The model was trained using SARIMAX. After splitting the data into training and testing sets:

1- Decomposed the time series into several components: trend, seasonality, and random noise


2- Checked for data stationarity using the Augmented Dickey-Fuller (ADF) test. If we make the data stationary, the model can make predictions on the assumption that the mean and variance will remain the same in the future, and a stationary series is easier to predict. Series that were not stationary were differenced to make them stationary.

3- Plotted ACF and PACF bar charts. The ACF is a plot of the correlation coefficients between a time series and its lags and helps determine the value of q, the MA term, while the PACF is a plot of the partial correlation coefficients between the series and its lags and helps determine the value of p, the AR term. Both p and q are required input parameters for the SARIMAX model.


4- Ran the SARIMAX model to forecast the extent, using the order obtained from the ARIMA model and the training-set features as the exogenous variables

5- Fitted the model; the trained and tested data were put into a dataframe (converted back to the original scale)


Description of the current error score

The root mean square error (RMSE): 0.08529

This low RMSE indicates that the model performs well.
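RMSE can be computed as in the following sketch (the values are illustrative, not the project's):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative actual vs. predicted extents on the original scale
actual = np.array([13.2, 12.9, 12.7, 12.8])
predicted = np.array([13.1, 13.0, 12.6, 12.9])

# RMSE: the square root of the mean squared error
rmse = np.sqrt(mean_squared_error(actual, predicted))  # ≈ 0.1
```

Because RMSE is on the same scale as the target, a value of 0.085 is small relative to sea-ice extents in the double digits.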


Future Prediction:

We successfully completed a prediction model to forecast the features of the Arctic sea ice melt:

  1. A univariate time-series model was applied to each of the features to estimate its future values, which were put into a dataframe


  2. Using the predicted values of the features, we used the model to predict the values of Y (extent):


Explanation of model choice, including limitations and benefits

Given the nature of the data and the question we are trying to answer, we used the time-series prediction model SARIMAX (Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors). It predicts future values using auto-regression and moving averages, while also accounting for seasonality.

ARIMA Model - Using an ARIMA model, we can forecast a time series from its past values. We built an optimal ARIMA model from scratch and extended it to the Seasonal ARIMA (SARIMA) and SARIMAX models.

SARIMAX is the seasonal version of the ARIMA model family, extended with exogenous variables.

Explanation of changes in model choice

A number of different models were tried and tested; the changes between Segment 2 and Segment 3 were as follows:

  1. Method using vector autoregression (VAR)

  2. Method using time-series forecasting with TensorFlow, including convolutional and recurrent neural networks (CNN and RNN)

Database Integration

There are no deliverables for the database integration section of the project for this segment.

Dashboard

The dashboard plan includes the following:

  • Using Beautiful Soup and Splinter to scrape the news from the NSIDC website (interactive element)
  • The scraping script runs as a Google App Engine cron task, so the scraping happens automatically every day (interactive element)
  • Store the data in MongoDB
  • Deploy the web page to Google App Engine
  • The website uses Flask and PyMongo to read and display the data from MongoDB
  • Images and databases available to view and download
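The scraping step can be sketched with Beautiful Soup; the HTML here is a hard-coded stand-in for the page Splinter would fetch, and the MongoDB insert is shown only as a comment:

```python
from bs4 import BeautifulSoup

# Hard-coded HTML standing in for the page Splinter would fetch
html = """
<div class="news">
  <h3>Arctic sea ice news</h3>
  <p>Sea ice extent update.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
headline = soup.find("h3").get_text(strip=True)
summary = soup.find("p").get_text(strip=True)

# In the deployed pipeline each record would go to MongoDB, e.g.:
# collection.insert_one({"headline": headline, "summary": summary})
```

Running this daily via a cron task keeps the MongoDB collection that the Flask site reads from up to date.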


Preview of Final Project Website:
