@jotaylo jotaylo commented Mar 2, 2020

This change splits train.py into two files.

The new train.py is standalone and has no references to AzureML. It defines two functions: split_data, which splits a dataframe into test and train data, and train_model, which takes the test/train data and a parameter object, trains the model, and returns related metrics. The script can also be run locally, in which case it loads a dataset from a file.
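As a rough illustration of that interface, here is a minimal, dependency-free sketch. It is not the code from this PR: the argument names, the `(model, metrics)` return shape, the `alpha` parameter, and the pure-Python 1-D ridge fit are all assumptions made for the example, and the local entry point uses synthetic rows rather than loading a dataset from a file.

```python
import random
from statistics import mean


def split_data(rows, test_fraction=0.2, seed=0):
    """Shuffle (x, y) rows and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}


def train_model(data, params):
    """Fit a 1-D ridge regression; return the model and its test metrics.

    `params` is a plain dict here (an assumption); `alpha` is the ridge penalty.
    """
    alpha = params.get("alpha", 0.0)
    xs = [x for x, _ in data["train"]]
    ys = [y for _, y in data["train"]]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)
    intercept = my - slope * mx
    model = {"slope": slope, "intercept": intercept}
    # Mean squared error on the held-out rows.
    mse = mean((y - (slope * x + intercept)) ** 2 for x, y in data["test"])
    return model, {"mse": mse}


if __name__ == "__main__":
    # Local run: synthetic data stands in for a dataset loaded from a file.
    rows = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    data = split_data(rows)
    model, metrics = train_model(data, {"alpha": 0.0})
    print(model, metrics)
```

Because neither function touches AzureML, the same two calls can be made from a local script, a notebook, or a pipeline step.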

The second file, train_aml.py, contains reasonably general AzureML logic. It reads data from a dataset, then calls the split_data function from train.py. It loads input parameters from a config file and logs them, then calls train_model from train.py. Finally, it uploads the model and logs any metrics returned by train_model.
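The wrapper flow described above might look roughly like the sketch below. Everything here is hypothetical: `PrintLogger` stands in for the AzureML run object, the two small functions stand in for the ones imported from train.py, and the `"training"` config key and `scale` parameter are invented for illustration. Reading from an AzureML dataset and uploading the model are left out so the example stays runnable without the SDK.

```python
import json


class PrintLogger:
    """Hypothetical stand-in for an experiment-tracking run object."""

    def __init__(self):
        self.logged = {}

    def log(self, name, value):
        self.logged[name] = value
        print(f"{name}: {value}")


def split_data(rows):
    # Stand-in for train.split_data: last row is test, the rest is train.
    return {"train": rows[:-1], "test": rows[-1:]}


def train_model(data, params):
    # Stand-in for train.train_model: predicts a scaled mean of the targets.
    targets = [y for _, y in data["train"]]
    prediction = params["scale"] * sum(targets) / len(targets)
    errors = [(y - prediction) ** 2 for _, y in data["test"]]
    return {"prediction": prediction}, {"mse": sum(errors) / len(errors)}


def run_training(rows, config_text, logger):
    """Load parameters, log them, train, then log the returned metrics."""
    params = json.loads(config_text)["training"]
    for name, value in params.items():
        logger.log(name, value)
    data = split_data(rows)
    model, metrics = train_model(data, params)
    for name, value in metrics.items():
        logger.log(name, value)
    return model
```

The wrapper only knows that train_model returns a dictionary of metrics, so swapping in a different training script should not require changes to the logging code.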

The hope with these changes is to demonstrate a simple interface for integrating an existing ML script with MLOpsPython, and to provide an example of how the core ML functionality can be invoked in multiple ways for development purposes.

@dtzar dtzar left a comment

LGTM - I'd probably wait to merge this until you have the Jupyter notebook equivalent of train_aml.py fleshed out in conjunction with this.

@eedorenko eedorenko self-requested a review March 2, 2020 23:05
@tcare tcare mentioned this pull request Mar 3, 2020
@jotaylo jotaylo merged commit 39609ae into master Mar 5, 2020
@jotaylo jotaylo deleted the jotaylo/split_train_script branch March 5, 2020 01:40

5 participants