Added get_train_test_stratified_by_time_and_col #27272


Closed
wants to merge 1 commit into from

Conversation

bhavsarpratik

We often want to split a DataFrame into train and test sets by time while keeping the split stratified by a key. I suggest adding get_train_test_stratified_by_time_and_col, which does this.

stratify_split_col_name: The name of the key column whose values are split between train and test in a stratified way.
time_sort_col_name: The name of the column used to sort rows by time.
test_size: The size of the test set.
ascending: Whether to sort time_sort_col_name in ascending or descending order.
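
For illustration only, here is a minimal sketch of how a helper with these parameters could behave. It is not the code from this PR. It assumes that test_size is a fraction between 0 and 1, and that "stratified" means each key in stratify_split_col_name contributes the same share of its time-ordered rows to the test set. The default values are also assumptions.

```python
import pandas as pd


def get_train_test_stratified_by_time_and_col(
    df, stratify_split_col_name, time_sort_col_name, test_size=0.2, ascending=True
):
    """Hypothetical sketch: split each key's rows into train/test by time order."""
    # Sort by time; a stable sort keeps the original order among ties.
    ordered = df.sort_values(
        time_sort_col_name, ascending=ascending, kind="mergesort"
    )
    grouped = ordered.groupby(stratify_split_col_name, sort=False)

    # Position of each row within its key group, and the size of that group.
    position = grouped.cumcount()
    group_size = grouped[time_sort_col_name].transform("size")

    # Assumption: test_size is a fraction, and the last rows of each
    # group (in sorted order) go to the test set.
    n_train = (group_size * (1 - test_size)).round().astype(int)
    is_train = (position < n_train).to_numpy()
    return ordered[is_train], ordered[~is_train]


# Example with made-up data: two users with four events each.
events = pd.DataFrame(
    {
        "user": ["a", "b", "a", "b", "a", "b", "a", "b"],
        "t": [3, 1, 1, 4, 4, 2, 2, 3],
    }
)
train, test = get_train_test_stratified_by_time_and_col(
    events, "user", "t", test_size=0.25
)
```

With ascending=True the test set holds the most recent rows of each key, which is the usual setup when evaluating on data that comes after the training period.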

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks

Hello @bhavsarpratik! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 2480:1: E302 expected 2 blank lines, found 1
Line 2493:42: E226 missing whitespace around arithmetic operator
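
The flagged lines themselves are not shown in this conversation. As a generic illustration of what the two codes ask for, the snippet below uses hypothetical function names and follows both rules: two blank lines before each top-level definition (E302) and whitespace around arithmetic operators (E226).

```python
import math


def train_fraction(test_size):
    # E226: write `1 - test_size`, not `1-test_size`.
    return 1 - test_size


# E302: two blank lines separate top-level definitions.
def n_train_rows(n_rows, test_size):
    return math.floor(n_rows * train_fraction(test_size))
```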

@jreback
Contributor

jreback commented Jul 7, 2019

You should open an issue to discuss adding an API before opening a PR.
We have rejected splitting APIs like this in the past, as they are out of scope for pandas.

@jreback closed this Jul 7, 2019
@jorisvandenbossche
Member

For context, see #6687 for some previous discussion about this
