Enhancement for the tabular validator. #291
Merged: ArlindKadra merged 25 commits into cocktail_fixes_time_debug from tabular_validator_enhancement on Oct 8, 2021
Commits
359b4c9 ArlindKadra: Initial try at an enhancement for the tabular validator
65e8ffb ArlindKadra: Adding a few type annotations
217c38d ArlindKadra: Fixing bugs in implementation
f7dd8fe ArlindKadra: Adding wrongly deleted code part during rebase
92bd535 ravinkohli: Fix bug in _get_args
5f672b5 ravinkohli: Fix bug in _get_args
223c09e ArlindKadra: Addressing Shuhei's comments
a1ed883 ArlindKadra: Address Shuhei's comments
f585310 ArlindKadra: Refactoring code
f298c46 ArlindKadra: Refactoring code
03bef16 ArlindKadra: Typos fix and additional comments
a7d01f1 ravinkohli: Replace nan in categoricals with simple imputer
38fe9e8 ravinkohli: Remove unused function
7693753 ravinkohli: add comment
f4cd3a4 ravinkohli: Merge branch 'cocktail_fixes_time_debug' into tabular_validator_enhan…
497c546 ravinkohli: Update autoPyTorch/data/tabular_feature_validator.py
9254eb2 ravinkohli: Update autoPyTorch/data/tabular_feature_validator.py
b63ff3c ArlindKadra: Adding unit test for only nall columns in the tabular feature categor…
d5bbdbe ravinkohli: fix bug in remove all nan columns
bfe4899 ravinkohli: Bug fix for making tests run by arlind
369edad ravinkohli: fix flake errors in feature validator
a4fb0cb ravinkohli: made typing code uniform
44229a6 ravinkohli: Apply suggestions from code review
ba3c1e7 ravinkohli: address comments from shuhei
10a8441 ravinkohli: address comments from shuhei (2)
@@ -1,5 +1,5 @@
 import logging
-import typing
+from typing import List, Optional, Union, cast
 
 import numpy as np
 
@@ -12,8 +12,8 @@
 from autoPyTorch.utils.logging_ import PicklableClientLogger
 
 
-SUPPORTED_TARGET_TYPES = typing.Union[
-    typing.List,
+SUPPORTED_TARGET_TYPES = Union[
+    List,
     pd.Series,
     pd.DataFrame,
     np.ndarray,

Review comment (on `SUPPORTED_TARGET_TYPES = Union[`): AutoPep8 rule, with a suggested change.
Reply: Let's keep this as part of a separate PR later.

@@ -35,39 +35,39 @@ class BaseTargetValidator(BaseEstimator):
         is_classification (bool):
             A bool that indicates if the validator should operate in classification mode.
             During classification, the targets are encoded.
-        encoder (typing.Optional[BaseEstimator]):
+        encoder (Optional[BaseEstimator]):
             Host a encoder object if the data requires transformation (for example,
             if provided a categorical column in a pandas DataFrame)
-        enc_columns (typing.List[str])
+        enc_columns (List[str])
             List of columns that where encoded
     """
     def __init__(self,
                  is_classification: bool = False,
-                 logger: typing.Optional[typing.Union[PicklableClientLogger, logging.Logger
+                 logger: Optional[Union[PicklableClientLogger, logging.Logger
                  ]] = None,
                  ) -> None:
         self.is_classification = is_classification
 
-        self.data_type = None  # type: typing.Optional[type]
+        self.data_type: Optional[type] = None
 
-        self.encoder = None  # type: typing.Optional[BaseEstimator]
+        self.encoder: Optional[BaseEstimator] = None
 
-        self.out_dimensionality = None  # type: typing.Optional[int]
-        self.type_of_target = None  # type: typing.Optional[str]
+        self.out_dimensionality: Optional[int] = None
+        self.type_of_target: Optional[str] = None
 
-        self.logger: typing.Union[
+        self.logger: Union[
             PicklableClientLogger, logging.Logger
         ] = logger if logger is not None else logging.getLogger(__name__)
 
         # Store the dtype for remapping to correct type
-        self.dtype = None  # type: typing.Optional[type]
+        self.dtype: Optional[type] = None
 
         self._is_fitted = False
 
     def fit(
         self,
         y_train: SUPPORTED_TARGET_TYPES,
-        y_test: typing.Optional[SUPPORTED_TARGET_TYPES] = None,
+        y_test: Optional[SUPPORTED_TARGET_TYPES] = None,
     ) -> BaseEstimator:
         """
         Validates and fit a categorical encoder (if needed) to the targets
@@ -76,7 +76,7 @@ def fit(
         Arguments:
             y_train (SUPPORTED_TARGET_TYPES)
                 A set of targets set aside for training
-            y_test (typing.Union[SUPPORTED_TARGET_TYPES])
+            y_test (Union[SUPPORTED_TARGET_TYPES])
                 A hold out set of data used of the targets. It is also used to fit the
                 categories of the encoder.
         """
@@ -95,8 +95,8 @@ def fit(
                     np.shape(y_test)
                 ))
             if isinstance(y_train, pd.DataFrame):
-                y_train = typing.cast(pd.DataFrame, y_train)
-                y_test = typing.cast(pd.DataFrame, y_test)
+                y_train = cast(pd.DataFrame, y_train)
+                y_test = cast(pd.DataFrame, y_test)
                 if y_train.columns.tolist() != y_test.columns.tolist():
                     raise ValueError(
                         "Train and test targets must both have the same columns, yet "
@@ -127,21 +127,21 @@ def fit(
     def _fit(
         self,
         y_train: SUPPORTED_TARGET_TYPES,
-        y_test: typing.Optional[SUPPORTED_TARGET_TYPES] = None,
+        y_test: Optional[SUPPORTED_TARGET_TYPES] = None,
     ) -> BaseEstimator:
         """
         Arguments:
             y_train (SUPPORTED_TARGET_TYPES)
                 The labels of the current task. They are going to be encoded in case
                 of classification
-            y_test (typing.Optional[SUPPORTED_TARGET_TYPES])
+            y_test (Optional[SUPPORTED_TARGET_TYPES])
                 A holdout set of labels
         """
         raise NotImplementedError()
 
     def transform(
         self,
-        y: typing.Union[SUPPORTED_TARGET_TYPES],
+        y: Union[SUPPORTED_TARGET_TYPES],
     ) -> np.ndarray:
         """
         Arguments:
@@ -162,7 +162,7 @@ def inverse_transform(
         Revert any encoding transformation done on a target array
 
         Arguments:
-            y (typing.Union[np.ndarray, pd.DataFrame, pd.Series]):
+            y (Union[np.ndarray, pd.DataFrame, pd.Series]):
                 Target array to be transformed back to original form before encoding
         Returns:
             np.ndarray:
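
The hunks in this file make two mechanical changes: names such as `Optional`, `Union` and `cast` are imported directly from `typing` instead of being accessed as `typing.X`, and `# type:` comments on attributes become inline variable annotations. A minimal sketch of the same pattern follows. The `OldStyleValidator` and `NewStyleValidator` classes are hypothetical stand-ins for illustration and are not part of autoPyTorch.

```python
import logging
from typing import Optional, cast


class OldStyleValidator:
    """Hypothetical class using the pre-change style (type comments)."""

    def __init__(self) -> None:
        self.encoder = None  # type: Optional[str]


class NewStyleValidator:
    """Hypothetical class using the post-change style (inline annotations)."""

    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
        self.encoder: Optional[str] = None
        # Fall back to a module-level logger when none is supplied,
        # mirroring the expression kept as context in the diff.
        self.logger: logging.Logger = (
            logger if logger is not None else logging.getLogger(__name__)
        )


old = OldStyleValidator()
new = NewStyleValidator()

# Both styles assign the same runtime value; only the way the type is
# communicated to a static checker differs.
same_runtime_value = old.encoder is None and new.encoder is None

# typing.cast only informs the type checker and returns its argument
# unchanged, so spelling it `cast` or `typing.cast` makes no runtime difference.
raw = "3"
casted = cast(int, raw)
```

Because neither change alters runtime behavior, a refactor like this can typically be reviewed as a readability change, with a static type checker as the main thing to re-run.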