Cocktail fixes time debug #286


Merged
merged 55 commits into from
Oct 20, 2021
Changes from all commits
55 commits
eade387
preprocess inside data validator
ravinkohli Aug 3, 2021
b76b05e
add time debug statements
ravinkohli Aug 3, 2021
bbf9b07
Add fixes for categorical data
ravinkohli Aug 3, 2021
99d7407
add fit_ensemble
ravinkohli Aug 5, 2021
8144774
add arlind fix for swa and se
ravinkohli Aug 31, 2021
06ad658
fix bug in trainer choice fit
ravinkohli Sep 6, 2021
1942279
fix ensemble bug
ravinkohli Sep 8, 2021
2dc8850
Correct bug in cleanup
ravinkohli Sep 8, 2021
06d80d4
Cleanup for removing time debug statements
ravinkohli Sep 16, 2021
d8b553a
ablation for adversarial
ravinkohli Sep 20, 2021
062de69
Merge branch 'cocktail_fixes_time_debug' of github.com:automl/Auto-Py…
ravinkohli Sep 20, 2021
34712b3
shuffle false in dataloader
ravinkohli Sep 21, 2021
49f40dc
drop last false in dataloader
ravinkohli Sep 21, 2021
f4ea158
fix bug for validation set, and cutout and cutmix
ravinkohli Sep 23, 2021
209a4e8
shuffle = False
ravinkohli Sep 24, 2021
8fb0bc2
Shake Shake updates (#287)
ravinkohli Sep 30, 2021
064e4a9
fix issues with ensemble fitting post hoc
ravinkohli Sep 30, 2021
ed48dab
Address comments on the PR
ravinkohli Sep 30, 2021
9cdfb64
Fix flake and mypy errors
ravinkohli Sep 30, 2021
6bd4300
Address comments from PR #286
ravinkohli Oct 4, 2021
9c0c47b
fix bug in embedding
ravinkohli Oct 4, 2021
e838004
Update autoPyTorch/api/tabular_classification.py
ravinkohli Oct 4, 2021
893a15d
Update autoPyTorch/datasets/base_dataset.py
ravinkohli Oct 4, 2021
ed0602c
Update autoPyTorch/datasets/base_dataset.py
ravinkohli Oct 4, 2021
224c69e
Update autoPyTorch/pipeline/components/training/trainer/base_trainer.py
ravinkohli Oct 4, 2021
e61c1a3
Address comments from shuhei
ravinkohli Oct 4, 2021
3d47afa
adress comments from shuhei
ravinkohli Oct 4, 2021
b417346
fix flake and mypy
ravinkohli Oct 4, 2021
2354159
Update autoPyTorch/pipeline/components/training/trainer/RowCutMixTrai…
ravinkohli Oct 4, 2021
7e59f4d
Update autoPyTorch/pipeline/tabular_classification.py
ravinkohli Oct 4, 2021
7ab5d26
Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py
ravinkohli Oct 4, 2021
0032834
Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py
ravinkohli Oct 4, 2021
90ce40c
Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py
ravinkohli Oct 4, 2021
f51d239
Apply suggestions from code review
ravinkohli Oct 4, 2021
42e6b5a
increase threads_per_worker
ravinkohli Oct 4, 2021
f79a4fc
fix bug in rowcutmix
ravinkohli Oct 5, 2021
6d9f99f
Enhancement for the tabular validator. (#291)
ArlindKadra Oct 8, 2021
9661409
Apply suggestions from code review
ravinkohli Oct 11, 2021
36cb3c4
resolve code issues with new versions
ravinkohli Oct 11, 2021
6953ee7
Address comments from shuhei
ravinkohli Oct 11, 2021
4b7e75f
make run_traditional_ml function
ravinkohli Oct 11, 2021
cce21a6
implement suggestion from shuhei and fix bug in rowcutmixtrainer
ravinkohli Oct 11, 2021
4b5db0d
fix return type docstring
ravinkohli Oct 11, 2021
80f1c1e
add better documentation and fix bug in shake_drop_get_bl
ravinkohli Oct 11, 2021
dc01cd3
Apply suggestions from code review
ravinkohli Oct 12, 2021
f0c2aa0
add test for comparator and other improvements based on PR comments
ravinkohli Oct 12, 2021
57111e9
fix bug in test
ravinkohli Oct 12, 2021
153878f
[fix] Fix the condition in the raising error of all_nan_columns
nabenabe0928 Oct 12, 2021
64862fe
[refactor] Unite name conventions of numpy array and pandas dataframe
nabenabe0928 Oct 12, 2021
410c7fe
[doc] Add the description about the tabular feature transformation
nabenabe0928 Oct 12, 2021
baa7ab8
[doc] Add the description of the tabular feature transformation
nabenabe0928 Oct 12, 2021
e1eb854
address comments from arlind
ravinkohli Oct 20, 2021
4545fdb
address comments from arlind
ravinkohli Oct 20, 2021
8519a48
change to as_tensor and address comments from arlind
ravinkohli Oct 20, 2021
2c3a525
correct description for functions in data module
ravinkohli Oct 20, 2021
337 changes: 280 additions & 57 deletions autoPyTorch/api/base_task.py

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions autoPyTorch/api/tabular_classification.py
@@ -275,6 +275,8 @@ def search(
y_test=y_test,
dataset_name=dataset_name)

if self.dataset is None:
raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
return self._search(
dataset=self.dataset,
optimize_metric=optimize_metric,
2 changes: 2 additions & 0 deletions autoPyTorch/api/tabular_regression.py
@@ -261,6 +261,8 @@ def search(
y_test=y_test,
dataset_name=dataset_name)

if self.dataset is None:
raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
return self._search(
dataset=self.dataset,
optimize_metric=optimize_metric,
Expand Down
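Both `search` diffs above add the same guard: raise a `ValueError` with a class-specific message when `self.dataset` was never initialized, instead of letting `self._search` fail later on a `None` attribute. A minimal sketch of the pattern, using a hypothetical stand-in class rather than the real API classes:

```python
class TabularTaskSketch:
    """Hypothetical stand-in for the API classes touched by this diff."""

    def __init__(self) -> None:
        self.dataset = None

    def search(self):
        # Guard added by this PR: fail fast with a clear message instead of
        # letting a later attribute access raise an opaque AttributeError.
        if self.dataset is None:
            raise ValueError(
                "`dataset` in {} must be initialized, but got None".format(
                    self.__class__.__name__
                )
            )
        return self.dataset
```

Interpolating `self.__class__.__name__` means the same guard reads correctly in both the classification and regression subclasses.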
83 changes: 56 additions & 27 deletions autoPyTorch/data/base_feature_validator.py
@@ -1,5 +1,5 @@
import logging
import typing
from typing import List, Optional, Set, Tuple, Union

import numpy as np

@@ -12,8 +12,8 @@
from autoPyTorch.utils.logging_ import PicklableClientLogger


SUPPORTED_FEAT_TYPES = typing.Union[
typing.List,
SUPPORTED_FEAT_TYPES = Union[
List,
pd.DataFrame,
np.ndarray,
scipy.sparse.bsr_matrix,
@@ -35,60 +35,61 @@ class BaseFeatureValidator(BaseEstimator):
List of the column types found by this estimator during fit.
data_type (str):
Class name of the data type provided during fit.
encoder (typing.Optional[BaseEstimator])
encoder (Optional[BaseEstimator])
Host a encoder object if the data requires transformation (for example,
if provided a categorical column in a pandas DataFrame)
enc_columns (typing.List[str])
enc_columns (List[str])
List of columns that were encoded.
"""
def __init__(self,
logger: typing.Optional[typing.Union[PicklableClientLogger, logging.Logger
]] = None,
logger: Optional[Union[PicklableClientLogger, logging.Logger
]
] = None,
) -> None:
# Register types to detect unsupported data format changes
self.feat_type = None # type: typing.Optional[typing.List[str]]
self.data_type = None # type: typing.Optional[type]
self.dtypes = [] # type: typing.List[str]
self.column_order = [] # type: typing.List[str]
self.feat_type: Optional[List[str]] = None
self.data_type: Optional[type] = None
self.dtypes: List[str] = []
self.column_order: List[str] = []

self.encoder = None # type: typing.Optional[BaseEstimator]
self.enc_columns = [] # type: typing.List[str]
self.encoder: Optional[BaseEstimator] = None
self.enc_columns: List[str] = []

self.logger: typing.Union[
self.logger: Union[
PicklableClientLogger, logging.Logger
] = logger if logger is not None else logging.getLogger(__name__)

# Required for dataset properties
self.num_features = None # type: typing.Optional[int]
self.categories = [] # type: typing.List[typing.List[int]]
self.categorical_columns: typing.List[int] = []
self.numerical_columns: typing.List[int] = []
# column identifiers may be integers or strings
self.null_columns: typing.Set[str] = set()
self.num_features: Optional[int] = None
self.categories: List[List[int]] = []
self.categorical_columns: List[int] = []
self.numerical_columns: List[int] = []

self.all_nan_columns: Optional[Set[Union[int, str]]] = None

self._is_fitted = False

def fit(
self,
X_train: SUPPORTED_FEAT_TYPES,
X_test: typing.Optional[SUPPORTED_FEAT_TYPES] = None,
X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
) -> BaseEstimator:
"""
Validates and fit a categorical encoder (if needed) to the features.
The supported data types are List, numpy arrays and pandas DataFrames.
CSR sparse data types are also supported

Arguments:
Args:
X_train (SUPPORTED_FEAT_TYPES):
A set of features that are going to be validated (type and dimensionality
checks) and a encoder fitted in the case the data needs encoding
X_test (typing.Optional[SUPPORTED_FEAT_TYPES]):
X_test (Optional[SUPPORTED_FEAT_TYPES]):
A hold out set of data used for checking
"""

# If a list was provided, it will be converted to pandas
if isinstance(X_train, list):
X_train, X_test = self.list_to_dataframe(X_train, X_test)
X_train, X_test = self.list_to_pandas(X_train, X_test)

self._check_data(X_train)

@@ -114,14 +115,15 @@ def _fit(
X: SUPPORTED_FEAT_TYPES,
) -> BaseEstimator:
"""
Arguments:
Args:
X (SUPPORTED_FEAT_TYPES):
A set of features that are going to be validated (type and dimensionality
checks) and a encoder fitted in the case the data needs encoding
Returns:
self:
The fitted base estimator
"""

raise NotImplementedError()

def _check_data(
@@ -131,19 +133,20 @@ def _check_data(
"""
Feature dimensionality and data type checks

Arguments:
Args:
X (SUPPORTED_FEAT_TYPES):
A set of features that are going to be validated (type and dimensionality
checks) and a encoder fitted in the case the data needs encoding
"""

raise NotImplementedError()

def transform(
self,
X: SUPPORTED_FEAT_TYPES,
) -> np.ndarray:
"""
Arguments:
Args:
X_train (SUPPORTED_FEAT_TYPES):
A set of features, whose categorical features are going to be
transformed
@@ -152,4 +155,30 @@ def transform(
np.ndarray:
The transformed array
"""

raise NotImplementedError()

def list_to_pandas(
self,
X_train: SUPPORTED_FEAT_TYPES,
X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""
Converts a list to a pandas DataFrame. In this process, column types are inferred.

If test data is provided, we proactively match it to train data

Args:
X_train (SUPPORTED_FEAT_TYPES):
A set of features that are going to be validated (type and dimensionality
checks) and a encoder fitted in the case the data needs encoding
X_test (Optional[SUPPORTED_FEAT_TYPES]):
A hold out set of data used for checking
Returns:
pd.DataFrame:
transformed train data from list to pandas DataFrame
pd.DataFrame:
transformed test data from list to pandas DataFrame
"""

raise NotImplementedError()
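The new `list_to_pandas` stub above documents its contract but not its body (subclasses implement it). One plausible reading of the docstring — infer column dtypes from the training list, then proactively force the test frame onto the same dtypes — can be sketched as follows; the function name and body here are illustrative, not the actual subclass implementation:

```python
from typing import List, Optional, Tuple

import pandas as pd


def list_to_pandas_sketch(
    X_train: List,
    X_test: Optional[List] = None,
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    # Infer column dtypes from the training list ...
    df_train = pd.DataFrame(X_train).infer_objects()
    df_test = None
    if X_test is not None:
        # ... and cast the test frame onto the same dtypes, so train
        # and test stay consistent ("proactively match it to train data").
        df_test = pd.DataFrame(X_test).astype(df_train.dtypes.to_dict())
    return df_train, df_test
```

Matching the test dtypes to the train dtypes up front avoids subtle mismatches (e.g. an all-integer test column being inferred differently from its float train counterpart) surfacing later in the pipeline.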
52 changes: 27 additions & 25 deletions autoPyTorch/data/base_target_validator.py
@@ -1,5 +1,5 @@
import logging
import typing
from typing import List, Optional, Union, cast

import numpy as np

@@ -12,8 +12,8 @@
from autoPyTorch.utils.logging_ import PicklableClientLogger


SUPPORTED_TARGET_TYPES = typing.Union[
typing.List,
SUPPORTED_TARGET_TYPES = Union[
List,
pd.Series,
pd.DataFrame,
np.ndarray,
@@ -35,48 +35,50 @@ class BaseTargetValidator(BaseEstimator):
is_classification (bool):
A bool that indicates if the validator should operate in classification mode.
During classification, the targets are encoded.
encoder (typing.Optional[BaseEstimator]):
encoder (Optional[BaseEstimator]):
Host a encoder object if the data requires transformation (for example,
if provided a categorical column in a pandas DataFrame)
enc_columns (typing.List[str])
enc_columns (List[str])
List of columns that where encoded
"""
def __init__(self,
is_classification: bool = False,
logger: typing.Optional[typing.Union[PicklableClientLogger, logging.Logger
]] = None,
logger: Optional[Union[PicklableClientLogger,
logging.Logger
]
] = None,
) -> None:
self.is_classification = is_classification

self.data_type = None # type: typing.Optional[type]
self.data_type: Optional[type] = None

self.encoder = None # type: typing.Optional[BaseEstimator]
self.encoder: Optional[BaseEstimator] = None

self.out_dimensionality = None # type: typing.Optional[int]
self.type_of_target = None # type: typing.Optional[str]
self.out_dimensionality: Optional[int] = None
self.type_of_target: Optional[str] = None

self.logger: typing.Union[
self.logger: Union[
PicklableClientLogger, logging.Logger
] = logger if logger is not None else logging.getLogger(__name__)

# Store the dtype for remapping to correct type
self.dtype = None # type: typing.Optional[type]
self.dtype: Optional[type] = None

self._is_fitted = False

def fit(
self,
y_train: SUPPORTED_TARGET_TYPES,
y_test: typing.Optional[SUPPORTED_TARGET_TYPES] = None,
y_test: Optional[SUPPORTED_TARGET_TYPES] = None,
) -> BaseEstimator:
"""
Validates and fit a categorical encoder (if needed) to the targets
The supported data types are List, numpy arrays and pandas DataFrames.

Arguments:
Args:
y_train (SUPPORTED_TARGET_TYPES)
A set of targets set aside for training
y_test (typing.Union[SUPPORTED_TARGET_TYPES])
y_test (Union[SUPPORTED_TARGET_TYPES])
A hold out set of data used of the targets. It is also used to fit the
categories of the encoder.
"""
@@ -95,8 +97,8 @@ def fit(
np.shape(y_test)
))
if isinstance(y_train, pd.DataFrame):
y_train = typing.cast(pd.DataFrame, y_train)
y_test = typing.cast(pd.DataFrame, y_test)
y_train = cast(pd.DataFrame, y_train)
y_test = cast(pd.DataFrame, y_test)
if y_train.columns.tolist() != y_test.columns.tolist():
raise ValueError(
"Train and test targets must both have the same columns, yet "
@@ -127,24 +129,24 @@ def fit(
def _fit(
self,
y_train: SUPPORTED_TARGET_TYPES,
y_test: typing.Optional[SUPPORTED_TARGET_TYPES] = None,
y_test: Optional[SUPPORTED_TARGET_TYPES] = None,
) -> BaseEstimator:
"""
Arguments:
Args:
y_train (SUPPORTED_TARGET_TYPES)
The labels of the current task. They are going to be encoded in case
of classification
y_test (typing.Optional[SUPPORTED_TARGET_TYPES])
y_test (Optional[SUPPORTED_TARGET_TYPES])
A holdout set of labels
"""
raise NotImplementedError()

def transform(
self,
y: typing.Union[SUPPORTED_TARGET_TYPES],
y: Union[SUPPORTED_TARGET_TYPES],
) -> np.ndarray:
"""
Arguments:
Args:
y (SUPPORTED_TARGET_TYPES)
A set of targets that are going to be encoded if the current task
is classification
@@ -161,8 +163,8 @@ def inverse_transform(
"""
Revert any encoding transformation done on a target array

Arguments:
y (typing.Union[np.ndarray, pd.DataFrame, pd.Series]):
Args:
y (Union[np.ndarray, pd.DataFrame, pd.Series]):
Target array to be transformed back to original form before encoding
Returns:
np.ndarray:
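The `fit` hunk in this file casts both targets to `pd.DataFrame` and rejects mismatched column sets before any encoding happens. A simplified sketch of that check (the function name is ours; the real logic lives inside `BaseTargetValidator.fit`):

```python
import pandas as pd


def check_target_columns(y_train: pd.DataFrame, y_test: pd.DataFrame) -> None:
    """Simplified sketch of the column-consistency check in fit."""
    if y_train.columns.tolist() != y_test.columns.tolist():
        raise ValueError(
            "Train and test targets must both have the same columns, yet "
            "y_train has {} and y_test has {}".format(
                y_train.columns.tolist(), y_test.columns.tolist()
            )
        )
```

Comparing `columns.tolist()` (rather than shapes alone) catches reordered or renamed target columns, which would otherwise silently scramble the encoder's category mapping.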
4 changes: 2 additions & 2 deletions autoPyTorch/data/base_validator.py
@@ -58,7 +58,7 @@ def fit(
+ Checks for dimensionality as well as missing values are performed.
+ If performing a classification task, the data is going to be encoded

Arguments:
Args:
X_train (SUPPORTED_FEAT_TYPES):
A set of features that are going to be validated (type and dimensionality
checks). If this data contains categorical columns, an encoder is going to
@@ -102,7 +102,7 @@ def transform(
"""
Transform the given target or features to a numpy array

Arguments:
Args:
X (SUPPORTED_FEAT_TYPES):
A set of features to transform
y (typing.Optional[SUPPORTED_TARGET_TYPES]):
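A recurring pattern across all the validator diffs above is the migration from comment-style type hints (`# type: typing.Optional[...]`) to PEP 526 variable annotations with names imported directly from `typing`. A minimal before/after illustration on a hypothetical class (not the real validator):

```python
import logging
from typing import List, Optional


class AnnotatedValidatorSketch:
    """Illustrative only: mirrors the new annotation style in this PR."""

    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
        # Before: self.feat_type = None  # type: typing.Optional[typing.List[str]]
        # After (PEP 526 variable annotation):
        self.feat_type: Optional[List[str]] = None
        self.dtypes: List[str] = []
        self.logger = logger if logger is not None else logging.getLogger(__name__)
```

Variable annotations are checked by mypy exactly like the comment form, but are less error-prone (no comment drift) and shorter once `Optional`, `List`, etc. are imported by name.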