Commit 6543316

ArlindKadra, ravinkohli, and nabenabe0928 committed
Bug fixes (#249)
* Update implementation * Coding style fixes * Implementation update * Style fix * Turn weighted loss into a constant again, implementation update * Cocktail branch inconsistencies (#275) * To nemo * Revert change in T_curr as results conclusively prove it should be 0 * Revert cutmix change after data from run * Final conclusion after results * FIX bug in shake alpha beta * Updated if is_training condition for shake drop * Remove temp fix in row cutmic * Cocktail fixes time debug (#286) * preprocess inside data validator * add time debug statements * Add fixes for categorical data * add fit_ensemble * add arlind fix for swa and se * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * shuffle = False * Shake Shake updates (#287) * To test locally * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * To test locally * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * To test locally * shuffle = False * To test locally * updates to search space * updates to search space * update branch with search space * undo search space update * fix bug in shake shake flag * limit to shake-even * restrict to even even * Add even even and others for shake-drop also * fix bug in passing alpha beta method * restrict to only even even * fix silly bug: * remove imputer and ordinal encoder for categorical transformer in feature validator * Address comments from shuhei * fix issues with ensemble fitting post hoc * Address comments on the PR * Fix flake and mypy errors * Address comments from PR #286 * fix bug in embedding * Update autoPyTorch/api/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * 
Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/training/trainer/base_trainer.py Co-authored-by: nabenabe0928 <[email protected]> * Address comments from shuhei * adress comments from shuhei * fix flake and mypy * Update autoPyTorch/pipeline/components/training/trainer/RowCutMixTrainer.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * increase threads_per_worker * fix bug in rowcutmix * Enhancement for the tabular validator. 
(#291) * Initial try at an enhancement for the tabular validator * Adding a few type annotations * Fixing bugs in implementation * Adding wrongly deleted code part during rebase * Fix bug in _get_args * Fix bug in _get_args * Addressing Shuhei's comments * Address Shuhei's comments * Refactoring code * Refactoring code * Typos fix and additional comments * Replace nan in categoricals with simple imputer * Remove unused function * add comment * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Adding unit test for only nall columns in the tabular feature categorical evaluator * fix bug in remove all nan columns * Bug fix for making tests run by arlind * fix flake errors in feature validator * made typing code uniform * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * address comments from shuhei * address comments from shuhei (2) Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * resolve code issues with new versions * Address comments from shuhei * make run_traditional_ml function * implement suggestion from shuhei and fix bug in rowcutmixtrainer * fix return type docstring * add better documentation and fix bug in shake_drop_get_bl * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * add test for comparator and other improvements based on PR comments * fix bug in test * [fix] Fix the condition in the raising error of all_nan_columns * [refactor] Unite name conventions of numpy array and pandas dataframe * [doc] Add the description about the tabular feature transformation * [doc] Add the description of the tabular feature transformation * address comments from 
arlind * address comments from arlind * change to as_tensor and address comments from arlind * correct description for functions in data module Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: Arlind Kadra <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> * Addressing Shuhei's comments * flake8 problems fix * Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Add indentation. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Add line indentation. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Validate if there is a column transformer since for sparse matrices we will not have one. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/utils/implementations.py Delete uncommented line. Co-authored-by: Ravin Kohli <[email protected]> * Allow the number of threads to be given by the user * Removing unnecessary argument and refactoring the attribute. * Addressing Ravin's comments * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Updating the function documentation according to the agreed style. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Providing information on the wrong method provided for shake-shake regularization. Co-authored-by: nabenabe0928 <[email protected]> * add todo for backend and accept changes from shuhei * Addressing Shuhei's and Ravin's comments * Addressing Shuhei's and Ravin's comments, bug fix * Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving code readibility. 
Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving consistency. Co-authored-by: nabenabe0928 <[email protected]> * bug fix Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: Ravin Kohli <[email protected]>
1 parent c4a4565 commit 6543316

35 files changed (+1796, -386 lines)

autoPyTorch/api/base_task.py

Lines changed: 283 additions & 55 deletions
Large diffs are not rendered by default.

autoPyTorch/api/tabular_classification.py

Lines changed: 2 additions & 0 deletions
@@ -447,6 +447,8 @@ def search(
             dataset_compression=self._dataset_compression,
             feat_types=feat_types)
 
+        if self.dataset is None:
+            raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
         return self._search(
             dataset=self.dataset,
             optimize_metric=optimize_metric,
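
The guard added in this hunk fails early when `self.dataset` was never set; given the "Fix flake and mypy errors" entries in the commit message, it presumably also lets a type checker narrow the `Optional` attribute before it is passed on. A minimal sketch of the pattern, where `TinyTask` and its list-based `dataset` are hypothetical stand-ins and not part of autoPyTorch:

```python
from typing import List, Optional


class TinyTask:
    """Hypothetical stand-in for a task class; not the autoPyTorch API."""

    def __init__(self) -> None:
        # The attribute starts out unset, like an Optional dataset on a task.
        self.dataset: Optional[List[float]] = None

    def search(self) -> float:
        # Fail early with a message that names the concrete class.
        if self.dataset is None:
            raise ValueError(
                "`dataset` in {} must be initialized, but got None".format(
                    self.__class__.__name__
                )
            )
        # After the guard, type checkers treat self.dataset as List[float].
        return sum(self.dataset)
```

Calling `TinyTask().search()` before assigning `dataset` raises the `ValueError`; after `task.dataset = [1.0, 2.0]` the call returns the sum.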

autoPyTorch/api/tabular_regression.py

Lines changed: 2 additions & 1 deletion
@@ -79,7 +79,6 @@ class TabularRegressionTask(BaseTask):
             Search space updates that can be used to modify the search
             space of particular components or choice modules of the pipeline
     """
-
     def __init__(
         self,
         seed: int = 1,
@@ -448,6 +447,8 @@ def search(
             dataset_compression=self._dataset_compression,
             feat_types=feat_types)
 
+        if self.dataset is None:
+            raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
         return self._search(
             dataset=self.dataset,
             optimize_metric=optimize_metric,

autoPyTorch/data/base_feature_validator.py

Lines changed: 36 additions & 9 deletions
@@ -1,5 +1,5 @@
 import logging
-from typing import List, Optional, Union
+from typing import List, Optional, Set, Tuple, Union
 
 import numpy as np
 
@@ -24,24 +24,21 @@ class BaseFeatureValidator(BaseEstimator):
             List of the column types found by this estimator during fit.
         data_type (str):
             Class name of the data type provided during fit.
-        column_transformer (Optional[BaseEstimator])
+        encoder (Optional[BaseEstimator])
             Host a encoder object if the data requires transformation (for example,
-            if provided a categorical column in a pandas DataFrame)
-        transformed_columns (List[str])
-            List of columns that were encoded.
+            if provided a categorical column in a pandas DataFrame).
     """
     def __init__(
         self,
         logger: Optional[Union[PicklableClientLogger, logging.Logger]] = None,
-    ):
+    ) -> None:
         # Register types to detect unsupported data format changes
         self.feat_types: Optional[List[str]] = None
         self.data_type: Optional[type] = None
         self.dtypes: List[str] = []
         self.column_order: List[str] = []
 
         self.column_transformer: Optional[BaseEstimator] = None
-        self.transformed_columns: List[str] = []
 
         self.logger: Union[
             PicklableClientLogger, logging.Logger
@@ -53,6 +50,8 @@ def __init__(
         self.categorical_columns: List[int] = []
         self.numerical_columns: List[int] = []
 
+        self.all_nan_columns: Optional[Set[Union[int, str]]] = None
+
         self._is_fitted = False
 
     def fit(
@@ -75,7 +74,7 @@ def fit(
 
         # If a list was provided, it will be converted to pandas
         if isinstance(X_train, list):
-            X_train, X_test = self.list_to_dataframe(X_train, X_test)
+            X_train, X_test = self.list_to_pandas(X_train, X_test)
 
         self._check_data(X_train)
 
@@ -109,6 +108,7 @@ def _fit(
             self:
                 The fitted base estimator
         """
+
         raise NotImplementedError()
 
     def _check_data(
@@ -118,11 +118,12 @@ def _check_data(
         """
         Feature dimensionality and data type checks
 
-        Arguments:
+        Args:
             X (SUPPORTED_FEAT_TYPES):
                 A set of features that are going to be validated (type and dimensionality
                 checks) and a encoder fitted in the case the data needs encoding
         """
+
         raise NotImplementedError()
 
     def transform(
@@ -139,4 +140,30 @@ def transform(
             np.ndarray:
                 The transformed array
         """
+
+        raise NotImplementedError()
+
+    def list_to_pandas(
+        self,
+        X_train: SUPPORTED_FEAT_TYPES,
+        X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
+    ) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
+        """
+        Converts a list to a pandas DataFrame. In this process, column types are inferred.
+
+        If test data is provided, we proactively match it to train data
+
+        Args:
+            X_train (SUPPORTED_FEAT_TYPES):
+                A set of features that are going to be validated (type and dimensionality
+                checks) and a encoder fitted in the case the data needs encoding
+            X_test (Optional[SUPPORTED_FEAT_TYPES]):
+                A hold out set of data used for checking
+        Returns:
+            pd.DataFrame:
+                transformed train data from list to pandas DataFrame
+            pd.DataFrame:
+                transformed test data from list to pandas DataFrame
+        """
+
         raise NotImplementedError()
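
The base class only declares `list_to_pandas` and the new `all_nan_columns` attribute; the concrete behavior lives in subclasses that this excerpt does not show. As a rough illustration of what the docstring describes (infer column types from a list, then match the test data to the train data), here is a minimal sketch. The function names `list_to_pandas_sketch` and `find_all_nan_columns` are hypothetical, and the use of `infer_objects` and `astype` is an assumption rather than the project's actual implementation:

```python
from typing import List, Optional, Set, Tuple

import pandas as pd


def list_to_pandas_sketch(
    X_train: List[list],
    X_test: Optional[List[list]] = None,
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    # Build the frame, then let pandas infer tighter dtypes for object columns.
    train_df = pd.DataFrame(data=X_train).infer_objects()
    test_df = None
    if X_test is not None:
        # Match the test frame to the dtypes inferred on the training data.
        test_df = pd.DataFrame(data=X_test).astype(train_df.dtypes.to_dict())
    return train_df, test_df


def find_all_nan_columns(df: pd.DataFrame) -> Set:
    # A column counts as all-NaN when every one of its entries is missing.
    mask = df.isna().all().to_numpy()
    return set(df.columns[mask])
```

A validator could store the result of `find_all_nan_columns` in an attribute such as `all_nan_columns` during fit and drop the same columns at transform time; whether the commit does exactly that is not visible in this diff.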

autoPyTorch/data/base_target_validator.py

Lines changed: 3 additions & 2 deletions
@@ -36,7 +36,7 @@ def __init__(self,
                                        logging.Logger
                                        ]
                                  ] = None,
-                 ):
+                 ) -> None:
         self.is_classification = is_classification
 
         self.data_type: Optional[type] = None
@@ -86,6 +86,7 @@ def fit(
                     np.shape(y_test)
                 ))
             if isinstance(y_train, pd.DataFrame):
+                y_train = cast(pd.DataFrame, y_train)
                 y_test = cast(pd.DataFrame, y_test)
                 if y_train.columns.tolist() != y_test.columns.tolist():
                     raise ValueError(
@@ -131,7 +132,7 @@ def _fit(
 
     def transform(
         self,
-        y: Union[SupportedTargetTypes],
+        y: SupportedTargetTypes,
     ) -> np.ndarray:
         """
         Args:
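
Both typing changes in this file affect static analysis only. `typing.cast` returns its argument unchanged at runtime, and a `Union` with a single member collapses to that member, so replacing `Union[SupportedTargetTypes]` with `SupportedTargetTypes` does not change the annotation's meaning. A small sketch that checks both facts with built-in types; the variable names are illustrative only:

```python
from typing import List, Union, cast

# cast() is a no-op at runtime: it returns the very same object and only
# tells the type checker which type to assume from here on.
value: object = [1, 2, 3]
as_list = cast(List[int], value)
same_object = as_list is value

# A single-member Union collapses to its member.
single_union_collapses = Union[int] is int

# Wrapping an existing Union in another Union flattens to the same type.
nested_union_flattens = Union[Union[int, str]] == Union[int, str]
```

All three flags evaluate to `True`, which is why these edits can satisfy mypy without altering behavior.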
