feat: specialized dataset classes, fix: datasets refactor #153

Conversation
@property
def dataset_metadata(self) -> Optional[Dict]:
    return self._dataset_metadata
Could probably just return {} here instead of saving to self._dataset_metadata.
Better yet, just return None
)

@classmethod
def create(
This is a little strange; I thought the point of the subclass was to only expose the relevant parameters and exclude non-relevant ones (for example, bq_source in the case of image_dataset).
Hence, in the branch I provided, the signature was:
@classmethod
def create(
    cls,
    display_name: str,
    gcs_source_uris: Sequence[str],
    import_schema_uri: str,
    data_items_labels: Optional[Dict] = None,
    metadata: Sequence[Tuple[str, str]] = (),
    labels: Optional[Dict] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[auth_credentials.Credentials] = None,
    sync=True,
) -> "Dataset":
I wonder if it makes more sense to use this instead:
def create(
    cls,
    display_name: str,
    datasource: Union[NonTabularDatasource, NonTabularDatasourceImportable],
    metadata: Sequence[Tuple[str, str]] = (),
    labels: Optional[Dict] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[auth_credentials.Credentials] = None,
    sync=True,
) -> "Dataset":
This would automatically give you validation such as:
- data_items_labels is only relevant when import_schema_uri is provided
- gcs_source_uris is only relevant when import_schema_uri is provided
It'd also make subclasses even more trivial to write.
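A minimal sketch of how such datasource types could encode that validation, assuming a dataclass layout (the class names come from the proposed signature; the fields are assumptions):

from dataclasses import dataclass
from typing import Dict, Optional, Sequence

@dataclass
class NonTabularDatasource:
    # No import-related fields exist here, so a non-importable datasource
    # cannot carry gcs_source_uris or data_items_labels at all.
    dataset_metadata: Optional[Dict] = None

@dataclass
class NonTabularDatasourceImportable(NonTabularDatasource):
    # Import-only fields travel together with import_schema_uri,
    # making invalid combinations unrepresentable.
    gcs_source_uris: Sequence[str] = ()
    import_schema_uri: str = ""
    data_items_labels: Optional[Dict] = None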
Probably need @sasha-gitg's opinion on this since this would be user-facing.
@ivanmkc to your first feedback: bq_source was left in by mistake, will fix it.
Input signature LGTM.
_support_import_schema_classes = ("image",)

def __init__(
Do we need an overridden __init__ method at all in the ImageDataset subclass? It seems like it just calls Dataset's __init__ with no modifications, so wouldn't it be redundant?
One reason for using an overridden __init__ in the subclass is to check that the returned existing Dataset's metadata_schema_uri matches Image (or other subclasses). However, I don't see that here, since you put it in the Dataset class (which I commented on).
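For reference, a rough sketch of the check such an overridden __init__ could perform, assuming the _supported_metadata_schema_uris class attribute suggested elsewhere in this review and the SDK's existing schema module:

from google.cloud.aiplatform import schema

class ImageDataset(Dataset):
    _supported_metadata_schema_uris = (schema.dataset.metadata.image,)

    def __init__(self, dataset_name: str, *args, **kwargs):
        super().__init__(dataset_name, *args, **kwargs)
        # Reject a retrieved resource whose schema this subclass does not support.
        if self.metadata_schema_uri not in self._supported_metadata_schema_uris:
            raise ValueError(
                f"{self.__class__.__name__} does not support "
                f"metadata_schema_uri {self.metadata_schema_uri}"
            )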
)
import_data_config = datasource.import_data_config

return cls._create_and_import(
I thought we agreed to create a function that encapsulates Tabular/Non-tabular parameters, as discussed in https://docs.google.com/document/d/16Nu_jFnGO79mcn83IgtpvTCYYEwxyPLOgdvxdxfILbI/edit?resourcekey=0-89933HPbKWlN6sz31SvBuA#bookmark=id.k68tqmbzwags
That way you wouldn't call _create_and_import directly, which would simplify all the subclasses, as _create_and_import has many optional parameters with potentially invalid combinations.
    input file(s). May contain wildcards. For more
    information on wildcards, see
    https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames.
bq_source: Optional[str] = None:
Not relevant for image datasets
_support_import_schema_classes = None

def __init__(
ditto
see ImageDataset.__init__ comment
cls.metadata_schema_uri = schema.dataset.metadata.tabular
datasource = TabularDatasource(gcs_source, bq_source)

return cls._create_and_import(
ditto, see ImageDataset.create comment
    sync=sync,
)

def import_data(self):
Good catch. I wish there were a cleaner way to do this, like splitting import_data (and other import functionality) out of Dataset so that only the relevant subclasses would even have this function. That way we wouldn't have to explicitly nullify it; this definitely feels hacky.
Perhaps we can do a separate refactor later:
- DatasetBase only includes Dataset creation functionality
- Dataset inherits DatasetBase and adds import functionality
- ImageDataset would inherit Dataset
- TabularDataset would inherit DatasetBase
This way TabularDataset wouldn't even have an import_data function.
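A minimal sketch of that proposed hierarchy (class bodies are placeholders, not the real signatures):

class DatasetBase:
    # Creation-only functionality; no import methods exist at this level.
    @classmethod
    def create(cls, display_name: str) -> "DatasetBase":
        ...

class Dataset(DatasetBase):
    # Adds import functionality on top of creation.
    def import_data(self) -> "Dataset":
        ...

class ImageDataset(Dataset):  # inherits import_data
    ...

class TabularDataset(DatasetBase):  # never has an import_data to nullify
    ...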
Seems big enough to be a subsequent PR.
    return self._metadata_schema_uri

@metadata_schema_uri.setter
def metadata_schema_uri(self, metadata_schema_uri):
ditto
Prefer not to expose a getter, and a setter even less so, unless required.
They can pass this in through the create functions.
Having a setter as well means that the validation has to be carefully synced across all places where metadata_schema_uri is passed in.
It can also introduce temporal dependencies, like I pointed out in the other comment.
f"{cls.metadata_schema_uri} does not support import_schema_uri." | ||
) | ||
datasource = TabularDatasource(gcs_source, bq_source) | ||
elif cls.metadata_schema_uri in [schema.dataset.metadata.image]: |
We can probably just check for tabular and then assume everything else is non-tabular instead of checking that it matches image, etc.
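In other words, something along these lines (a sketch; the argument-free NonTabularDatasource constructor is an assumption):

if cls.metadata_schema_uri == schema.dataset.metadata.tabular:
    datasource = TabularDatasource(gcs_source, bq_source)
else:
    # Anything that is not tabular is treated uniformly as non-tabular.
    datasource = NonTabularDatasource()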
    return True

@property
def metadata_schema_uri(self) -> str:
Is there a reason we're exposing this? I'm of the opinion that we don't need to expose it as a property if not required.
for import_schema_class in self._support_import_schema_classes:
    if (
        self.metadata_schema_uri
This creates a temporal dependency/coupling such that the user has to first set self.metadata_schema_uri before setting self.import_schema_uri, which can introduce bugs, since both are setters.
This wouldn't be apparent to a user unless they peered into the class, and even then it can be error-prone.
For reasons like this, I would prefer not to have property setters if possible.
We could perhaps provide getters if that's something users care about, although I'd lean towards hiding as many internal workings as possible.
@@ -226,7 +226,7 @@ def test_create_and_import_dataset(
    labels=_TEST_LABEL,
    metadata_schema_uri=_TEST_METADATA_SCHEMA_URI_NONTABULAR,
    import_schema_uri=_TEST_IMPORT_SCHEMA_URI,
    data_items_labels=_TEST_DATA_LABEL_ITEMS,
nice fix
Thanks for putting together the PR! Left some comments
# limitations under the License.
#

from google.cloud.aiplatform.datasets.datasources import (
I think here and below we prefer to import the module instead of the classes: from google.cloud.aiplatform.datasets import datasources
@@ -77,45 +155,37 @@ def create(
    metadata_schema_uri: str,
    gcs_source: Optional[Sequence[str]] = None,
    bq_source: Optional[str] = None,
    labels: Optional[Dict] = {},
The benefit of having this might not outweigh the confusion it would cause.
Oh yes, labels is named poorly. Maybe in MBSDK we can rename it as metadata_labels?
If these are GCP resource labels then resource_labels should be OK. I would recommend removing this and creating a ticket to add resource labeling to all MB SDK resources and follow up later.
Created a ticket for adding resource_labels to resources.
class _Datasource(abc.ABC):
    """An abstract class that sets dataset_metadata"""
How much of the distinction between tabular vs non-tabular, and importable vs non-importable, is a tentative oddity that will be removed from the service, and as such how much should be completely hidden from the user?
As an example, while tabular datasets do not support an import_data service method, the library could well support it through BQ/GCS, which is the expected mechanism for updating tabular datasets.
The near-term goal is to keep the Tabular vs Non-tabular distinction internal so the users don't have to worry about it if they used the Dataset subclasses. The subclasses will tell them exactly which arguments are relevant.
So the effect is that internally, the _Datasource class abstracts away the Tabular/Non-tabular details so that the Dataset class doesn't need to know about the differences. If there comes a day that the services can handle Tabular/Non-tabular the same way, all we have to do is replace TabularDatasource and NonTabularDatasource with some UnifiedDatasource that also conforms to _Datasource. Dataset will not have to be touched (unless the service changes things there too).
In the case of importing data, once we have a mechanism to import data to tabular datasets, we just conform TabularDatasource to _DatasourceImportable, provide the relevant logic and it should thus gain the ability to import data.
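A minimal sketch of the abstraction being described, assuming the interface consists of dataset_metadata for creation and import_data_config for import (inferred from the diff context above):

import abc
from typing import Dict, Optional

class _Datasource(abc.ABC):
    """Hides the tabular/non-tabular distinction from Dataset."""

    @property
    @abc.abstractmethod
    def dataset_metadata(self) -> Optional[Dict]:
        """Metadata to pass to the dataset create request."""

class _DatasourceImportable(_Datasource):
    """Conformed to by datasources that also support importing data."""

    @property
    @abc.abstractmethod
    def import_data_config(self) -> Dict:
        """Config to pass to the import data request."""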
Looks great! Some minor requested changes and unit tests. Thanks!
        Required. A fully-qualified dataset resource name or dataset ID.
        Example: "projects/123/locations/us-central1/datasets/456" or
        "456" when project and location are initialized or passed.
    project (str):
    project: (str) = None
The convention is not to include the default value in the docstring. These docstring args should be reverted back to the arg_name (arg_type): format.
"""The metadata schema uri of this dataset resource.""" | ||
return self._gca_resource.metadata_schema_uri | ||
|
||
def _validate_metadata_schema_uri(self) -> bool: |
Since this is validating, it's safe to return None and not a bool.
    return self._gca_resource.metadata_schema_uri

def _validate_metadata_schema_uri(self) -> bool:
    """Validate the metadata_schema_uri of retrieved dataset resource."""
Requires a Raises section.
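Putting both suggestions together, the validator might end up roughly like this (a sketch; the exact message and the use of _supported_metadata_schema_uris are assumptions drawn from other comments in this review):

def _validate_metadata_schema_uri(self) -> None:
    """Validate the metadata_schema_uri of the retrieved dataset resource.

    Raises:
        ValueError: If the retrieved dataset's metadata_schema_uri is not
            supported by this Dataset class.
    """
    if self._supported_metadata_schema_uris and (
        self.metadata_schema_uri not in self._supported_metadata_schema_uris
    ):
        raise ValueError(
            f"{self.__class__.__name__} class cannot be used to retrieve "
            f"dataset resource {self.resource_name}"
        )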
bq_source: Optional[str] = None,
labels: Optional[Dict] = {},
Side point: Avoid providing a mutable type as a default argument.
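The usual fix, for reference (a generic Python pattern, not code from this PR):

def create(labels: Optional[Dict] = None):
    # Each call gets its own fresh dict instead of sharing one mutable default.
    labels = labels if labels is not None else {}
    ...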
@@ -0,0 +1,141 @@
import abc
from typing import Optional, Dict, Sequence, Union
from google.cloud.aiplatform_v1beta1 import (
These should be module level imports and they should be imported directly from the module they are implemented in.
@@ -0,0 +1,141 @@
import abc
This module should be prefixed with an underscore, i.e. _datasources, to indicate that we don't intend for it to be used on the API surface.
@@ -42,6 +41,8 @@ class Dataset(base.AiPlatformResourceNounWithFutureManager):
    _resource_noun = "datasets"
    _getter_method = "get_dataset"

    _support_metadata_schema_uris = None
Prefer _supported_metadata_schema_uris. Also, it would be nice to include the type signature to signal to devs what the supported type is, i.e. _supported_metadata_schema_uris: Optional[Tuple[str]] = None.
google/cloud/aiplatform/utils.py
@@ -18,7 +18,7 @@

import re

from typing import Any, Match, Optional, Type, TypeVar, Tuple
from typing import Any, Match, Optional, Type, TypeVar, Tuple, Sequence
Why was Sequence added?
project: str,
location: str,
credentials: Optional[auth_credentials.Credentials],
datasource: datasources.Datasource,
Thanks for fixing!
- remove Sequence from utils.py
- refactor datasources.py to _datasources.py
- change docstring format to arg_name (arg_type): convention
- change and include the type signature _supported_metadata_schema_uris
- change _validate_metadata_schema_uri
- refactor _create_encapsulated to _create_and_import
- refactor to module level imports
- add tests for ImageDataset and TabularDataset
LGTM. Added a few more comments.
This branch needs to be updated with the latest dev branch so the builds pass.
"""Creates a tabular datasource | ||
|
||
Args: | ||
gcs_source (Optional[Union[str, Sequence[str]]]): |
It's OK to remove the Optional type hint in the docstring as that's communicated by not including "Required" in the description.
        examples:
            str: "gs://bucket/file.csv"
            Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
    bq_source (Optional[str]):
        BigQuery URI to the input table.
We may also want a sample for bq_source to match the documentation for gcs_source.
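For instance, the docstring entry could gain a sample like this (the URI follows the standard bq:// convention; treat the exact value as illustrative):

bq_source (Optional[str]):
    BigQuery URI to the input table.
    example:
        "bq://project.dataset.table_name"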
dataset_metadata = {
    "input_config": {"bigquery_source": {"uri": bq_source}}
}
if metadata_schema_uri == schema.dataset.metadata.tabular:
I wonder if we should push this conditional instantiation of the correct Datasource class into a helper method in the _datasources module.
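Something like the create_datasource helper that the later commit message mentions; this sketch of its shape is an assumption, reusing the datasource classes discussed above:

def create_datasource(
    metadata_schema_uri: str,
    import_schema_uri: Optional[str] = None,
    gcs_source: Optional[Union[str, Sequence[str]]] = None,
    bq_source: Optional[str] = None,
    data_item_labels: Optional[Dict] = None,
) -> _Datasource:
    # Tabular datasets read directly from GCS/BQ and never import.
    if metadata_schema_uri == schema.dataset.metadata.tabular:
        return TabularDatasource(gcs_source, bq_source)
    # Non-tabular datasets import data when an import schema is given.
    if import_schema_uri:
        return NonTabularDatasourceImportable(
            gcs_source, import_schema_uri, data_item_labels
        )
    return NonTabularDatasource()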
assert my_dataset._gca_resource == expected_dataset


class TestTabularDataset:
This should have one more test to ensure import_data raises.
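A minimal sketch of such a test (pytest style; the constant names and the exact exception type are assumptions based on the surrounding test module):

import pytest

def test_tabular_dataset_import_data_raises():
    my_dataset = datasets.TabularDataset(dataset_name=_TEST_NAME)
    # TabularDataset does not support data import, so this must raise.
    with pytest.raises(NotImplementedError):
        my_dataset.import_data()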
LGTM once rest of comments addressed!
Let me know if you need help rebasing.
* fix: unblock builds (#132)
* chore: Update README with Experimental verbiage (#131)
* fix: Fixed comments (#116)
Co-authored-by: Ivan Cheung <[email protected]>
* feat: Implements a wrapped client that instantiates the client at every API invocation (#139)
* feat: Added optional model args for custom training (#129)
* Added optional model args
* fix: Removed etag
* fix: Added predict schemata and fixed type error
* fix: Added description and fixed predict_schemata
* Added _model_serving_container_command, _model_serving_container_args, env=self._model_serving_container_environment_variables and _model_serving_container_ports
* fix: Ran linter
* fix: Added tests for model_instance_schema_uri, model_parameters_schema_uri and model_prediction_schema_uri
* fix: Fixed env and ports and added tests
* fix: Removed model_labels
* fix: Moved container spec creation into init function
* fix: Fixed docstrings
* fix: Moved import to be alphabetical
* fix: Moved model creation to init function
* fix: Fixed predict_schemata
* fix: simplified predict schemata
* fix: added linter
* fix: Fixed trailing comma
* fix: Removed CustomTrainingJob private fields
* fix: Fixed model tests
* fix: Set managed_model to None
Co-authored-by: Ivan Cheung <[email protected]>
* Fix: refactor class constructor for retrieving resource (#125)
* Added property and abstract method _getter_method and _resource_noun, implemented method _get_gca_resource to class AiPlatformResourceNoun; Added _resource_noun, _getter_method, to Dataset, Model, Endpoint, subclasses of _Job, _TrainingJob, refactored (_)get_* and utils.full_resource_name in class constructor to self._get_gca_resource to Dataset, Model, Endpoint, _Job
* Added return value in _get_gca_resource, added method _sync_gca_resource in AiPlatformResourceNoun class; removed job_type, updated status method with _sync_gca_resource in _Job class
* fix: added return type and lint issues
* fix: merge conflict issue with models.py
* fix: F401 'abc' imported but unused
* chore: merge main into dev (#154)
* test: Dataset integration tests (#126)
* Add dataset.metadata.text to schemas
* Add first integration tests, Dataset class
* Make teardown work if test fails, update asserts
* Change test folder name, enable system tests
* Hide test_base, test_end_to_end for Kokoro CI bug
* Add GCP Project env var to Kokoro presubmit cfg
* Restore presubmit cfg, drop --quiet in unit tests
* Restore test_base, test_end_to_end to find timeout
* Skip tests depending on persistent resources
* Use auth default creds for system tests
* Drop unused import os
* feat: specialized dataset classes, fix: datasets refactor (#153)
* feat: Refactored Dataset by removing intermediate layers
* Added image_dataset and tabular_dataset subclass
* Moved metadata_schema_uri responsibility to subclass to enable forecasting
* Moved validation logic for tabular into Dataset._create_tabular
* Added validation in image_dataset and fixed bounding_box schema error
* Removed import_config
* Fixed metadata_schema_uri
* Fixed import and subclasses
* Added EmptyNontabularDatasource
* change import_metadata to ioformat
* added datasources.py
* added support of multiple gcs_sources
* fix: default (empty) dataset_metadata need to be set to {}, not None
* 1) imported datasources 2) added _support_metadata_schema_uris and _support_import_schema_classes 3) added getter and setter/validation for resource_metadata_schema_uri, metadata_schema_uri, and import_schema_uri 4) fixed request_metadata, data_item_labels 5) encapsulated dataset_metadata, and import_data_configs 6) added datasource configuration logic
* added image_dataset.py and tabular_dataset.py
* fix: refactor - create datasets module
* fix: cleanup __init__.py
* fix: data_item_labels
* fix: docstring
* fix: - changed NonTabularDatasource.dataset_metadata default to None - updated NonTabularDatasource docstring - changed gcs_source type hint with Union - changed _create_and_import to _create_encapsulated with datasource - removed subclass.__init__ and irrelevant parameters in create
* fix: import the module instead of the classes for datasources
* fix: removed all validation for import_schema_uri
* fix: set parameter default to immutable
* fix: replaced Datasource / DatasourceImportable abstract class instead of a concrete type
* fix: added examples for gcs_source
* fix: - remove Sequence from utils.py - refactor datasources.py to _datasources.py - change docstring format to arg_name (arg_type): convention - change and include the type signature _supported_metadata_schema_uris - change _validate_metadata_schema_uri - refactor _create_encapsulated to _create_and_import - refactor to module level imports - add tests for ImageDataset and TabularDataset
* fix: remove all labels
* fix: remove Optional in docstring, add example for bq_source
* test: add import_data raise for tabular dataset test
* fix: refactor datasource creation with create_datasource
* fix: lint
Co-authored-by: Ivan Cheung <[email protected]>
* feat: Add AutoML Image Training Job class (#152)
* Add AutoMLImageTrainingJob, tests, constants
* Address reviewer comments
* feat: Add custom container support (#164)
* chore: merge main into dev (#162)
* fix: suppress no project id warning (#160)
* fix: suppress no project id warning
* fix: temporary suppress logging.WARNING and set credentials as google.auth.default credentials
* fix: move default credentials config to credentials property
* fix: add property setter for credentials to avoid everytime reset
* fix: Fixed wrong key value for multilabel (#168)
Co-authored-by: Ivan Cheung <[email protected]>
* feat: Add delete methods, add list_models and undeploy_all for Endpoint class (#165)
* Endpoint list_models, delete, undeploy_all WIP
* Finish delete + undeploy methods, tests
* Add global pool teardowns for test timeout issue
* Address reviewer comments, add async support
* fix: Fixed bug causing training failure for object detection (#171)
Co-authored-by: Ivan Cheung <[email protected]>
* fix: Support intermediary BQ Table for Custom Training (#166)
* chore: add AutoMLImageTrainingJob to aiplatform namespace (#173)
* fix: Unblock build (#174)
* fix: default credentials config related test failures (#167)
* fix: suppress no project id warning
* fix: temporary suppress logging.WARNING and set credentials as google.auth.default credentials
* fix: move default credentials config to credentials property
* fix: add property setter for credentials to avoid everytime reset
* fix: tests for set credentials to default when default not provided
* fix: change credentials with initializer default when not provided in AiPlatformResourceNoun
* fix: use credential mock in tests
* fix: lint
Co-authored-by: sasha-gitg <[email protected]>
Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Morgan Du <[email protected]>
Co-authored-by: Vinny Senthil <[email protected]>
* Fix: pass bq_destination to input data config when using training script (#181)
* fix: pass bigquery destination
* fix: add tests and formatting
Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Morgan Du <[email protected]>
Co-authored-by: Vinny Senthil <[email protected]>
Fixes: