Conversation

@jimfulton
Contributor

This saves ~12% in execution time and avoids unnecessary BQ churn.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #716 🦕

This mostly just stops creating extra datasets. Only 3 tests used the dataset created in setUp. Those 3 tests now share the dataset created by the dataset_id fixture. This saves ~12% of test-execution time.

More could be saved by converting most of the remaining tests in test_client to pytest and using the dataset_id fixture. That should be done in follow-on PRs.

Jim Fulton added 2 commits July 21, 2021 13:56
This saves ~12% in execution time and avoids unnecessary BQ churn.
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jul 21, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 21, 2021
@tswast
Contributor

tswast commented Jul 21, 2021

I see that we already have a session-scope dataset_id fixture in

def dataset_id(bigquery_client):

Maybe there's a way we can use that with our mix of unittest and pytest style tests?

@tswast
Contributor

tswast commented Jul 21, 2021

Maybe we could even update that conftest.py file to use the new "prefixer" module?

import pytest
import test_utils.prefixer

from google.cloud import bigquery
from . import helpers


prefixer = test_utils.prefixer.Prefixer("python-bigquery", "tests/system")


@pytest.fixture(scope="session", autouse=True)
def cleanup_datasets(bigquery_client: bigquery.Client):
    for dataset in bigquery_client.list_datasets():
        if prefixer.should_cleanup(dataset.dataset_id):
            bigquery_client.delete_dataset(
                dataset, delete_contents=True, not_found_ok=True
            )

...

@pytest.fixture(scope="session")
def dataset_id(bigquery_client: bigquery.Client, project_id: str):
    dataset_id = prefixer.create_prefix()
    full_dataset_id = f"{project_id}.{dataset_id}"
    dataset = bigquery.Dataset(full_dataset_id)
    bigquery_client.create_dataset(dataset)
    yield dataset_id
    bigquery_client.delete_dataset(dataset, delete_contents=True, not_found_ok=True)
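For readers unfamiliar with the pattern: the prefixer's job is to give every test resource a recognizable, timestamped name so a session-level fixture can sweep up leftovers from crashed runs. A rough sketch of the idea (`SimplePrefixer` is invented here for illustration and is not the real `test_utils.prefixer` implementation):

```python
import datetime
import random
import string


class SimplePrefixer:
    """Toy stand-in for test_utils.prefixer.Prefixer, for illustration only."""

    def __init__(self, product: str, path: str, separator: str = "_"):
        self._separator = separator
        # Derive a stable base prefix from the product name and test path,
        # e.g. "python_bigquery_tests_system".
        raw = product + separator + path
        self._base = raw.replace("/", separator).replace("-", separator)

    def create_prefix(self) -> str:
        # Embed a UTC timestamp so stale resources can be aged out later.
        stamp = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
        suffix = "".join(random.choices(string.ascii_lowercase, k=6))
        return self._separator.join([self._base, stamp, suffix])

    def should_cleanup(self, resource_id: str, max_age_hours: int = 2) -> bool:
        # Only touch resources that carry our prefix, and only if they are stale
        # (fresh ones may belong to a test run that is still in progress).
        marker = self._base + self._separator
        if not resource_id.startswith(marker):
            return False
        stamp = resource_id[len(marker):].split(self._separator)[0]
        try:
            created = datetime.datetime.strptime(stamp, "%Y%m%d%H%M%S")
        except ValueError:
            return False
        age = datetime.datetime.utcnow() - created
        return age > datetime.timedelta(hours=max_age_hours)
```

With this scheme, the autouse `cleanup_datasets` fixture above can safely delete any dataset whose name matches the prefix and is old enough, without touching unrelated datasets in the project.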

@jimfulton
Contributor Author

I see that we already have a session-scope dataset_id fixture in

def dataset_id(bigquery_client):

I'm using that.

Maybe there's a way we can use that with our mix of unittest and pytest style tests?

You can't use non-autouse fixtures in unittest-style tests. I tried. That's why I converted those three tests to pytest style.

@jimfulton
Contributor Author

Maybe we could even update that conftest.py file to use the new "prefixer" module?

Will do.

@jimfulton
Contributor Author

done

@jimfulton jimfulton marked this pull request as ready for review July 22, 2021 16:39
@jimfulton jimfulton requested a review from a team July 22, 2021 16:39
@jimfulton jimfulton requested a review from a team as a code owner July 22, 2021 16:39
@jimfulton jimfulton requested review from shollyman and removed request for a team July 22, 2021 16:39
rows = sorted(row.values() for row in rows_iter)
assert rows == [(1, "one"), (2, "two")]

def temp_dataset(self, dataset_id, location=None):
Contributor

This test uses the shared dataset_id fixture, but wants to create the dataset itself -- should it be updated to use a different one?

Contributor Author

I'm confused. Are you referring to test_table_snapshots? It was using the dataset created in setUp. I don't see it creating a dataset. Am I blind? :)

Contributor

Sorry, I was referring to the temp_dataset helper (my eye read that as test_dataset). It is used by test_create_dataset, test_update_dataset, etc. Because it isn't a test, it isn't using the dataset_id fixture, but an actual passed-in dataset_id parameter.

I don't know whether all those tests actually need to be using separate datasets, but could imagine that at least some could share.
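For tests that only need distinct tables rather than distinct datasets, a common way to share one dataset is to randomize the table name instead (a hypothetical helper, not part of this PR):

```python
import uuid


def unique_table_id(dataset_id: str, prefix: str = "tbl") -> str:
    """Build a collision-free table ID inside a shared dataset.

    Each call returns a fresh name, so tests running against the same
    session-scoped dataset cannot clobber each other's tables.
    """
    return f"{dataset_id}.{prefix}_{uuid.uuid4().hex}"
```

Tests that genuinely exercise dataset creation or update (like test_create_dataset) would still need their own datasets, which is what the temp_dataset helper provides.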

Contributor Author

A number of the other tests could and should be refactored to use the shared dataset. In the interest of not making this PR too complex, I opted to save updating the other tests for a future PR.

Contributor Author

This PR does the minimum to get rid of the mostly-unused dataset creation in setUp.

@shollyman shollyman requested review from tswast and removed request for shollyman July 22, 2021 18:56
@jimfulton jimfulton merged commit eef6c8e into master Jul 25, 2021
@jimfulton jimfulton deleted the riversnake-system-tests-fewer-datasets branch July 25, 2021 13:51

Development

Successfully merging this pull request may close these issues.

test: System tests create and destroy way too many datasets