-
Notifications
You must be signed in to change notification settings - Fork 58
feat: raise NoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
or filters
set
#631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
tswast
merged 39 commits into
main
from
b335727141-clustered-or-partitioned-default-index-error
May 2, 2024
Merged
Changes from all commits
Commits
Show all changes
39 commits
Select commit
Hold shift + click to select a range
b303999
docs: document index as a best practice
tswast 0ddd86b
docs: set `index_cols` in `read_gbq` as a best practice
tswast 994a8f1
feat: support primary key(s) in `read_gbq` by using as the `index_col…
tswast 5fcc5a0
revert WIP commit
tswast 6b6a5ab
Merge branch 'main' into b335727141-primary_key
tswast 8c4e31c
address type error in tests
tswast dd940bd
Merge branch 'b335727141-primary_key' into b335727141-clustered-or-pa…
tswast b96cba3
document behaviors
tswast fb3b508
Merge branch 'b335727141-docs' into b335727141-clustered-or-partition…
tswast 477a516
update docs to reflect new default index behavior
tswast 2c5a0dd
add DefaultIndexKind to allowed `index_col` values
tswast d485be6
Merge remote-tracking branch 'origin/main' into b335727141-clustered-…
tswast d816db3
refactor: cache table metadata alongside snapshot time
tswast d3f0891
Merge branch 'b335727141-snapshot-save-metadata' into b335727141-clus…
tswast 241dc60
add unit tests
tswast 613e660
parametrize tables with clustered and partitioned
tswast 2c782ca
Merge remote-tracking branch 'origin/main' into b335727141-clustered-…
tswast f437dcf
refactor: split `read_gbq_table` implementation into functions and mo…
tswast 0090dc0
refactor progress
tswast 850db7a
add index_cols function
tswast ab98d4a
maybe ready for review
tswast 5b665dd
Merge remote-tracking branch 'origin/main' into b335727141-refactor-r…
tswast 0577131
Update bigframes/session/__init__.py
tswast f3f6982
Merge branch 'main' into b335727141-refactor-read_gbq_table
tswast 453eece
Merge branch 'b335727141-refactor-read_gbq_table' into b335727141-clu…
tswast 175a23c
Merge remote-tracking branch 'origin/main' into b335727141-clustered-…
tswast 204b2db
remove some todos
tswast adaf664
add error raising plus todos
tswast e8bdded
Merge remote-tracking branch 'origin/main' into b335727141-clustered-…
tswast d028bc5
add TODO for ROW_NUMBER() in the query we generate
tswast 658f61d
remove filters unit test for now
tswast f1b3f88
docstring fixes
tswast 6b0e63c
Merge branch 'main' into b335727141-clustered-or-partitioned-default-…
tswast 40fab82
Merge remote-tracking branch 'origin/main' into b335727141-clustered-…
tswast 9f3e149
feat: support `index_col=False` in `read_csv` and `engine="bigquery"`
tswast 722abbb
Merge remote-tracking branch 'origin/b335727141-clustered-or-partitio…
tswast e7c4d93
revert typo
tswast d136bc0
attempt 2
tswast 586cca2
Merge remote-tracking branch 'origin/main' into b335727141-clustered-…
tswast File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
"""Public enums used across BigQuery DataFrames.""" | ||
|
||
# NOTE: This module should not depend on any others in the package. | ||
|
||
|
||
import enum | ||
|
||
|
||
class DefaultIndexKind(enum.Enum): | ||
"""Sentinel values used to override default indexing behavior.""" | ||
|
||
#: Use consecutive integers as the index. This is ``0``, ``1``, ``2``, ..., | ||
#: ``n - 3``, ``n - 2``, ``n - 1``, where ``n`` is the number of items in | ||
#: the index. | ||
SEQUENTIAL_INT64 = enum.auto() |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is enums an intuitive module, or would a domain-related term be better, eg
indexing.IndexType
or directly putting the enum in the main module,bigframes.pandas.IndexType
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find some guidance on this, but Python community doesn't seem particularly prescriptive about module names.
PEP-8 has this to say:
https://peps.python.org/pep-0008/#package-and-module-names
Google Python style guide has a bit more to say:
https://google.github.io/styleguide/pyguide.html#3162-naming-conventions
I tried a few of these options out locally (bigframes.indexes.DefaultIndexKind and bigframes.pandas.DefaultIndexKind), but it feels strange to have something not really mimicking pandas in the pandas sub-package and bigframes.indexes.DefaultIndexKind would imply that we should move the Index and MultiIndex classes there, which is kinda the opposite of what we want to do.
The other option we could try is
bigframes.pandas.core.indexes
, but in pandas "core" is how they signify that an API is private an not to be relied on.IMO, determining if classes are "related" by type for the basic types (e.g. exceptions, enums, ...) will be less effort for us long-term than having to figure out which public package to place these things if it doesn't fit in an existing API.