-
Notifications
You must be signed in to change notification settings - Fork 125
project_id as optional #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #127 +/- ##
=========================================
Coverage ? 34.97%
=========================================
Files ? 8
Lines ? 1521
Branches ? 0
=========================================
Hits ? 532
Misses ? 989
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! For sanity, please add a system test similar to https://github.com/pydata/pandas-gbq/blob/11559583d1451fc64108c12f332d1c93dec4c80c/pandas_gbq/tests/test_gbq.py#L223-L229 which runs when default credentials are available and verifies that a query can be run without setting an explicit project ID.
pandas_gbq/gbq.py
Outdated
@@ -761,7 +761,7 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, | |||
---------- | |||
query : str | |||
SQL-Like Query to return data values | |||
project_id : str | |||
project_id : str (optional) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project_id
is still required when using user authentication, so we should clarify that it is only optional if using default credentials.
Or maybe also optional with a service account? I think we might have to parse as JSON and grab the project ID from the service account and use https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.service_account.html#google.oauth2.service_account.Credentials.from_service_account_info for that to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bearing with me on the authentication questions
The project_id is still required when using user authentication
Is this right? If I'm running localling with user auth, google.cloud.bigquery
can pick up my default project fine:
In [20]: import google.cloud.bigquery
In [21]: c=google.cloud.bigquery.Client()
In [22]: c
Out[22]: <google.cloud.bigquery.client.Client at 0x1138e44a8>
In [23]: c.project
Out[23]: 'sixty-capital'
Auth:
gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* []@sixtycapital.com
To set the active account, run:
$ gcloud config set account `ACCOUNT`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
google.cloud.bigquery.Client()
does user auth using the Cloud SDK. I'm referring to when the Cloud SDK is not installed, where pandas-gbq does it's user auth flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more specific the credential creation at
does not include a default project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 3LO there are two projects we are concerned with:
- The client secrets project. In this case, it is the project for Pandas GBQ. It has the BigQuery API enabled, but users shouldn't have permissions to run queries against this project. If we did use it, it would bill the Pandas team, not the user.
- The user's project. We know this in the case of a service account or when using the
gcloud
command, but we don't know it if using client secrets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see now. Because we do the app_flow
, those creds don't come with a project.
Though not sure what 3LO is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3LO - 3-legged OAuth 2. (The protocol that app_flow
follows.)
pandas_gbq/tests/test_gbq.py
Outdated
def project(request): | ||
if request.param == 'env': | ||
return _get_project_id() | ||
elif request.param == 'none': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this didn't get used, but could param over it in the future)
Updated! |
pandas_gbq/gbq.py
Outdated
@@ -7,6 +7,7 @@ | |||
from distutils.version import StrictVersion | |||
from time import sleep | |||
|
|||
import google |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
google
is a namespace package for many different packages. Could you import google.auth
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we don't tend to import our direct dependencies here, since we want the code for imports checking to run before crashing on a failed import.
pandas_gbq/gbq.py
Outdated
@@ -170,6 +170,9 @@ def __init__(self, project_id, reauth=False, | |||
from google.api_core.exceptions import GoogleAPIError | |||
from google.api_core.exceptions import ClientError | |||
self.http_error = (ClientError, GoogleAPIError) | |||
if not project_id: | |||
from google.auth import default | |||
_, project_id = default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, this could cause us to go through the whole "application default credentials" flow twice, which could be quite slow. (It checks for env var, checks for GCE, checks for App Engine, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change https://github.com/maxim-lian/pandas-gbq/blob/a7f6c43434bc21e1d2fc7633d3c5cb9c04060a4a/pandas_gbq/gbq.py#L223 to not discard the project instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, and it's not cached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah not great:
In [4]: %timeit google.auth.default()
573 ms ± 14.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about changing those methods to return 2-tuples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think tuples would be reasonable. The other (more disruptive, probably) option would be to pass in the project to the credentials methods and have them return a google.cloud.bigquery.Client
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this PR to tuples as we discussed.
This will close #72 |
pandas_gbq/gbq.py
Outdated
return credentials | ||
|
||
# Try to retrieve Application Default Credentials | ||
credentials, default_project = self.get_application_default_credentials() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E501 line too long (81 > 79 characters)
I think BigQuery stopped checking for valid project on queries with no data access.
Looks like you're updating @tswast - thanks. I shouldn't have let this hang around for so long - then I hit merge conflicts and slowed down even more Feel free to discard anything ancillary to the main task here (e.g. specifying the auth type in the test run rather than the hacky function that skip in travis), and I'll add them separately Thanks! |
Cool. I'll get this to the finish line. Thanks for your work so far, @maxim-lian Re: auth type tests. Yeah, I'll revert back to the wonky |
The underlying Google libraries can read default projects, so there's no need for us to force users to supply one