Skip to content

project_id as optional #127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Apr 25, 2018
Merged

project_id as optional #127

merged 25 commits into from
Apr 25, 2018

Conversation

max-sixty
Copy link
Contributor

The underlying Google libraries can read default projects, so there's no need for us to force users to supply one

@codecov-io
Copy link

codecov-io commented Feb 20, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@187a57f). Click here to learn what that means.
The diff coverage is 52.94%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #127   +/-   ##
=========================================
  Coverage          ?   34.97%           
=========================================
  Files             ?        8           
  Lines             ?     1521           
  Branches          ?        0           
=========================================
  Hits              ?      532           
  Misses            ?      989           
  Partials          ?        0
Impacted Files Coverage Δ
pandas_gbq/gbq.py 30.48% <50%> (ø)
pandas_gbq/tests/test_gbq.py 28.44% <53.84%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 187a57f...a3e6d2f. Read the comment docs.

Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! For sanity, please add a system test similar to https://github.com/pydata/pandas-gbq/blob/11559583d1451fc64108c12f332d1c93dec4c80c/pandas_gbq/tests/test_gbq.py#L223-L229 which runs when default credentials are available and verifies that a query can be run without setting an explicit project ID.

@@ -761,7 +761,7 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,
----------
query : str
SQL-Like Query to return data values
project_id : str
project_id : str (optional)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project_id is still required when using user authentication, so we should clarify that it is only optional if using default credentials.

Or maybe also optional with a service account? I think we might have to parse as JSON and grab the project ID from the service account and use https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.service_account.html#google.oauth2.service_account.Credentials.from_service_account_info for that to work.

Copy link
Contributor Author

@max-sixty max-sixty Mar 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bearing with me on the authentication questions

The project_id is still required when using user authentication

Is this right? If I'm running localling with user auth, google.cloud.bigquery can pick up my default project fine:

In [20]: import google.cloud.bigquery

In [21]: c=google.cloud.bigquery.Client()

In [22]: c
Out[22]: <google.cloud.bigquery.client.Client at 0x1138e44a8>

In [23]: c.project
Out[23]: 'sixty-capital'

Auth:

gcloud auth list                                             
       Credentialed Accounts
ACTIVE  ACCOUNT
*       []@sixtycapital.com

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google.cloud.bigquery.Client() does user auth using the Cloud SDK. I'm referring to when the Cloud SDK is not installed, where pandas-gbq does it's user auth flow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more specific the credential creation at

https://github.com/pydata/pandas-gbq/blob/16747609dc58f0b6df2c4e896db5a3f365c4677c/pandas_gbq/gbq.py#L272-L283

does not include a default project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bear with me once more:
Why don't we add it? It's in the JSON file:

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 3LO there are two projects we are concerned with:

  • The client secrets project. In this case, it is the project for Pandas GBQ. It has the BigQuery API enabled, but users shouldn't have permissions to run queries against this project. If we did use it, it would bill the Pandas team, not the user.
  • The user's project. We know this in the case of a service account or when using the gcloud command, but we don't know it if using client secrets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see now. Because we do the app_flow, those creds don't come with a project.
Though not sure what 3LO is

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3LO - 3-legged OAuth 2. (The protocol that app_flow follows.)

def project(request):
if request.param == 'env':
return _get_project_id()
elif request.param == 'none':
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this didn't get used, but could param over it in the future)

@max-sixty
Copy link
Contributor Author

Updated!

@@ -7,6 +7,7 @@
from distutils.version import StrictVersion
from time import sleep

import google
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google is a namespace package for many different packages. Could you import google.auth instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we don't tend to import our direct dependencies here, since we want the code for imports checking to run before crashing on a failed import.

@@ -170,6 +170,9 @@ def __init__(self, project_id, reauth=False,
from google.api_core.exceptions import GoogleAPIError
from google.api_core.exceptions import ClientError
self.http_error = (ClientError, GoogleAPIError)
if not project_id:
from google.auth import default
_, project_id = default()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, this could cause us to go through the whole "application default credentials" flow twice, which could be quite slow. (It checks for env var, checks for GCE, checks for App Engine, ...)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, and it's not cached?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah not great:


In [4]: %timeit google.auth.default()
573 ms ± 14.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about changing those methods to return 2-tuples?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tuples would be reasonable. The other (more disruptive, probably) option would be to pass in the project to the credentials methods and have them return a google.cloud.bigquery.Client.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this PR to tuples as we discussed.

@max-sixty
Copy link
Contributor Author

This will close #72

return credentials

# Try to retrieve Application Default Credentials
credentials, default_project = self.get_application_default_credentials()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (81 > 79 characters)

tswast added 3 commits April 25, 2018 11:05
I think BigQuery stopped checking for valid project on queries with no
data access.
@max-sixty
Copy link
Contributor Author

Looks like you're updating @tswast - thanks. I shouldn't have let this hang around for so long - then I hit merge conflicts and slowed down even more

Feel free to discard anything ancillary to the main task here (e.g. specifying the auth type in the test run rather than the hacky function that skip in travis), and I'll add them separately

Thanks!

@tswast
Copy link
Collaborator

tswast commented Apr 25, 2018

Cool. I'll get this to the finish line. Thanks for your work so far, @maxim-lian

Re: auth type tests. Yeah, I'll revert back to the wonky _skip_in_travis() functions for now. I'm thinking of doing a refactor soon that pull the auth functions out of GbqConnector and into their own module. Then I think it'll make more sense to test auth separately from read/to.

@tswast tswast merged commit 7711bb0 into googleapis:master Apr 25, 2018
@max-sixty max-sixty mentioned this pull request Apr 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants