Skip to content

Further cleanup for the auth module #3114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 7, 2023

Conversation

portante
Copy link
Member

@portante portante commented Dec 10, 2022

Supersedes PR #3094, while including its commit with authorship.

While PR #3094 attempted to cleanup the auth module a bit, this PR dives in a bit further.

Suggested commit message to use:

Refactor the authorization sub-system

The refactoring approach taken here covers two broad areas.  The first
being the removal of the `Auth` class replacing it with module
interfaces.  This change removes the need to pass the `Auth` instance
around between modules and objects, where a simple import of the `Auth`
module is sufficient.

The second area of refactoring is simplifying the authorization module
interfaces to hide knowledge of "internal tokens" vs OpenID Connect
tokens.  The refactoring solves a problem facing the code prior to this
change where one needs a working OpenID Connect instance to function,
but the fallback doesn't work to the internal token mechanism.

With these changes, if it is possible to talk to an OpenID Connect
instance, only OpenID Connect tokens will be accepted for authorization.
If that is not possible, only "internal" tokens will be accepted.  This
prevents the two code paths from having to co-exist seamlessly, such
that the unit test environment can explicitly operate in one or the
other.

For now, the unit test environment assumes internal tokens only.  Unit
tests for the OIDC code paths (with 100% coverage) are provided
separately.

Further, a number of other changes are worth noting explicitly:

 * The "authentication" section in the configuration file has been
   renamed to "openid-connect"

   There are four required options in that section required to use
   OpenID Connect for tokens: "server_url", "realm", "client", and
   "secret".  If the section or any of those options are missing then
   "internal" tokens will be expected.

   The `pbench-server-default.cfg` file now contains that section with
   the "server_url" option commented out since we are defaulting to
   "internal" tokens for the time being.

   The Pbench-in-a-Can `pbench-server.cfg` file was updated to use the
   OpenID Connect instance (KeyCloak) by defining the "server_url" and
   providing a "secret".

 * A new "flask-app" section was added to provide the "secret-key"
   option for populating the `Flask` app's `secret_key` property

 * The `Auth` module's `set_oidc_client` was moved to the OpenIDClient
   `create` class factory method to improve OIDC encapsulation

 * The `Auth` module's `get_user_id()` was renamed to
   `get_current_user_id()`

 * The `Auth` module's `verify_auth()` method calls one of two helper
   methods based on whether OpenID Connect or internal tokens are to
   be used

   The presence of the "openid-connect" section with all the options
   configured causes OpenID Connect to be used vs internal tokens

 * Knowledge of OIDC tokens pushed out of the `Auth` module and into the
   `OpenIDClient` class

   We did not make a separate module for internal tokens since the
   intention is to remove that code entire once we can depend on an
   OpenID Connect instance.

 * The `OpenIDClient` was refactored to further its encapsulation and
   to facilitate easier testing via its new `Connection` sister class

 * Finally, the unit tests for the `Auth` module proper were rewritten
   to leverage the new `Connection` class to make the test behaviors

@portante portante added Server Code Infrastructure API Of and relating to application programming interfaces to services and functions labels Dec 10, 2022
@portante portante added this to the v0.72 milestone Dec 10, 2022
@portante portante self-assigned this Dec 10, 2022
dbutenhof
dbutenhof previously approved these changes Dec 12, 2022
@portante portante force-pushed the PBENCH-1028 branch 2 times, most recently from 586d2fe to 92a82af Compare December 13, 2022 03:31
@portante portante force-pushed the PBENCH-1028 branch 2 times, most recently from 0a38abf to b1d84d6 Compare December 22, 2022 05:06
@portante portante marked this pull request as ready for review December 22, 2022 05:06
Copy link
Member Author

@portante portante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for official review!

@portante portante mentioned this pull request Dec 22, 2022
dbutenhof
dbutenhof previously approved these changes Dec 22, 2022
dbutenhof
dbutenhof previously approved these changes Jan 3, 2023
Copy link
Member Author

@portante portante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few changes to address additional Google Doc string format usage, or lack there-of, in the users model.

@portante portante force-pushed the PBENCH-1028 branch 2 times, most recently from 602c191 to 031cb69 Compare February 4, 2023 19:21
@portante
Copy link
Member Author

portante commented Feb 4, 2023

This work is now dependent on PR #3211 since I did not want to spend lots of time waiting for CI jobs to complete to find out how the functional tests failed.

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I acknowledge that this PR is still in Draft state, but, in the interest of keeping things moving, I went ahead and reviewed it (although, I didn't get to lib/pbench/test/unit/server/auth/test_auth.py). I have a bunch of comments, but they are all small stuff, I think.

@@ -88,6 +88,7 @@ def run_gunicorn(server_config: PbenchServerConfig, logger: Logger) -> int:
workers = str(server_config.get("pbench-server", "workers"))
worker_timeout = str(server_config.get("pbench-server", "worker_timeout"))
crontab_dir = server_config.get("pbench-server", "crontab-dir")
server_config.get("flask-app", "secret-key")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "interesting" -- we're making a "get" call and then ignoring what it returns. I presume that the intention here is to fail early if it was omitted from the configuration, which is good, but this use of get() is slightly odd and perhaps deserves a comment. Also, it seems to impose the requirement that the Pbench Server be configured with a "secret-key" even if the Server is not configured to use an OIDC broker...is that intentional?

If so, then, instead of passing the whole server_config at line 106, consider doing the server_config.get("authentication", "server_url") call here and passing just the URL value to wait_for_oidc_server() there. If appropriate, we can specify a fallback value of None here and test for that at line 105 instead of having it recover from the absence.

Comment on lines +53 to 54
openid-connect: A JSON object containing the OpenID Connect parameters
required for the web client to use OIDC authentication.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth pointing out that openid-connect will be omitted if the OIDC server is not configured.

Comment on lines +59 to +63
Args:
method : The API HTTP method
path : Path for the request.
data : Json data to send with the request in case of the POST
kwargs : Additional keyword args
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description for headers is missing.

Comment on lines +68 to +70
final_headers = self.headers.copy()
if headers is not None:
final_headers.update(headers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately,

        final_headers = self.headers | (headers if headers else {})

That is, the | operator will return a new dictionary consisting of self.headers overlaid with the entries from headers.

(I liked this better before I had to add the ternary to handle the headers is None case, but I figured I would toss it out there anyway.)

Comment on lines +72 to +79
kwargs = dict(
params=kwargs,
data=data,
headers=final_headers,
verify=self.verify,
)
try:
response = self._connection.request(method, url, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically, are you sure that you want to be overwriting the input parameter, kwargs? Shouldn't we be instantiating a new local variable and using that in the call, instead?

Comment on lines +489 to +490
FIXME - This method appears unused in the rest of the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you intend to act on this FIXME prior to merging?

AIUI, the information referenced here is currently provided by the User database table; once that is removed, this function will provide a "bridge" for any existing code which still requires that info. The question is, will there be any such code remaining?

I concur that we should avoid carrying dead code. On the other hand, it's not hard to maintain this code during the transition, and doing so while we have all the OIDC stuff in the front of our heads is probably a good idea (that is, I think it will be cheaper to carry it than to reconstruct it). The issue is, when we turn our attention to other things, what is going prompt us to remove this code if, in fact, it winds up being unused?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we would ever going to call the userinfo endpoint as we get pretty much all the user information when we decode the tokens. I added this when I did some experimentation with Keycloak APIs and I thought this might come handy but I realized token decode is enough to get all the user information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, on that basis, let's rip out this function, then (which will get rid of the FIXME 😉).

Comment on lines +510 to +511
headers = {"Authorization": f"Bearer {token}"} if token else None
return self._connection.get(self._userinfo_endpoint, headers=headers).json()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@npalaska, this function is set up with a default value of None for token, and it even has some special-case handling for when the token is omitted...but is there any point to trying to make this call without a token? (Don't we know what Keycloak's response is going to be in that case??)

Copy link
Member

@npalaska npalaska Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the token is None then this one is going to return UNAUTHORIZED. However, I don't think Peter made this change to the function, and I think updating the guts of this function should not be a part of this PR.

I added this functionality long back when I was doing some experiments with OIDC APIs but I don't think we ever going to need to call userinfo endpoint as we pretty much get all the required information about the user when we decode the token so there is no need to call this endpoint at all. I will work on it in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the history of this PR, I really cannot say who made the change, but, looking at the diffs here, someone replaced an if with a ternary...which I would normally support (😉) except that, in this particular case, it seems like immediately returning an error would be better than issuing the request (since we know that that is what will/should be the ultimate result).

Whether we should do that in this PR or not is the next question....

Comment on lines +20 to +21
# Module private constants
_TOKEN_ALG_INT = "HS256"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this definition relate to the definition of OpenIDClient._TOKEN_ALG? Should they have the same value? Should they be referencing the same definition??

If not, then it would be good to have an explanatory comment here (I presume that we're only using this value temporarily during the transition).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current mechanism to encode the auth token is HS256 and I don't think the PR is trying to change the way we currently encode the tokens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. It's just that the respective similarities between _TOKEN_ALG/_TOKEN_ALG_INT and HS256/RS256 seem like they could easily be confused, and, if we're stuck with those names, then it would be good to have come good comments in the code to make sure that readers don't gloss over the differences.

Comment on lines +222 to 229
else InternalUser.create(
# FIXME - `client_id` is the value pulled from the Pbench Server
# "openid-connect" "client" field in the configuration file. This
# needs to be an ID from the OpenID Connect response payload (online
# case) or decoded token (offline case).
client_id=oidc_client.client_id,
token_payload=token_payload,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this is a draft PR, but, in case you've overlooked it, here's another FIXME.

if pbench_client_roles:
payload["resource_access"].update(
{"pbench-client": {"roles": pbench_client_roles}}
)
token_str = jwt.encode(payload, jwt_secret, algorithm="HS256")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a reference to Auth._TOKEN_ALG_INT?

@portante portante marked this pull request as ready for review February 7, 2023 17:09
Copy link
Member Author

@portante portante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled out of Draft mode now that #3211 and #3225 have merged. No other updates have been made yet from code review feedback.

@portante
Copy link
Member Author

portante commented Feb 7, 2023

CI job failed to due race condition to be solved by PR #3232. Restarted.

@@ -253,19 +253,25 @@ def test_create_w_roles(self):


@pytest.fixture(scope="session")
def rsa_keys() -> Dict[str, str]:
def rsa_keys() -> Dict[str, Union[rsa.RSAPrivateKey, str]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change this, but changing the line highlights it ... now that we're using Python 3.9 we can use dict[...] instead of typing.Dict. I've been trying to gradually fix type hints as I hit them.

@@ -29,7 +29,7 @@ uri = postgresql://pbenchcontainer:pbench@localhost:5432/pbenchcontainer
secret-key = "pbench-in-a-can secret shhh"

[openid-connect]
server_url = http://localhost:8090
#server_url = http://localhost:8090
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While your justification is valid, depending on Nikhil's #3235 this may be short-lived. 😆 Timing may not be everything, but a lot is happening all at once ...

Comment on lines 88 to 95
raise OpenIDClientError(
HTTPStatus.BAD_GATEWAY,
f"Failure to connect to the OpenID client, {self!r}: {exc}",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenIDCClientError (and others) should probably extend APIAbort so they're handled cleanly in _dispatch instead of going to the catch-all.

Also, where you use HTTPStatus.INTERNAL_SERVER_ERROR it'd be better to use APIInternalError so we generate the logger uuid right away rather than, again, going through the Exception catch-all in _dispatch.

The refactoring approach taken here covers two broad areas.  The first
being the removal of the `Auth` class replacing it with module
interfaces.  This change removes the need to pass the `Auth` instance
around between modules and objects, where a simple import of the `Auth`
module is sufficient.

The second area of refactoring is simplifying the authorization module
interfaces to hide knowledge of "internal tokens" vs OpenID Connect
tokens.  The refactoring solves a problem facing the code prior to this
change where one needs a working OpenID Connect instance to function,
but the fallback doesn't work to the internal token mechanism.

With these changes, if it is possible to talk to an OpenID Connect
instance, only OpenID Connect tokens will be accepted for authorization.
If that is not possible, only "internal" tokens will be accepted.  This
prevents the two code paths from having to co-exist seamlessly, such
that the unit test environment can explicitly operate in one or the
other.

For now, the unit test environment assumes internal tokens only.  Unit
tests for the OIDC code paths (with 100% coverage) are provided
separately.

Further, a number of other changes are worth noting explicitly:

 * The "authentication" section in the configuration file has been
   renamed to "openid-connect"

   There are four required options in that section required to use
   OpenID Connect for tokens: "server_url", "realm", "client", and
   "secret".  If the section or any of those options are missing then
   "internal" tokens will be expected.

   The `pbench-server-default.cfg` file now contains that section with
   the "server_url" option commented out since we are defaulting to
   "internal" tokens for the time being.

   The Pbench-in-a-Can `pbench-server.cfg` file was updated to use the
   OpenID Connect instance (KeyCloak) by defining the "server_url" and
   providing a "secret".

 * A new "flask-app" section was added to provide the "secret-key"
   option for populating the `Flask` app's `secret_key` property

 * The `Auth` module's `set_oidc_client` was moved to the OpenIDClient
   `create` class factory method to improve OIDC encapsulation

 * The `Auth` module's `get_user_id()` was renamed to
   `get_current_user_id()`

 * The `Auth` module's `verify_auth()` method calls one of two helper
   methods based on whether OpenID Connect or internal tokens are to
   be used

   The presence of the "openid-connect" section with all the options
   configured causes OpenID Connect to be used vs internal tokens

 * Knowledge of OIDC tokens pushed out of the `Auth` module and into the
   `OpenIDClient` class

   We did not make a separate module for internal tokens since the
   intention is to remove that code entire once we can depend on an
   OpenID Connect instance.

 * The `OpenIDClient` was refactored to further its encapsulation and
   to facilitate easier testing via its new `Connection` sister class

 * Finally, the unit tests for the `Auth` module proper were rewritten
   to leverage the new `Connection` class to make the test behaviors
Until we can support functional tests that use a token from a user
provided by OpenID Connect (i.e. a KeyCloak instance) don't configure
the functional tests to use OpenID Connect.

We also correct `load_keycloak.sh` to use the same values in the Pbench
Server configuration.
@portante
Copy link
Member Author

portante commented Feb 7, 2023

Unless there is some major objection, I'd like to merge this code as is, and follow-up with a separate PR(s) for some of the comments made. I'd like to unblock further in this area.

@portante portante requested review from dbutenhof and webbnh February 7, 2023 21:24
@portante
Copy link
Member Author

portante commented Feb 7, 2023

If you have particular changes that you'd like to see made, feel free to follow-up with your own PRs as well.

Copy link
Member

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there is some major objection, I'd like to merge this code as is, and follow-up with a separate PR(s) for some of the comments made. I'd like to unblock further in this area.

Well, this has been going on for a long time and I don't see any show-stoppers even though there's a lot of possible cleanup. So, sure.

Copy link
Member

@npalaska npalaska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@portante
Copy link
Member Author

portante commented Feb 7, 2023

And if you leave the comments unresolved it will help with future work, thanks.

webbnh

This comment was marked as resolved.

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I withdrawal my objection.

@portante portante merged commit 0a04c49 into distributed-system-analysis:main Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Of and relating to application programming interfaces to services and functions Code Infrastructure Server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve class variable initialization of OpenIDClient class
5 participants