 How should we design the user-facing part of an authorization layer for the platform?
 Do we create Custom Resources for some of this, or are ConfigMaps sufficient?
 How many are needed, and how are the rules split across resources?
+Where do users want to "hook in"?
+How are policies deployed?
 
 == Context
 
 What is the current state of authorization in the SDP, what do users want to define and which authorization models are widespread already?
 
-=== Terminology
-
-Resource:: A resource in the authorization context is commonly something that can be accessed, read, edited etc., like a DAG in Airflow, a Table in Trino or a file in a file system. Resources can also be grouped, like a folder in a file system containing multiple files. A resource is specific, so it does not refer to Trino tables in general, but to a specific Foo table (for example).
-Action:: An action is defined in context of a resource. Examples are "Viewing", "Editing", "Deleting", "Creating".
-Permission:: A permission is the combination of an action and a resource, like "view table Foo". A permission can also be more general, like "view all tables" (i.e. no specific resource is specified, just a class/type of resource).
-Policy:: A policy is a generic term that does not only exist in authorization. It is a rule, like "The cluster should always have 10% free memory left" or "Only the HR team can access the employee database".
-RBAC:: Role-based access control.
-Role:: A role in RBAC generally means a collection of permissions. In RBAC, permissions are assigned to roles. For example, an _admin_ role might have the permission to view and edit all data. A _marketing-employee_ role grants viewing access to a specific set of tables.
-ReBAC:: Relation-based access control.
-Relation:: A relation is a fairly generic concept and refers to relations between objects and other objects (or resources), between resources and users, or between users and other users or user groups. Examples: "Alice is a _reader_ of a table." "Bob is a _member_ of the data science team." "The `pictures` folder is the _parent_ of the `cat.jpg` file."
-Group:: A group is typically a collection of users. Groups can also be organized hierarchically. Roles can sometimes be attached to groups, so users can simply be grouped together and their permissions managed as a whole.
-
 === Current state of authorization and policy in the SDP
 
 Currently the Stackable Data Platform supports authorization policies through OPA.
@@ -46,23 +36,6 @@ The products themselves also have access control models:
 * Kafka: RBAC, group based with LDAP, ACLs
 * Airflow: Uses roles to group permissions, and then assign roles to users. Roles can also be assigned to LDAP groups.
 
-
-=== Authorization settings that users might want to model
-
-Some use case examples:
-
-* rules for individuals: Alice needs one-off read access to a Trino table
-* group-based access control: Bob joins the company in the data science team and should get access to all the resources he needs to start working
-* resource grouping and ad-hoc groups: A new data analysis task force is formed that needs access to specific resources. Resources should be grouped and then all task force members need access.
-* group hierarchies: there might be multiple data science teams that share access to some common resources, but also have specific resources that are only relevant to each team.
-* class-based permissions: Andy needs to be able to read _all_ Trino tables, and not just a pre-defined selection of tables.
-
-A common complaint seems to be that in RBAC systems, roles end up getting copy-pasted.
-A role might have many permissions attached to it, so if you want to modify a particular permission for just one user, you might end up copy-pasting the role.
-
-Also, users should be able to treat resources in general the same way across all supported products.
-I.e. there should be an abstraction over resources such as Trino tables, Superset dashboards and Kafka topics.
-
 === Different authorization models: RBAC, ABAC, ReBAC and more
 
 Out-of-the-box, OPA uses RegoRules to define policies.
@@ -86,6 +59,34 @@ Learn more:
 * https://www.permit.io/blog/oparebac
 * https://www.permit.io/blog/policy-engines
 
+== Requirements
+
+The overall design should make it easy for the majority of users to define rules, without needing to write RegoRules.
+This should be done with CRDs that can be deployed and that work out of the box.
+
+For the remaining users it should be possible to hook into various places of the system to write their own more specific rules.
+
+* 80% of users can use the CRDs that allow coarse access control in a unified way across the platform, possibly hiding some product-specific things.
+* 10% of users can drop down one layer into specifying custom JSON data for the Stackable-provided Rego rules,
+allowing a little bit more detailed access to product-specific access control rules such as column masking in Trino.
+* 10% of users will want to write completely custom Rego rules, which is currently already possible and will still be supported.
+
+=== Authorization settings that users might want to model
+
+Some use case examples:
+
+* rules for individuals: Alice needs one-off read access to a Trino table
+* group-based access control: Bob joins the company in the data science team and should get access to all the resources he needs to start working
+* resource grouping and ad-hoc groups: A new data analysis task force is formed that needs access to specific resources. Resources should be grouped and then all task force members need access.
+* group hierarchies: there might be multiple data science teams that share access to some common resources, but also have specific resources that are only relevant to each team.
+* class-based permissions: Andy needs to be able to read _all_ Trino tables, and not just a pre-defined selection of tables.
+
+A common complaint seems to be that in RBAC systems, roles end up getting copy-pasted.
+A role might have many permissions attached to it, so if you want to modify a particular permission for just one user, you might end up copy-pasting the role.
+
+Also, users should be able to treat resources in general the same way across all supported products.
+I.e. there should be an abstraction over resources such as Trino tables, Superset dashboards and Kafka topics.
+
 == Decision Drivers
 
 * The design should be flexible enough to easily represent various organizational structures.
@@ -236,3 +237,17 @@ TSA? OCM?
 
 "TSA is OPA, our authorizer is OPA, but are we or TSA building things around it that make the integration difficult?"
 
+== Appendix
+
+=== Terminology
+
+Resource:: A resource in the authorization context is commonly something that can be accessed, read, edited etc., like a DAG in Airflow, a Table in Trino or a file in a file system. Resources can also be grouped, like a folder in a file system containing multiple files. A resource is specific, so it does not refer to Trino tables in general, but to a specific Foo table (for example).
+Action:: An action is defined in context of a resource. Examples are "Viewing", "Editing", "Deleting", "Creating".
+Permission:: A permission is the combination of an action and a resource, like "view table Foo". A permission can also be more general, like "view all tables" (i.e. no specific resource is specified, just a class/type of resource).
+Policy:: A policy is a generic term that does not only exist in authorization. It is a rule, like "The cluster should always have 10% free memory left" or "Only the HR team can access the employee database".
+RBAC:: Role-based access control.
+Role:: A role in RBAC generally means a collection of permissions. In RBAC, permissions are assigned to roles. For example, an _admin_ role might have the permission to view and edit all data. A _marketing-employee_ role grants viewing access to a specific set of tables.
+ReBAC:: Relation-based access control.
+ABAC:: Attribute-based access control.
+Relation:: A relation is a fairly generic concept and refers to relations between objects and other objects (or resources), between resources and users, or between users and other users or user groups. Examples: "Alice is a _reader_ of a table." "Bob is a _member_ of the data science team." "The `pictures` folder is the _parent_ of the `cat.jpg` file."
+Group:: A group is typically a collection of users. Groups can also be organized hierarchically. Roles can sometimes be attached to groups, so users can simply be grouped together and their permissions managed as a whole.