
Commit 6d9c40f

Add typed filtering (#3385)
* Add typed filtering

PBENCH-1124

Support type-cast filter expressions in `GET /datasets`.

The primary objective is to support a paginated "Expiring Soon" view in the dashboard, which requires the ability to look for datasets expiring before a fixed timestamp.

Previously, `GET /datasets?filter` worked by casting all SQL data to a string and then comparing it against the raw string extracted from the query parameter. Now it is possible to specify a type as well as additional comparison operators. For example, `GET /datasets?filter=server.deletion:<2023-05-01:date` will select all datasets with expiration timestamps earlier than 2023-05-01.

To target this specific capability, the functional tests now override the default `server.deletion` on some uploads and verify that those datasets are returned by the filtered query.
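To make the expression format concrete, here is a minimal Python sketch that decomposes an expression such as `server.deletion:<2023-05-01:date` into its chain, key, operator, type, and value, and casts the value. The `parse` and `cast` helpers are hypothetical illustrations written against the `[chain]key:[op]value[:type]` format documented in `docs/API/V1/list.md`. They are not Pbench Server's actual implementation, and details such as accepting exactly `t`/`true`/`y`/`yes` for booleans are assumptions.

```python
from datetime import datetime, timezone

# Hypothetical sketch of the documented `[chain]key:[op]value[:type]` format;
# not Pbench Server's actual parser.
OPERATORS = (">=", "<=", "!=", "~", ">", "<", "=")  # two-character ops first
TYPES = ("str", "bool", "int", "date")


def cast(value: str, type_name: str):
    """Convert the raw string value to the requested type."""
    if type_name == "str":
        return value
    if type_name == "int":
        return int(value)
    if type_name == "bool":
        # Assumed spellings, based on the `t[rue]`/`y[es]`/`f[alse]`/`n[o]` note.
        lowered = value.lower()
        if lowered in ("t", "true", "y", "yes"):
            return True
        if lowered in ("f", "false", "n", "no"):
            return False
        raise ValueError(f"not a boolean: {value!r}")
    if type_name == "date":
        parsed = datetime.fromisoformat(value)
        # UTC is assumed when no timezone is specified.
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unknown type: {type_name!r}")


def parse(expr: str) -> dict:
    """Split one filter expression into its parts and cast the value."""
    chain = expr.startswith("^")
    if chain:
        expr = expr[1:]
    # The key ends at the first colon.
    key, sep, rest = expr.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {expr!r}")
    # An optional type follows the last colon; otherwise default to `str`.
    type_name = "str"
    head, sep, tail = rest.rpartition(":")
    if sep and tail in TYPES:
        rest, type_name = head, tail
    # An optional operator prefixes the value; otherwise default to `=`.
    op = "="
    for candidate in OPERATORS:
        if rest.startswith(candidate):
            op, rest = candidate, rest[len(candidate):]
            break
    return {"chain": chain, "key": key, "op": op, "type": type_name,
            "value": cast(rest, type_name)}


print(parse("server.deletion:<2023-05-01:date"))
```

Splitting the key at the first colon and the type at the last colon lets date-time values such as `2023-05-01T12:00:00:date` keep their embedded colons.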
1 parent d1977e2 commit 6d9c40f


6 files changed (+566, -100 lines)


docs/API/V1/list.md

Lines changed: 60 additions & 24 deletions
@@ -45,34 +45,66 @@ specified date.
 
 `filter` metadata filtering \
 Select datasets matching the metadata expressions specified via `filter`
-query parameters. Each expression is the name of a metadata key (for example,
-`dataset.name`), followed by a colon (`:`) and the comparison string. The
-comparison string may be prefixed with a tilde (`~`) to make it a partial
-("contains") comparison instead of an exact match. For example,
-`dataset.name:foo` looks for datasets with the name "foo" exactly, whereas
-`dataset.name:~foo` looks for datasets with a name containing the substring
-"foo".
-
-These may be combined across multiple `filter` query parameters or as
-comma-separated lists in a single query parameter. Multiple filter expressions
-form an `AND` expression, however consecutive filter expressions can be joined
-in an `OR` expression by using the circumflex (`^`) character prior to each.
-(The first expression with `^` begins an `OR` list while the first subsequent
-expression outout `^` ends the `OR` list and is combined with an `AND`.)
+query parameters. Each expression has the format `[chain]key:[op]value[:type]`:
+
+* `chain` Prefix an expression with `^` (circumflex) to allow combining a set
+of expressions with `OR` rather than the default `AND`.
+* `key` The name of a metadata key (for example, `dataset.name`).
+
+* `op` An operator to specify how to compare the key value:
+
+* `=` (Default) Compare for equality
+* `~` Compare against a substring
+* `>` Greater than
+* `<` Less than
+* `>=` Greater than or equal to
+* `<=` Less than or equal to
+* `!=` Not equal
+
+* `value` The value to compare against. This will be interpreted based on the specified type.
+* `type` The string value will be cast to this type. Any value can be cast to
+type `str`. General metadata keys (`server`, `global`, `user`, and
+`dataset.metalog` namespaces) that have values incompatible with the specified
+type will be ignored. If you specify an incompatible type for a primary
+`dataset` key, an error will be returned, because these types are defined by the
+Pbench schema and no match would be possible. (For example, `dataset.name:2:int`
+or `dataset.access:2023-05-01:date`.)
+
+* `str` (Default) Compare as a string
+* `bool` Compare as a boolean
+* `int` Compare as an integer
+* `date` Compare as a date-time string. ISO-8601 recommended, and UTC is
+assumed if no timezone is specified.
+
+For example, `dataset.name:foo` looks for datasets with the name "foo" exactly,
+whereas `dataset.name:~foo` looks for datasets with a name containing the
+substring "foo".
+
+Multiple expressions may be combined across multiple `filter` query parameters
+or as comma-separated lists in a single query parameter. Multiple filter
+expressions are combined as an `AND` expression, matching only when all
+expressions match. However, any consecutive set of expressions starting with `^`
+is collected into an "`OR` list" that will be `AND`-ed with the surrounding
+terms.
 
 For example,
 - `filter=dataset.name:a,server.origin:EC2` returns datasets with a name of
 "a" and an origin of "EC2".
-- `filter=dataset.name:a,^server.origin:EC2,^dataset.metalog.pbench.script:fio`
-returns datasets with a name of "a" and *either* an origin of "EC2" or generated
-from the "pbench-fio" script.
-
-_NOTE_: `filter` expression values, like the `true` in
-`GET /api/v1/datasets?filter=server.archiveonly:true`, are always interpreted
-as strings, so be careful about the string representation of the value (in this
-case, a boolean, which is represented in JSON as `true` or `false`). Beware
-especially when attempting to match a JSON document (such as
-`dataset.metalog.pbench`).
+- `filter=dataset.name:~andy,^server.origin:EC2,^server.origin:RIYA,
+dataset.access:public`
+returns only "public" datasets with a name containing the string "andy" which also
+have an origin of either "EC2" or "RIYA". As a SQL query, we might write it
+as `dataset.name like '%andy%' and (server.origin = 'EC2' or
+server.origin = 'RIYA') and dataset.access = 'public'`.
+
+_NOTE_: `filter` expression term values, like the `true` in
+`GET /api/v1/datasets?filter=server.archiveonly:true`, are by default
+interpreted as strings, so be careful about the string representation of the
+value. In this case, `server.archiveonly` is a boolean, which will be matched
+as a string value "true" or "false". You can instead specify the expression
+term as `server.archiveonly:t:bool` which will treat the specified match value
+as a boolean (`t[rue]` or `y[es]` for true, `f[alse]` or `n[o]` for false) and
+match against the boolean metadata value.
 
 `keysummary` boolean \
 Instead of displaying a list of selected datasets and metadata, use the set of
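The `AND`/`OR`-list rule described above can be sketched in Python by turning a comma-separated filter string into SQL-like text, mirroring the SQL rendering in the documentation's example. This hypothetical `to_where` helper treats every value as a plain string, ignores type casts, does no quoting or escaping, and assumes values contain no commas. It only illustrates how a run of `^` expressions becomes a parenthesized `OR` group; it is not how Pbench Server builds its queries.

```python
# Hypothetical sketch of the AND/OR-list rule; values are treated as plain
# strings, with no type casting, quoting, or escaping (and no commas in values).
SQL_OPS = {"=": "=", "~": "like", ">": ">", "<": "<",
           ">=": ">=", "<=": "<=", "!=": "!="}


def term(expr: str) -> str:
    """Render one `key:[op]value` expression as SQL-like text."""
    key, _, value = expr.partition(":")
    op = "="
    for candidate in (">=", "<=", "!=", "~", ">", "<", "="):
        if value.startswith(candidate):
            op, value = candidate, value[len(candidate):]
            break
    if op == "~":
        value = f"%{value}%"  # substring match
    return f"{key} {SQL_OPS[op]} '{value}'"


def to_where(filters: str) -> str:
    """AND together all terms, grouping each run of `^` terms with OR."""
    clauses, or_list = [], []

    def flush():
        if or_list:
            clauses.append("(" + " or ".join(or_list) + ")")
            or_list.clear()

    for expr in filters.split(","):
        if expr.startswith("^"):
            or_list.append(term(expr[1:]))
        else:
            flush()  # a term without `^` ends any pending OR list
            clauses.append(term(expr))
    flush()
    return " and ".join(clauses)


print(to_where("dataset.name:~andy,^server.origin:EC2,"
               "^server.origin:RIYA,dataset.access:public"))
# prints: dataset.name like '%andy%' and (server.origin = 'EC2' or server.origin = 'RIYA') and dataset.access = 'public'
```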
@@ -105,6 +137,10 @@ Allows filtering for datasets owned by the authenticated client (if the value
 is omitted, e.g., `?mine` or `?mine=true`) or owned by *other* users (e.g.,
 `?mine=false`).
 
+`name` string \
+Select only datasets with a specified substring in their name. The filter
+`?name=fio` is semantically equivalent to `?filter=dataset.name:~fio`.
+
 `offset` integer \
 "Paginate" the selected datasets by skipping the first `offset` datasets that
 would have been selected by the other query terms. This can be used with
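A client has to URL-encode these expressions, since characters such as `<` and `^` are not permitted unencoded in a URL query string. The following sketch uses Python's standard `urllib.parse.urlencode` to repeat the `filter` parameter once per expression and to pass the `name` and `offset` parameters documented above. The server address and the `datasets_url` helper are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server address; substitute a real Pbench Server host.
BASE = "https://pbench.example.com/api/v1/datasets"


def datasets_url(filters=(), name=None, **params) -> str:
    """Build a `GET /api/v1/datasets` URL with one `filter` per expression."""
    query = [("filter", expr) for expr in filters]
    if name is not None:
        # Documented as equivalent to filter=dataset.name:~<name>.
        query.append(("name", name))
    query.extend(params.items())
    # urlencode percent-escapes reserved characters such as ':', '<' and '^'.
    return f"{BASE}?{urlencode(query)}"


url = datasets_url(["server.deletion:<2023-05-01:date"], offset=20)
print(url)
print(parse_qs(urlsplit(url).query))  # decodes back to the original expressions
```

Repeating `filter` keeps each expression in its own parameter, so an `OR` list is simply consecutive `^`-prefixed entries in the list passed to the helper.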

jenkins/run-server-func-tests

Lines changed: 10 additions & 5 deletions
@@ -21,6 +21,13 @@ elif [[ -n "${1}" ]]; then
     exit 2
 fi
 
+function dump_journal {
+    printf -- "+++ journalctl dump +++\n"
+    # Try to capture the functional test container's logs.
+    podman exec ${PB_SERVER_CONTAINER_NAME} journalctl
+    printf -- "\n--- journalctl dump ---\n\n"
+}
+
 function cleanup {
     if [[ -n "${cleanup_flag}" ]]; then
         # Remove the Pbench Server container and the dependencies pod which we
@@ -59,6 +66,7 @@ until curl -s -o /dev/null ${SERVER_API_ENDPOINTS}; do
     if [[ $(date +"%s") -ge ${end_in_epoch_secs} ]]; then
         echo "Timed out waiting for the reverse proxy to show up!" >&2
         exit_status=1
+        dump_journal
         exit ${exit_status}
     fi
     sleep 1
@@ -84,11 +92,8 @@ else
 fi
 
 if [[ ${exit_status} -ne 0 ]]; then
-    printf -- "\nFunctional tests exited with code %s\n" ${exit_status}
-    printf -- "+++ journalctl dump +++\n"
-    # Try to capture the functional test container's logs.
-    podman exec ${PB_SERVER_CONTAINER_NAME} journalctl
-    printf -- "\n--- journalctl dump ---\n\n"
+    dump_journal
+    printf -- "\nFunctional tests exited with code %s\n" ${exit_status} >&2
 fi
 
 if [[ -z "${cleanup_flag}" ]]; then

lib/pbench/client/__init__.py

Lines changed: 12 additions & 0 deletions
@@ -489,3 +489,15 @@ def update(
             uri_params={"dataset": dataset_id},
             params=params,
         ).json()
+
+    def get_settings(self, key: str = "") -> JSONOBJECT:
+        """Return requested server setting.
+
+        Args:
+            key: A server settings key; if omitted, return all settings
+
+        Returns:
+            A JSON document containing the requested key values
+        """
+        params = {"key": key}
+        return self.get(api=API.SERVER_SETTINGS, uri_params=params).json()
