-
Notifications
You must be signed in to change notification settings - Fork 6
SQL Functions for Fields #87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- An example query for this use case is: `SELECT o FROM Investigation o WHERE UPPER(o.title) like '%toMAto%'`
- This is done by checking if a function is present before removing it
|
Sorry, @MRichards99, I didn't found the time yet to look into that and, to be honest, I can't make promises when I will get around to do it. |
|
Not a problem, I've been on annual leave last week so I couldn't have made any progress anyway. Let me know when you do find some time :) |
RKrahl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few issues with your implementation:
First of all, I'd like to see one or more tests that demonstrate that what you want to achieve actually works. tests/test_06_query.py would be the appropriate place to add that.
Your implementation modifies the checks that are in place for attributes to essentially allow an SQL function to be used everywhere where an attribute or a related object appears, even if this doesn't make sense for the particular case. Properly implemented are the SQL functions only in the case of conditions, in other cases, it results in invalid queries:
# This is what you intended:
>>> query = Query(client, "Investigation", conditions={ "UPPER(title)": "like '%PolICE%'" })
>>> str(query)
"SELECT o FROM Investigation o WHERE UPPER(o.title) like '%PolICE%'"
# That was probably not intended:
>>> query = Query(client, "Investigation", includes=["UPPER(type)"])
>>> str(query)
'SELECT o FROM Investigation o INCLUDE o.UPPER(type)'
>>> query = Query(client, "Investigation", order=["UPPER(name)"])
>>> str(query)
'SELECT o FROM Investigation o ORDER BY o.UPPER(name)'
>>> query = Query(client, "Investigation", attributes=["UPPER(name)"])
>>> str(query)
'SELECT o.UPPER(name) FROM Investigation o'
>>> query = Query(client, "Investigation", join_specs={"UPPER(datasets)": "LEFT JOIN"}, order=["datasets.name"])
/net/home/jsi/bin/icatsh:1: QueryOneToManyOrderWarning: ordering on a one to many relation datasets may surprisingly affect the search result.
#! /usr/bin/python3 -i
>>> str(query)
'SELECT o FROM Investigation o JOIN o.datasets AS s1 ORDER BY s1.name'The current python-icat release throws an error like ValueError: Unknown attribute name 'UPPER(name)' in all of these cases. I'd suggest, it's better to raise a clear error rather than to silently accept it in cases where it is not supported.
I'd suggest therefore that the implementation should deal with that in higher level methods only where it makes sense rather than in the low level helpers. That would require a decision in which cases SQL functions should be allowed first:
- in
includes: this would certainly not make sense. - in
join_specs: certainly not. - in
attributes: maybe, but that would interfere withaggregatein a strange manner. Not sure if it is needed here anyway. - in
order: maybe, could make sense. - in
conditions: this is where you need it.
It would then require a review of the internal data structures to properly represent this.
Finally, we also would need documentation.
| pattr = "" | ||
| if attrname.endswith(')'): | ||
| attrname = (attrname.split("("))[1].split(")")[0] | ||
| for attr in attrname.split('.'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the proper place to deal with that. _attrpath() is a low level helper with a well defined scope: to check an attribute name, iterating over the components of the dotted path to related objects as needed. It should not deal with higher level issues such as whether that attribute is used inside an SQL function. It is called from many places, not in all of them an SQL function would make sense.
Also, I'm not sure about the way how the attribute name and SQL function is parsed here.
| if i < 0: | ||
| continue | ||
| if obj.endswith(')'): | ||
| obj = (obj.split("("))[1].split(")")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above: _makesubst() should rather not need to care about SQL functions.
|
@MRichards99, one additional comment: the review comment above doesn't mean that you have to do all that alone. I may also have a look on it. However, I will be on annual leave starting on Wednesday and will not be able to do that before September. Another question: you made the PR against the |
|
I'll have a look into this further over the next couple of weeks. Having read your comments (which I agree with, the current stage of the changes were more of a 'rough demonstration' than code that's merge-able), I'll try and add this functionality in a higher-level, perhaps in its own function that can be called in I branched off the |
|
@RKrahl which branch (master or develop) would you like to me branch off to work against? I have no reason for this to only be present in version 1.0, this feature being in 0.20.0 would suffice. |
|
@MRichards99, you don't need to change anything in the branches. My comment above was meant to be a question on how urgent that change is. The If you can't wait for 1.0, but need the SQL functions stuff more urgent than that, we can still backport it to master and make a quick intermediate release 0.20. |
|
@RKrahl @MRichards99 Ideally we would like a quick release in 0.20 if possible, so that we can incorporate this fix in datagatway soon |
In this case, I might make a different branch that branches off master just to make the process of releasing 0.20.0 a bit easier (rather than merging this branch into master as I noticed there were a few differences). It might help me to start from scratch too |
|
I'm converting this to a draft PR (and unlinking #86) as I don't intend for this to be merged but it might be useful for all of us to view back at this PR to look at the discussion. |
As per #86, I've made some changes to allow SQL functions to be added to fields. I came across an issue where my first commit broke queries with conditions using related fields, but I've since fixed that. I've ran these changes on DataGateway API's tests (which have use cases of related fields, 'ordinary' queries, included entities etc.) and these all pass.
You might feel these changes require a refactor or could be put in a more appropriate place within Python ICAT and unit tests and documentation from Python ICAT's side. Hopefully this is a good start for this to happen so you don't need to implement our request completely from scratch. I've created a PR so it's easier for yourself (and STFC colleagues) to see the status of this.
For awareness, I'm on leave next week (week commencing 2nd August) so I won't be able to respond/reply to comments.