Skip to content

Conversation

@MRichards99
Copy link
Contributor

As per #86, I've made some changes to allow SQL functions to be added to fields. I came across an issue where my first commit broke queries with conditions using related fields, but I've since fixed that. I've ran these changes on DataGateway API's tests (which have use cases of related fields, 'ordinary' queries, included entities etc.) and these all pass.

You might feel these changes require a refactor or could be put in a more appropriate place within Python ICAT and unit tests and documentation from Python ICAT's side. Hopefully this is a good start for this to happen so you don't need to implement our request completely from scratch. I've created a PR so it's easier for yourself (and STFC colleagues) to see the status of this.

For awareness, I'm on leave next week (week commencing 2nd August) so I won't be able to respond/reply to comments.

- An example query for this use case is: `SELECT o FROM Investigation o WHERE UPPER(o.title) like '%toMAto%'`
- This is done by checking if a function is present before removing it
@RKrahl
Copy link
Member

RKrahl commented Aug 5, 2021

Sorry, @MRichards99, I didn't found the time yet to look into that and, to be honest, I can't make promises when I will get around to do it.

@MRichards99
Copy link
Contributor Author

Not a problem, I've been on annual leave last week so I couldn't have made any progress anyway. Let me know when you do find some time :)

@RKrahl RKrahl linked an issue Aug 16, 2021 that may be closed by this pull request
Copy link
Member

@RKrahl RKrahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few issues with your implementation:

First of all, I'd like to see one or more tests that demonstrate that what you want to achieve actually works. tests/test_06_query.py would be the appropriate place to add that.

Your implementation modifies the checks that are in place for attributes to essentially allow an SQL function to be used everywhere where an attribute or a related object appears, even if this doesn't make sense for the particular case. Properly implemented are the SQL functions only in the case of conditions, in other cases, it results in invalid queries:

# This is what you intended:
>>> query = Query(client, "Investigation", conditions={ "UPPER(title)": "like '%PolICE%'" })
>>> str(query)
"SELECT o FROM Investigation o WHERE UPPER(o.title) like '%PolICE%'"
# That was probably not intended:
>>> query = Query(client, "Investigation", includes=["UPPER(type)"])
>>> str(query)
'SELECT o FROM Investigation o INCLUDE o.UPPER(type)'
>>> query = Query(client, "Investigation", order=["UPPER(name)"])
>>> str(query)
'SELECT o FROM Investigation o ORDER BY o.UPPER(name)'
>>> query = Query(client, "Investigation", attributes=["UPPER(name)"])
>>> str(query)
'SELECT o.UPPER(name) FROM Investigation o'
>>> query = Query(client, "Investigation", join_specs={"UPPER(datasets)": "LEFT JOIN"}, order=["datasets.name"])
/net/home/jsi/bin/icatsh:1: QueryOneToManyOrderWarning: ordering on a one to many relation datasets may surprisingly affect the search result.
  #! /usr/bin/python3 -i
>>> str(query)
'SELECT o FROM Investigation o JOIN o.datasets AS s1 ORDER BY s1.name'

The current python-icat release throws an error like ValueError: Unknown attribute name 'UPPER(name)' in all of these cases. I'd suggest, it's better to raise a clear error rather than to silently accept it in cases where it is not supported.

I'd suggest therefore that the implementation should deal with that in higher level methods only where it makes sense rather than in the low level helpers. That would require a decision in which cases SQL functions should be allowed first:

  • in includes: this would certainly not make sense.
  • in join_specs: certainly not.
  • in attributes: maybe, but that would interfere with aggregate in a strange manner. Not sure if it is needed here anyway.
  • in order: maybe, could make sense.
  • in conditions: this is where you need it.

It would then require a review of the internal data structures to properly represent this.

Finally, we also would need documentation.

pattr = ""
if attrname.endswith(')'):
attrname = (attrname.split("("))[1].split(")")[0]
for attr in attrname.split('.'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the proper place to deal with that. _attrpath() is a low level helper with a well defined scope: to check an attribute name, iterating over the components of the dotted path to related objects as needed. It should not deal with higher level issues such as whether that attribute is used inside an SQL function. It is called from many places, not in all of them an SQL function would make sense.

Also, I'm not sure about the way how the attribute name and SQL function is parsed here.

if i < 0:
continue
if obj.endswith(')'):
obj = (obj.split("("))[1].split(")")[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above: _makesubst() should rather not need to care about SQL functions.

@RKrahl
Copy link
Member

RKrahl commented Aug 16, 2021

@MRichards99, one additional comment: the review comment above doesn't mean that you have to do all that alone. I may also have a look on it. However, I will be on annual leave starting on Wednesday and will not be able to do that before September.

Another question: you made the PR against the develop branch which would mean to target it for version 1.0. If you need that more urgently, we could also backport the implementation once its done and make a 0.20.0 release for it. On the other hand, the only thing that holds back 1.0 at the moment is icat.server: I wanted to include support for upcoming schema changes in 1.0, ref. #73. Since the discussion on the schema currently makes progress, there is a chance that python-icat 1.0 can also be released soon.

@RKrahl RKrahl added the enhancement New feature or request label Aug 16, 2021
@MRichards99
Copy link
Contributor Author

I'll have a look into this further over the next couple of weeks. Having read your comments (which I agree with, the current stage of the changes were more of a 'rough demonstration' than code that's merge-able), I'll try and add this functionality in a higher-level, perhaps in its own function that can be called in addConditions. I will also think about DataGateway use cases and see if it makes sense elsewhere.

I branched off the develop branch because I think it's the default branch. I'm happy for it to be in 0.20.0 rather than 1.0, which branch do I need to use for that?

@MRichards99
Copy link
Contributor Author

@RKrahl which branch (master or develop) would you like to me branch off to work against? I have no reason for this to only be present in version 1.0, this feature being in 0.20.0 would suffice.

@RKrahl
Copy link
Member

RKrahl commented Oct 4, 2021

@MRichards99, you don't need to change anything in the branches. My comment above was meant to be a question on how urgent that change is.

The develop branch already targets version 1.0. That may even be fine for you, because the only thing I'm still waitng before that release is icat.server 5.0. And since we now made good progress there with the schema decisions, that release may come soon anyway.

If you can't wait for 1.0, but need the SQL functions stuff more urgent than that, we can still backport it to master and make a quick intermediate release 0.20.

@agbeltran
Copy link
Member

@RKrahl @MRichards99 Ideally we would like a quick release in 0.20 if possible, so that we can incorporate this fix in datagatway soon

@MRichards99
Copy link
Contributor Author

@RKrahl @MRichards99 Ideally we would like a quick release in 0.20 if possible, so that we can incorporate this fix in datagatway soon

In this case, I might make a different branch that branches off master just to make the process of releasing 0.20.0 a bit easier (rather than merging this branch into master as I noticed there were a few differences). It might help me to start from scratch too

@MRichards99
Copy link
Contributor Author

I'm converting this to a draft PR (and unlinking #86) as I don't intend for this to be merged but it might be useful for all of us to view back at this PR to look at the discussion.

@MRichards99 MRichards99 marked this pull request as draft October 7, 2021 17:00
@RKrahl
Copy link
Member

RKrahl commented Oct 12, 2021

Btw., I suggest we should keep the link with #86 for the sake of documentation but close this one after the merge of #89.

@RKrahl RKrahl linked an issue Oct 12, 2021 that may be closed by this pull request
@RKrahl RKrahl closed this Oct 29, 2021
@RKrahl RKrahl deleted the sql-functions-for-fields branch October 29, 2021 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow SQL Functions to be used on Fields

4 participants