Skip to content

Conversation

@RKrahl
Copy link
Member

@RKrahl RKrahl commented Jul 1, 2021

  • Remove the error condition that prevented to add related objects in a one to many relationships to the ORDER BY clause in class Query. Still, emit a warning rather then an error because this may have surprising effects for the search results.
  • Add a new warning class QueryOneToManyOrderWarning for that purpose and a new class QueryWarning as common base for QueryNullableOrderWarning and QueryOneToManyOrderWarning.

Closes #83.

RKrahl added 2 commits July 1, 2021 11:28
- Add a new class QueryWarning as base for QueryNullableOrderWarning
  and QueryOneToManyOrderWarning
@RKrahl RKrahl added the enhancement New feature or request label Jul 1, 2021
@RKrahl RKrahl added this to the 0.18.2 milestone Jul 1, 2021
@RKrahl RKrahl linked an issue Jul 1, 2021 that may be closed by this pull request
@RKrahl
Copy link
Member Author

RKrahl commented Jul 1, 2021

@MRichards99: please have a look (strange enough: I can't request you for reviewer).

@MRichards99
Copy link
Contributor

I'll look at this tomorrow and test that it works for DataGateway

@MRichards99
Copy link
Contributor

The changes you've made have the desired effect, sorting can now be done on one-to-many related fields. I do have a slight concern about how objects can be missing from the result (as you talk about in the original issue). I saw this happen when testing using some ISIS data on DataGateway. Here's a couple of screenshots to demonstrate what it looks like on the frontend:

image

image

Next week, I'll see if I can adjust the query in a way to prevent this behaviour and perhaps have the missing results at the end of the results (depending on direction of ordering). I highly doubt this will involve any further work for Python ICAT, but I'd like to investigate the reason behind the behaviour further before approving the changes for DataGateway.

Thanks for doing this fairly promptly, it's much appreciated from all of us working on the DataGateway project.

@RKrahl
Copy link
Member Author

RKrahl commented Jul 2, 2021

Regarding the side effects of the ordering on the search result in general and the missing objects in particular, I believe that is a feature (or bug, depending on the perspective) in icat.server and the way that ordering is implemented: apparently, under the hood, icat.server (or rather the JPA) forms a join of the related tables that appear in the order by clause, gets the tuples of attributes from that and sorts that list of tuples. Then it takes the objects being queried from that sorted list. If an objects has multiple related objects in the one to many relation, it will appear as many times in that join. If the object has no related object, it will not appear at all in that join. And that is exactly how the object appears in the final search result. In any case, I'm afraid, there is not much that could be done on the client side about that.

@MRichards99
Copy link
Contributor

I've investigated this using the use case shown in the screenshots. In summary, I found adding LEFT JOINs to the entities being joined for the field used for ordering (which has a one-to-many relationship) fixes the issue:

SELECT DISTINCT(o) FROM Investigation o
JOIN o.facility AS f JOIN f.facilityCycles AS s1 JOIN f.instruments AS s2 JOIN o.investigationInstruments AS ii JOIN ii.instrument AS s3 LEFT JOIN o.studyInvestigations AS s4 LEFT JOIN s4.study AS s5
WHERE s1.endDate >= o.startDate AND s1.id = '89968973' AND s1.startDate <= o.startDate AND s2.id = '3' AND s3.id = '3'
ORDER BY s5.pid ASC, o.id ASC
INCLUDE o.investigationInstruments AS ii, ii.instrument, o.studyInvestigations AS s1, s1.study LIMIT 0, 50

On our ISIS dev instance, that query will return 13 results, with the empty PIDs at the start/end of the list, depending on order direction. If you remove the 'LEFT', only 5 results will be returned (just like the screenshots demonstrate in my previous comment).

I tested this on DataGateway by editing L407 of query.py to have LEFT JOIN in the string. I understand this might not be a feasible solution as this could bring unintended side effects to other queries, but I wonder if there's a way to have LEFT JOINs where a one-to-many relationship occurs for ordering?

@RKrahl
Copy link
Member Author

RKrahl commented Jul 7, 2021

@MRichards99, I'm not sure if such in-depth JPQL tweaks are still within the scope of the Query class. This class is supposed to make it easier to programatically build queries commonly used in scripts. It was never intended to leverage the full extent of JPQL with all possible features. I can immediately cite some JPQL queries that have perfectly valid use cases that the Query class will never be able to be build. (That is what I meant by "can be generated with reasonable effort" in the issue comment.)

Note that you can always pass a string to the search() method so that you can build the queries your own, without using Query.

The problem that I see by introducing different JOIN types is: how would you design the Query class API to consistently control this? The class automatically infers what related objects would need to be joined. How would you tell the class in the constructor arguments which JOIN type should be used for which related object?

@RKrahl
Copy link
Member Author

RKrahl commented Jul 9, 2021

@MRichards99, after contemplating about this a little longer, I believe it is actually feasible. The idea would be to add another optional constructor argument join_specs to class Query. If provided, it must be a mapping of related object names to join specifications. Any entry in there would override how this particular related object is to be joined. The default for any relation not included would be JOIN. The default for the argument would be None which would be equivalent to the empty mapping. See 3b01c05.

E.g. the call to create the query you cite would be:

>>> query = Query(client, "Investigation",
...               conditions={
...                   "facility.facilityCycles.endDate": ">= o.startDate",
...                   "facility.facilityCycles.id": "= '89968973'",
...                   "facility.facilityCycles.startDate": "<= o.startDate",
...                   "facility.instruments.id": "= '3'",
...                   "investigationInstruments.instrument.id": "= '3'",
...               },
...               order=(("studyInvestigations.study.pid", "ASC"), ("id", "ASC")),
...               aggregate="DISTINCT",
...               includes=("investigationInstruments.instrument",
...                         "studyInvestigations.study"),
...               limit=(0,50),
...               join_specs={
...                   "studyInvestigations": "LEFT JOIN",
...                   "studyInvestigations.study": "LEFT JOIN",
...               }
... )
>>> str(query)
"SELECT DISTINCT(o) FROM Investigation o JOIN o.facility AS f JOIN f.facilityCycles AS s1 JOIN f.instruments AS s2 JOIN o.investigationInstruments AS ii JOIN ii.instrument AS s3 LEFT JOIN o.studyInvestigations AS s4 LEFT JOIN s4.study AS s5 WHERE s1.endDate >= o.startDate AND s1.id = '89968973' AND s1.startDate <= o.startDate AND s2.id = '3' AND s3.id = '3' ORDER BY s5.pid ASC, o.id ASC INCLUDE o.investigationInstruments AS ii, ii.instrument, o.studyInvestigations AS s1, s1.study LIMIT 0, 50"

What would you think of that?

Now I have a technical quetion to any knowledgable person: what are the valid join specifications in JPQL? I believe there is only JOIN, INNER JOIN, LEFT JOIN, and LEFT OUTER JOIN, where the first two and the last two are synonyms respectively. SQL has much more, but the others do not make much sense and do not exist in JPQL. Is that correct?

@RKrahl RKrahl merged commit e2cf803 into release/0.18.2 Jul 10, 2021
@RKrahl RKrahl deleted the query-order-many branch July 10, 2021 16:45
@RKrahl
Copy link
Member Author

RKrahl commented Jul 10, 2021

I merged this one and submitted a separated pull request #85 for the controlling the join specifications as these are in fact independent features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ordering on One to Many Relationships

3 participants