-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Subgraph composition sql more entities #5614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subgraph composition sql more entities #5614
Conversation
867fd69 to
5de07ca
Compare
6229fc2 to
d2a4071
Compare
6e4f7be to
b1d0bfa
Compare
b4ad24f to
b35754e
Compare
2af7c45 to
e03a549
Compare
1f66779 to
51950cb
Compare
51950cb to
72c8848
Compare
72c8848 to
eb3f792
Compare
b892793 to
ca47ec7
Compare
bae437b to
ca7824e
Compare
ca7824e to
352852e
Compare
eb3f792 to
943f821
Compare
943f821 to
24487d8
Compare
24487d8 to
d8eb30e
Compare
352852e to
62ccaee
Compare
lutter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully follow how this loads entities and classifies them as create/modify/delete, partly because the interplay of mutable/immutable and is_upper_range is very intricate. I am not sure if code changes can clarify that (would be great); if not, this needs more comments explaining what is going on.
| } | ||
|
|
||
| #[derive(Debug)] | ||
| pub enum EntitySubgraphOperation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this makes sense, but might be better to stick the entity into the operation, i.e. to have Create(Entity), Modify(Entity), Delete(Entity); that way, users of this enum are forced to consider each of these cases, and it would make it easy to, e.g., change to Delete(Id), i.e., not load the entire entity if that's useful.
But I don't understand how all of composition fits together, so I am not sure if this suggestions is really an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess here the naming is poor. The enum EntitySubgraphOperation is a kind of operation, with definition more akin to the C style enums. Where the struct just bellow it EntityWithType contains the whole 'entity event' with all the fields including this operation kind and the entity itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some renaming which I believe is closer to the semantics. I done it in a separate PR in order to save on rebasing.
store/postgres/src/block_range.rs
Outdated
| #[derive(Debug, Clone, Copy)] | ||
| pub enum EntityBlockRange { | ||
| Mutable(BlockRange), // TODO: check if this is a proper type here (maybe Range<BlockNumber>?) | ||
| Mutable((BlockRange, bool)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a comment explaining what the bool is for
Actually, since this is to indicate whether we want to compare with the upper or lower of the block range, it might be worth having an enum { Lower, Upper } for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Enum seems reasonable here.
| Bound::Excluded(block) => { | ||
| out.push_bind_param::<Integer, _>(block)?; | ||
| out.push_sql("+1"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could just push the bind param block + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to make the compiler borrower happy as the addition creates a temporary variable that can't outlive it's block...
| } | ||
|
|
||
| /// Output SQL that matches only rows whose block range contains `block`. | ||
| pub fn contains<'b>(&'b self, out: &mut AstPass<'_, 'b, Pg>) -> QueryResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this very confusing. It would help if the comment described what the SQL text is that can be generated by this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some sensible comment. Hope it's clear now.
| }; | ||
| Ok((ewt, block)) | ||
| }; | ||
| while lower_now.is_some() || upper_now.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The body of this is pretty intricate. There should be a comment about what the strategy here is and how it works.
store/postgres/src/relational.rs
Outdated
| Ok((ewt, block)) | ||
| }; | ||
| while lower_now.is_some() || upper_now.is_some() { | ||
| let (ewt, block) = if lower_now.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole chain of if .. else would probably be clearer if it was turned into a match (lower_now, upper_now) { .. } as that's more symmetric than nested ifs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it with a couple of matches. Not sure that it's more readable, but for sure one level of block nesting is spared and is probably more idiomatic too.
| } | ||
|
|
||
| // Generate | ||
| // select '..' as entity, to_jsonb(e.*) as data, block$ as block_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For mutable tables, what is block_number set to? (Looking at the code below, I think I can work it out, but would be nice to have it in the comment)
| if first { | ||
| // In case we have only immutable entities, the upper range will not create any | ||
| // select statement. So here we have to generate an empty SQL statement. | ||
| out.push_sql("select 1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work? If we have clauses in the query, it returns (text, jsonb, int), but this fallback just returns (int) and is therefore the wrong shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I tested it. The intention was to return an empty set. Now when I run it on the command line it returns one row, so maybe diesel silently substituted wrong types with empty rows, but its a bad style for sure. Needs fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed it with a query that has the same structure and added a test for that case.
|
|
||
| // Generate | ||
| // select '..' as entity, to_jsonb(e.*) as data, block$ as block_number | ||
| // from schema.table e where id = $1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this left over from something else? I don't see a clause id = $1 being generated.
d8eb30e to
37bf3cd
Compare
234342c to
501c8ba
Compare
|
@lutter I believe that the comments in the |
5e4df38 to
a213d3a
Compare
store/postgres/src/block_range.rs
Outdated
| EntityBlockRange::Mutable((_, bound_side)) => match bound_side { | ||
| BoundSide::Lower => out.push_sql(" lower(block_range) "), | ||
| BoundSide::Upper => out.push_sql(" upper(block_range) "), | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but you can match on nested data, too, like:
match self {
EntityBlockRange::Mutable((_, BoundSide::Lower)) => out.push_sql(" lower(block_range) "),
EntityBlockRange::Mutable((_, BoundSide::Upper)) => out.push_sql(" upper(block_range) "),
EntityBlockRange::Immutable(_) => out.push_sql(" block$ ")
}|
|
||
| // collect all entities that have their 'lower(block_range)' attribute in the | ||
| // interval of blocks defined by the variable block_range. For the immutable | ||
| // entities the respective attribute is 'block$'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helps, but somehow buries the lede: lower_vec contains all entities that were created or updated in block_range
| // collect all entities that have their 'upper(block_range)' attribute in the | ||
| // interval of blocks defined by the variable block_range. For the immutable | ||
| // entities no entries are returned. | ||
| let upper_vec = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, upper_vec contains all entitites that were deleted or updated, but will have the previous versions, i.e. in the case of an update, it's the version before the update, and lower_vec will have a corresponding entry with the new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just copy this text.
store/postgres/src/relational.rs
Outdated
| // deduced. For immutable entities the entries in upper_vec are missing hence they are considered | ||
| // having a lower bound at particular block and upper bound at infinity. | ||
| while lower_now.is_some() || upper_now.is_some() { | ||
| let (ewt, block) = match (lower_now.is_some(), upper_now.is_some()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not match on match (lower_now, upper_now) and match arms will then look like (Some(lower), None) etc. I think that would also get rid of the explicit lower and upper variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's beautiful! Did it.
store/postgres/src/relational.rs
Outdated
| // to match entities that have entries in both vectors for a particular block. The match is | ||
| // successfull if an entry in one array has same values for the number of the block, entity | ||
| // name and the entity id. The comparison operation over the EntityDataExt fullfils that check. | ||
| // In addition to that, it also helps to order the elements so the algorith can detect if one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the sentence it also helps ... I think it should also be mentioned somewhere that this algorithm relies on the fact that lower_vec and upper_vec are ordered by (entity type, entity id, block) which allows for mergesort-like processing of the two vecs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did rewording. Hope it's clear now.
store/postgres/src/relational.rs
Outdated
| lower_now = lower_iter.next(); | ||
| lower = lower_now.unwrap_or(&EntityDataExt::default()).clone(); | ||
| upper_now = upper_iter.next(); | ||
| upper = upper_now.unwrap_or(&EntityDataExt::default()).clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens here if lower_vec contains multiple entries for the same entity? Advancing upper_now might move us to the next entity, and so we think that lower_now which has another version of the entity we just looked at will look like a create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether a better strategy for all this might be to throw upper_vec and lower_vec into one vec, sort that by (entity type, entity id, block) and scan the resulting vec. You'd emit an EntityWithType if block is in the desired block range, and the operation will be determined with
- if previous entry had different type/id, it's a create
- if next entry has different type/id, it's a delete
- otherwise it's an update
That sorta requires a 3 element window to pass over the Vec, but since it's a Vec, that could be done with just using indices, like vec[i-1], vec[i], vec[i+1] with some special care for i=0 and i=vec.len()-1, i.e. best to hide those intricacies in helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it it's possible to have multiple modifications of the entity in a block. That would entail a range of a type [10,10). In that case we might end up with wrong modification since we would have ranges like [3,10), [10,10), [10,15) and there is no guarantee that [10, 10) will be before [10,15). I guess if that's the case the DB request has to be extended with a 4. sorting criteria (for the lower_vec the end of the range, and for the upper_vec the start I believe)... Still if there are more than two changes in a block the order of the inner ones is not guaranteed to be correct, only the final result for a block...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if merging the two vectors would make the algorithm simpler. It would entail having double entries for each entity that is modified. Also likely we would need a flag if it's a start or end of a range in order to deduce if there is a creation or deletion for single entries, also to avoid false creations at the first entry and false deletion at the last one. And to deduce the proper entry for update. Probably with that flag and 3 element window one can create such algorithm, but I believe it would have at least 6 checks if not more, where the current one has 5. Not to mention the additional complexity of the start/end boundaries in the helper function and the additional complexity needed for the SQL query.
|
One other thing I am wondering about with this strategy: if you are looking for changes in |
We don't reason about the situation around 200 since it's > 20. We only check if there is another entry ending at 11 in order to decide if it's creation or modification at 11. |
5910c02 to
841bf7d
Compare
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that confuses me about the sort order (block_number, entity type, entity id) is that entries for different entities can be interleaved, like if there are two entities a and b, you might end up with [ (2, a), (2,b), (3, a), (3, b)] and since find_range processes the entries in the vecs in order, can't you end up comparing an entry for a with one for b?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to keep in mind that matching is always done between elements in the different vectors, and never between elements in the same one. Also a missing element in upper_vec is a creation and missing element in lower_vec is deletion, where a match (a presence of the equivalent respective values in both vectors) is a modification. And finally the two pairs of cases where there is either no matching element on one side or no remaining element at all on that same side, are in essence equivalent and produce the same result. Advancements of iterators are done on the already evaluated element(s).
Probably a simple example would be more explanatory.
Imagine you have elements in lower_vec like this:
[ (2, a), (2, c), (3, a) (3, b), (3, c) ],
and in the upper_vec something like:
[ (2, a), (2, b), (3, a), (3, c), (4, a) ],
the strategy will first find a match of (2, a) in both vectors so modification of a in block 2, then (2, b) is missing in lower_vec so deletion of b in 2, then (2, c) is missed in upper_vec so creation there, then (3, a) matched again so another modification, (3, b) only in lower_vec - creation, (3, c) on both sides - modification, (4, a) only in upper_vec - deletion.
Looking again at the comments at the start of the big while loop I tried to enter the same content as the first paragraph here, but also referred to some other aspects and that might had made it less readable. Should I try to improve it?
lutter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a discussion with Zoran, and since I have been unable to come up with counterexamples to support my worries, this is good to go.
37bf3cd to
1fc017f
Compare
841bf7d to
868060b
Compare
Add support for querying more entities with a single SQL query.