Skip to content

Conversation

@zorancv
Copy link
Contributor

@zorancv zorancv commented Aug 20, 2024

Add support for querying more entities with a single SQL query.

@zorancv zorancv changed the base branch from master to krishna/subgraph-composition-triggers-adapter-refactor August 21, 2024 20:29
@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 5 times, most recently from 867fd69 to 5de07ca Compare August 29, 2024 14:36
@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 2 times, most recently from 6229fc2 to d2a4071 Compare September 3, 2024 13:36
@zorancv zorancv changed the base branch from krishna/subgraph-composition-triggers-adapter-refactor to master September 3, 2024 13:42
@zorancv zorancv changed the base branch from master to krishna/subgraph-composition-triggers-adapter-refactor September 3, 2024 13:43
@zorancv zorancv changed the base branch from krishna/subgraph-composition-triggers-adapter-refactor to master September 4, 2024 15:02
@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 2 times, most recently from 6e4f7be to b1d0bfa Compare September 4, 2024 15:48
@zorancv zorancv changed the base branch from master to krishna/subgraph-composition-triggers-adapter-refactor September 4, 2024 15:48
@incrypto32 incrypto32 added this to the Subgraph Composition milestone Sep 10, 2024
@incrypto32 incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from b4ad24f to b35754e Compare September 12, 2024 08:09
@incrypto32 incrypto32 force-pushed the zoran/subgraph-composition-sql-more-entities branch from 2af7c45 to e03a549 Compare September 12, 2024 08:35
@zorancv zorancv marked this pull request as ready for review November 8, 2024 14:48
@zorancv zorancv requested a review from lutter November 8, 2024 14:48
@zorancv zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch 2 times, most recently from 1f66779 to 51950cb Compare December 4, 2024 19:31
@zorancv zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 51950cb to 72c8848 Compare December 5, 2024 10:19
@incrypto32 incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 72c8848 to eb3f792 Compare December 5, 2024 12:03
@incrypto32 incrypto32 force-pushed the zoran/subgraph-composition-sql-more-entities branch from b892793 to ca47ec7 Compare December 5, 2024 13:55
@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from bae437b to ca7824e Compare December 5, 2024 14:53
@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from ca7824e to 352852e Compare December 5, 2024 22:26
@zorancv zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from eb3f792 to 943f821 Compare December 5, 2024 22:27
@incrypto32 incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 943f821 to 24487d8 Compare December 6, 2024 06:04
@zorancv zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 24487d8 to d8eb30e Compare December 6, 2024 14:30
@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from 352852e to 62ccaee Compare December 6, 2024 14:43
@lutter lutter self-requested a review December 6, 2024 19:17
Copy link
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully follow how this loads entities and classifies them as create/modify/delete, partly because the interplay of mutable/immutable and is_upper_range is very intricate. I am not sure if code changes can clarify that (would be great); if not, this needs more comments explaining what is going on.

}

#[derive(Debug)]
pub enum EntitySubgraphOperation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this makes sense, but might be better to stick the entity into the operation, i.e. to have Create(Entity), Modify(Entity), Delete(Entity); that way, users of this enum are forced to consider each of these cases, and it would make it easy to, e.g., change to Delete(Id), i.e., not load the entire entity if that's useful.

But I don't understand how all of composition fits together, so I am not sure if this suggestions is really an improvement.

Copy link
Contributor Author

@zorancv zorancv Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess here the naming is poor. The enum EntitySubgraphOperation is a kind of operation, with definition more akin to the C style enums. Where the struct just bellow it EntityWithType contains the whole 'entity event' with all the fields including this operation kind and the entity itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some renaming which I believe is closer to the semantics. I done it in a separate PR in order to save on rebasing.

#[derive(Debug, Clone, Copy)]
pub enum EntityBlockRange {
Mutable(BlockRange), // TODO: check if this is a proper type here (maybe Range<BlockNumber>?)
Mutable((BlockRange, bool)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a comment explaining what the bool is for

Actually, since this is to indicate whether we want to compare with the upper or lower of the block range, it might be worth having an enum { Lower, Upper } for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Enum seems reasonable here.

Bound::Excluded(block) => {
out.push_bind_param::<Integer, _>(block)?;
out.push_sql("+1");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could just push the bind param block + 1

Copy link
Contributor Author

@zorancv zorancv Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to make the compiler borrower happy as the addition creates a temporary variable that can't outlive it's block...

}

/// Output SQL that matches only rows whose block range contains `block`.
pub fn contains<'b>(&'b self, out: &mut AstPass<'_, 'b, Pg>) -> QueryResult<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this very confusing. It would help if the comment described what the SQL text is that can be generated by this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some sensible comment. Hope it's clear now.

};
Ok((ewt, block))
};
while lower_now.is_some() || upper_now.is_some() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of this is pretty intricate. There should be a comment about what the strategy here is and how it works.

Ok((ewt, block))
};
while lower_now.is_some() || upper_now.is_some() {
let (ewt, block) = if lower_now.is_some() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole chain of if .. else would probably be clearer if it was turned into a match (lower_now, upper_now) { .. } as that's more symmetric than nested ifs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it with a couple of matches. Not sure that it's more readable, but for sure one level of block nesting is spared and is probably more idiomatic too.

}

// Generate
// select '..' as entity, to_jsonb(e.*) as data, block$ as block_number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mutable tables, what is block_number set to? (Looking at the code below, I think I can work it out, but would be nice to have it in the comment)

if first {
// In case we have only immutable entities, the upper range will not create any
// select statement. So here we have to generate an empty SQL statement.
out.push_sql("select 1");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? If we have clauses in the query, it returns (text, jsonb, int), but this fallback just returns (int) and is therefore the wrong shape

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I tested it. The intention was to return an empty set. Now when I run it on the command line it returns one row, so maybe diesel silently substituted wrong types with empty rows, but its a bad style for sure. Needs fixing.

Copy link
Contributor Author

@zorancv zorancv Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed it with a query that has the same structure and added a test for that case.


// Generate
// select '..' as entity, to_jsonb(e.*) as data, block$ as block_number
// from schema.table e where id = $1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this left over from something else? I don't see a clause id = $1 being generated.

@zorancv zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from d8eb30e to 37bf3cd Compare December 9, 2024 14:00
@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 5 times, most recently from 234342c to 501c8ba Compare December 9, 2024 23:21
@zorancv
Copy link
Contributor Author

zorancv commented Dec 10, 2024

@lutter I believe that the comments in the find_range() make the matching algorithm for detecting the type of the operation more clear. Not sure if there is an easy way to make it simpler. Also I feel that splitting mutable and immutable cases won't improve things much, as the immutable case is a special case (and simpler one) of the mutable one.

@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from 5e4df38 to a213d3a Compare December 10, 2024 14:13
EntityBlockRange::Mutable((_, bound_side)) => match bound_side {
BoundSide::Lower => out.push_sql(" lower(block_range) "),
BoundSide::Upper => out.push_sql(" upper(block_range) "),
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but you can match on nested data, too, like:

match self {
    EntityBlockRange::Mutable((_, BoundSide::Lower)) =>  out.push_sql(" lower(block_range) "),
    EntityBlockRange::Mutable((_, BoundSide::Upper)) => out.push_sql(" upper(block_range) "),
    EntityBlockRange::Immutable(_) => out.push_sql(" block$ ")
 }


// collect all entities that have their 'lower(block_range)' attribute in the
// interval of blocks defined by the variable block_range. For the immutable
// entities the respective attribute is 'block$'.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helps, but somehow buries the lede: lower_vec contains all entities that were created or updated in block_range

// collect all entities that have their 'upper(block_range)' attribute in the
// interval of blocks defined by the variable block_range. For the immutable
// entities no entries are returned.
let upper_vec =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, upper_vec contains all entitites that were deleted or updated, but will have the previous versions, i.e. in the case of an update, it's the version before the update, and lower_vec will have a corresponding entry with the new version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just copy this text.

// deduced. For immutable entities the entries in upper_vec are missing hence they are considered
// having a lower bound at particular block and upper bound at infinity.
while lower_now.is_some() || upper_now.is_some() {
let (ewt, block) = match (lower_now.is_some(), upper_now.is_some()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not match on match (lower_now, upper_now) and match arms will then look like (Some(lower), None) etc. I think that would also get rid of the explicit lower and upper variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's beautiful! Did it.

// to match entities that have entries in both vectors for a particular block. The match is
// successfull if an entry in one array has same values for the number of the block, entity
// name and the entity id. The comparison operation over the EntityDataExt fullfils that check.
// In addition to that, it also helps to order the elements so the algorith can detect if one
Copy link
Collaborator

@lutter lutter Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the sentence it also helps ... I think it should also be mentioned somewhere that this algorithm relies on the fact that lower_vec and upper_vec are ordered by (entity type, entity id, block) which allows for mergesort-like processing of the two vecs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did rewording. Hope it's clear now.

lower_now = lower_iter.next();
lower = lower_now.unwrap_or(&EntityDataExt::default()).clone();
upper_now = upper_iter.next();
upper = upper_now.unwrap_or(&EntityDataExt::default()).clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if lower_vec contains multiple entries for the same entity? Advancing upper_now might move us to the next entity, and so we think that lower_now which has another version of the entity we just looked at will look like a create.

Copy link
Collaborator

@lutter lutter Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether a better strategy for all this might be to throw upper_vec and lower_vec into one vec, sort that by (entity type, entity id, block) and scan the resulting vec. You'd emit an EntityWithType if block is in the desired block range, and the operation will be determined with

  • if previous entry had different type/id, it's a create
  • if next entry has different type/id, it's a delete
  • otherwise it's an update

That sorta requires a 3 element window to pass over the Vec, but since it's a Vec, that could be done with just using indices, like vec[i-1], vec[i], vec[i+1] with some special care for i=0 and i=vec.len()-1, i.e. best to hide those intricacies in helper functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it it's possible to have multiple modifications of the entity in a block. That would entail a range of a type [10,10). In that case we might end up with wrong modification since we would have ranges like [3,10), [10,10), [10,15) and there is no guarantee that [10, 10) will be before [10,15). I guess if that's the case the DB request has to be extended with a 4. sorting criteria (for the lower_vec the end of the range, and for the upper_vec the start I believe)... Still if there are more than two changes in a block the order of the inner ones is not guaranteed to be correct, only the final result for a block...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if merging the two vectors would make the algorithm simpler. It would entail having double entries for each entity that is modified. Also likely we would need a flag if it's a start or end of a range in order to deduce if there is a creation or deletion for single entries, also to avoid false creations at the first entry and false deletion at the last one. And to deduce the proper entry for update. Probably with that flag and 3 element window one can create such algorithm, but I believe it would have at least 6 checks if not more, where the current one has 5. Not to mention the additional complexity of the start/end boundaries in the helper function and the additional complexity needed for the SQL query.

@lutter
Copy link
Collaborator

lutter commented Dec 11, 2024

One other thing I am wondering about with this strategy: if you are looking for changes in [10,20) and you find an entity version with a range of [11, 200) will this properly check if there is a version [200,_) to tell whether this is a delete or an update?

@zorancv
Copy link
Contributor Author

zorancv commented Dec 11, 2024

One other thing I am wondering about with this strategy: if you are looking for changes in [10,20) and you find an entity version with a range of [11, 200) will this properly check if there is a version [200,_) to tell whether this is a delete or an update?

We don't reason about the situation around 200 since it's > 20. We only check if there is another entry ending at 11 in order to decide if it's creation or modification at 11.

@zorancv zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from 5910c02 to 841bf7d Compare December 12, 2024 15:47
}
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that confuses me about the sort order (block_number, entity type, entity id) is that entries for different entities can be interleaved, like if there are two entities a and b, you might end up with [ (2, a), (2,b), (3, a), (3, b)] and since find_range processes the entries in the vecs in order, can't you end up comparing an entry for a with one for b?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to keep in mind that matching is always done between elements in the different vectors, and never between elements in the same one. Also a missing element in upper_vec is a creation and missing element in lower_vec is deletion, where a match (a presence of the equivalent respective values in both vectors) is a modification. And finally the two pairs of cases where there is either no matching element on one side or no remaining element at all on that same side, are in essence equivalent and produce the same result. Advancements of iterators are done on the already evaluated element(s).

Probably a simple example would be more explanatory.
Imagine you have elements in lower_vec like this:
[ (2, a), (2, c), (3, a) (3, b), (3, c) ],

and in the upper_vec something like:
[ (2, a), (2, b), (3, a), (3, c), (4, a) ],

the strategy will first find a match of (2, a) in both vectors so modification of a in block 2, then (2, b) is missing in lower_vec so deletion of b in 2, then (2, c) is missed in upper_vec so creation there, then (3, a) matched again so another modification, (3, b) only in lower_vec - creation, (3, c) on both sides - modification, (4, a) only in upper_vec - deletion.

Looking again at the comments at the start of the big while loop I tried to enter the same content as the first paragraph here, but also referred to some other aspects and that might had made it less readable. Should I try to improve it?

Copy link
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a discussion with Zoran, and since I have been unable to come up with counterexamples to support my worries, this is good to go.

@incrypto32 incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 37bf3cd to 1fc017f Compare January 31, 2025 12:22
@incrypto32 incrypto32 force-pushed the zoran/subgraph-composition-sql-more-entities branch from 841bf7d to 868060b Compare January 31, 2025 12:35
@incrypto32 incrypto32 closed this Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants