Subgraph composition sql more entities #5614

zorancv · 2024-08-20T14:50:36Z

Add support for querying more entities with a single SQL query.

store/postgres/src/relational_queries.rs

store/postgres/src/writable.rs

lutter

I don't fully follow how this loads entities and classifies them as create/modify/delete, partly because the interplay of mutable/immutable and is_upper_range is very intricate. I am not sure if code changes can clarify that (would be great); if not, this needs more comments explaining what is going on.

graph/src/blockchain/block_stream.rs

lutter · 2024-12-07T00:17:40Z

graph/src/blockchain/block_stream.rs

 }

+#[derive(Debug)]
+pub enum EntitySubgraphOperation {


I am not sure if this makes sense, but might be better to stick the entity into the operation, i.e. to have Create(Entity), Modify(Entity), Delete(Entity); that way, users of this enum are forced to consider each of these cases, and it would make it easy to, e.g., change to Delete(Id), i.e., not load the entire entity if that's useful.

But I don't understand how all of composition fits together, so I am not sure if this suggestion is really an improvement.

I guess the naming here is poor. The enum EntitySubgraphOperation is a kind of operation, with a definition more akin to C-style enums. The struct just below it, EntityWithType, contains the whole 'entity event' with all its fields, including this operation kind and the entity itself.

I did some renaming which I believe is closer to the semantics. I did it in a separate PR to save on rebasing.
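A minimal sketch of the suggested shape, with a hypothetical Entity reduced to an id. Because each variant carries its payload, any match over the operation has to handle all three cases, and Delete could later carry only the id without touching Create or Modify. This illustrates the review suggestion; it is not the type that was merged.

```rust
/// Hypothetical stand-in for graph-node's Entity, reduced to an id.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub id: String,
}

/// Suggested shape: the operation kind and the entity travel together.
#[derive(Debug, Clone, PartialEq)]
pub enum EntitySubgraphOperation {
    Create(Entity),
    Modify(Entity),
    // Could later shrink to Delete(String), i.e. just the id,
    // without affecting the other variants.
    Delete(Entity),
}

impl EntitySubgraphOperation {
    /// Name of the operation kind; the compiler forces every case to be listed.
    pub fn kind(&self) -> &'static str {
        match self {
            EntitySubgraphOperation::Create(_) => "create",
            EntitySubgraphOperation::Modify(_) => "modify",
            EntitySubgraphOperation::Delete(_) => "delete",
        }
    }

    /// The entity carried by any variant.
    pub fn entity(&self) -> &Entity {
        match self {
            EntitySubgraphOperation::Create(e)
            | EntitySubgraphOperation::Modify(e)
            | EntitySubgraphOperation::Delete(e) => e,
        }
    }
}
```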

lutter · 2024-12-07T00:19:22Z

store/postgres/src/block_range.rs

 #[derive(Debug, Clone, Copy)]
 pub enum EntityBlockRange {
-    Mutable(BlockRange), // TODO: check if this is a proper type here (maybe Range<BlockNumber>?)
+    Mutable((BlockRange, bool)),


There should be a comment explaining what the bool is for

Actually, since this is to indicate whether we want to compare with the upper or lower bound of the block range, it might be worth having an enum { Lower, Upper } for this

Yeah. Enum seems reasonable here.
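A minimal sketch of the enum suggested above, assuming a simplified EntityBlockRange that stores a Range<i32> instead of the real BlockRange. The column() method is a hypothetical name; it also shows matching on nested data, so a call site reads Mutable((range, BoundSide::Upper)) instead of an unexplained true.

```rust
use std::ops::Range;

/// Which end of block_range a query on a mutable table compares against.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BoundSide {
    Lower,
    Upper,
}

/// Simplified stand-in: the real enum wraps BlockRange, not Range<i32>.
#[derive(Debug, Clone)]
pub enum EntityBlockRange {
    Mutable((Range<i32>, BoundSide)),
    Immutable(Range<i32>),
}

impl EntityBlockRange {
    /// SQL expression holding the block number to compare with the range.
    pub fn column(&self) -> &'static str {
        match self {
            EntityBlockRange::Mutable((_, BoundSide::Lower)) => "lower(block_range)",
            EntityBlockRange::Mutable((_, BoundSide::Upper)) => "upper(block_range)",
            EntityBlockRange::Immutable(_) => "block$",
        }
    }
}
```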

lutter · 2024-12-07T00:25:52Z

store/postgres/src/block_range.rs

            Bound::Excluded(block) => {
                out.push_bind_param::<Integer, _>(block)?;
                out.push_sql("+1");
            }


This could just push the bind param block + 1

I wasn't able to make the borrow checker happy, as the addition creates a temporary value that can't outlive its block...
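A toy model of the lifetime problem, assuming a hypothetical SqlBuilder that, like Diesel 2.x's AstPass::push_bind_param (as far as I know, it takes the value as &'b U), borrows every bind value for 'b. The value block + 1 is a temporary that is dropped at the end of the statement, so it cannot be borrowed for 'b; binding the stored value and appending +1 in SQL avoids the temporary entirely.

```rust
use std::ops::Bound;

/// Hypothetical stand-in for AstPass<'_, 'b, Pg>: bind values are
/// borrowed for 'b, so they must outlive the builder.
struct SqlBuilder<'b> {
    sql: String,
    binds: Vec<&'b i32>,
}

impl<'b> SqlBuilder<'b> {
    fn new() -> Self {
        SqlBuilder { sql: String::new(), binds: Vec::new() }
    }

    fn push_sql(&mut self, s: &str) {
        self.sql.push_str(s);
    }

    /// Records the reference and emits a positional placeholder ($1, $2, ...).
    fn push_bind_param(&mut self, value: &'b i32) {
        self.binds.push(value);
        let placeholder = format!("${}", self.binds.len());
        self.sql.push_str(&placeholder);
    }
}

/// Emit the first block matched by a lower bound.
fn push_lower_bound<'b>(bound: &'b Bound<i32>, out: &mut SqlBuilder<'b>) {
    match bound {
        Bound::Included(block) => out.push_bind_param(block),
        Bound::Excluded(block) => {
            // out.push_bind_param(&(block + 1)) is rejected: the temporary
            // is dropped at the end of the statement, but the builder keeps
            // the reference for all of 'b. Bind the stored value instead.
            out.push_bind_param(block);
            out.push_sql("+1");
        }
        // Nothing to bind; a caller would omit the comparison.
        Bound::Unbounded => {}
    }
}
```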

lutter · 2024-12-07T00:31:42Z

store/postgres/src/block_range.rs

    }

    /// Output SQL that matches only rows whose block range contains `block`.
    pub fn contains<'b>(&'b self, out: &mut AstPass<'_, 'b, Pg>) -> QueryResult<()> {


I find this very confusing. It would help if the comment described the SQL text this method can generate.

Added a more descriptive comment. I hope it's clear now.
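The kind of description asked for could show output like the one this sketch produces. It is hypothetical: contains_sql is not the PR's function, and it inlines literal block numbers for readability, whereas the real code pushes them as bind parameters through AstPass. The column argument would be lower(block_range), upper(block_range) or block$.

```rust
use std::ops::Bound;

/// Hypothetical sketch: SQL filter keeping rows whose `column`
/// lies within the given block bounds.
fn contains_sql(column: &str, start: Bound<i32>, end: Bound<i32>) -> String {
    let mut clauses: Vec<String> = Vec::new();
    match start {
        Bound::Included(b) => clauses.push(format!("{column} >= {b}")),
        Bound::Excluded(b) => clauses.push(format!("{column} > {b}")),
        Bound::Unbounded => {}
    }
    match end {
        Bound::Included(b) => clauses.push(format!("{column} <= {b}")),
        Bound::Excluded(b) => clauses.push(format!("{column} < {b}")),
        Bound::Unbounded => {}
    }
    if clauses.is_empty() {
        // No bounds at all: every row matches.
        "true".to_string()
    } else {
        clauses.join(" and ")
    }
}
```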

lutter · 2024-12-07T00:34:04Z

store/postgres/src/relational.rs

+            };
+            Ok((ewt, block))
+        };
+        while lower_now.is_some() || upper_now.is_some() {


The body of this is pretty intricate. There should be a comment about what the strategy here is and how it works.

lutter · 2024-12-07T00:35:09Z

store/postgres/src/relational.rs

+            Ok((ewt, block))
+        };
+        while lower_now.is_some() || upper_now.is_some() {
+            let (ewt, block) = if lower_now.is_some() {


This whole chain of if .. else would probably be clearer if it were turned into a match (lower_now, upper_now) { .. }, as that's more symmetric than nested ifs

I did it with a couple of matches. I'm not sure it's more readable, but it certainly saves one level of block nesting and is probably more idiomatic too.

lutter · 2024-12-07T00:38:42Z

store/postgres/src/relational_queries.rs

+                }
+
+                // Generate
+                //    select '..' as entity, to_jsonb(e.*) as data, block$ as block_number


For mutable tables, what is block_number set to? (Looking at the code below, I think I can work it out, but would be nice to have it in the comment)
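To make the block_number column concrete, a hypothetical sketch of one arm of the generated query over a half-open window [start, end): immutable tables report block$, mutable tables report lower(block_range) or upper(block_range) depending on the side being scanned, and immutable tables produce no upper-side select at all. Table, select_for and the sgd1.thing name are assumptions; entity names are spliced in unescaped here, which real code should not do.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BoundSide {
    Lower,
    Upper,
}

/// Hypothetical table descriptor with only what the sketch needs.
struct Table<'a> {
    entity_type: &'a str,
    qualified_name: &'a str,
    immutable: bool,
}

/// One arm of the union, or None when the table has nothing to report.
fn select_for(table: &Table, side: BoundSide, start: i32, end: i32) -> Option<String> {
    let block = match (table.immutable, side) {
        // Immutable rows never end, so the upper side yields no select.
        (true, BoundSide::Upper) => return None,
        (true, BoundSide::Lower) => "block$",
        (false, BoundSide::Lower) => "lower(block_range)",
        (false, BoundSide::Upper) => "upper(block_range)",
    };
    Some(format!(
        "select '{}' as entity, to_jsonb(e.*) as data, {block} as block_number \
         from {} e where {block} >= {start} and {block} < {end}",
        table.entity_type, table.qualified_name
    ))
}
```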

lutter · 2024-12-07T00:41:45Z

store/postgres/src/relational_queries.rs

+        if first {
+            // In case we have only immutable entities, the upper range will not create any
+            // select statement. So here we have to generate an empty SQL statement.
+            out.push_sql("select 1");


Does this work? If we have clauses in the query, it returns (text, jsonb, int), but this fallback just returns (int) and is therefore the wrong shape

I believe I tested it. The intention was to return an empty set. Now when I run it on the command line, it returns one row, so maybe Diesel silently substituted the wrong types with empty rows, but it's bad style for sure. Needs fixing.

Addressed it with a query that has the same structure and added a test for that case.
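One way to get an empty result with the right shape, sketched under assumptions: union_or_empty is a hypothetical helper, and the column types (text, jsonb, int4) are inferred from the (text, jsonb, int) shape mentioned in the question. The where false clause makes the fallback return zero rows while keeping the same columns as the real selects. This is not necessarily the query the PR ended up using.

```rust
/// Join per-table selects with `union all`; with no selects at all,
/// fall back to a same-shaped query that returns no rows.
fn union_or_empty(selects: &[String]) -> String {
    if selects.is_empty() {
        "select ''::text as entity, '{}'::jsonb as data, 0::int4 as block_number where false"
            .to_string()
    } else {
        selects.join(" union all ")
    }
}
```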

lutter · 2024-12-07T00:42:46Z

store/postgres/src/relational_queries.rs

+
+                // Generate
+                //    select '..' as entity, to_jsonb(e.*) as data, block$ as block_number
+                //      from schema.table e where id = $1


Is this left over from something else? I don't see a clause id = $1 being generated.

zorancv · 2024-12-10T00:33:06Z

@lutter I believe the comments in find_range() make the matching algorithm for detecting the type of operation clearer. I'm not sure there is an easy way to make it simpler. Also, I feel that splitting the mutable and immutable cases won't improve things much, as the immutable case is a special (and simpler) case of the mutable one.

lutter · 2024-12-11T19:41:56Z

store/postgres/src/block_range.rs

+            EntityBlockRange::Mutable((_, bound_side)) => match bound_side {
+                BoundSide::Lower => out.push_sql(" lower(block_range) "),
+                BoundSide::Upper => out.push_sql(" upper(block_range) "),
+            },


Not a big deal, but you can match on nested data, too, like:

    match self {
        EntityBlockRange::Mutable((_, BoundSide::Lower)) => out.push_sql(" lower(block_range) "),
        EntityBlockRange::Mutable((_, BoundSide::Upper)) => out.push_sql(" upper(block_range) "),
        EntityBlockRange::Immutable(_) => out.push_sql(" block$ "),
    }

lutter · 2024-12-11T19:48:45Z

store/postgres/src/relational.rs

+
+        // collect all entities that have their 'lower(block_range)' attribute in the
+        // interval of blocks defined by the variable block_range. For the immutable
+        // entities the respective attribute is 'block$'.


This helps, but somehow buries the lede: lower_vec contains all entities that were created or updated in block_range

lutter · 2024-12-11T19:49:55Z

store/postgres/src/relational.rs

+        // collect all entities that have their 'upper(block_range)' attribute in the
+        // interval of blocks defined by the variable block_range. For the immutable
+        // entities no entries are returned.
+        let upper_vec =


Similarly, upper_vec contains all entities that were deleted or updated, but it holds the previous versions: in the case of an update, it's the version before the update, and lower_vec will have a corresponding entry with the new version.

I'll just copy this text.
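The split described in the two comments above can be sketched in memory, assuming a simplified Version row with a half-open [lower, upper) range where upper = None means the version is still current (always the case for immutable entities). Version, Key and split are hypothetical names; the real code does this filtering in SQL.

```rust
/// Hypothetical simplified row: one version of an entity, valid for
/// blocks [lower, upper); upper == None means "still current".
#[derive(Debug, Clone)]
struct Version {
    entity_type: &'static str,
    id: &'static str,
    lower: i32,
    upper: Option<i32>,
}

/// (block, entity type, entity id)
type Key = (i32, &'static str, &'static str);

/// lower_vec: versions starting in [start, end), i.e. creates and the new side of updates.
/// upper_vec: versions ending in [start, end), i.e. deletes and the old side of updates.
fn split(versions: &[Version], start: i32, end: i32) -> (Vec<Key>, Vec<Key>) {
    let in_window = |b: i32| start <= b && b < end;
    let mut lower_vec: Vec<Key> = versions
        .iter()
        .filter(|v| in_window(v.lower))
        .map(|v| (v.lower, v.entity_type, v.id))
        .collect();
    let mut upper_vec: Vec<Key> = versions
        .iter()
        .filter_map(|v| {
            v.upper
                .filter(|&u| in_window(u))
                .map(|u| (u, v.entity_type, v.id))
        })
        .collect();
    lower_vec.sort();
    upper_vec.sort();
    (lower_vec, upper_vec)
}
```

For versions [5,11) and [11,200) of one entity and a window [10,20), both vecs contain (11, T, a), i.e. a modification at block 11; the end at 200 lies outside the window and is never looked at. An immutable entity starting at 12 shows up only in lower_vec, i.e. as a creation.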

lutter · 2024-12-11T19:54:53Z

store/postgres/src/relational.rs

+        // deduced. For immutable entities the entries in upper_vec are missing hence they are considered
+        // having a lower bound at particular block and upper bound at infinity.
+        while lower_now.is_some() || upper_now.is_some() {
+            let (ewt, block) = match (lower_now.is_some(), upper_now.is_some()) {


Why not match on (lower_now, upper_now)? The match arms would then look like (Some(lower), None) etc. I think that would also get rid of the explicit lower and upper variables.

That's beautiful! Did it.

lutter · 2024-12-11T20:02:44Z

store/postgres/src/relational.rs

+        // to match entities that have entries in both vectors for a particular block. The match is
+        // successful if an entry in one array has same values for the number of the block, entity
+        // name and the entity id. The comparison operation over the EntityDataExt fulfils that check.
+        // In addition to that, it also helps to order the elements so the algorithm can detect if one


I don't quite understand the sentence "it also helps ...". I think it should also be mentioned somewhere that this algorithm relies on the fact that lower_vec and upper_vec are ordered by (entity type, entity id, block), which allows for mergesort-like processing of the two vecs.

I reworded it. I hope it's clear now.
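The ordering requirement can be met by a comparison that looks only at the sort key. Row is a hypothetical stand-in for EntityDataExt: it compares by (block, entity type, id) and ignores its data payload, so an old version in upper_vec and the new version in lower_vec compare equal even though their data differs, and sorting yields the (block, entity type, id) order that the thread later describes for the query.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for EntityDataExt.
#[derive(Debug, Clone)]
struct Row {
    block: i32,
    entity_type: String,
    id: String,
    data: String, // entity as JSON; differs between old and new versions
}

impl Row {
    /// The sort and match key; `data` is deliberately left out.
    fn key(&self) -> (i32, &str, &str) {
        (self.block, &self.entity_type, &self.id)
    }
}

impl Ord for Row {
    fn cmp(&self, other: &Self) -> Ordering {
        self.key().cmp(&other.key())
    }
}

impl PartialOrd for Row {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Equality must agree with Ord, so it also ignores `data`.
impl PartialEq for Row {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Row {}

fn row(block: i32, entity_type: &str, id: &str, data: &str) -> Row {
    Row {
        block,
        entity_type: entity_type.to_string(),
        id: id.to_string(),
        data: data.to_string(),
    }
}
```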

lutter · 2024-12-11T20:04:43Z

store/postgres/src/relational.rs

+                            lower_now = lower_iter.next();
+                            lower = lower_now.unwrap_or(&EntityDataExt::default()).clone();
+                            upper_now = upper_iter.next();
+                            upper = upper_now.unwrap_or(&EntityDataExt::default()).clone();


What happens here if lower_vec contains multiple entries for the same entity? Advancing upper_now might move us to the next entity, and so we think that lower_now which has another version of the entity we just looked at will look like a create.

I wonder whether a better strategy for all this might be to throw upper_vec and lower_vec into one vec, sort that by (entity type, entity id, block), and scan the resulting vec. You'd emit an EntityWithType if block is in the desired block range, and the operation would be determined as follows:

if previous entry had different type/id, it's a create

if next entry has different type/id, it's a delete

otherwise it's an update

That sort of requires a three-element window to pass over the Vec, but since it's a Vec, that could be done just by using indices, like vec[i-1], vec[i], vec[i+1], with some special care for i=0 and i=vec.len()-1; it's best to hide those intricacies in helper functions.
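The three-element window can be kept out of the main loop with a small helper, as suggested. This hypothetical sketch applies the classification rule exactly as stated to one vec of (entity type, id, block) entries; it does not include the start/end flag discussed in the reply, so, for example, the last entry of an entity that was only updated would still be reported as a delete.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Create,
    Modify,
    Delete,
}

/// (entity type, entity id, block), sorted in that order.
type Entry = (&'static str, &'static str, i32);

/// (previous, current, next); None past either end hides the
/// i = 0 and i = len - 1 special cases.
fn window<T>(v: &[T], i: usize) -> (Option<&T>, &T, Option<&T>) {
    (i.checked_sub(1).map(|j| &v[j]), &v[i], v.get(i + 1))
}

fn same_entity(a: &Entry, b: &Entry) -> bool {
    a.0 == b.0 && a.1 == b.1
}

/// The rule as proposed: new type/id before -> create,
/// different type/id after -> delete, otherwise update.
fn classify(v: &[Entry], i: usize) -> Op {
    let (prev, cur, next) = window(v, i);
    if !prev.map_or(false, |p| same_entity(p, cur)) {
        Op::Create
    } else if !next.map_or(false, |n| same_entity(n, cur)) {
        Op::Delete
    } else {
        Op::Modify
    }
}
```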

I'm not sure if it's possible to have multiple modifications of an entity in a block. That would entail a range of the form [10,10). In that case we might end up with the wrong modification, since we would have ranges like [3,10), [10,10), [10,15), and there is no guarantee that [10,10) will come before [10,15). If that's the case, I guess the DB query has to be extended with a fourth sort criterion (the end of the range for lower_vec, and the start for upper_vec, I believe)... Still, if there are more than two changes in a block, the order of the inner ones is not guaranteed to be correct, only the final result for the block...

I'm not sure merging the two vectors would make the algorithm simpler. It would entail double entries for each entity that is modified. We would also likely need a flag indicating whether an entry is the start or the end of a range, in order to deduce whether a single entry is a creation or a deletion, to avoid false creations at the first entry and false deletions at the last one, and to deduce the proper entry for an update. With that flag and a three-element window one could probably build such an algorithm, but I believe it would need at least six checks, if not more, where the current one has five. Not to mention the additional complexity of the start/end boundaries in the helper function and in the SQL query.

lutter · 2024-12-11T20:34:29Z

One other thing I am wondering about with this strategy: if you are looking for changes in [10,20) and you find an entity version with a range of [11, 200) will this properly check if there is a version [200,_) to tell whether this is a delete or an update?

zorancv · 2024-12-11T22:49:41Z

One other thing I am wondering about with this strategy: if you are looking for changes in [10,20) and you find an entity version with a range of [11, 200) will this properly check if there is a version [200,_) to tell whether this is a delete or an update?

We don't reason about the situation around 200 since it's > 20. We only check if there is another entry ending at 11 in order to decide if it's creation or modification at 11.

lutter · 2024-12-14T02:34:04Z

store/postgres/src/relational_queries.rs

+            }
+        }
+    }
+}


One thing that confuses me about the sort order (block_number, entity type, entity id) is that entries for different entities can be interleaved, like if there are two entities a and b, you might end up with [ (2, a), (2,b), (3, a), (3, b)] and since find_range processes the entries in the vecs in order, can't you end up comparing an entry for a with one for b?

You have to keep in mind that matching is always done between elements of different vectors, never between elements of the same one. Also, a missing element in upper_vec is a creation and a missing element in lower_vec is a deletion, whereas a match (the presence of equivalent values in both vectors) is a modification. Finally, the two pairs of cases where there is either no matching element on one side or no remaining element at all on that side are in essence equivalent and produce the same result. Iterators are only advanced past elements that have already been evaluated.

Probably a simple example would be more explanatory.
Imagine you have elements in lower_vec like this:
[ (2, a), (2, c), (3, a), (3, b), (3, c) ],

and in the upper_vec something like:
[ (2, a), (2, b), (3, a), (3, c), (4, a) ],

the strategy will first find a match of (2, a) in both vectors, so a modification of a in block 2; then (2, b) is missing from lower_vec, so a deletion of b in block 2; then (2, c) is missing from upper_vec, so a creation there; then (3, a) matches again, so another modification; (3, b) is only in lower_vec, a creation; (3, c) is on both sides, a modification; and (4, a) is only in upper_vec, a deletion.
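The walkthrough above can be checked with a small sketch of the merge. Assumptions: the key is reduced to (block, entity), with a, b, c standing for (entity type, id) pairs; both slices are already sorted by that key; classify and Op are hypothetical names, not the PR's find_range. Matching only ever compares an entry of lower_vec with one of upper_vec, and each arm advances only the side(s) it has just consumed.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Create,
    Modify,
    Delete,
}

/// (block, entity); the real key also includes the entity type.
type Key = (i32, &'static str);

/// Mergesort-like scan over two slices sorted by key.
fn classify(lower_vec: &[Key], upper_vec: &[Key]) -> Vec<(Key, Op)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    loop {
        match (lower_vec.get(i), upper_vec.get(j)) {
            (Some(&l), Some(&u)) => match l.cmp(&u) {
                // Old version ends where the new one starts: update.
                Ordering::Equal => {
                    out.push((l, Op::Modify));
                    i += 1;
                    j += 1;
                }
                // Only a start at this key: creation.
                Ordering::Less => {
                    out.push((l, Op::Create));
                    i += 1;
                }
                // Only an end at this key: deletion.
                Ordering::Greater => {
                    out.push((u, Op::Delete));
                    j += 1;
                }
            },
            // One side exhausted: same outcome as "no match on that side".
            (Some(&l), None) => {
                out.push((l, Op::Create));
                i += 1;
            }
            (None, Some(&u)) => {
                out.push((u, Op::Delete));
                j += 1;
            }
            (None, None) => break,
        }
    }
    out
}
```

Fed the lower_vec and upper_vec from the example, it yields modify (2, a), delete (2, b), create (2, c), modify (3, a), create (3, b), modify (3, c), delete (4, a), the same sequence as the walkthrough.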

Looking again at the comments at the start of the big while loop, I tried to capture the same content as the first paragraph here, but also referred to some other aspects, and that might have made it less readable. Should I try to improve it?

lutter

Had a discussion with Zoran, and since I have been unable to come up with counterexamples to support my worries, this is good to go.

zorancv changed the base branch from master to krishna/subgraph-composition-triggers-adapter-refactor August 21, 2024 20:29

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 5 times, most recently from 867fd69 to 5de07ca August 29, 2024 14:36

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 2 times, most recently from 6229fc2 to d2a4071 September 3, 2024 13:36

zorancv changed the base branch from krishna/subgraph-composition-triggers-adapter-refactor to master September 3, 2024 13:42

zorancv changed the base branch from master to krishna/subgraph-composition-triggers-adapter-refactor September 3, 2024 13:43

zorancv changed the base branch from krishna/subgraph-composition-triggers-adapter-refactor to master September 4, 2024 15:02

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 2 times, most recently from 6e4f7be to b1d0bfa September 4, 2024 15:48

zorancv changed the base branch from master to krishna/subgraph-composition-triggers-adapter-refactor September 4, 2024 15:48

incrypto32 added this to the Subgraph Composition milestone Sep 10, 2024

incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from b4ad24f to b35754e September 12, 2024 08:09

incrypto32 force-pushed the zoran/subgraph-composition-sql-more-entities branch from 2af7c45 to e03a549 September 12, 2024 08:35

zorancv mentioned this pull request Nov 7, 2024

Subgraph Composition: Reading the entities for subgraph as a datasource #5544

Closed

zorancv commented Nov 8, 2024

store/postgres/src/relational_queries.rs (outdated, resolved)

zorancv marked this pull request as ready for review November 8, 2024 14:48

zorancv requested a review from lutter November 8, 2024 14:48

zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch 2 times, most recently from 1f66779 to 51950cb December 4, 2024 19:31

lutter reviewed Dec 4, 2024

store/postgres/src/relational_queries.rs (resolved)

store/postgres/src/writable.rs (outdated, resolved)

zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 51950cb to 72c8848 December 5, 2024 10:19

incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 72c8848 to eb3f792 December 5, 2024 12:03

incrypto32 force-pushed the zoran/subgraph-composition-sql-more-entities branch from b892793 to ca47ec7 December 5, 2024 13:55

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from bae437b to ca7824e December 5, 2024 14:53

fordN added the composition label Dec 5, 2024

fordN assigned zorancv Dec 5, 2024

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from ca7824e to 352852e December 5, 2024 22:26

zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from eb3f792 to 943f821 December 5, 2024 22:27

incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 943f821 to 24487d8 December 6, 2024 06:04

zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 24487d8 to d8eb30e December 6, 2024 14:30

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from 352852e to 62ccaee December 6, 2024 14:43

lutter self-requested a review December 6, 2024 19:17

lutter requested changes Dec 7, 2024

zorancv force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from d8eb30e to 37bf3cd December 9, 2024 14:00

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch 5 times, most recently from 234342c to 501c8ba December 9, 2024 23:21

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from 5e4df38 to a213d3a December 10, 2024 14:13

lutter requested changes Dec 11, 2024

zorancv force-pushed the zoran/subgraph-composition-sql-more-entities branch from 5910c02 to 841bf7d December 12, 2024 15:47

lutter reviewed Dec 14, 2024

lutter approved these changes Dec 18, 2024

incrypto32 force-pushed the krishna/subgraph-composition-triggers-adapter-refactor branch from 37bf3cd to 1fc017f January 31, 2025 12:22

Subgraph composition: sql more entities

868060b

incrypto32 force-pushed the zoran/subgraph-composition-sql-more-entities branch from 841bf7d to 868060b January 31, 2025 12:35

incrypto32 closed this Feb 25, 2025
