diff --git a/content/post/gist.md b/content/post/gist.md
index 499f7d4d..b9b0bd40 100644
--- a/content/post/gist.md
+++ b/content/post/gist.md
@@ -36,7 +36,7 @@ In the Greenplum Database, there are two query optimizers: Planner and GPORCA (d
 Say that we had two tables called foo and bar that each had a column called `geom` of type geometry. Geometry is a GiST-indexable data type from PostGIS that is commonly used for spatial and geographical queries. We now want to find the number of points that are within 0.0005 meters of each other.
-
+{{< responsive-figure src="/images/gist/postgis.jpg" class="center">}}
 Since it is not GIST-aware, the optimal plan generated by GPORCA uses two Table Scans inside a nested loop join. This can be significantly slow in execution if the tables have a large number of rows.
 ## Original GPORCA Generated Plan
@@ -179,7 +179,7 @@ Additionally, there are other indexes that are not yet supported in GPORCA such
 # Conclusion
 GiST indexes are a versatile template index structure that allows for the creation of indexes on custom data types. In the Greenplum Database, GPORCA originally did not handle GiST indexes, making any GPORCA generated plan extremely slow when the input grew large. We compared two different alternatives and chose the path that avoided excessive code duplication. Our final fix took advantage of existing index paths in GPORCA to allow the creation of GiST index plans. This created no/minor differences in the time it took to optimize, but is 1000x faster to run than the original plan.
-
+{{< responsive-figure src="/images/gist/GiST Indexes.jpg" class="center">}}
 ## Footnotes
 [1] Consistent returns false if, given a predicate on a tree page, the user query and predicate is not true, and returns maybe otherwise.
diff --git a/content/post/mergejoin.md b/content/post/mergejoin.md
index 2d5528e6..78d87fb9 100644
--- a/content/post/mergejoin.md
+++ b/content/post/mergejoin.md
@@ -33,7 +33,7 @@ SELECT * FROM foo FULL JOIN bar ON a = c;
 ```
 Would give the following results:
-
+{{< responsive-figure src="/images/mergejoin/table.png" class="left">}}
 # Introduction to Full Outer Joins in the Query Optimizer
 There are a few ways of creating full outer joins in an optimizer. Currently since GPORCA does not have any native full join operator, GPORCA creates a union of a left outer join and a left anti-semi join. One such plan can be seen below.
@@ -90,7 +90,7 @@ This plan generated by GPORCA takes a total of 1413 milliseconds in execution, w
 # Implementing Merge Join support in ORCA
 GPDB has native implementations for full outer joins, one of them is a merge full outer join. Since GPORCA did not have any native full join operator, the first step was to add the merge join operator to GPORCA, allowing such a plan to be generated. Such an operator requires quite a few things to consider. One such thing is that in order to use a merge join, both the inner and outer tables need to be sorted.
-
+{{< responsive-figure src="/images/mergejoin/mergejoin.png" class="left">}}
 # Performance Improvements
 Now that merge joins can be generated in GPORCA, we see quite a bit of improvement when generating full outer join plans.
@@ -136,14 +136,14 @@ For example, `EXPLAIN SELECT * FROM t1 FULL JOIN t2 on a = c WHERE a > 2` is nul
 The conversion of a full join to a left join is done during exploration and allows for GPORCA to then optimize the query using both the full join and left join alternatives.
-
+{{< responsive-figure src="/images/mergejoin/optimization1.png" class="center" >}}
 The above query was run on two tables: t1 had 10 million rows, t2 had 1. We can see that the execution time decreased from 6234 ms to 428 ms, around a 15x improvement.
 ## Optimization 2: Full Outer Join → Inner Join
 Similarly, if there exists a predicate where both the right side and left side tables are null-rejecting, then the FULL join can be converted into an inner join. This optimization is actually a by-product of the first, since full joins can be converted into left joins, and left joins can be optimized into inner joins. It is possible for a full join to be optimized into an inner join as well.
-
+{{< responsive-figure src="/images/mergejoin/optimization2.png" class="center" >}}
 Even in the simplest query, this provides a great improvement for the execution time. Here we can see that the execution time decreased from 6659 milliseconds to 362 milliseconds, resulting in a performance gain of around 20x.