4 changes: 2 additions & 2 deletions content/post/gist.md
@@ -36,7 +36,7 @@ In the Greenplum Database, there are two query optimizers: Planner and GPORCA (d

Suppose we have two tables called foo and bar, each with a column called `geom` of type geometry. Geometry is a GiST-indexable data type from PostGIS that is commonly used for spatial and geographical queries. We now want to find the number of points that are within 0.0005 meters of each other.

-<img src="/images/gist/postgis.jpg" class="left">
+{{< responsive-figure src="/images/gist/postgis.jpg" class="center">}}

Since GPORCA is not GiST-aware, the best plan it generates uses two Table Scans inside a nested loop join. This can be very slow to execute if the tables have a large number of rows.
## Original GPORCA Generated Plan
@@ -179,7 +179,7 @@ Additionally, there are other indexes that are not yet supported in GPORCA such
# Conclusion
GiST indexes are a versatile template index structure that allows for the creation of indexes on custom data types. In the Greenplum Database, GPORCA originally did not handle GiST indexes, making any GPORCA-generated plan extremely slow when the input grew large. We compared two different alternatives and chose the path that avoided excessive code duplication. Our final fix took advantage of existing index paths in GPORCA to allow the creation of GiST index plans. This made little or no difference to the time it took to optimize, but the resulting plan runs 1000x faster than the original plan.

-<img src="/images/gist/GiST Indexes.jpg" width=1050px class="left">
+{{< responsive-figure src="/images/gist/GiST Indexes.jpg" class="center">}}

## Footnotes
<a name="1">[1]</a> Consistent returns false if, given a predicate on a tree page, the user query and that predicate cannot both be true, and returns maybe otherwise.
8 changes: 4 additions & 4 deletions content/post/mergejoin.md
@@ -33,7 +33,7 @@ SELECT * FROM foo FULL JOIN bar ON a = c;
```
Would give the following results:

-<img src="/images/mergejoin/table.png" class="left">
+{{< responsive-figure src="/images/mergejoin/table.png" class="left">}}

# Introduction to Full Outer Joins in the Query Optimizer
There are a few ways of creating full outer joins in an optimizer. Currently, since GPORCA does not have a native full join operator, it creates a union of a left outer join and a left anti-semi join. One such plan can be seen below.
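
To make the rewrite concrete, here is a minimal Python sketch of a full outer join built as a left outer join plus a left anti-semi join. It is not GPORCA code: the in-memory row layout (`foo` rows as `(a, b)`, `bar` rows as `(c, d)`), the function names, and the sample data are all assumptions made for illustration, with `None` standing in for SQL NULL.

```python
def left_outer_join(foo, bar):
    """foo LEFT JOIN bar ON a = c; unmatched foo rows are padded with None."""
    out = []
    for a, b in foo:
        # A NULL key never matches anything, as in SQL.
        matches = [(c, d) for c, d in bar if a is not None and a == c]
        if matches:
            out.extend((a, b, c, d) for c, d in matches)
        else:
            out.append((a, b, None, None))
    return out


def left_anti_semi_join(bar, foo):
    """Rows of bar that have no join partner in foo."""
    return [(c, d) for c, d in bar
            if not any(a is not None and a == c for a, _ in foo)]


def full_outer_join(foo, bar):
    """Union of the left outer join and the NULL-padded anti-semi join."""
    unmatched_bar = left_anti_semi_join(bar, foo)
    return left_outer_join(foo, bar) + [(None, None, c, d) for c, d in unmatched_bar]


# Hypothetical sample data.
foo = [(1, "x"), (2, "y"), (None, "z")]
bar = [(2, "p"), (3, "q")]
result = full_outer_join(foo, bar)
```

Note that both branches scan `foo` and `bar`, which hints at why a native full join operator can be cheaper than the union.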
@@ -90,7 +90,7 @@ This plan generated by GPORCA takes a total of 1413 milliseconds in execution, w
# Implementing Merge Join support in ORCA
GPDB has native implementations for full outer joins; one of them is a merge full outer join. Since GPORCA did not have any native full join operator, the first step was to add the merge join operator to GPORCA, allowing such a plan to be generated. Adding such an operator involves several considerations. One is that in order to use a merge join, both the inner and outer tables need to be sorted.
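
The sort requirement is easier to see in a small sketch. The Python below is an illustrative merge full outer join over in-memory lists, not the GPDB or GPORCA implementation; the `(key, payload)` row shape and the function name are assumptions, and keys are assumed to be non-NULL and comparable.

```python
def merge_full_outer_join(outer, inner):
    """Full outer join of two lists of (key, payload) rows on equal keys."""
    # A merge join only works on sorted inputs, so sort both sides first.
    outer = sorted(outer, key=lambda r: r[0])
    inner = sorted(inner, key=lambda r: r[0])
    out, i, j = [], 0, 0
    while i < len(outer) and j < len(inner):
        ko, ki = outer[i][0], inner[j][0]
        if ko < ki:
            out.append((outer[i], None))   # outer row has no partner
            i += 1
        elif ko > ki:
            out.append((None, inner[j]))   # inner row has no partner
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(outer) and outer[i_end][0] == ko:
                i_end += 1
            j_end = j
            while j_end < len(inner) and inner[j_end][0] == ko:
                j_end += 1
            for o in outer[i:i_end]:
                for n in inner[j:j_end]:
                    out.append((o, n))
            i, j = i_end, j_end
    # Whatever remains on either side is unmatched.
    out.extend((o, None) for o in outer[i:])
    out.extend((None, n) for n in inner[j:])
    return out


rows = merge_full_outer_join(
    [(3, "c"), (1, "a"), (2, "b")],
    [(2, "x"), (4, "y"), (2, "z")],
)
```

Because both inputs are sorted, each side is read once in a single forward pass, and unmatched rows from either side fall out naturally, which is what makes the merge strategy a good fit for full outer joins.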

-<img src="/images/mergejoin/mergejoin.png" class="left">
+{{< responsive-figure src="/images/mergejoin/mergejoin.png" class="left">}}

# Performance Improvements
Now that merge joins can be generated in GPORCA, we see quite a bit of improvement when generating full outer join plans.
@@ -136,14 +136,14 @@ For example, `EXPLAIN SELECT * FROM t1 FULL JOIN t2 on a = c WHERE a > 2` is nul

The conversion of a full join to a left join is done during exploration, which allows GPORCA to then optimize the query using both the full join and left join alternatives.

-<img src="/images/mergejoin/optimization1.png" class="left" width="100%">
+{{< responsive-figure src="/images/mergejoin/optimization1.png" class="center" >}}

The above query was run on two tables: t1 had 10 million rows, t2 had 1. We can see that the execution time decreased from 6234 ms to 428 ms, around a 15x improvement.
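
Why the rewrite is safe can be checked on a toy example. The sketch below is only an illustration under assumed names and data: `t1` and `t2` are modeled as single-column lists (columns `a` and `c`), `None` stands in for NULL, and the helper functions are hypothetical. It shows that filtering a full join with the null-rejecting predicate `a > 2` yields the same rows as filtering a left join.

```python
def left_join(t1, t2):
    """t1 LEFT JOIN t2 ON a = c over single-column tables."""
    out = []
    for a in t1:
        matches = [c for c in t2 if a is not None and a == c]
        if matches:
            out.extend((a, c) for c in matches)
        else:
            out.append((a, None))
    return out


def full_join(t1, t2):
    """Left join plus the t2 rows that found no partner, padded with None."""
    out = left_join(t1, t2)
    out += [(None, c) for c in t2
            if not any(a is not None and a == c for a in t1)]
    return out


def a_greater_than_2(row):
    a, _ = row
    # In SQL, NULL > 2 is not true, so NULL-extended rows never pass this filter.
    return a is not None and a > 2


# Hypothetical sample data.
t1 = [1, 3, 5]
t2 = [3, 7]
via_full = [r for r in full_join(t1, t2) if a_greater_than_2(r)]
via_left = [r for r in left_join(t1, t2) if a_greater_than_2(r)]
```

The only rows a full join adds over a left join are those with a NULL `a`, and the predicate discards exactly those rows, so both plans return the same result.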

## Optimization 2: Full Outer Join → Inner Join
Similarly, if there exists a predicate that is null-rejecting on both the left side and right side tables, then the FULL join can be converted into an inner join. This optimization is actually a by-product of the first: since full joins can be converted into left joins, and left joins can be optimized into inner joins, a full join can be optimized into an inner join as well.

-<img src="/images/mergejoin/optimization2.png" class="left" width="100%">
+{{< responsive-figure src="/images/mergejoin/optimization2.png" class="center" >}}

Even in the simplest query, this provides a great improvement in execution time. Here we can see that the execution time decreased from 6659 milliseconds to 362 milliseconds, resulting in a performance gain of around 18x.
