Inner Join
Inner join, outer joy
Intro🔗
Joining tables is the most basic skill that is needed when someone works with SQL.
The most common ones are the INNER JOIN
, the LEFT OUTER JOIN
and sometimes the RIGH OUTER JOIN
.
This post is about the INNER JOIN
, and it's variants.
DB2, traditional join🔗
IIRCC when Iseries DB2 went from V5.3 to V5.4, the order in which the tables where declared suddenly mattered. And we had to change the following (bad) code.
file: i5_bad.sql
SELECT * FROM main a, subsub c, sub b WHERE a.id = b.main_id AND b.id = c.sub_id
Into the better:
file: i5_better.sql
SELECT * FROM main a, sub b, subsub c WHERE a.id = b.main_id AND b.id = c.sub_id
Of course this is a very simple example. The queries that we made (cough, generated by php, couch), were much more complicated: many tables, Group By's, COALESCE's, CASE, .. And because the database was multi-tennant, the relational model of these tables was more complex. To join 2 tables, we had to match 2-4 columns.
Here is a more realistic example; we'll just show the joining conditions and leave the rest simple.
file: real.sql
SELECT * FROM ororta a, bebeta b, opopta c, plplta d, repota e WHERE a.ororaa = b.ororbb AND a.cocoaa = b.cocobb AND b.opopbb = c.opopcc AND b.cocobb = c.cococc AND d.orordd = b.ororbb AND d.opopdd = b.opopbb AND b.ororbb = e.ororee AND e.opopee = b.opopee AND b.typebb = 'AU' AND b.statee <> 'L' AND d.typdd = 'P' AND e.typeee = 'P' AND a.cocoaa = 15 AND a.stataa IN ( 'Z', 'A' ) AND a.qtyaaa - a.delaaa > 0
That is horrible to read and a nigthmare for maintenance. Let's improve that a bit by using better aliases then 'a', 'b', 'c', 'd', 'e'.
file: improved.sql
SELECT * FROM ororta order, bebeta treatment, opopta operation, plplta planning, repota reporting WHERE order.ororaa = treatment.ororbb AND order.cocoaa = treatment.cocobb AND treatment.opopbb = operation.opopcc AND treatment.cocobb = operation.cococc AND planning.orordd = treatment.ororbb AND planning.opopdd = treatment.opopbb AND treatment.ororbb = reporting.ororee AND reporting.opopee = treatment.opopbb AND treatment.typebb = 'AU' AND treatment.statbb <> 'L' AND planning.typedd = 'P' AND reporting.typeee = 'P' AND order.cocoaa = 15 AND order.stataa IN ( 'Z', 'A' ) AND order.qtyaaa - order.delaaa > 0
Now that we know what table is what, lets move the conditions from the WHERE clause into a INNER JOIN
file: inner.sql
SELECT * FROM ororta order INNER JOIN bebeta treatment ON order.ororaa = treatment.ororbb AND order.cocoaa = treatment.cocobb AND treatment.typebb = 'AU' AND treatment.statbb <> 'L' INNER JOIN opopta operation ON AND treatment.opopbb = operation.opopcc AND treatment.cocobb = operation.cococc INNER JOIN plplta planning ON AND planning.orordd = treatment.ororbb AND planning.opopdd = treatment.opopbb AND planning.typedd = 'P' INNER JOIN repota reporting ON AND treatment.ororbb = reporting.ororee AND reporting.opopdd = treatment.opopbb AND reporting.typeee = 'P' WHERE AND order.cocoaa = 15 AND order.stataa IN ( 'Z', 'A' ) AND order.qtyaaa - order.delaaa > 0
Wow, we can see immediately what table is by what criteria. Nevertheless, it is still difficult to read as our mind needs to parse the aliases on both sides of the '='.
file: readable.sql
SELECT * FROM ororta order INNER JOIN bebeta treatment ON treatment.ororbb = order.ororaa AND treatment.cocobb = order.cocoaa AND treatment.typebb = 'AU' AND treatment.statbb <> 'L' INNER JOIN opopta operation ON AND operation.opopcc = treatment.opopbb AND operation.cococc = treatment.cocobb INNER JOIN plplta planning ON AND planning.orordd = treatment.ororbb AND planning.opopdd = treatment.opopbb AND planning.typedd = 'P' INNER JOIN repota reporting ON AND reporting.ororee = treatment.ororbb AND reporting.opopee = treatment.opopbb AND reporting.typeee = 'P' WHERE AND order.cocoaa = 15 AND order.stataa IN ( 'Z', 'A' ) AND order.qtyaaa - order.delaaa > 0
Now we have query that is is rather easy to read and to maintain. I lik this way of organizing queries, and if you are consistent, the step to 'LEFT/RIGHT OUTER JOIN` is realy small.
Side effect: Speedup🔗
By moving a complex, query from the WHERE clause to INNER JOIN, we sometimes measured speed ups from around 13 seconds to 3 seconds. Of course, with better indexes there would probably be even more improvements possible. It could be that this is a DB2 specific thing and that the optimizer could make a better execution plan. I do not know, the report got a bit faster.
Disadvantage🔗
The real problem that I see is that the oracle die-hards have problems with the positioning of the critaria fields because of the (+)
(-)
feature.