If you happen to be an SQL developer, you will know that joins are really at the core of the language. Joins come in various flavors: Inner joins, left joins, full joins, natural joins, self joins, semi-joins, lateral joins, and so on. However, one of the most important distinctions is the difference between implicit and explicit joins. Over the years, flame wars have been fought over this issue. Still, not many people know what’s really going on. I hope this post helps to shed some light on the situation.

 
graphical representation of joins of tables in a PostgreSQL database

Using implicit joins

Before I dig into practical examples, let’s create some tables that we can later use to perform our joins:

test=# CREATE TABLE a (id int, aid int);
CREATE TABLE
test=# CREATE TABLE b (id int, bid int);
CREATE TABLE

In the next step, some rows are added to the tables:


test=# INSERT INTO a
         VALUES (1, 1), (2, 2), (3, 3);
INSERT 0 3
test=# INSERT INTO b
         VALUES (2, 2), (3, 3), (4, 4);
INSERT 0 3

An implicit join is the simplest way to join data. The following example shows an implicit join:

test=# SELECT *
       FROM  a, b
       WHERE a.id = b.id;
 id | aid | id | bid
----+-----+----+-----
  2 |   2 |  2 |   2
  3 |   3 |  3 |   3
(2 rows)

In this case, all tables are listed in the FROM clause and are later connected in the WHERE clause. In my experience, an implicit join is the most common way to connect two tables. However, my observation might be heavily biased, because an implicit join is the way I tend to write things in my daily work.

Using explicit joins

Some people prefer the explicit join syntax over implicit joins due to its readability.

The following example shows an explicit join.

test=# SELECT *
       FROM a JOIN b
            ON (aid = bid);
 id | aid | id | bid
----+-----+----+-----
  2 |   2 |  2 |   2
  3 |   3 |  3 |   3
(2 rows)

In this case, tables are connected directly using an ON-clause. The ON-clause simply contains the conditions we want to use to join the tables together.

Explicit joins support two types of syntax constructs: ON-clauses and USING-clauses. An ON-clause is perfect in case you want to connect different columns with each other. A using clause is different: It has the same meaning, but it can only be used if the columns on both sides have the same name. Otherwise, a syntax error is issued:

test=# SELECT *
       FROM a JOIN b
            USING (aid = bid);
ERROR: syntax error at or near "="
LINE 1: SELECT * FROM a JOIN b USING (aid = bid);

USING is often implemented to connect keys with each other, as shown in the next example:

test=# SELECT *
       FROM a JOIN b USING (id);
 id | aid | bid
----+-----+-----
  2 |   2 |   2
  3 |   3 |   3
(2 rows)

In my tables, both column have a column called “id”, which makes it possible to implement USING here. Keep in mind: USING is mostly syntactic sugar – there is no deeper meaning.

Often, an explicit join is used to join more than two tables. To show how that works, I have added another table:

test=# CREATE TABLE c (id int, cid int);
CREATE TABLE

Let’s add some data to this table:

test=# INSERT INTO c VALUES (3, 3), (4, 4), (5, 5);
INSERT 0 2

To perform an explicit join, just add additional JOIN and USING clauses (respectively ON clauses) to the statement.

Here’s an example of an explicit join:

test=# SELECT *
       FROM a INNER JOIN b USING (id)
            JOIN c USING (id);
 id | aid | bid | cid
----+-----+-----+-----
  3 |   3 |   3 | 3
(1 row)

The same can be done with an implicit join:


test=# SELECT *
       FROM  a, b, c
       WHERE a.id = b.id
             AND b.id = c.id;
 id | aid | id | bid | id | cid
----+-----+----+-----+----+-----
  3 |   3 |  3 |   3 |  3 |   3
(1 row)

However, as you can see, there is a small difference. Check the number of columns returned by the query. You will notice that the implicit join returns more. The “id” column will show up more frequently in this case, because the implicit join handles the column list in a slightly different way.

The column list is a nasty detail, because in a real application it is always better to explicitly list all columns. This little detail should be kept in mind.

join_collapse_limit: What the optimizer does

When I am on the road working as PostgreSQL consultant or PostgreSQL support guy, people often ask if there is a performance difference between implicit and explicit joins. The answer is: “Usually not”. Let’s take a look at the following statement:

test=# explain SELECT * FROM a INNER JOIN b USING (id);
                          QUERY PLAN
-----------------------------------------------------------------
Merge Join (cost=317.01..711.38 rows=25538 width=12)
   Merge Cond: (a.id = b.id)
   -> Sort (cost=158.51..164.16 rows=2260 width=8)
      Sort Key: a.id
      -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8)
   -> Sort (cost=158.51..164.16 rows=2260 width=8)
      Sort Key: b.id
      -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8)
(8 rows)

The explicit join produces exactly the same plan as the implicit plan shown below:

test=# explain SELECT * FROM a, b WHERE a.id = b.id;
                           QUERY PLAN
-----------------------------------------------------------------
Merge Join (cost=317.01..711.38 rows=25538 width=16)
   Merge Cond: (a.id = b.id)
    -> Sort (cost=158.51..164.16 rows=2260 width=8)
       Sort Key: a.id
       -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8)
    -> Sort (cost=158.51..164.16 rows=2260 width=8)
       Sort Key: b.id
       -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8)
(8 rows)

So in the majority of all cases, an implicit join does exactly the same thing as an explicit join.

join_collapse_limit

However, this is not always the case. In PostgreSQL there is a variable called join_collapse_limit:

test=# SHOW join_collapse_limit;
 join_collapse_limit
---------------------
 8
(1 row)

What does it all mean? If you prefer explicit over implicit joins, the planner will always plan the first couple of joins automatically – regardless of which join order you have used inside the query. The optimizer will simply reorder joins the way they seem to be most promising. But if you keep adding joins, the ones exceeding join_collapse_limit will be planned the way you have put them into the query. As you can easily imagine, we are already talking about fairly complicated queries. Joining 9 or more tables is quite a lot and beyond the typical operation in most cases.

from_collapse_limit

There is another parameter called from_collapse_limit that does the same thing for implicit joins and has the same default value. If a query lists more than from_collapse_limit tables in its FROM clause, the ones exceeding the limit will not be re-ordered, but joined in the order they appear in the statement.

However, for the typical, “normal” query, the performance and the execution plans stay the same: it makes no difference which type of join you prefer.

If you want to read more about joins, consider reading some of our other blog posts: