How (NOT) to Write a Patch for PostgreSQL


In this quick and dirty article, we will be describing a syntax and implementing the new syntax in postgresql backend. The syntax to be implemented is CORRESPONDING clause, which is referenced in the SQL20nn standard draft as “Feature F301, CORRESPONDING in query expressions”.

CORRESPONDING clause can be seen as a modifier for set operations in SELECT statements. It filters the projected columns to either match the same column names, or match the column names with a list (CORRESPONDING BY).

Note: [] denotes optionality

query1 and query2 are select statements.

query1 UNION [ ALL ] [ CORRESPONDING [ BY ( column_list ) ] ] query2

query1 EXCEPT [ ALL ] [ CORRESPONDING [ BY ( column_list ) ] ] query2

query1 INTERSECT [ ALL ] [ CORRESPONDING [ BY ( column_list ) ] ] query2

  • CORRESPONDING returns all columns that are in both query1 and query2with the same name.
  • CORRESPONDING BY returns all columns in the column_list that are also in both query1 and query2 with the same name.

We will use UNION operator, INTERSECT and EXCEPT works the same with CORRESPONDING clause. Here are some examples to clarify what CORRESPONDING means.

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING SELECT 4 a, 5 b, 6 c;
 a | b | c 
 1 | 2 | 3
 4 | 5 | 6
(2 rows)

Now the fun part, we have removed column b from query2:

 a | c  
 1 | 3 
 4 | 6 
(2 rows)

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING SELECT 4 d; ERROR: UNION queries with a CORRESPONDING clause must have at least one column with the same name

The idea with CORRESPONDING is, unmatched columns are removed. If no columns left to project, then it is a syntax error.

For the CORRESPONDING BY(column_list) part, one more step of filtering is done to ensure that columns in the column_list are present in query1 and query2.

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a, b, c) SELECT 4 a, 5 b, 6 c;
 a | b | c 
 1 | 2 | 3
 4 | 5 | 6
(2 rows)

We remove a column name from column_list:

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a, c) SELECT 4 a, 5 b, 6 c;
 a | c 
 1 | 3
 4 | 6
(2 rows)

Although we can suggest that CORRESPONDING BY is more useful with * queries as in:



There is a single main reason why I pursued to write a patch for CORRESPONDING in postgresql, I decided to delve into postgresql code base with a simple project and it was in the TODO list. As it turns out it is not simple, and I cannot go without the saying “If it is in the TODO list, it ain’t simple”.

Development Environment

After setting up a vmware virtual machine with Pardus Linux installed, complete all the updates and make sure you have at least the following software configuration.

  • Pardus Linux 2011.2 (kernel i686 GNU/Linux)
  • gcc 4.5.3 (compiler)
  • flex 2.5.35 (lexer)
  • bison 2.4.3 (parser generator)
  • autotools
  • gdb
  • ccache (compiler cache)
  • git (version control)
  • eclipse CDT (IDE)

For eclipse configuration please see:

For other details about setting up an environment for postgresql development, please see


In this section, we will touch to the very basics of postgres query execution relating to the execution of a set operation(UNION, INTERSECT, EXCEPT)

  1. Parser
    1. Raw Parser – Generate a raw parse tree from the sql statement.
    2. Analyzer – Generate a query tree from a raw parse tree. (This is what we will modify mostly)
  2. Optimizer – Choose an efficient plan.
  3. Executor – Execute the chosen plan.

How Exactly

There are two simple paths of implementing the CORRESPONDING clause, one is modifying the query tree such that query1 and query2 are wrapped in subqueries, and the other is modifying the way that UNION works which is a big deal involving many parts of the code from analyzer to optimizer. We chose to implement it by modifying the query tree which I will describe in a moment.

  1. Modify parser to accept new syntax.
    1. Introduce keywords.
    2. Generate a relevant parse tree for the set operation.
    3. Add metadata about CORRESPONDING operation to node structures in the raw parse tree. (column_list, is it CORRESPONDING or CORRESPONDING BY, etc.)
  2. Update analyzer to process the new syntax.
    1. While processing the set operation, verify validity of the CORRESPONDING semantics.
      1. Does the columns exist in both tables
      2. Is there any non-existent column names in column_list.
      3. Is output projection non-empty, etc.
    2. If CORRESPONDING clause is valid, modify query1 and query2.
  3. Write documentation.
  4. Write regressions.

Modifying parser is the easiest part, you would think. What is the big deal? We are only going to add a keyword, what could go wrong? Well, keep on.


gram.y is a bison grammar file. Bison produces a c file from this input. It’s syntax is intuitive and you will feel familiar if you check a few SQL statements you know by heart and inspect their gram.y counterparts. For the hasty it gram.y looks like this:

Here, uppercase words are keywords, lower case words are further definitions. statement between {} is the raw parse tree description of GrantStmt. privileges, privilege_target, grantee_list, opt_grant_grant_option are defined below this definition of GrantStmt.

Between the curly braces, we refer to the positional arguments of the statement.

n->privileges = $2;

$2 refers to privileges in the first line for example.


To use CORRESPONDING BY as a valid clause, we need to add CORRESPONDING and BY keywords to the parser. BY keyword is already used in GROUP BY, ORDER BY etc. so we only need to add CORRESPONDING token.

Add CORRESPONDING to select statement as a valid clause.

Now we need to define opt_corresponding_clause.

We have modified the parameter list of makeSetOp. Let’s go over there.

Now that we have used SelectStmt->correspondingClause, let’s add it to parsenodes.h

Code will compile, but there is one tidbit missing. Postgres will shout an error to you about unrecognized keywords if you use a CORRESPONDING query.

You have to add all new keywords to kwlist.h

At this point we can compile postgres and see that it dismisses the CORRESPONDING or CORRESPONDING BY from a query without an error.


In transformSetOperationTree function, we determine the output column names, and verify their existence in tables of both sides. Here CORRESPONDING part is given. CORRESPONDING BY is similar.

To create a subquery with given column names and a query to me used as a subquery, we use createSubqueryForCorresponding function. This function returns a SelectStmt for the following statement: “SELECT outputColumns FROM (main_arg)


For documentation, please see changes in

  • doc/src/sgml/queries.sgml
  • doc/src/sgml/sql.sgml

For regression, please see the excellent article from postgresql, and see the changes in

  • src/test/regress/sql/corresponding_union.sql
  • src/test/regress/serial_schedule
  • src/test/regress/parallel_schedule
  • src/test/regress/expected/corresponding_union.out

The (not) part

This is the main reason the patch was not accepted by the postgresql community: When we modify the parse tree in the analyzer.c, we modify it for good. Optimizer, planner, executor never sees a CORRESPONDING clause, they only see the subquery we generated.


SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a, c) SELECT 4 a, 5 b, 6 c

=# select * from v1;
 a | c 
 1 | 3
 4 | 6
(2 rows)

=# select * from pg_views WHERE viewname = 'v1';
 schemaname | viewname | viewowner |                                                                  definition                                                                  
 public     | v1       | kerem     |

SELECT alias.a, alias.c FROM (SELECT 1 AS a, 2 AS b, 3 AS c) alias UNION SELECT alias.a, alias.c FROM (SELECT 4 AS a, 5 AS b, 6 AS c) alias

(1 row)

Patch File