index improvements postgresql 13Indexes are one of the core features of all the database management systems (DBMS). Indexes have a very long history in PostgreSQL, which has quite a rich set of index features. PostgreSQL has B-Tree, Hash,  GIN, GIST, and BRIN indexes. And because the development around indexes is still going on, PostgreSQL 13 provides some enhancements.

We can divide the overall improvements into two categories:

  1. Changes which are transparent to the user. There won’t be any visible changes, but they will get the benefit automatically after the upgrade, probably after a rebuild of the index.  No application change required.
  2. The second set of improvements require the user to explicitly use the new feature. 

Both types of improvements have been introduced in PostgreSQL Version 13. Sometimes it is difficult to extract the information from the release notes and comprehend for an end-user or even convince an end-user, so this blog contains examples of improvements done in PostgreSQL 13.

1. Deduplication of B-Tree Index [1]

Non-unique/primary B-Tree indexes generally contain a lot of duplicate values. The storage of duplicates plays a vital role, especially in B-Tree, which allows aggregate operations like COUNT or GROUP BY to use these indexes. However, a smart way of packing can eliminate storing the actual duplicate value by just maintaining the reference.

The process of deduplication eliminates the redundant/duplicate values from the tree. This “Deduplication” process reduces the storage requirement of the index, as each key will be stored only once to save space.

 

[Note: This is just an example to show the reduction in size, not a benchmark. You may get different numbers.]

A new index parameter is introduced called deduplicate_items which can be specified while creating the index. This parameter is used to enable/disable the deduplication. It is “ON” by default, which means that the benefit of the deduplication will be available without explicitly making any change*.

Note: If you are upgrading PostgreSQL from older versions using the pg_upgrade, all indexes need to be REINDEX to avail the benefit of deduplication, regardless of which version you are upgrading from. 

2. Allow GiST [2.1] and SP-GiST [2.2] Indexes for Box/Point Distance Lookups

The GiST index is a template for developing further indexes over any kind of data, supporting any lookup over that data. By default, it supports a wide range of operators. In PostgreSQL 13, this new patch adds support for the missing “<-> (box, point)” operator to GiST box_ops as the ordering operator.

Let’s consider the example of a table with a BOX field, a POINT, and a CIRCLE field.

Insert Data to the table:

Analyze the table:

Create a GiST index on the Box Field:

Now let’s have a query to check all the boxes contained in another box area but sorted by distance to a point.

Please note the ordering operator usage in the above query like “b <-> POINT(5.2, 5.91);”. This was not possible in the previous PostgreSQL versions.

3. Allow GIN Indexes to More Efficiently Handle NOT Restrictions [3]

This is a performance enhancement for queries that use GIN indexes. Now the GIN index will be more efficient in handling the Negation restrictions. This improvement avoids the full scanning of GIN indexes.

4. Index Operator Class Parameters [4]

PostgreSQL has a variety of index access methods such as (1) GiST, (2) GIN, (3) SP-GiST, and (4) BRIN. While creating an index there is already an option to specify the operator class. The operator class contains the comparison function to be used for the index. Normally when we create an index without specifying the operator class, the default operator class is used and most of the time it is sufficient. But in some cases there is a need to have more than one meaningful behavior, therefore we need to specify the operator class.

These opclasses define the representation of keys and operations on them. Along with that it also defines the supported search strategies. To add some user-side decisions to opclass opclass_parameter is introduced. New syntax in INDEX creation is added to specify the operator class options.

5. GiST Signature Length

Allow CREATE INDEX to specify the GiST signature length and the maximum number of integer ranges. Now a user can specify the GIST Index parameter. 

6. Prevent Indexes That Use Non-Default Collations From Being Added as a Table’s Unique or Primary Key Constraint [5]

The index and column collations must now match so the index’s uniqueness matches the column’s uniqueness.

*[Currently this is on by default, but a final decision will be on GA]

Note: All the information is based on PostgreSQL 13 beta.