EXPLAIN keyword

EXPLAIN displays the execution plan of an INSERT, SELECT, or UPDATE statement.

Syntax

Description

A query execution plan shows how a statement will be implemented: which table is going to be accessed and how, what join method are employed, and which predicates are JIT-compiled etc. EXPLAIN output is a tree of nodes containing properties and subnodes (aka child nodes).

In a plan such as:

QUERY PLAN
Async JIT Filter
filter: 100 `<` l
workers: 1
PageFrame
Row forward scan
Frame forward scan on: tab

there are:

4 nodes:
- Async JIT Filter
- PageFrame
- Row forward scan
- Frame forward scan
2 properties (both belong to Async JIT Filter node):
- filter
- workers

For simplicity, some nodes have special properties shown on the same line as type; for example, Filter filter: b.age=10 or Limit lo: 10.

The following list contains some plan node types:

Async Filter - a parallelized filter that evaluates expressions with Java code. In certain scenarios, it also implements the LIMIT keyword.
Async JIT Filter - a parallelized filter that evaluates expressions with Just-In-Time-compiled filter. In certain scenarios, it also implements the LIMIT keyword.
Interval forward - scans one or more table data ranges based on the designated timestamp predicates. Scan endpoints are found via a binary search on timestamp column.
CachedWindow - container for window functions that copies data to memory and sorts it, e.g. row_number()
Window - container for window functions optimized for frames ordered by designated timestamp. Instead of copying the underlying dataset to memory it buffers just enough per-partition values to compute function result.
Count - returns the count of records in subnode.
Cursor-order scan - scans table records using row ids taken from an index, in index order - first all row ids linked to index value A, then B, etc.
PageFrame - full or partial table scan. It contains two children:
- row cursor - which iterates over rows inside a frame (e.g. Row forward scan).
- frame cursor - which iterates over table partitions or partition chunks (e.g. Frame forward scan).
Filter - standalone (non-JIT-compiled, non-parallelized) filter.
Frame forward/backward scan - scans table partitions in a specified direction.
GroupBy - group by with or without key(s). If vectorized field shows true, then the node is parallelized and uses vectorized calculations.
Hash - subnode of this node is used to build a hash table that is later looked up (usually in a JOIN clause but also applies to EXCEPT or INTERSECT).
Index forward/backward scan - scans all row ids associated with a given symbol value from start to finish or vice versa.
Limit - standalone node implementing the LIMIT keyword. Other nodes can implement LIMIT internally, e.g. the Sort node.
Row forward/backward scan - scans data frame (usually partitioned) records in a specified direction.
Sort - sorts data. If low or hi property is specified, then the sort buffer size is limited and a number of rows are skipped after sorting.
SampleBy - SAMPLE BY keyword implementation. If the fill is not shown, it means fill(none).
Selected Record - used to reorder or rename columns. It does not do any significant processing on its own.
Table-order scan - scans table records using row ids taken from an index in table (physical) order - from the lowest to highest row id.
VirtualRecord - adds expressions to a subnode's columns.

Other node types should be easy to link to SQL and database concepts, e.g. Except, Hash Join or Lt Join.

Many nodes, especially join and sort, have 'light' and 'heavy' variants, e.g. Hash Join Light and Hash Join. The former is used when child node(s) support efficient random access lookups (e.g. PageFrame) so storing row id in the buffer is enough; otherwise, the whole record needs to be copied and the 'heavy' factory is used.

Examples

To illustrate how EXPLAIN works, consider the trades table in the QuestDB demo instance:

CREATE TABLE trades (
  symbol SYMBOL CAPACITY 256 CACHE,
  side SYMBOL CAPACITY 256 CACHE,
  price DOUBLE,
  amount DOUBLE,
  timestamp TIMESTAMP
) TIMESTAMP (timestamp) PARTITION BY DAY

Using `EXPLAIN` for the plan for `SELECT`

The following query highlight the plan for ORDER BY for the table:

Explain Order ByDemo this query

EXPLAIN SELECT * FROM trades ORDER BY timestamp DESC;

PageFrame
    Row backward scan
    Frame backward scan on: trades

The plan shows that no sort is required and the result is produced by scanning the table backward. The scanning direction is possible because the data in the trades table is stored in timestamp order.

Now, let's check the plan for trades with a simple filter:

Explain Simple FilterDemo this query

EXPLAIN SELECT * FROM trades WHERE amount > 100.0;

Async JIT Filter workers: 47
  filter: 100.0<amount [pre-touch]
    PageFrame
        Row forward scan
        Frame forward scan on: trades

In this example, the plan shows that the trades table undergoes a full scan (PageFrame and subnodes) and the data is processed by the parallelized JIT-compiled filter.

Using `EXPLAIN` for the plan for `CREATE` and `INSERT`

Apart from SELECT, EXPLAIN also works on CREATE and INSERT statements. Single-row inserts are straightforward. The examples in this section show the plan for more complicated CREATE and INSERT queries.

Explain Create TableDemo this query
EXPLAIN CREATE TABLE trades AS
(
  SELECT
    rnd_symbol('a', 'b') symbol,
    rnd_symbol('Buy', 'Sell') side,
    rnd_double() price,
    rnd_double() amount,
    x::timestamp timestamp
  FROM long_sequence(10)
) TIMESTAMP(timestamp) PARTITION BY DAY;

Create table: trades
    VirtualRecord
      functions: [rnd_symbol([a,b]),rnd_symbol([Buy,Sell]),memoize(rnd_double()),memoize(rnd_double()),x::timestamp]
        long_sequence count: 10

The plan above shows that the data is fetched from a long_sequence cursor, with random data generating functions called in VirtualRecord.

The same applies to the following query:

Explain Insert Into
EXPLAIN INSERT INTO trades
  SELECT
    rnd_symbol('a', 'b') symbol,
    rnd_symbol('Buy', 'Sell') side,
    rnd_double() price,
    rnd_double() amount,
    x::timestamp timestamp
  FROM long_sequence(10);

Insert into table: trades
    VirtualRecord
      functions: [rnd_symbol([a,b]),rnd_symbol([Buy,Sell]),memoize(rnd_double()),memoize(rnd_double()),x::timestamp]
        long_sequence count: 10

Of course, statements could be much more complex than that. Consider the following UPDATE query:

EXPLAIN UPDATE trades SET amount = 0 WHERE timestamp IN '2022-11-11';

Update table: trades
  VirtualRecord
    functions: [0]
      PageFrame
        Row forward scan
        Interval forward scan on: trades
          intervals: [static=[1668124800000000,1668211199999999]

The important bit here is Interval forward scan. It means that the table is forward scanned only between points designated by the timestamp IN '2022-11-11' predicate, that is between 2022-11-11 00:00:00,000000 and 2022-11-11 23:59:59,999999 (shown as raw epoch micro values in the plan above). VirtualRecord is only used to pass 0 constant for each row coming from PageFrame.

Limitations:

To minimize resource usage, the EXPLAIN command does not execute the statement, to avoid paying a potentially large upfront cost for certain queries (especially those involving hash join or sort).

EXPLAIN provides a useful indication of the query execution, but it does not guarantee to show the actual execution plan. This is because elements determined during query runtime are missing.

While EXPLAIN shows the number of workers that could be used by a parallelized node it is only the upper limit. Depending on the data volume and system load, a query can use fewer workers.

note

Under the hood, the plan nodes are called Factories. Most plan nodes can be mapped to implementation by adding the RecordCursorFactory or FrameCursorFactory suffix, e.g.

PageFrame -> PageFrameRecordCursorFactory
Async JIT Filter -> AsyncJitFilteredRecordCursorFactory
SampleByFillNoneNotKeyed -> SampleByFillNoneNotKeyedRecordCursorFactory while some are a bit harder to identify, e.g.
GroupByRecord vectorized: false -> io.questdb.griffin.engine.groupby.GroupByRecordCursorFactory
GroupByRecord vectorized: true -> io.questdb.griffin.engine.groupby.vect.GroupByRecordCursorFactory

Other classes can be identified by searching for the node name in the toPlan() methods.

Syntax​

Description​

Examples​

Using EXPLAIN for the plan for SELECT​

Using EXPLAIN for the plan for CREATE and INSERT​

Limitations:​

See also​