EXPLAIN keyword
EXPLAIN displays the execution plan of an INSERT, SELECT, or UPDATE
statement.
Syntax
Description
A query execution plan shows how a statement will be implemented: which table is
going to be accessed and how, what join method are employed, and which
predicates are JIT-compiled etc. EXPLAIN output is a tree of nodes containing
properties and subnodes (aka child nodes).
In a plan such as:
| QUERY PLAN |
|---|
| Async JIT Filter |
filter: 100 < l |
| workers: 1 |
| PageFrame |
| Row forward scan |
| Frame forward scan on: tab |
there are:
- 4 nodes:
- Async JIT Filter
- PageFrame
- Row forward scan
- Frame forward scan
- 2 properties (both belong to Async JIT Filter node):
- filter
- workers
For simplicity, some nodes have special properties shown on the same line as
type; for example, Filter filter: b.age=10 or Limit lo: 10.
The following list contains some plan node types:
Async Filter- a parallelized filter that evaluates expressions with Java code. In certain scenarios, it also implements theLIMITkeyword.Async JIT Filter- a parallelized filter that evaluates expressions with Just-In-Time-compiled filter. In certain scenarios, it also implements theLIMITkeyword.Interval forward- scans one or more table data ranges based on the designated timestamp predicates. Scan endpoints are found via a binary search on timestamp column.CachedWindow- container for window functions that copies data to memory and sorts it, e.g. row_number()Window- container for window functions optimized for frames ordered by designated timestamp. Instead of copying the underlying dataset to memory it buffers just enough per-partition values to compute function result.Count- returns the count of records in subnode.Cursor-order scan- scans table records using row ids taken from an index, in index order - first all row ids linked to index value A, then B, etc.PageFrame- full or partial table scan. It contains two children:- row cursor - which iterates over rows inside a frame (e.g.
Row forward scan). - frame cursor - which iterates over table partitions or partition chunks
(e.g.
Frame forward scan).
- row cursor - which iterates over rows inside a frame (e.g.
Filter- standalone (non-JIT-compiled, non-parallelized) filter.Frame forward/backward scan- scans table partitions in a specified direction.GroupBy- group by with or without key(s). Ifvectorizedfield showstrue, then the node is parallelized and uses vectorized calculations.Hash- subnode of this node is used to build a hash table that is later looked up (usually in aJOINclause but also applies toEXCEPTorINTERSECT).Index forward/backward scan- scans all row ids associated with a givensymbolvalue from start to finish or vice versa.Limit- standalone node implementing theLIMITkeyword. Other nodes can implementLIMITinternally, e.g. theSortnode.Row forward/backward scan- scans data frame (usually partitioned) records in a specified direction.Sort- sorts data. If low or hi property is specified, then the sort buffer size is limited and a number of rows are skipped after sorting.SampleBy-SAMPLE BYkeyword implementation. If thefillis not shown, it meansfill(none).Selected Record- used to reorder or rename columns. It does not do any significant processing on its own.Table-order scan- scans table records using row ids taken from an index in table (physical) order - from the lowest to highest row id.VirtualRecord- adds expressions to a subnode's columns.
Other node types should be easy to link to SQL and database concepts, e.g.
Except, Hash Join or Lt Join.
Many nodes, especially join and sort, have 'light' and 'heavy' variants, e.g.
Hash Join Light and Hash Join. The former is used when child node(s) support
efficient random access lookups (e.g. PageFrame) so storing row id in the
buffer is enough; otherwise, the whole record needs to be copied and the 'heavy'
factory is used.
Examples
To illustrate how EXPLAIN works, consider the trades table
in the QuestDB demo instance:
CREATE TABLE trades (
symbol SYMBOL CAPACITY 256 CACHE,
side SYMBOL CAPACITY 256 CACHE,
price DOUBLE,
amount DOUBLE,
timestamp TIMESTAMP
) TIMESTAMP (timestamp) PARTITION BY DAY
Using EXPLAIN for the plan for SELECT
The following query highlight the plan for ORDER BY for the table:
EXPLAIN SELECT * FROM trades ORDER BY timestamp DESC;
PageFrame
Row backward scan
Frame backward scan on: trades
The plan shows that no sort is required and the result is produced by scanning
the table backward. The scanning direction is possible because the data in the
trades table is stored in timestamp order.
Now, let's check the plan for trades with a simple filter:
EXPLAIN SELECT * FROM trades WHERE amount > 100.0;
Async JIT Filter workers: 47
filter: 100.0<amount [pre-touch]
PageFrame
Row forward scan
Frame forward scan on: trades
In this example, the plan shows that the trades table undergoes a full scan
(PageFrame and subnodes) and the data is processed by the parallelized
JIT-compiled filter.
Using EXPLAIN for the plan for CREATE and INSERT
Apart from SELECT, EXPLAIN also works on CREATE and INSERT statements.
Single-row inserts are straightforward. The examples in this section show the
plan for more complicated CREATE and INSERT queries.
EXPLAIN CREATE TABLE trades AS
(
SELECT
rnd_symbol('a', 'b') symbol,
rnd_symbol('Buy', 'Sell') side,
rnd_double() price,
rnd_double() amount,
x::timestamp timestamp
FROM long_sequence(10)
) TIMESTAMP(timestamp) PARTITION BY DAY;
Create table: trades
VirtualRecord
functions: [rnd_symbol([a,b]),rnd_symbol([Buy,Sell]),memoize(rnd_double()),memoize(rnd_double()),x::timestamp]
long_sequence count: 10
The plan above shows that the data is fetched from a long_sequence cursor,
with random data generating functions called in VirtualRecord.
The same applies to the following query:
EXPLAIN INSERT INTO trades
SELECT
rnd_symbol('a', 'b') symbol,
rnd_symbol('Buy', 'Sell') side,
rnd_double() price,
rnd_double() amount,
x::timestamp timestamp
FROM long_sequence(10);
Insert into table: trades
VirtualRecord
functions: [rnd_symbol([a,b]),rnd_symbol([Buy,Sell]),memoize(rnd_double()),memoize(rnd_double()),x::timestamp]
long_sequence count: 10
Of course, statements could be much more complex than that. Consider the
following UPDATE query:
EXPLAIN UPDATE trades SET amount = 0 WHERE timestamp IN '2022-11-11';
Update table: trades
VirtualRecord
functions: [0]
PageFrame
Row forward scan
Interval forward scan on: trades
intervals: [static=[1668124800000000,1668211199999999]
The important bit here is Interval forward scan. It means that the table is
forward scanned only between points designated by the
timestamp IN '2022-11-11' predicate, that is between
2022-11-11 00:00:00,000000 and 2022-11-11 23:59:59,999999 (shown as raw
epoch micro values in the plan above). VirtualRecord is only used to pass 0
constant for each row coming from PageFrame.
Limitations:
To minimize resource usage, the EXPLAIN command does not execute the
statement, to avoid paying a potentially large upfront cost for certain queries
(especially those involving hash join or sort).
EXPLAIN provides a useful indication of the query execution, but it does not
guarantee to show the actual execution plan. This is because elements determined
during query runtime are missing.
While EXPLAIN shows the number of workers that could be used by a parallelized
node it is only the upper limit. Depending on the data volume and system load, a
query can use fewer workers.
Under the hood, the plan nodes are called Factories. Most plan nodes can be
mapped to implementation by adding the RecordCursorFactory or
FrameCursorFactory suffix, e.g.
PageFrame->PageFrameRecordCursorFactoryAsync JIT Filter->AsyncJitFilteredRecordCursorFactorySampleByFillNoneNotKeyed->SampleByFillNoneNotKeyedRecordCursorFactorywhile some are a bit harder to identify, e.g.GroupByRecord vectorized: false->io.questdb.griffin.engine.groupby.GroupByRecordCursorFactoryGroupByRecord vectorized: true->io.questdb.griffin.engine.groupby.vect.GroupByRecordCursorFactory
Other classes can be identified by searching for the node name in the toPlan()
methods.
See also
This section includes links to additional information such as tutorials: