Symbol
QuestDB introduces a data type called symbol; a data structure used to store
repetitive strings. Internally, symbol types are stored as a table of integers
and their corresponding string values.
This page presents the concept, optional setting, and their indication for
symbol types.
Advantages of symbol types
- Greatly improved query performance as string operations compare and write
inttypes instead ofvarchar. - Greatly improved storage efficiency as
intmaps tovarchartypes. - Unobtrusive to the user because SQL execution has the same result as handling string values.
- Reduced complexity of database schemas by removing the need for explicit additional tables or joins.
Properties
- Symbol tables are stored separately from column data.
- Fast conversion from
varchartointand vice-versa when reading or writing data. - Columns defined as
symboltypes support indexing. - By default, QuestDB caches
symboltypes in memory for improved query speed and InfluxDB Line Protocol ingestion speed. The setting is configurable.
Usage of symbols
Symbol columns
Columns can be specified as SYMBOL using
CREATE TABLE, similar to other types:
CREATE TABLE my_table
(symb SYMBOL CAPACITY 128 NOCACHE, price DOUBLE, ts TIMESTAMP)
timestamp(ts);
The following additional symbol settings are defined, either globally as part of the server configuration or locally when a table is created:
-
Symbol capacity: Optional setting used to indicate how many distinct values this column is expected to have. Based on the value used, the data structures will resize themselves when necessary, to allow QuestDB to function correctly. Underestimating the symbol value count may result in drop of performance whereas over-estimating may result in higher disk space and memory consumption. Symbol capacity is also used to set the initial symbol cache size when the cache is enabled.
- Server-wide setting:
cairo.default.symbol.capacitywith a default of256 - Column-wide setting: The
CAPACITYoption forCREATE TABLE
- Server-wide setting:
-
Cache: Optional setting specifying whether a symbol should be cached. When a
symbolcolumn is cached, QuestDB will use a Java heap-based hash table to resolve symbol values and keys. When a column has a large number of distinct symbol values (over 100,000, for example), the heap impact might be significant and may cause OutOfMemory errors, depending on the heap size. Not caching leverages a memory-mapped structure which can deal with larger value counts but is slower.- Server-wide setting:
cairo.default.symbol.cache.flagwith a default oftrue - Column-wide setting when a table is created: The
CACHE | NOCACHEkeyword forCREATE TABLE
- Server-wide setting:
Symbols for column indexing
Symbols may also be indexed for faster query execution. See
Index for more information.