ALTER TABLE RESUME WAL
Restarts transactions of a WAL table after recovery from errors.
Syntax
Description
sequencerTxn is the unique txn identification that the Sequencer issues to
transactions.
When sequencerTxn is not specified, the operation resumes the WAL apply job
from the next uncommitted transaction, including the failed one.
When sequencerTxn is specified, the operation resumes the WAL apply job
from the provided sequencerTxn number explicitly.
ALTER TABLE RESUME WAL is used to restart WAL table transactions after
resolving errors. When transactions are stopped, the suspended status from the
wal_tables() function is marked
as true, and the sequencerTxn value indicates the last successful commit in
the Sequencer. Once the error is resolved, ALTER TABLE RESUME WAL restarts the
suspended WAL transactions from the failed transaction. Alternatively, an
optional sequencerTxn value can be provided to skip the failed transaction.
Examples
Using the wal_tables() function
to investigate the table status:
wal_tables();
| name | suspended | writerTxn | sequencerTxn |
|---|---|---|---|
| trades | true | 3 | 5 |
The table trades is suspended. The last successful commit in the table is
3.
The following query restarts transactions from the failed transaction, 4:
ALTER TABLE trades RESUME WAL;
Alternatively, specifying the sequencerTxn to skip the failed commit (4 in
this case):
ALTER TABLE trades RESUME WAL FROM TRANSACTION 5;
-- This is equivalent to
ALTER TABLE trades RESUME WAL FROM TXN 5;
Diagnosing corrupted WAL transactions
If you have data deduplication enabled on your tables and you have access to the original events (for instance, they're stored in Apache Kafka, or other replayable source), you may reingest the data after skipping the problematic transactions.
Sometimes a table may get suspended due to full disk or kernel limits. In this case, an entire WAL segment may be corrupted. This means that there will be multiple transactions that rely on the corrupted segment, and finding the transaction number to resume from may be difficult.
When you run RESUME WAL on such suspended table, you may see an error like this:
2024-07-10T01:01:01.131720Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=trades~3, error=could not open read-only [file=/home/my_user/.questdb/db/trades~3/wal45/101/_event], errno=2]
In such a case, you should try skipping all transactions that rely on the corrupted WAL segment. To do that, first you need to find the last applied transaction number for the trades table:
SELECT writerTxn
FROM wal_tables()
WHERE name = 'trades';
| writerTxn |
|---|
| 1223 |
Next, query the problematic transaction number:
SELECT max(sequencertxn)
FROM wal_transactions('trades')
WHERE sequencertxn > 1223
AND walId = 45
AND segmentId = 101;
Here, 1223 stands for the last applied transaction number, 45 stands for the WAL ID that may be seen in the error log above (trades~3/wal45), and 101 stands for the WAL segment ID from the log (trades~3/wal45/101).
| max |
|---|
| 1242 |
Since the last problematic transaction is 1242, you can resume the table from transaction 1243:
ALTER TABLE trades RESUME WAL FROM TXN 1243;
Note that in rare cases, subsequent transactions may also have corrupted WAL segments, so you may have to repeat this process.