Binary logs are essentially a recovery mechanism for Real-Time
index data, and for attribute updates of plain indexes, that would
otherwise be stored only in RAM until flushed. With binary logs enabled,
searchd writes every transaction to the binlog file,
and uses that for recovery after an unclean shutdown. On clean shutdown,
RAM chunks are saved to disk, and then all the binlog files are
unlinked.
Binary logging is enabled by default. On Linux, the default location
for binlog.* files is
/var/lib/manticore/data/. In RT
mode the binary logs are saved in the data_dir folder,
unless specified differently.
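For instance, in RT mode the binlog location follows the data_dir directive; a minimal sketch (the path here is illustrative only):
searchd {
...
data_dir = /var/lib/manticore # binlog.* files are created here unless binlog_path overrides it
...
}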
Binary logging can be disabled by setting binlog_path to
empty:
searchd {
...
binlog_path = # disable logging
...
}

Disabling binary logging improves performance for Real-Time indexes, but puts their data at risk.
The binlog_path directive can also be used to set a custom path:
searchd {
...
binlog_path = /var/data
...
}

When logging is enabled, every transaction committed into an RT index gets written into a log file. Logs are then automatically replayed on startup after an unclean shutdown, recovering the logged changes.
During normal operation, a new binlog file is opened every time
the binlog_max_log_size limit is reached. Older, already
closed binlog files are kept until all of the transactions stored in
them (from all indexes) are flushed as a disk chunk. Setting the limit
to 0 essentially prevents the binlog from being unlinked at all while
searchd is running; however, it will still be unlinked on
clean shutdown. By default, there is no limit on the log file size.
searchd {
...
binlog_max_log_size = 16M
...
}

There are 3 different binlog flushing strategies, controlled by
the binlog_flush directive:

binlog_flush = 0: flush the log to the OS and sync it to disk every second;
binlog_flush = 1: flush and sync every transaction;
binlog_flush = 2: flush every transaction, sync every second.

The default mode is to flush every transaction and sync every second (mode 2).
searchd {
...
binlog_flush = 1 # ultimate safety, low speed
...
}

On recovery after an unclean shutdown, binlogs are replayed, and all logged transactions since the last good on-disk state are restored. Transactions are checksummed, so in case of binlog file corruption, garbage data will not be replayed; such a broken transaction will be detected and will stop replay. Transactions also start with a magic marker and are timestamped, so in case of binlog damage in the middle of the file, it is technically possible to skip broken transactions and keep replaying from the next good one, and/or to replay transactions only up to a given timestamp (point-in-time recovery); however, none of that is implemented yet.
Intensive updating of a small RT index that fully fits into a RAM
chunk will lead to an ever-growing binlog that can never be unlinked
until clean shutdown. Binlogs are essentially append-only deltas against
the last known good saved state on disk, and unless the RAM chunk gets
saved, they cannot be unlinked. An ever-growing binlog is not very good
for disk use and crash recovery time. To avoid this, you can configure
searchd to perform periodic RAM chunk flushes using the
rt_flush_period directive. With periodic
flushes enabled, searchd will keep a separate thread that
checks whether RT indexes' RAM chunks need to be written back to disk.
Once that happens, the respective binlogs can be (and are) safely
unlinked.
searchd {
...
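# note: the period is specified in seconds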
rt_flush_period = 3600 # 1 hour
...
}

By default, the RT flush period is set to 10 hours.
Note that rt_flush_period only controls the frequency at
which the checks happen. There are no guarantees that
any particular RAM chunk will get saved. For instance, it does not make
sense to regularly re-save a huge RAM chunk that only gets a few rows'
worth of updates. The search server determines whether to actually
perform the flush based on a few heuristics.
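Putting the directives above together, a complete binlog-related configuration might look like the following sketch; the values are illustrative, not recommendations:
searchd {
...
binlog_path = /var/lib/manticore/data # where binlog.* files are kept
binlog_max_log_size = 16M # rotate binlog files once they reach 16 MB
binlog_flush = 2 # flush every transaction, sync every second (the default)
rt_flush_period = 3600 # check hourly whether RAM chunks should be flushed
...
}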