index <index_name>[:<parent index name>] {
...
}
index <index_name> {
type = plain
path = /path/to/index
source = <source_name>
source = <another source_name>
[stored_fields = <comma separated list of full-text fields that should be stored>]
}
index <index_name> {
type = rt
path = /path/to/index
rt_field = <full-text field name>
rt_field = <another full-text field name>
[rt_attr_uint = <integer field name>]
[rt_attr_uint = <another integer field name, limit by N bits>:N]
[rt_attr_bigint = <bigint field name>]
[rt_attr_bigint = <another bigint field name>]
[rt_attr_multi = <multi-integer (MVA) field name>]
[rt_attr_multi = <another multi-integer (MVA) field name>]
[rt_attr_multi_64 = <multi-bigint (MVA) field name>]
[rt_attr_multi_64 = <another multi-bigint (MVA) field name>]
[rt_attr_float = <float field name>]
[rt_attr_float = <another float field name>]
[rt_attr_bool = <boolean field name>]
[rt_attr_bool = <another boolean field name>]
[rt_attr_string = <string field name>]
[rt_attr_string = <another string field name>]
[rt_attr_json = <json field name>]
[rt_attr_json = <another json field name>]
[rt_attr_timestamp = <timestamp field name>]
[rt_attr_timestamp = <another timestamp field name>]
[stored_fields = <comma separated list of full-text fields that should be stored>]
[rt_mem_limit = <RAM chunk max size, default 128M>]
}
type = plain
type = rt
Index type: “plain” or “rt” (real-time).
Value: plain (default), rt
path = path/to/index
Absolute or relative path (without extension) where the index is stored or looked up.
Value: path to the index, mandatory
stored_fields = title, content
By default, when an index is defined in a configuration file, the original content of full-text fields is only indexed, not stored. If this option is set, the values of the listed fields are both indexed and stored.
Value: comma-separated list of full-text fields that should be stored. The default is empty (original field text is not stored) in Plain mode; in RT mode storing is enabled for every field declared as just text.
Note that in a real-time index the fields listed in stored_fields must also be declared as rt_field.
Note also that you don’t need to list attributes in stored_fields, since their original values are stored anyway. stored_fields is only for full-text fields.
See also docstore_block_size and docstore_compression for document storage compression options.
CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)
POST /cli -d "
CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)"
$params = [
'body' => [
'columns' => [
'title'=>['type'=>'text', 'options' => ['indexed', 'stored']],
'content'=>['type'=>'text', 'options' => ['indexed', 'stored']],
'name'=>['type'=>'text', 'options' => ['indexed']],
'price'=>['type'=>'float']
]
],
'index' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);
utilsApi.sql('mode=raw&query=CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)')
res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)');
utilsApi.sql("mode=raw&query=CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)");
index products {
stored_fields = title,content
type = rt
path = idx
rt_field = title
rt_field = content
rt_field = name
rt_attr_float = price
}
stored_only_fields = title,content
A list of fields that are stored in the index but not indexed. Similar to stored_fields, except that a field listed in stored_only_fields is only stored: it can’t be searched with full-text queries and can only be returned with search results.
Value: comma-separated list of fields that should be stored only, not indexed. Default is empty. Note that in a real-time index the fields listed in stored_only_fields must also be declared as rt_field.
Note also that you don’t need to list attributes in stored_only_fields, since their original values are stored anyway.
Compared with a string attribute, a stored field:
* is stored on disk and doesn’t require memory
* is stored compressed
* can only be fetched; you can’t sort, filter or group by its value
A string attribute, in contrast:
* is stored on disk and in memory
* is stored uncompressed
* can be used for sorting, grouping, filtering and anything else you want to do with attributes
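As an illustration of the difference, here is a minimal sketch of a real-time index in which one field is searchable and stored while another is stored only. The index name, path and field names are hypothetical; only the directives themselves come from the descriptions above.

```ini
index products {
    type = rt
    path = products
    # both fields must be declared as rt_field
    rt_field = title
    rt_field = raw_description
    # title is indexed and its original text is kept
    stored_fields = title
    # raw_description is kept for retrieval only and cannot be matched by full-text queries
    stored_only_fields = raw_description
    rt_attr_float = price
}
```

With such a layout, a full-text query can match on title only, while raw_description is only returned with the results.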
rt_field = subject
Full-text fields to be indexed. The names must be unique. The order is preserved, so field values in INSERT statements without an explicit list of inserted columns must be given in the same order as configured.
Value: at least one full-text field must be specified in an index; multiple records are allowed.
rt_attr_uint = gid
Unsigned integer attribute declaration.
Value: field_name or field_name:N, can be multiple records. N is the max number of bits to keep.
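The two forms of the value can be sketched as follows. The index and attribute names are hypothetical and chosen only to show the syntax; with N = 4 the attribute keeps 4 bits, so it can hold values from 0 to 15.

```ini
index products {
    type = rt
    path = products
    rt_field = title
    # plain form: a regular unsigned integer attribute
    rt_attr_uint = gid
    # field_name:N form: keep only 4 bits per value
    rt_attr_uint = flags:4
}
```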
rt_attr_bigint = gid
BIGINT attribute declaration.
Value: field name, multiple records allowed
rt_attr_multi = tags
Multi-valued attribute (MVA) declaration. Declares an UNSIGNED INTEGER (unsigned 32-bit) MVA attribute. Optional; more than one such attribute may be declared.
Value: field name, multiple records allowed.
rt_attr_multi_64 = wide_tags
Multi-valued attribute (MVA) declaration. Declares a BIGINT (signed 64-bit) MVA attribute. Optional; more than one such attribute may be declared.
Value: field name, multiple records allowed.
rt_attr_float = lat
rt_attr_float = lon
Floating point attribute declaration. Declares a single precision, 32-bit IEEE 754 format float attribute. Optional; any number of such attributes may be declared.
Value: field name, multiple records allowed.
rt_attr_bool = available
Boolean attribute declaration. Declares a 1-bit unsigned integer attribute. Optional; more than one such attribute may be declared.
Value: field name, multiple records allowed.
rt_attr_string = title
String attribute declaration. Optional; any number of such attributes may be declared.
Value: field name, multiple records allowed.
rt_attr_json = properties
JSON attribute declaration. Optional; more than one such attribute may be declared.
Value: field name, multiple records allowed.
rt_attr_timestamp = date_added
Timestamp attribute declaration. Optional; any number of such attributes may be declared.
Value: field name, multiple records allowed.
rt_mem_limit = 512M
RAM chunk size limit. Optional, default is 128M.
RT index keeps some data in memory (“RAM chunk”) and also maintains a number of on-disk indexes (“disk chunks”). This directive lets you control the RAM chunk size. Once there’s too much data to keep in RAM, RT index will flush it to disk, activate a newly created disk chunk, and reset the RAM chunk.
The limit is pretty strict; RT index should never allocate more memory than it’s limited to. The memory is not preallocated either, hence, specifying 512 MB limit and only inserting 3 MB of data should result in allocating 3 MB, not 512 MB.
The RAM chunk should be sized depending on the size of the data, rate
of inserts/updates and hardware. A small rt_mem_limit and
frequent insert/updates can lead to creation of many disk chunks,
requiring more frequent optimizations of the index.
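For instance, an index that receives frequent inserts might raise the limit above the 128M default so that fewer disk chunks are created. The snippet below is only a sketch: the index name, path and fields are hypothetical, and 512M is an arbitrary value rather than a recommendation.

```ini
index events {
    type = rt
    path = events
    rt_field = message
    rt_attr_timestamp = date_added
    # flush the RAM chunk to a new disk chunk only after it reaches 512M
    rt_mem_limit = 512M
}
```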
In RT mode the RAM chunk size limit can be changed using ALTER TABLE. To set rt_mem_limit to 1 GB for index ‘t’, run the query ALTER TABLE t rt_mem_limit='1G'.
In plain mode rt_mem_limit can be changed using the following steps:
* change the rt_mem_limit value in the configuration
* run ALTER TABLE <index_name> RECONFIGURE
The larger rt_mem_limit you have, the longer it will take to replay the binlog on start to recover the RAM chunk.
source = srcpart1
source = srcpart2
source = srcpart3
Specifies the document source to get documents from when the current index is indexed. There must be at least one source. The sources can be of different types (e.g. one MySQL, another PostgreSQL). Read more about indexing from external storages here.
Value: name of the source to build the index from, mandatory. Multiple records are allowed.
killlist_target = main:kl
Sets the index(es) that the kill-list will be applied to. Suppresses matches in the targeted index that are updated or deleted in the current index. In :kl mode the documents to suppress are taken from the kill-list. In :id mode all document ids from the current index are suppressed in the targeted one. If neither is specified, both modes take effect. Read more about kill-lists here.
Value: not specified (default), target_index_name:kl, target_index_name:id, target_index_name. Multiple values are allowed.
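A common use is a main plus delta pair of plain indexes, where the delta index suppresses outdated documents in the main one. The following is a sketch under assumptions: the index names, paths and the sources src_main and src_delta are hypothetical, and the sources are assumed to be defined elsewhere in the configuration.

```ini
index main {
    type = plain
    path = main
    source = src_main
}

index delta {
    type = plain
    path = delta
    source = src_delta
    # no :kl or :id suffix, so both modes take effect:
    # documents from the kill-list and all document ids found in delta are suppressed in main
    killlist_target = main
}
```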
columnar_attrs = id, attr1, attr2, attr3
Specifies which attributes should be stored in the columnar storage instead of the default row-wise storage. id is also supported.
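As a sketch, a plain index could move some of its attributes to the columnar storage like this. The index name, path, source name and attribute names are hypothetical; the attributes price and category_id are assumed to be defined by the source.

```ini
index products {
    type = plain
    path = products
    source = src_products
    # these attributes go to the columnar storage, all others stay row-wise
    columnar_attrs = id, price, category_id
}
```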
CREATE TABLE [IF NOT EXISTS] name ( <field name> <field data type> [data type options] [, ...]) [table_options]
Besides using CREATE TABLE over the MySQL protocol with any MySQL client, you can also create a table via HTTP using the /cli endpoint:
http[s]://manticore_host:port/cli
POST: CREATE TABLE [IF NOT EXISTS] name ( <field name> <field data type> [data type options] [, ...]) [table_options]
Read more about data types here.
| Type | Equivalent in a configuration file | Notes | Aliases |
|---|---|---|---|
| text | rt_field | Options: indexed, stored. Default - both. To keep text stored, but not indexed, specify “stored” only. To keep text indexed only, specify only “indexed”. At least one “text” field should be specified in an index | string |
| integer | rt_attr_uint | integer | int, uint |
| bigint | rt_attr_bigint | big integer | |
| float | rt_attr_float | float | |
| multi | rt_attr_multi | multi-integer | |
| multi64 | rt_attr_multi_64 | multi-bigint | |
| bool | rt_attr_bool | boolean | |
| json | rt_attr_json | JSON | |
| string | rt_attr_string | string. Option: indexed - also index the strings in a full-text field with same name. | |
| timestamp | rt_attr_timestamp | timestamp | |
| bit(n) | rt_attr_uint field_name:N | N is the max number of bits to keep | |
CREATE TABLE products (title text, price float) morphology='stem_en'
creates table “products” with two fields, “title” (full-text) and “price” (float), and sets “morphology” to “stem_en”.
CREATE TABLE products (title text indexed, description text stored, author text, price float)
creates table “products” with three full-text fields and a float attribute “price”:
* field “title” - indexed, but not stored
* field “description” - stored, but not indexed
* field “author” - both stored and indexed
create table ... engine='columnar';
create table ... engine='rowwise';
Changes the default attribute storage for all attributes in the index. Can be overridden by specifying engine separately for each attribute.
See columnar_attrs on how to enable columnar storage for a plain index.
Values:
* columnar - enables columnar storage for all index attributes except for json
* rowwise (default) - doesn’t change anything, i.e. makes Manticore use the traditional row-wise storage for the index
The following settings apply to both real-time and plain indexes in either mode, whether specified in a configuration file or online via a CREATE or ALTER command.
Manticore uses two access modes to read index data - seek+read and mmap.
In seek+read mode the server performs system call pread
to read document lists and keyword positions, i.e. *.spd
and *.spp files. Internal read buffers are used to optimize
reading. The size of these buffers can be tuned with options read_buffer_docs
and read_buffer_hits.
There is also the preopen option, which controls how Manticore opens files at start.
In the mmap access mode the search server just maps the index files into memory with the mmap system call and the OS caches the file contents itself. The read_buffer_docs and read_buffer_hits options have no effect for the corresponding files in this mode. The mmap reader can also lock the index data in memory via the privileged mlock call, which prevents the OS from swapping the cached data out to disk.
To control what access mode will be used access_plain_attrs, access_blob_attrs, access_doclists and access_hitlists options are available with the following values:
| Value | Description |
|---|---|
| file | server reads index file from disk with seek+read using internal buffers on file access |
| mmap | server maps index file into memory and OS caches up its contents on file access |
| mmap_preread | server maps index file into memory and a background thread reads it once to warm up the cache |
| mlock | server maps index file into memory and then issues mlock system call to cache up the file contents and lock it into memory to prevent it being swapped out |
| Setting | Values | Description |
|---|---|---|
| access_plain_attrs | mmap, mmap_preread (default), mlock | controls how *.spa (plain attributes), *.spe (skip lists), *.spi (word lists), *.spt (lookups) and *.spm (killed docs) will be read |
| access_blob_attrs | mmap, mmap_preread (default), mlock | controls how *.spb (blob attributes: string, MVA and json attributes) will be read |
| access_doclists | file (default), mmap, mlock | controls how *.spd (doc lists) data will be read |
| access_hitlists | file (default), mmap, mlock | controls how *.spp (hit lists) data will be read |
Here is a table which can help you select your desired mode:
| index part | keep it on disk | keep it in memory | cached in memory on server start | lock it in memory |
|---|---|---|---|---|
| plain attributes in row-wise (non-columnar) storage, skip lists, word lists, lookups, killed docs | mmap | mmap | mmap_preread (default) | mlock |
| row-wise string, multi-value attributes (MVA) and json attributes | mmap | mmap | mmap_preread (default) | mlock |
| columnar numeric, string and multi-value attributes | always | only by means of OS | no | not supported |
| doc lists | file (default) | mmap | no | mlock |
| hit lists | file (default) | mmap | no | mlock |
* mlock for attributes and for doclists/hitlists
* mmap_preread
* mlock, then your OS will decide what should be in memory at any given moment depending on what is read from disk more frequently
* access_doclists/access_hitlists=file
The default mode is to:
* mmap
* preread non-columnar attributes
* seek+read columnar attributes with no preread
* seek+read doclists/hitlists with no preread
which provides decent search performance, optimal memory usage and faster searchd restart in most cases.
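To make the tables above concrete, here is a sketch of an index tuned for search latency on a server that is assumed to have enough RAM to hold all index files. The index name, path and source are hypothetical; each value is one of those listed for the corresponding setting.

```ini
index products {
    type = plain
    path = products
    source = src_products
    # lock row-wise attributes, skip lists, word lists, lookups and killed docs in memory
    access_plain_attrs = mlock
    # lock string, MVA and json attributes in memory
    access_blob_attrs = mlock
    # map doc lists and hit lists into memory instead of the default seek+read
    access_doclists = mmap
    access_hitlists = mmap
}
```

Leaving all four settings out keeps the default behaviour described above.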
attr_update_reserve = 256k
Sets the space to be reserved for blob attribute updates. Optional,
default value is 128k. When blob attributes (multi-value attributes
(MVA), strings, JSON) are updated, their length may change. If the
updated string (or MVA or JSON) is shorter than the old one, it
overwrites the old one in the *.spb file. But if the
updated string is longer, the updates are written to the end of the
*.spb file. This file is memory mapped, that’s why resizing
it may be a rather slow process, depending on the OS implementation of
memory mapped files. To avoid frequent resizes, you can specify the
extra space to be reserved at the end of the .spb file by using this
setting.
Value: size, default 128k.
docstore_block_size = 32k
Size of the block of documents used by document storage. Optional, default is 16k. When stored_fields or stored_only_fields are specified, the original document text is stored inside the index. To use less disk space, documents are compressed. To get more efficient disk access and better compression ratios on small documents, documents are concatenated into blocks. When indexing, documents are collected until their total size reaches the threshold, and then the block of documents is compressed. Increasing the block size gives a better compression ratio; decreasing it gives faster access to document text.
Value: size, default 16k.
docstore_compression = lz4hc
Type of compression used to compress blocks of documents used by document storage. When stored_fields or stored_only_fields are specified, document storage stores compressed document blocks. ‘lz4’ has fast compression and decompression speeds; ‘lz4hc’ (high compression) has the same fast decompression, but trades compression speed for a better compression ratio. ‘none’ disables compression.
Value: lz4 (default), lz4hc, none.
docstore_compression_level = 12
Compression level in document storage when ‘lz4hc’ compression is used. The level can be fine-tuned to get better performance or a better compression ratio. Does not work with ‘lz4’ compression.
Value: 1-12 (default 9).
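The three docstore settings are typically used together with stored_fields. The following sketch favours disk space over indexing speed; the index name, path and field names are hypothetical, and the values are taken from the examples above rather than from any benchmark.

```ini
index articles {
    type = rt
    path = articles
    rt_field = title
    rt_field = content
    # keep the original text of both fields
    stored_fields = title, content
    # larger blocks than the 16k default, for a better compression ratio
    docstore_block_size = 32k
    # slower compression, same fast decompression
    docstore_compression = lz4hc
    # the level applies only because lz4hc is selected
    docstore_compression_level = 12
}
```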
preopen = 1
This option tells searchd to pre-open all index files on startup (or rotation) and keep them open while it runs. Currently, the default mode is not to pre-open the files. Pre-opened indexes take a few (currently 2) file descriptors per index. However, they save on per-query open() calls, and they are also invulnerable to subtle race conditions that may happen during index rotation under high load. On the other hand, when serving many indexes (100s to 1000s), it still might be desirable to open them on a per-query basis in order to save file descriptors.
Value: 0 (default), 1.
read_buffer_docs = 1M
Per-keyword read buffer size for document lists. The higher the value, the higher the per-query RAM use, but possibly the lower the IO time.
Value: size, default 256k, min 8k.
read_buffer_hits = 1M
Per-keyword read buffer size for hit lists. The higher the value, the higher the per-query RAM use, but possibly the lower the IO time.
Value: size, default 256k, min 8k.
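Since the read buffers only matter in seek+read mode, a sketch that combines them with the file access mode and preopen might look like this. The index name, path and source are hypothetical, and 1M is simply the value used in the examples above.

```ini
index products {
    type = plain
    path = products
    source = src_products
    # keep doc lists and hit lists on disk and read them with seek+read
    access_doclists = file
    access_hitlists = file
    # larger per-keyword buffers: more RAM per query, possibly less IO time
    read_buffer_docs = 1M
    read_buffer_hits = 1M
    # open the index files once on startup instead of per query
    preopen = 1
}
```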
inplace_enable = {0|1}
Whether to enable in-place index inversion. Optional, default is 0 (use separate temporary files).
inplace_enable greatly reduces indexing disk footprint
for a plain index, at a cost of slightly slower indexing (it uses around
2x less disk, but yields around 90-95% the original performance).
Indexing involves two major phases. The first phase collects,
processes, and partially sorts documents by keyword, and writes the
intermediate result to temporary files (.tmp*). The second phase fully
sorts the documents, and creates the final index files. Thus, rebuilding
a production index on the fly involves around 3x peak disk footprint:
1st copy for the intermediate temporary files, 2nd copy for newly
constructed copy, and 3rd copy for the old index that will be serving
production queries in the meantime. (Intermediate data is comparable in
size to the final index.) That might be too much disk footprint for big
data collections, and inplace_enable makes it possible to reduce it.
When enabled, it reuses the temporary files, outputs the final data back
to them, and renames them on completion. However, this might require
additional temporary data chunk relocation, which is where the
performance impact comes from.
This directive does not affect searchd in any way, it only affects indexer.
index products {
inplace_enable = 1
path = products
source = src_base
}
inplace_hit_gap = size
In-place inversion fine-tuning option. Controls preallocated hitlist gap size. Optional, default is 0.
This directive does not affect searchd in any way, it only affects indexer.
index products {
inplace_hit_gap = 1M
inplace_enable = 1
path = products
source = src_base
}
inplace_reloc_factor = 0.1
Controls relocation buffer size within indexing memory arena. Optional, default is 0.1.
This directive does not affect searchd in any way, it only affects indexer.
index products {
inplace_reloc_factor = 0.1
inplace_enable = 1
path = products
source = src_base
}
inplace_write_factor = 0.1
Controls in-place write buffer size within indexing memory arena. Optional, default is 0.1.
This directive does not affect searchd in any way, it only affects indexer.
index products {
inplace_write_factor = 0.1
inplace_enable = 1
path = products
source = src_base
}
The following settings are supported. They are all described in section NLP and tokenization.
* bigram_freq_words
* bigram_index
* blend_chars
* blend_mode
* charset_table
* dict
* embedded_limit
* exceptions
* expand_keywords
* global_idf
* hitless_words
* html_index_attrs
* html_remove_elements
* html_strip
* ignore_chars
* index_exact_words
* index_field_lengths
* index_sp
* index_token_filter
* index_zones
* infix_fields
* killlist_target
* max_substring_len
* min_infix_len
* min_prefix_len
* min_stemming_len
* min_word_len
* morphology
* morphology_skip_fields
* ngram_chars
* ngram_len
* overshort_step
* phrase_boundary
* phrase_boundary_step
* prefix_fields
* regexp_filter
* stopwords
* stopword_step
* stopwords_unstemmed
* stored_fields
* stored_only_fields
* wordforms