Shards and replicas
An index consists of multiple shards, each containing part of the document set. Replicas are mainly intended to improve fault tolerance and increase throughput.
The number of replicas can be adjusted at any time, but the number of shards cannot be changed once it has been set.
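As a minimal sketch of this difference, the Python snippet below builds the two request bodies involved: one used when the index is created, and one that later changes only the replica count. The concrete numbers (5 and 1) and the helper name replica_update_body are illustrative assumptions, not values taken from these notes.

```python
import json

# Hypothetical settings sent when the index is created.
# The shard count chosen here stays fixed for the life of the index.
create_index_body = {
    "settings": {
        "index": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
        }
    }
}


def replica_update_body(replicas: int) -> dict:
    """Build a settings-update body that changes only the replica count."""
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    print(json.dumps(create_index_body, indent=2))
    print(json.dumps(replica_update_body(2)))
```

The update body deliberately carries no number_of_shards key, since that value is not meant to change after creation.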
Type detection mechanism
Numeric and date types can be detected automatically; this detection is usually enabled by default.
dynamic_date_formats: sets the list of date formats that can be recognized.
You can also disable type guessing. When type guessing is disabled, you can't arbitrarily add fields to an existing index. For example:
"mappings":{
"map":{
"dynamic":"false",
"properties":{
"a":"b"
.
.
.
}
}
}
That said, there appear to be ways around this restriction.
Properties specific to numeric types:
- precision_step: sets the number of terms generated for each value of the field. The lower the value, the more terms are generated and the faster range queries run (but the index is also larger). The default value is 4.
- ignore_malformed: can be true or false; the default is false. Set it to true if you want malformed values to be ignored.
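The trade-off behind precision_step can be illustrated with a little arithmetic. The sketch below assumes the relationship commonly described for Lucene's trie-encoded numeric fields, where roughly ceil(bits / precision_step) terms are indexed per value. The function name terms_per_value is hypothetical, and non-positive step values are not modeled.

```python
import math


def terms_per_value(bits_per_value: int, precision_step: int) -> int:
    """Approximate number of terms indexed for one numeric value.

    Assumption: terms = ceil(bits_per_value / precision_step).
    """
    if precision_step <= 0:
        raise ValueError("non-positive precision_step is not modeled in this sketch")
    return math.ceil(bits_per_value / precision_step)


if __name__ == "__main__":
    # A 64-bit long with different precision steps.
    for step in (4, 8, 16):
        print(step, terms_per_value(64, step))
```

Under this assumption a 64-bit value with the default step of 4 produces 16 terms, while a step of 16 produces only 4, which is why smaller steps mean a larger index.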
Binary type
A binary field holds binary data, such as an image, stored in the index as a Base64-encoded string. By default, fields of this type are stored but not indexed. The binary type supports only the index_name attribute.
Attributes specific to the date type:
- format: specifies the date format. The default value is dateOptionalTime.
- precision_step: sets the number of terms generated for each value of the field. The lower the value, the more terms are generated and the faster range queries run (but the index is also larger). The default value is 4.
- ignore_malformed: can be true or false; the default is false. Set it to true if you want malformed values to be ignored.
Dates are stored in UTC by default.
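To make the effect of ignore_malformed on a date field concrete, here is a toy model in Python. It is not Elasticsearch code: the function parse_date_field, the published field, and the strptime pattern (Python syntax, standing in for a format such as yyyy-MM-dd) are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_date_field(doc, field, pattern="%Y-%m-%d", ignore_malformed=False):
    """Toy model of indexing one date field of a document.

    A well-formed value is normalized to a UTC timestamp string.
    A malformed value either rejects the whole document
    (ignore_malformed=False) or is dropped while the other fields
    are kept (ignore_malformed=True).
    """
    result = dict(doc)
    try:
        parsed = datetime.strptime(doc[field], pattern)
    except (TypeError, ValueError):
        if not ignore_malformed:
            raise ValueError(f"malformed date in field {field!r}")
        result.pop(field)
        return result
    result[field] = parsed.replace(tzinfo=timezone.utc).isoformat()
    return result


if __name__ == "__main__":
    print(parse_date_field({"published": "2013-05-01"}, "published"))
    print(parse_date_field({"published": "oops", "id": 1}, "published",
                           ignore_malformed=True))
```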
Custom analyzer
"settings": {
"index": {
"analysis": {
"analyzer": {
"en": {
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"ourEnglishFilter"
]
}
},
"filter": {
"ourEnglishFilter": {
"type": "kstem"
}
}
}
}
}
This defines an analyzer named en, consisting of one tokenizer and several filters.
A mapping file that uses the custom analyzer looks like this:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"en":{
"tokenizer":"standard",
"filter":[
"asciifolding",
"lowercase",
"ourEnglishFilter"
]
}
},
"filter":{
"ourEnglishFilter":{
"type":"kstem"
}
}
}
}
},
"mappings":{
"post":{
"properties":{
"id":{
"type":"long",
"store":"yes",
"precision_step":"0"
},
"name":{
"type":"string",
"store":"yes",
"index":"analyzed",
"analyzer":"en"
}
}
}
}
}
By adding _analyze to the request, you can see how the analyzer processes text.
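A request of that kind might be built as in the sketch below. The host, the index name posts, and the sample text are hypothetical. The query-string form with an analyzer parameter follows the older Elasticsearch releases that the mapping syntax above suggests, while newer releases expect a JSON body instead, so check the documentation for your version. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(base_url: str, index: str, analyzer: str, text: str) -> Request:
    """Build (but do not send) an _analyze request for the given analyzer."""
    query = urlencode({"analyzer": analyzer})
    url = f"{base_url}/{index}/_analyze?{query}"
    # The text to analyze travels in the request body.
    return Request(url, data=text.encode("utf-8"), method="GET")


if __name__ == "__main__":
    req = build_analyze_request("http://localhost:9200", "posts", "en",
                                "Crème brûlée RECIPES")
    print(req.get_method(), req.full_url)
```

The response to such a request lists the tokens the analyzer produced, which makes it easy to check the effect of each filter.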
Similarity models
Elasticsearch allows a different similarity model to be used for each field. The available similarity models are as follows:
- Okapi BM25 model: a probability-based model, well suited to short text documents. To use it, set the field's similarity property to BM25; no additional attributes are required.
- Divergence from randomness (DFR) model: based on the probabilistic model of the same name, it is suited to natural-language text. To use it, set the field's similarity property to DFR.
- Information-based (IB) model: very similar to DFR, it is also suited to natural-language text.
You need to set some additional attributes when using the DFR and IB models. Here is an example of an IB model:
"similarity":{
"esserverbook_ib_similarity":{
"type":"IB",
"distribution":"ll",
"lambda":"df",
"normalization":"z",
"normalization.z.z":"0.25"
}
}
See the official documentation for the allowed values of each property.
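A named similarity only takes effect once a field refers to it. The sketch below, written as a Python dictionary so it can be checked, places the IB definition under the index settings and points two fields at similarity models. The type name post and the field names name and title are assumptions, and the small helper undefined_similarities is hypothetical, as is its list of built-in names.

```python
import json

index_body = {
    "settings": {
        "index": {
            "similarity": {
                "esserverbook_ib_similarity": {
                    "type": "IB",
                    "distribution": "ll",
                    "lambda": "df",
                    "normalization": "z",
                    "normalization.z.z": "0.25",
                }
            }
        }
    },
    "mappings": {
        "post": {
            "properties": {
                # Field using the custom IB similarity defined above.
                "name": {"type": "string",
                         "similarity": "esserverbook_ib_similarity"},
                # Field using BM25, which needs no extra attributes.
                "title": {"type": "string", "similarity": "BM25"},
            }
        }
    },
}

# Assumed names that need no definition in the settings.
BUILT_IN = {"default", "BM25"}


def undefined_similarities(body: dict) -> list:
    """Return the fields whose similarity is neither built in nor defined."""
    defined = set(body["settings"]["index"].get("similarity", {})) | BUILT_IN
    missing = []
    for type_mapping in body["mappings"].values():
        for field, spec in type_mapping["properties"].items():
            if spec.get("similarity", "default") not in defined:
                missing.append(field)
    return sorted(missing)


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(undefined_similarities(index_body))
```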
Setting the path property specifies that the identifier is taken from the book_id field (which is slightly slower because of the extra parsing); the value of the _id field is then the value of the book_id field.
Introduction to routing
By default, Elasticsearch distributes documents evenly across all shards of an index. To retrieve documents, Elasticsearch must therefore query every shard and merge the results. Data can instead be divided according to some criterion, that is, a routing strategy can control both indexing and searching, which speeds up requests.
Routing is set up using either the routing parameter or the routing field; the routing field is more common and flexible. See the introduction to the _routing field.
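The idea behind routing can be sketched as a hash of the routing value taken modulo the number of primary shards, so documents with the same routing value always land on the same shard. The snippet below is a toy model: zlib.crc32 stands in for whatever hash function Elasticsearch uses internally, and the function name shard_for and the sample routing values are hypothetical.

```python
import zlib


def shard_for(routing: str, number_of_shards: int) -> int:
    """Pick a shard for a routing value: hash(routing) % number_of_shards.

    zlib.crc32 is a stand-in hash used only for illustration.
    """
    if number_of_shards <= 0:
        raise ValueError("number_of_shards must be positive")
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-1"):
        print(user, "->", shard_for(user, 5))
```

With a scheme like this, a search that supplies the same routing value only needs to visit one shard instead of all of them. It also shows why the shard count is fixed: changing the modulus would send existing routing values to different shards.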