Elastic Stack from Beginner to Practice: Elasticsearch

Getting Started with Elasticsearch

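# Raise the mmap count limit that Elasticsearch requires (effective immediately)
sysctl -w vm.max_map_count=262144
# To persist it, add vm.max_map_count=262144 to /etc/sysctl.conf, then reload
sysctl -p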

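Common Terms

Document: a unit of user data stored in es (similar to a row in a database)
Index: a collection of documents that share the same fields (similar to a table in a database)
Node: a running instance of Elasticsearch; the building block of a cluster
Cluster: one or more nodes that together provide the service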

Document

A JSON object made up of fields (Field). Common data types:

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Boolean: boolean
Date: date
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

Each document is identified by a unique id

Document MetaData

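Metadata

_index: name of the index the document belongs to
_type: name of the type the document belongs to
_id: unique id of the document
_uid: composite id made of _type and _id (from 6.x on, _type no longer has any effect, so _uid is the same as _id)
_source: the original JSON of the document; the content of every field can be read from here
_all: combines the content of all fields into this one field; disabled by default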

Index

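An index stores documents (Document) that share the same structure
Each index has its own mapping, which defines field names and types
A cluster can hold many indices. For example, Nginx logs can be stored in one index per day:
- nginx-log-01
- nginx-log-02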

Rest API

An Elasticsearch cluster exposes a RESTful API

REST: REpresentational State Transfer
The URI identifies the resource, such as an Index or a Document
The HTTP method specifies the operation on the resource, such as GET, POST, PUT or DELETE

Two common ways to interact

curl command line

curl -XPUT 'http://localhost:9200/employee/doc/1' -i -H "Content-Type:application/json" -d '
{
"username": "xxxxx",
"job": "xxxxx"
}
'
Kibana DevTools
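
The curl request shown earlier can also be assembled from a script. The following is a minimal sketch using only the Python standard library; it builds the request without sending it. The helper name build_index_request is hypothetical, and the path uses _doc (the typeless endpoint of newer versions) rather than the doc type from the curl example.

```python
import json
import urllib.request


def build_index_request(host, index, doc_id, doc):
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"{host}/{index}/_doc/{doc_id}"          # URI identifies the resource
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",                              # HTTP method identifies the operation
        headers={"Content-Type": "application/json"},
    )


req = build_index_request(
    "http://localhost:9200", "employee", "1",
    {"username": "xxxxx", "job": "xxxxx"},
)
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes the URI and method easy to inspect before anything touches the cluster.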

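Index API

es has a dedicated Index API for creating, updating and deleting index configuration

Create an index: PUT /test_index (test_index is the index name)

List existing indices: GET _cat/indices

Delete an index: DELETE /test_index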

Document API

es has a dedicated Document API

Creating documents

Create a document with a specified id (types have been removed in newer versions; use /test_index/_doc/1 or /test_index/_create/1 instead)

PUT /test_index/doc/1
{
"username":"xxx",
"age": xxx
}
Create a document without specifying an id (in newer versions the endpoint is /test_index/_doc):

POST /test_index/doc
{
"username":"xxx",
"age": xxx
}

Bulk operations (_bulk). Unlike create, index does not fail when the document already exists; it overwrites it. Newer versions no longer need _type.

POST _bulk
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"username": "xxx", "age": xxx}
{"delete":{"_index":"test_index","_type":"doc","_id":"1"}}
{"update":{"_id":"2","_index":"test_index","_type":"doc"}}
{"doc":{"age":"20"}}
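
The _bulk body is newline-delimited JSON: one metadata line per action, followed by a source line for every action except delete. As a rough sketch (the helper build_bulk_body and the sample values are illustrative assumptions, and _type is left out as in newer versions), the body can be generated like this:

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into an NDJSON _bulk body."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))   # action/metadata line
        if source is not None:                     # "delete" has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"                 # the body must end with a newline


body = build_bulk_body([
    ("index",  {"_index": "test_index", "_id": "3"}, {"username": "alice", "age": 30}),
    ("delete", {"_index": "test_index", "_id": "1"}, None),
    ("update", {"_index": "test_index", "_id": "2"}, {"doc": {"age": 20}}),
])
print(body)
```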

Retrieving documents

Retrieve a document by its id (newer versions use _doc here as well)

GET /test_index/doc/1
Search across documents with _search:

GET /test_index/doc/_search
{
  "query":{
    "term":{
      "_id": "1"
    }
  }
}

Fetch several documents in one request (_mget)

GET /_mget
{
  "docs":[
    {
      "_index":"test_index",
      "_type":"doc",
      "_id":"1"
    },
    {
      "_index":"test_index",
      "_type":"doc",
      "_id":"2"
    }
  ]
}

Updating documents

POST /test_index/_doc/2
{
"username" : "xxx",
"age" : xxx
}
Deleting documents

DELETE /test_index/_doc/1

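Elasticsearch Inverted Index and Analysis

Forward index: maps a document id to the document's content and words (left side of the figure below)

Inverted index: maps a word to the ids of the documents that contain it (right side of the figure below)

[Figure: forward index (left) and inverted index (right)]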

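Components of an Inverted Index

The inverted index is the core of a search engine and consists of two main parts:

Term Dictionary
- Records every term found in all documents; usually quite large
- Records the association from each term to its posting list
- https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
Posting List

A posting mainly contains the following information:

Document id, used to fetch the original content
Term Frequency (TF), the number of times the term occurs in the document, used later for relevance scoring
Position, the token position(s) of the term within the document, used for phrase search (Phrase Query)
Offset, the start and end character offsets of the term in the document, used for highlighting

es stores JSON documents containing multiple fields, and each field has its own inverted index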
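
To make the posting structure concrete, here is a small in-memory sketch. It is not how Lucene stores data; the function build_inverted_index, the regex tokenization and the sample documents are all assumptions made for illustration. Each term maps to per-document postings holding the term frequency, positions and offsets described above.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: {"tf", "positions", "offsets"}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            posting = index[term].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []})
            posting["tf"] += 1                                   # term frequency
            posting["positions"].append(position)                # token position
            posting["offsets"].append((match.start(), match.end()))  # char offsets
    return dict(index)


docs = {1: "Elasticsearch is fast", 2: "Search is fun and search is fast"}
index = build_inverted_index(docs)
print(sorted(index["fast"]))   # ids of documents containing "fast"
print(index["search"][2])      # posting of "search" in document 2
```

Looking up a term is a single dictionary access, which is the reason the structure suits search: the cost depends on the term, not on the number of documents scanned.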

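Analysis

Analysis is the process of converting text into a series of words (terms or tokens); it is also called text analysis, and es calls it Analysis

The analyzer (Analyzer) is the es component dedicated to analysis. It is composed of the following parts, applied in this order:

Character Filters: process the raw text, for example stripping HTML markup
Tokenizer: splits the raw text into words according to some rule
Token Filters: post-process the words produced by the tokenizer, for example lowercasing, removing (e.g. stop words) or adding (e.g. synonyms) terms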
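
The three stages can be pictured as a simple function pipeline. The sketch below is a simplified stand-in written for illustration, not the es implementation: the function names, the regex-based tag stripping and the tiny stop word list are all assumptions.

```python
import html
import re


def html_strip(text):
    """Character filter: drop tags, then decode HTML entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def word_tokenizer(text):
    """Tokenizer: split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text) if t]


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def stop(tokens, stopwords=frozenset({"a", "an", "the", "is"})):
    """Token filter: remove stop words (illustrative list)."""
    return [t for t in tokens if t not in stopwords]


def run_analyzer(text, char_filters, tokenizer, token_filters):
    """Apply character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = run_analyzer("<p>The QUICK brown fox</p>",
                      [html_strip], word_tokenizer, [lowercase, stop])
print(tokens)  # ['quick', 'brown', 'fox']
```

Order matters: placing stop before lowercase here would leave "The" in the output, because the stop list is lowercase.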

Analyze API

es provides an API endpoint, _analyze, for testing analysis and verifying its results

You can test a specific analyzer directly

POST _analyze
{
"analyzer": "standard",
"text": "hello world!"
}
You can test against a field of an existing index

POST test_index/_analyze
{
"field": "username",
"text": "hello world!"
}
You can test a custom combination of tokenizer and filters

POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Hello World!"
}

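Predefined Analyzers

Analyzers shipped with es

Standard
- The default analyzer
- Splits on word boundaries; supports multiple languages
- Lowercases tokens
Simple
- Splits on any non-letter character
- Lowercases tokens
Whitespace
- Splits on whitespace
Stop
- Stop words are function words that carry little meaning, such as the, an, 的, 这
- Same as the Simple Analyzer, plus stop word removal
Keyword
- No tokenization; emits the whole input as a single term
Pattern
- Uses a regular expression as a custom separator
- The default is \W+, i.e. non-word characters act as separators
Language
- Provides analyzers for 30+ common languages
- arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english…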

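Chinese Word Segmentation

Challenges

Chinese word segmentation means splitting a sequence of Chinese characters into individual words. English words are naturally delimited by spaces, whereas Chinese has no formal delimiter between words.
Different contexts lead to very different segmentations. One example is overlapping ambiguity, where both of the following segmentations are valid:
- 乒乓球拍/卖/完了 (table tennis paddles / sold / out)
- 乒乓球/拍卖/完了 (table tennis balls / auctioned / off)
https://mp.weixin.qq.com/s/uCpuJPQ6UDfPG44hiUlb3g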
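
The ambiguity above can be reproduced with two classic dictionary-based heuristics, forward and backward maximum matching. This is only an illustrative sketch with a hand-picked five-word dictionary; it is not a description of how any particular es plugin segments text.

```python
def forward_max_match(text, vocab, max_len=4):
    """Scan left to right, always taking the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in vocab:   # fall back to a single character
                tokens.append(word)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=4):
    """Scan right to left, always taking the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            word = text[j - size:j]
            if size == 1 or word in vocab:
                tokens.append(word)
                j -= size
                break
    return tokens[::-1]


vocab = {"乒乓球", "乒乓球拍", "拍卖", "卖", "完了"}
sentence = "乒乓球拍卖完了"
print("/".join(forward_max_match(sentence, vocab)))   # 乒乓球拍/卖/完了
print("/".join(backward_max_match(sentence, vocab)))  # 乒乓球/拍卖/完了
```

The two scan directions yield the two readings listed above, which shows why a dictionary alone cannot settle the segmentation without context.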

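Common Segmentation Systems

IK
- Segments Chinese and English words; supports modes such as ik_smart and ik_max_word
- Supports custom dictionaries and hot-reloading of the dictionary
- https://github.com/medcl/elasticsearch-analysis-ik
jieba
- The most popular segmentation system in Python; supports segmentation and part-of-speech tagging
- Supports Traditional Chinese segmentation, custom dictionaries, parallel segmentation, and more
- https://github.com/sing1ee/elasticsearch-jieba-plugin

Segmentation Systems Based on Natural Language Processing

HanLP
- A Java toolkit made up of a series of models and algorithms, aiming to bring natural language processing into production use
- https://github.com/hankcs/HanLP
THULAC
- THU Lexical Analyzer for Chinese, a Chinese lexical analysis toolkit developed by the Natural Language Processing and Computational Social Science Lab at Tsinghua University; it provides Chinese word segmentation and part-of-speech tagging
- https://github.com/microbun/elasticsearch-thulac-plugin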

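Custom Analysis

When the built-in analyzers do not meet your needs, you can define your own

This is done by customizing Character Filters, the Tokenizer and Token Filters

Character Filters

Process the raw text before the Tokenizer, for example by adding, removing or replacing characters
Built-in character filters:
- HTML Strip: removes HTML tags and decodes HTML entities
- Mapping: replaces characters
- Pattern Replace: replaces text matched by a regular expression
They affect the position and offset information that the tokenizer computes afterwards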

POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}

Tokenizer

Splits the raw text into words (terms or tokens) according to some rule

Built-in tokenizers:

standard: splits on word boundaries
letter: splits on any non-letter character
whitespace: splits on whitespace
UAX URL Email: splits like standard, but keeps email addresses and URLs intact
NGram and Edge NGram: split into n-grams
Path Hierarchy: splits along file path levels

POST _analyze
{
"tokenizer": "path_hierarchy",
"text": "one/two/three"
}
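
As a rough picture of what the path_hierarchy tokenizer emits for this input, the following sketch produces one token per path prefix. The function is a hypothetical approximation that only models the delimiter option, not the tokenizer's other settings.

```python
def path_hierarchy(text, delimiter="/"):
    """Emit every ancestor path of the input, shortest first."""
    parts = text.split(delimiter)
    prefixes = (delimiter.join(parts[:i]) for i in range(1, len(parts) + 1))
    return [p for p in prefixes if p]        # skip the empty prefix of "/a/b"


print(path_hierarchy("one/two/three"))  # ['one', 'one/two', 'one/two/three']
```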

Token Filters

Add, remove or modify the words (terms) emitted by the tokenizer

Built-in token filters:

lowercase: converts all terms to lowercase
stop: removes stop words
NGram and Edge NGram: split into n-grams
Synonym: adds synonym terms

POST _analyze
{
  "text": "a Hello,world!",
  "tokenizer": "standard",
  "filter": [
  "stop",
  "lowercase",
  {
    "type": "ngram",
    "min_gram": 4,
    "max_gram": 4
  }
  ]
}
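
The filter chain in the request above (stop, then lowercase, then a 4-character ngram) can be traced by hand with a small simulation. This is a sketch under stated assumptions: the regex stands in for the standard tokenizer, the stop word list is a small subset, and analyze_with_ngrams is a hypothetical name.

```python
import re

STOPWORDS = frozenset({"a", "an", "and", "the", "is", "of", "to"})  # illustrative subset


def ngrams(token, min_gram, max_gram):
    """Return every substring of token with a length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


def analyze_with_ngrams(text):
    tokens = re.findall(r"\w+", text)                       # rough standard tokenizer
    tokens = [t for t in tokens if t not in STOPWORDS]      # stop
    tokens = [t.lower() for t in tokens]                    # lowercase
    return [g for t in tokens for g in ngrams(t, 4, 4)]     # ngram, min = max = 4


print(analyze_with_ngrams("a Hello,world!"))  # ['hell', 'ello', 'worl', 'orld']
```

Tokens shorter than min_gram yield no grams at all, so a short word would disappear from the output even without the stop filter.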

Custom Analysis API

Custom analysis must be defined in the index settings

PUT test_index
{
  "settings": {
    "analysis": {
      "char_filter": {},
      "tokenizer": {},
      "filter": {},
      "analyzer": {}
    }
  }
}

Verifying a Custom Analyzer

Example 1

PUT test_index_1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"  
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST test_index_1/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>a box</b>?"
}

Example 2

PUT test_index2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer":{
          "type": "custom",
          "char_filter":[
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation":{
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons":{
          "type": "mapping",
          "mappings":[
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop":{
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST test_index2/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person, and you?"
}
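
To see what each stage of my_custom_analyzer contributes, the pipeline can be imitated in a few lines. This is a simplified sketch rather than the es implementation; in particular, ENGLISH_STOP is only a small subset assumed for the example.

```python
import re

EMOTICONS = {":)": "_happy_", ":(": "_sad_"}          # the "emoticons" mapping char filter
ENGLISH_STOP = frozenset({"a", "an", "and", "the", "is", "of", "to"})  # subset, assumed


def my_custom_analyzer(text):
    for src, dst in EMOTICONS.items():                # char filter: mapping
        text = text.replace(src, dst)
    tokens = [t for t in re.split(r"[ .,!?]", text) if t]   # tokenizer: pattern "[ .,!?]"
    tokens = [t.lower() for t in tokens]              # token filter: lowercase
    return [t for t in tokens if t not in ENGLISH_STOP]     # token filter: english_stop


print(my_custom_analyzer("I'm a :) person, and you?"))
# ["i'm", '_happy_', 'person', 'you']
```

Because the character filter runs first, the emoticon is already replaced by _happy_ when the tokenizer splits the text, so it survives as a regular token.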

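Using Analysis

Analysis is applied at two points in time:

When a document is created or updated (Index Time), the document is analyzed
At query time (Search Time), the query string is analyzed

Index-time analysis is configured through the analyzer property of each field in the index mapping. When no analyzer is specified, the default standard analyzer is used. For example: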

PUT test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "whitespace" 
        }
      }
    }
  }
}

There are several ways to specify the search-time analyzer:

Specify it in the query through the analyzer parameter
Set search_analyzer in the index mapping

POST test_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "hello",
        "analyzer": "standard"
      }
    }
  }
}

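Usually there is no need to set a separate search-time analyzer; just use the index-time analyzer. Otherwise queries may fail to match what was indexed

Recommendations for Using Analysis

Decide whether each field needs analysis. For fields that do not, set type to keyword, which saves space and improves write performance
Make good use of the _analyze API to inspect how a document is actually analyzed
Test hands-on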

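Mapping Settings

A mapping is similar to a table schema in a database. Its main purposes are:

Defining the field names (Field Name) in an index
Defining field types, such as numeric, string or boolean
Defining inverted-index options, such as whether a field is indexed and whether positions are recorded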

GET /test_index/_mapping

Custom Mapping

PUT my_index
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "title":{
        "type": "text"
      },
      "name":{
        "type": "keyword"
      },
      "age":{
        "type": "integer"
      }
    }
  }
}

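Once a field type has been set in a mapping it cannot be changed directly, because the inverted index built by Lucene cannot be modified after it is generated. To change it, create a new index and then run a reindex operation.

New fields can be added; the dynamic parameter controls how:

true (default): new fields are added to the mapping automatically
false: new fields are not added automatically; the document is still written, but the unmapped fields cannot be queried
strict: the document is rejected with an error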
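
The three dynamic settings can be contrasted with a toy model. The sketch below only imitates the behavior described above; index_document and its return shape are hypothetical and have nothing to do with the real client API.

```python
def index_document(mapped_fields, doc, dynamic=True):
    """Simulate how the dynamic setting treats fields missing from the mapping.

    Returns (fields in the mapping afterwards, fields of this doc that are searchable).
    """
    unknown = [name for name in doc if name not in mapped_fields]
    if dynamic == "strict" and unknown:
        raise ValueError(f"strict mapping rejects unmapped fields: {unknown}")
    fields = set(mapped_fields)
    if dynamic is True:
        fields |= set(unknown)                 # the mapping grows automatically
    searchable = {name for name in doc if name in fields}
    return fields, searchable


mapped = {"title", "name", "age"}
doc = {"title": "hello", "desc": "not in the mapping"}
print(index_document(mapped, doc, dynamic=True))
print(index_document(mapped, doc, dynamic=False))
```

With dynamic set to False the document is still accepted, but desc stays out of both the mapping and the searchable set.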

copy_to

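Copies the value of this field into a target field, achieving an effect similar to _all
The target field does not appear in _source; it is used only for searching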

DELETE my_index

PUT my_index
{
  "mappings": {
    "properties": {
      "first_name":{
        "type": "text",
        "copy_to": "full_name"
      },
      "last_name":{
        "type": "text",
        "copy_to": "full_name"
      },
      "full_name":{
        "type": "text"
      }
    }
  }
}

index

Controls whether the field is indexed. The default is true (indexed); false means the field is not indexed and therefore not searchable

PUT my_index
{
  "mappings": {
    "properties": {
      "cookie":{
        "type": "text",
        "index": false
      }
    }
  }
}

index_options

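index_options controls what is recorded in the inverted index. There are 4 options:
- docs: records only doc ids
- freqs: records doc ids and term frequencies
- positions: records doc ids, term frequencies and term positions
- offsets: records doc ids, term frequencies, term positions and character offsets
The default is positions for text fields and docs for all other types
The more that is recorded, the more space is used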

PUT my_index
{
  "mappings": {
    "properties":{
      "cookie":{
        "type":"text",
        "index_options": "offsets"
      }
    }
  }
}

null_value

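Strategy for handling null values in a field. The default is null, meaning the value is empty and es ignores it. Setting this parameter gives the field a default value to use in place of null

PUT my_index
{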
  "mappings": {
    "properties":{
      "status_code":{
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

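See the official documentation for more mapping parameters

Data Types

Core data types

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

Complex data types

Array: array
Object: object
Nested: nested object

Geo data types

geo_point
geo_shape

Specialized types

IP addresses: ip
Auto-completion: completion
Token counts: token_count
String hash values: murmur3
percolator
join

Multi-fields

Multi-fields let the same field be configured in different ways, for example with different analyzers. A common case is pinyin search on person names: simply add a pinyin sub-field to the name field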

PUT my_index
{
  "mappings": {
    "properties":{
      "username":{
        "type": "text",
        "fields": {
          "pinyin":{
            "type": "text",
            "analyzer": "pinyin"
          }
        }
      }
    }
  }
}

See the official documentation for more on data types

Dynamic Mapping

es can detect document field types automatically, which lowers the cost of use

PUT /test_index/_doc/1
{
  "username":"alfred",
  "age":1
}

GET /test_index/_mapping

es detects field types from the JSON types of the values in a document. The supported types are illustrated below:

PUT /test_index/_doc/1
{
  "username":"alfred",
  "age":14,
  "birth": "1988-10-10",
  "married": false,
  "year": "18",
  "tags": ["boy", "fashion"],
  "money": 100.1
}
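
The detection rules can be approximated with a short function. This is a sketch of the default behavior as commonly documented, not the es source: the date regex is a loose stand-in for strict_date_optional_time, and the text result omits the keyword sub-field that es also adds to detected strings.

```python
import re

# Loose stand-in for strict_date_optional_time (assumption for illustration).
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}.*)?$")


def infer_type(value):
    """Guess the field type es would assign to a JSON value under default settings."""
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_RE.match(value) else "text"
    if isinstance(value, list):          # arrays take the type of their first element
        return infer_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    return None                          # null: no field is added


doc = {
    "username": "alfred", "age": 14, "birth": "1988-10-10", "married": False,
    "year": "18", "tags": ["boy", "fashion"], "money": 100.1,
}
mapping = {name: infer_type(value) for name, value in doc.items()}
print(mapping)
```

Note that year stays text even though it looks numeric, which matches the default of not detecting numbers inside strings.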

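Automatic date detection can be configured with custom date formats to fit different needs

The default is ["strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]
strict_date_optional_time is the ISO datetime format; the full form looks like this:
- YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00)
dynamic_date_formats sets custom date formats
date_detection can turn off automatic date detection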

PUT my_index
{
  "mappings": {
    "dynamic_date_formats": ["MM/dd/yyyy"]
  }
}

PUT my_index/_doc/1
{
  "create_date": "09/22/2015"
}

PUT my_index
{
  "mappings": {
    "date_detection": false
  }
}

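A string that contains a number is not detected as a numeric type by default, since it is perfectly reasonable for a string to contain digits

numeric_detection turns on automatic detection of numbers in strings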

PUT my_index
{
  "mappings": {
    "numeric_detection": true
  }
}

PUT my_index/_doc/1
{
  "my_float": "1.0",
  "my_integer": "1"
}

GET my_index/_mapping

Dynamic Templates

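Dynamic templates set field types dynamically based on the data type es detects, the field name, and so on. They can achieve the following:

Map all string fields to keyword, i.e. not analyzed by default

Map all fields whose name starts with message to text, i.e. analyzed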

DELETE my_index


PUT my_index
{
  "mappings": {
    "dynamic_templates":[
    {
      "message_as_text":{
        "match_mapping_type": "string",
        "match": "message*",
        "mapping": {
          "type": "text"
        }
      }
    },
    {
      "string_as_keywords": {
        "match_mapping_type": "string",
        "mapping": {
          "type": "keyword"
        }
      }
    }
    ]
  }
}


PUT my_index/_doc/1
{
  "name": "alfred",
  "message": "handsome boy"
}

GET my_index/_mapping

Map all fields whose name starts with long_ to long

Map all fields detected as double to float, to save space

PUT my_index
{
  "mappings": {
    "dynamic_templates":[
    {
      "double_as_float":{
        "match_mapping_type": "double",
        "mapping": {
          "type": "float"
        }
      }
    }
    ]
  }
}

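Matching rules usually use the following parameters:

match_mapping_type: matches the field type detected by es, such as boolean, long or string
match, unmatch: match the field name
path_match, path_unmatch: match the full field path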
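
Templates are checked in the order they are listed, and the first one whose conditions all hold supplies the mapping. The sketch below models that for match_mapping_type, match and unmatch only; resolve_template is a hypothetical helper, and fnmatchcase is used as an approximation of the wildcard matching in es.

```python
from fnmatch import fnmatchcase

TEMPLATES = [
    ("message_as_text", {"match_mapping_type": "string", "match": "message*",
                         "mapping": {"type": "text"}}),
    ("string_as_keywords", {"match_mapping_type": "string",
                            "mapping": {"type": "keyword"}}),
]


def resolve_template(field_name, detected_type, templates=TEMPLATES):
    """Return the mapping of the first template that matches, or None."""
    for _name, rule in templates:
        if rule.get("match_mapping_type", detected_type) != detected_type:
            continue
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        if "unmatch" in rule and fnmatchcase(field_name, rule["unmatch"]):
            continue
        return rule["mapping"]
    return None          # no template applies: default dynamic mapping is used


print(resolve_template("message", "string"))  # {'type': 'text'}
print(resolve_template("name", "string"))     # {'type': 'keyword'}
print(resolve_template("age", "long"))        # None
```

Since the first match wins, the more specific message* rule has to come before the catch-all string rule.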

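The steps for defining a custom mapping are:

Write one document into a temporary index in es and get the mapping es generates automatically
Modify the mapping from step 1 and customize the relevant settings
Create the real index with the mapping from step 2

An index template (Index Template) applies predefined configuration automatically when a new index is created, which simplifies index creation

It can define both index settings and mappings
There can be multiple templates; they are applied according to order, and a template with a higher order overrides one with a lower order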
PUT _template/test_template
{
  "index_patterns": ["te*", "bar*"],
  "order": 0,
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "name":{
        "type": "keyword"
      }
    }
  }
}

GET _template
GET _template/test_template
DELETE _template/test_template
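
How order resolves overlapping templates can be sketched as a merge in ascending order, so later (higher order) values win. The code below is a simplified model, not the es implementation; the second template and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Recursively merge override into base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_settings(index_name, templates):
    """Merge the settings of all matching templates, lowest order first."""
    matching = [t for t in templates
                if any(fnmatchcase(index_name, p) for p in t["index_patterns"])]
    result = {}
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        result = deep_merge(result, template.get("settings", {}))
    return result


templates = [
    {"index_patterns": ["te*", "bar*"], "order": 0,
     "settings": {"number_of_shards": 1}},
    {"index_patterns": ["test*"], "order": 1,            # hypothetical second template
     "settings": {"number_of_shards": 3, "number_of_replicas": 0}},
]
print(resolve_settings("test_index", templates))
print(resolve_settings("bar_logs", templates))
```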

Search API

Queries and analyzes the data stored in es. The endpoint is _search, as shown below:

GET /_search
GET /my_index/_search
GET /my_index1,my_index2/_search
GET /my_*/_search

Queries come in two main forms

URI Search
- Simple to use and convenient for testing from the command line
- Supports only part of the query syntax
  
  GET /my_index/_search?q=user:alfred
Request Body Search
- The complete query syntax provided by es: Query DSL (Domain Specific Language)
  
  GET /my_index/_search
  {
    "query": {
      "term": {"user": "alfred"}
    }
  }
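
The two forms differ only in where the query lives: in the URL query string or in a JSON body. A minimal sketch using the Python standard library follows; the helper names and the localhost address are assumptions, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def uri_search(host, index, field, value):
    """URI Search: the query goes into the q parameter of the URL."""
    return f"{host}/{index}/_search?{urlencode({'q': f'{field}:{value}'})}"


def body_search(host, index, field, value):
    """Request Body Search: the query goes into a JSON body (Query DSL)."""
    url = f"{host}/{index}/_search"
    body = json.dumps({"query": {"term": {field: value}}})
    return url, body


print(uri_search("http://localhost:9200", "my_index", "user", "alfred"))
url, body = body_search("http://localhost:9200", "my_index", "user", "alfred")
print(url)
print(body)
```

urlencode percent-encodes the colon in user:alfred, which is one reason the URI form gets awkward for anything beyond simple queries.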

https://github.com/mobz/elasticsearch-head

https://github.com/medcl/elasticsearch-analysis-ik

Common Issues

1. You are not authorized to access Ingest Manager. Ingest Manager requires superuser privileges.

# vi config/elasticsearch.yml
xpack.security.enabled: true
# set passwords for the built-in users
bin/elasticsearch-setup-passwords interactive

# vi config/kibana.yml
elasticsearch.username: "kibana_system"
elasticsearch.password: "xxxxxx"
xpack.security.encryptionKey: "something_at_least_32_characters"
# start the service with bin/kibana

Then log in as the default superuser elastic

2. CORS configuration

# vi config/elasticsearch.yml
http.cors.enabled: true
http.cors.allow-origin: "*"
