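If no analyzer is specified, the standard analyzer is used by default. It performs grammar-based tokenization, so the terms for the phrase "Hi, my name isXYZ abc" will be something like [hi, my, name, isxyz, abc], while match_phrase looks for the terms [xyz, abc] right next to each other (unless you specify slop). Because isxyz is indexed as a single term, the phrase never matches.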
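To make the mismatch concrete, here is a minimal Python sketch. It assumes a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase); the real analyzer follows Unicode text segmentation rules, so treat this as an approximation. The helper names `standard_like_tokens` and `phrase_matches` are hypothetical and not part of any Elasticsearch client.

```python
import re

def standard_like_tokens(text):
    """Rough approximation of the standard analyzer:
    split on non-alphanumeric characters and lowercase each token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def phrase_matches(doc_tokens, query_tokens):
    """True if query_tokens occur consecutively in doc_tokens (slop = 0)."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

doc = standard_like_tokens("Hi, my name isXYZ abc")
query = standard_like_tokens("xyz abc")

print(doc)                         # ['hi', 'my', 'name', 'isxyz', 'abc']
print(phrase_matches(doc, query))  # False: 'xyz' is never a term on its own
print("abc" in doc)                # True: a plain match query can still hit 'abc'
```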
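You can either use a different analyzer or modify your query. If you use a match query, it will match on the term "abc". If you want the phrase itself to match, you need a different analyzer; NGrams are one option. Here is an example: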
PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT test_index/_doc/1
{
  "content": "hi my name isxyz abc."
}

PUT test_index/_doc/2
{
  "content": "hey wassupxyz abc. how is life"
}

POST test_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "content": "xyz abc"
    }
  }
}
This finds both documents:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "hey wassupxyz abc. how is life"
        }
      },
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "content": "hi my name isxyz abc."
        }
      }
    ]
  }
}
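Why both documents match can be reasoned through with a small Python sketch. It assumes a simplified model of the ngram tokenizer configured above (min_gram = max_gram = 3, token_chars of letter and digit) in which every gram takes the next position in the stream; the actual position handling inside Elasticsearch may differ in detail. `ngram_tokens` and `phrase_matches` are hypothetical helpers written for this illustration.

```python
import re

def ngram_tokens(text, n=3):
    """Approximate the ngram tokenizer: split on anything that is not a
    letter or digit, then slide a window of size n over each piece.
    Pieces shorter than n produce no grams. No lowercasing is applied,
    since the analyzer above defines no lowercase filter."""
    grams = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

def phrase_matches(doc_tokens, query_tokens):
    """True if query_tokens occur consecutively in doc_tokens (slop = 0)."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

doc1 = ngram_tokens("hi my name isxyz abc.")
doc2 = ngram_tokens("hey wassupxyz abc. how is life")
query = ngram_tokens("xyz abc")

print(doc1)   # ['nam', 'ame', 'isx', 'sxy', 'xyz', 'abc']
print(query)  # ['xyz', 'abc']
print(phrase_matches(doc1, query), phrase_matches(doc2, query))  # True True
```

With 3-grams, "xyz" becomes a term of its own and sits directly before "abc" in both documents, which is what the phrase query needs.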
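Edit:
If you want to run a wildcard query instead, you can use the standard analyzer. The use case you described in the comments would be indexed like this: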
PUT test_index/_doc/3
{
  "content": "RegionLasit Pant0Q00B000001KBQ1SAO00"
}
It can then be queried like this:
POST test_index/_doc/_search
{
  "query": {
    "wildcard": {
      "content.keyword": {
        "value": "*Lasit Pant*"
      }
    }
  }
}
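Essentially, you are doing a substring search without the nGram analyzer. Your query phrase is simply "*<my search terms>*". Note that this relies on content having a keyword sub-field, which the default dynamic mapping creates for strings; the explicit mapping in the nGram example above does not define one. nGrams remain worth looking into.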
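As a rough illustration of what the wildcard pattern does against the whole, unanalyzed keyword value, here is a Python sketch using `fnmatch.fnmatchcase` from the standard library. This is only an analogy: `fnmatch` also gives `[` and `]` a special meaning, which the Elasticsearch wildcard query does not, and the helper name `wildcard_like` is hypothetical.

```python
from fnmatch import fnmatchcase

def wildcard_like(value, pattern):
    """Case-sensitive glob match over the whole string, similar in spirit
    to a wildcard query on an unanalyzed keyword field."""
    return fnmatchcase(value, pattern)

value = "RegionLasit Pant0Q00B000001KBQ1SAO00"

print(wildcard_like(value, "*Lasit Pant*"))  # True: substring found anywhere
print(wildcard_like(value, "*lasit pant*"))  # False: the match is case-sensitive
print(wildcard_like(value, "Lasit Pant*"))   # False: no leading '*', so it must match from the start
```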