Elasticsearch does not give accurate results

elastic-stack elasticsearch mongodb python-3.x python

Lasit Pant · 7 years ago

I am using a match_phrase query to search in ES, but I noticed that the results it returns are not what I expect. The code:

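from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a node reachable at the client's default address

# Phrase query on the "content" field; scroll keeps the search context alive for 60 s.
res = es.search(
    index="indice_1",
    body={
        "_source": ["content"],
        "query": {"match_phrase": {"content": "xyz abc"}},
    },
    size=500,
    scroll="60s",
)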

It does not return the records whose content is "hi my name isxyz abc." or "hey wassupxyz abc. how is life".

2 answers

Tim · 7 years ago

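If no analyzer is specified, the standard analyzer is used by default. It performs grammar-based tokenization, so the terms for the phrase "hi my name isxyz abc." are [hi, my, name, isxyz, abc], whereas match_phrase looks for the terms [xyz, abc] next to each other (unless you specify slop). The document contains the term isxyz, not xyz, so the phrase does not match.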
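The mismatch can be reproduced without a cluster. The following is only a rough sketch: standard_like_tokens is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which approximates the standard analyzer for this plain ASCII input but is not its real implementation. phrase_matches is likewise a simplified slop-0 check written for illustration.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(doc_tokens, query_tokens):
    """True if query_tokens occur as one consecutive run in doc_tokens (slop 0)."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc_tokens = standard_like_tokens("hi my name isxyz abc.")
query_tokens = standard_like_tokens("xyz abc")

print(doc_tokens)                                # ['hi', 'my', 'name', 'isxyz', 'abc']
print(phrase_matches(doc_tokens, query_tokens))  # False: the index holds 'isxyz', not 'xyz'
```

With a space between "is" and "xyz" in the document, the same check would succeed, which is why the query appears to work on some records and not on others.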

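You can either use a different analyzer or modify your query. If you use a match query, it will match on the term "abc". If you want the phrase to match, you need a different analyzer; NGrams are one option.

Here is an example: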

PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }, 
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT test_index/_doc/1
{
  "content": "hi my name isxyz abc."
}

PUT test_index/_doc/2
{
  "content": "hey wassupxyz abc. how is life"
}

POST test_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "content": "xyz abc"
    }
  }
}

The search finds both documents.

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "hey wassupxyz abc. how is life"
        }
      },
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "content": "hi my name isxyz abc."
        }
      }
    ]
  }
}
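Why both documents now match can be sketched in plain Python. This is an approximation under stated assumptions: trigram_tokens is a hypothetical helper that mimics an ngram tokenizer with min_gram = max_gram = 3 and token_chars of letter and digit for ASCII text only, and it assumes each gram simply takes the next position, which may differ in detail from what Elasticsearch does internally.

```python
import re


def trigram_tokens(text, n=3):
    """Split on anything that is not an ASCII letter or digit, then emit every
    n-character window of each chunk. Chunks shorter than n yield nothing."""
    tokens = []
    for chunk in re.findall(r"[A-Za-z0-9]+", text):
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


def contains_run(doc_tokens, query_tokens):
    """True if query_tokens appear consecutively in doc_tokens."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


query = trigram_tokens("xyz abc")
doc1 = trigram_tokens("hi my name isxyz abc.")
doc2 = trigram_tokens("hey wassupxyz abc. how is life")

print(query)                      # ['xyz', 'abc']
print(doc1)                       # ['nam', 'ame', 'isx', 'sxy', 'xyz', 'abc']
print(contains_run(doc1, query))  # True
print(contains_run(doc2, query))  # True
```

Because "isxyz" is broken into isx, sxy and xyz, the gram xyz now exists as its own term directly before abc, so the phrase lines up in both documents.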

Edit: If you want to run a wildcard query, you can use the standard analyzer. The use case you described in the comments would be handled like this:

PUT test_index/_doc/3
{
  "content": "RegionLasit Pant0Q00B000001KBQ1SAO00"
}

POST test_index/_doc/_search
{
  "query": {
    "wildcard": {
      "content.keyword": {
        "value": "*Lasit Pant*"
      }
    }
  }
}

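Basically, you are doing a substring search without using the nGram analyzer. Your query phrase will simply be "*<my search terms>*". Note that this targets content.keyword, so it assumes the content field has a keyword sub-field, as default dynamic mapping creates for strings. The other option remains nGrams.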
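As a small illustration of the "*<my search terms>*" pattern, the sketch below builds the query body and checks the pattern locally. wildcard_body is a hypothetical helper, and fnmatchcase from the Python standard library is used only as a rough stand-in for wildcard matching on a keyword value (unlike Elasticsearch, it also gives square brackets a special meaning).

```python
from fnmatch import fnmatchcase


def wildcard_body(field, terms):
    """Build a wildcard query body that wraps the search terms in '*' on both sides."""
    return {"query": {"wildcard": {field: {"value": f"*{terms}*"}}}}


body = wildcard_body("content.keyword", "Lasit Pant")
pattern = body["query"]["wildcard"]["content.keyword"]["value"]
stored_value = "RegionLasit Pant0Q00B000001KBQ1SAO00"

print(pattern)                             # *Lasit Pant*
print(fnmatchcase(stored_value, pattern))  # True
```

The body could then be passed to es.search(index=..., body=body) in the same way as the query in the question.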

Pratik Patel · 7 years ago

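# Assumes the legacy match query with "type": "phrase" was intended, on the
# "content" field from the question. That form was deprecated in Elasticsearch 5.0
# and removed in 6.0 in favour of match_phrase.
res = es.search(
    index="indice_1",
    body={
        "_source": ["content"],
        "query": {
            "match": {
                "content": {"query": "xyz abc", "type": "phrase"}
            }
        },
    },
    size=500,
    scroll="60s",
)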
