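There are several ways to achieve this, and they all rest on the same principle. You need to perform three searches:
- one without a line filter, to count the total number of occurrences
- one filtered to the lines up to and including the current line, to count the occurrences that precede the current one
- one with a range filter on the lines after the current line, to find the current occurrence
This can be done with a multi search, with filter + top_hits aggregations, or with filter + global aggregations. Here is an example that uses the filter + global aggregation approach: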
DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "line_no": {
        "type": "integer"
      },
      "line": {
        "type": "text"
      }
    }
  }
}
POST test/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "line_no": 54, "line": "content"}
{ "index": { "_id": "2" } }
{ "line_no": 55, "line": "content"}
{ "index": { "_id": "3" } }
{ "line_no": 56, "line": "content"}
{ "index": { "_id": "4" } }
{ "line_no": 57, "line": "content"}
{ "index": { "_id": "5" } }
{ "line_no": 58, "line": "foo"}
{ "index": { "_id": "6" } }
{ "line_no": 59, "line": "bar"}
{ "index": { "_id": "7" } }
{ "line_no": 60, "line": "baz"}
{ "index": { "_id": "8" } }
{ "line_no": 61, "line": "content"}
{ "index": { "_id": "9" } }
{ "line_no": 62, "line": "content"}
{ "index": { "_id": "10" } }
{ "line_no": 63, "line": "content"}
POST test/_search?filter_path=hits.hits,aggregations.all.all_occurrences.doc_count,aggregations.all.all_occurrences.previous_occurrences.doc_count
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "line_no": {
              "gt": 59
            }
          }
        },
        {
          "match": {
            "line": "content"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "line_no": {
        "order": "asc"
      }
    }
  ],
  "aggs": {
    "all": {
      "global": {},
      "aggs": {
        "all_occurrences": {
          "filter": {
            "match": {
              "line": "content"
            }
          },
          "aggs": {
            "previous_occurrences": {
              "filter": {
                "range": {
                  "line_no": {
                    "lte": 59
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
The result of this query will be:
{
  "hits": {
    "hits": [
      {
        "_index": "test",
        "_id": "8",
        "_score": 1.3829923,
        "_source": {
          "line_no": 61,
          "line": "content"
        },
        "sort": [
          61
        ]
      }
    ]
  },
  "aggregations": {
    "all": {
      "all_occurrences": {
        "doc_count": 7,
        "previous_occurrences": {
          "doc_count": 4
        }
      }
    }
  }
}
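In the result above, hits.hits[0] is the next line after line 59 that matches your query. aggregations.all.all_occurrences.doc_count is the number of lines containing "content" (300 in your theoretical example, but I reduced it to 7 to keep the example short). Finally, aggregations.all.all_occurrences.previous_occurrences.doc_count is the number of occurrences at or before the current line. To get the number of the current occurrence, add 1 to it.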
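
If you drive this from application code, the search term and the current line number each appear twice in the request, so it can help to build the body from parameters and to do the "add 1" step in one place. The following is only a minimal Python sketch: the function names build_search_body and summarize_position are hypothetical, and it assumes the response has already been parsed into a dict shaped like the result shown above. Sending the request with a client is deliberately left out.

```python
def build_search_body(term, current_line):
    """Build the filter + global aggregation request for one term and line."""
    term_query = {"match": {"line": term}}
    return {
        "size": 1,  # only the next matching line is needed
        "query": {
            "bool": {
                "must": [
                    {"range": {"line_no": {"gt": current_line}}},
                    term_query,
                ]
            }
        },
        "sort": [{"line_no": {"order": "asc"}}],
        "aggs": {
            "all": {
                "global": {},  # escape the range restriction of the main query
                "aggs": {
                    "all_occurrences": {
                        "filter": term_query,
                        "aggs": {
                            "previous_occurrences": {
                                "filter": {
                                    "range": {"line_no": {"lte": current_line}}
                                }
                            }
                        },
                    }
                },
            }
        },
    }


def summarize_position(response):
    """Return (line_no, occurrence_number, total), or None if no line follows."""
    # With filter_path, the "hits" key can be missing when nothing matches.
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    occurrences = response["aggregations"]["all"]["all_occurrences"]
    previous = occurrences["previous_occurrences"]["doc_count"]
    # The current occurrence is the one right after the previous ones.
    return hits[0]["_source"]["line_no"], previous + 1, occurrences["doc_count"]


# Sample response trimmed to the fields used, taken from the example above.
sample = {
    "hits": {"hits": [{"_source": {"line_no": 61, "line": "content"}}]},
    "aggregations": {
        "all": {
            "all_occurrences": {
                "doc_count": 7,
                "previous_occurrences": {"doc_count": 4},
            }
        }
    },
}
print(summarize_position(sample))  # (61, 5, 7)
```

Read as "line 61 is occurrence 5 of 7". Passing the line_no of the returned hit back in as current_line would step to the next occurrence.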