
Converting an IP address (string) to long in an Elasticsearch/Kibana scripted field

elasticsearch-painless elasticsearch

Dekel · Tech Community · 7 years ago

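I'm trying to use a scripted field written in the Painless language to add a new field (originating_ip_calc) that holds the int (long) representation of the given IPv4 address.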

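The script below works in Groovy (which, as far as I know, should be basically the same language), but in this particular case that apparently isn't true.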

String[] ipAddressInArray = "1.2.3.4".split("\\.");

long result = 0;
for (int i = 0; i < ipAddressInArray.length; i++) {
    int power = 3 - i;
    int ip = Integer.parseInt(ipAddressInArray[i]);
    long longIP = (ip * Math.pow(256, power)).toLong();
    result = result + longIP;
}
return result;
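For comparison, here is a sketch of the question's loop ported to plain Java (not Painless), assuming a standard JDK. Groovy's `.toLong()` does not exist on a Java `double`, so an explicit `(long)` cast takes its place; the class name `QuestionLoopPort` is hypothetical.

```java
class QuestionLoopPort {
    // Plain-Java port of the Groovy loop from the question (hypothetical class name).
    static long ipToLong(String ip) {
        // String.split takes a regex, so the dot must be escaped
        String[] ipAddressInArray = ip.split("\\.");
        long result = 0;
        for (int i = 0; i < ipAddressInArray.length; i++) {
            int power = 3 - i;
            int octet = Integer.parseInt(ipAddressInArray[i]);
            // Math.pow returns a double; Groovy's .toLong() becomes a (long) cast in Java
            long longIP = (long) (octet * Math.pow(256, power));
            result = result + longIP;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("1.2.3.4")); // prints 16909060
    }
}
```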

I also looked at this question.

I also tried using InetAddress, without success.
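The question doesn't say how InetAddress failed; a likely reason (an assumption, not verified here) is that `java.net.InetAddress` is not on Painless's class allowlist. Outside Elasticsearch, in plain Java, the approach does work, as in this sketch (hypothetical class `InetAddressToLong`). It should only be given IP literals: for a literal, `getByName` just validates the format, but a hostname would trigger a DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class InetAddressToLong {
    // Plain-Java sketch; this is not Painless code.
    static long ipToLong(String ip) {
        byte[] bytes;
        try {
            // For a literal IP address, getByName only validates the format (no DNS lookup)
            bytes = InetAddress.getByName(ip).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ip, e);
        }
        if (bytes.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long result = 0;
        for (byte b : bytes) {
            // Java bytes are signed, so mask to 0..255 before shifting the octet in
            result = (result << 8) | (b & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("10.0.0.1")); // prints 167772161
    }
}
```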

1 answer | last activity 7 years ago

Nikolay Vasiliev · 5 years ago

With an Elasticsearch Painless script, you can use the following code:

POST ip_search/doc/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "originating_ip_calc": {
      "script": {
        "source": """
String ip_addr = params['_source']['originating_ip'];
def ip_chars = ip_addr.toCharArray();
int chars_len = ip_chars.length;
long result = 0;
int cur_power = 0;
int last_dot = chars_len;
for(int i = chars_len -1; i>=-1; i--) {
  if (i == -1 || ip_chars[i] == (char) '.' ){
    result += (Integer.parseInt(ip_addr.substring(i+ 1, last_dot)) * Math.pow(256, cur_power));
    last_dot = i;
    cur_power += 1;
  }
}         
return result
""",
        "lang": "painless"
      }
    }
  },
  "_source": ["originating_ip"]
}

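(Note that I use the Kibana console to send the request to ES; it does some escaping before sending, so that the triple-quoted script becomes valid JSON.) The search returns: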

"hits": [
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_source": {
      "originating_ip": "10.0.0.1"
    },
    "fields": {
      "originating_ip_calc": [
        167772161
      ]
    }
  },
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "1",
    "_score": 1,
    "_source": {
      "originating_ip": "1.2.3.4"
    },
    "fields": {
      "originating_ip_calc": [
        16909060
      ]
    }
  }
]
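Both values check out by hand: 1.2.3.4 → 1·256³ + 2·256² + 3·256 + 4 = 16777216 + 131072 + 768 + 4 = 16909060, and 10.0.0.1 → 10·256³ + 1 = 167772161. To test the script's logic without a cluster, here is a sketch of the answer's right-to-left scan ported to plain Java (hypothetical class `BackwardScanPort`). One deliberate difference: it shifts by `8 * curPower` instead of multiplying by `Math.pow(256, cur_power)`, which keeps the arithmetic in `long` rather than going through `double`.

```java
class BackwardScanPort {
    // Port of the answer's Painless script: walk the characters from right to left and,
    // whenever a '.' or the start of the string is reached, parse the octet that ends at lastDot.
    static long ipToLong(String ipAddr) {
        char[] ipChars = ipAddr.toCharArray();
        long result = 0;
        int curPower = 0;
        int lastDot = ipChars.length;
        for (int i = ipChars.length - 1; i >= -1; i--) {
            // i == -1 is tested first, so ipChars[-1] is never read
            if (i == -1 || ipChars[i] == '.') {
                long octet = Integer.parseInt(ipAddr.substring(i + 1, lastDot));
                result += octet << (8 * curPower); // same as octet * 256^curPower
                lastDot = i;
                curPower += 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("10.0.0.1")); // prints 167772161
        System.out.println(ipToLong("1.2.3.4"));  // prints 16909060
    }
}
```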

But why does it have to be done this way?

Why not just use `.split`?

If you send the code from the question to ES, it returns an error like this:

      "script": "String[] ipAddressInArray = \"1.2.3.4\".split(\"\\\\.\");\n\nlong result = 0;\nfor (int i = 0; i < ipAddressInArray.length; i++) {\n    int power = 3 - i;\n    int ip = Integer.parseInt(ipAddressInArray[i]);\n    long longIP = (ip * Math.pow(256, power)).toLong();\n    result = result + longIP;\n}\nreturn result;",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown call [split] with [1] arguments on type [String]."

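This is mainly because Java's String.split() is not considered safe to use (it implicitly compiles a regex pattern). The recommended alternative is Pattern#split, but for that you have to enable regular expressions in the node configuration (elasticsearch.yml), as the next error shows: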
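In plain Java, the `Pattern#split` route looks like the sketch below; the Painless form in the following error uses a regex literal instead and still needs `script.painless.regex.enabled`. The sketch compiles the pattern once and folds the octets with Horner's scheme, `result * 256 + octet`, which avoids `Math.pow` entirely. `PatternSplitToLong` is a hypothetical name.

```java
import java.util.regex.Pattern;

class PatternSplitToLong {
    // Compile the dot pattern once and reuse it for every call
    private static final Pattern DOT = Pattern.compile("\\.");

    static long ipToLong(String ip) {
        String[] octets = DOT.split(ip);
        if (octets.length != 4) {
            throw new IllegalArgumentException("expected 4 octets: " + ip);
        }
        long result = 0;
        for (String octet : octets) {
            // Horner's scheme: shift the running value one octet left, then add the next one
            result = result * 256 + Integer.parseInt(octet);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("1.2.3.4")); // prints 16909060
    }
}
```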

      "script": "String[] ipAddressInArray = /\\./.split(\"1.2.3.4\");...
      "lang": "painless",
      "caused_by": {
        "type": "illegal_state_exception",
        "reason": "Regexes are disabled. Set [script.painless.regex.enabled] to [true] in elasticsearch.yaml to allow them. Be careful though, regexes break out of Painless's protection against deep recursion and long loops."

Why do we need the explicit cast `(char) '.'`?

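So we have to split the string on the dots manually. The simple way is to compare each character of the string with '.' (which in Java means a char, not a String).

But in Painless, '.' means a String, so we have to cast it to char explicitly (because we are iterating over a char array).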
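In plain Java the comparison needs no cast, because `'.'` is already a `char` literal; it is Painless that reads single quotes as a `String`. The sketch below (hypothetical `CharLiteralDemo` class) shows the Java side, counting the dots in an address.

```java
class CharLiteralDemo {
    // Count the dots in an address by comparing chars directly.
    static int countDots(String ip) {
        int dots = 0;
        for (char c : ip.toCharArray()) {
            // '.' is a char literal in Java, so char == char compiles without a cast.
            // In Painless, '.' is a String, hence (char) '.' in the answer's script.
            if (c == '.') {
                dots++;
            }
        }
        return dots;
    }

    public static void main(String[] args) {
        System.out.println(countDots("1.2.3.4")); // prints 3
    }
}
```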

Why work with the char array directly?

Because apparently Painless does not allow `.length` on a String either:

    "reason": {
      "type": "script_exception",
      "reason": "compile error",
      "script_stack": [
        "\"1.2.3.4\".length",
        "         ^---- HERE"
      ],
      "script": "\"1.2.3.4\".length",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown field [length] for type [String]."
      }
    }
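The error says `length` is an unknown *field*, which matches plain Java: arrays expose `length` as a field, while `String` only has a `length()` method, so `"1.2.3.4".length` does not compile in Java either. The sketch below (hypothetical `LengthDemo` class) shows both forms in Java; whether `length()` is accepted by your Painless version was not tested here.

```java
class LengthDemo {
    // Arrays have a public length field.
    static int arrayLength(String s) {
        return s.toCharArray().length;
    }

    // Strings have a length() method; s.length without parentheses is a compile error in Java.
    static int stringLength(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(arrayLength("1.2.3.4"));  // prints 7
        System.out.println(stringLength("1.2.3.4")); // prints 7
    }
}
```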

Why the name `painless`?

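Although a quick Google search turned up nothing on the history of the name, the documentation page suggests it is mainly meant to be safe for use in production.

Groovy scripting, which it replaced, had known Groovy security vulnerabilities. So the Elasticsearch team created a very limited subset of Java/Groovy scripting that has predictable performance and does not contain those security vulnerabilities, and called it Painless.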

If there is one thing that is true about this scripting language, it is that it is limited.
