
Minimum Solr score included in results?

  • Mark Miller  · 6 years ago

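    I created a collection of medical terms using all default Solr (7.5) settings. The documents come from a CSV file, which I indexed with bin/post, also with default settings.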

    When I submit a silly query, I may not get back the number of rows I requested.

    http://host/solr/collection/select?fl=anyLabel,score&q=anyLabel:(astronaut%20%20football%20felafel)&rows=9999&wt=csv

    Is there a score threshold? In this case, the lowest score is ~8. I have run other, less silly queries that returned reasonable results with scores of 2 or 3.

    Why are the results cut off after the one scoring 8? Can I control this?

    anyLabel,score
    football,16.0328
    astronaut haemolytic anaemia,15.470738
    astronaut hemolytic anemia,15.470738
    canadian football,14.440538
    american football,14.440538
    football field,14.440538
    astronaut-bone demineralization syndrome,14.188901
    indoor football arena,13.135968
    australian rules football,13.135968
    canadian football - sport,13.135968
    american football - sport,13.135968
    aussie rules football,13.135968
    indoor football court,13.135968
    astronaut-bone demineralization syndrome (disorder),13.103226
    australian rules football ground,12.04758
    indoor football arena (environment),12.04758
    indoor american football arena,12.04758
    american or canadian football,12.04758
    american or canadian football field,11.12575
    accidentally kicked during football game,11.12575
    australian rules football ground (environment),11.12575
    canadian football - sport (qualifier value),11.12575
    american or canadian football - sport,11.12575
    american football - sport (qualifier value),11.12575
    australian rules football (qualifier value),11.12575
    "american or canadian football\, device",11.12575
    accidentally stepped on during football game,10.334962
    american or canadian football field (environment),10.334962
    accidentally kicked during football game (event),10.334962
    american or canadian football - sport (qualifier value),9.649129
    "american or canadian football\, device (physical object)",9.649129
    accidentally stepped on during football game (event),9.649129
    "place of occurrence of accident or poisoning\, football field",8.518538
    "place of occurrence of accident or poisoning\, football field (environment)",8.047099

    1 answer

  • MatsLindh  · 6 years ago

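    There is no minimum score. Anything scoring above 0 is considered a match in some way and is returned, as long as the rows and start parameters cover the numFound value in the response.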
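    One way to check whether a result list was truncated by paging, and not by any score cutoff, is to compare numFound with the number of documents actually returned. The sketch below is a minimal illustration, not part of the original answer: the helper names select_url and missing_hits are hypothetical, it assumes the request uses wt=json instead of the question's wt=csv so that numFound is visible, and the sample response is made up to mirror the shape of Solr's JSON output.

```python
from urllib.parse import urlencode


def select_url(base, q, rows=10, start=0, fl="anyLabel,score"):
    """Build a Solr /select URL (hypothetical helper).

    wt=json is used here because the JSON response carries numFound.
    """
    params = {"q": q, "fl": fl, "rows": rows, "start": start, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def missing_hits(response, start=0):
    """Return how many matching documents lie beyond the returned page."""
    body = response["response"]
    return body["numFound"] - (start + len(body["docs"]))


url = select_url(
    "http://host/solr/collection",
    "anyLabel:(astronaut football felafel)",
    rows=9999,
)

# Made-up response in the shape of Solr's JSON output: 34 matches, 34 returned.
sample = {"response": {"numFound": 34, "start": 0, "docs": [{}] * 34}}
remaining = missing_hits(sample)
```

    If the remainder is 0, every matching document was returned and the list simply ends where the matches end. A positive remainder means more matches exist and can be fetched by raising rows or advancing start.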

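    In general, scores are not comparable between requests, and it makes no sense to infer from a score that "a document scoring half as much as another is only 50% as relevant".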

    The score also depends on the similarity algorithm in use, which can differ between Solr versions. For 7.5, this is BM25 similarity.
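    To give a feel for why BM25 scores depend on the corpus and are not bounded to a fixed scale, here is a rough sketch of the textbook BM25 score for a single query term. It is an illustration under stated assumptions, not Solr's actual code: the function name bm25_term_score is hypothetical, k1=1.2 and b=0.75 are the commonly cited defaults, the corpus statistics are invented, and Lucene's implementation differs in details (for example, it stores field lengths approximately), so this will not reproduce the exact numbers in the question.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document (illustrative only)."""
    # Rare terms (small doc_freq relative to doc_count) get a large idf.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average fields are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


# Invented corpus statistics, not taken from the question's index.
stats = dict(avg_doc_len=3.0, doc_count=100_000, doc_freq=30)

short_label = bm25_term_score(tf=1, doc_len=1, **stats)  # e.g. a one-word label
long_label = bm25_term_score(tf=1, doc_len=6, **stats)   # e.g. a six-word label
print(short_label > long_label)
```

    Under these assumptions the shorter label scores higher for the same term, which is consistent with the ordering in the question's output, where shorter labels containing "football" rank above longer ones. Because idf depends on how rare a term is in the index, the absolute values vary from query to query.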