rank_feature query

Original English version: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/query-dsl-rank-feature-query.html. Copyright of the original document belongs to www.elastic.co.
Local English version: ../en/query-dsl-rank-feature-query.html

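IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.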



Boosts the relevance score of documents based on the numeric value of a rank_feature or rank_features field.

The rank_feature query is typically used in the should clause of a bool query, so its relevance scores are added to the other scores from the bool query.

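Unlike the function_score query or other ways to change relevance scores, the rank_feature query efficiently skips non-competitive hits when the track_total_hits parameter is not true. This can dramatically improve query speed.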

Rank feature functions

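To calculate relevance scores based on rank feature fields, the rank_feature query supports the following mathematical functions:

saturation
logarithm (log)
sigmoid

If you don't know where to start, we recommend using the saturation function. If no function is provided, the rank_feature query uses the saturation function by default.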

Example request

Index setup

To use the rank_feature query, your index must include a rank_feature or rank_features field mapping. To see how you can set up an index for the rank_feature query, try the following example.

Create a test index with the following field mappings:

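pagerank, a rank_feature field which measures the importance of a website.
url_length, a rank_feature field which contains the length of the website's URL. For this example, a long URL correlates negatively with relevance, indicated by a positive_score_impact value of false.
topics, a rank_features field which contains a list of topics and a measure of how well each document is connected to each topic.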

PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}

Index several documents to the test index:

PUT /test/_doc/1?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

Example query

The following query searches for 2016 and boosts relevance scores based on pagerank, url_length, and the sports topic (topics.sports).

GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
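As a rough illustration of how these clauses combine, the Python sketch below adds a must score to the boosted should scores. It assumes that each rank_feature clause contributes boost times its function score and that every clause uses the default saturation function. The match score and the pivot values are hypothetical numbers chosen for illustration; real scores depend on index statistics and on the pivots Elasticsearch computes.

```python
# Sketch: combining a must clause with boosted rank_feature should clauses.
# Assumptions (not from the docs): the match score and all pivot values are
# hypothetical, and every rank_feature clause uses the saturation function.


def saturation(value, pivot, positive_impact=True):
    """S / (S + pivot), or pivot / (S + pivot) for a negative score impact."""
    if positive_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


def bool_score(must_score, should_clauses):
    """Add the must score to boost * feature_score for every matching should clause."""
    return must_score + sum(boost * feature for boost, feature in should_clauses)


# Document 1 from the example: pagerank 50.3, url_length 42, topics.sports 50.
match_score = 0.5  # hypothetical score of the "2016" match clause
clauses = [
    (1.0, saturation(50.3, pivot=50.0)),                       # pagerank, default boost
    (0.1, saturation(42, pivot=40.0, positive_impact=False)),  # url_length, boost 0.1
    (0.4, saturation(50, pivot=45.0)),                         # topics.sports, boost 0.4
]
print(round(bool_score(match_score, clauses), 4))
```

Because url_length is mapped with positive_score_impact set to false, its clause uses pivot / (S + pivot), so shorter URLs add slightly more to the total.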

Top-level parameters for rank_feature

field

(Required, string) rank_feature or rank_features field used to boost relevance scores.

boost

(Optional, float) Floating point number used to decrease or increase relevance scores. Defaults to 1.0.

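Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.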

saturation

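(Optional, function object) Saturation function used to boost relevance scores based on the value of the rank feature field. If no function is provided, the rank_feature query defaults to the saturation function. See Saturation for more information.

Only one function (saturation, log, or sigmoid) can be provided.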

log

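(Optional, function object) Logarithmic function used to boost relevance scores based on the value of the rank feature field. See Logarithm for more information.

Only one function (saturation, log, or sigmoid) can be provided.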

sigmoid

(Optional, function object) Sigmoid function used to boost relevance scores based on the value of the rank feature field. See Sigmoid for more information.

Only one function (saturation, log, or sigmoid) can be provided.
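The parameter rules above (one field, an optional boost, and at most one of saturation, log, or sigmoid) can be mirrored in a small client-side check. build_rank_feature_query below is a hypothetical helper, not part of any Elasticsearch client; it only produces the JSON body used by the requests on this page.

```python
# Sketch of a hypothetical client-side helper that builds a rank_feature query
# body and enforces the "only one function" rule described above.

FUNCTIONS = ("saturation", "log", "sigmoid")


def build_rank_feature_query(field, boost=None, **functions):
    """Return a rank_feature query dict with at most one scoring function."""
    unknown = set(functions) - set(FUNCTIONS)
    if unknown:
        raise ValueError(f"unknown function(s): {sorted(unknown)}")
    if len(functions) > 1:
        raise ValueError("only one of saturation, log or sigmoid can be provided")
    body = {"field": field}
    if boost is not None:
        body["boost"] = boost
    # With no function given, Elasticsearch falls back to saturation,
    # so the body can simply omit the function object.
    body.update(functions)
    return {"rank_feature": body}


print(build_rank_feature_query("topics.sports", boost=0.4))
print(build_rank_feature_query("pagerank", log={"scaling_factor": 4}))
```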

Notes

saturation

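The saturation function gives a score equal to S / (S + pivot), where S is the value of the rank feature field and pivot is a configurable pivot value. The result is less than 0.5 if S is less than pivot, exactly 0.5 if S equals pivot, and greater than 0.5 if S is greater than pivot. Scores always fall in the open interval (0, 1).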

If the rank feature has a negative score impact (positive_score_impact is false), the function is computed as pivot / (S + pivot), which decreases as S increases.

GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
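The following Python sketch evaluates the saturation formula for a few values of S with pivot = 8, as in the request above, for both positive and negative score impact. It only reproduces the formula from the text and assumes positive S and pivot; it is not Elasticsearch's implementation.

```python
# Sketch of the saturation formula from the text (pivot = 8 as above).
# Assumes S and pivot are positive, so every score falls in (0, 1).


def saturation(s, pivot, positive_score_impact=True):
    """S / (S + pivot) for a positive impact, pivot / (S + pivot) otherwise."""
    if s <= 0 or pivot <= 0:
        raise ValueError("S and pivot must be positive")
    if positive_score_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


for s in (2, 8, 50.3):
    # S below the pivot gives < 0.5, S equal to the pivot gives exactly 0.5.
    print(s, round(saturation(s, 8), 3), round(saturation(s, 8, False), 3))
```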

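If a pivot value is not provided, Elasticsearch computes a default value equal to the approximate geometric mean of all rank feature values in the index. We recommend using this default value if you haven't had the opportunity to train a good pivot value.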

GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
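To get a feel for the default pivot, the sketch below computes the exact geometric mean of the url_length values from the example documents. This is a stand-in under a stated assumption: Elasticsearch computes an approximation from index data, so the pivot it actually uses may differ somewhat.

```python
import math


def geometric_mean(values):
    """exp(mean(ln v)) over strictly positive values (exact, not approximated)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# url_length values of documents 1-3 in the example index.
url_lengths = [42, 47, 37]
pivot = geometric_mean(url_lengths)
print(round(pivot, 2))
```

All three example documents have a pagerank of 50.3, so for the pagerank request above the exact geometric mean is also 50.3 and each document would score roughly 0.5.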

logarithm

The log function gives a score equal to log(scaling_factor + S), where S is the value of the rank feature field and scaling_factor is a configurable scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact (positive_score_impact is true).

GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
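The sketch below evaluates log(scaling_factor + S) with scaling_factor = 4, as in the request above. It assumes the natural logarithm and rejects a negative score impact, following the restriction stated earlier.

```python
import math


def log_score(s, scaling_factor, positive_score_impact=True):
    """log(scaling_factor + S), using the natural logarithm (an assumption)."""
    if not positive_score_impact:
        # The log function only supports rank features with a positive score impact.
        raise ValueError("log requires positive_score_impact = true")
    return math.log(scaling_factor + s)


for s in (1, 50.3, 5000):
    # Scores keep growing with S; unlike saturation there is no upper bound.
    print(s, round(log_score(s, 4), 3))
```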

sigmoid

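The sigmoid function is an extension of saturation which adds a configurable exponent. Scores are computed as S^exp / (S^exp + pivot^exp). Like for the saturation function, pivot is the value of S that gives a score of 0.5, and scores are in (0, 1).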

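exponent must be positive and is typically in [0.5, 1]. A good value should be computed via training. If you don't have the opportunity to do so, we recommend you use the saturation function instead.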

GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
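The sigmoid formula can be checked the same way. The Python sketch below uses pivot = 7 and exponent = 0.6 from the request above; it only reproduces the formula from the text.

```python
def sigmoid(s, pivot, exponent):
    """S^exp / (S^exp + pivot^exp); the score lies in (0, 1)."""
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    s_exp = s ** exponent
    return s_exp / (s_exp + pivot ** exponent)


for s in (1, 7, 50.3):
    # S equal to the pivot (7) yields exactly 0.5.
    print(s, round(sigmoid(s, pivot=7, exponent=0.6), 3))
```

With exponent = 1, the formula reduces to S / (S + pivot), which is the saturation function.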
