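This article refers to several dataset download links: a test dataset that can be downloaded via CSN, and further datasets that can be found by searching ClueBenchmark. These sources offer large document collections suitable for data analysis and testing.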
When creating an Elasticsearch index, pay attention to the following settings and mappings.
Index settings:
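# PUT http://192.168.16.128:9200/es_news
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
The number of shards is fixed when the index is created, so these settings are sent in the request that creates the index.
Mapping definition: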
# POST http://192.168.16.128:9200/es_news/_mapping
{
  "dynamic": true,
  "properties": {
    "id": { "type": "long" },
    "title": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
    "content": { "type": "text", "analyzer": "ik_smart" },
    "shortname": { "type": "text", "analyzer": "ik_max_word" },
    "location": { "type": "geo_point" }
  }
}
Next, add custom data with a POST request:
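# POST http://192.168.16.128:9200/es_news/_doc
{
  "title": "我中了一个奖品",
  "content": "奖品内容是苹果电脑",
  "location": "29.263792,88.884874",
  "shortname": "日喀则"
}
A geo_point written as a string uses "lat,lon" order, so the latitude of 日喀则 (Shigatse) comes first. The sample title reads "I won a prize" and the content reads "The prize is an Apple computer".
Every document has a relevance score field, _score. Its calculation is based on the TF/IDF algorithm (Elasticsearch 5.0 and later use BM25 by default, which builds on the same ideas) and includes the following parts: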
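Term frequency (TF): how often the search term appears in the field. The more often it appears, the higher the relevance.
Inverse document frequency (IDF): how often the term appears across all documents in the index. The more often it appears, the lower its weight.
Field-length norm: how long the field is. The longer the field, the lower the relevance of a term found in it.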
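The three factors above can be made concrete with a small calculation. The sketch below assumes the classic Lucene TF/IDF formulas (square-root term frequency, logarithmic IDF, inverse-square-root length norm); the function names are hypothetical, and because recent Elasticsearch versions score with BM25 by default, the numbers will not match what explain reports on such a cluster.

```python
import math


def term_frequency(freq: int) -> float:
    """Classic TF: square root of how many times the term occurs in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """Classic IDF: 1 + ln(num_docs / (doc_freq + 1)).

    num_docs is the number of documents in the index (shard),
    doc_freq is the number of documents that contain the term.
    """
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """Classic field-length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


def field_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Weight of one term in one field: tf * idf * norm."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )


if __name__ == "__main__":
    # Illustrative numbers only: the term occurs once in a 4-term field,
    # and 1 of the 100 documents in the index contains it.
    print(field_weight(freq=1, num_docs=100, doc_freq=1, num_terms=4))
```

Raising freq increases the weight, while raising doc_freq or num_terms lowers it, which mirrors the three rules listed above.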
Setting explain to true shows the score calculation in detail:
# GET http://192.168.16.128:9200/es_news/_search
{
  "explain": true,
  "query": {
    "match": { "content": "奖品" }
  }
}
To meet specific needs, scoring can be controlled in more detail:
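Field weights: with a boost such as title^2, the relative weight of the title and content fields can be adjusted.
{
  "query": {
    "multi_match": {
      "query": "奖品",
      "fields": ["content", "title^2"]
    }
  }
}
Adjusting the score with function_score:
# PUT http://blogposts/post/1
{
  "title": "关于热度",
  "content": "在这篇文章中我们将讨论……",
  "votes": 6
}
# GET http://blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}
Smoothing with modifier (factor multiplies the field value before the modifier is applied):
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 2
      }
    }
  }
}
Smoothing the vote count: a modifier such as log1p dampens the effect of large vote counts, so the score changes more gradually. The square modifier is also available, but it amplifies large values rather than damping them.
Finer control: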
Use the functions array in function_score, combined with the linear, exp, or gauss decay functions, to decay the score based on specific fields.
# GET http://_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": { "lat": 51.5, "lon": 0.12 },
              "offset": "2km",
              "scale": "3km"
            }
          }
        },
        {
          "gauss": {
            "price": {
              "origin": "50",
              "offset": "50",
              "scale": "20"
            }
          }
        },
        { "weight": 2 }
      ]
    }
  }
}
With the methods above, Elasticsearch relevance scoring can be tuned flexibly to specific needs in order to improve search results.
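As a rough numeric illustration of the field_value_factor and gauss clauses used in the queries above, the following sketch reproduces the arithmetic outside Elasticsearch. It assumes the documented behaviour that log1p takes the base-10 logarithm of 1 plus the (factor-scaled) field value, that the result multiplies the query score, and that the gauss decay defaults to decay = 0.5. The function names are hypothetical helpers, not part of any Elasticsearch client.

```python
import math


def field_value_factor_log1p(old_score: float, field_value: float,
                             factor: float = 1.0) -> float:
    """Score after field_value_factor with modifier log1p.

    Assumes the multiply boost mode: old_score * log10(1 + factor * value).
    """
    return old_score * math.log10(1.0 + factor * field_value)


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay multiplier in (0, 1].

    Values within `offset` of `origin` score 1.0; at a distance of
    offset + scale the multiplier equals `decay`.
    """
    distance = max(0.0, abs(value - origin) - offset)
    sigma_squared = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_squared))


if __name__ == "__main__":
    # votes = 6 as in the sample blog post, with and without factor 2.
    print(field_value_factor_log1p(1.0, 6))
    print(field_value_factor_log1p(1.0, 6, factor=2))

    # price clause from the query: origin 50, offset 50, scale 20.
    for price in (50, 100, 120, 160):
        print(price, gauss_decay(price, origin=50, scale=20, offset=50))
```

With the price settings from the query, any price between 0 and 100 keeps the full multiplier, and the multiplier falls to one half at 120, which is the point of combining offset and scale.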
Reposted from: http://zygdz.baihongyu.com/