BM25 retrieval algorithm in Python
BM25 is a widely used information retrieval model that scores how relevant each document is to a query. Below is a simple Python implementation:
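import math
import re

def bm25(query, document, corpus=None, k1=1.2, b=0.75):
    """
    Calculate BM25 score for a given query and document.
    :param query: A list of query terms.
    :param document: A string representing the document text.
    :param corpus: Optional list of document strings used for IDF and the
        average document length; defaults to [document].
    :param k1: A constant for term frequency saturation.
    :param b: A constant for document-length normalization.
    :return: BM25 score as a float.
    """
    def tokenize(text):
        # Lowercase word tokens, so "Python" matches the query term "python".
        return re.findall(r"\w+", text.lower())

    docs = [tokenize(d) for d in (corpus or [document])]
    tokens = tokenize(document)
    dl = len(tokens)  # document length in tokens, not characters
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    if avgdl == 0:
        return 0.0
    score = 0.0
    for term in query:
        term = term.lower()
        fi = tokens.count(term)  # term frequency in the document
        n = sum(1 for d in docs if term in d)  # documents containing the term
        # A common smoothed IDF variant that never goes negative.
        idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
        score += idf * fi * (k1 + 1) / (fi + k1 * (1 - b + b * dl / avgdl))
    return score

# Example usage (without a corpus, the IDF is the same constant for every matching term):
query = ["python", "search", "algorithm"]
document = "Python is a high-level programming language used for general-purpose programming. It is an interpreted language with dynamic semantics. Its design philosophy emphasizes code readability with its notable use of significant whitespace. The language provides constructs that enable clear programming on both small and large scales."
score = bm25(query, document)
print(f"BM25 Score: {score}")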
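This code defines a bm25 function that takes a list of query terms and a document as input and returns the BM25 score. The example calls it with a list of query terms and a document string, then prints the computed score.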
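BM25 is normally used to rank several documents against one query, with the inverse document frequency (IDF) and the average document length computed over the whole collection. The following is a minimal sketch of that use, not a definitive implementation. It assumes a simple lowercase regex tokenizer and the smoothed IDF variant log((N - n + 0.5) / (n + 0.5) + 1); the names tokenize and bm25_rank and the three sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens are good enough for this illustration.
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, corpus, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    docs = [tokenize(text) for text in corpus]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs  # average length in tokens
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    terms = [t.lower() for t in query]
    results = []
    for idx, tokens in enumerate(docs):
        tf = Counter(tokens)  # term frequencies for this document
        score = 0.0
        for term in terms:
            f = tf[term]
            if f == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * f * (k1 + 1) / (f + norm)
        results.append((idx, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample corpus, for illustration only.
corpus = [
    "Python is a high-level programming language.",
    "A search algorithm retrieves information stored in a data structure.",
    "BM25 is a ranking function used by search engines.",
]
ranking = bm25_rank(["python", "search", "algorithm"], corpus)
for idx, score in ranking:
    print(idx, round(score, 4))
```

In this sketch the second document ranks first because it matches two query terms, and "search" carries less weight than "python" or "algorithm" because it appears in two of the three documents.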