Several Ways to Count Word Frequencies in Text (Python)

Author: System   Date: August 14, 2024   Category: All, python   Word count: 729

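In Python, common ways to count word frequencies in text include:

Using the Counter class from the built-in collections module.
Extracting words with a regular expression, then counting them with a dictionary.
Segmenting Chinese text with the jieba library, then counting the resulting words.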

Example code for each method follows:

Using the Counter class:

from collections import Counter

text = "This is an example for word frequency counting."
# split() breaks on whitespace only, so punctuation stays attached (e.g. "counting.")
counter = Counter(text.split())
print(counter)
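Because text.split() only splits on whitespace, "counting." keeps its period and "This" and "this" would be counted as different words. A minimal sketch, assuming lowercasing and a simple ASCII pattern of letters, digits and apostrophes are acceptable for your text (the function name top_words is hypothetical), that normalizes the words before counting and uses Counter.most_common to return the most frequent ones:

```python
import re
from collections import Counter

def top_words(text, n=3):
    # Lowercase first so "This" and "this" count as the same word
    # Assumed word pattern: runs of ASCII letters, digits or apostrophes
    words = re.findall(r"[a-z0-9']+", text.lower())
    # most_common(n) returns (word, count) pairs, highest count first
    return Counter(words).most_common(n)

print(top_words("This is a test. This test is only a test."))
```

Ties are returned in the order the words were first seen, so the result is deterministic for a given input.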

Using a regular expression and a dictionary:

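import re

text = "This is an example for word frequency counting."
# \w+ extracts runs of word characters, so punctuation is dropped
words = re.findall(r'\w+', text)
word_freq = {}
for word in words:
    # dict.get returns 0 for a word that has not been seen yet
    word_freq[word] = word_freq.get(word, 0) + 1
print(word_freq)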
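The dictionary counting can also be written with collections.defaultdict(int), which gives every unseen word a starting count of 0 and visits each word exactly once. A sketch assuming the same \w+ tokenization as above; the function name word_freq is chosen here for illustration:

```python
import re
from collections import defaultdict

def word_freq(text):
    # \w+ matches runs of letters, digits and underscores, dropping punctuation
    words = re.findall(r'\w+', text)
    freq = defaultdict(int)  # missing keys start at 0
    for word in words:
        freq[word] += 1
    return dict(freq)  # convert back to a plain dict for printing

print(word_freq("This is an example for word frequency counting."))
```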

Using the jieba library for Chinese word segmentation:

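from collections import Counter

import jieba

text = "这是一个例子来进行词频统计。"
# jieba.cut returns a generator of segmented tokens, punctuation included
words = jieba.cut(text)
word_freq = Counter(words)
print(word_freq)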

Note: the jieba library must first be installed with pip install jieba.
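jieba's output also includes punctuation such as "。" and very common function words, which tend to dominate the counts. A minimal sketch of filtering the tokens before counting; the stop-word set and the hand-written token list are illustrative assumptions, not real jieba output, so the example runs without jieba installed. With jieba you would pass jieba.lcut(text) (which returns a list) instead of the sample list.

```python
import unicodedata
from collections import Counter

# Hypothetical stop-word set; real projects usually load a larger list from a file
STOPWORDS = {"的", "了", "是", "一个"}

def is_punctuation(token):
    # True if every character is in a Unicode punctuation category (Pc, Pd, Po, ...)
    return all(unicodedata.category(ch).startswith("P") for ch in token)

def count_tokens(tokens, stopwords=STOPWORDS):
    kept = [
        t for t in tokens
        if t.strip()               # drop empty and whitespace-only tokens
        and not is_punctuation(t)  # drop "，", "。" and similar
        and t not in stopwords     # drop stop words
    ]
    return Counter(kept)

# Illustrative token list standing in for jieba.lcut(text)
tokens = ["这", "是", "一个", "例子", "，", "例子", "很", "简单", "。"]
print(count_tokens(tokens).most_common(2))
```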
