Basic NLP Tools Series: TextBlob

wolf · posted 2024-11-8 10:58

Introduction to TextBlob

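TextBlob is an open-source text-processing library written in Python. It covers many common natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and text translation (the translation feature has since been deprecated in TextBlob).
GitHub: https://github.com/sloria/TextBlob
Official documentation: https://textblob.readthedocs.io/en/dev/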

TextBlob in practice

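Installation: pip install textblob

Installation through a mirror in mainland China: pip install textblob -i https://pypi.tuna.tsinghua.edu.cn/simple

Several features (noun phrases, lemmatization, WordNet) also need NLTK corpora, which can be downloaded with: python -m textblob.download_corpora

Reference: https://textblob.readthedocs.io/en/dev/quickstart.html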
```python
from textblob import TextBlob

text = 'I love natural language processing! I am not like fish!'
blob = TextBlob(text)
```

1. Part-of-speech tagging
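`blob.tags` returns a list of (word, tag) pairs that use Penn Treebank tags: NN is a singular noun, VBP a present-tense verb (non-third-person singular), JJ an adjective, PRP a personal pronoun, RB an adverb, IN a preposition or subordinating conjunction. A common next step is to filter by tag family. The minimal sketch below does that in plain Python; `words_with_tag_prefix` is a hypothetical helper of ours, not a TextBlob function, and the sample pairs are copied from the `blob.tags` output in this section.

```python
# Sample (word, Penn Treebank tag) pairs, copied from the blob.tags output
# in this section. Any list of that shape works.
tagged = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
          ('processing', 'NN'), ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'),
          ('like', 'IN'), ('fish', 'NN')]


def words_with_tag_prefix(pairs, prefix):
    """Return words whose tag starts with prefix ('NN' covers NN, NNS, NNP, NNPS)."""
    return [word for word, tag in pairs if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tagged, "NN")
verbs = words_with_tag_prefix(tagged, "VB")
print(nouns)  # ['language', 'processing', 'fish']
print(verbs)  # ['love', 'am']
```

Matching on a prefix rather than an exact tag keeps plural and proper nouns (NNS, NNP) in the same bucket as NN.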

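```python
blob.tags  # displayed automatically in Jupyter; use print(blob.tags) in a script
```
Output:
```
[('I', 'PRP'),
 ('love', 'VBP'),
 ('natural', 'JJ'),
 ('language', 'NN'),
 ('processing', 'NN'),
 ('I', 'PRP'),
 ('am', 'VBP'),
 ('not', 'RB'),
 ('like', 'IN'),
 ('fish', 'NN')]
```

2. Noun phrase extraction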
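Noun phrase extraction builds on tags like those in section 1: adjacent adjectives and nouns are grouped into one phrase. TextBlob's default extractor (`FastNPExtractor`) uses its own rules; the sketch below is a deliberately simplified stand-in, assuming (word, tag) pairs as input, and the name `toy_noun_phrases` is hypothetical. It keeps runs of at least `min_len` JJ/NN* tokens that end in a noun, which happens to give the same single phrase as the TextBlob output for this sentence.

```python
# Toy noun-phrase chunker over (word, tag) pairs. NOT TextBlob's FastNPExtractor;
# it only illustrates grouping adjacent adjectives and nouns.
tagged = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
          ('processing', 'NN'), ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'),
          ('like', 'IN'), ('fish', 'NN')]


def toy_noun_phrases(pairs, min_len=2):
    """Collect runs of adjective/noun tokens that end in a noun."""
    phrases, current = [], []
    for word, tag in pairs + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith(("JJ", "NN")):
            current.append((word, tag))
            continue
        # Drop trailing adjectives so the phrase ends on a noun
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if len(current) >= min_len:
            phrases.append(" ".join(w.lower() for w, _ in current))
        current = []
    return phrases


print(toy_noun_phrases(tagged))  # ['natural language processing']
```

With the default `min_len=2`, the lone noun "fish" is skipped; lowering it to 1 would include single-word phrases.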

```python
for phrase in blob.noun_phrases:
    print(phrase)
```
Output:
```
natural language processing
```

3. Sentence sentiment polarity
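`sentence.sentiment` returns a `Sentiment(polarity, subjectivity)` pair: polarity lies in [-1.0, 1.0] and subjectivity in [0.0, 1.0]. TextBlob's default analyzer scores words from a sentiment lexicon. The sketch below only illustrates the lexicon idea, using a made-up four-word lexicon and a simple "flip the sign after a negation word" rule; it is not TextBlob's algorithm and will not reproduce its numbers (such as the 0.3125 below). `LEXICON`, `NEGATIONS` and `toy_polarity` are hypothetical names.

```python
import re

# Hypothetical polarity lexicon: values are invented for illustration and are
# NOT taken from TextBlob's sentiment lexicon.
LEXICON = {"love": 0.5, "great": 0.8, "bad": -0.7, "hate": -0.8}
NEGATIONS = {"not", "never", "no"}


def toy_polarity(sentence):
    """Average the polarity of lexicon words; a preceding negation flips the sign."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = []
    for i, word in enumerate(words):
        if word in LEXICON:
            score = LEXICON[word]
            if i > 0 and words[i - 1] in NEGATIONS:
                score = -score
            scores.append(score)
    # Sentences with no lexicon words are treated as neutral
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("I love natural language processing!"))  # 0.5
print(toy_polarity("I do not love fish!"))                    # -0.5
print(toy_polarity("I am not like fish!"))                    # 0.0
```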

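```python
for sentence in blob.sentences:
    # polarity ranges from -1.0 (negative) to 1.0 (positive)
    print(sentence + '------>' + str(sentence.sentiment.polarity))
```
Output:
```
I love natural language processing!------>0.3125
I am not like fish!------>0.0
```

4. Tokenization (splitting text into sentences or words)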
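TextBlob relies on NLTK's tokenizers for `words` and `sentences`, which handle many edge cases. To show only the basic idea, here is a regex sketch using Python's standard `re` module; `split_sentences` and `split_words` are hypothetical names, and the sentence splitter is naive (it would wrongly break after abbreviations such as "Dr.").

```python
import re

text = 'I love natural language processing! I am not like fish!'


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (naive)."""
    return re.split(r'(?<=[.!?])\s+', text.strip())


def split_words(text):
    """Keep runs of letters, digits and apostrophes; drop punctuation."""
    return re.findall(r"[A-Za-z0-9']+", text)


print(split_sentences(text))
# ['I love natural language processing!', 'I am not like fish!']
print(split_words(text))
# ['I', 'love', 'natural', 'language', 'processing', 'I', 'am', 'not', 'like', 'fish']
```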

```python
# Split the text into words (punctuation is dropped)
for w in blob.words:
    print(w)
```
Output:
```
I
love
natural
language
processing
I
am
not
like
fish
```
```python
# Split the text into sentences
for s in blob.sentences:
    print(s)
```
Output:
```
I love natural language processing!
I am not like fish!
```

5. Word inflection
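`pluralize()` and `singularize()` are rule-based: they apply English suffix rules plus tables of irregular forms (the output below maps "I" to "we"). As forms like "ams" and "nots" suggest, the rules are applied without checking the word's part of speech, so the results are only meaningful for nouns. The sketch below is a far smaller noun-only pluralizer illustrating the same rules-plus-exceptions idea; `toy_pluralize` and its table are hypothetical and nowhere near complete.

```python
# Hypothetical, heavily simplified English pluralizer for nouns only.
IRREGULAR = {"child": "children", "mouse": "mice", "person": "people", "fish": "fish"}
VOWELS = "aeiou"


def toy_pluralize(noun):
    lower = noun.lower()
    if lower in IRREGULAR:                      # exceptions win over rules
        return IRREGULAR[lower]
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                      # box -> boxes, church -> churches
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in VOWELS:
        return noun[:-1] + "ies"                # city -> cities (but day -> days)
    return noun + "s"                           # default rule


for w in ["language", "box", "city", "day", "church", "fish", "child"]:
    print(w, "->", toy_pluralize(w))
```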

```python
for w in blob.words:
    print(w.pluralize())    # plural form
    print(w.singularize())  # singular form
```
Output:
```
we
I
love
love
naturals
natural
languages
language
processings
processing
we
I
ams
am
nots
not
likes
like
fish
fish
```

6. Lemmatization
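`Word.lemmatize()` uses WordNet through NLTK and treats the word as a noun unless a part of speech is passed, which is why the example below passes `'v'` for "went". The sketch below mimics the general shape of WordNet's morphological lookup: check an exception table first, then strip suffixes and keep a candidate only if it is a known base form. `EXCEPTIONS`, `SUFFIX_RULES`, `VOCAB` and `toy_lemmatize` are tiny hypothetical samples, not WordNet data.

```python
# Toy lemmatizer: exception table first, then suffix rules validated
# against a (tiny, hypothetical) vocabulary of base forms.
EXCEPTIONS = {("went", "v"): "go", ("octopi", "n"): "octopus", ("mice", "n"): "mouse"}
SUFFIX_RULES = {
    "n": [("ies", "y"), ("es", ""), ("s", "")],
    "v": [("ies", "y"), ("es", ""), ("ing", ""), ("ing", "e"),
          ("ed", ""), ("ed", "e"), ("s", "")],
}
VOCAB = {"go", "octopus", "mouse", "cat", "box", "city", "make", "walk"}


def toy_lemmatize(word, pos="n"):
    """Return a base form for word, defaulting to noun like TextBlob does."""
    word = word.lower()
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    if word in VOCAB:
        return word
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCAB:
                return candidate
    return word  # unknown: return unchanged


print(toy_lemmatize("went", "v"))  # go
print(toy_lemmatize("went"))       # went  (no noun entry)
print(toy_lemmatize("octopi"))     # octopus
print(toy_lemmatize("making", "v"))  # make
```

Validating each candidate against known base forms is what keeps "making" from becoming "mak".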

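```python
from textblob import Word

w = Word('went')
print(w.lemmatize('v'))  # 'v': treat the word as a verb
w = Word('octopi')
print(w.lemmatize())     # no part of speech given: treated as a noun
```
Output:
```
go
octopus
```

7. WordNet integration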
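WordNet names each synset `lemma.pos.NN`: the lemma it is named after, a part-of-speech letter (n noun, v verb, a adjective, s adjective satellite, r adverb), and a two-digit sense number. A synset is named after one of its member lemmas, which is why looking up "hack" as a verb below also returns `chop.v.05`. The hypothetical helper `parse_synset_name` simply splits such a name with a regular expression.

```python
import re


def parse_synset_name(name):
    """Split a WordNet synset name such as 'hack.v.02' into (lemma, pos, sense)."""
    # Greedy (.+) lets lemmas that themselves contain dots still parse
    match = re.fullmatch(r"(.+)\.([nvasr])\.(\d+)", name)
    if match is None:
        raise ValueError(f"not a synset name: {name!r}")
    lemma, pos, sense = match.groups()
    return lemma, pos, int(sense)


print(parse_synset_name("octopus.n.02"))  # ('octopus', 'n', 2)
print(parse_synset_name("chop.v.05"))     # ('chop', 'v', 5)
```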

```python
from textblob import Word
from textblob.wordnet import VERB

word = Word('octopus')
for syn in word.synsets:
    print(syn)
```
Output:
```
Synset('octopus.n.01')
Synset('octopus.n.02')
```
Restrict the returned synsets to verbs:
```python
for syn in Word("hack").get_synsets(pos=VERB):
    print(syn)
```
Output:
```
Synset('chop.v.05')
Synset('hack.v.02')
Synset('hack.v.03')
Synset('hack.v.04')
Synset('hack.v.05')
Synset('hack.v.06')
Synset('hack.v.07')
Synset('hack.v.08')
```
Get the definition of each of a word's synsets:
```python
Word("beautiful").definitions  # one definition per synset
```
Output:
```
['delighting the senses or exciting intellectual or emotional admiration',
 '(of weather) highly enjoyable']
```

8. Spelling correction
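According to TextBlob's documentation, its spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector". The core idea: generate every string one edit away (delete, transpose, replace, insert), keep those that appear in a word-frequency table, and rank them by frequency, falling back to two edits if nothing is found. The `spellcheck()` confidences in the output below look like frequencies normalized over the candidate set, and this sketch assumes that. The counts in `COUNTS` are invented, so the numbers will not match TextBlob's; `edits1`, `known` and `toy_spellcheck` are our own names.

```python
import string
from collections import Counter

# Hypothetical word counts; a real corrector counts words in a large corpus.
COUNTS = Counter({"go": 60, "god": 25, "good": 40, "d": 5, "love": 30, "natural": 20})
LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words present in the frequency table."""
    return {w for w in words if w in COUNTS}


def toy_spellcheck(word):
    """Return (candidate, confidence) pairs, most likely first."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    total = sum(COUNTS[c] for c in candidates) or 1  # avoid dividing by zero
    return sorted(((c, COUNTS[c] / total) for c in candidates), key=lambda p: -p[1])


print(toy_spellcheck("good"))  # [('good', 1.0)]
print(toy_spellcheck("lvoe"))  # [('love', 1.0)]
print(toy_spellcheck("gd"))
```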

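```python
sen = TextBlob('I lvoe naturl language processing!')
print(sen.correct())
```
Output:
```
I love nature language processing!
```
correct() picks the most likely candidate for each word, which is not always the intended one: here "naturl" became "nature" rather than "natural".

Word.spellcheck() returns a list of (suggestion, confidence) pairs: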
```python
w1 = Word('good')
w2 = Word('god')
w3 = Word('gd')
print(w1.spellcheck())
print(w2.spellcheck())
print(w3.spellcheck())
```
Output:
```
[('good', 1.0)]
[('god', 1.0)]
[('go', 0.586139896373057), ('god', 0.23510362694300518), ('d', 0.11658031088082901), ('g', 0.03626943005181347), ('ed', 0.009067357512953367), ('rd', 0.006476683937823834), ('nd', 0.0038860103626943004), ('gr', 0.0025906735751295338), ('sd', 0.0006476683937823834), ('md', 0.0006476683937823834), ('id', 0.0006476683937823834), ('gdp', 0.0006476683937823834), ('ga', 0.0006476683937823834), ('ad', 0.0006476683937823834)]
```

9. Parsing
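`parse()` returns one space-separated token per word, with fields joined by `/`: the word, its POS tag, its chunk tag in IOB notation (`B-NP` begins a noun phrase, `I-NP` continues it, `O` is outside any chunk) and, as we read Pattern's default format, a fourth field for prepositional noun phrases. The hypothetical helpers `split_parse` and `group_chunks` below split that string and regroup words into chunks; the sample string is copied from the output in this section. Grouping also makes visible that the misspelled words were all absorbed into a single noun phrase.

```python
# Sample parse() output, copied from this section.
PARSED = ("I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O "
          "language/NN/I-NP/O processing/NN/I-NP/O !/./O/O")


def split_parse(parsed):
    """Turn 'word/POS/chunk/PNP ...' into a list of 4-tuples."""
    # rsplit keeps any '/' inside the word itself in the first field
    return [tuple(token.rsplit("/", 3)) for token in parsed.split()]


def group_chunks(rows):
    """Group words into (label, phrase) chunks using IOB prefixes."""
    groups, inside = [], False
    for word, _pos, chunk, _pnp in rows:
        if chunk == "O":
            inside = False  # an O tag closes any open chunk
            continue
        prefix, label = chunk.split("-", 1)
        if prefix == "I" and inside and groups[-1][0] == label:
            groups[-1][1].append(word)      # continue the open chunk
        else:
            groups.append((label, [word]))  # B- (or a stray I-) starts a chunk
        inside = True
    return [(label, " ".join(words)) for label, words in groups]


rows = split_parse(PARSED)
print(rows[0])             # ('I', 'PRP', 'B-NP', 'O')
print(group_chunks(rows))  # [('NP', 'I lvoe naturl language processing')]
```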

```python
text = TextBlob('I lvoe naturl language processing!')
print(text.parse())
```
Output:
```
I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O language/NN/I-NP/O processing/NN/I-NP/O !/./O/O
```

10. N-grams

```python
text = TextBlob('I lvoe naturl language processing!')
print(text.ngrams(n=2))
```
Output:
```
[WordList(['I', 'lvoe']), WordList(['lvoe', 'naturl']), WordList(['naturl', 'language']), WordList(['language', 'processing'])]
```
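An n-gram is every run of n consecutive words, which is what `ngrams(n=2)` returns above (as `WordList` objects, computed over the words, so the trailing "!" does not appear). A minimal sliding-window sketch over a plain list of strings; `sliding_ngrams` is our own hypothetical function.

```python
def sliding_ngrams(tokens, n=2):
    """Return every run of n consecutive tokens (a sliding window)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # len(tokens) - n + 1 windows; none if n exceeds the number of tokens
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


words = ['I', 'lvoe', 'naturl', 'language', 'processing']
print(sliding_ngrams(words, 2))
# [['I', 'lvoe'], ['lvoe', 'naturl'], ['naturl', 'language'], ['language', 'processing']]
print(sliding_ngrams(words, 3)[0])  # ['I', 'lvoe', 'naturl']
```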

The code is also available on GitHub: https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/TextBlobDemo.ipynb

Zhihu column: Zhihu user
WeChat official account: StudyForAI (AI for complete beginners)

Original article: https://zhuanlan.zhihu.com/p/51865496