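The Python interpreter tells you exactly what is wrong:
AttributeError: 'tuple' object has no attribute 'endswith'
tokensPOS is a list of tuples, so you cannot pass its elements directly to the lemmatize() method (see the code of the WordNetLemmatizer class here). Only strings have an endswith() method, so you need to pass the first element of each tuple in tokensPOS: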
lemmatizedWords = []
for w in tokensPOS:
    lemmatizedWords.append(WordNetLemmatizer().lemmatize(w[0]))
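To see why the tuple fails, here is a minimal sketch that does not need NLTK at all. The tokensPOS list below is a hardcoded, assumed sample of the (word, tag) tuples that pos_tag() produces; the real tags may differ for your input.

```python
# Hypothetical sample of pos_tag() output: a list of (word, tag) tuples.
tokensPOS = [('dogs', 'NNS'), ('runs', 'VBZ'), ('fast', 'RB')]

first = tokensPOS[0]

# A tuple has no string methods, which is what the traceback reports.
try:
    first.endswith('s')
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Unpacking each tuple yields the plain string that lemmatize() expects.
words = [word for word, tag in tokensPOS]
print(words)
print(words[0].endswith('s'))
```

Unpacking with `for word, tag in tokensPOS` is equivalent to indexing with w[0] and w[1], and some readers find it easier to follow.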
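The lemmatize() method uses wordnet.NOUN as the default POS. Unfortunately, WordNet uses different tags from the other NLTK corpora, so you have to translate them manually (as in the link you provided) and pass the appropriate tag as the second argument to lemmatize(). The full script, using get_wordnet_pos() from this answer: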
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
from nltk.tokenize import word_tokenize

def get_wordnet_pos(treebank_tag):
    # Translate a Penn Treebank tag into the matching WordNet POS constant.
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to lemmatize()'s default POS; '' is not a valid WordNet POS.
        return wordnet.NOUN

string = 'dogs runs fast'
tokens = word_tokenize(string)
tokensPOS = pos_tag(tokens)  # list of (word, tag) tuples
print(tokensPOS)

lemmatizer = WordNetLemmatizer()  # create the lemmatizer once, not per word
lemmatizedWords = []
for w in tokensPOS:
    # w[0] is the word, w[1] is its Treebank tag.
    lemmatizedWords.append(lemmatizer.lemmatize(w[0], get_wordnet_pos(w[1])))
print(lemmatizedWords)
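As an aside, the tag translation can also be written as a table lookup. This is only a sketch: treebank_to_wordnet and TREEBANK_PREFIX_TO_WORDNET are hypothetical names, not part of NLTK. It uses the plain strings 'a', 'v', 'n' and 'r', which are the values of wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV, so it runs without NLTK installed. Any other tag falls back to 'n', which matches the default POS of lemmatize().

```python
# Single-character values of wordnet.ADJ, wordnet.VERB, wordnet.NOUN, wordnet.ADV.
TREEBANK_PREFIX_TO_WORDNET = {'J': 'a', 'V': 'v', 'N': 'n', 'R': 'r'}

def treebank_to_wordnet(treebank_tag, default='n'):
    """Map a Penn Treebank tag to a WordNet POS, falling back to noun."""
    # Only the first letter of the Treebank tag decides the WordNet category.
    return TREEBANK_PREFIX_TO_WORDNET.get(treebank_tag[:1], default)

for tag in ('NNS', 'VBZ', 'RB', 'JJ', 'DT'):
    print(tag, '->', treebank_to_wordnet(tag))
```

The return value can be passed as the second argument to lemmatize() in the same way as the result of get_wordnet_pos().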