
Converting averaged perceptron tagger POS tags to WordNet POS tags while avoiding a tuple error

  •  1
  • OverflowingTheGlass  ·  8 years ago

    from nltk.corpus import wordnet
    from nltk.stem import WordNetLemmatizer
    from nltk import pos_tag
    from nltk.tokenize import word_tokenize
    
    string = 'dogs runs fast'
    
    tokens = word_tokenize(string)
    tokensPOS = pos_tag(tokens)
    print(tokensPOS)
    

    [('dogs', 'NNS'), ('runs', 'VBZ'), ('fast', 'RB')]
    

    lemmatizedWords = []
    for w in tokensPOS:
        lemmatizedWords.append(WordNetLemmatizer().lemmatize(w))
    
    print(lemmatizedWords)
    

    Resulting error:

    Traceback (most recent call last):
    
      File "<ipython-input-30-462d7c3bdbb7>", line 15, in <module>
        lemmatizedWords = WordNetLemmatizer().lemmatize(w)
    
      File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
        lemmas = wordnet._morphy(word, pos)
    
      File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1712, in _morphy
        forms = apply_rules([form])
    
      File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1692, in apply_rules
        for form in forms
    
      File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1694, in <listcomp>
        if form.endswith(old)]
    
    AttributeError: 'tuple' object has no attribute 'endswith'
    

    I think I have two problems:

    1. The POS tags are not being converted into tags that WordNet understands (I tried to implement something similar to this answer: wordnet lemmatization and pos tagging in python).
    2. The data structure is not formatted correctly for looping through each tuple (beyond that, I could not find much about this error, other than in OS-related code).

    How can I lemmatize with POS tagging to avoid these errors?

    1 Answer  |  8 years ago

  •  2
  •   Jakub Rakus  ·  8 years ago

    The Python interpreter is telling you clearly:

    AttributeError: 'tuple' object has no attribute 'endswith'
    

    tokensPOS is a list of tuples, so you can't pass its elements directly to the lemmatize() method (look at the code of the WordNetLemmatizer class here). That code calls endswith(), which only strings have, so you need to pass the first element of each tuple from tokensPOS:

    lemmatizedWords = []
    for w in tokensPOS:
        lemmatizedWords.append(WordNetLemmatizer().lemmatize(w[0]))   
    
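As a quick illustration (a sketch, independent of NLTK and its data files), the traceback boils down to calling a string method on a tuple; indexing out the word first is exactly what fixes it:

```python
# Minimal reproduction of the AttributeError: each item of tokensPOS is a
# (word, tag) tuple, while WordNet's _morphy() expects a string it can call
# methods such as endswith() on.
pair = ('dogs', 'NNS')
print(hasattr(pair, 'endswith'))   # False: tuples have no string methods
print(pair[0].endswith('s'))       # True: the word itself is a str
```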

    The lemmatize() method uses wordnet.NOUN as the default POS. Unfortunately, WordNet uses different tags than the other NLTK corpora, so you have to translate them manually (as shown in the link you provided) and pass the proper tag as the second argument to lemmatize(), using get_wordnet_pos() from this answer:

    from nltk.corpus import wordnet
    from nltk.stem import WordNetLemmatizer
    from nltk import pos_tag
    from nltk.tokenize import word_tokenize
    
    def get_wordnet_pos(treebank_tag):
    
        if treebank_tag.startswith('J'):
            return wordnet.ADJ
        elif treebank_tag.startswith('V'):
            return wordnet.VERB
        elif treebank_tag.startswith('N'):
            return wordnet.NOUN
        elif treebank_tag.startswith('R'):
            return wordnet.ADV
        else:
            return ''
    
    string = 'dogs runs fast'
    
    tokens = word_tokenize(string)
    tokensPOS = pos_tag(tokens)
    print(tokensPOS)
    
    lemmatizedWords = []
    for w in tokensPOS:
        lemmatizedWords.append(WordNetLemmatizer().lemmatize(w[0],get_wordnet_pos(w[1])))
    
    print(lemmatizedWords)