How do I iterate over the top words in BigARTM?

python-2.7 python

Rocketq · Tech Community · 9 years ago

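I want to print each topic name together with the keywords associated with that topic. The BigARTM library was updated from version 0.7.6 to 0.8.0, and the following old code stopped working: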

for topic_name in model_artm.topic_names:
    print topic_name + ': ',
    for word in model_artm.score_tracker["top_words"].last_topic_info[topic_name].tokens:
        print word,
    print

The problem is in the second loop: there is no longer a last_topic_info. According to the official manual, we need artm.score_tracker.TopTokensScoreTracker, and we should write something like this:

model_artm.score_tracker["topTokes1"].last_tokens[topic_name].value #doesn't work.

Do you know what is wrong?

1 answer | as of 9 years ago

Alexander Frey 9 years ago

The BigARTM score tracker API changed slightly between v0.7.9 and v0.8.0. The following example should work with v0.8.0:

import artm

# Load the pre-built batches and build a dictionary from the same folder.
batch_vectorizer = artm.BatchVectorizer(data_path=r'D:\Datasets\kos',
                                        data_format='batches')
dictionary = artm.Dictionary(data_path=r'D:\Datasets\kos')

# The score name given here is the key used later in model.score_tracker.
model = artm.ARTM(num_topics=15,
                  num_document_passes=5,
                  dictionary=dictionary,
                  scores=[artm.TopTokensScore(name='top_tokens_score')])

model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=3)

# last_tokens and last_weights are both indexed by topic name.
top_tokens = model.score_tracker['top_tokens_score']
for topic_name in model.topic_names:
    print '\n', topic_name
    for (token, weight) in zip(top_tokens.last_tokens[topic_name],
                               top_tokens.last_weights[topic_name]):
        print token, '-', weight
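
To get one line per topic, as in the question's original output, the same loop can be separated from the model. This is only a minimal sketch: `format_top_tokens` is a hypothetical helper name, not part of BigARTM, and it assumes `last_tokens` and `last_weights` behave like mappings from topic name to equally ordered lists, as the loop above uses them. The stand-in dictionaries hold made-up values so that the snippet runs without BigARTM installed.

```python
def format_top_tokens(topic_names, last_tokens, last_weights):
    """Return one string per topic: 'topic: token (weight), ...'.

    Hypothetical helper, not part of BigARTM. Assumes last_tokens and
    last_weights map each topic name to lists in the same order.
    """
    lines = []
    for topic_name in topic_names:
        # Pair every token with its weight for this topic.
        pairs = zip(last_tokens[topic_name], last_weights[topic_name])
        formatted = ', '.join('{} ({:.3f})'.format(token, weight)
                              for token, weight in pairs)
        lines.append('{}: {}'.format(topic_name, formatted))
    return lines


# Stand-in data with made-up values, for illustration only.
demo_tokens = {'topic_0': ['election', 'vote'], 'topic_1': ['war', 'iraq']}
demo_weights = {'topic_0': [0.05, 0.03], 'topic_1': [0.04, 0.02]}

for line in format_top_tokens(['topic_0', 'topic_1'], demo_tokens, demo_weights):
    print(line)
```

With a fitted model you would presumably pass `model.topic_names`, `top_tokens.last_tokens` and `top_tokens.last_weights` instead of the stand-in data.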