Using quanteda:
library(quanteda)
txt <- c("hello world world fizz", "foo bar bar buzz")
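# tokenize first: recent quanteda releases no longer accept a character vector in dfm()
dfmat <- dfm(tokens(txt))
topfeatures(dfmat, n = 2, groups = seq_len(ndoc(dfmat)))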
# $`1`
# world hello
# 2 1
#
# $`2`
# bar foo
# 2 1
You can also convert between DocumentTermMatrix and dfm.
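As a rough sketch of that conversion, assuming both quanteda and tm are installed: quanteda's as.dfm() accepts a DocumentTermMatrix, and quanteda's convert() with to = "tm" goes the other way. The sample text is reused from this answer, and the object names (dfmat_from_tm, dtm_from_quanteda) are only illustrative.

```r
library(quanteda)
library(tm)

txt <- c(doc1 = "hello world world", doc2 = "foo bar bar fizz buzz")

# tm -> quanteda: coerce a DocumentTermMatrix to a dfm
dtm <- DocumentTermMatrix(Corpus(VectorSource(txt)))
dfmat_from_tm <- as.dfm(dtm)

# quanteda -> tm: convert a dfm to a DocumentTermMatrix
dfmat <- dfm(tokens(txt))
dtm_from_quanteda <- convert(dfmat, to = "tm")
```

Either object can then be passed to the top-terms function of its own package, so you only need to build the matrix once.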
Or using classic tm:
library(tm)
packageVersion("tm")
# [1] ‘0.7.1’
txt <- c(doc1="hello world world", doc2="foo bar bar fizz buzz")
dtm <- DocumentTermMatrix(Corpus(VectorSource(txt)))
n <- 5
(top <- findMostFreqTerms(dtm, n = n))
# $doc1
# world hello
# 2 1
#
# $doc2
# bar buzz fizz foo
# 2 1 1 1
# pad each document's term names to length n with NA, then stack the rows into a matrix
do.call(rbind, lapply(top, function(x) { x <- names(x); length(x) <- n; x }))
# [,1] [,2] [,3] [,4] [,5]
# doc1 "world" "hello" NA NA NA
# doc2 "bar" "buzz" "fizz" "foo" NA
findMostFreqTerms is available as of tm version 0.7-1.