
Wordcloud from a data table in R

  •  1
  • coolhand  · 7 years ago

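    I have a data table made up of positive and negative word associations. I would like to create two word clouds, one for the positive words and one for the negative words.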

    Example sentiment_words table:

              element_id    sentence_id   negative     positive
    1115:          1        1115          limits       agree,available
    1116:          1        1116          slow         strongly,agree
    1117:          1        1117                       management
    1118:          1        1118                                      
    1119:          1        1119          concerns     strongly,agree,better,
    

    library(wordcloud)
    library(sentimentr)

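    How can I extract the words from just the "positive" column, for example, to create a wordcloud? I am not sure how to handle the fact that each row can hold several associated words (e.g. "agree,available" should be treated as two entries).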

    I have made various attempts with the wordcloud() function, such as:

    wordcloud(words = sentiment_words$positive, freq = 3, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"))

    Edit: I have tried the tidyverse answer below, and this is the result I get:

      words                         n
      <chr>                     <int>
    1 " \"ability\""                3
    2 " \"ability\")"               1
    3 " \"acceptable\")"            1
    4 " \"accomplish\""             1
    5 " \"accomplished\")"          1
    6 " \"accountability\""         1
    7 " \"accountability\")"        1
    8 " \"accountable\""            2
    9 " \"accountable\")"           1

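    I have tried several ways to use gsub() or otherwise strip the ) and c( from the words, but have not found anything that works. The result is that words that should be counted together are counted separately (e.g. "acceptable" and "acceptable)" show up as two different words in the wordcloud).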

    Cleaning the sentiment words:

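    # Strip the c( ... ) wrapper, quotes, and spaces from each column's text.
    for (j in seq_along(sentiment_words)) {
      # The pattern is a regex, so "(0)" is a group and the literal text
      # "character(0)" is not matched by this call.
      sentiment_words[[j]] <- gsub("character(0)", "", sentiment_words[[j]])
      sentiment_words[[j]] <- gsub('"', "", sentiment_words[[j]])
      sentiment_words[[j]] <- gsub("c\\(", "", sentiment_words[[j]])
      sentiment_words[[j]] <- gsub(" ", "", sentiment_words[[j]])
      sentiment_words[[j]] <- gsub("\\)", "", sentiment_words[[j]])
    }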
    

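    I also had to add a filter inside the count_words function. Note that it filters "character(0" rather than "character(0)", because the closing parenthesis was removed by the loop above.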

    filter(!!var != "character(0") %>%
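
    The leftover "character(0" comes from how gsub() reads its pattern: by default the pattern is a regular expression, so the parentheses form a group rather than matching literal parentheses. A small base R sketch of the difference, using a made-up input vector:

```r
# Made-up input: text as it looks once list entries have been turned into strings.
x <- c("character(0)", "c(\"agree\", \"available\")")

# Default: the pattern is a regex, "(0)" is a group, so only the text
# "character0" would match. Nothing in x changes.
as_regex <- gsub("character(0)", "", x)

# fixed = TRUE treats the pattern as a literal string, so the first
# element is emptied and the second is left alone.
as_literal <- gsub("character(0)", "", x, fixed = TRUE)

print(as_regex)
print(as_literal)
```

    With fixed = TRUE the empty strings would already be dropped by the filter(!!var != "") step in count_words, so the extra filter on "character(0" should no longer be needed.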
    

    Implementing the above gives the cleanest wordcloud based on text polarity.
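
    An alternative sketch that sidesteps the string cleanup: assuming the positive and negative columns are list columns of character vectors (the c(...) and character(0) strings in the output suggest this, but it is an assumption here), unlist() flattens them directly. The toy data frame below is a made-up stand-in for the output of extract_sentiment_terms().

```r
# Toy stand-in for sentiment_words; the list-column layout is an assumption.
sentiment_words <- data.frame(element_id = 1L, sentence_id = 1115:1119)
sentiment_words$positive <- list(
  c("agree", "available"),
  c("strongly", "agree"),
  "management",
  character(0),                      # a sentence with no positive words
  c("strongly", "agree", "better")
)

# unlist() concatenates all vectors and silently drops the empty ones,
# so no c( ... ) or character(0) text ever appears.
positive_words <- unlist(sentiment_words$positive)

# One count per distinct word.
positive_counts <- table(positive_words)
print(positive_counts)
```

    names(positive_counts) and as.vector(positive_counts) can then be passed as the words and freq arguments of wordcloud().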

    2 answers
        1
  •  2
  •   Maurits Evers    7 years ago

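    Here is a tidyverse-based approach that should get you started. I agree with the comment on the question: it is not entirely clear to me what the problem is.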

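    1. Let's define a custom function that returns a data.frame of word counts, based on the comma-separated words in a specific column var of your source data df.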

      library(tidyverse)
      count_words <- function(df, var) {
          var <- enquo(var)
          df %>%
              separate_rows(!!var, sep = ",") %>%
              filter(!!var != "") %>%
              group_by(!!var) %>%
              summarise(n = n()) %>%
              rename(words = !!var)
      }
      
    2. We can then generate word-count data.frames for the positive and negative words:

      df.pos <- count_words(df, positive)
      df.neg <- count_words(df, negative)
      

      Let's inspect the two data.frames:

      df.pos
      # A tibble: 5 x 2
        words          n
        <chr>      <int>
      1 agree          3
      2 available      1
      3 better         1
      4 management     1
      5 strongly       2
      
      df.neg
      # A tibble: 3 x 2
        words        n
        <chr>    <int>
      1 concerns     1
      2 limits       1
      3 slow         1
      
    3. Let's draw the word clouds:

      library(wordcloud)
      wordcloud(words = df.pos$words, freq = df.pos$n, min.freq = 1,
                max.words = 200, random.order = FALSE, rot.per = 0.35,
                colors = brewer.pal(8, "Dark2"))
      


      wordcloud(words = df.neg$words, freq = df.neg$n, min.freq = 1,
                max.words = 200, random.order = FALSE, rot.per = 0.35,
                colors = brewer.pal(8, "Dark2"))
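
    For readers without the tidyverse, the counting step can be sketched in base R. count_words_base is a hypothetical helper, not part of the answer above, and the data frame rebuilds only the two word columns from the question's sample rows, with one space added after a comma to show why trimws() helps with the stray whitespace seen in the question's edit.

```r
# Sample rows from the question (word columns only).
df <- data.frame(
  negative = c("limits", "slow", "", "", "concerns"),
  positive = c("agree,available",
               "strongly, agree",        # space after the comma added for illustration
               "management",
               "",
               "strongly,agree,better,"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on commas, trim whitespace, drop empty
# strings, then count each distinct word.
count_words_base <- function(x) {
  words <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  words <- words[nzchar(words)]
  counts <- table(words)
  data.frame(words = names(counts), n = as.vector(counts),
             stringsAsFactors = FALSE)
}

pos_counts <- count_words_base(df$positive)
neg_counts <- count_words_base(df$negative)
print(pos_counts)
print(neg_counts)
```

    The words and n columns play the same role as in df.pos and df.neg, so they can be handed to wordcloud() in the same way.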
      


        2
  •  0
  •   Tyler Rinker DaniM    7 years ago

    sentimentr already stores the word counts as an attribute of the result (see attributes(sentiment_words)$counts). The documentation for extract_sentiment_terms shows examples: https://github.com/trinker/sentimentr/blob/master/R/extract_sentiment_terms.R

    library(sentimentr)
    library(wordcloud)
    library(data.table)
    
    set.seed(10)
    x <- get_sentences(sample(hu_liu_cannon_reviews[[2]], 1000, TRUE))
    sentiment_words <- extract_sentiment_terms(x)
    
    sentiment_counts <- attributes(sentiment_words)$counts
    sentiment_counts[polarity > 0,]
    
    par(mfrow = c(1, 3), mar = c(0, 0, 0, 0))
    ## Positive Words
    with(
        sentiment_counts[polarity > 0,],
        wordcloud(words = words, freq = n, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"), scale = c(4.5, .75)
        )
    )
    mtext("Positive Words", side = 3, padj = 5)
    
    ## Negative Words
    with(
        sentiment_counts[polarity < 0,],
        wordcloud(words = words, freq = n, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"), scale = c(4.5, 1)
        )
    )
    mtext("Negative Words", side = 3, padj = 5)
    
    sentiment_counts[, 
        color := ifelse(polarity > 0, 'red', 
            ifelse(polarity < 0, 'blue', 'gray70')
        )]
    
    ## Together
    with(
        sentiment_counts[polarity != 0,],
        wordcloud(words = words, freq = n, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.35,
              colors = color, ordered.colors = TRUE, scale = c(5, .75)
        )
    )
    mtext("Positive (red) & Negative (blue) Words", side = 3, padj = 5)
    
