
Putting CountVectorizer results into a pandas.DataFrame

countvectorizer text-mining dataframe pandas python

Sina · Tech Community · 5 years ago

I need to build a pandas.DataFrame from the feature matrix produced by CountVectorizer.

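import pandas
from sklearn.feature_extraction.text import CountVectorizer

# text and train_x are my collections of documents (not shown here)
count_vect = CountVectorizer()
count_vect.fit(text)

xtrain_count = count_vect.transform(train_x)  # scipy.sparse CSR matrix
SaveTxt = pandas.DataFrame()
SaveTxt['text'] = xtrain_count  # this assignment raises the ValueError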

The line SaveTxt['text']=xtrain_count gives me the following error:

 raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

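How should I put the matrix returned by CountVectorizer into a DataFrame? The result is a csr_matrix with about 20,000 rows and 200,000 columns, and its entries are integers (1 to 6).

1 answer | as of 5 years ago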

-1

E_net4 Tunn · 5 years ago

pd.DataFrame(my_csr_matrix.todense())

Here is a proof of concept:

import random

import lorem
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

m = 10
random.seed(0)
data = [lorem.paragraph() for _ in range(m)]

cv = CountVectorizer()
cv.fit(data)

df = pd.DataFrame(data=cv.transform(data).todense())

print(df.shape)
print(df.head())

(10, 27)
   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26
0  1  2  2  3  3  0  2  0  3  1   2   2   2   1   1   5   3   2   1   3   1   0   2   2   1   4   4
1  0  0  4  1  0  0  1  3  0  3   2   0   1   0   1   1   1   5   3   2   0   0   1   0   0   3   1
2  0  2  3  1  1  1  2  0  2  0   1   1   1   1   1   3   2   0   1   2   1   4   3   0   1   2   5
3  3  3  4  7  1  2  4  2  2  0   1   2   1   1   0   0   0   2   1   3   2   2   2   2   0   3   4
4  2  3  1  2  3  4  1  1  4  3   2   4   2   2   3   3   2   0   2   3   2   5   4   3   2   1   2
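
One caveat for a matrix of the size mentioned in the question: densifying about 20,000 × 200,000 entries would need roughly 32 GB if each entry were stored as a 64-bit integer, which is an assumption about the dtype. A possible alternative, sketched below, is to keep the data sparse with pandas' `DataFrame.sparse.from_spmatrix`. The sample documents and the names `docs`, `terms`, and `sparse_df` are made up for illustration and are not from the original post.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, standing in for the asker's train_x.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

cv = CountVectorizer()
counts = cv.fit_transform(docs)  # scipy.sparse matrix, one row per document

# Column labels: vocabulary terms ordered by their column index.
terms = sorted(cv.vocabulary_, key=cv.vocabulary_.get)

# Build a DataFrame whose columns are sparse arrays; zeros are not stored.
sparse_df = pd.DataFrame.sparse.from_spmatrix(counts, columns=terms)

print(sparse_df.shape)
print(sparse_df.sparse.density)  # fraction of entries that are non-zero
```

Whether this is practical with 200,000 columns depends on the pandas version and on what is done with the frame afterwards; many operations densify the data implicitly, so it may be simpler to keep working with the csr_matrix directly and only convert small slices.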
