
Error when trying to tokenize text in Keras?

  • rmahesh  · 6 years ago

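    I am very new to Keras and deep learning, but I am following an online guide and trying to tokenize my text so that I can access its "shape" to use as the "input_shape" when creating the layers of my neural network. Here is my code so far: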

    df = pd.read_csv(pathname, encoding = "ISO-8859-1")
    df = df[['content_cleaned', 'meaningful']]
    df = df.sample(frac=1)
    
    #Transposed columns into numpy arrays 
    X = np.asarray(df[['content_cleaned']])
    y = np.asarray(df[['meaningful']])
    
    #Split into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21) 
    
    # Create tokenizer
    tokenizer = Tokenizer(num_words=100) #No row has more than 100 words.
    
    #Tokenize the predictors (text)
    X_train = np.concatenate(tokenizer.sequences_to_matrix(int(X_train), mode="binary"))
    X_test = np.concatenate(tokenizer.sequences_to_matrix(int(X_test), mode="binary"))
    
    #Convert the labels to the binary
    encoder = LabelBinarizer()
    encoder.fit(y_train) 
    y_train = encoder.transform(y_train)
    y_test = encoder.transform(y_test)
    

    The error is raised on this line:

    X_train = tokenizer.sequences_to_matrix(int(X_train), mode="binary")
    

    The error message is:

    TypeError: only length-1 arrays can be converted to Python scalars
    

    Can anyone spot my mistake and possibly suggest a solution? I am still new to this and have not been able to solve it.

    Any help would be great!

    1 answer

  • sdcbr  · 6 years ago

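    You are trying to convert a numpy array to a Python integer, which is not possible and produces the error (the error has nothing to do with Keras). What you actually want to do is change the dtype of that numpy array to int. Try the following:

    X_train.astype(np.int32)

    instead of

    int(X_train)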
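To make the difference concrete, here is a minimal sketch using only NumPy. The small numeric array is a hypothetical stand-in for `X_train`, not the asker's data. It shows that `int()` fails on a multi-element array, while `astype` returns a new array with the requested dtype. Note that `astype` does not change the array in place, so the result has to be assigned back. It also only applies when the array already holds numeric values; an array of raw text strings would first have to be turned into integer sequences, for example with the tokenizer's `fit_on_texts` and `texts_to_sequences` methods.

```python
import numpy as np

# Hypothetical stand-in for X_train: a 2-D array of numeric values.
X_train = np.array([[1.0], [2.0], [3.0]])

# int() accepts a single value only, so a multi-element array raises TypeError.
try:
    int(X_train)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# astype returns a new array with the requested dtype; the original is unchanged.
X_train_int = X_train.astype(np.int32)
print(X_train_int.dtype, X_train_int.shape)
```

In the question's code this would mean writing `X_train = X_train.astype(np.int32)` before passing the array on, assuming its contents are numeric at that point.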