
CancelledError: [_Derived_]RecvAsync is cancelled

  •  ARAT · asked 5 years ago

    I have a problem. I run the same code on my local machine with a CPU and TensorFlow 1.14.0, and it works fine. However, when I run it on a GPU with TensorFlow 2.0, I get

    CancelledError:  [_Derived_]RecvAsync is cancelled.      [[{{node Adam/Adam/update/AssignSubVariableOp/_65}}]]   [[Reshape_13/_62]] [Op:__inference_distributed_function_3722]
    
    Function call stack: distributed_function
    

    Reproducible code is below:

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from tensorflow import keras
    print(tf.__version__)
    
    import matplotlib.pyplot as plt
    %matplotlib inline
    
    batch_size = 32
    num_obs = 100
    num_cats = 1 # number of categorical features
    n_steps = 10 # number of timesteps in each sample
    n_numerical_feats = 18 # number of numerical features in each sample
    cat_size = 12 # number of unique categories in each categorical feature
    embedding_size = 1 # embedding dimension for each categorical feature
    
    labels =  np.random.random(size=(num_obs*n_steps,1)).reshape(-1,n_steps,1)
    print(labels.shape)
    #(100, 10, 1)
    
    #18 numerical features
    num_data = np.random.random(size=(num_obs*n_steps,n_numerical_feats))
    print(num_data.shape)
    #(1000, 18)
    #Reshaping numeric features to fit into an LSTM network
    features = num_data.reshape(-1,n_steps, n_numerical_feats)
    print(features.shape)
    #(100, 10, 18)
    
    #one categorical variable with 12 levels
    cat_data = np.random.randint(0,cat_size,num_obs*n_steps)
    print(cat_data.shape)
    #(1000,)
    idx = cat_data.reshape(-1, n_steps)
    print(idx.shape)
    #(100, 10)
    
    numerical_inputs = keras.layers.Input(shape=(n_steps, n_numerical_feats), name='numerical_inputs', dtype='float32')
    #<tf.Tensor 'numerical_inputs:0' shape=(None, 10, 18) dtype=float32>
    
    cat_input = keras.layers.Input(shape=(n_steps,), name='cat_input')
    #<tf.Tensor 'cat_input:0' shape=(None, 10) dtype=float32>
    
    cat_embedded = keras.layers.Embedding(cat_size, embedding_size, embeddings_initializer='uniform')(cat_input)
    #<tf.Tensor 'embedding_1/Identity:0' shape=(None, 10, 1) dtype=float32>
    
    merged = keras.layers.concatenate([numerical_inputs, cat_embedded])
    #<tf.Tensor 'concatenate_1/Identity:0' shape=(None, 10, 19) dtype=float32>
    
    lstm_out = keras.layers.LSTM(64, return_sequences=True)(merged)
    #<tf.Tensor 'lstm_2/Identity:0' shape=(None, 10, 64) dtype=float32>
    
    Dense_layer1 = keras.layers.Dense(32, activation='relu', use_bias=True)(lstm_out)
    #<tf.Tensor 'dense_4/Identity:0' shape=(None, 10, 32) dtype=float32>
    Dense_layer2 = keras.layers.Dense(1, activation='linear', use_bias=True)(Dense_layer1 )
    #<tf.Tensor 'dense_5/Identity:0' shape=(None, 10, 1) dtype=float32>
    
    model = keras.models.Model(inputs=[numerical_inputs, cat_input], outputs=Dense_layer2)
    
    #compile model
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    EPOCHS =5
    
    #fit the model
    #you can use input layer names instead
    history = model.fit([features, idx], 
                        y = labels,
                        epochs=EPOCHS,
                        batch_size=batch_size)
    

    Has anyone run into a similar problem? This appears to be a bug, but I don't know how to work around it, and I would like to keep using TensorFlow 2.0.

  •  yao he · answered 5 years ago

    I found that tensorflow-gpu 2.0.0 is built against cuDNN 7.6.0.

    So I updated cuDNN from 7.4.2 to 7.6.4, and the problem was solved.

    1. Update cuDNN to at least 7.6.2;
    2. Set TF_FORCE_GPU_ALLOW_GROWTH=true to force TensorFlow to allow GPU memory growth (a minimal sketch follows this list).
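
    For reference, here is a minimal sketch of the second workaround, assuming TensorFlow 2.0 and at least one visible GPU. The environment variable has to be set before TensorFlow initializes the GPU; tf.config.experimental.set_memory_growth is the equivalent in-code switch.

    import os
    # Must be set before TensorFlow initializes the GPU context
    os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

    import tensorflow as tf

    # Equivalent in-code approach for TF 2.0: enable memory growth per GPU
    # so TensorFlow does not grab all GPU memory up front.
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)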