
Keras with TensorFlow GPU completely freezes the PC

  • Rocketq  ·  asked 6 years ago

    I have a fairly simple LSTM network architecture. After 1-2 epochs of training, my PC freezes completely; I can't even move the mouse:

    Layer (type)                 Output Shape              Param #   
    =================================================================
    lstm_4 (LSTM)                (None, 128)               116224    
    _________________________________________________________________
    dropout_3 (Dropout)          (None, 128)               0         
    _________________________________________________________________
    dense_5 (Dense)              (None, 98)                12642     
    =================================================================
    Total params: 128,866
    Trainable params: 128,866
    Non-trainable params: 0
    
        # Same problem with a 2-layer LSTM with dropout and the Adam optimizer

        from keras.models import Sequential
        from keras.layers import LSTM, Dropout, Dense
        from keras.optimizers import RMSprop

        SEQUENCE_LENGTH = 3  # len(chars) == 98
        model = Sequential()
        model.add(LSTM(128, input_shape = (SEQUENCE_LENGTH, len(chars))))
        #model.add(Dropout(0.15))
        #model.add(LSTM(128))
        model.add(Dropout(0.10))
        model.add(Dense(len(chars), activation = 'softmax'))

        model.compile(loss = 'categorical_crossentropy', optimizer = RMSprop(lr=0.01), metrics=['accuracy'])
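For context, the inputs `X` and `y` for a character-level model like this are one-hot encoded windows over a text. A minimal sketch of that preprocessing (the corpus below is a placeholder, not from the question, where `len(chars)` is 98):

```python
import numpy as np

SEQUENCE_LENGTH = 3
text = "hello world"                       # placeholder corpus for illustration
chars = sorted(set(text))                  # the question's char set has 98 entries
char_to_idx = {c: i for i, c in enumerate(chars)}

# Slide a window of SEQUENCE_LENGTH chars over the text; each window
# predicts the character that follows it.
sequences, next_chars = [], []
for i in range(len(text) - SEQUENCE_LENGTH):
    sequences.append(text[i:i + SEQUENCE_LENGTH])
    next_chars.append(text[i + SEQUENCE_LENGTH])

# One-hot encode: X is (samples, SEQUENCE_LENGTH, len(chars)), y is (samples, len(chars)).
X = np.zeros((len(sequences), SEQUENCE_LENGTH, len(chars)), dtype=np.bool_)
y = np.zeros((len(sequences), len(chars)), dtype=np.bool_)
for i, seq in enumerate(sequences):
    for t, c in enumerate(seq):
        X[i, t, char_to_idx[c]] = 1
    y[i, char_to_idx[next_chars[i]]] = 1

print(X.shape, y.shape)
```

These `X` and `y` arrays are what the `model.fit` call below consumes.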
    

    This is how I train it:

    history = model.fit(X, y, validation_split=0.20, batch_size=128, epochs=10, shuffle=True,verbose=2).history
    

    The network needs about 5 minutes per epoch. A larger batch size does not make the freeze happen any sooner. A more complex model can train for longer and reaches roughly the same accuracy, about 0.46 (full code here).

    I'm running the latest Linux Mint with a GTX 1070 Ti 8GB and 32GB of RAM:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 396.26 Driver Version: 396.26 |
    |-------------------------------+----------------------+----------------------+
    | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
    |===============================+======================+======================|
    | 0 GeForce GTX 107... Off | 00000000:08:00.0 On | N/A |
    | 0% 35C P8 10W / 180W | 303MiB / 8116MiB | 0% Default |
    +-------------------------------+----------------------+----------------------+
    

    Libraries:

    Keras==2.2.0
    Keras-Applications==1.0.2
    Keras-Preprocessing==1.0.1
    keras-sequential-ascii==0.1.1
    keras-tqdm==2.0.1
    tensorboard==1.8.0
    tensorflow==1.0.1
    tensorflow-gpu==1.8.0
    

    I tried limiting GPU memory usage, but that shouldn't be the problem here, since training only consumes about 1 GB of GPU memory:

    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.9
    config.gpu_options.allow_growth = True
    set_session(tf.Session(config=config))
    

    What is going wrong here, and how can I fix it?

    2 Answers
  •   Snehal  ·  6 years ago
    • You have both tensorflow==1.0.1 (the CPU build) and tensorflow-gpu==1.8.0 installed. The mismatched versions can conflict; uninstall the CPU tensorflow package so that only tensorflow-gpu==1.8.0 remains (see here).
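A quick stdlib way to spot this kind of conflict (a sketch using importlib.metadata, Python 3.8+; not part of the original answer):

```python
from importlib.metadata import version, PackageNotFoundError

def installed(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

cpu = installed("tensorflow")
gpu = installed("tensorflow-gpu")
if cpu and gpu and cpu != gpu:
    # Having both builds at different versions is exactly the asker's situation.
    print("Conflicting TensorFlow builds: tensorflow==%s vs tensorflow-gpu==%s" % (cpu, gpu))
```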

    • Replace LSTM with CuDNNLSTM so the recurrent layer runs on the GPU's cuDNN kernel. Note that CuDNNLSTM behaves like LSTM with recurrent_activation='sigmoid', rather than the Keras default 'hard_sigmoid', so results can differ slightly.
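A minimal sketch of that suggestion, assuming Keras 2.2 with a tensorflow-gpu backend (CuDNNLSTM was removed in TF 2.x, where the stock LSTM selects the cuDNN kernel automatically when its defaults allow):

```python
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dropout, Dense
from keras.optimizers import RMSprop

SEQUENCE_LENGTH, n_chars = 3, 98  # shapes taken from the question

model = Sequential()
# CuDNNLSTM runs entirely on the GPU via cuDNN; it corresponds to
# LSTM(recurrent_activation='sigmoid'), not the default 'hard_sigmoid'.
model.add(CuDNNLSTM(128, input_shape=(SEQUENCE_LENGTH, n_chars)))
model.add(Dropout(0.10))
model.add(Dense(n_chars, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.01),
              metrics=['accuracy'])
```

The rest of the training call (`model.fit(...)`) is unchanged; only the recurrent layer is swapped.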