
TensorFlow error when using buckets

  • jon  ·  8 years ago

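    I am trying to train a sequence-to-sequence model with TensorFlow. The tutorials say that bucketing helps speed up training. So far I can train with a single bucket, and I can also train on one GPU with multiple buckets using more or less out-of-the-box code. But when I try to use multiple buckets with multiple GPUs, I get this error: Invalid argument: You must feed a value for placeholder tensor 'gpu_scope_0/encoder50_gpu0' with dtype int32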

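    From the error, I can tell that I am not setting up input_feed correctly, so the graph expects input for the largest bucket size every time. What confuses me is why this happens, because the example I am adapting initializes the input_feed placeholders in the same way. As far as I can tell, the tutorial also creates placeholders up to the largest bucket size, yet this error does not occur when I use the tutorial's code.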
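
As a way to picture the error, here is a toy sketch in plain Python (not TensorFlow). It assumes that evaluating the loss of one bucket only needs the placeholders up to that bucket's size, while evaluating the losses of several buckets at once needs placeholders up to the largest of them. The bucket sizes and the helper `required_encoder_placeholders` are hypothetical and exist only for illustration.

```python
# Toy model of placeholder dependencies; this is not TensorFlow code.
buckets = [(5, 10), (10, 15), (20, 25)]  # hypothetical (encoder_size, decoder_size) pairs


def required_encoder_placeholders(bucket_ids, gpu=0):
    """Names of the encoder placeholders needed to evaluate the given buckets' losses."""
    largest = max(buckets[b][0] for b in bucket_ids)
    return ["encoder{0}_gpu{1}".format(i, gpu) for i in range(largest)]


# Fetching only bucket 0 needs 5 encoder placeholders.
one_bucket = required_encoder_placeholders([0])
# Fetching every bucket at once needs as many as the largest bucket.
all_buckets = required_encoder_placeholders(range(len(buckets)))
print(len(one_bucket), len(all_buckets))  # 5 20
```

If the fetch list passed to `session.run` ever contains an op from a bucket larger than the one being fed, a "must feed a value for placeholder" error of this kind would be the expected result.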

    Here is the initialization code I believe is relevant:

        self.encoder_inputs = [[] for _ in xrange(self.num_gpus)]
        self.decoder_inputs = [[] for _ in xrange(self.num_gpus)]
        self.target_weights = [[] for _ in xrange(self.num_gpus)]
        self.scope_prefix = "gpu_scope"
        for j in xrange(self.num_gpus):
            with tf.device("/gpu:%d" % (self.gpu_offset + j)):
                with tf.name_scope('%s_%d' % (self.scope_prefix, j)) as scope:
                    for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
                        self.encoder_inputs[j].append(tf.placeholder(tf.int32, shape=[None],
                                                                     name="encoder{0}_gpu{1}".format(i,j)))
                    for i in xrange(buckets[-1][1] + 1):
                        self.decoder_inputs[j].append(tf.placeholder(tf.int32, shape=[None],
                                                                     name="decoder{0}_gpu{1}".format(i,j)))
                        self.target_weights[j].append(tf.placeholder(tf.float32, shape=[None],
                                                                     name="weight{0}_gpu{1}".format(i,j)))
    
        # Our targets are decoder inputs shifted by one.
        self.losses = []
        self.outputs = []
    
        # The following loss computation creates the neural network. The specified
        # device hosts the trainable tf parameters.
        bucket = buckets[0]
        i = 0
        with tf.device(param_device):
            output, loss = tf.nn.seq2seq.model_with_buckets(self.encoder_inputs[i], self.decoder_inputs[i],
                                                            [self.decoder_inputs[i][k + 1] for k in
                                                             xrange(len(self.decoder_inputs[i]) - 1)],
                                                            self.target_weights[0], buckets,
                                                            lambda x, y: seq2seq_f(x, y, True),
                                                            softmax_loss_function=self.softmax_loss_function)
    
        bucket = buckets[0]
        self.encoder_states = []
        with tf.device('/gpu:%d' % self.gpu_offset):
            with variable_scope.variable_scope(variable_scope.get_variable_scope(),
                                               reuse=True):
                self.encoder_outputs, self.encoder_states = get_encoder_outputs(self,
                                                                                self.encoder_inputs[0])
    
        if not forward_only:
            self.grads = []
        print ("past line 297")
        done_once = False
        for i in xrange(self.num_gpus):
            with tf.device("/gpu:%d" % (self.gpu_offset + i)):
                with tf.name_scope("%s_%d" % (self.scope_prefix, i)) as scope:
                    with variable_scope.variable_scope(variable_scope.get_variable_scope(), reuse=True):
                        #for j, bucket in enumerate(buckets):
                        output, loss = tf.nn.seq2seq.model_with_buckets(self.encoder_inputs[i],
                                                                        self.decoder_inputs[i],
                                                                        [self.decoder_inputs[i][k + 1] for k in
                                                                         xrange(len(self.decoder_inputs[i]) - 1)],
                                                                        self.target_weights[i], buckets,
                                                                        lambda x, y: seq2seq_f(x, y, True),
                                                                        softmax_loss_function=self.softmax_loss_function)
    
                        self.losses.append(loss)
                        self.outputs.append(output)
    
    
        # Training outputs and losses.
        if forward_only:
            self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
                self.encoder_inputs, self.decoder_inputs,
                [self.decoder_inputs[0][k + 1] for k in xrange(buckets[0][1])],
                self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
                softmax_loss_function=self.softmax_loss_function)
            # If we use output projection, we need to project outputs for decoding.
            if self.output_projection is not None:
                for b in xrange(len(buckets)):
                    self.outputs[b] = [
                        tf.matmul(output, self.output_projection[0]) + self.output_projection[1]
                        for output in self.outputs[b]
                        ]
        else:
            self.bucket_grads = []
            self.gradient_norms = []
            params = tf.trainable_variables()
            opt = tf.train.GradientDescentOptimizer(self.learning_rate)
            self.updates = []
            with tf.device(aggregation_device):
                for g in xrange(self.num_gpus):
                    for b in xrange(len(buckets)):
                        gradients = tf.gradients(self.losses[g][b], params)
                        clipped_grads, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
                        self.gradient_norms.append(norm)
                        self.updates.append(
                            opt.apply_gradients(zip(clipped_grads, params), global_step=self.global_step))
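
In the training branch above, `self.updates` and `self.gradient_norms` are flat lists filled by a nested loop over GPUs and then buckets. The sketch below, in plain Python with hypothetical sizes and a hypothetical helper `flat_index`, shows where the entry for a given (GPU, bucket) pair would sit in such a list. It is an illustration of the list layout only, not a claim about what the fix should be.

```python
num_gpus = 2      # hypothetical values for illustration
num_buckets = 3

# Mirrors the nested loop that appends to self.updates: GPU-major order.
updates = [(g, b) for g in range(num_gpus) for b in range(num_buckets)]


def flat_index(gpu, bucket_id):
    """Position in the flat list of the entry built for (gpu, bucket_id)."""
    return gpu * num_buckets + bucket_id


bucket_id = 1
# Indexing with bucket_id alone lands on an entry that belongs to GPU 0.
first_gpu_only = updates[bucket_id]
# One entry per GPU for the same bucket.
per_gpu = [updates[flat_index(g, bucket_id)] for g in range(num_gpus)]
print(first_gpu_only, per_gpu)  # (0, 1) [(0, 1), (1, 1)]
```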
    

    The relevant code for feeding the data is:

        input_feed = {}
        for i in xrange(self.num_gpus):
            for l in xrange(encoder_size):
                input_feed[self.encoder_inputs[i][l].name] = encoder_inputs[i][l]
            for l in xrange(decoder_size):
                input_feed[self.decoder_inputs[i][l].name] = decoder_inputs[i][l]
                input_feed[self.target_weights[i][l].name] = target_weights[i][l]
    
            # Since our targets are decoder inputs shifted by one, we need one more.
            last_target = self.decoder_inputs[i][decoder_size].name
            input_feed[last_target] = np.zeros([self.batch_size], dtype=np.int32)
    
            last_weight = self.target_weights[i][decoder_size].name
            input_feed[last_weight] = np.zeros([self.batch_size], dtype=np.float32)
        # Output feed: depends on whether we do a backward step or not.
    
        if not forward_only:
            output_feed = [self.updates[bucket_id], self.gradient_norms[bucket_id], self.losses[bucket_id]]
        else:
            output_feed = [self.losses[bucket_id]]  # Loss for this batch.
            for l in xrange(decoder_size):  # Output logits.
                output_feed.append(self.outputs[0][l])
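
One thing that may be worth checking in the training path: the initialization code appends one `model_with_buckets` result per GPU to `self.losses`, so the first index of that list would be the GPU and the second the bucket. The sketch below uses plain strings as stand-ins for the loss tensors (the names and sizes are hypothetical) to show what indexing by `bucket_id` alone would select under that assumption.

```python
num_gpus, num_buckets = 2, 3  # hypothetical sizes

# Stand-in for self.losses: one inner list per GPU, one entry per bucket.
losses = [["loss_gpu{0}_bucket{1}".format(g, b) for b in range(num_buckets)]
          for g in range(num_gpus)]

bucket_id = 1
# The first index is the GPU, so this is every bucket's loss on GPU 1.
by_first_index = losses[bucket_id]
# Selecting the same bucket on each GPU instead.
per_gpu_losses = [losses[g][bucket_id] for g in range(num_gpus)]

print(by_first_index)
print(per_gpu_losses)
```

If the real list has this shape, fetching `self.losses[bucket_id]` would ask for the losses of all buckets on one GPU, including the largest bucket, which could explain a request for a placeholder that was never fed. The snippet alone cannot confirm that this is the cause.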
    

    Right now I am considering padding every input up to the bucket size, but I expect that would give up some of the benefits of bucketing.
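
To make the trade-off concrete, here is a minimal sketch in plain Python, assuming "the bucket size" means the largest bucket. It compares padding a sequence to the smallest bucket that fits with padding it to the largest bucket. The bucket sizes, `PAD_ID`, and the helpers `pick_bucket` and `pad_encoder_input` are hypothetical, and the simple right-padding here is not necessarily how the tutorial pads.

```python
PAD_ID = 0  # hypothetical padding token id
buckets = [(5, 10), (10, 15), (20, 25)]  # hypothetical (encoder_size, decoder_size) pairs


def pick_bucket(source_len, target_len):
    """Index of the smallest bucket that fits both lengths, or None if none fits."""
    for bucket_id, (enc_size, dec_size) in enumerate(buckets):
        # Assumption: the decoder side keeps one slot free for a start symbol.
        if source_len <= enc_size and target_len < dec_size:
            return bucket_id
    return None


def pad_encoder_input(tokens, size):
    """Right-pad a token list with PAD_ID up to `size`."""
    return tokens + [PAD_ID] * (size - len(tokens))


source = [4, 8, 15, 16, 23, 42, 7]  # 7 tokens
bucket_id = pick_bucket(len(source), target_len=9)
padded = pad_encoder_input(source, buckets[bucket_id][0])
padded_to_largest = pad_encoder_input(source, buckets[-1][0])
print(bucket_id, len(padded), len(padded_to_largest))  # 1 10 20
```

In this example the sequence is unrolled for 10 encoder steps in its own bucket but for 20 steps when padded to the largest one, which is the extra work that bucketing is meant to avoid.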

    1 answer

  • jon  ·  8 years ago