
How do I update the learning rate of a two-layer multi-layer perceptron?

  • alvas  ·  Tech Community  ·  6 years ago

    Given the XOR problem:

    X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
    Y = xor_output = np.array([[0,1,1,0]]).T
    

    and a simple

    • two-layer multi-layer perceptron (MLP)
    • sigmoid activations between the layers
    • mean squared error (MSE) as the loss function / optimization criterion

    [code]:

    def sigmoid(x): # Squashes values into the range (0, 1).
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(sx): # For backpropagation.
        # See https://math.stackexchange.com/a/1225116
        return sx * (1 - sx)
    
    # Cost functions.
    def mse(predicted, truth):
        return np.sum(np.square(truth - predicted))
    
    X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
    Y = xor_output = np.array([[0,1,1,0]]).T
    
    # Define the shape of the weight vector.
    num_data, input_dim = X.shape
    # Let's set the dimensions for the intermediate layer.
    hidden_dim = 5
    # Initialize weights between the input layers and the hidden layer.
    W1 = np.random.random((input_dim, hidden_dim))
    
    # Define the shape of the output vector. 
    output_dim = len(Y.T)
    # Initialize weights between the hidden layers and the output layer.
    W2 = np.random.random((hidden_dim, output_dim))
    

    with the stopping criterion being a fixed number of epochs (passes through X and Y) and a fixed learning rate of 0.3:

    # Training hyperparameters.
    num_epochs = 10000
    learning_rate = 0.3
    

    When I run the forward and backward propagation and update the weights each epoch, how should I be updating the weights?

    I tried simply adding the product of the learning rate with the dot product of the backpropagated derivative and the layer outputs, but the model still only updated the weights in one direction, causing all of the weights to degrade towards zero.

    for epoch_n in range(num_epochs):
        layer0 = X
        # Forward propagation.
    
        # Inside the perceptron, Step 2. 
        layer1 = sigmoid(np.dot(layer0, W1))
        layer2 = sigmoid(np.dot(layer1, W2))
    
        # Back propagation (Y -> layer2)
    
        # How much did we miss in the predictions?
        layer2_error = mse(layer2, Y)
    
        #print(layer2_error)
        # In what direction is the target value?
        # Were we really close? If so, don't change too much.
        layer2_delta = layer2_error * sigmoid_derivative(layer2)
    
        # Back propagation (layer2 -> layer1)
        # How much did each layer1 value contribute to the layer2 error (according to the weights)?
        layer1_error = np.dot(layer2_delta, W2.T)
        layer1_delta = layer1_error * sigmoid_derivative(layer1)
    
        # update weights
        W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
        W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
        #print(np.dot(layer0.T, layer1_delta))
        #print(epoch_n, list((layer2)))
    
        # Log the loss value as we proceed through the epochs.
        losses.append(layer2_error.mean())
    

    How do I update the weights correctly?

    Full code:

    from itertools import chain
    import matplotlib.pyplot as plt
    import numpy as np
    np.random.seed(0)
    
    def sigmoid(x): # Squashes values into the range (0, 1).
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(sx):
        # See https://math.stackexchange.com/a/1225116
        return sx * (1 - sx)
    
    # Cost functions.
    def mse(predicted, truth):
        return np.sum(np.square(truth - predicted))
    
    X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
    Y = xor_output = np.array([[0,1,1,0]]).T
    
    # Define the shape of the weight vector.
    num_data, input_dim = X.shape
    # Let's set the dimensions for the intermediate layer.
    hidden_dim = 5
    # Initialize weights between the input layers and the hidden layer.
    W1 = np.random.random((input_dim, hidden_dim))
    
    # Define the shape of the output vector. 
    output_dim = len(Y.T)
    # Initialize weights between the hidden layers and the output layer.
    W2 = np.random.random((hidden_dim, output_dim))
    
    # Training hyperparameters.
    num_epochs = 10000
    learning_rate = 0.3
    
    losses = []
    
    for epoch_n in range(num_epochs):
        layer0 = X
        # Forward propagation.
    
        # Inside the perceptron, Step 2. 
        layer1 = sigmoid(np.dot(layer0, W1))
        layer2 = sigmoid(np.dot(layer1, W2))
    
        # Back propagation (Y -> layer2)
    
        # How much did we miss in the predictions?
        layer2_error = mse(layer2, Y)
    
        #print(layer2_error)
        # In what direction is the target value?
        # Were we really close? If so, don't change too much.
        layer2_delta = layer2_error * sigmoid_derivative(layer2)
    
        # Back propagation (layer2 -> layer1)
        # How much did each layer1 value contribute to the layer2 error (according to the weights)?
        layer1_error = np.dot(layer2_delta, W2.T)
        layer1_delta = layer1_error * sigmoid_derivative(layer1)
    
        # update weights
        W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
        W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
        #print(np.dot(layer0.T, layer1_delta))
        #print(epoch_n, list((layer2)))
    
        # Log the loss value as we proceed through the epochs.
        losses.append(layer2_error.mean())
    
    # Visualize the losses
    plt.plot(losses)
    plt.show()
    

    Am I missing anything?

    Perhaps I'm missing the derivative from the cost to the second layer?


    EDIT

    I realized that I was missing the partial derivative from the cost to the second layer, and after adding it:

    # Cost functions.
    def mse(predicted, truth):
        return 0.5 * np.sum(np.square(predicted - truth)).mean()
    
    def mse_derivative(predicted, truth):
        return predicted - truth
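
    As a sanity check (a minimal sketch, assuming the mse and mse_derivative defined just above and numpy imported as np), the analytic derivative can be compared against finite differences; the 0.5 factor is exactly what cancels the 2 from the square, leaving predicted - truth:

    # Finite-difference check: nudge each prediction slightly and compare the
    # resulting change in the loss against the analytic gradient.
    predicted = np.array([[0.2], [0.9], [0.7], [0.4]])
    truth = np.array([[0.0], [1.0], [1.0], [0.0]])

    eps = 1e-6
    numeric_grad = np.zeros_like(predicted)
    for i in range(len(predicted)):
        bumped = predicted.copy()
        bumped[i] += eps
        numeric_grad[i] = (mse(bumped, truth) - mse(predicted, truth)) / eps

    print(np.allclose(numeric_grad, mse_derivative(predicted, truth), atol=1e-5))  # True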
    

    and with the updated backpropagation loop across the epochs:

    for epoch_n in range(num_epochs):
        layer0 = X
        # Forward propagation.
    
        # Inside the perceptron, Step 2. 
        layer1 = sigmoid(np.dot(layer0, W1))
        layer2 = sigmoid(np.dot(layer1, W2))
    
        # Back propagation (Y -> layer2)
    
        # How much did we miss in the predictions?
        cost_error = mse(layer2, Y)
        cost_delta = mse_derivative(layer2, Y)
    
        #print(layer2_error)
        # In what direction is the target value?
        # Were we really close? If so, don't change too much.
        layer2_error = np.dot(cost_delta, cost_error)
        layer2_delta = layer2_error *  sigmoid_derivative(layer2)
    
        # Back propagation (layer2 -> layer1)
        # How much did each layer1 value contribute to the layer2 error (according to the weights)?
        layer1_error = np.dot(layer2_delta, W2.T)
        layer1_delta = layer1_error * sigmoid_derivative(layer1)
    
        # update weights
        W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
        W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
    

    It seems to train and learn the XOR...
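
    As a quick sanity check (a minimal sketch, assuming the trained W1, W2 and the helper functions from above), the final forward pass can be thresholded at 0.5:

    # Forward pass with the trained weights; if training converged this should
    # reproduce the XOR targets [0, 1, 1, 0].
    layer1 = sigmoid(np.dot(X, W1))
    layer2 = sigmoid(np.dot(layer1, W2))
    print((layer2 > 0.5).astype(int).ravel())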

    But now the question is whether layer2_error and layer2_delta are computed correctly, i.e. is the following part of the code correct?

    # How much did we miss in the predictions?
    cost_error = mse(layer2, Y)
    cost_delta = mse_derivative(layer2, Y)
    
    #print(layer2_error)
    # In what direction is the target value?
    # Were we really close? If so, don't change too much.
    layer2_error = np.dot(cost_delta, cost_error)
    layer2_delta = layer2_error *  sigmoid_derivative(layer2)
    

    Is it right to take a dot product of cost_delta and cost_error for the layer2_error? Or should layer2_error simply be equal to cost_delta?

    i.e.

    # How much did we miss in the predictions?
    cost_error = mse(layer2, Y)
    cost_delta = mse_derivative(layer2, Y)
    
    #print(layer2_error)
    # In what direction is the target value?
    # Were we really close? If so, don't change too much.
    layer2_error = cost_delta
    layer2_delta = layer2_error *  sigmoid_derivative(layer2)
    
    1 Answer

  •   kmario23  Mazdak  ·  6 years ago

    Yes, it is correct to multiply the residual (cost_error) with the delta values when updating the weights.

    However, whether or not you take a dot product doesn't really matter, because cost_error is a scalar, so a plain multiplication is enough. But we definitely have to multiply by the gradient of the cost function, because that is where the backpropagation starts (i.e., it is the entry point of the backward pass).
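
    Concretely, using the variable names from the question (a minimal sketch, not a rewrite of the full loop):

    cost_error = mse(layer2, Y)             # scalar loss value
    cost_delta = mse_derivative(layer2, Y)  # gradient of the cost w.r.t. layer2

    # np.dot with a scalar operand reduces to a plain multiplication, so this ...
    layer2_delta = np.dot(cost_delta, cost_error) * sigmoid_derivative(layer2)
    # ... points in the same direction as this, just rescaled by the scalar cost_error:
    layer2_delta = cost_delta * sigmoid_derivative(layer2)

    The extra scalar factor only rescales the step size and is effectively absorbed into the learning rate.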

    Also, the following function can be simplified:

    def mse(predicted, truth):
        return 0.5 * np.sum(np.square(predicted - truth)).mean()
    

    as

    def mse(predicted, truth):
        return 0.5 * np.mean(np.square(predicted - truth))
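
    Note that np.sum already returns a scalar, so the trailing .mean() in the original version is a no-op; the simplified version therefore also divides by the number of samples. That only rescales the loss and its gradient, which can be absorbed into the learning rate. A quick check with arbitrary values:

    predicted = np.array([[0.2], [0.9], [0.7], [0.4]])
    truth = np.array([[0.0], [1.0], [1.0], [0.0]])

    original   = 0.5 * np.sum(np.square(predicted - truth)).mean()  # 0.5 * sum of squared errors = 0.15
    simplified = 0.5 * np.mean(np.square(predicted - truth))        # 0.5 * mean of squared errors = 0.0375
    print(original / simplified)                                    # ratio == number of samples (4)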