
Choice of loss function

  •  0
  • blue-sky  · Technical Community  · 5 years ago

    Below is an implementation of word2vec:

    %reset -f
    
    import torch
    from torch.autograd import Variable
    import numpy as np
    import torch.nn.functional as F
    
    corpus = [
        'this test',
        'this separate test'
    ]
    
    def get_input_layer(word_idx):
        x = torch.zeros(vocabulary_size).float()
        x[word_idx] = 1.0
        return x
    
    def tokenize_corpus(corpus):
        tokens = [x.split() for x in corpus]
        return tokens
    
    tokenized_corpus = tokenize_corpus(corpus)
    
    vocabulary = []
    for sentence in tokenized_corpus:
        for token in sentence:
            if token not in vocabulary:
                vocabulary.append(token)
    
    word2idx = {w: idx for (idx, w) in enumerate(vocabulary)}
    idx2word = {idx: w for (idx, w) in enumerate(vocabulary)}
    
    window_size = 2
    idx_pairs = []
    # for each sentence
    for sentence in tokenized_corpus:
        indices = [word2idx[word] for word in sentence]
        # for each word, treated as the center word
        for center_word_pos in range(len(indices)):
            # for each window position
            for w in range(-window_size, window_size + 1):
                context_word_pos = center_word_pos + w
                # make sure we don't jump outside the sentence
                if context_word_pos < 0 or context_word_pos >= len(indices) or center_word_pos == context_word_pos:
                    continue
                context_word_idx = indices[context_word_pos]
                idx_pairs.append((indices[center_word_pos], context_word_idx))
    
    idx_pairs = np.array(idx_pairs) # it will be useful to have this as numpy array
    
    vocabulary_size = len(vocabulary)
    
    embedding_dims = 4
    W1 = Variable(torch.randn(embedding_dims, vocabulary_size).float(), requires_grad=True)
    W2 = Variable(torch.randn(vocabulary_size, embedding_dims).float(), requires_grad=True)
    num_epochs = 1
    learning_rate = 0.001
    
    for epo in range(num_epochs):
        loss_val = 0
        for data, target in idx_pairs:
            x = Variable(get_input_layer(data)).float()
            y_true = Variable(torch.from_numpy(np.array([target])).long())
    
            z1 = torch.matmul(W1, x)
            z2 = torch.matmul(W2, z1)
    
            log_softmax = F.log_softmax(z2, dim=0)
    
            loss = F.nll_loss(log_softmax.view(1,-1), y_true)
            print(float(loss))
    
            loss_val += loss.data.item()
            loss.backward()
            W1.data -= learning_rate * W1.grad.data
            W2.data -= learning_rate * W2.grad.data
    
            W1.grad.data.zero_()
            W2.grad.data.zero_()
    
            print(W1.shape)
            print(W2.shape)
    
            print(f'Loss at epo {epo}: {loss_val/len(idx_pairs)}')
    

    The output is:

    0.33185482025146484
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 0.041481852531433105
    3.302438735961914
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 0.45428669452667236
    2.3144636154174805
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 0.7435946464538574
    0.33418864011764526
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 0.7853682264685631
    1.0644199848175049
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 0.9184207245707512
    0.4970806837081909
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 0.980555810034275
    3.2861199378967285
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 1.3913208022713661
    6.170125961303711
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 2.16258654743433
    

    Modifying the code to use mse_loss, y_true is changed to float:

    y_true = Variable(torch.from_numpy(np.array([target])).float())
    

    and mse_loss is used:

    loss = F.mse_loss(log_softmax.view(1,-1), y_true)
    

    With these updates merged in:

    for epo in range(num_epochs):
        loss_val = 0
        for data, target in idx_pairs:
            x = Variable(get_input_layer(data)).float()
            y_true = Variable(torch.from_numpy(np.array([target])).float())
    
            z1 = torch.matmul(W1, x)
            z2 = torch.matmul(W2, z1)
    
            log_softmax = F.log_softmax(z2, dim=0)
    
            loss = F.mse_loss(log_softmax.view(1,-1), y_true)
            print(float(loss))
    
            loss_val += loss.data.item()
            loss.backward()
            W1.data -= learning_rate * W1.grad.data
            W2.data -= learning_rate * W2.grad.data
    
            W1.grad.data.zero_()
            W2.grad.data.zero_()
    
            print(W1.shape)
            print(W2.shape)
    
            print(f'Loss at epo {epo}: {loss_val/len(idx_pairs)}')
    

    The output is now:

    41.75048828125
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 5.21881103515625
    16.929386138916016
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 7.334984302520752
    50.63690948486328
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 13.664597988128662
    36.21110534667969
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 18.190986156463623
    5.304859638214111
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 18.854093611240387
    9.802173614501953
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 20.07936531305313
    15.515325546264648
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 22.018781006336212
    30.408292770385742
    torch.Size([4, 3])
    torch.Size([3, 4])
    Loss at epo 0: 25.81981760263443
    
    -c:12: UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
    

    Why does mse_loss not work as well as nll_loss here? Is it related to the PyTorch warning:

    Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
    

    ?

  •  1
  •   Michael Jungo  ·  5 years ago

    The sizes of the input and target have to be the same for nn.MSELoss, because it is calculated by squaring the difference between the i-th element of the input and the i-th element of the target, i.e. mse_i = (input_i - target_i) ** 2.
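
    A minimal sketch (my own illustration, not code from the answer) of what the warning in the question is about: with an input of shape (1, 3) and a target of shape (1,), F.mse_loss silently broadcasts the single target value against every input element, so the class index gets compared to all three log-probabilities at once. The tensor values below are made up purely for illustration.

    import torch
    import torch.nn.functional as F

    # Shapes as in the question: log_softmax.view(1, -1) is (1, 3), y_true is (1,)
    log_probs = torch.tensor([[-2.0, -0.5, -1.5]])   # illustrative log-softmax output
    y_true = torch.tensor([1.0])                      # the class index cast to float

    # Emits the UserWarning about mismatched sizes, then broadcasts y_true to (1, 3)
    loss = F.mse_loss(log_probs, y_true)

    # Equivalent manual computation after broadcasting:
    manual = ((log_probs - y_true) ** 2).mean()
    print(loss.item(), manual.item())   # the two values are identical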

    Furthermore, your targets are non-negative integers in the range [0, vocabulary_size], but you are using log softmax, whose values lie in the range [-∞, 0]. The idea behind using MSE is to drive the predicted value to be identical to the target, but the only overlap between those two intervals is 0. That means every class other than 0 is unreachable.
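
    To make the interval mismatch concrete, here is a small sketch of my own (not part of the original answer): log softmax can never produce a positive value, so a squared-error loss can only ever pull a prediction towards a target of 0, never towards class indices 1, 2, and so on.

    import torch
    import torch.nn.functional as F

    z2 = torch.randn(3)                    # arbitrary logits for a 3-word vocabulary
    log_probs = F.log_softmax(z2, dim=0)   # every entry is <= 0
    print(log_probs)                       # e.g. tensor([-1.2, -0.4, -2.0])

    # The targets in the question are class indices 0, 1, 2, ... (non-negative),
    # so only a target of 0 overlaps with the range of log softmax.
    for t in [0.0, 1.0, 2.0]:
        target = torch.full_like(log_probs, t)
        print(t, F.mse_loss(log_probs, target).item())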

    MSE is a regression loss and is simply not appropriate in this case, because you are dealing with categorical data.
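
    For completeness, a hedged sketch of the two directions one could go from here (my own addition, not part of the answer): keep a classification loss such as nll_loss or cross_entropy on the class index, or, if a squared-error objective is really wanted, compare probabilities against a one-hot target of the same shape.

    import torch
    import torch.nn.functional as F

    z2 = torch.randn(3)            # logits for a 3-word vocabulary
    target = torch.tensor([2])     # class index, as in the question

    # The usual classification losses; cross_entropy fuses log_softmax and nll_loss.
    loss_nll = F.nll_loss(F.log_softmax(z2, dim=0).view(1, -1), target)
    loss_ce = F.cross_entropy(z2.view(1, -1), target)
    print(loss_nll.item(), loss_ce.item())   # identical values

    # Hypothetical MSE-style variant: probabilities vs. a one-hot target of matching shape.
    one_hot = F.one_hot(target, num_classes=3).float()                  # shape (1, 3)
    loss_mse = F.mse_loss(F.softmax(z2, dim=0).view(1, -1), one_hot)
    print(loss_mse.item())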