
Dequantize values to their original values prior to quantization

  •  0
  • blue-sky  · 5 years ago

    The paper "Natural Language Processing with Small Feed-Forward Networks" https://arxiv.org/pdf/1708.00214.pdf states:

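    [Image not available. Per the code below and the answer, the formula is q_ij = floor(1/2 + e_ij / s_i + b), with bias b = 128 and per-row scale s_i = max_j |e_ij| / (b - 1).]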

    I have implemented quantization in Python following the formula above:

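    import math

    b = 128  # bias

    embedding_matrix = [[20000,3000,1000],[1999999,20000,1999999], [20000,3000,1000]]

    # One scale per row: s_i = max_j |e_ij| / (b - 1), rounded to 3 decimals
    scaled = [round(max(abs(v) for v in e) / (b - 1), 3) for e in embedding_matrix]

    print(scaled)

    # Quantize each value as floor(0.5 + e_ij / s_i + b), keeping (original, quantized) pairs
    quantized = []
    for e, s in zip(embedding_matrix, scaled):
        for v in e:
            quantized.append((v, math.floor(.5 + v / s + b)))

    quantized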
    

    Running this code, quantized is set to:

    [(20000, 255),
     (3000, 147),
     (1000, 134),
     (1999999, 255),
     (20000, 129),
     (1999999, 255),
     (20000, 255),
     (3000, 147),
     (1000, 134)]
    

    How can I dequantize back to the original values prior to quantization?

    Reading https://www.tensorflow.org/api_docs/python/tf/quantization/dequantize , which describes:

    tf.quantization.dequantize(
        input, min_range, max_range, mode='MIN_COMBINED', name=None, axis=None,
        narrow_range=False, dtype=tf.dtypes.float32
    )
    
    [min_range, max_range] are scalar floats that specify the range for the output. The 'mode' attribute controls exactly which calculations are used to convert the float values to their quantized equivalents.
    

    https://pytorch.org/docs/stable/quantization.html

    These appear to implement quantization differently from the implementation above?

  •  2
  •   Alexander Pivovarov    5 years ago

    What they do in the paper is roughly this:

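    import numpy as np

    b = 128

    embedding_matrix = np.array([[20000,3000,1000,1000],[1999999,20000,1999999,1999999], [20000,3000,1000,1000]])
    # One scale per row, kept as a column so it broadcasts across each row
    scales = (np.abs(embedding_matrix).max(axis=1) / (b-1)).reshape(-1, 1)
    # Adding 0.5 and truncating rounds to nearest; the result always lies in [1, 255]
    quantized = (embedding_matrix / scales + b + 0.5).astype(np.uint8)
    # Cast to a signed type first: uint8 - b would wrap around for codes below b
    dequantized = (quantized.astype(np.int32) - b) * scales
    print(quantized)
    print(dequantized)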
    

    [[255 147 134 134]
     [255 129 255 255]
     [255 147 134 134]]
    [[2.00000000e+04 2.99212598e+03 9.44881890e+02 9.44881890e+02]
     [1.99999900e+06 1.57480236e+04 1.99999900e+06 1.99999900e+06]
     [2.00000000e+04 2.99212598e+03 9.44881890e+02 9.44881890e+02]]
    

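    In short, they just have q_ij = round(e_ij / s_i + b), so once you have only the quantized value q_ij, the best approximation is to assume q_ij = dequantized_ij / s_i + b, which gives dequantized_ij = (q_ij - b) * s_i. The rounding step is lossy, so this recovers each value only to within about half a scale step (s_i / 2).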
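    For the list-based code in the question, the same relation can be sketched in plain Python. This is only an illustration under a few assumptions: the helper names `row_scale`, `quantize_row`, and `dequantize_row` are made up here, the scale is not rounded to 3 decimals, and every row is assumed to contain at least one non-zero value (otherwise the scale would be zero).

```python
import math

B = 128  # bias b from the formula


def row_scale(row):
    """Per-row scale s_i = max_j |e_ij| / (b - 1); assumes a non-zero row."""
    return max(abs(v) for v in row) / (B - 1)


def quantize_row(row):
    """Return (scale, codes) with q_ij = floor(0.5 + e_ij / s_i + b)."""
    s = row_scale(row)
    return s, [math.floor(0.5 + v / s + B) for v in row]


def dequantize_row(scale, codes):
    """Invert the affine map: e_ij is approximately (q_ij - b) * s_i."""
    return [(q - B) * scale for q in codes]


embedding_matrix = [[20000, 3000, 1000], [1999999, 20000, 1999999], [20000, 3000, 1000]]
for row in embedding_matrix:
    s, codes = quantize_row(row)
    restored = dequantize_row(s, codes)
    # Largest absolute round-trip error in this row
    worst = max(abs(orig - rec) for orig, rec in zip(row, restored))
    print(codes, [round(v, 2) for v in restored], "max error:", round(worst, 2))
```

    Note that the scale has to be stored alongside the codes for each row; without it the integer codes cannot be mapped back.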

    As for PyTorch, similar functionality is available with torch.quantize_per_channel; e.g., the following code does essentially the same thing:

    import torch
    t = torch.tensor(embedding_matrix, dtype=torch.float32)
    zero_point = torch.tensor([b]).repeat(t.shape[0], 1).reshape(-1)
    quantized_tensor = torch.quantize_per_channel(t, t.abs().max(axis=1)[0] / (b-1), zero_point, 0, torch.quint8)
    print(quantized_tensor)
    print(quantized_tensor.int_repr())
    

    Output:

    tensor([[2.0000e+04, 2.9921e+03, 9.4488e+02, 9.4488e+02],
            [2.0000e+06, 1.5748e+04, 2.0000e+06, 2.0000e+06],
            [2.0000e+04, 2.9921e+03, 9.4488e+02, 9.4488e+02]], size=(3, 4),
           dtype=torch.quint8, quantization_scheme=torch.per_channel_affine,
           scale=tensor([  157.4803, 15748.0234,   157.4803], dtype=torch.float64),
           zero_point=tensor([128, 128, 128]), axis=0)
    tensor([[255, 147, 134, 134],
            [255, 129, 255, 255],
            [255, 147, 134, 134]], dtype=torch.uint8)
    

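    If you quantize per channel like this in PyTorch, you can apply .dequantize() only to the full tensor, not to a slice of it, which is inconvenient for embeddings. You can, however, dequantize manually quite easily using int_repr, q_per_channel_zero_points, and q_per_channel_scales.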
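    A minimal sketch of that manual route, assuming the same matrix and b = 128 as above. The helper name `dequantize_row` is hypothetical, and the zero points are built with `torch.full` here instead of the `repeat`/`reshape` used in the answer:

```python
import torch

b = 128
t = torch.tensor(
    [[20000, 3000, 1000, 1000],
     [1999999, 20000, 1999999, 1999999],
     [20000, 3000, 1000, 1000]],
    dtype=torch.float32,
)
# One scale and one zero point per row (channel axis 0)
scales = t.abs().max(dim=1).values / (b - 1)
zero_points = torch.full((t.shape[0],), b, dtype=torch.int64)
q = torch.quantize_per_channel(t, scales, zero_points, 0, torch.quint8)


def dequantize_row(q, i):
    """Dequantize only row i: (code - zero_point) * scale."""
    codes = q.int_repr()[i].to(torch.float64)
    scale = q.q_per_channel_scales()[i]
    zero_point = q.q_per_channel_zero_points()[i]
    return (codes - zero_point) * scale


print(dequantize_row(q, 1))   # a single embedding row
print(q.dequantize()[1])      # same row taken from the fully dequantized tensor
```

    This touches only the integer codes of the requested row, so an embedding lookup would not need to dequantize the whole matrix.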

    Does this answer your question?
