
Converting a GPT2 h5 model to torch for conversion to ggml: shape mismatch

  • Twenkid  · Tech Community  · 1 year ago

    I want to convert a .h5 GPT2-medium model to ggml. More details: https://github.com/ggerganov/ggml/issues/745

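    The process uses a script from the ggml library, which calls the conversion function. There is one small difference in the model: it was created with a vocab size of 50255, while the original GPT2 uses 50257. I did fix that by adding padding in the script at the point where that layer is read.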

    Something like this:

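    import tensorflow as tf

    @tf.function
    def eager_f(symbolic_weight):
      # Pad the embedding matrix from 50255 to 50257 rows (the original GPT2 vocab size).
      # Note: plain print() inside a tf.function runs when the function is traced.
      print("PAD????", symbolic_weight.shape[0]*symbolic_weight.shape[1])
      paddings = tf.constant([[0, 2], [0, 0]])  # add 2 zero rows after dim 0, nothing on dim 1
      symbolic_weight = tf.pad(symbolic_weight, paddings, "constant", 0)
      print(symbolic_weight.shape)
      return symbolic_weight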
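
    The hard-coded `[0, 2]` only fits this one checkpoint. As a minimal sketch, the same `paddings` argument could be derived from the two vocabulary sizes instead. `make_row_paddings` is a hypothetical helper name, not part of the ggml script or of transformers, and it only builds the list that would then be handed to `tf.pad`.

```python
def make_row_paddings(current_rows, target_rows, ndim=2):
    """Build a tf.pad-style paddings list that appends rows at the end of dim 0.

    Hypothetical helper for illustration; each entry is [pad_before, pad_after].
    """
    if target_rows < current_rows:
        raise ValueError("target_rows must be >= current_rows")
    if ndim < 1:
        raise ValueError("ndim must be >= 1")
    # Only dim 0 (the vocabulary axis) grows; every other axis is left alone.
    paddings = [[0, target_rows - current_rows]]
    paddings += [[0, 0] for _ in range(ndim - 1)]
    return paddings


# Sizes taken from the question: the checkpoint has 50255 rows, GPT2 expects 50257.
paddings = make_row_paddings(50255, 50257)
print(paddings)  # [[0, 2], [0, 0]]
```

    Raising on a negative difference avoids silently passing a negative padding when the checkpoint is larger than the target.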
    

    This is in the script modeling_tf_utils.py:

    def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
     #(...)
     if saved_weight_value is not None:
                            print("saved_weight_value=",saved_weight_value)
                            print(saved_weight_value.shape)
                            # Check if the shape of the current weight and the one from the H5 file are different
                            print("SAVED_WEIGHT")
                            print(saved_weight_value)
                            print(saved_weight_value.shape)
                            if saved_weight_value.shape[0] == 50255:
                               saved_weight_value = eager_f(saved_weight_value)
                               print("AFTER PADDING SAVED_WEIGHT:")
                               print(saved_weight_value)
                               print(saved_weight_value.shape)
                               ss = input("Press a key...")
    
    It gets through reading the TF model and then crashes when it starts mapping it to PyTorch, with a gross shape mismatch: [50257, 1024] (TF) against [1024, 1024].
    
    (...)
     K.int_shape(symbolic_weight)= (1024,)
    Traceback (most recent call last):
      File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
        model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
        model, loading_info = load_tf2_checkpoint_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
        return load_tf2_model_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
        return load_tf2_weights_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
        return load_tf2_state_dict_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
        missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
      File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for GPT2Model:
            size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
    

    …/transformers/modeling_tf_utils.py

    def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
        mismatched_layers = []
    
        # Read the H5 file
        with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file:
            # Retrieve the name of each layer from the H5 file
            saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names"))
     ...
    

    Judging by the printout produced while the TF model is read, the second layer does have this shape ([1024, 1024]).

    I found two threads about other mismatch problems with GPT2 conversion, but they seem to be different cases, and they are older.

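    Also, judging by the last error log, the conversion seems to apply the same [50257, 1024] shape/tensor to many other parameters: [1024, 1024], [1024], [1024, 3072], [3072], ..., [1024, 4096], [4096, 1024] ... Either it fails to advance some pointer on the TF side, or it tries to carry on despite the mismatch. I don't know; I haven't studied that part of the code yet.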
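
    The shapes in the error log follow a regular pattern per block. As a rough sketch, they can be generated from the hidden size (1024 in this model) so that each loaded tensor can be compared with what the PyTorch side expects. The function names and the dictionary layout are assumptions made for illustration; the parameter names and shapes are copied from the `size mismatch` lines in the log.

```python
def expected_block_shapes(layer, n_embd=1024):
    """Shapes the PyTorch GPT2Model reports for one transformer block.

    Hypothetical helper; the values mirror the 'shape in current model'
    entries in the error log (n_embd=1024, so 3072 and 4096 follow).
    """
    prefix = f"h.{layer}."
    return {
        prefix + "ln_1.weight": (n_embd,),
        prefix + "ln_1.bias": (n_embd,),
        prefix + "attn.c_attn.weight": (n_embd, 3 * n_embd),
        prefix + "attn.c_attn.bias": (3 * n_embd,),
        prefix + "attn.c_proj.weight": (n_embd, n_embd),
        prefix + "attn.c_proj.bias": (n_embd,),
        prefix + "ln_2.weight": (n_embd,),
        prefix + "ln_2.bias": (n_embd,),
        prefix + "mlp.c_fc.weight": (n_embd, 4 * n_embd),
        prefix + "mlp.c_fc.bias": (4 * n_embd,),
        prefix + "mlp.c_proj.weight": (4 * n_embd, n_embd),
        prefix + "mlp.c_proj.bias": (n_embd,),
    }


def find_mismatches(loaded_shapes, expected_shapes):
    """Return {name: (loaded, expected)} for every name whose shapes differ."""
    return {
        name: (loaded_shapes.get(name), expected)
        for name, expected in expected_shapes.items()
        if loaded_shapes.get(name) != expected
    }


expected = expected_block_shapes(0)
# Simulate what the log reports: every parameter arrives as (50257, 1024).
loaded = {name: (50257, 1024) for name in expected}
mismatches = find_mismatches(loaded, expected)
print(len(mismatches), "of", len(expected), "parameters mismatch")
```

    With real data, `loaded` would be filled from the shapes of the converted tensors just before `load_state_dict` is called, which would show whether the wrong shape is already present at that point.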

    <method-wrapper '__repr__' of TFGPT2MainLayer object at 0x7fee7e878460>
    SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/wte/embeddings:0' shape=(50255, 1024) dtype=float32, numpy=
    array([[ 0.00544963, -0.01376201,  0.00010876, ..., -0.03386341,
             0.00794204,  0.02500119],    
           ...,
             0.01859283,  0.01723549]], dtype=float32)>
    (50257, 1024)
    SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/wpe/embeddings:0' shape=(1024, 1024) dtype=float32, numpy=
    array([[ 0.02799516,  0.02006585, -0.0060562 , ...,  0.00939397,
          ...
             0.00648996, -0.0052477 ]], dtype=float32)>
    (1024, 1024)
    SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/gamma:0' shape=(1024,) dtype=float32, numpy=array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)>
    (1024,)
    SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/beta:0' shape=(1024,) dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>
    (1024,)
    SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/weight:0' shape=(1024, 3072) dtype=float32, 
    ...
    SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/bias:0' shape=(1, 3072) dtype=float32, numpy=array([[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>
    (1, 3072)
    SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_proj/weight:0' shape=(1024, 1024) dtype=float32, numpy=
    
    (1024,)
    K.int_shape(symbolic_weight)= (1024,)
    Traceback (most recent call last):
      File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
        model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
        model, loading_info = load_tf2_checkpoint_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
        return load_tf2_model_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
        return load_tf2_weights_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
        return load_tf2_state_dict_in_pytorch_model(
      File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
        missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
      File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for GPT2Model:
            size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.0.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.0.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.0.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.0.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.0.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.0.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.0.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.0.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.0.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.0.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.1.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.1.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.1.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.1.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.1.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.1.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.1.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.1.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.1.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.1.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.1.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.1.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.2.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.2.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.2.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.2.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.2.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.2.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.2.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.2.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.2.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.2.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.2.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.2.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.3.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.3.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.3.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.3.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.3.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.3.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.3.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.3.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.3.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.3.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.3.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.3.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.4.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.4.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.4.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.4.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.4.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.4.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.4.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.4.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.4.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.4.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.4.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.4.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.5.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.5.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.5.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.5.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.5.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.5.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.5.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.5.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.5.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.5.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.5.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.5.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.6.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.6.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.6.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.6.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.6.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.6.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.6.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.6.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.6.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.6.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.6.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.6.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.7.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.7.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.7.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.7.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.7.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.7.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.7.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.7.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.7.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.7.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.7.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.7.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.8.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.8.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.8.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.8.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.8.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.8.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.8.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.8.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.8.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
            size mismatch for h.8.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
            size mismatch for h.8.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
            size mismatch for h.8.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.9.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.9.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.9.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
            size mismatch for h.9.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
            size mismatch for h.9.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
            size mismatch for h.9.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.9.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.9.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
            size mismatch for h.9.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        (...)
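
    Since every `size mismatch` line follows the same template, the log can be summarized rather than read line by line. The sketch below is one way to do that, assuming the message format shown in the traceback; `MISMATCH_RE`, `parse_shape` and `summarize_mismatches` are names made up for this illustration. If only one checkpoint-side shape shows up, the same tensor shape is being offered for every parameter.

```python
import re
from collections import Counter

# Matches one "size mismatch" line as printed by load_state_dict in the log.
MISMATCH_RE = re.compile(
    r"size mismatch for (?P<name>\S+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[^\]]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[^\]]*)\]\)"
)


def parse_shape(text):
    """Turn '50257, 1024' into (50257, 1024)."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def summarize_mismatches(log_text):
    """Count how often each checkpoint-side shape appears in the error lines."""
    counts = Counter()
    for match in MISMATCH_RE.finditer(log_text):
        counts[parse_shape(match.group("ckpt"))] += 1
    return counts


# Three lines copied from the error log in the question.
sample_log = """
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
"""

print(summarize_mismatches(sample_log))  # Counter({(50257, 1024): 3})
```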
    