Converting a GPT-2 .h5 model to PyTorch, in order to convert it to ggml: shape mismatch

gpt-2 pytorch tensorflow python

Twenkid · Tech Community · 1 year ago

I want to convert a .h5 GPT-2 medium model to ggml. More details: https://github.com/ggerganov/ggml/issues/745

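The process uses a script from the ggml library, which calls a conversion function. There is one small difference in the model: it was created with a vocab size of 50255, whereas the original GPT-2 uses 50257. I did fix that by adding padding in the script at the point where that layer is read.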

Something like this:

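import tensorflow as tf

# Note: despite its name, @tf.function traces this into a graph function.
@tf.function
def eager_f(symbolic_weight):
  # Pad the token-embedding matrix with two zero rows: 50255 -> 50257.
  print("PAD????", symbolic_weight.shape[0] * symbolic_weight.shape[1])
  paddings = tf.constant([[0, 2], [0, 0]])  # add 2 rows after dim 0, none on dim 1
  symbolic_weight = tf.pad(symbolic_weight, paddings, mode="CONSTANT", constant_values=0)
  print(symbolic_weight.shape)
  return symbolic_weight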
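
As an aside, the hard-coded `2` in the paddings could be derived from the shapes instead. The helper below is only a sketch: `row_padding` is a hypothetical name, not part of TensorFlow or transformers, and it only builds the `[[before, after], ...]` list that `tf.pad` and `numpy.pad` accept.

```python
def row_padding(current_shape, target_rows):
    """Build a paddings spec that appends rows along axis 0 only.

    Hypothetical helper for illustration; it performs no padding itself.
    """
    rows = current_shape[0]
    if rows > target_rows:
        raise ValueError(f"tensor has {rows} rows, more than the target {target_rows}")
    # [before, after] for axis 0, then [0, 0] for every remaining axis.
    return [[0, target_rows - rows]] + [[0, 0] for _ in current_shape[1:]]


print(row_padding((50255, 1024), 50257))  # [[0, 2], [0, 0]]
```

With this, a tensor that already has 50257 rows gets an all-zero spec, so the same call can be used unconditionally.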

This is in the script modeling_tf_utils.py:

def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    # (...)
    if saved_weight_value is not None:
        print("saved_weight_value=", saved_weight_value)
        print(saved_weight_value.shape)
        # Check if the shape of the current weight and the one from the H5 file are different
        print("SAVED_WEIGHT")
        print(saved_weight_value)
        print(saved_weight_value.shape)
        if saved_weight_value.shape[0] == 50255:
            # Added hack: pad the 50255-row embedding up to 50257 rows
            saved_weight_value = eager_f(saved_weight_value)
            print("AFTER PADDING SAVED_WEIGHT:")
            print(saved_weight_value)
            print(saved_weight_value.shape)
            ss = input("Press a key...")

It gets through reading the TF model, then crashes when it starts mapping it to PyTorch, with a gross shape mismatch: [50257, 1024] (TF) versus [1024, 1024].
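
One way to narrow this down before `load_state_dict` raises might be to compare name-to-shape mappings from both sides. This is a minimal sketch with toy shapes copied from the log in this post; `find_shape_mismatches` is a hypothetical helper, and how the two mappings get built from the real models is left as an assumption (for the PyTorch side, something like `{k: tuple(v.shape) for k, v in pt_model.state_dict().items()}`).

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Yield (name, checkpoint_shape, model_shape) for every parameter that
    appears in both mappings with a different shape."""
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            yield name, tuple(ckpt_shape), tuple(model_shape)


# Toy shapes taken from the error message, not real tensors.
model_shapes = {"wte.weight": (50257, 1024), "wpe.weight": (1024, 1024)}
checkpoint_shapes = {"wte.weight": (50257, 1024), "wpe.weight": (50257, 1024)}

for row in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(row)  # ('wpe.weight', (50257, 1024), (1024, 1024))
```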

```python
(...)
 K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
  File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
    model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
    model, loading_info = load_tf2_checkpoint_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
    return load_tf2_model_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
    return load_tf2_weights_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
    return load_tf2_state_dict_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
    missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
  File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
        size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
```

…/transformers/modeling_tf_utils.py

def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    mismatched_layers = []

    # Read the H5 file
    with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file:
        # Retrieve the name of each layer from the H5 file
        saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names"))
 ...

Judging from the output printed while the TF model is read, the second layer has this shape.

I looked at two other threads about GPT-2 conversion mismatches, but they appear to be different cases, and they are older.

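Also, judging by the last error log, the conversion seems to apply the same [50257, 1024] shape/tensor to many of the other parameters, whose shapes are [1024, 1024], [1024], [1024, 3072], [3072], ..., [1024, 4096], [4096, 1024], ... Either it fails to advance some pointer on the TF side, or it tries to advance and fails because of the mismatch. I don't know; I haven't studied this part of the code yet.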
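
To check that impression without reading the whole log by eye, the `size mismatch` lines can be parsed and grouped by the checkpoint-side shape. This is a sketch only: the regular expression is written against the message format as it appears in the log in this post, and `summarize_mismatches` is a hypothetical helper name.

```python
import re
from collections import Counter

# One "size mismatch" line, as printed in the RuntimeError in this post.
MISMATCH_RE = re.compile(
    r"size mismatch for (?P<name>\S+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def _shape(text):
    # "50257, 1024" -> (50257, 1024); "1024" -> (1024,)
    return tuple(int(part) for part in text.split(",") if part.strip())


def summarize_mismatches(log_text):
    """Return (rows, counts): one (name, ckpt_shape, model_shape) row per
    mismatch line, plus a Counter of the checkpoint-side shapes."""
    rows = [
        (m["name"], _shape(m["ckpt"]), _shape(m["model"]))
        for m in MISMATCH_RE.finditer(log_text)
    ]
    return rows, Counter(ckpt for _, ckpt, _ in rows)


sample = """
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
"""
rows, by_ckpt_shape = summarize_mismatches(sample)
print(len(rows), dict(by_ckpt_shape))  # 2 {(50257, 1024): 2}
```

If the counter holds a single key for the full log, every mismatched parameter received a tensor of the same shape, which would be consistent with one array being reused for all of them.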

<method-wrapper '__repr__' of TFGPT2MainLayer object at 0x7fee7e878460>
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/wte/embeddings:0' shape=(50255, 1024) dtype=float32, numpy=
array([[ 0.00544963, -0.01376201,  0.00010876, ..., -0.03386341,
         0.00794204,  0.02500119],    
       ...,
         0.01859283,  0.01723549]], dtype=float32)>
(50257, 1024)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/wpe/embeddings:0' shape=(1024, 1024) dtype=float32, numpy=
array([[ 0.02799516,  0.02006585, -0.0060562 , ...,  0.00939397,
      ...
         0.00648996, -0.0052477 ]], dtype=float32)>
(1024, 1024)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/gamma:0' shape=(1024,) dtype=float32, numpy=array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/beta:0' shape=(1024,) dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/weight:0' shape=(1024, 3072) dtype=float32, 
...
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/bias:0' shape=(1, 3072) dtype=float32, numpy=array([[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>
(1, 3072)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_proj/weight:0' shape=(1024, 1024) dtype=float32, numpy=

(1024,)
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
  File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
    model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
    model, loading_info = load_tf2_checkpoint_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
    return load_tf2_model_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
    return load_tf2_weights_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
    return load_tf2_state_dict_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
    missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
  File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
        size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.0.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.0.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.0.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.0.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.0.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.0.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.0.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.0.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.0.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.0.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.1.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.1.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.1.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.1.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.1.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.1.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.1.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.1.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.1.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.1.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.1.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.1.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.2.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.2.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.2.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.2.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.2.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.2.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.2.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.2.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.2.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.2.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.2.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.2.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.3.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.3.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.3.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.3.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.3.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.3.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.3.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.3.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.3.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.3.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.3.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.3.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.4.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.4.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.4.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.4.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.4.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.4.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.4.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.4.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.4.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.4.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.4.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.4.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.5.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.5.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.5.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.5.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.5.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.5.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.5.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.5.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.5.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.5.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.5.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.5.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.6.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.6.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.6.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.6.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.6.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.6.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.6.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.6.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.6.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.6.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.6.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.6.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.7.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.7.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.7.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.7.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.7.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.7.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.7.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.7.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.7.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.7.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.7.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.7.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.8.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.8.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.8.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.8.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.8.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.8.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.8.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.8.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.8.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for h.8.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for h.8.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for h.8.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.9.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.9.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.9.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
        size mismatch for h.9.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for h.9.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for h.9.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.9.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.9.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for h.9.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
    (...)
