I want to convert a .h5 GPT-2 medium model to ggml. More details:
https://github.com/ggerganov/ggml/issues/745
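The process uses a script from the ggml library, which calls the conversion function. The model has one small difference: it was created with a vocab size of 50255, while the original GPT-2 uses 50257. I did work around that by adding padding in the script when that layer is read.
Something like this: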
import tensorflow as tf

@tf.function
def eager_f(symbolic_weight):
    print("PAD????", symbolic_weight.shape[0]*symbolic_weight.shape[1])
    paddings = tf.constant([[0, 2], [0, 0]])  # add 2 rows at the end of dim 0
    symbolic_weight = tf.pad(symbolic_weight, paddings, "constant", 0)
    print(symbolic_weight.shape)
    return symbolic_weight
This is in the script modeling_tf_utils.py:
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    #(...)
    if saved_weight_value is not None:
        print("saved_weight_value=", saved_weight_value)
        print(saved_weight_value.shape)
        # Check if the shape of the current weight and the one from the H5 file are different
        print("SAVED_WEIGHT")
        print(saved_weight_value)
        print(saved_weight_value.shape)
        if saved_weight_value.shape[0] == 50255:
            saved_weight_value = eager_f(saved_weight_value)
            print("AFTER PADDING SAVED_WEIGHT:")
            print(saved_weight_value)
            print(saved_weight_value.shape)
            ss = input("Press a key...")
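For comparison, the same padding step can be sketched without TensorFlow. This is only a minimal sketch, assuming the saved weight is available as a NumPy array; `pad_vocab_rows` and its `source_vocab` / `target_vocab` parameters are hypothetical names, not part of transformers or ggml. It pads only a 2-D array whose row count equals the short vocab size and returns every other tensor unchanged. The added rows are zeros, so the two extra token ids get untrained embeddings.

```python
import numpy as np

def pad_vocab_rows(weight, source_vocab=50255, target_vocab=50257):
    """Zero-pad the rows of a 2-D embedding matrix from source_vocab to target_vocab."""
    weight = np.asarray(weight)
    # Only touch the 2-D token-embedding matrix; biases, layer norms and
    # other matrices are returned unchanged.
    if weight.ndim != 2 or weight.shape[0] != source_vocab:
        return weight
    extra_rows = target_vocab - source_vocab
    # pad_width is ((before, after) for axis 0, (before, after) for axis 1)
    return np.pad(weight, ((0, extra_rows), (0, 0)), mode="constant", constant_values=0)

# Small stand-in sizes so the example runs instantly
# (5 -> 7 rows instead of 50255 -> 50257).
wte = np.ones((5, 4), dtype=np.float32)
wpe = np.ones((3, 4), dtype=np.float32)
ln_gamma = np.ones((4,), dtype=np.float32)

padded_wte = pad_vocab_rows(wte, source_vocab=5, target_vocab=7)
same_wpe = pad_vocab_rows(wpe, source_vocab=5, target_vocab=7)
same_gamma = pad_vocab_rows(ln_gamma, source_vocab=5, target_vocab=7)

print(padded_wte.shape, same_wpe.shape, same_gamma.shape)
```

The explicit `ndim` check is a design choice: a check on `shape[0]` alone would also match a 1-D tensor that happened to have 50255 elements.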
It gets through reading the TF model, then crashes when it starts mapping the weights to PyTorch, with a gross shape mismatch: [50257, 1024] (TF) versus [1024, 1024].
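One way to narrow down a mismatch like this is to compare the loaded shapes against the expected ones name by name before anything is copied. The sketch below is an illustration only: `find_shape_mismatches` is a hypothetical helper, and the shapes in the two dictionaries are copied from the error output in this post rather than read from a real checkpoint.

```python
def find_shape_mismatches(loaded_shapes, expected_shapes):
    """Return (name, loaded_shape, expected_shape) for every parameter whose shapes differ."""
    mismatches = []
    for name, expected in expected_shapes.items():
        loaded = loaded_shapes.get(name)
        if loaded is None:
            continue  # missing keys are a separate problem; skip them here
        if tuple(loaded) != tuple(expected):
            mismatches.append((name, tuple(loaded), tuple(expected)))
    return mismatches

# Shapes copied from the error message, for illustration only.
expected = {
    "wte.weight": (50257, 1024),
    "wpe.weight": (1024, 1024),
    "h.0.ln_1.weight": (1024,),
}
loaded = {
    "wte.weight": (50257, 1024),
    "wpe.weight": (50257, 1024),
    "h.0.ln_1.weight": (50257, 1024),
}

for name, got, want in find_shape_mismatches(loaded, expected):
    print(f"{name}: loaded {got}, expected {want}")
```

With real objects, the dictionaries could be built with something like `{k: tuple(v.shape) for k, v in state_dict.items()}` on the PyTorch side. If every mismatched entry reports the same loaded shape, as here, that suggests one tensor is being reused for all names rather than each weight being wrong on its own.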
```python
(...)
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
```
…/transformers/modeling_tf_utils.py
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    mismatched_layers = []
    # Read the H5 file
    with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file:
        # Retrieve the name of each layer from the H5 file
        saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names"))
        ...
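Judging from the output printed during the pass over the TF model, the second layer does have this shape.
I looked at two threads about other GPT-2 conversion mismatch problems, but they seem to be different cases, and older.
Also, judging from the last error log, the conversion seems to apply the same [50257, 1024] shape/tensor to many of the others: [1024, 1024], [1024], [1024, 3072], [3072], ..., [1024, 4096], [4096, 1024] ... Either it did not advance some pointer on the TF side, or it tried to advance and the mismatch threw it off. I don't know; I haven't studied that part of the code yet.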
<method-wrapper '__repr__' of TFGPT2MainLayer object at 0x7fee7e878460>
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/wte/embeddings:0' shape=(50255, 1024) dtype=float32, numpy=
array([[ 0.00544963, -0.01376201, 0.00010876, ..., -0.03386341,
0.00794204, 0.02500119],
...,
0.01859283, 0.01723549]], dtype=float32)>
(50257, 1024)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/wpe/embeddings:0' shape=(1024, 1024) dtype=float32, numpy=
array([[ 0.02799516, 0.02006585, -0.0060562 , ..., 0.00939397,
...
0.00648996, -0.0052477 ]], dtype=float32)>
(1024, 1024)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/gamma:0' shape=(1024,) dtype=float32, numpy=array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/beta:0' shape=(1024,) dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/weight:0' shape=(1024, 3072) dtype=float32,
...
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/bias:0' shape=(1, 3072) dtype=float32, numpy=array([[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>
(1, 3072)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_proj/weight:0' shape=(1024, 1024) dtype=float32, numpy=
(1024,)
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.0.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.0.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.0.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.0.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.1.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.1.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.1.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.1.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.1.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.1.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.2.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.2.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.2.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.2.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.2.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.2.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.3.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.3.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.3.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.3.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.3.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.3.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.4.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.4.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.4.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.4.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.4.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.4.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.5.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.5.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.5.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.5.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.5.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.5.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.6.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.6.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.6.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.6.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.6.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.6.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.7.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.7.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.7.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.7.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.7.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.7.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.8.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.8.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.8.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.8.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.8.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.8.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.9.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.9.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.9.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
(...)