I am working on a sequence-to-sequence encoder-decoder model with a bidirectional GRU for Arabic grammatical error detection and correction, and I want to compute my model's F0.5 score.
This is how I split my data:
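To be precise, by F0.5 I mean the precision-weighted F-measure, i.e.

F0.5 = (1 + 0.5^2) * precision * recall / (0.5^2 * precision + recall)

so precision counts more than recall.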
train_data, valid_data, test_data = torchtext.legacy.data.TabularDataset.splits(
    path = '',
    train = 'train.csv',
    validation = 'val.csv',
    test = 'test.csv',
    format = 'csv',
    fields = fields)
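For context, I build the vocab and iterators from these splits in the usual torchtext.legacy way, roughly like the sketch below (the SRC/TRG field names, batch size, and sort key are placeholders, not my exact setup):

SRC.build_vocab(train_data, min_freq = 2)
TRG.build_vocab(train_data, min_freq = 2)

train_iterator, valid_iterator, test_iterator = torchtext.legacy.data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size = 64,
    sort_within_batch = True,
    sort_key = lambda x: len(x.src),
    device = device)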
And this is my Seq2Seq code:

import random
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, src_pad_idx, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_pad_idx = src_pad_idx
        self.device = device

    def create_mask(self, src):
        # mask out padding positions in the source: [batch_size, src_len]
        mask = (src != self.src_pad_idx).permute(1, 0)
        return mask

    def forward(self, src, src_len, trg, teacher_forcing_ratio = 0.5):
        batch_size = src.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # tensor to hold the decoder output for every time step
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)

        encoder_outputs, hidden = self.encoder(src, src_len)

        # first decoder input is the <sos> token
        input = trg[0, :]
        mask = self.create_mask(src)

        for t in range(1, trg_len):
            output, hidden, _ = self.decoder(input, hidden, encoder_outputs, mask)
            outputs[t] = output

            # feed either the gold token or the model's own prediction
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.argmax(1)
            input = trg[t] if teacher_force else top1

        return outputs
I tried using sklearn, but I think my model's output, a [trg_len, batch_size, trg_vocab_size] tensor, is not in the shape its metric functions expect.
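Concretely, what I have in mind is something like the sketch below: run the model with teacher forcing turned off, take the argmax over the vocabulary dimension, flatten predictions and references, drop the padding positions, and only then call sklearn's fbeta_score with beta = 0.5. The names test_iterator and TRG_PAD_IDX are placeholders for my test iterator and the target field's pad index, and I am assuming batch.src is a (tensor, lengths) pair since the encoder takes src_len.

import torch
from sklearn.metrics import fbeta_score

def evaluate_f05(model, iterator, trg_pad_idx):
    model.eval()
    all_preds, all_refs = [], []
    with torch.no_grad():
        for batch in iterator:
            src, src_len = batch.src
            trg = batch.trg                  # [trg_len, batch_size]

            # turn teacher forcing off at evaluation time
            output = model(src, src_len, trg, teacher_forcing_ratio = 0)
            # output: [trg_len, batch_size, trg_vocab_size]

            preds = output.argmax(-1)        # [trg_len, batch_size]

            # drop the first time step (<sos> / all-zero row) and flatten
            preds = preds[1:].reshape(-1)
            refs = trg[1:].reshape(-1)

            # ignore padding positions so they do not inflate the score
            keep = refs != trg_pad_idx
            all_preds.append(preds[keep].cpu())
            all_refs.append(refs[keep].cpu())

    y_pred = torch.cat(all_preds).numpy()
    y_true = torch.cat(all_refs).numpy()
    return fbeta_score(y_true, y_pred, beta = 0.5, average = 'micro')

f05 = evaluate_f05(model, test_iterator, TRG_PAD_IDX)

Is this the right way to do it? I am also not sure whether a token-level F0.5 like this is comparable to the edit-based F0.5 that GEC benchmarks compute with the M2 scorer or ERRANT.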