Text Generation: The Evolution from Seq2Seq to GPT

张开发
2026/5/11 20:21:01 · 15 min read


## 1. Technical Analysis

### 1.1 Evolution of Text Generation

Text generation has evolved from rule-based methods to deep learning:

- Rule templates: fill-in-the-blank template generation
- Statistical language models: n-gram
- Neural language models: RNN/LSTM
- Transformer: GPT/T5

### 1.2 Model Comparison

| Model | Architecture | Characteristics | Representative model |
| --- | --- | --- | --- |
| RNN/LSTM | Recurrent structure | Sequence modeling | Seq2Seq |
| Transformer | Attention mechanism | Parallel computation | GPT |
| T5 | Unified framework | Multi-task | T5 |
| BERT | Bidirectional encoding | Understanding-oriented | BERT |

### 1.3 Decoding Strategies

- Greedy: pick the highest-probability token at each step
- Beam Search: keep multiple candidate sequences
- Sampling: sample randomly from the distribution
- Top-K: restrict sampling to the K most likely tokens
- Top-P (Nucleus): restrict sampling by a cumulative-probability threshold

## 2. Core Implementations

### 2.1 RNN Text Generation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNGenerator(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=2, end_token=None):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        # end_token was referenced in generate() but never set; make it explicit
        self.end_token = end_token

    def forward(self, x, hidden=None):
        x = self.embedding(x)
        output, hidden = self.lstm(x, hidden)
        logits = self.fc(output)
        return logits, hidden

    def generate(self, start_token, max_len=100, temperature=1.0):
        self.eval()
        generated = [start_token]
        hidden = None
        for _ in range(max_len):
            # feed back only the last token; the LSTM state carries the history
            input_ids = torch.tensor([generated[-1]]).unsqueeze(0)
            with torch.no_grad():
                logits, hidden = self.forward(input_ids, hidden)
            logits = logits.squeeze() / temperature
            probabilities = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probabilities, num_samples=1).item()
            generated.append(next_token)
            if next_token == self.end_token:
                break
        return generated
```
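The strategies listed in §1.3 differ only in how the next token is chosen from the same logits vector. A minimal, self-contained sketch on a hand-picked toy distribution (the logit values are illustrative, not produced by any model):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])  # toy next-token scores

# Greedy: always take the argmax.
greedy = torch.argmax(logits).item()

# Top-K: keep the K largest logits, mask the rest, then renormalize.
k = 2
v, _ = torch.topk(logits, k)
topk_logits = logits.clone()
topk_logits[topk_logits < v[-1]] = float('-inf')
topk_probs = F.softmax(topk_logits, dim=-1)

# Top-P (nucleus): keep the smallest prefix whose probability mass exceeds p.
p = 0.8
sorted_logits, sorted_idx = torch.sort(logits, descending=True)
cum = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
remove = cum > p
remove[1:] = remove[:-1].clone()  # shift so the token crossing the threshold is kept
remove[0] = False
nucleus_logits = logits.clone()
nucleus_logits[sorted_idx[remove]] = float('-inf')
nucleus_probs = F.softmax(nucleus_logits, dim=-1)

print(greedy)                             # 0
print((topk_probs > 0).sum().item())      # 2 candidates survive
print((nucleus_probs > 0).sum().item())   # 3 candidates survive
```

Greedy is deterministic, while Top-K and Top-P both shrink the candidate set before sampling; Top-P adapts the set size to how peaked the distribution is.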
### 2.2 Transformer Text Generation

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding (referenced but not defined in the original)."""
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x):  # x: (seq_len, batch, d_model)
        return x + self.pe[:x.size(0)]

class TransformerGenerator(nn.Module):
    def __init__(self, vocab_size, d_model=512, num_heads=8, d_ff=2048, num_layers=6, end_token=None):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.positional_encoding = PositionalEncoding(d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, num_heads, d_ff)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.fc = nn.Linear(d_model, vocab_size)
        self.end_token = end_token

    def forward(self, tgt, memory=None, tgt_mask=None):
        tgt = self.embedding(tgt) * math.sqrt(self.embedding.embedding_dim)
        tgt = self.positional_encoding(tgt)
        if memory is None:
            # nn.TransformerDecoder requires a memory tensor; a zero placeholder
            # makes decoder-only generation runnable
            memory = torch.zeros(1, tgt.size(1), tgt.size(2), device=tgt.device)
        output = self.decoder(tgt, memory, tgt_mask=tgt_mask)
        output = self.fc(output)
        return output

    def generate(self, start_token, max_len=100, temperature=1.0, top_k=50):
        self.eval()
        generated = [start_token]
        for _ in range(max_len):
            input_ids = torch.tensor([generated]).T  # (seq_len, 1)
            tgt_mask = nn.Transformer.generate_square_subsequent_mask(len(input_ids)).to(input_ids.device)
            with torch.no_grad():
                logits = self.forward(input_ids, tgt_mask=tgt_mask)
            logits = logits[-1, 0] / temperature  # last position, single batch item
            if top_k > 0:
                v, _ = torch.topk(logits, top_k)
                logits[logits < v[-1]] = float('-inf')
            probabilities = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probabilities, num_samples=1).item()
            generated.append(next_token)
            if next_token == self.end_token:
                break
        return generated
```

### 2.3 GPT-Style Generation

```python
class GPTGenerator(nn.Module):
    def __init__(self, vocab_size, d_model=768, num_heads=12, d_ff=3072, num_layers=12):
        super().__init__()
        # decoder-only configuration: zero encoder layers
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=num_heads,
            num_encoder_layers=0,
            num_decoder_layers=num_layers,
            dim_feedforward=d_ff,
        )
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.positional_encoding = PositionalEncoding(d_model)
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        x = self.embedding(x) * math.sqrt(self.embedding.embedding_dim)
        x = self.positional_encoding(x)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(0)).to(x.device)
        output = self.transformer(x, x, tgt_mask=mask)
        output = self.fc(output)
        return output

    def generate(self, prompt, tokenizer, max_len=100, temperature=1.0, top_p=0.9):
        self.eval()
        input_ids = tokenizer.encode(prompt, return_tensors='pt').T  # (seq_len, 1)
        for _ in range(max_len):
            with torch.no_grad():
                logits = self.forward(input_ids)
            logits = logits[-1, 0] / temperature
            if top_p < 1.0:
                # nucleus sampling: drop tokens outside the top-p probability mass
                sorted_logits, sorted_indices = torch.sort(logits, descending=True)
                cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
                sorted_indices_to_remove = cumulative_probs > top_p
                # shift right so the token that crosses the threshold is kept
                sorted_indices_to_remove[1:] = sorted_indices_to_remove[:-1].clone()
                sorted_indices_to_remove[0] = False
                indices_to_remove = sorted_indices[sorted_indices_to_remove]
                logits[indices_to_remove] = float('-inf')
            probabilities = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probabilities, num_samples=1).item()
            input_ids = torch.cat([input_ids, torch.tensor([[next_token]])], dim=0)
            if next_token == tokenizer.eos_token_id:
                break
        return tokenizer.decode(input_ids.squeeze().tolist())
```
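Beam search appears in the strategy lists above but is not implemented by any of the generators. A minimal, self-contained sketch with a hypothetical `step_log_probs` scoring function that, purely for illustration, ignores the prefix:

```python
import math

# Hypothetical scorer: log-probabilities over a 3-token vocabulary.
# A real model would condition on the prefix; here the scores are fixed.
def step_log_probs(prefix):
    return [math.log(0.5), math.log(0.3), math.log(0.2)]

def beam_search(start_token, beam_width=2, steps=3):
    beams = [([start_token], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in enumerate(step_log_probs(seq)):
                candidates.append((seq + [tok], score + lp))
        # keep only the beam_width highest-scoring expansions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

best_seq, best_score = beam_search(0)[0]
print(best_seq)  # [0, 0, 0, 0] — matches greedy here, since scores ignore the prefix
```

With prefix-dependent scores, the second-best beam can overtake the greedy path later, which is why beam search scores high on coherence in the comparison below.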
## 3. Performance Comparison

### 3.1 Model Comparison

| Model | Generation quality | Training difficulty | Inference speed | Use case |
| --- | --- | --- | --- | --- |
| RNN | Medium | Low | Fast | Simple generation |
| Transformer | High | Medium | Medium | Moderate generation |
| GPT-2 | Very high | High | Medium | Complex generation |
| GPT-3 | Highest | Very high | Slow | High-quality generation |

### 3.2 Decoding Strategy Comparison

| Strategy | Diversity | Coherence | Controllability |
| --- | --- | --- | --- |
| Greedy | Low | High | High |
| Beam Search | Low | Very high | Very high |
| Top-K | Medium | Medium | Medium |
| Top-P | High | High | Medium |
| Temperature | Tunable | Tunable | Tunable |

### 3.3 Effect of Model Size

| Model | Parameters | Generation quality | Training time |
| --- | --- | --- | --- |
| GPT-2 small | 124M | Medium | ~1 week |
| GPT-2 medium | 355M | High | ~2 weeks |
| GPT-2 large | 774M | Very high | ~4 weeks |
| GPT-3 | 175B | Highest | Months |

## 4. Best Practices

### 4.1 Choosing a Generator

```python
def select_generator(task_type, data_size):
    if task_type == 'simple':
        return RNNGenerator(10000, 256, 512)
    elif task_type == 'medium':
        return TransformerGenerator(10000, 512, 8, 2048, 6)
    else:
        from transformers import GPT2LMHeadModel
        return GPT2LMHeadModel.from_pretrained('gpt2')

class GeneratorFactory:
    @staticmethod
    def create(config):
        if config['type'] == 'rnn':
            return RNNGenerator(**config['params'])
        elif config['type'] == 'transformer':
            return TransformerGenerator(**config['params'])
        elif config['type'] == 'gpt':
            from transformers import GPT2LMHeadModel
            return GPT2LMHeadModel.from_pretrained(config['model_name'])
```

### 4.2 Training Pipeline

```python
class TextGenerationTrainer:
    def __init__(self, model, optimizer, scheduler, loss_fn):
        self.model = model
        self.optimizer = optimizer
        self.scheduler = scheduler
        self.loss_fn = loss_fn

    def train_step(self, batch):
        self.optimizer.zero_grad()
        input_ids = batch['input_ids']
        labels = batch['labels']
        output = self.model(input_ids)
        # flatten (batch, seq, vocab) and (batch, seq) for cross-entropy
        loss = self.loss_fn(output.reshape(-1, output.size(-1)), labels.reshape(-1))
        loss.backward()
        self.optimizer.step()
        self.scheduler.step()
        return loss.item()

    def evaluate(self, dataloader):
        self.model.eval()
        total_loss = 0
        with torch.no_grad():
            for batch in dataloader:
                input_ids = batch['input_ids']
                labels = batch['labels']
                output = self.model(input_ids)
                loss = self.loss_fn(output.reshape(-1, output.size(-1)), labels.reshape(-1))
                total_loss += loss.item()
        return total_loss / len(dataloader)
```
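A single step of this training pipeline can be exercised end-to-end. A minimal sketch with a toy embedding-plus-linear stand-in for the language model and hypothetical hyperparameters (vocabulary size, learning rate, and batch shape are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size = 50

# Toy stand-in for a language model: embed tokens, project back to the vocab.
model = nn.Sequential(nn.Embedding(vocab_size, 16), nn.Linear(16, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
loss_fn = nn.CrossEntropyLoss()

# One training step on a fake batch: 4 sequences of length 8.
input_ids = torch.randint(0, vocab_size, (4, 8))
labels = torch.randint(0, vocab_size, (4, 8))

optimizer.zero_grad()
output = model(input_ids)  # (4, 8, vocab_size)
loss = loss_fn(output.reshape(-1, vocab_size), labels.reshape(-1))
loss.backward()
optimizer.step()
scheduler.step()
print(round(loss.item(), 2))  # roughly ln(50) ≈ 3.9 for an untrained model
```

In practice the labels would be the input shifted by one position (next-token prediction) rather than random tokens.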
## 5. Summary

Text generation has entered the Transformer era:

- GPT family: currently the strongest text generation models
- Decoding strategy: choose the strategy that matches the task's needs
- Model size: larger models generally generate better text, but more slowly
- Pretrained models: prefer off-the-shelf pretrained models

Key comparisons: GPT-2 improves generation quality markedly over RNNs; the Top-P strategy balances diversity and coherence; the temperature parameter controls randomness. Fine-tuning a pretrained GPT model is the recommended approach.
