PyTorch模型部署:从训练到生产环境

张开发
2026/5/8 16:19:06 15 分钟阅读

分享文章

PyTorch模型部署:从训练到生产环境
# PyTorch模型部署:从训练到生产环境

## 1. 技术分析

### 1.1 部署方式对比

| 方式 | 延迟 | 吞吐量 | 资源消耗 | 适用场景 |
|------|------|--------|----------|----------|
| PyTorch JIT | 低 | 高 | 中 | 生产环境部署 |
| ONNX Runtime | 很低 | 很高 | 低 | 跨平台部署 |
| TorchServe | 中 | 高 | 中 | 云端服务 |
| TensorRT | 极低 | 极高 | 低 | GPU加速推理 |

### 1.2 模型格式对比

| 格式 | 优点 | 缺点 |
|------|------|------|
| .pt | PyTorch原生 | 仅PyTorch可用 |
| .onnx | 跨框架兼容 | 可能丢失动态图特性 |
| TorchScript | 生产优化 | 需要JIT编译 |

## 2. 核心功能实现

### 2.1 模型导出为TorchScript

```python
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self, input_dim=10, hidden_dim=32, num_classes=2):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return self.softmax(x)

def export_to_torchscript():
    model = SimpleModel()
    model.eval()

    # Tracing方式
    example_input = torch.randn(1, 10)
    traced_model = torch.jit.trace(model, example_input)
    traced_model.save("model_traced.pt")

    # Scripting方式(保留控制流)
    scripted_model = torch.jit.script(model)
    scripted_model.save("model_scripted.pt")

    print("模型已导出")
    return traced_model

def load_and_inference():
    model = torch.jit.load("model_traced.pt")
    model.eval()
    with torch.no_grad():
        input_data = torch.randn(1, 10)
        output = model(input_data)
    return output
```

### 2.2 ONNX导出与优化

```python
import torch.onnx
import onnx
import onnxruntime as ort

class ResNetLikeModel(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.AdaptiveAvgPool2d((1, 1))
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)

def export_to_onnx():
    model = ResNetLikeModel()
    model.eval()

    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        export_params=True,
        opset_version=11,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={
            "input": {0: "batch_size"},
            "output": {0: "batch_size"}
        }
    )

    onnx_model = onnx.load("model.onnx")
    onnx.checker.check_model(onnx_model)
    print("ONNX模型验证通过")
    return "model.onnx"

def inference_onnx(onnx_model_path):
    sess_options = ort.SessionOptions()
    sess_options.intra_op_num_threads = 4
    ort_session = ort.InferenceSession(onnx_model_path, sess_options)

    import numpy as np
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    outputs = ort_session.run(None, {"input": input_data})
    return outputs[0]
```

### 2.3 TorchServe部署

```python
# handler.py
import numpy as np
import torch
from ts.torch_handler.base_handler import BaseHandler

class ImageClassifier(BaseHandler):
    def __init__(self):
        super().__init__()
        self.model = None
        self.mapping = None

    def initialize(self, ctx):
        super().initialize(ctx)
        self.mapping = ctx.model_yaml_config.get("mapping", {})

    def preprocess(self, data):
        images = []
        for row in data:
            image = row.get("data") or row.get("body")
            if isinstance(image, bytes):
                image = torch.from_numpy(
                    np.frombuffer(image, dtype=np.float32)
                ).reshape(3, 224, 224)
            images.append(image)
        return torch.stack(images)

    def inference(self, data):
        with torch.no_grad():
            outputs = self.model(data)
        return outputs

    def postprocess(self, data):
        results = []
        for output in data:
            probs = torch.softmax(output, dim=0)
            top_prob, top_class = torch.topk(probs, 5)
            results.append([
                {"class": int(c), "probability": float(p)}
                for c, p in zip(top_class, top_prob)
            ])
        return results
```

## 3. 性能优化

### 3.1 量化推理

```python
import torch.quantization

def quantize_model():
    model = SimpleModel()
    model.eval()

    # 动态量化
    quantized_model = torch.quantization.quantize_dynamic(
        model, {nn.Linear, nn.ReLU}, dtype=torch.qint8
    )
    torch.save(quantized_model.state_dict(), "model_quantized.pt")
    return quantized_model

def static_quantization():
    model = SimpleModel()
    model.train()

    # Fuse模块
    model = torch.quantization.fuse_modules(model, [["fc1", "relu"]])

    # 设置量化配置
    model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
    torch.quantization.prepare(model, inplace=True)

    # 转换
    quantized_model = torch.quantization.convert(model, inplace=False)
    return quantized_model
```

### 3.2 性能测试

```python
import time

def benchmark_inference():
    model = SimpleModel()
    model.eval()
    model_traced = torch.jit.trace(model, torch.randn(1, 10))
    model_quantized = quantize_model()

    input_data = torch.randn(100, 10)
    num_iterations = 1000
    results = {}

    # PyTorch原生
    times = []
    with torch.no_grad():
        for _ in range(num_iterations):
            start = time.perf_counter()
            _ = model(input_data)
            times.append(time.perf_counter() - start)
    results["PyTorch"] = sum(times) / len(times) * 1000

    # TorchScript
    times = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        _ = model_traced(input_data)
        times.append(time.perf_counter() - start)
    results["TorchScript"] = sum(times) / len(times) * 1000

    print("推理性能对比 (ms):")
    for name, ms in results.items():
        print(f"  {name}: {ms:.3f}ms")
    return results
```

## 4. 最佳实践

### 4.1 部署架构选择

| 场景 | 推荐方案 | 理由 |
|------|----------|------|
| 小规模部署 | Flask + TorchScript | 简单易用 |
| 中等规模 | TorchServe | 官方支持 |
| 大规模/云端 | ONNX + ONNX Runtime | 跨平台高性能 |
| 边缘设备 | TensorRT | GPU加速优化 |

### 4.2 优化建议

```python
# ✅ 推荐:使用torch.no_grad()进行推理
with torch.no_grad():
    output = model(input_data)

# ✅ 推荐:使用eval()模式
model.eval()

# ✅ 推荐:使用torch.jit.optimize_for_inference
@torch.jit.optimize_for_inference
def fast_inference(model, input_data):
    return model(input_data)
```

## 5. 总结

PyTorch模型部署要点:

- **TorchScript**:生产环境首选,平衡性能和易用性
- **ONNX**:跨平台部署的标准格式
- **量化**:显著降低延迟和内存占用

更多文章