An Overview of Major Recent LLM Architecture Advances: From Gemma 4 to DeepSeek V4

AI
May 19

强哥来了

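The first small architectural change in Gemma 4 E2B and E4B is a "shared KV cache" mechanism: later layers reuse the key-value states already computed by earlier layers, which reduces GPU memory usage and compute cost in long-context scenarios … According to the DeepSeek V4 paper, at a 1M-token context and compared with DeepSeek V3.2 (which uses MLA and DSA), DeepSeek V4-Pro needs only 27% of the per-token inference FLOPs and only 10% of the KV cache size … Overall, the paper does report very strong final results; for example, DeepSeek V4-Flash-Base outperforms DeepSeek V3.2-Base on most base-model benchmarks.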
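
To make the shared KV cache idea concrete, here is a minimal accounting sketch in Python. It is not Gemma 4's actual implementation. The excerpt only says that later layers reuse key-value states from earlier layers, so the specific rule used here (the trailing `num_shared_layers` layers all read from the last layer that still computes its own K/V) is an assumption, as are the names `CacheConfig`, `kv_source_layer`, and `kv_cache_bytes` and all the numbers in the usage note.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    """Hypothetical configuration; field names are illustrative only."""
    num_layers: int          # total decoder layers
    num_shared_layers: int   # trailing layers that reuse an earlier layer's K/V
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # e.g. a 16-bit format

    def __post_init__(self) -> None:
        # At least one layer must compute its own K/V for others to reuse.
        if not 0 <= self.num_shared_layers < self.num_layers:
            raise ValueError("need 0 <= num_shared_layers < num_layers")


def kv_source_layer(layer: int, cfg: CacheConfig) -> int:
    """Return the index of the layer whose K/V cache `layer` reads from."""
    first_shared = cfg.num_layers - cfg.num_shared_layers
    if layer < first_shared:
        return layer            # computes and stores its own K/V
    return first_shared - 1     # assumed rule: reuse the last non-shared layer


def kv_cache_bytes(cfg: CacheConfig, seq_len: int) -> int:
    """Total K/V cache size: only layers that own a cache consume memory."""
    owners = {kv_source_layer(i, cfg) for i in range(cfg.num_layers)}
    # Factor 2 covers keys and values.
    per_layer = 2 * cfg.num_kv_heads * cfg.head_dim * seq_len * cfg.bytes_per_value
    return len(owners) * per_layer
```

With hypothetical values such as `CacheConfig(num_layers=8, num_shared_layers=4, num_kv_heads=2, head_dim=64)`, only four layers own a cache, so `kv_cache_bytes` returns half of what the same configuration with `num_shared_layers=0` would need. The shared layers also skip their own K/V projections, which is where the compute saving mentioned above would come from.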
