The Frontier of Agentic AI: Architectures, Algorithms, and the Shift Toward Autonomous Reasoning
The rise of agentic artificial intelligence (agentic AI) marks a fundamental transformation in machine learning: from passive generative systems to autonomous entities that actively interact with their environments.1 Early large language models (LLMs) focused primarily on text synthesis and pattern imitation; contemporary agentic AI organizes these capabilities into sophisticated architectures for multi-step reasoning, iterative planning, and independent action.2 This paradigm shift moves from "shallow reasoning" (models that map a query to an immediate answer) to "deep reasoning" frameworks, in which computation scales with the problem, the model reflects on its own output, and it interacts with external tools in a closed loop.2 This report provides a comprehensive analysis of the machine learning models, underlying algorithms, and architectural evolution of agentic AI as of early 2026.
Agent Architectures: From Handcrafted Pipelines to Model-Native Systems
The structural composition of AI agents distinguishes them from standard foundation models. Modern agentic systems are typically conceptualized as an ecosystem of modules that extend an LLM's core capabilities.1 These systems integrate five foundational modules: perception, planning, action, memory, and feedback.1 Perception ingests and encodes multimodal inputs (text, visual data, sensor feedback, and so on) to build a representation of the current environment state.4 Planning acts as the policy controller, decomposing high-level goals into fine-grained subtasks.5 The action module interacts with the digital or physical world through APIs, graphical user interfaces (GUIs), or robotic actuators.3 Memory maintains contextual continuity across long-horizon tasks, while the feedback loop enables real-time policy refinement through trial and error.1
The key evolution within this architecture is the shift from a "pipeline-based" paradigm to "model-native" systems.3 In the pipeline-based approach, agentic capabilities are managed by external, often brittle logic or "glue code" that orchestrates the interaction between the LLM and the environment.3 By contrast, the model-native paradigm seeks to internalize these capabilities directly into the model's parameters through large-scale reinforcement learning (RL).3 This lets the model evolve from a passive text generator into a goal-directed agent that "thinks" before acting, as demonstrated by recent breakthroughs such as DeepSeek-R1 and OpenAI's o1 series.3
Comparison of Agentic Paradigms
| Feature | Pipeline-based Paradigm | Model-native Paradigm |
|---|---|---|
| Logic Orchestration | Handcrafted external pipelines and "glue code" 3 | Internalized capabilities within model parameters 3 |
| Flexibility | Rigid; often fails in novel or dynamic scenarios 3 | Highly adaptive; learns through outcome-driven exploration 3 |
| Optimization | Modular; each component is tuned separately 3 | End-to-end; joint optimization via Reinforcement Learning 3 |
| Reasoning | External prompts (e.g., "Think step by step") 3 | Autonomous internal "thinking" before response 3 |
| Memory Management | External storage (RAG, sliding windows) 3 | Learned context policies and architectural enhancements 3 |
The core driver of this evolution is reinforcement learning, the "algorithmic engine" behind the transformation.3 By reframing learning from imitating static datasets to exploring an action space under outcome-based rewards, models become able to self-correct and discover more innovative solution trajectories.3
Reasoning and Planning: The Algorithmic Foundations of Strategy
Planning capability lies at the heart of agentic success. Planning algorithms help an agent find a path to its goal within a vast search space of possible actions. This typically involves a formal model of task decomposition: a high-level task is converted into a hierarchy of subtasks constrained by a directed acyclic dependency graph.5
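The decomposition described above can be sketched with a standard topological sort over the subtask DAG; the subtask names and dependency structure below are illustrative assumptions, not drawn from any specific framework.

```python
from graphlib import TopologicalSorter

# Hypothetical subtask dependency graph for a "book a flight" task:
# each key lists the subtasks it depends on (its predecessors in the DAG).
deps = {
    "search_flights": set(),
    "select_flight": {"search_flights"},
    "enter_passenger_info": {"select_flight"},
    "pay": {"select_flight", "enter_passenger_info"},
    "confirm": {"pay"},
}

# Any topological order is a valid execution schedule under the DAG constraint.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

A planner would then dispatch subtasks in this order (or in parallel, for subtasks with no path between them).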
Reasoning Paradigms and Frameworks
Methods for eliciting model reasoning have matured into several distinct patterns. Chain-of-Thought (CoT) remains the foundational approach, guiding the model to articulate its logical steps before reaching a conclusion.12 Extensions such as Tree-of-Thought (ToT) generalize this into a branching structure, letting the model explore multiple reasoning paths simultaneously.12 This enables backtracking, mimicking the human cognitive strategy of abandoning a path upon hitting a contradiction in a mathematical proof or puzzle.13
Another key pattern is ReAct (Reason + Act), which interleaves reasoning traces with concrete actions and observations.12 This "think-act-observe" cycle lets the agent update its context with environmental feedback, though it can fall into infinite loops if the model fails to make progress.14 More advanced methods such as Reflexion and Self-Refine introduce a metacognitive layer: the agent critiques its own previous outputs to identify errors and improve subsequent behavior.1
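The think-act-observe cycle can be sketched as a minimal loop. Here `llm` and `tools` are toy stubs standing in for a real model and toolset, and the step cap illustrates the usual guard against the infinite loops noted above.

```python
# Stub policy: decides the next step from the running context.
def llm(prompt: str) -> str:
    if "Observation: 42" in prompt:
        return "Final Answer: 42"
    return "Action: lookup[answer]"

tools = {"lookup": lambda arg: "42"}  # toy tool registry

def react(question: str, max_steps: int = 5) -> str:
    context = f"Question: {question}"
    for _ in range(max_steps):  # step cap guards against infinite loops
        step = llm(context)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[arg]" and append the observation to the context.
        tool, arg = step.removeprefix("Action: ").rstrip("]").split("[")
        context += f"\n{step}\nObservation: {tools[tool](arg)}"
    return "no answer"

print(react("What is the answer?"))  # → 42
```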
Advanced Search Algorithms in Agentic Reasoning
In 2024 and 2025, combining classical search algorithms with neural reasoning became a mainstream trend, particularly in high-stakes domains such as code generation and scientific discovery.15 These algorithms let agents explore reasoning trajectories far more systematically than simple greedy decoding.
| Algorithm | Mechanism of Action | Key Agentic Application |
|---|---|---|
| Monte Carlo Tree Search (MCTS) | Balances exploration/exploitation through iterative tree traversal | Text-based game agents; workflow optimization (e.g., AFlow) 15 |
| A* Search | Uses heuristics to find the shortest path to a goal state | Path planning in complex environments; action space navigation 15 |
| Beam Search | Maintains a fixed number of top-performing candidate paths | Knowledge-guided Retrieval Augmented Generation (RAG) 15 |
| Bayesian Optimization | Models the objective function to find optimal solutions | Hyperparameter tuning and search over chemical spaces 15 |
| Evolutionary Search | Iteratively selects and mutates the best solutions | Formula discovery; optimization of large-scale agentic workflows 15 |
Recent research emphasizes the value of scaling "test-time compute": letting the model spend more time searching for a solution rather than answering immediately.15 Multiple studies suggest that optimizing test-time compute can be more effective than simply increasing parameter count.15 For example, the AFlow framework uses MCTS to automatically discover and refine agentic workflows, recasting the design process as a search problem over a sequence space represented as code.16
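One of the simplest forms of test-time compute scaling, best-of-N sampling against a verifier, can be sketched as follows; `propose` and `score` are toy stand-ins for a policy model and a reward model.

```python
import random

def propose(rng: random.Random) -> int:
    # Sample a candidate "solution" (stand-in for model sampling).
    return rng.randint(0, 100)

def score(candidate: int) -> float:
    # Verifier: closeness to the (hidden) true answer 73.
    return -abs(candidate - 73)

def best_of_n(n: int, seed: int = 0) -> int:
    rng = random.Random(seed)
    candidates = [propose(rng) for _ in range(n)]
    return max(candidates, key=score)  # keep the verifier's favorite

# With a shared seed, the N=4 samples are a prefix of the N=64 samples,
# so spending more test-time compute can never yield a worse score.
assert score(best_of_n(64)) >= score(best_of_n(4))
```

MCTS and beam search refine the same idea by allocating the sampling budget adaptively across a tree or beam of partial solutions instead of independent full candidates.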
Reinforcement Learning and the "Aha Moment" of Self-Evolution
The maturation of agentic AI is tightly coupled with reinforcement learning (RL), especially as models break free from the constraints of human-annotated data.9 The training of DeepSeek-R1 offers a landmark case study in how RL can incentivize reasoning capability.9
The DeepSeek-R1 Training Pipeline
Unlike conventional models that rely on supervised fine-tuning (SFT) to "teach" reasoning, DeepSeek-R1-Zero demonstrated that reasoning behaviors such as self-verification and reflection can emerge purely from RL applied to a base model.9 This self-evolution is driven by a rule-based reward model that provides automated feedback on the correctness of mathematical or coding tasks.11
The refined DeepSeek-R1 pipeline comprises multiple stages to balance performance and readability:
- Cold Start: The base model is first fine-tuned on thousands of "cold start" samples to give its reasoning initial structure before transitioning to pure RL.9
- Multi-Stage RL Optimization: The model undergoes iterative RL stages that reward accuracy (correct solutions to math/coding problems) and format (e.g., correctly using <think> tags to expose the reasoning process).11
- Discovery of "Aha Moments": One notable finding from RL-only training is the emergence of "aha moments": the model recognizes a logical error mid-computation, re-evaluates its earlier steps, and adjusts its path to reach the correct answer.11
- Distillation: To obtain efficient agents, the reasoning patterns discovered by the large model are distilled into smaller dense models (such as the Qwen or Llama families). These distilled models often significantly outperform larger open-source models trained by conventional methods.9
These rewards are typically formalized with methods such as Group Relative Policy Optimization (GRPO), yielding dramatic gains on reasoning benchmarks like AIME 2024, where DeepSeek-R1-Zero's pass@1 rose from 15.6% to 71.0%.18
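At its core, GRPO replaces a learned value baseline with the group mean: each sampled response is scored relative to the other responses in its own group, with no value network. A minimal sketch of that advantage computation:

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalize each reward by the group's
    mean and standard deviation (sketch of GRPO's baseline-free scheme)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Toy group: two correct rollouts (reward 1) and two incorrect (reward 0).
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # → [1.0, -1.0, 1.0, -1.0]
```

Correct rollouts receive positive advantage and incorrect ones negative, so the policy gradient pushes probability mass toward the group's better responses.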
Large Action Models (LAMs): Bridging Logic and Execution
Large Action Models (LAMs) represent the next step from textual interaction toward autonomous execution, focusing on completing tasks automatically in digital or physical environments.22 An LLM might describe how to book a flight; a LAM is designed to navigate the website, enter the data, and complete the transaction.6
The LAM Operating Loop
Developing a LAM requires integrating perception, planning, and control into a unified framework.4 These models convert raw perceptual input into structured, goal-directed actions through a continuous "agent loop."4
- Perception and Multimodal Encoding: A LAM first processes inputs such as images, GUI screenshots, and haptic feedback through Vision Transformers (ViTs) or CNNs, producing high-dimensional embeddings that capture the environment state.4
- Goal Interpretation and Alignment: Natural-language encoders (such as T5 or BERT-style transformers) translate human instructions into structured plans compatible with the action space.4
- World Modeling: Advanced LAMs build internal simulators (world models) that predict how the environment will respond to actions, evaluating consequences before execution.4
- Action Planning and Motion Control: Model predictive control (MPC) or hierarchical RL selects optimal action sequences and maps them to low-level control signals (robot joint angles or software function calls).4
- Closed-Loop Feedback: Real-time sensor data monitors execution; if the environment changes or an action fails, the agent updates its internal state and replans.4
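The cycle above can be sketched as a closed loop in which a failed action triggers replanning; every component here is a toy stand-in, not a real LAM.

```python
def perceive(env: dict) -> str:
    # Stand-in for multimodal encoding: just read the environment state.
    return env["state"]

def plan(goal: str, state: str) -> str:
    # Stand-in for goal interpretation: replan when the last action failed.
    return "retry" if state == "error" else "submit"

def act(action: str, env: dict) -> None:
    # Toy environment: the first "submit" fails, any later action succeeds.
    if action == "submit" and env["attempts"] == 0:
        env["state"] = "error"
    else:
        env["state"] = "done"
    env["attempts"] += 1

def run(goal: str, max_steps: int = 5) -> dict:
    env = {"state": "start", "attempts": 0}
    for _ in range(max_steps):          # closed loop: perceive → plan → act
        action = plan(goal, perceive(env))
        act(action, env)
        if env["state"] == "done":
            break
    return env

result = run("file the form")
print(result)  # the agent recovers from the failed first attempt
```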
LAM Applications and Business Impact
LAM deployments have already delivered measurable returns in industrial automation. Salesforce's Agentforce platform and xLAM model family are designed to handle complex CRM workflows, embedding reasoning and function calling into enterprise operations.26
| Industry Sector | Primary LAM Application | Reported Efficiency Gain |
|---|---|---|
| Enterprise IT | Autonomous system monitoring and fault recovery 29 | 44% ROI in ITOps monitoring 30 |
| Finance | Automated invoice processing and reconciliation 6 | 90% accuracy improvement; 60% time reduction 6 |
| Logistics | Real-time route optimization and supply chain management 27 | 20% improvement in delivery speeds 32 |
| Healthcare | Robotic surgery guidance and record management 27 | Enhanced precision and reduced administrative load 27 |
| Marketing | Autonomous lead qualification and content creation 31 | 20% increase in marketing ROI 32 |
A critical component of LAM deployment is "visual grounding": models such as Orby's ActIO understand GUIs the way humans do, recognizing buttons and fields rather than relying on brittle backend code.6 This lets agents operate across legacy software and dynamic web environments, even where no API exists.6
Multi-Agent Systems: Collaborative Intelligence and Swarm Behavior
The complexity of modern tasks often exceeds what a single, monolithic agent can handle, driving the development of multi-agent systems (MAS): collections of specialized agents collaborating toward a shared goal.34 The approach mirrors human organizational structure, assigning roles such as manager, researcher, and executor to separate entities.34
Hierarchical Decomposition and Specialized Agents
Frameworks such as DEPART (Divide, Evaluate, Plan, Act, Reflect, Track) adopt a hierarchical structure that distributes planning, execution, and visual understanding across specialized agents.36
- Planning Agent: Generates the high-level strategy but assigns only one step at a time, so it can adapt to environmental feedback before proceeding.36
- Action Executor: Performs grounded low-level interactions in the environment (clicking, typing, etc.) as directed by the planner.36
- Visual Executor: Interprets visual context and shares information only when necessary, reducing computational cost and cross-modal interference.36
To optimize multi-turn interaction, researchers proposed Hierarchical Interactive Multi-turn Policy Optimization (HIMPO).36 This post-training strategy uses role-specific dense rewards to promote specialization and a task-level sparse reward to align the collective output with the final goal.36
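The combination of role-specific dense rewards with a task-level sparse reward can be sketched as a weighted mix; the weights and reward values below are illustrative assumptions, not HIMPO's actual formulation.

```python
def mixed_reward(step_rewards: list[float], task_success: bool,
                 dense_weight: float = 0.3, sparse_weight: float = 1.0) -> float:
    """Blend per-step (dense) rewards for one agent role with a
    task-level (sparse) success signal shared by the whole team."""
    dense = sum(step_rewards) / len(step_rewards) if step_rewards else 0.0
    sparse = 1.0 if task_success else 0.0
    return dense_weight * dense + sparse_weight * sparse

# A role that took three steps of mixed quality, on a task that succeeded.
r = mixed_reward([0.5, 1.0, 0.5], task_success=True)
print(r)  # dense mean (2/3) weighted by 0.3, plus 1.0 for task success
```

The dense term shapes each role's behavior step by step, while the dominant sparse term keeps all roles aligned on the final outcome.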
Coordination Protocols and "AgentOps"
As multi-agent collaboration becomes widespread, the need for standardized communication has driven the development of the Model Context Protocol (MCP).29 MCP provides a structured framework for distributed agents to maintain consistent context, addressing one of the most persistent challenges in multi-agent orchestration.29 Meanwhile, the rise of "AgentOps" supplies the infrastructure large enterprises need to govern, validate, and safely scale these autonomous systems.3
Reported issues in MAS development concentrate on coordination challenges (10%), infrastructure (14%), and bugs (22%).35 Addressing them requires advanced observability platforms that visualize agent behavior and decisions in real time in production environments.30
Memory Architectures and the Rise of Agentic Search
Memory is the cognitive substrate that lets agents learn from experience and maintain consistency over time.1 Whereas traditional LLMs are limited by a fixed context window, agentic AI introduces both modular and internalized memory solutions.3
The Evolution of Short-Term and Long-Term Memory
Short-term memory management has moved from external pipeline methods (sliding windows, summarization) toward model-native enhancements (attention optimization, positional-encoding extrapolation).3 Long-term memory, traditionally served by retrieval-augmented generation (RAG), is now being internalized as learned context policies.3
Advanced frameworks such as Memory-as-Action (MemAct) treat the curation of working memory as a set of learnable policy actions.10 The agent dynamically decides which information to retain, compress, or retrieve in order to optimize task performance.10 Algorithms such as Dynamic Context Policy Optimization (DCPO) handle the instability that memory edits introduce during RL training.10
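A primitive version of such a context policy (keep the last K turns verbatim, compress everything older into a summary) can be sketched as follows; the `compress` stub stands in for an actual summarizer.

```python
def compress(turns: list[str]) -> str:
    # Stand-in for a learned or LLM-based summarizer.
    return "summary(" + "; ".join(turns) + ")"

def build_context(history: list[str], window: int = 3) -> list[str]:
    """Keep the last `window` turns verbatim; fold older turns
    into a single compressed summary entry."""
    if len(history) <= window:
        return list(history)
    return [compress(history[:-window])] + history[-window:]

ctx = build_context(["t1", "t2", "t3", "t4", "t5"], window=3)
print(ctx)  # → ['summary(t1; t2)', 't3', 't4', 't5']
```

MemAct-style approaches replace the fixed `window` rule with learned decisions about what to retain, compress, or retrieve at each step.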
Agentic Search: The Search-o1 Case Study
The development of Deep Research agents represents the apex of agentic information-seeking.39 Unlike standard search engines, these agents surface deep information through multi-round retrieval and dynamic planning.39
Search-o1 is a framework that integrates an agentic search workflow directly into the step-by-step reasoning of a large reasoning model.41 It introduces a "Reason-in-Documents" module: when the model encounters an uncertain knowledge point, it selectively invokes a search tool.40 This mitigates the risk of knowledge gaps in long reasoning chains, improving the trustworthiness and applicability of LRMs on complex tasks.41
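The selective-search behavior can be sketched as a loop in which the reasoner flags uncertain knowledge and folds retrieved text back into its context. Here `reason` and `search` are toy stubs, and the trigger logic is an illustrative assumption rather than Search-o1's actual mechanism.

```python
def reason(context: str) -> str:
    # Stub reasoner: flags a knowledge gap until the retrieved fact appears.
    if "boiling point" in context and "100" not in context:
        return "SEARCH: boiling point of water"
    return "ANSWER: 100°C"

def search(query: str) -> str:
    # Stub retrieval tool.
    return "water boils at 100°C at sea level"

def solve(question: str, max_steps: int = 4) -> str:
    context = question
    for _ in range(max_steps):
        step = reason(context)
        if step.startswith("SEARCH:"):
            # Fold the retrieved document back into the reasoning context.
            context += " | " + search(step.removeprefix("SEARCH: "))
        else:
            return step.removeprefix("ANSWER: ")
    return "unresolved"

print(solve("What is the boiling point of water?"))  # → 100°C
```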
| Memory Carrier | Technical Approach | Functional Outcome |
|---|---|---|
| External Repository | RAG; Structure & Compressed Summarization 7 | Access to massive, static knowledge bases 7 |
| Global Parameters | Parameter Internalization via RL 7 | Direct "parametric" knowledge and reasoning 7 |
| Latent Memory | MemGen; Weaving Generative Latent Memory 38 | Human-like cognitive patterns in reasoning 38 |
| System Resources | MemOS; Treat memory as a manageable resource 10 | Controllability and personalized modeling 10 |
Evaluation and Validation: Measuring Autonomous Behavior
Evaluating agentic AI requires moving from static metrics to dynamic, environment-based benchmarks.43 Evaluation methodologies now systematically analyze agents along four dimensions: fundamental capabilities (planning, tool use, reflection, memory), application-specific benchmarks (web, software, science), cost efficiency, and safety.43
Benchmarks and Governance
The industry is shifting toward more realistic and challenging evaluations such as WebArena and AlfWorld.36 Yet safety and compliance remain the primary gatekeepers for large-scale adoption.30 Only 13% of organizations currently deploy fully autonomous agents; most still rely on human oversight or pilot projects for limited use cases.30
Enterprises increasingly adopt a "human-in-the-loop" model: humans guide the system by setting goals, defining boundaries, and supervising the communication flows between agents.30 Common validation practices include data quality checks (50%), human review of agent outputs (47%), and drift/anomaly monitoring (41%).30
Outlook for 2026: Physical AI and Self-Evolution
Looking at the remainder of 2026, several trends are poised to reshape the agentic AI landscape. Coupling agents to the physical world through robotics and the Internet of Things (IoT) is the next major leap.31 AI will move from screens into machines able to navigate cluttered, unstructured environments such as warehouses and homes.31
In addition, the concept of "Search Agent Self-Evolution" is gaining momentum: agents learn to expand and fuse information sources, adapt across modalities, and autonomously develop more robust infrastructure.45 Accelerated scientific discovery, driven by agents using Physics-Informed Neural Networks (PINNs) and Kolmogorov–Arnold Networks (KANs), is expected to deliver breakthroughs in materials science and medicine.17
As AI evolves from tool to active partner, the success of these systems will hinge on how seamlessly they integrate into human workflows and on sustaining trust through transparency, reliability, and quantifiable ROI.22
Recommended Reading: The Three Most Valuable Papers
To understand the current state and future trajectory of agentic AI, the following three papers are recommended for their foundational contributions in algorithms, reasoning, and search:
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. This report pioneered the demonstration that complex reasoning behaviors (self-correction, internal "thinking") can be induced through reinforcement learning alone, without initial SFT. It exemplifies the power of scaling inference-time compute and the potential of model-native agents to reach human-level performance on reasoning benchmarks.9
- AFlow: Automating Agentic Workflow Generation. Published at ICLR 2025, this paper recasts agentic workflow optimization as a search problem, using MCTS to automatically discover effective workflows and replacing manual design with machine exploration; by optimizing the operating logic, smaller models can outperform much larger rivals.16
- Search-o1: Agentic Search-Enhanced Large Reasoning Models. This paper proposes a framework that integrates agentic retrieval-augmented generation directly into the reasoning process of large reasoning models, offering a technical blueprint for closing knowledge gaps in long-horizon reasoning, and is essential reading for understanding Deep Research agents and the future convergence of search and reasoning.41
The trajectory of agentic AI is clear: from static answers to dynamic, autonomous interaction. By internalizing reasoning, memory, and action through reinforcement learning and search algorithms, these systems are becoming the operational backbone of the next generation of intelligent technology. In the "Era of Experience," AI will no longer learn only from the past but will grow its intelligence through continuous interaction with an ever-evolving world.3
References
- A Survey on the Feedback Mechanism of LLM-based AI Agents - IJCAI, accessed January 26, 2026, https://www.ijcai.org/proceedings/2025/1175.pdf
- Generative to Agentic AI: Survey, Conceptualization, and Challenges - arXiv, accessed January 26, 2026, https://arxiv.org/html/2504.18875v1
- Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI - alphaXiv, accessed January 26, 2026, https://www.alphaxiv.org/overview/2510.16720v1
- What are Large Action Models? The Next Frontier in AI Decision-Making | DigitalOcean, accessed January 26, 2026, https://www.digitalocean.com/resources/articles/large-action-models
- LLM-Based Hierarchical TODO Decomposition - Emergent Mind, accessed January 26, 2026, https://www.emergentmind.com/topics/llm-based-hierarchical-todo-decomposition
- Large Action Models (LAMs): The Future of Enterprise AI Automation - Uniphore, accessed January 26, 2026, https://www.uniphore.com/glossary/large-action-models/
- Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI - arXiv, accessed January 26, 2026, https://arxiv.org/html/2510.16720v1
- 2026 enterprise AI predictions -- fragmentation, commodification and the agent push facing CIOs - Information Week, accessed January 26, 2026, https://www.informationweek.com/machine-learning-ai/2026-enterprise-ai-predictions-fragmentation-commodification-and-the-agent-push-facing-cios
- (PDF) Technical Report: Analyzing DeepSeek-R1's Impact on AI Development, accessed January 26, 2026, https://www.researchgate.net/publication/388484582_Technical_Report_Analyzing_DeepSeek-R1's_Impact_on_AI_Development
- ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization, accessed January 26, 2026, https://www.semanticscholar.org/paper/ReSum%3A-Unlocking-Long-Horizon-Search-Intelligence-Wu-Li/055f039761d570658146e34dbe512d8671493887
- Understanding DeepSeek R1—A Reinforcement Learning-Driven Reasoning Model, accessed January 26, 2026, https://kili-technology.com/blog/understanding-deepseek-r1
- AI Agents - Antonio Esteves - Medium, accessed January 26, 2026, https://ajaesteves.medium.com/ai-agents-841d906aefb5
- What is Tree Of Thoughts Prompting? - IBM, accessed January 26, 2026, https://www.ibm.com/think/topics/tree-of-thoughts
- What Is Agentic Reasoning? - IBM, accessed January 26, 2026, https://www.ibm.com/think/topics/agentic-reasoning
- xinzhel/LLM-Search: Survey on LLM Inference via Search ... - GitHub, accessed January 26, 2026, https://github.com/xinzhel/LLM-Search
- AFLOW: AUTOMATING AGENTIC WORKFLOW GENERATION - ICLR Proceedings, accessed January 26, 2026, https://proceedings.iclr.cc/paper_files/paper/2025/file/5492ecbce4439401798dcd2c90be94cd-Paper-Conference.pdf
- What will happen with AI in 2026? - What kind of breakthroughs are we gonna see? - Reddit, accessed January 26, 2026, https://www.reddit.com/r/singularity/comments/1pzquum/what_will_happen_with_ai_in_2026_what_kind_of/
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning - The Wire China, accessed January 26, 2026, https://www.thewirechina.com/wp-content/uploads/2025/01/DeepSeek-R1-Document.pdf
- ICLR Poster AFlow: Automating Agentic Workflow Generation, accessed January 26, 2026, https://iclr.cc/virtual/2025/poster/27691
- AFlow: Automating Agentic Workflow Generation | OpenReview, accessed January 26, 2026, https://openreview.net/forum?id=z5uVAKwmjf
- How DeepSeek R1 Works: Explaining All Its Key Components and Their Consequences, accessed January 26, 2026, https://www.pedromebo.com/blog/en-how-deepseek-r1-works
- Dawn of Large Action Models in AI | by Noel Furtado | Jan, 2026 - Medium, accessed January 26, 2026, https://medium.com/@noeljf_in/dawn-of-large-action-models-in-ai-662c572279af
- Large Action Models, toward operational artificial intelligence - Tech4Future, accessed January 26, 2026, https://tech4future.info/en/large-action-models-operational-ai/
- Large Action Models (LAMs): A Guide With Examples - DataCamp, accessed January 26, 2026, https://www.datacamp.com/blog/large-action-models
- Understanding Large Action Models: Part 1 - DataOps Labs, accessed January 26, 2026, https://blog.dataopslabs.com/prompt-to-action-large-action-models-i
- xLAM: A Family of Large Action Models for AI Agents - Salesforce, accessed January 26, 2026, https://www.salesforce.com/blog/large-action-model-ai-agent/
- Large Action Models: The Latest AI Technology - Scopic, accessed January 26, 2026, https://scopicsoftware.com/blog/large-action-models/
- What Are Large Action Models (LAMs)? - Salesforce, accessed January 26, 2026, https://www.salesforce.com/agentforce/large-action-models/
- Top 10 AI Agent Research Papers to Read - Ema, accessed January 26, 2026, https://www.ema.co/additional-blogs/addition-blogs/top-ai-agent-research-papers
- New global report finds enterprises hitting Agentic AI inflection point - Dynatrace, accessed January 26, 2026, https://www.dynatrace.com/news/press-release/pulse-of-agentic-ai-2026/
- Top 5 AI Agent Trends for 2026 - United States Artificial Intelligence Institute, accessed January 26, 2026, https://www.usaii.org/ai-insights/top-5-ai-agent-trends-for-2026
- 150+ AI Agent Statistics [2026] - Master of Code, accessed January 26, 2026, https://masterofcode.com/blog/ai-agent-statistics
- Agentic Patterns and Implementation with Agentforce - Salesforce Architects, accessed January 26, 2026, https://architect.salesforce.com/fundamentals/agentic-patterns
- LLMs for Multi-Agent Cooperation | Xueguang Lyu, accessed January 26, 2026, https://xue-guang.com/post/llm-marl/
- A Large-Scale Study on the Development and Issues of Multi-Agent AI Systems - arXiv, accessed January 26, 2026, https://arxiv.org/html/2601.07136v1
- DEPART: HIERARCHICAL MULTI-AGENT SYSTEM ... - OpenReview, accessed January 26, 2026, https://openreview.net/pdf/af2cc92bb045206ca7733acadb3a94fe72719916.pdf
- Top 10 Must-Read AI Agent Research Papers (with Links) : r/AgentsOfAI - Reddit, accessed January 26, 2026, https://www.reddit.com/r/AgentsOfAI/comments/1n4ni03/top_10_mustread_ai_agent_research_papers_with/
- IAAR-Shanghai/Awesome-AI-Memory - GitHub, accessed January 26, 2026, https://github.com/IAAR-Shanghai/Awesome-AI-Memory
- [2508.05668] A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges - arXiv, accessed January 26, 2026, https://arxiv.org/abs/2508.05668
- Search-o1: Agentic Search-Enhanced Large Reasoning Models | Request PDF - ResearchGate, accessed January 26, 2026, https://www.researchgate.net/publication/397425838_Search-o1_Agentic_Search-Enhanced_Large_Reasoning_Models
- Search-o1: Agentic Search-Enhanced Large Reasoning Models - ACL Anthology, accessed January 26, 2026, https://aclanthology.org/2025.emnlp-main.276.pdf
- Search-o1: Agentic Search-Enhanced Large Reasoning Models - arXiv, accessed January 26, 2026, https://arxiv.org/html/2501.05366v1
- [2503.16416] Survey on Evaluation of LLM-based Agents - arXiv, accessed January 26, 2026, https://arxiv.org/abs/2503.16416
- Top 10 Research Papers on AI Agents - Analytics Vidhya, accessed January 26, 2026, https://www.analyticsvidhya.com/blog/2024/12/ai-agents-research-papers/
- A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges - arXiv, accessed January 26, 2026, https://arxiv.org/html/2508.05668v3
- Future of AI Agents: Top Trends in 2026 - Blue Prism, accessed January 26, 2026, https://www.blueprism.com/resources/blog/future-ai-agents-trends/
- FoundationAgents/AFlow: ICLR 2025 Oral. Automating Agentic Workflow Generation., accessed January 26, 2026, https://github.com/FoundationAgents/AFlow