AI Agents

18 posts in AI Agents

8x Terminal Performance Gains: NVIDIA's Data Recipe Lets 32B Beat 480B

February 2026 · Infrastructure

Mirage of Synthesis: DREAM's Agentic Framework Catches What Static Benchmarks Miss

February 2026 · Safety

80.3 on ScreenSpotPro: GUI-Owl-1.5 Sets New Bar for Open-Source GUI Agents

February 2026 · AI Agents

When Should AI Agents Ask for Help? CMU's CowCorpus Maps Four Human Collaboration Styles

February 2026 · AI Agents

Even GPT-5 Fails at Discovery: OdysseyArena Exposes the Inductive Bottleneck in LLM Agents

February 2026 · AI Agents

Prompt Fatigue Solved: Vibe AIGC Turns Users Into 'Commanders' of Multi-Agent Creative Workflows

February 2026 · AI Agents

Google Introduces Agentic Vision: Gemini 3 Flash Now Zooms, Annotates, and Investigates Images

February 2026 · AI Agents

260% Better at Catching Moving Objects: DynamicVLA Solves Robot Latency Problem

January 2026 · AI Agents

15-Hour Agent Runtimes Solved: Idea2Story Precomputes Research Knowledge Offline

January 2026 · AI Agents

73% on BrowseComp: Meituan's 560B Open-Source Model Leads Agentic Benchmarks

January 2026 · AI Agents

56.7% on OSWorld: EvoCUA's Evolutionary Training Beats Closed-Source Computer Use Agents

January 2026 · AI Agents

SimpleMem gives LLM agents 30x cheaper memory with 26% better recall

January 2026 · AI Agents

Microsoft's Agent Lightning Decouples RL Training from Agent Logic, Enabling Fine-Tuning of Any AI Agent with Zero Code Changes

January 2026 · Infrastructure

Agent Memory Fragmentation Solved: EverMemOS Achieves 93% on LoCoMo via Engram-Inspired Lifecycle

January 2026 · AI Agents

Agent Memory Loss Solved: InfiAgent's File-Centric Architecture Enables Unlimited Runtime

January 2026 · AI Agents

DeepResearchEval: Benchmark Shows Gemini Leads Quality, Manus Wins Factual Accuracy

January 2026 · Safety

Zero Training Data, Full Performance: Dr. Zero Matches Supervised Search Agents

January 2026 · Infrastructure

Gold Medal at IMO and IOI: DeepSeek-V3.2 Matches GPT-5 with Open Weights

January 2026 · Infrastructure
