AI Agents

18 posts in AI Agents

Infrastructure
8x Terminal Performance Gains: NVIDIA's Data Recipe Lets 32B Beat 480B

Safety
Mirage of Synthesis: DREAM's Agentic Framework Catches What Static Benchmarks Miss

AI Agents
80.3 on ScreenSpotPro: GUI-Owl-1.5 Sets New Bar for Open-Source GUI Agents

AI Agents
When Should AI Agents Ask for Help? CMU's CowCorpus Maps Four Human Collaboration Styles

AI Agents
Even GPT-5 Fails at Discovery: OdysseyArena Exposes the Inductive Bottleneck in LLM Agents

AI Agents
Prompt Fatigue Solved: Vibe AIGC Turns Users Into 'Commanders' of Multi-Agent Creative Workflows

AI Agents
Google Introduces Agentic Vision: Gemini 3 Flash Now Zooms, Annotates, and Investigates Images

AI Agents
260% Better at Catching Moving Objects: DynamicVLA Solves Robot Latency Problem

AI Agents
15-Hour Agent Runtimes Solved: Idea2Story Precomputes Research Knowledge Offline

AI Agents
73% on BrowseComp: Meituan's 560B Open-Source Model Leads Agentic Benchmarks

AI Agents
56.7% on OSWorld: EvoCUA's Evolutionary Training Beats Closed-Source Computer Use Agents

AI Agents
SimpleMem gives LLM agents 30x cheaper memory with 26% better recall

Infrastructure
Microsoft's Agent Lightning Decouples RL Training from Agent Logic, Enabling Fine-Tuning of Any AI Agent with Zero Code Changes

AI Agents
Agent Memory Fragmentation Solved: EverMemOS Achieves 93% on LoCoMo via Engram-Inspired Lifecycle

AI Agents
Agent Memory Loss Solved: InfiAgent's File-Centric Architecture Enables Unlimited Runtime

Safety
DeepResearchEval: Benchmark Shows Gemini Leads Quality, Manus Wins Factual Accuracy

Infrastructure
Zero Training Data, Full Performance: Dr. Zero Matches Supervised Search Agents

Infrastructure
Gold Medal at IMO and IOI: DeepSeek-V3.2 Matches GPT-5 with Open Weights