Research

Discover the latest breakthroughs in tech

Vision
Baidu Introduces ERNIE 5.0: Trillion-Parameter Unified Multimodal MoE Rivals GPT-5

AI Agents
Google Introduces Agentic Vision: Gemini 3 Flash Now Zooms, Annotates, and Investigates Images

Vision
First Holistic OCR Model: OCRVerse Unifies Document Parsing and Code Generation

Infrastructure
175% Faster Prefill with Better Accuracy: ConceptMoE's Adaptive Token Compression for MoE

AI Agents
260% Better at Catching Moving Objects: DynamicVLA Solves Robot Latency Problem

AI Agents
15-Hour Agent Runtimes Solved: Idea2Story Precomputes Research Knowledge Offline

Vision
UPLiFT vs Cross-Attention Upsamplers: Linear Scaling Meets SOTA Quality

Infrastructure
2x Faster VLA Inference with 70% Fewer Layers: Shallow-π Distillation for Edge Robotics

Vision
6x Fewer Tokens, Better OCR: DeepSeek's Visual Causal Flow Beats GPT-4o and Gemini

Voice AI
11% Better Than Human: Chroma 1.0's Real-Time Voice Cloning for Spoken Dialogue

Infrastructure
90% Attention Sparsity with Zero Quality Loss: SALAD Speeds Up Video Diffusion 1.7x

AI Agents
73% on BrowseComp: Meituan's 560B Open-Source Model Leads Agentic Benchmarks

Infrastructure
FP8 Rollout Instability Solved: Jet-RL Unifies Precision for Stable RL Training

AI Agents
56.7% on OSWorld: EvoCUA's Evolutionary Training Beats Closed-Source Computer Use Agents

Voice AI
97ms First-Packet Latency: Qwen3-TTS Beats ElevenLabs in Voice Cloning Across 10 Languages