Infrastructure

23 posts in Infrastructure

Vision
Unified Latents Hits 1.4 FID by Replacing Stable Diffusion's Ad Hoc VAE with a Diffusion Prior

AI Agents
80.3 on ScreenSpotPro: GUI-Owl-1.5 Sets New Bar for Open-Source GUI Agents

Vision
3.5x Faster Image Generation: DDiT Dynamically Resizes Patches in Diffusion Transformers

Infrastructure
Reasoning Overthinking Solved: SAGE Cuts Tokens 44% While Improving Accuracy

Vision
Baidu Introduces ERNIE 5.0: Trillion-Parameter Unified Multimodal MoE Rivals GPT-5

Vision
First Holistic OCR Model: OCRVerse Unifies Document Parsing and Code Generation

Infrastructure
175% Faster Prefill with Better Accuracy: ConceptMoE's Adaptive Token Compression for MoE

Vision
6x Fewer Tokens, Better OCR: DeepSeek's Visual Causal Flow Beats GPT-4o and Gemini

Infrastructure
2x Faster VLA Inference with 70% Fewer Layers: Shallow-π Distillation for Edge Robotics

Vision
UPLiFT vs Cross-Attention Upsamplers: Linear Scaling Meets SOTA Quality

Infrastructure
90% Attention Sparsity with Zero Quality Loss: SALAD Speeds Up Video Diffusion 1.7x

Infrastructure
FP8 Rollout Instability Solved: Jet-RL Unifies Precision for Stable RL Training

AI Agents
73% on BrowseComp: Meituan's 560B Open-Source Model Leads Agentic Benchmarks

Infrastructure
Tsinghua Researchers Show Diffusion LLMs Reason Better When You Take Away Their Flexibility

Voice AI
97ms First-Packet Latency: Qwen3-TTS Beats ElevenLabs in Voice Cloning Across 10 Languages

AI Agents
56.7% on OSWorld: EvoCUA's Evolutionary Training Beats Closed-Source Computer Use Agents

Infrastructure
Microsoft's Agent Lightning Decouples RL Training from Agent Logic, Enabling Fine-Tuning of Any AI Agent with Zero Code Changes

Vision
16x Faster On-Device Video Generation: Qualcomm's ReHyAt Distills Attention in 160 GPU Hours

Infrastructure
Gold Medal at IMO and IOI: DeepSeek-V3.2 Matches GPT-5 with Open Weights

Infrastructure
6% Better Math Reasoning in Fewer Tokens: Multiplex Thinking Merges Multiple Paths into One

Infrastructure
Zero Training Data, Full Performance: Dr. Zero Matches Supervised Search Agents

Infrastructure
Why Reasoning Models Cheat on Efficiency: TNT's Fix Cuts Tokens 50%

Infrastructure
Why Hyper-Connections Explode at Scale: DeepSeek's Manifold Fix