Safety

7 posts in Safety

Meta's LeCun Introduces SAI: A Measurable Alternative to AGI

March 2026 · Safety

Mirage of Synthesis: DREAM's Agentic Framework Catches What Static Benchmarks Miss

February 2026 · Safety

Voice Search Breaks in Noise: SQuTR Benchmark Reveals the Real Bottleneck

February 2026 · Voice AI

Even GPT-5 Fails at Discovery: OdysseyArena Exposes the Inductive Bottleneck in LLM Agents

February 2026 · AI Agents

DeepResearchEval: Benchmark Shows Gemini Leads Quality, Manus Wins Factual Accuracy

January 2026 · Safety

VLM Hallucinations Exposed: VIB-Probe Pinpoints and Suppresses Faulty Attention Heads

January 2026 · Safety

Why Chain-of-Thought Works: Researchers Find a Single 'Reasoning Switch' in LLMs

January 2026 · LLMs
