alphaXiv

Discover, Discuss, and Read arXiv papers

Discover new, recommended papers

1,292

01 Oct 2025

reasoning neuro-symbolic-ai agents

Aristotle: IMO-level Automated Theorem Proving

Harmonic

Aristotle, developed by The Harmonic Team, achieved gold-medal-equivalent performance on the IMO 2025 by producing formally verified solutions to five out of six problems, leveraging a hybrid approach that integrates informal reasoning with a formal proof search algorithm in Lean 4.

341

30 Sep 2025

fine-tuning reinforcement-learning chain-of-thought

Debunk the Myth of SFT Generalization

Boston University LinkedIn

Research from Boston University and LinkedIn demonstrates that vanilla Supervised Fine-Tuning (SFT) can achieve strong generalization capabilities comparable to or exceeding Reinforcement Learning (RL) methods, provided training data incorporates prompt diversity and Chain-of-Thought (CoT) supervision. The work attributes SFT's previously observed failures to data design issues rather than inherent algorithmic limitations.

121

02 Oct 2025

reasoning reinforcement-learning multi-agent-learning

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Carnegie Mellon University

Stanford University

RLAD trains large language models to discover and utilize natural language "reasoning abstractions," which are high-level descriptions of procedural and factual knowledge for problem-solving. This approach enhances reasoning by promoting structured exploration, outperforming baseline methods by up to 11.9 percentage points on math benchmarks like AIME 2025 and showing improved generalization across diverse domains.

107

127

02 Oct 2025

synthetic-data transformers generative-models

Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls

Virginia Tech FAIR at Meta Cerebras Systems Independent consultant

Researchers from FAIR at Meta systematically investigated synthetic data in LLM pre-training, showing that while pure synthetic data isn't superior, mixing approximately 30% rephrased synthetic data with natural text can accelerate pre-training by 5-10x and potentially reduce irreducible loss. The study clarifies the conditional benefits and optimal application strategies for synthetic data across various scales.

472

30 Sep 2025

neural-and-evolutionary-computing artificial-intelligence machine-learning

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Pathway

Researchers at Pathway developed the Dragon Hatchling (BDH) architecture, which bridges the gap between Large Language Models and biologically plausible brain models through a system of locally interacting neuron particles. Its GPU-optimized variant (BDH-GPU) achieves performance comparable to Transformer models on language tasks, while demonstrating emergent modularity, monosemantic synapses, and adaptive sparsity that offers inherent interpretability.

125

02 Oct 2025

deep-reinforcement-learning reasoning generative-models

ExGRPO: Learning to Reason from Experience

ExGRPO introduces a principled framework for large language models that enhances reasoning capabilities by systematically managing and reusing valuable past experiences in reinforcement learning from verifiable rewards (RLVR). It achieves consistent performance gains of +3.5 to +7.6 points on reasoning benchmarks and stabilizes training for weaker models by optimizing intermediate-difficulty, low-entropy reasoning trajectories.

327

132

02 Oct 2025

agent-based-systems agents agentic-frameworks

The Unreasonable Effectiveness of Scaling Agents for Computer Use

Simular Research

Simular Research introduces Behavior Best-of-N (bBoN), a framework that boosts Computer-Use Agent (CUA) reliability on complex digital tasks by generating and comparatively evaluating multiple full-length solution trajectories. bBoN achieves a new state-of-the-art 69.9% success rate on OSWorld, nearing human performance.

1,612

29 Sep 2025

agents reinforcement-learning deep-reinforcement-learning

Training Agents Inside of Scalable World Models

Google DeepMind introduces Dreamer 4, a world model agent that achieves the first successful diamond acquisition in Minecraft purely from offline data. The model demonstrates real-time, highly accurate simulation of complex game mechanics on a single GPU, enabling an agent to learn long-horizon tasks and generalize action understanding to new environments.

02 Oct 2025

agents agentic-frameworks reasoning

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing high average BGD rates (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals observed failure modes: execution-first bias (focusing on how to act over whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions due to user request). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this fundamental risk and ensuring safe CUA deployment.

160

01 Oct 2025

agents agentic-frameworks reinforcement-learning

GEM: A Gym for Agentic LLMs

NUS

Stanford University SMU Sea AI Lab Northeastern Oxford OpenRLHF ROLL RL2

GEM, an open-source environment simulator, provides a standardized, framework-agnostic platform for reinforcement learning research with agentic large language models. It facilitates the development of multi-turn, long-horizon agents capable of tool use and introduces REINFORCE with Return Batch Normalization (ReBN) as an effective algorithm for these complex settings.

190

There are no more papers matching your filters at the moment.

Popular Communities

Install Browser Extension

Blog|We're hiring

alphaXiv

Explore

Communities

Login

Feedback

Dark mode

Discover, Discuss, and Read arXiv papers

Discover new, recommended papers

Aristotle: IMO-level Automated Theorem Proving

Debunk the Myth of SFT Generalization

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

ExGRPO: Learning to Reason from Experience

The Unreasonable Effectiveness of Scaling Agents for Computer Use

Training Agents Inside of Scalable World Models

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

GEM: A Gym for Agentic LLMs

Popular Communities