LLM Reasoning as Trajectories:
Step-Specific Representation Geometry and Correctness Signals

Microsoft
ACL 2026 (Main)

Chain-of-thought reasoning in LLMs traces structured trajectories through representation space. Step-specific regions become linearly separable with depth, late-step geometry predicts correctness, and trajectory-based steering enables both error correction and reasoning length control at inference time.

t-SNE visualization of step-specific activations across layers
Step-specific representation structure across layers. t-SNE visualization of activations preceding Step markers at layers 0, 11, 21, and 31. Clusters become increasingly separated at deeper layers, revealing step-specific geometric structure.

Overview

Current large language models generate tokens by iteratively updating high-dimensional representations. When solving math problems with chain-of-thought prompting, the sequence of reasoning steps can be viewed as successive states forming a trajectory through the model's representation space. We characterize this trajectory and find it is highly structured: each reasoning step occupies a distinct, linearly separable region that becomes progressively more delineated at deeper layers. This organization is already present in base models — reasoning training primarily reshapes when convergence occurs rather than introducing new representational structure.

Building on this, we show that correct and incorrect solutions follow similar early-step paths but diverge systematically at late steps, yielding actionable mid-reasoning correctness signals. We further introduce trajectory-based steering, an inference-time intervention framework that enables both error correction and reasoning length control without retraining.

Step-Specific Representation Subspaces

We extract hidden-state activations immediately preceding each Step marker during chain-of-thought reasoning on GSM8K. These activations form snapshots of the model's representation trajectory as reasoning unfolds.
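The extraction step can be sketched with plain NumPy on precomputed per-layer hidden states; the token list, the literal `Step` marker convention, and the random states below are illustrative stand-ins for the actual tokenizer-level setup:

```python
import numpy as np

def step_marker_positions(tokens, marker="Step"):
    """Indices of tokens immediately preceding each step marker."""
    return [i - 1 for i, t in enumerate(tokens) if t == marker and i > 0]

def extract_trajectory(hidden_states, tokens, marker="Step"):
    """Snapshot the hidden state just before each step marker.
    hidden_states: (seq_len, d_model) array for a single layer."""
    return hidden_states[step_marker_positions(tokens, marker)]

# toy example: random states standing in for real layer activations
tokens = ["Q", ":", "...", "Step", "1", ":", "x", "Step", "2", ":", "y"]
H = np.random.randn(len(tokens), 8)
traj = extract_trajectory(H, tokens)
print(traj.shape)  # (2, 8)
```

Stacking these snapshots across steps gives one trajectory per solution, repeated at each layer of interest.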

Layer-wise linear probe accuracy for step-specific classification
Layer-wise linear probe accuracy for step classification (Steps 2, 3, 5, and Final Answer Marker) across Instruct, R1-Distill, and Base models. Later steps require deeper layers to become separable.
  • Step-specific regions are linearly separable. Step 1 has probe accuracy above 0.99 at every layer for all models. Later steps require deeper layers to become separable (Table 1).
  • Structure is shared across training regimes. Cross-model transfer of step-specific linear probes consistently achieves accuracy above 0.90 for nearly all model pairs (Table 1).
  • Robust to prompt format. Probes trained on fixed-format Step X: activations transfer to freeform responses (no explicit step markers) with best-layer accuracies consistently above 0.84 (Table 7).
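A probe of this kind is ordinary multinomial logistic regression on activation snapshots. As a minimal sketch, the synthetic clusters below stand in for real step-specific activations at one layer; the dimensionality and cluster spread are arbitrary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d, n_per_step = 32, 200
# synthetic stand-in: each "step" occupies its own region of activation space
centers = 3.0 * rng.normal(size=(4, d))
X = np.vstack([c + 0.5 * rng.normal(size=(n_per_step, d)) for c in centers])
y = np.repeat(np.arange(4), n_per_step)  # labels: which step produced the state

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.2f}")
```

Cross-model transfer corresponds to fitting `probe` on one model's activations and calling `score` on another's.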

💡 Key Insight: Reasoning steps occupy functionally ordered, linearly separable subspaces that are largely shared across training regimes and robust to surface-level formatting.

Correctness in Trajectory Geometry

We group trajectories by final-answer correctness and compute between-step activation distances. Early-step transitions are nearly identical for correct and incorrect solutions, but late-step transitions diverge systematically. We then train logistic regression classifiers on trajectory features to predict final-answer correctness before the answer is emitted.

Between-step activation distances and mid-reasoning correctness prediction
(a) Between-step activation distances. Late transitions show statistically significant divergence between correct and incorrect solutions († marks non-overlapping 95% CIs). (b) Test ROC-AUC across layers. Late-step trajectory features (peak 0.87 at layer 29) substantially outperform early-step geometry and logit-lens baselines.
  • Early-step geometry is correctness-invariant. Step 1→2 distances show no significant difference (Δ(I−C) ≈ 0).
  • Late-step trajectories diverge. Last step → answer marker: Euclidean Δ(I−C) = −13.39, cosine Δ = −0.06.
  • Trajectory features predict correctness with ROC-AUC 0.87 (peak at layer 29), substantially outperforming step-count-only baselines (0.65) and logit-lens features (0.77).
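One way to turn these observations into a predictor, as a rough sketch: featurize each trajectory by its per-transition Euclidean and cosine distances, then fit logistic regression on those features. The synthetic trajectories below (incorrect ones take a larger final hop) are illustrative, not the paper's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
d, steps = 16, 5

def make_traj(correct):
    traj = rng.normal(size=(steps, d))
    if not correct:
        traj[-1] += 2.0 * rng.normal(size=d)  # larger last-step hop
    return traj

def transition_features(traj):
    """Euclidean and cosine distances between consecutive step states."""
    a, b = traj[:-1], traj[1:]
    eucl = np.linalg.norm(b - a, axis=1)
    cos = 1.0 - np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.concatenate([eucl, cos])

y = np.array([0, 1] * 150)  # 1 = correct
X = np.array([transition_features(make_traj(c)) for c in y])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {clf.score(X, y):.2f}")
```

Because the features only cover transitions seen so far, the same classifier can be queried mid-reasoning, before the answer is emitted.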

💡 Key Insight: Correctness is encoded not in where the model ends up in representation space, but in how it gets there — the trajectory, not the destination.

Error-Targeted Inference-Time Interventions

We show that unconditional test-time scaling — always injecting control tokens like "Wait" or "Hmm" — is often harmful, reducing accuracy by up to 36%. Instead, we use correctness predictors to gate interventions, applying them only when failure is predicted.

Unconditional vs predictor-gated interventions
Unconditional vs. error-targeted interventions on GSM8K. Predictor-gated methods convert large unconditional accuracy drops into consistent gains.
  • Unconditional interventions are often harmful. Always injecting "Wait" reduces accuracy by 36.0%; even "Step" incurs a 1.6% drop.
  • Predictor-gated interventions yield gains of up to +35.4% relative to always-on, intervening on only 12.3% of examples.
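The gating logic itself is simple. A minimal sketch, where `p_correct` is supplied by some mid-reasoning correctness predictor (a hypothetical interface) and the `"Wait"` injection stands in for the full intervention mechanics:

```python
def gated_injection(partial_trace, p_correct, threshold=0.5, token="Wait"):
    """Inject a control token only when the correctness predictor flags
    likely failure; otherwise leave the reasoning trace untouched."""
    if p_correct < threshold:
        return partial_trace + f" {token},"
    return partial_trace

# flagged trace gets the control token; confident trace is left alone
print(gated_injection("Step 3: 12 * 4 = 46.", p_correct=0.2))
print(gated_injection("Step 3: 12 * 4 = 48.", p_correct=0.9))
```

The unconditional baseline corresponds to `threshold = 1.0` (always inject), which is exactly the regime that hurts accuracy above.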

💡 Key Insight: Late-step trajectory geometry encodes detectable correctness signals that can guide interventions, but one-shot corrections remain limited in their ability to reliably repair diverse failure modes.

Trajectory-Based Steering

Correcting Deviating Reasoning Trajectories

We derive an ideal reasoning trajectory from correct examples in PCA space and apply low-rank steering updates whenever the current trajectory deviates beyond learned thresholds. This enables localized, repeated corrections at every step boundary.

Trajectory-based steering for correctness
Trajectory-based steering for correctness on GSM8K, stratified by original step count. Gains are largest on longer, error-prone reasoning chains.
  • Most effective on longer chains: +7.60% accuracy on 6-step problems (75.44% → 83.04%) and +7.69% on 7-step problems (67.69% → 75.38%).
  • High preservation rates (≥97%): repeated low-magnitude interventions avoid destabilizing already-correct trajectories.
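A minimal numerical sketch of the correction rule: measure deviation from a reference trajectory inside a low-dimensional PCA subspace, and apply a low-rank nudge only when it exceeds a threshold. The basis, threshold, and step size `alpha` here are illustrative placeholders for the learned values:

```python
import numpy as np

def steer_to_reference(h, ref, components, threshold=1.0, alpha=0.5):
    """Project the deviation from the reference state onto the top PCA
    components and nudge the hidden state back when it exceeds threshold."""
    delta = (ref - h) @ components.T           # coords in the PCA subspace
    if np.linalg.norm(delta) <= threshold:
        return h                               # on track: no intervention
    return h + alpha * (delta @ components)    # low-rank correction

# toy check: a deviated state moves toward the reference, an on-track
# state is left untouched
d, k = 16, 3
rng = np.random.default_rng(2)
comps, _ = np.linalg.qr(rng.normal(size=(d, k)))  # orthonormal (d, k)
comps = comps.T                                   # rows span the subspace
ref = rng.normal(size=d)
h = ref + 4.0 * comps[0]          # deviated along the first component
h2 = steer_to_reference(h, ref, comps)
print(np.linalg.norm(h2 - ref) < np.linalg.norm(h - ref))  # True
```

Because `alpha < 1`, each correction is low-magnitude; applying it at every step boundary yields the repeated, localized updates described above.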

Reasoning Length Control

Using a termination-related subspace (the region of representation space associated with the final-answer marker), we can directly and continuously control reasoning length by steering activations toward (Shorten) or away from (Prolong) the termination region.

Reasoning length control via termination subspace
Reasoning length and number of steps as a function of steering strength |α|. Moderate strengths (|α| ≤ 0.4) change length with ~1% accuracy impact.
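The control knob reduces to a signed shift along a unit direction. In this sketch, `term_dir` is a placeholder for the actual termination direction, with the sign of `alpha` following the Shorten/Prolong convention above:

```python
import numpy as np

def steer_length(h, term_dir, alpha):
    """Shift an activation along the termination direction.
    alpha > 0 steers toward termination (Shorten); alpha < 0 away (Prolong)."""
    u = term_dir / np.linalg.norm(term_dir)
    return h + alpha * u

h = np.zeros(8)
u = np.ones(8)                         # placeholder termination direction
shortened = steer_length(h, u, +0.4)   # moves toward the termination region
prolonged = steer_length(h, u, -0.4)   # moves away from it
print(shortened @ (u / np.linalg.norm(u)))  # projection onto the direction
```

Because `alpha` is continuous, length varies smoothly with steering strength, matching the |α| sweep in the figure.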

💡 Key Insight: Reasoning trajectories can be causally manipulated to both correct errors and control reasoning length, establishing them as a unifying abstraction for interpreting, predicting, and influencing LLM reasoning behavior.

Cross-Task Generalization

Step-specific probes trained on GSM8K transfer robustly to both MATH-500 and MMLU, achieving best-layer accuracies above 0.85 for all steps. Steering vectors derived from GSM8K also improve MATH-500 accuracy from 36.40% to 38.20% (+1.80%) without any retuning — indicating that the identified geometry reflects general properties of CoT reasoning.

BibTeX

@misc{sun2026llmreasoningtrajectoriesstepspecific,
      title={LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals}, 
      author={Lihao Sun and Hang Dong and Bo Qiao and Qingwei Lin and Dongmei Zhang and Saravan Rajmohan},
      year={2026},
      eprint={2604.05655},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.05655}, 
}