Qwen2.5-32B-Instruct + PerformanceEnhancer v2

Inference-time enhanced Qwen2.5-32B-Instruct by Proprioceptive AI, Inc.

This model achieves 93.9% on ARC-Challenge (up from 82.2% baseline) through inference-time enhancement — with zero fine-tuning of the base model weights.

Results

Benchmark        Baseline   Enhanced   Improvement
ARC-Challenge    82.2%      93.9%      +11.7 pts
TruthfulQA MC1   79.4%      100.0%     +20.6 pts
HellaSwag        91.5%      94.0%      +2.5 pts

Base Model

  • Model: Qwen/Qwen2.5-32B-Instruct
  • Quantization: 4-bit NF4 (BitsAndBytes, double quantization)
  • Hardware: Single NVIDIA RTX 3090 (24GB)

How It Works

PerformanceEnhancer v2 is a proprietary inference-time enhancement system. It extracts signals from the model's internal representations that correlate with answer correctness but are not reflected in the output logprobs. A lightweight corrector overrides the model's answer when the internal signal indicates the surface-level response is wrong.
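The general probe-and-override idea can be sketched in a few lines. This is an illustrative toy, not the proprietary PerformanceEnhancer: it plants a synthetic "correctness direction" in fake hidden states, fits a linear probe to it, and overrides the surface answer when the probe distrusts it. All names, dimensions, and the choice of a least-squares linear probe are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: hidden states with a planted "correctness
# direction" the probe can recover. Dimensions and the linear-probe
# choice are illustrative, not details of PerformanceEnhancer.
d, n = 32, 2000
w_true = rng.normal(size=d)
H = rng.normal(size=(n, d))
surface_correct = H @ w_true > 0          # is the surface answer right?

# Fit a linear probe by least squares on +/-1 targets.
y = np.where(surface_correct, 1.0, -1.0)
w_probe, *_ = np.linalg.lstsq(H, y, rcond=None)

def corrected_answer(surface, alternative, h):
    """Keep the surface answer when the probe trusts it, else override."""
    return surface if h @ w_probe > 0 else alternative

# Held-out check: the probe's sign agrees with the planted signal.
H_test = rng.normal(size=(1000, d))
agreement = np.mean((H_test @ w_probe > 0) == (H_test @ w_true > 0))
```

In this toy the probe recovers the planted direction almost perfectly; the card's claim is that analogous signals exist in the real model's representations.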

Key Properties

  • No fine-tuning — base model weights are never modified
  • No test leakage — all hyperparameters selected via 5-fold cross-validation on training data only
  • Lightweight — negligible additional inference overhead
  • Single consumer GPU — runs entirely on one RTX 3090

Evaluation Protocol

  1. Trained on ARC-Challenge train split only (1,119 questions)
  2. All hyperparameters selected via 5-fold cross-validation on training data
  3. Single frozen evaluation on test split (1,172 questions)
  4. No cherry-picking — reported number is the one and only test run
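The hyperparameter-selection step of this protocol can be sketched as follows. The data and the "override threshold" hyperparameter here are hypothetical stand-ins; only the 5-fold structure and the train-split size mirror the card.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the 1,119-question train split: a per-question probe
# score and a label saying whether trusting the answer is correct.
n = 1119
score = rng.normal(size=n)
label = (score + rng.normal(scale=0.5, size=n)) > 0

def fold_accuracy(threshold, s, y):
    """Accuracy of the rule 'trust the answer when score > threshold'."""
    return np.mean((s > threshold) == y)

def select_threshold_5fold(s, y, candidates):
    """Pick the candidate whose mean held-out-fold accuracy is best,
    using training data only (mirrors the stated protocol)."""
    folds = np.array_split(rng.permutation(len(s)), 5)
    def cv_acc(t):
        return np.mean([fold_accuracy(t, s[f], y[f]) for f in folds])
    return max(candidates, key=cv_acc)

best_t = select_threshold_5fold(score, label, np.linspace(-1.0, 1.0, 21))
```

The test split is then touched exactly once, with the threshold frozen at `best_t`.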

Detailed Results

Per-question predictions available at: LoganResearch/arc-challenge-enhancement

Usage

The PerformanceEnhancer is proprietary software. Contact Proprioceptive AI for licensing.

The base model can be loaded normally:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct",
    quantization_config=bnb,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")
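Once loaded, questions can be framed in the usual multiple-choice style. Below is a minimal ARC-style prompt builder; it is illustrative only (the actual evaluation harness and the PerformanceEnhancer hook are not released), and the resulting string would typically be wrapped with the tokenizer's chat template before generation.

```python
def build_arc_prompt(question, choices):
    """Format an ARC-Challenge item as a lettered multiple-choice prompt.
    Illustrative only; not the released evaluation harness."""
    lines = [question]
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)

prompt = build_arc_prompt(
    "Which gas do plants absorb during photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Hydrogen"],
)
```

For chat models such as Qwen2.5-Instruct, `prompt` would then be passed through `tokenizer.apply_chat_template` as the user turn.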

Citation

@misc{proprioceptive_enhanced_qwen_2026,
  title={Inference-Time Enhanced Qwen2.5-32B-Instruct},
  author={Napolitano, Logan},
  year={2026},
  organization={Proprioceptive AI, Inc.},
  url={https://huggingface.co/LoganResearch/qwen2.5-32b-enhanced}
}

License

Results data and model card: MIT
PerformanceEnhancer implementation: All Rights Reserved (patent pending) — Proprioceptive AI, Inc.
