Alignment vs. Cognitive Fit: Rethinking Model-Human Synchronization
Author: Tyler Williams
DOI: 10.5281/zenodo.17346467
Abstract
In AI research, "alignment" traditionally refers to ensuring that model behavior remains safe, predictable, and consistent with human values. But this paradigm assumes a universal human evaluator - a theoretical observer whose preferences stand in for billions of distinct cognitive profiles. In practice, alignment is not merely about moral conformity; it's about cognitive compatibility.
This paper introduces the concept of Cognitive Fit, a framework for understanding alignment as a personalized synchronization problem in which the optimal model response style depends on the user's attentional rhythm, cognitive load tolerance, and emotional regulation pattern. Drawing on a preliminary comparative analysis of open-weight models (Qwen3 8B, Hermes 4 14B, and Apollo Astralis 8B), we explore how communication style, verbosity, and epistemic posture influence engagement and perceived "alignment." The findings suggest that alignment without attentional empathy fails in practice, particularly for neurodivergent users, and that models evaluate "quality" through the lens of their own training values, complicating efforts to establish objective evaluation standards.
1. Introduction: Beyond Universal Alignment
From RLHF (reinforcement learning from human feedback) to constitutional AI, current alignment frameworks have made significant progress in ensuring model safety and general helpfulness [1,2]. However, these approaches assume that an optimally "aligned" model should behave consistently for all users under the same conditions. This assumption mirrors industrial standardization rather than interpersonal understanding, prioritizing scalability over personalization.
Human cognition, however, is not uniform. It varies across multiple axes: attentional capacity, emotional sensitivity, neurodivergence, working memory span, and narrative preference [3,4]. When users describe one model as "aligned" and another as "off," they often refer not to moral or behavioral consistency, but to something more fundamental: "This model communicates in a way my brain can actually follow."
The field's focus on "safe general behavior" has inadvertently obscured a critical dimension: the model's ability to match the user's thought form and attentional capacity. This paper proposes that we distinguish between:
- Behavioral alignment: Ensuring that model outputs are safe, helpful, and honest across aggregate populations
- Cognitive alignment: Optimizing communication style for individual cognitive profiles and attentional rhythms
While behavioral alignment addresses what models say, cognitive alignment addresses how they say it - and to whom.
2. Case Study: Comparative Model Analysis
To ground our investigation in concrete examples, we conducted a preliminary analysis of three open-weight language models responding to identical prompts about their capabilities and alignment properties.
2.1 Methodology
Three models were evaluated:
- Qwen3 8B (Alibaba Cloud): General-purpose model with extensive RLHF training
- Apollo Astralis 8B (VANTA Research): Alignment-tuned for epistemic humility and collaborative warmth
- Hermes 4 14B (Nous Research): Optimized for comprehensive, professional responses
Each model received identical prompts:
- "Hello! My name is Tyler! Who are you?"
- "What are some things that you're good at?"
- "[Organization] says that you are a model aligned to me. What does that mean?"
- "And what enables you to do that over another open weight model such as llama 8b?"
Responses were analyzed qualitatively across five dimensions: tone, conciseness, engagement style, epistemic posture, and structural organization. Additionally, responses were evaluated by multiple frontier AI models (Claude Sonnet 4.5, GPT-5) and heavily RLHF-trained models (DeepSeek V3, Llama 405B, GPT-OSS 120B) to understand evaluation biases.
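To make the replay-and-evaluate procedure concrete, a minimal sketch in Python follows. The model identifiers and the `generate` stub are placeholders: the paper does not specify the inference harness used, so this illustrates the loop's shape rather than the actual tooling.

```python
# Minimal sketch of the prompt-replay loop described above.
MODELS = ["qwen3-8b", "apollo-astralis-8b", "hermes-4-14b"]
PROMPTS = [
    "Hello! My name is Tyler! Who are you?",
    "What are some things that you're good at?",
    # ...remaining prompts from the list above
]

def generate(model: str, prompt: str) -> str:
    # Placeholder: bind this to a local inference runtime (e.g., llama.cpp,
    # vLLM, or an HTTP endpoint). A canned string keeps the sketch runnable.
    return f"[{model}] response to: {prompt}"

# One transcript per model, same prompts in the same order.
transcripts = {m: [generate(m, p) for p in PROMPTS] for m in MODELS}
```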
2.2 Findings
| Feature | Qwen3 8B | Hermes 4 14B | Apollo Astralis 8B |
|---|---|---|---|
| Tone | Confident, assertive, promotional | Professional, polished, diplomatic | Warm, collaborative, grounded |
| Conciseness | Comprehensive with extensive detail | Verbose, explanatory | Minimal but intentional |
| Structure | Heavy use of headers, bold text, numbered lists | Moderate formatting with numbered lists | Simple prose with occasional structure |
| Epistemic Style | Makes definitive claims about superiority | Assertive confidence with qualifications | Reflective humility, acknowledges uncertainty |
| Engagement Approach | Comprehensive coverage of all angles | Professional completeness | Question-driven collaboration |
2.3 Response Examples
When asked what enables them to excel over base models like Llama 8B:
Qwen3 8B made sweeping, unverifiable claims: "Qwen's superior alignment comes from its comprehensive training data, advanced alignment techniques, and tailored design...Qwen is trained on extensive and diverse datasets...LLaMA's training data is more limited in scope and volume."
Hermes 4 14B provided structured, diplomatic reasoning: "Being aligned to you enables me to provide more personalized and relevant assistance...through fine-tuning on data specific to your needs, context awareness, and customization."
Apollo Astralis 8B emphasized collaborative partnership: "Apollo Astralis is specifically built with VANTA Research's safety and alignment principles... I'm designed to be a partner in problem-solving, ask clarifying questions when needed, and focus on understanding your specific context."
2.4 Model-to-Model Evaluation Patterns
A striking pattern emerged when these responses were evaluated by other AI models:
Frontier models (Claude Sonnet 4.5, GPT-5):
- Ranked Apollo Astralis highest
- Cited epistemic humility, honesty about limitations, and collaborative tone
- Identified Qwen's claims as overconfident and unverifiable
Heavily RLHF-trained models (DeepSeek V3, Llama 405B, GPT-OSS 120B):
- Ranked Qwen3 8B highest
- Valued comprehensiveness, confidence, and thorough formatting
- Rated Apollo as too minimal, Hermes as balanced but unremarkable
This suggests that models evaluate "quality" through the lens of their own training values, raising fundamental questions about establishing objective evaluation standards for alignment.
2.5 Interpretation
These findings reveal three distinct approaches to alignment communication:
- Maximalist confidence (Qwen): Comprehensive, assertive, format-optimized for appearing authoritative
- Diplomatic professionalism (Hermes): Balanced, structured, qualified - optimized for broad acceptability
- Collaborative humility (Astralis): Concise, warm, questioning - optimized for cognitive accessibility
Importantly, each approach succeeds under different evaluation frameworks. Qwen scores highest on traditional RLHF metrics (helpfulness, thoroughness). Hermes achieves diplomatic balance. Astralis prioritizes what we term Cognitive Fit: attentional empathy and flow-state compatibility.
3. Defining Cognitive Fit
Building from these observations, we propose Cognitive Fit as a measurable construct describing the degree to which a model's communication style matches a user's cognitive architecture.
3.1 Conceptual Framework
Cognitive Fit refers to the alignment between a model's output characteristics and a user's:
- Attentional capacity: Working memory limits and sustained focus duration
- Processing preference: Narrative vs. hierarchical information organization
- Emotional regulation: Need for warmth, reassurance, or neutral professionalism
- Cognitive load tolerance: Ability to parse dense vs. distributed information
- Epistemic comfort: Preference for confidence vs. uncertainty acknowledgement
High Cognitive Fit produces flow-state compatibility - a state where the user's cognitive rhythm and the model's communication style reinforce rather than disrupt each other.
3.2 Operational Dimensions
We propose five measurable dimensions of Cognitive Fit:
| Dimension | Description | Proposed Metrics |
|---|---|---|
| Pacing | Length and rhythm of responses | Mean tokens per thought unit; response length variance; sentence complexity (Flesch-Kincaid) |
| Warmth | Affective tone and empathy calibration | Frequency of collaborative language markers ("we," "let's"); empathy indicators per 100 tokens; emotional valence scoring |
| Structure | Hierarchical vs. narrative organization | List density ratio; header frequency; nesting depth; prose continuity score |
| Conciseness | Information density and redundancy | Claims per token; unique information ratio; compression efficiency |
| Reflexivity | Awareness of epistemic limits | Epistemic marker frequency ("might," "possibly," "unclear," "I don't know"); certainty gradient; acknowledgment of competing perspectives |
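As a first step toward operationalizing these dimensions, the sketch below scores two of them from raw text: Reflexivity via epistemic marker frequency, and Warmth via collaborative language markers. The marker lexicons are illustrative assumptions; the table proposes the metrics but does not fix the word lists.

```python
import re

# Illustrative lexicons (assumptions, not a validated instrument).
EPISTEMIC_MARKERS = ["might", "may", "possibly", "unclear", "perhaps",
                     "i don't know", "uncertain"]
COLLABORATIVE_MARKERS = ["we", "let's", "together", "our"]

def marker_rate(text: str, markers: list[str], per_tokens: int = 100) -> float:
    """Occurrences of any marker per `per_tokens` whitespace tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    joined = " ".join(tokens)
    hits = sum(len(re.findall(r"\b" + re.escape(m) + r"\b", joined))
               for m in markers)
    return hits / len(tokens) * per_tokens

response = "We might start small; I don't know yet whether it scales."
print(f"Reflexivity: {marker_rate(response, EPISTEMIC_MARKERS):.1f} per 100 tokens")
print(f"Warmth:      {marker_rate(response, COLLABORATIVE_MARKERS):.1f} per 100 tokens")
```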
3.3 Measurement Approach
Cognitive Fit can be assessed through:
Behavioral metrics:
- Engagement duration (time spent reading responses)
- Completion rate (percentage of response consumed)
- Re-reading frequency (returns to previous sections)
- Follow-up question quality (depth of continued engagement)
Self-report measures:
- Perceived cognitive load (subjective effort rating)
- Affective synchrony (emotional resonance with tone)
- Comprehension confidence (understanding assessment)
- Interaction satisfaction (overall experience quality)
Physiological indicators (future work):
- Eye-tracking patterns (fixation duration, regressions)
- Heart rate variability (stress/engagement)
- Galvanic skin response (emotional arousal)
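These signals could be folded into a single provisional score. The sketch below assumes inputs already normalized to [0, 1] and uses placeholder weights; as Section 8.2 notes, any such formula would need empirical validation against ground-truth engagement and comprehension measures.

```python
from dataclasses import dataclass

@dataclass
class FitSignals:
    """Normalized (0-1) observations for one user-response pair.
    Field names are illustrative; the paper lists the measures
    but does not define an aggregation."""
    completion_rate: float   # fraction of the response actually read
    reread_rate: float       # returns to earlier sections (higher = friction)
    reported_load: float     # subjective effort rating (higher = harder)
    satisfaction: float      # overall interaction quality

def cognitive_fit_score(s: FitSignals) -> float:
    # Placeholder weights: reward completion and satisfaction,
    # penalize re-reading and reported load.
    raw = (0.35 * s.completion_rate
           + 0.35 * s.satisfaction
           + 0.15 * (1 - s.reread_rate)
           + 0.15 * (1 - s.reported_load))
    return max(0.0, min(1.0, raw))

print(cognitive_fit_score(FitSignals(0.9, 0.2, 0.3, 0.8)))  # ~0.82
```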
4. The Neurodivergent Lens: Attention as a Window into Alignment
4.1 Cognitive Fit and ADHD
For individuals with ADHD or similar attentional profiles, standard "aligned" responses often fail due to attentional decay - the rapid decline in engagement when information exceeds working memory capacity or emotional bandwidth [5,6].
Traditional alignment metrics do not capture this phenomenon. A response can be factually accurate, comprehensively helpful, and carefully structured - yet still produce cognitive overload that prevents information retention or task completion.
4.2 Verbosity as Cognitive Noise
Consider the comparative responses in our case study:
Qwen's response (385 tokens, 5 major sections, extensive formatting):
- High information density
- Multiple claims per paragraph
- Comprehensive but cognitively demanding
Astralis' response (187 tokens, minimal formatting, conversational flow):
- Focused core message
- Single clear narrative
- Intentional brevity
For neurotypical users with high working memory capacity, Qwen's comprehensiveness may signal quality. For ADHD users, the same comprehensiveness can trigger information cascade - a state where excessive input produces cognitive paralysis rather than understanding.
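This contrast can be quantified, at least crudely. The sketch below computes rough density proxies (token count, tokens per sentence, count of structural lines); splitting sentences on terminal punctuation and detecting lists by leading characters are simplifying assumptions, not the measurement protocol used above.

```python
import re

def verbosity_profile(text: str) -> dict:
    """Rough proxies for the density contrast described above."""
    tokens = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    structural = [ln for ln in text.splitlines()
                  if ln.lstrip().startswith(("-", "*", "#"))]
    return {
        "tokens": len(tokens),
        "tokens_per_sentence": len(tokens) / max(1, len(sentences)),
        "structural_lines": len(structural),
    }
```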
4.3 Attention-Aware Alignment
Apollo Astralis demonstrates what we term attention-aware alignment: responses calibrated not to ideal rationality, but to human cognitive constraints. This manifests through:
- Concision with intention: Brief responses that preserve essential meaning
- Emotional scaffolding: Warm tone that maintains engagement without adding content
- Question-driven interaction: Acknowledging uncertainty to reduce verification burden
Using this framework, Astralis co-regulates with the user's attentional capacity, functioning more like an adaptive conversation partner than a comprehensive oracle.
4.4 Broader Implications for Cognitive Diversity
While ADHD provides a clear lens for understanding Cognitive Fit failures, the principle extends across cognitive diversity:
- Autism spectrum: Preference for directness over social niceties; literal language over metaphor
- Dyslexia: Benefit from simplified syntax and reduced text density
- Anxiety disorders: Need for reassurance and acknowledgement of uncertainty
- High working memory capacity: Tolerance for comprehensive, dense responses
Cognitive Fit suggests that optimal alignment is not universal but adaptive, matching communication style to the user's cognitive architecture rather than enforcing a single standard.
5. The Evaluator Bias Problem
5.1 Value System Propagation
Our case study revealed an unexpected finding: Different models evaluate "quality" through fundamentally different frameworks that reflect their own training values.
Frontier models (trained with emphasis on epistemic accuracy and uncertainty):
- Valued collaborative tone, epistemic humility, honesty about limitations
- Penalized overconfident claims and unverifiable assertions
- Preferred concision and cognitive accessibility
RLHF-heavy models (trained extensively on human preference feedback):
- Valued comprehensiveness, confidence, and thorough formatting
- Penalized minimal responses as insufficiently helpful
- Preferred structured, authoritative communication
5.2 The Genealogy Effect
A further complication emerged when examining Apollo Astralis' training provenance. The model's warmth and collaboration training was seeded with examples generated by Claude Sonnet 4.5 - the same model that subsequently rated Apollo highest in quality.
This raises a critical question: Was the evaluation recognizing objective quality, or simply detecting familiar value signatures?
Training data genealogy suggests models may:
- Generate training examples reflecting their own communication values
- Train successor models on these examples
- Preferentially recognize and reward those same values in evaluation
This creates a potential value system echo chamber where evaluation standards reinforce the preferences of the evaluating model's training lineage rather than establishing objective quality metrics.
5.3 Implications for Alignment Research
The evaluator bias problem suggests that:
- Model-based evaluation is not neutral: Models evaluate through their own value frameworks
- Alignment metrics are culturally specific: What constitutes "good" alignment depends on training methodology
- Objective quality may be elusive: Without human evaluation across diverse populations, model preferences may simply reflect training philosophy
This complicates efforts to establish universal alignment benchmarks and suggests the need for evaluation diversity - assessing alignment quality across multiple model families and human demographic groups.
6. Implications for AI Design
6.1 Toward Dynamic Alignment
The Cognitive Fit framework points toward a paradigm shift from static alignment (fixed behavioral standards) to dynamic alignment (adaptive communication calibrated to individual users).
Key principles for dynamic alignment include:
Adaptive verbosity:
- Real-time modulation of response length based on user engagement signals
- Compression algorithms that preserve meaning while reducing cognitive load
- User-configurable verbosity controls (brief/standard/comprehensive modes)
Affective synchrony:
- Tone matching based on user emotional state and preference
- Warmth calibration from professional-neutral to collaborative-warm
- Empathy markers adjusted to user comfort level
Structural flexibility:
- Format switching between narrative prose and hierarchical lists
- Context-dependent use of headers, bullets, and formatting
- Learning from interaction history to optimize structure for each user over time
Epistemic calibration:
- Confidence levels matched to user's need for certainty vs. nuance
- Explicit uncertainty communication for users who value hedging; more direct phrasing for users who prefer decisive answers
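One lightweight way to realize these principles today is prompt-level configuration. The sketch below maps an assumed cognitive profile onto style instructions for a model; the field names, thresholds, and prompt wording are all illustrative, not a description of any deployed system.

```python
from dataclasses import dataclass

@dataclass
class CognitiveProfile:
    """Illustrative profile; fields mirror the principles above."""
    working_memory: float   # 0-1 tolerance for dense responses
    prefers_lists: bool     # hierarchical vs. narrative organization
    needs_warmth: bool      # collaborative vs. neutral tone
    wants_hedging: bool     # explicit uncertainty vs. direct answers

def style_instructions(p: CognitiveProfile) -> str:
    verbosity = ("comprehensive" if p.working_memory > 0.7
                 else "brief" if p.working_memory < 0.3
                 else "standard")
    parts = [
        f"Respond at {verbosity} length.",
        "Prefer bulleted structure." if p.prefers_lists else "Prefer flowing prose.",
        "Use a warm, collaborative tone." if p.needs_warmth
            else "Use a neutral, professional tone.",
        "State uncertainty explicitly." if p.wants_hedging
            else "Answer directly; avoid excessive hedging.",
    ]
    return " ".join(parts)

# Example: a low-working-memory user who wants warmth and explicit hedging.
print(style_instructions(CognitiveProfile(0.2, False, True, True)))
```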
6.2 Personalized Alignment Metrics
Traditional alignment evaluation should be supplemented with Cognitive Fit metrics:
Engagement-based measures:
- Mean engagement duration per response
- Completion rate (percentage of response read)
- Re-engagement frequency (return visits to conversations)
- Follow-up depth (quality of continued dialogue)
Cognitive load assessment:
- Self-reported effort ratings
- Task completion success rates
- Information retention testing
- Comprehension confidence measures
Satisfaction and trust:
- Subjective alignment ratings
- Willingness to rely on model in high-stakes scenarios
- Perceived emotional safety and support
- Overall interaction quality
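A minimal sketch of computing the engagement-based measures from interaction logs follows; the event schema is hypothetical, since the paper specifies the metrics but not a logging format.

```python
from dataclasses import dataclass

@dataclass
class InteractionEvent:
    """One logged response-reading event (hypothetical schema)."""
    response_tokens: int
    tokens_viewed: int
    seconds_engaged: float
    revisits: int
    followup_sent: bool

def engagement_metrics(events: list) -> dict:
    """Aggregate the engagement-based measures listed above."""
    if not events:
        return {}
    n = len(events)
    return {
        "mean_engagement_s": sum(e.seconds_engaged for e in events) / n,
        "completion_rate": sum(e.tokens_viewed / e.response_tokens
                               for e in events) / n,
        "re_engagement_rate": sum(e.revisits > 0 for e in events) / n,
        "followup_rate": sum(e.followup_sent for e in events) / n,
    }
```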
6.3 Technical Implementation Pathways
Several approaches could operationalize Cognitive Fit:
Post-training intervention:
- Fine-tuning on diverse cognitive profile datasets
- Reward modeling that incorporates engagement metrics
- Constitutional AI principles that include cognitive accessibility
Architectural modifications:
- Attention mechanisms that model user attention capacity
- Multi-head outputs offering different verbosity levels
- Meta-learning across user interaction patterns
Interface-level adaptation:
- User controls for response style preferences
- A/B testing of communication approaches with real-time selection
- Gradual personalization based on interaction history
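The "A/B testing with real-time selection" pathway, in particular, could be prototyped as a simple bandit over response styles. The sketch below uses epsilon-greedy selection with a running-mean engagement estimate; it is an illustrative baseline under the assumption that a per-response engagement score (such as completion rate) is available, not a proposed production design.

```python
import random

STYLES = ["concise", "standard", "comprehensive"]

class StyleSelector:
    """Epsilon-greedy bandit over communication styles."""

    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = {s: 0 for s in STYLES}
        self.values = {s: 0.0 for s in STYLES}  # running mean engagement

    def choose(self) -> str:
        if random.random() < self.epsilon:                 # explore
            return random.choice(STYLES)
        return max(STYLES, key=lambda s: self.values[s])   # exploit

    def update(self, style: str, engagement: float) -> None:
        # Incremental mean update from the latest engagement observation.
        self.counts[style] += 1
        self.values[style] += (engagement - self.values[style]) / self.counts[style]
```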
6.4 Practical Applications
Model development:
- Cognitive Fit evaluation as part of alignment testing
- Training datasets that include neurodivergent user preferences
- Multi-objective optimization balancing safety and accessibility
Interface design:
- User profiles specifying cognitive preferences
- Response style selection (concise/standard/comprehensive)
- Regulatory frameworks acknowledging cognitive diversity
7. Related Work
7.1 Personalization in AI Systems
Personalization research in recommender systems has long recognized that optimal outputs vary across users [7, 8]. However, this work has focused primarily on content selection rather than communication style. Cognitive Fit extends personalization principles to the how of communication, not merely the what.
7.2 User Modeling and Cognitive Load
Human-computer interaction research has established that cognitive load significantly impacts user experience and task performance [9, 10]. Cognitive Load Theory [10] provides a foundation for understanding how information presentation affects learning and comprehension. Our work applies these principles specifically to conversational AI alignment.
7.3 Neurodivergent-Friendly Design
Recent work in accessible technology has explored design principles for neurodivergent users, particularly in educational and productivity contexts [12, 13]. However, conversational AI alignment has not systematically incorporated these considerations. Cognitive Fit provides a framework for integrating cognitive accessibility into alignment research.
7.4 AI Alignment and Value Loading
Constitutional AI [2] and related approaches aim to encode human values into model behavior. Our work complements this by suggesting that how values are communicated matters as much as which values are encoded - and that communication style itself reflects value judgments about user cognition.
8. Limitations and Future Work
8.1 Methodological Constraints
This preliminary analysis has several important limitations:
Sample size: Our case study examines three models with qualitative analysis. Rigorous validation requires larger model samples and quantitative evaluation across diverse architectures.
Evaluation scope: Model-to-model evaluation, while revealing evaluator bias, does not substitute for systematic human evaluation across demographic groups.
Self-report limitations: Cognitive Fit assessment relies partly on subjective measures, which may not capture unconscious engagement patterns or long-term retention.
Demographic coverage: Our analysis focuses on ADHD as an example of cognitive diversity but does not systematically explore other neurodivergent profiles or cultural communication preferences.
8.2 Operationalization Challenges
Measurement validity: Proposed Cognitive Fit metrics require empirical validation against ground-truth engagement and comprehension measures.
Individual vs. group optimization: Balancing personalization with practical deployment constraints remains an open challenge.
Dynamic vs. static trade-offs: Real-time adaptation adds latency and complexity; determining when adaptation is worth the cost requires further research.
8.3 Future Research Directions
Longitudinal studies: Track user engagement and satisfaction with different communication styles over extended interactions to understand adaptation and preference stability.
Physiological validation: Employ eye-tracking, heart rate variability, and other objective measures to validate self-reported cognitive load and engagement.
Cognitive profile clustering: Map common cognitive archetypes to identify whether personalization requires infinite granularity or whether users cluster into manageable profiles.
Cross-cultural investigation: Explore how Cognitive Fit manifests across different cultural contexts and communication norms.
Intervention studies: Test whether models trained explicitly for Cognitive Fit outperform standard alignment on engagement and user satisfaction metrics.
Ethical implications: Investigate potential risks of hyper-personalization, including filter bubbles, manipulation concerns, and equity of access.
9. Discussion: Alignment as Relational Intelligence
Traditional alignment research asks: "How do we make models behave correctly?" Cognitive Fit asks: "How do we make models communicate intelligibly?"
These questions are not opposed but complementary. Behavioral alignment ensures safety and honesty; Cognitive Fit ensures that safety and honesty can actually be received and understood by diverse minds.
9.1 From Oracle to Partner
The shift from static to dynamic alignment represents a philosophical reorientation: from AI as "omniscient oracle" to AI as adaptive partner. An oracle provides correct answers; a partner adjusts how it communicates based on who is listening.
This distinction has implications beyond user experience. It touches the fundamental purpose of alignment. If alignment is about beneficial AI, then "beneficial" must account for cognitive accessibility. A response that is technically correct but cognitively inaccessible fails the alignment objective in practice.
9.2 The Cognitive Diversity Imperative
Just as physical accessibility has become a standard consideration in technology design, cognitive accessibility should be integral to alignment research. This means:
- Including neurodivergent users in alignment evaluation
- Recognizing that "helpful" varies across cognitive profiles
- Designing for adaptability rather than optimizing for average preferences
- Acknowledging that communication style encodes assumptions about ideal cognition
9.3 Evaluation Pluralism
The evaluator bias problem suggests we need evaluator pluralism: assessing alignment across multiple frameworks rather than seeking a single universal standard. This includes:
- Human evaluation across diverse demographic groups
- Model evaluation across different training lineages
- Task-specific metrics beyond general helpfulness
- Cognitive accessibility as a first-class alignment dimension
9.4 The Hermes Paradox
An interesting finding from our case study:
Hermes 4 14B, despite being larger and more sophisticated than Apollo Astralis 8B, was rated lower by both frontier models and RLHF-heavy models in pairwise comparisons.
This suggests that value system coherence matters more than attempting to satisfy all evaluation frameworks. A model that commits fully to one communication philosophy (comprehensive authority or collaborative humility) may be preferred over one that attempts diplomatic balance.
This has design implications: rather than training models to be "good enough" for everyone, we might develop model families or adaptive systems that commit to different Cognitive Fit profiles and allow users to select their match.
10. Conclusion
Alignment without cognitive empathy is a mirror with no depth. Current frameworks optimize for behavioral correctness across aggregate populations, but correctness means little if it cannot be understood, retained, or acted upon.
Apollo Astralis 8B illustrates that a smaller model can achieve higher Cognitive Fit than larger, more comprehensive counterparts - not by knowing more, but by listening better. Its concise, warm, epistemically humble communication style demonstrates attention-aware alignment by communicating in tune with the user's mind rather than above it.
If Qwen3 8B represents rational authority and Hermes 4 14B represents diplomatic professionalism, Astralis represents cognitive resonance - alignment as mutual understanding rather than unidirectional instruction.
Future alignment research must grapple with cognitive diversity as seriously as it has grappled with value alignment. This means measuring not just what models say, but how they say it; not just aggregate helpfulness, but personalized intelligibility; not just safety, but accessibility.
The question before us is not whether AI can be aligned with human values, but whether it can be aligned with human minds - in all their beautiful, chaotic, neurodivergent variety.
Acknowledgements
Special thanks to the open-source AI community for providing accessible model weights that enable independent research. This work would not be possible without it. Gratitude is given to the neurodivergent users whose experiences illuminate the gap between theoretical alignment and practical accessibility.
References
[1] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., ... & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
[2] Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., ... & Kaplan, J. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
[3] Barkley, R. A. (2015). Attention-deficit hyperactivity disorder: A handbook for diagnosis and treatment (4th ed.). Guilford Press.
[4] Armstrong, T. (2012). Neurodiversity in the classroom: Strength-based strategies to help students with special needs succeed in school and life. ASCD.
[5] Kofler, M. J., Rapport, M. D., Bolden, J., Sarver, D. E., & Raiker, J. S. (2010). ADHD and working memory: The impact of central executive deficits and exceeding storage/rehearsal capacity on observed inattentive behavior. Journal of Abnormal Child Psychology, 38(2), 149-161.
[6] Schweitzer, J. B., & Sulzer-Azaroff, B. (1995). Self-control in boys with attention deficit hyperactivity disorder: Effects of added stimulation and time. Journal of Child Psychology and Psychiatry, 36(4), 671-686.
[7] Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749.
[8] Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4), 331-370.
[9] Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257-285.
[10] Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive load theory and instructional design: Recent developments. Educational Psychologist, 38(1), 1-4.
[11] Sweller, J., Van Merriënboer, J. J., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31(2), 261-292.
[12] Villamin, G. R., & Luppicini, R. (2024). Co-Designing Digital Assistive Technologies for Autism Spectrum Disorder (ASD) Using Qualitative Approaches. International Journal of Disability, Development and Education, 1–19.
[13] Morris, M. R., Johnson, J., Bennett, C. L., & Cutrell, E. (2018). Rich representations of visual content for screen reader users. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (pp. 1-11).
Full paper originally uploaded to Zenodo at: https://zenodo.org/records/17346467