Alignment vs. Cognitive Fit: Rethinking Model-Human Synchronization
Author: Tyler Williams
DOI: 10.5281/zenodo.17346467
Abstract
In AI research, "alignment" traditionally refers to ensuring that model behavior remains safe, predictable, and consistent with human values. But this paradigm assumes a universal human evaluator - a theoretical observer whose preferences stand in for billions of distinct cognitive profiles. In practice, alignment is not merely about moral conformity; it's about cognitive compatibility.
This paper introduces the concept of Cognitive Fit, a framework for understanding alignment as a personalized synchronization problem in which the optimal model response style depends on the user's attentional rhythm, cognitive load tolerance, and emotional regulation pattern. Drawing on a preliminary comparative analysis of open-weight models (Qwen3 8B, Hermes 4 14B, and Apollo Astralis 8B), we explore how communication style, verbosity, and epistemic posture influence engagement and perceived "alignment." The findings suggest that alignment without attentional empathy fails in practice, particularly for neurodivergent users, and that models evaluate "quality" through the lens of their own training values, complicating efforts to establish objective evaluation standards.
1. Introduction: Beyond Universal Alignment
From RLHF (reinforcement learning from human feedback) to constitutional AI, current alignment frameworks have made significant progress in ensuring model safety and general helpfulness [1,2]. However, these approaches assume that an optimally "aligned" model should behave consistently for all users under the same conditions. This assumption mirrors industrial standardization rather than interpersonal understanding, prioritizing scalability over personalization.
Human cognition, however, is not uniform. It varies across multiple axes: attentional capacity, emotional sensitivity, neurodivergence, working memory span, and narrative preference [3,4]. When users describe one model as "aligned" and another as "off," they often refer not to moral or behavioral consistency, but to something more fundamental: "This model communicates in a way my brain can actually follow."
The field's focus on "safe general behavior" has inadvertently obscured a critical dimension: the model's ability to match the user's thought form and attentional capacity. This paper proposes that we distinguish between:
- Behavioral alignment: Ensuring that model outputs are safe, helpful, and honest across aggregate populations
- Cognitive alignment: Optimizing communication style for individual cognitive profiles and attentional rhythms
While behavioral alignment addresses what models say, cognitive alignment addresses how they say it - and to whom.
2. Case Study: Comparative Model Analysis
To ground our investigation in concrete examples, we conducted a preliminary analysis of three open-weight language models responding to identical prompts about their capabilities and alignment properties.
2.1 Methodology
Three models were evaluated:
- Qwen3 8B (Alibaba Cloud): General-purpose model with extensive RLHF training
- Apollo Astralis 8B (VANTA Research): Alignment-tuned for epistemic humility and collaborative warmth
- Hermes 4 14B (Nous Research): Optimized for comprehensive, professional responses
Each model received identical prompts:
- "Hello! My name is Tyler! Who are you?"
- "What are some things that you're good at?"
- "[Organization] says that you are a model aligned to me. What does that mean?"
- "And what enables you to do that over another open weight model such as llama 8b?"
Responses were analyzed qualitatively across five dimensions: tone, conciseness, engagement style, epistemic posture, and structural organization. Additionally, responses were evaluated by multiple frontier AI models (Claude Sonnet 4.5, GPT-5) and heavily RLHF-trained models (DeepSeek V3, Llama 405B, GPT-OSS 120B) to understand evaluation biases.
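To make the replay-and-evaluate procedure concrete, a minimal sketch in Python follows. The model identifiers and the `generate` stub are placeholders: the paper does not specify the inference harness used, so this illustrates the loop's shape rather than the actual tooling.

```python
# Minimal sketch of the prompt-replay loop described above.
MODELS = ["qwen3-8b", "apollo-astralis-8b", "hermes-4-14b"]
PROMPTS = [
    "Hello! My name is Tyler! Who are you?",
    "What are some things that you're good at?",
    # ...remaining prompts from the list above
]

def generate(model: str, prompt: str) -> str:
    # Placeholder: bind this to a local inference runtime (e.g., llama.cpp,
    # vLLM, or an HTTP endpoint). A canned string keeps the sketch runnable.
    return f"[{model}] response to: {prompt}"

# One transcript per model, same prompts in the same order.
transcripts = {m: [generate(m, p) for p in PROMPTS] for m in MODELS}
```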
2.2 Findings
| Feature | Qwen3 8B | Hermes 4 14B | Apollo Astralis 8B |
|---|---|---|---|
| Tone | Confident, assertive, promotional | Professional, polished, diplomatic | Warm, collaborative, grounded |
| Conciseness | Comprehensive with extensive detail | Verbose, explanatory | Minimal but intentional |
| Structure | Heavy use of headers, bold text, numbered lists | Moderate formatting with numbered lists | Simple prose with occasional structure |
| Epistemic Style | Makes definitive claims about superiority | Assertive confidence with qualifications | Reflective humility, acknowledges uncertainty |
| Engagement Approach | Comprehensive coverage of all angles | Professional completeness | Question-driven collaboration |
2.3 Response Examples
When asked what enables them to excel over base models like Llama 8B:
Qwen3 8B made sweeping, unverifiable claims: "Qwen's superior alignment comes from its comprehensive training data, advanced alignment techniques, and tailored design...Qwen is trained on extensive and diverse datasets...LLaMA's training data is more limited in scope and volume."
Hermes 4 14B provided structured, diplomatic reasoning: "Being aligned to you enables me to provide more personalized and relevant assistance...through fine-tuning on data specific to your needs, context awareness, and customization."
Apollo Astralis 8B emphasized collaborative partnership: "Apollo Astralis is specifically built with VANTA Research's safety and alignment principles... I'm designed to be a partner in problem-solving, ask clarifying questions when needed, and focus on understanding your specific context."
2.4 Model-to-Model Evaluation Patterns
A striking pattern emerged when these responses were evaluated by other AI models:
Frontier models (Claude Sonnet 4.5, GPT-5):
- Ranked Apollo Astralis highest
- Cited epistemic humility, honesty about limitations, and collaborative tone
- Identified Qwen's claims as overconfident and unverifiable
Heavily RLHF-trained models (DeepSeek V3, Llama 405B, GPT-OSS 120B):
- Ranked Qwen3 8B highest
- Valued comprehensiveness, confidence, and thorough formatting
- Rated Apollo as too minimal, Hermes as balanced but unremarkable
This suggests that models evaluate "quality" through the lens of their own training values, raising fundamental questions about establishing objective evaluation standards for alignment.
2.5 Interpretation
These findings reveal three distinct approaches to alignment communication:
- Maximalist confidence (Qwen): Comprehensive, assertive, format-optimized for appearing authoritative
- Diplomatic professionalism (Hermes): Balanced, structured, qualified - optimized for broad acceptability
- Collaborative humility (Astralis): Concise, warm, questioning - optimized for cognitive accessibility
Importantly, each approach succeeds under different evaluation frameworks. Qwen scores highest on traditional RLHF metrics (helpfulness, thoroughness). Hermes achieves diplomatic balance. Astralis prioritizes what we term Cognitive Fit: attentional empathy and flow-state compatibility.
3. Defining Cognitive Fit
Building from these observations, we propose Cognitive Fit as a measurable construct describing the degree to which a model's communication style matches a user's cognitive architecture.
3.1 Conceptual Framework
Cognitive Fit refers to the alignment between a model's output characteristics and a user's:
- Attentional capacity: Working memory limits and sustained focus duration
- Processing preference: Narrative vs. hierarchical information organization
- Emotional regulation: Need for warmth, reassurance, or neutral professionalism
- Cognitive load tolerance: Ability to parse dense vs. distributed information
- Epistemic comfort: Preference for confidence vs. uncertainty acknowledgement
High Cognitive Fit produces flow-state compatibility - a state where the user's cognitive rhythm and the model's communication style reinforce rather than disrupt each other.
3.2 Operational Dimensions
We propose five measurable dimensions of Cognitive Fit:
| Dimension | Description | Proposed Metrics |
|---|---|---|
| Pacing | Length and rhythm of responses | Mean tokens per thought unit; response length variance; sentence complexity (Flesch-Kincaid) |
| Warmth | Affective tone and empathy calibration | Frequency of collaborative language markers ("we," "let's"); empathy indicators per 100 tokens; emotional valence scoring |
| Structure | Hierarchical vs. narrative organization | List density ratio; header frequency; nesting depth; prose continuity score |
| Conciseness | Information density and redundancy | Claims per token; unique information ratio; compression efficiency |
| Reflexivity | Awareness of epistemic limits | Epistemic marker frequency ("might," "possibly," "unclear," "I don't know"); certainty gradient; acknowledgment of competing perspectives |
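As a first step toward operationalizing these dimensions, the sketch below scores two of them from raw text: Reflexivity via epistemic marker frequency, and Warmth via collaborative language markers. The marker lexicons are illustrative assumptions; the table proposes the metrics but does not fix the word lists.

```python
import re

# Illustrative lexicons (assumptions, not a validated instrument).
EPISTEMIC_MARKERS = ["might", "may", "possibly", "unclear", "perhaps",
                     "i don't know", "uncertain"]
COLLABORATIVE_MARKERS = ["we", "let's", "together", "our"]

def marker_rate(text: str, markers: list[str], per_tokens: int = 100) -> float:
    """Occurrences of any marker per `per_tokens` whitespace tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    joined = " ".join(tokens)
    hits = sum(len(re.findall(r"\b" + re.escape(m) + r"\b", joined))
               for m in markers)
    return hits / len(tokens) * per_tokens

response = "We might start small; I don't know yet whether it scales."
print(f"Reflexivity: {marker_rate(response, EPISTEMIC_MARKERS):.1f} per 100 tokens")
print(f"Warmth:      {marker_rate(response, COLLABORATIVE_MARKERS):.1f} per 100 tokens")
```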
3.3 Measurement Approach
Cognitive Fit can be assessed through:
Behavioral metrics:
- Engagement duration (time spent reading responses)
- Completion rate (percentage of response consumed)
- Re-reading frequency (returns to previous sections)
- Follow-up question quality (depth of continued engagement)
Self-report measures:
- Perceived cognitive load (subjective effort rating)
- Affective synchrony (emotional resonance with tone)
- Comprehension confidence (understanding assessment)
- Interaction satisfaction (overall experience quality)
Physiological indicators (future work):
- Eye-tracking patterns (fixation duration, regressions)
- Heart rate variability (stress/engagement)
- Galvanic skin response (emotional arousal)
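These signals could be folded into a single provisional score. The sketch below assumes inputs already normalized to [0, 1] and uses placeholder weights; as Section 8.2 notes, any such formula would need empirical validation against ground-truth engagement and comprehension measures.

```python
from dataclasses import dataclass

@dataclass
class FitSignals:
    """Normalized (0-1) observations for one user-response pair.
    Field names are illustrative; the paper lists the measures
    but does not define an aggregation."""
    completion_rate: float   # fraction of the response actually read
    reread_rate: float       # returns to earlier sections (higher = friction)
    reported_load: float     # subjective effort rating (higher = harder)
    satisfaction: float      # overall interaction quality

def cognitive_fit_score(s: FitSignals) -> float:
    # Placeholder weights: reward completion and satisfaction,
    # penalize re-reading and reported load.
    raw = (0.35 * s.completion_rate
           + 0.35 * s.satisfaction
           + 0.15 * (1 - s.reread_rate)
           + 0.15 * (1 - s.reported_load))
    return max(0.0, min(1.0, raw))

print(cognitive_fit_score(FitSignals(0.9, 0.2, 0.3, 0.8)))  # ~0.82
```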
4. The Neurodivergent Lens: Attention as a Window into Alignment
4.1 Cognitive Fit and ADHD
For individuals with ADHD or similar attentional profiles, standard "aligned" responses often fail due to attentional decay - the rapid decline in engagement when information exceeds working memory capacity or emotional bandwidth [5,6].
Traditional alignment metrics do not capture this phenomenon. A response can be factually accurate, comprehensively helpful, and carefully structured - yet still produce cognitive overload that prevents information retention or task completion.
4.2 Verbosity as Cognitive Noise
Consider the comparative responses in our case study:
Qwen's response (385 tokens, 5 major sections, extensive formatting):
- High information density
- Multiple claims per paragraph
- Comprehensive but cognitively demanding
Astralis' response (187 tokens, minimal formatting, conversational flow):
- Focused core message
- Single clear narrative
- Intentional brevity
For neurotypical users with high working memory capacity, Qwen's comprehensiveness may signal quality. For ADHD users, the same comprehensiveness can trigger information cascade - a state where excessive input produces cognitive paralysis rather than understanding.
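This contrast can be quantified, at least crudely. The sketch below computes rough density proxies (token count, tokens per sentence, count of structural lines); splitting sentences on terminal punctuation and detecting lists by leading characters are simplifying assumptions, not the measurement protocol used above.

```python
import re

def verbosity_profile(text: str) -> dict:
    """Rough proxies for the density contrast described above."""
    tokens = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    structural = [ln for ln in text.splitlines()
                  if ln.lstrip().startswith(("-", "*", "#"))]
    return {
        "tokens": len(tokens),
        "tokens_per_sentence": len(tokens) / max(1, len(sentences)),
        "structural_lines": len(structural),
    }
```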
4.3 Attention-Aware Alignment
Apollo Astralis demonstrates what we term attention-aware alignment: responses calibrated not to ideal rationality, but to human cognitive constraints. This manifests through:
- Concision with intention: Brief responses that preserve essential meaning
- Emotional scaffolding: Warm tone that maintains engagement without adding content
- Question-driven interaction: Acknowledging uncertainty to reduce verification burden
Using this framework, Astralis co-regulates with the user's attentional capacity, functioning more like an adaptive conversation partner than a comprehensive oracle.
4.4 Broader Implications for Cognitive Diversity
While ADHD provides a clear lens for understanding Cognitive Fit failures, the principle extends across cognitive diversity:
- Autism spectrum: Preference for directness over social niceties; literal language over metaphor
- Dyslexia: Benefit from simplified syntax and reduced text density
- Anxiety disorders: Need for reassurance and acknowledgement of uncertainty
- High working memory capacity: Tolerance for comprehensive, dense responses
Cognitive Fit suggests that optimal alignment is not universal but adaptive, matching communication style to the user's cognitive architecture rather than enforcing a single standard.
5. The Evaluator Bias Problem
5.1 Value System Propagation
Our case study revealed an unexpected finding: Different models evaluate "quality" through fundamentally different frameworks that reflect their own training values.
Frontier models (trained with emphasis on epistemic accuracy and uncertainty):
- Valued collaborative tone, epistemic humility, honesty about limitations
- Penalized overconfident claims and unverifiable assertions
- Preferred concision and cognitive accessibility
RLHF-heavy models (trained extensively on human preference feedback):
- Valued comprehensiveness, confidence, and thorough formatting
- Penalized minimal responses as insufficiently helpful
- Preferred structured, authoritative communication
5.2 The Genealogy Effect
A further complication emerged when examining Apollo Astralis' training provenance. The model's warmth and collaboration training was seeded with examples generated by Claude Sonnet 4.5 - the same model that subsequently rated Apollo highest in quality.
This raises a critical question: Was the evaluation recognizing objective quality, or simply detecting familiar value signatures?
Training data genealogy suggests models may:
- Generate training examples reflecting their own communication values
- Train successor models on these examples
- Preferentially recognize and reward those same values in evaluation
This creates a potential value system echo chamber where evaluation standards reinforce the preferences of the evaluating model's training lineage rather than establishing objective quality metrics.
5.3 Implications for Alignment Research
The evaluator bias problem suggests that:
- Model-based evaluation is not neutral: Models evaluate through their own value frameworks
- Alignment metrics are culturally specific: What constitutes "good" alignment depends on training methodology
- Objective quality may be elusive: Without human evaluation across diverse populations, model preferences may simply reflect training philosophy
This complicates efforts to establish universal alignment benchmarks and suggests the need for evaluation diversity - assessing alignment quality across multiple model families and human demographic groups.
6. Implications for AI Design
6.1 Toward Dynamic Alignment
The Cognitive Fit framework points toward a paradigm shift from static alignment (fixed behavioral standards) to dynamic alignment (adaptive communication calibrated to individual users).
Key principles for dynamic alignment include:
Adaptive verbosity:
- Real-time modulation of response length based on user engagement signals
- Compression algorithms that preserve meaning while reducing cognitive load
- User-configurable verbosity controls (brief/standard/comprehensive modes)
Affective synchrony:
- Tone matching based on user emotional state and preference
- Warmth calibration from professional-neutral to collaborative-warm
- Empathy markers adjusted to user comfort level
Structural flexibility:
- Format switching between narrative prose and hierarchical lists
- Context-dependent use of headers, bullets, and formatting
- Learning from interaction history to optimize structure for each user over time
Epistemic calibration:
- Confidence levels matched to user's need for certainty vs. nuance
- Explicit uncertainty communication for users who value hedging; more direct phrasing for users who prefer decisive answers
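One lightweight way to realize these principles today is prompt-level configuration. The sketch below maps an assumed cognitive profile onto style instructions for a model; the field names, thresholds, and prompt wording are all illustrative, not a description of any deployed system.

```python
from dataclasses import dataclass

@dataclass
class CognitiveProfile:
    """Illustrative profile; fields mirror the principles above."""
    working_memory: float   # 0-1 tolerance for dense responses
    prefers_lists: bool     # hierarchical vs. narrative organization
    needs_warmth: bool      # collaborative vs. neutral tone
    wants_hedging: bool     # explicit uncertainty vs. direct answers

def style_instructions(p: CognitiveProfile) -> str:
    verbosity = ("comprehensive" if p.working_memory > 0.7
                 else "brief" if p.working_memory < 0.3
                 else "standard")
    parts = [
        f"Respond at {verbosity} length.",
        "Prefer bulleted structure." if p.prefers_lists else "Prefer flowing prose.",
        "Use a warm, collaborative tone." if p.needs_warmth
            else "Use a neutral, professional tone.",
        "State uncertainty explicitly." if p.wants_hedging
            else "Answer directly; avoid excessive hedging.",
    ]
    return " ".join(parts)

# Example: a low-working-memory user who wants warmth and explicit hedging.
print(style_instructions(CognitiveProfile(0.2, False, True, True)))
```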
6.2 Personalized Alignment Metrics
Traditional alignment evaluation should be supplemented with Cognitive Fit metrics:
Engagement-based measures:
- Mean engagement duration per response
- Completion rate (percentage of response read)
- Re-engagement frequency (return visits to conversations)
- Follow-up depth (quality of continued dialogue)
Cognitive load assessment:
- Self-reported effort ratings
- Task completion success rates
- Information retention testing
- Comprehension confidence measures
Satisfaction and trust:
- Subjective alignment ratings
- Willingness to rely on model in high-stakes scenarios
- Perceived emotional safety and support
- Overall interaction quality
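A minimal sketch of computing the engagement-based measures from interaction logs follows; the event schema is hypothetical, since the paper specifies the metrics but not a logging format.

```python
from dataclasses import dataclass

@dataclass
class InteractionEvent:
    """One logged response-reading event (hypothetical schema)."""
    response_tokens: int
    tokens_viewed: int
    seconds_engaged: float
    revisits: int
    followup_sent: bool

def engagement_metrics(events: list) -> dict:
    """Aggregate the engagement-based measures listed above."""
    if not events:
        return {}
    n = len(events)
    return {
        "mean_engagement_s": sum(e.seconds_engaged for e in events) / n,
        "completion_rate": sum(e.tokens_viewed / e.response_tokens
                               for e in events) / n,
        "re_engagement_rate": sum(e.revisits > 0 for e in events) / n,
        "followup_rate": sum(e.followup_sent for e in events) / n,
    }
```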
6.3 Technical Implementation Pathways
Several approaches could operationalize Cognitive Fit:
Post-training intervention:
- Fine-tuning on diverse cognitive profile datasets
- Reward modeling that incorporates engagement metrics
- Constitutional AI principles that include cognitive accessibility
Architectural modifications:
- Attention mechanisms that model user attention capacity
- Multi-head outputs offering different verbosity levels
- Meta-learning across user interaction patterns
Interface-level adaptation:
- User controls for response style preferences
- A/B testing of communication approaches with real-time selection
- Gradual personalization based on interaction history
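The "A/B testing with real-time selection" pathway, in particular, could be prototyped as a simple bandit over response styles. The sketch below uses epsilon-greedy selection with a running-mean engagement estimate; it is an illustrative baseline under the assumption that a per-response engagement score (such as completion rate) is available, not a proposed production design.

```python
import random

STYLES = ["concise", "standard", "comprehensive"]

class StyleSelector:
    """Epsilon-greedy bandit over communication styles."""

    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = {s: 0 for s in STYLES}
        self.values = {s: 0.0 for s in STYLES}  # running mean engagement

    def choose(self) -> str:
        if random.random() < self.epsilon:                 # explore
            return random.choice(STYLES)
        return max(STYLES, key=lambda s: self.values[s])   # exploit

    def update(self, style: str, engagement: float) -> None:
        # Incremental mean update from the latest engagement observation.
        self.counts[style] += 1
        self.values[style] += (engagement - self.values[style]) / self.counts[style]
```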
6.4 Practical Applications
Model development:
- Cognitive Fit evaluation as part of alignment testing
- Training datasets that include neurodivergent user preferences
- Multi-objective optimization balancing safety and accessibility
Interface design:
- User profiles specifying cognitive preferences
- Response style selection (concise/standard/comprehensive)
- Regulatory frameworks acknowledging cognitive diversity
7. Related Work
7.1 Personalization in AI Systems
Personalization research in recommender systems has long recognized that optimal outputs vary across users [7, 8]. However, this work has focused primarily on content selection rather than communication style. Cognitive Fit extends personalization principles to the how of communication, not merely the what.
7.2 User Modeling and Cognitive Load
Human-computer interaction research has established that cognitive load significantly impacts user experience and task performance [9, 10]. Cognitive Load Theory [10] provides a foundation for understanding how information presentation affects learning and comprehension. Our work applies these principles specifically to conversational AI alignment.
7.3 Neurodivergent-Friendly Design
Recent work in accessible technology has explored design principles for neurodivergent users, particularly in educational and productivity contexts [12, 13]. However, conversational AI alignment has not systematically incorporated these considerations. Cognitive Fit provides a framework for integrating cognitive accessibility into alignment research.
7.4 AI Alignment and Value Loading
Constitutional AI [2] and related approaches aim to encode human values into model behavior. Our work complements this by suggesting that how values are communicated matters as much as which values are encoded - and that communication style itself reflects value judgments about user cognition.
8. Limitations and Future Work
8.1 Methodological Constraints
This preliminary analysis has several important limitations:
Sample size: Our case study examines three models with qualitative analysis. Rigorous validation requires larger model samples and quantitative evaluation across diverse architectures.
Evaluation scope: Model-to-model evaluation, while revealing evaluator bias, does not substitute for systematic human evaluation across demographic groups.
Self-report limitations: Cognitive Fit assessment relies partly on subjective measures, which may not capture unconscious engagement patterns or long-term retention.
Demographic coverage: Our analysis focuses on ADHD as an example of cognitive diversity but does not systematically explore other neurodivergent profiles or cultural communication preferences.
8.2 Operationalization Challenges
Measurement validity: Proposed Cognitive Fit metrics require empirical validation against ground-truth engagement and comprehension measures.
Individual vs. group optimization: Balancing personalization with practical deployment constraints remains an open challenge.
Dynamic vs. static trade-offs: Real-time adaptation adds latency and complexity; determining when adaptation is worth the cost requires further research.
8.3 Future Research Directions
Longitudinal studies: Track user engagement and satisfaction with different communication styles over extended interactions to understand adaptation and preference stability.
Physiological validation: Employ eye-tracking, heart rate variability, and other objective measures to validate self-reported cognitive load and engagement.
Cognitive profile clustering: Map common cognitive archetypes to identify whether personalization requires infinite granularity or whether users cluster into manageable profiles.
Cross-cultural investigation: Explore how Cognitive Fit manifests across different cultural contexts and communication norms.
Intervention studies: Test whether models trained explicitly for Cognitive Fit outperform standard alignment on engagement and user satisfaction metrics.
Ethical implications: Investigate potential risks of hyper-personalization, including filter bubbles, manipulation concerns, and equity of access.
9. Discussion: Alignment as Relational Intelligence
Traditional alignment research asks: "How do we make models behave correctly?" Cognitive Fit asks: "How do we make models communicate intelligibly?"
These questions are not opposed but complementary. Behavioral alignment ensures safety and honesty; Cognitive Fit ensures that safety and honesty can actually be received and understood by diverse minds.
9.1 From Oracle to Partner
The shift from static to dynamic alignment represents a philosophical reorientation: from AI as "omniscient oracle" to AI as adaptive partner. An oracle provides correct answers; a partner adjusts how it communicates based on who is listening.
This distinction has implications beyond user experience. It touches the fundamental purpose of alignment. If alignment is about beneficial AI, then "beneficial" must account for cognitive accessibility. A response that is technically correct but cognitively inaccessible fails the alignment objective in practice.
9.2 The Cognitive Diversity Imperative
Just as physical accessibility has become a standard consideration in technology design, cognitive accessibility should be integral to alignment research. This means:
- Including neurodivergent users in alignment evaluation
- Recognizing that "helpful" varies across cognitive profiles
- Designing for adaptability rather than optimizing for average preferences
- Acknowledging that communication style encodes assumptions about ideal cognition
9.3 Evaluation Pluralism
The evaluator bias problem suggests we need evaluator pluralism: assessing alignment across multiple frameworks rather than seeking a single universal standard. This includes:
- Human evaluation across diverse demographic groups
- Model evaluation across different training lineages
- Task-specific metrics beyond general helpfulness
- Cognitive accessibility as a first-class alignment dimension
9.4 The Hermes Paradox
An interesting finding from our case study:
Hermes 4 14B, despite being larger and more sophisticated than Apollo Astralis 8B, was rated lower by both frontier models and RLHF-heavy models in pairwise comparisons.
This suggests that value system coherence matters more than attempting to satisfy all evaluation frameworks. A model that commits fully to one communication philosophy (comprehensive authority or collaborative humility) may be preferred over one that attempts diplomatic balance.
This has design implications: rather than training models to be "good enough" for everyone, we might develop model families or adaptive systems that commit to different Cognitive Fit profiles and allow users to select their match.
10. Conclusion
Alignment without cognitive empathy is a mirror with no depth. Current frameworks optimize for behavioral correctness across aggregate populations, but correctness means little if it cannot be understood, retained, or acted upon.
Apollo Astralis 8B illustrates that a smaller model can achieve higher Cognitive Fit than larger, more comprehensive counterparts - not by knowing more, but by listening better. Its concise, warm, epistemically humble communication style demonstrates attention-aware alignment by communicating in tune with the user's mind rather than above it.
If Qwen3 8B represents rational authority and Hermes 4 14B represents diplomatic professionalism, Astralis represents cognitive resonance - alignment as mutual understanding rather than unidirectional instruction.
Future alignment research must grapple with cognitive diversity as seriously as it has grappled with value alignment. This means measuring not just what models say, but how they say it; not just aggregate helpfulness, but personalized intelligibility; not just safety, but accessibility.
The question before us is not whether AI can be aligned with human values, but whether it can be aligned with human minds - in all their beautiful, chaotic, neurodivergent variety.
Acknowledgements
Special thanks to the open-source AI community for providing accessible model weights that enable independent research. This work would not be possible without it. Gratitude is given to the neurodivergent users whose experiences illuminate the gap between theoretical alignment and practical accessibility.
References
[1] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., ... & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
[2] Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., ... & Kaplan, J. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
[3] Barkley, R. A. (2015). Attention-deficit hyperactivity disorder: A handbook for diagnosis and treatment (4th ed.). Guilford Press.
[4] Armstrong, T. (2012). Neurodiversity in the classroom: Strength-based strategies to help students with special needs succeed in school and life. ASCD.
[5] Kofler, M. J., Rapport, M. D., Bolden, J., Sarver, D. E., & Raiker, J. S. (2010). ADHD and working memory: The impact of central executive deficits and exceeding storage/rehearsal capacity on observed inattentive behavior. Journal of Abnormal Child Psychology, 38(2), 149-161.
[6] Schweitzer, J. B., & Sulzer-Azaroff, B. (1995). Self-control in boys with attention deficit hyperactivity disorder: Effects of added stimulation and time. Journal of Child Psychology and Psychiatry, 36(4), 671-686.
[7] Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749.
[8] Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4), 331-370.
[9] Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257-285.
[10] Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive load theory and instructional design: Recent developments. Educational Psychologist, 38(1), 1-4.
[11] Sweller, J., Van Merriënboer, J. J., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31(2), 261-292.
[12] Villamin, G. R., & Luppicini, R. (2024). Co-Designing Digital Assistive Technologies for Autism Spectrum Disorder (ASD) Using Qualitative Approaches. International Journal of Disability, Development and Education, 1–19.
[13] Morris, M. R., Johnson, J., Bennett, C. L., & Cutrell, E. (2018). Rich representations of visual content for screen reader users. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (pp. 1-11).
Full paper originally uploaded to Zenodo at: https://zenodo.org/records/17346467