NoLBERT: A No Lookahead(back) Foundational Language Model for Empirical Research
Abstract
We present NoLBERT, a lightweight, timestamped foundational language model for empirical research in social sciences, particularly in economics and finance. By pre-training exclusively on 1976-1995 text, NoLBERT avoids both lookback and lookahead biases that can undermine econometric inference. It exceeds domain-specific baselines on NLP benchmarks while maintaining temporal consistency. Applied to patent texts, NoLBERT enables the construction of firm-level innovation networks and shows that gains in innovation centrality predict higher long-run profit growth.
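As a purely illustrative sketch (not code from the paper), the snippet below shows the core idea behind avoiding lookahead bias: only documents timestamped within the 1976-1995 window are allowed into the pre-training corpus, so the model never sees text written after the periods it will later be used to analyze. The `Document` type and its fields are hypothetical.

```python
# Hypothetical illustration of temporal corpus filtering; not the paper's pipeline.
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    text: str
    published: date  # publication timestamp of the document

def in_pretraining_window(doc: Document,
                          start: date = date(1976, 1, 1),
                          end: date = date(1995, 12, 31)) -> bool:
    """Keep only documents whose publication date falls inside the window."""
    return start <= doc.published <= end

corpus = [
    Document("Patent filing on semiconductor lithography.", date(1984, 6, 2)),
    Document("News article about the dot-com boom.", date(1999, 3, 15)),
]

# Documents dated after 1995 (or before 1976) are dropped, which is what
# prevents information from the "future" leaking into the model.
pretraining_corpus = [d for d in corpus if in_pretraining_window(d)]
```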