I'm thrilled to share our new paper, "FinForge: Semi-Synthetic Financial Benchmark Generation" in AI4Finance, AAAI'26!
Key Contributions:
- FinForge Framework: A hybrid pipeline that integrates manual and programmatic corpus construction with rigorous LM-based synthesis.
- FinForge-5k Dataset: A new snapshot benchmark comprising over 5,000 human-validated Q&A pairs across 11 financial subdomains, derived from a curated corpus of 100,000 verified documents (143M tokens).
- Benchmarking Results: Evaluating state-of-the-art open- and closed-source models reveals significant variance in financial reasoning capabilities, with leading models achieving approximately 80% accuracy.
Huge thanks to my co-authors @glennmatlin, Anant Gupta, Anirudh JM, Rayan Castilla, and Yi Mei Ng for this collaboration.