File size: 1,236 Bytes
f2e9e7e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a955ea0
 
7235e65
 
a955ea0
 
f2e9e7e
 
 
 
 
 
 
bec85f8
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
---
pipeline_tag: sentence-similarity
base_model: Qwen/Qwen3-Embedding-0.6B
license: apache-2.0
library_name: peft
tags:
- peft
- lora
- sentence-transformers
- semantic-search
- faiss
- hnsw
- mathlib4
- lean4
---

# Lightweight Semantic Mathlib Search — LoRA Adapter (Demo)

**What this is:** 
A small, reproducible demo adapter for `Qwen/Qwen3-Embedding-0.6B` that enables *lightweight, LLM-free* semantic search over mathlib (Lean 4). 
Supports both natural language and formula-based search (can be used in combination) for finding Lean theorems or mathematical statements.
Does not support LaTeX syntax search yet.

Data is built from ~180k Lean theorems extracted via LeanDojo for **Lean 4.20.1** (there may be minor gaps vs. the full mathlib).

> Part of the AITP 2025 submission: **“Towards Lightweight and LLM-Free Semantic Search for mathlib4.”**

## Contents

- `adapter/adapter_config.json` & `adapter/adapter_model.safetensors` — LoRA weights for the encoder  
- `faiss/<*.idx>` — HNSW FAISS index (built for cosine/inner-product search)  
- `faiss/*public_index.jsonl` — public mapping `{ idx, full_name, statement }`

## DEMO
https://github.com/IsaacLi74/Lightweight-and-LLM-Free-Semantic-Search-for-mathlib4