---
license: mit
datasets:
- akhauriyash/GraphArch-Regression
- akhauriyash/Code-Regression
metrics:
- spearmanr
base_model:
- google/t5gemma-s-s-prefixlm
---

# Regression Language Models for Code (RLMs)

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM initialized from T5Gemma obtains a Spearman rank correlation above 0.9 on competitive programming submissions from APPS, and a single unified model achieves an average Spearman rank correlation above 0.5 across 17 separate languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau, 0.46, on five classic NAS design spaces previously dominated by graph neural networks, while simultaneously predicting architecture latencies on numerous hardware platforms.

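
The two rank metrics quoted above score how well predictions preserve the ordering of the true values, not how close the raw numbers are. The following is a minimal dependency-free sketch of both, written for illustration only: the helper names and the toy values are hypothetical, and the Kendall variant shown is the uncorrected tau-a, which matches tie-corrected variants only when there are no ties. The evaluation script further down uses `scipy.stats.spearmanr` instead.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(targets, preds):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(targets), average_ranks(preds))


def kendall_tau_a(targets, preds):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(targets)), 2):
        sign = (targets[i] - targets[j]) * (preds[i] - preds[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(targets) * (len(targets) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: predictions on a different scale, two adjacent swaps.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = [10.0, 30.0, 20.0, 80.0, 50.0]
print(f"Spearman rho: {spearman_rho(targets, preds):.2f}")   # 0.80
print(f"Kendall tau-a: {kendall_tau_a(targets, preds):.2f}")  # 0.60
```

Because both metrics depend only on ordering, a model whose outputs are on the wrong scale can still score well as long as it ranks the examples correctly.
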
**Link for Code-Regression dataset**: https://huggingface.co/datasets/akhauriyash/Code-Regression

**Link for GraphArch-Regression dataset**: https://huggingface.co/datasets/akhauriyash/GraphArch-Regression

## Testing Code-Regression with a basic Gemma RLM model

Use the code below as a reference for evaluating a basic RegressLM model (more and better models to come! :) )

**We strongly recommend `transformers==4.53.2` for compatibility, though the latest transformers release should work as well.**

```
import ast

import numpy as np
import torch
from datasets import load_dataset
from scipy.stats import spearmanr
from tqdm import tqdm
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

REPO_ID = "akhauriyash/RLM-GemmaS-Code-v0"
DATASET = "akhauriyash/Code-Regression"
dataset = load_dataset(DATASET, split="train")
tok = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForSeq2SeqLM.from_pretrained(REPO_ID, trust_remote_code=True).to(device).eval()
MAX_ITEMS, BATCH_SIZE, spaces, results = 512, 16, ["KBSS", "CDSS", "APPS"], {}
language = None  # Specify language for CDSS, e.g. "python"
# Number of tokens to decode per prediction; falls back to defaults if the config lacks the fields.
n_out_tokens = getattr(model.config, "num_tokens_per_obj", 8) * getattr(model.config, "max_num_objs", 1)

for SPACE in spaces:
    # Collect up to MAX_ITEMS (prompt, target) pairs for this space.
    inputs, targets = [], []
    for row in tqdm(dataset, desc=f"Processing {SPACE} till {MAX_ITEMS} items"):
        if row.get("space") == SPACE and "input" in row and "target" in row:
            try:
                # The metadata column is a string holding a dict literal; parse it without eval().
                lang = ast.literal_eval(row["metadata"])["language"] if SPACE == "CDSS" else None
                if SPACE != "CDSS" or language is None or lang == language:
                    target = float(row["target"])
                    if SPACE == "CDSS":
                        prompt = f"# {SPACE}\n# Language: {lang}\n{row['input']}"
                    else:
                        prompt = f"{SPACE}\n{row['input']}"
                    inputs.append(prompt)
                    targets.append(target)
            except Exception:
                # Skip rows with malformed metadata or non-numeric targets.
                continue
        if len(inputs) >= MAX_ITEMS:
            break

    preds = []
    for i in tqdm(range(0, len(inputs), BATCH_SIZE)):
        enc = tok(inputs[i:i+BATCH_SIZE], return_tensors="pt", truncation=True, padding=True, max_length=2048).to(device)
        batch_preds = []
        # Draw 8 sampled decodes per input and aggregate them with the median.
        for _ in range(8):
            out = model.generate(**enc, max_new_tokens=n_out_tokens, min_new_tokens=n_out_tokens, do_sample=True, top_p=0.95, temperature=1.0)
            decoded = [tok.token_ids_to_floats(seq.tolist()) for seq in out]
            # Keep the first decoded float; mark failed decodes as NaN.
            decoded = [d[0] if isinstance(d, list) and d else float("nan") for d in decoded]
            batch_preds.append(decoded)
        preds.extend(torch.tensor(batch_preds).median(dim=0).values.tolist())

    spear, _ = spearmanr(np.array(targets), np.array(preds))
    results[SPACE] = spear
    print(f"Spearman ρ for {SPACE}: {spear:.3f}")

print("Spearman ρ | KBSS | CDSS | APPS")
print(f"{REPO_ID} | " + " | ".join(f"{results[s]:.3f}" for s in spaces))
```

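
The script above draws eight sampled decodes per input and keeps their median. Because `torch.median` returns NaN whenever any sample is NaN, a single failed decode turns that input's prediction into NaN, which in turn makes `spearmanr` return NaN under its default NaN policy. If that becomes a problem, one option is to aggregate only the valid samples. The sketch below illustrates the idea with the standard library; `aggregate_samples` is a hypothetical helper and not part of the RegressLM API. It uses `median_low` so that even-sized sample sets resolve to the lower middle value, as `torch.median` does.

```python
import math
from statistics import median_low


def aggregate_samples(samples_per_draw):
    """Median over draws for each input, ignoring NaN samples.

    samples_per_draw[k][i] is the value decoded for input i on draw k.
    An input whose draws are all NaN stays NaN.
    """
    aggregated = []
    for per_input in zip(*samples_per_draw):
        valid = [v for v in per_input if not math.isnan(v)]
        aggregated.append(median_low(valid) if valid else float("nan"))
    return aggregated


# Hypothetical decodes: 3 draws for 2 inputs; the second input failed twice.
nan = float("nan")
draws = [
    [1.0, nan],
    [3.0, nan],
    [2.0, 5.0],
]
print(aggregate_samples(draws))  # [2.0, 5.0]
```

In the evaluation loop, this would stand in for the `torch.tensor(batch_preds).median(dim=0)` step, with `batch_preds` passed as `samples_per_draw`. Inputs that remain NaN would still need to be dropped, together with their targets, before computing the correlation.
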
We obtained the following results when testing on a random subset of the Code-Regression dataset.

```
Model ID | KBSS | CDSS | APPS
akhauriyash/RegressLM-gemma-s-RLM-table3 | 0.527 | 0.787 | 0.926
```

## Citations

If you find this model or the attached datasets useful for your research, please cite us:

```
@article{akhauri2025regressionlanguagemodelscode,
  title={Regression Language Models for Code},
  author={Yash Akhauri and Xingyou Song and Arissa Wongpanich and Bryan Lewandowski and Mohamed S. Abdelfattah},
  journal={arXiv preprint arXiv:2509.26476},
  year={2025}
}

@article{akhauri2025performance,
  title={Performance Prediction for Large Systems via Text-to-Text Regression},
  author={Akhauri, Yash and Lewandowski, Bryan and Lin, Cheng-Hsi and Reyes, Adrian N and Forbes, Grant C and Wongpanich, Arissa and Yang, Bangding and Abdelfattah, Mohamed S and Perel, Sagi and Song, Xingyou},
  journal={arXiv preprint arXiv:2506.21718},
  year={2025}
}
```