	---
	license: mit
	datasets:
	- akhauriyash/GraphArch-Regression
	- akhauriyash/Code-Regression
	metrics:
	- spearmanr
	base_model:
	- google/t5gemma-s-s-prefixlm
	---

	# Regression Language Models for Code (RLMs)

	We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM initialized from T5Gemma obtains > 0.9 Spearman rank correlation on competitive programming submissions from APPS, and a single unified model achieves > 0.5 average Spearman rank correlation across 17 separate languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predicts architecture latencies on numerous hardware platforms.
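
Both metrics quoted above are rank correlations: they score how well the predicted ordering matches the true ordering, not how close the raw numbers are. The following is a minimal pure-Python sketch of the two measures for intuition only. The function names are illustrative, ties receive average ranks in the Spearman version, and the Kendall version is the simple tau-a form, which may differ from the exact variant used in the paper when ties are present.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


targets = [1.0, 2.0, 3.0, 4.0]
preds = [10.0, 30.0, 20.0, 40.0]  # one swapped pair relative to the targets
print(round(spearman_rho(targets, preds), 3))   # 0.8
print(round(kendall_tau_a(targets, preds), 3))  # 0.667
```

Because only the ordering matters, a model can score well on these metrics even when its predictions are on a different scale than the targets.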

	Link for Code-Regression dataset: https://huggingface.co/datasets/akhauriyash/Code-Regression

	Link for GraphArch-Regression dataset: https://huggingface.co/datasets/akhauriyash/GraphArch-Regression

	## Testing Code-Regression with a basic Gemma RLM model

	Use the code below as a reference for evaluating a basic RegressLM model (better, more models to come! :))

	We strongly recommend `transformers==4.53.2` for compatibility, though the latest transformers release should work as well.

	```
	import torch
	import numpy as np
	from datasets import load_dataset
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
	from scipy.stats import spearmanr
	from tqdm import tqdm

	REPO_ID = "akhauriyash/RLM-GemmaS-Code-v0"
	DATASET = "akhauriyash/Code-Regression"
	dataset = load_dataset(DATASET, split="train")
	tok = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model = AutoModelForSeq2SeqLM.from_pretrained(REPO_ID, trust_remote_code=True).to(device).eval()
	MAX_ITEMS, BATCH_SIZE, spaces, results = 512, 16, ["KBSS", "CDSS", "APPS"], {}
	language = None  # Specify language for CDSS, e.g. "python"
	# Tokens to decode per prediction; the defaults apply only if the config lacks these fields.
	n_out_tokens = getattr(model.config, "num_tokens_per_obj", 8) * getattr(model.config, "max_num_objs", 1)

	for SPACE in spaces:
	    # Collect up to MAX_ITEMS (input, target) pairs for this space.
	    inputs, targets = [], []
	    for row in tqdm(dataset, desc=f"Processing {SPACE} till {MAX_ITEMS} items"):
	        if row.get("space") == SPACE and "input" in row and "target" in row:
	            try:
	                # The metadata field is a string; eval turns it into a dict (only run on trusted data).
	                lang = eval(row['metadata'])['language'] if SPACE == "CDSS" else None
	                if SPACE != "CDSS" or language is None or lang == language:
	                    targets.append(float(row["target"]))
	                    if SPACE == "CDSS":
	                        inputs.append(f"# {SPACE}\n# Language: {lang}\n{row['input']}")
	                    else:
	                        inputs.append(f"{SPACE}\n{row['input']}")
	            except Exception:
	                continue  # Skip rows whose metadata or target cannot be parsed.
	        if len(inputs) >= MAX_ITEMS:
	            break
	    preds = []
	    for i in tqdm(range(0, len(inputs), BATCH_SIZE)):
	        enc = tok(inputs[i:i+BATCH_SIZE], return_tensors="pt", truncation=True, padding=True, max_length=2048).to(device)
	        batch_preds = []
	        # Sample 8 decodes per input and aggregate them with the median.
	        for _ in range(8):
	            out = model.generate(**enc, max_new_tokens=n_out_tokens, min_new_tokens=n_out_tokens, do_sample=True, top_p=0.95, temperature=1.0)
	            decoded = [tok.token_ids_to_floats(seq.tolist()) for seq in out]
	            # Keep the first decoded float; mark failed decodes as NaN.
	            decoded = [d[0] if isinstance(d, list) and d else float("nan") for d in decoded]
	            batch_preds.append(decoded)
	        preds.extend(torch.tensor(batch_preds).median(dim=0).values.tolist())
	    spear, _ = spearmanr(np.array(targets), np.array(preds))
	    results[SPACE] = spear
	    print(f"Spearman ρ for {SPACE}: {spear:.3f}")

	print("Spearman ρ | KBSS | CDSS | APPS")
	print(f"{REPO_ID} | " + " | ".join(f"{results[s]:.3f}" for s in spaces))

	```
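
The script above reads each row's `metadata` string with `eval` and then builds a prompt whose first line names the space (with an extra language line for CDSS). As a hedged alternative, the sketch below parses the metadata without executing it and reproduces the same prompt layout. `parse_language` and `format_prompt` are hypothetical helper names, and the sketch assumes the metadata string is either a Python-literal dict or JSON, which is not verified against the dataset here.

```python
import ast
import json


def parse_language(metadata):
    """Extract the 'language' field from a metadata string without eval()."""
    try:
        # Assumption: metadata looks like "{'language': 'python', ...}".
        parsed = ast.literal_eval(metadata)
    except (ValueError, SyntaxError):
        # Fallback assumption: metadata is JSON (e.g. contains true/false/null).
        parsed = json.loads(metadata)
    return parsed["language"]


def format_prompt(space, code, language=None):
    """Build the model input text using the same layout as the script above."""
    if space == "CDSS":
        # CDSS prompts carry the space and the language as comment lines.
        return f"# {space}\n# Language: {language}\n{code}"
    return f"{space}\n{code}"


lang = parse_language("{'language': 'python'}")
print(format_prompt("CDSS", "print('hi')", lang))
print(format_prompt("APPS", "print('hi')"))
```

Unlike `eval`, `ast.literal_eval` only accepts literal structures, so a malformed or unexpected metadata string raises an error instead of running code.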

	We got the following results when testing on a random subset of the Code-Regression dataset.

	```
	Model ID | KBSS | CDSS | APPS
	akhauriyash/RegressLM-gemma-s-RLM-table3 | 0.527 | 0.787 | 0.926
	```
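
One detail worth knowing when reproducing these numbers: the evaluation script marks failed decodes as NaN and then takes a plain median over the 8 sampled decodes. As far as the PyTorch documentation describes, `torch.median` returns NaN whenever any element is NaN, and `scipy.stats.spearmanr` propagates NaN by default, so a single failed decode can turn the whole score into NaN. The sketch below shows one NaN-tolerant way to aggregate, using only the standard library. The helper names are illustrative, and this is not the procedure used to produce the table above.

```python
import math
from statistics import median


def nan_median(samples):
    """Median of the finite values in `samples`; NaN if none are finite."""
    finite = [s for s in samples if math.isfinite(s)]
    return median(finite) if finite else float("nan")


def aggregate(samples_per_draw):
    """samples_per_draw[k][i] is draw k's value for input i; returns one value per input."""
    return [nan_median(column) for column in zip(*samples_per_draw)]


def drop_nan_pairs(targets, preds):
    """Remove (target, prediction) pairs whose prediction is NaN before scoring."""
    kept = [(t, p) for t, p in zip(targets, preds) if not math.isnan(p)]
    return [t for t, _ in kept], [p for _, p in kept]


nan = float("nan")
draws = [
    [1.0, nan],  # draw 0: second input failed to decode
    [3.0, nan],  # draw 1
    [nan, nan],  # draw 2: both inputs failed to decode
]
preds = aggregate(draws)
targets, preds = drop_nan_pairs([10.0, 20.0], preds)
print(targets, preds)  # [10.0] [2.0]
```

With PyTorch, `torch.nanmedian` plays the same role as `nan_median` here. Dropping pairs changes the evaluated subset, so it is worth reporting how many were removed.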

	## Citations
	If you found this model or the attached datasets useful for your research, please cite us:

	```
	@article{akhauri2025regressionlanguagemodelscode,
	title={Regression Language Models for Code},
	author={Yash Akhauri and Xingyou Song and Arissa Wongpanich and Bryan Lewandowski and Mohamed S. Abdelfattah},
	journal={arXiv preprint arXiv:2509.26476},
	year={2025}
	}

	@article{akhauri2025performance,
	title={Performance Prediction for Large Systems via Text-to-Text Regression},
	author={Akhauri, Yash and Lewandowski, Bryan and Lin, Cheng-Hsi and Reyes, Adrian N and Forbes, Grant C and Wongpanich, Arissa and Yang, Bangding and Abdelfattah, Mohamed S and Perel, Sagi and Song, Xingyou},
	journal={arXiv preprint arXiv:2506.21718},
	year={2025}
	}
	```