This is the second model in the ensemble for the MindsAI @ Tufa Labs team for the ARC Prize 2025 competition. It was originally based on the CodeT5 model from Salesforce, modified to have 16 decoder layers instead of the original 24. Testing demonstrated that removing layers from the encoder was considerably more harmful to performance, while the model was able to fully recover when layers were removed from the decoder. The model has been trained for approximately two years on a TPU v4-64. The Google TPU Research Cloud was very generous in providing TPUs for training and research; it would have been impossible to develop TTFT, AIRV, train the models, and do much else without the generosity of Google and the TPU Research Cloud program.
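
For illustration, here is a minimal sketch of that kind of decoder truncation, assuming the Hugging Face transformers T5 implementation. The checkpoint name and the choice of keeping the first 16 blocks are assumptions, not a statement of the exact procedure used:

    from transformers import T5ForConditionalGeneration

    # CodeT5-large ships with 24 encoder and 24 decoder blocks (~770M params);
    # dropping 8 decoder blocks lands near the ~660M of this model.
    model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-large")

    # Keep only the first 16 decoder blocks (which 16 to keep is an assumption);
    # the encoder is left untouched, since removing encoder layers hurt performance.
    model.decoder.block = model.decoder.block[:16]
    model.config.num_decoder_layers = 16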

  • Span-Corruption Refinement (SCR): The model was trained with an additional
    pretraining objective I call SCR (chosen because of the model's deep history of
    training with the span-corruption objective). The answer is noised with heavy
    span corruption and data augmentation (to mimic incorrect answers) and passed
    along with the prompt for refinement; the model outputs the full corrected grid.
    A rough sketch of the pair construction appears below.
  • Note: This did not result in better performance when used during inference (it was only used during TTT).
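
A minimal sketch of how an SCR training pair might be constructed, assuming a simple random span-corruption noiser over the answer grid. The function names, the "refine:" tag, and the noising parameters here are illustrative assumptions, not the training code:

    import random

    def grid_to_string(grid):
        # rows as concatenated digits separated by spaces (see the formatting section below)
        return " ".join("".join(str(c) for c in row) for row in grid)

    def corrupt_grid(grid, span_frac=0.3, n_spans=3, n_colors=10):
        """Overwrite random row-major spans of the answer with random colors,
        mimicking an incorrect answer. The real pipeline also applies data
        augmentation; this noiser and its parameters are assumptions."""
        h, w = len(grid), len(grid[0])
        flat = [c for row in grid for c in row]
        span_len = max(1, int(len(flat) * span_frac / n_spans))
        for _ in range(n_spans):
            start = random.randrange(len(flat) - span_len + 1)
            for i in range(start, start + span_len):
                flat[i] = random.randrange(n_colors)
        return [flat[r * w:(r + 1) * w] for r in range(h)]

    def make_scr_pair(prompt, answer_grid):
        """One SCR example: the prompt plus the noised answer on the source side,
        the clean full grid as the refinement target."""
        src = f"{prompt}refine: {grid_to_string(corrupt_grid(answer_grid))}"
        tgt = grid_to_string(answer_grid)  # model emits the full corrected grid
        return src, tgt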

ARC Data Formatting

  • ARC tasks ship as JSON where each task_id contains train pairs and test inputs; every grid is a rectangular list of lists with integers 0-9. Dimensions follow the original 1×1–30×30 spec, though the evaluator accepts up to 50×50.
  • Example task payload:
    {
      "task_id": {
        "train": [
          {"input": [[0,0],[1,1]], "output": [[1,1],[1,1]]}
        ],
        "test": [
          {"input": [[0,0,0],[0,1,0],[0,0,0]]}
        ]
      }
    }
    
  • Model prompts (the prompt column during training/TTT/inference) are serialized text strings: solve: train input1 <train_input> output1 <prefix><train_output>. … test tinput1 <test_input> toutput1 . Each grid token <train_input> / <train_output> / <test_input> is produced by grid_to_string, so rows are concatenated digits separated by spaces. Multiple train examples increment the index (input2, output2, etc.). A runnable sketch of this serialization appears after this list.
  • Prompt example:
    solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. input2 00 02 output2 5 2 2 20 22 20. test tinput1 0000 0300 0000 0000 toutput1 
    
  • Model targets (the correct_answer column, and the expected decoder output before post-processing) follow output_prefix semantics: {total_chars} {height} {width} {symbols} {row_strings}. Here total_chars = height*width + (height - 1), i.e. the length of the space-joined row strings, and symbols is the deduplicated sequence of colors in the order they are first encountered when scanning the board row-major; that rule applies to every output grid we emit (training outputs inside the prompt and the predicted test toutput). Both pieces are exercised in the sketch after this list. Example target string for a 3×3 donut:
     11 3 3 10 111 101 111.
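
To make the format concrete, here is a minimal, self-contained sketch of the serialization, assuming the JSON layout shown above. Only grid_to_string is named in this card; output_prefix, format_output, and build_prompt are hypothetical helper names. The asserts reproduce the documented examples:

    def grid_to_string(grid):
        # each row becomes its digits concatenated; rows are separated by spaces
        return " ".join("".join(str(c) for c in row) for row in grid)

    def output_prefix(grid):
        # '{total_chars} {height} {width} {symbols}', where total_chars counts the
        # digits plus the (height - 1) separating spaces, and symbols lists colors
        # in row-major first-encounter order
        h, w = len(grid), len(grid[0])
        seen, symbols = set(), []
        for row in grid:
            for c in row:
                if c not in seen:
                    seen.add(c)
                    symbols.append(str(c))
        return f"{h * w + (h - 1)} {h} {w} {''.join(symbols)}"

    def format_output(grid):
        # full output string as it appears both inside prompts and as the target
        return f"{output_prefix(grid)} {grid_to_string(grid)}."

    def build_prompt(task):
        # serialize one task dict (the JSON payload above) into a 'solve:' prompt
        parts = ["solve: train"]
        for i, pair in enumerate(task["train"], start=1):
            parts.append(f"input{i} {grid_to_string(pair['input'])}")
            parts.append(f"output{i} {format_output(pair['output'])}")
        parts.append("test")
        for i, pair in enumerate(task["test"], start=1):
            parts.append(f"tinput{i} {grid_to_string(pair['input'])}")
            parts.append(f"toutput{i}")
        return " ".join(parts) + " "

    donut = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    assert format_output(donut) == "11 3 3 10 111 101 111."

    task = {
        "train": [
            {"input": [[0, 0, 0], [0, 1, 0], [0, 0, 0]], "output": donut},
            {"input": [[0, 0], [0, 2]], "output": [[2, 2], [2, 0]]},
        ],
        "test": [{"input": [[0, 0, 0, 0], [0, 3, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]}],
    }
    assert build_prompt(task) == (
        "solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. "
        "input2 00 02 output2 5 2 2 20 22 20. "
        "test tinput1 0000 0300 0000 0000 toutput1 "
    )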
    