---
language: en
license: apache-2.0
tags:
- vision
- image-to-code
- cad
- cadquery
- vision-encoder-decoder
- vit
- gpt2
datasets:
- CADCODER/GenCAD-Code
metrics:
- rouge
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example CAD Image
---
# VIT-CodeGPT CAD Code Generator
This model generates CadQuery Python code from images of 3D CAD objects. It pairs a Vision Transformer (ViT) encoder with a CodeGPT decoder in a vision-encoder-decoder architecture.
## Model Details
- **Architecture**: Vision Encoder-Decoder (ViT + CodeGPT)
- **Encoder**: google/vit-base-patch16-224
- **Decoder**: microsoft/CodeGPT-small-py
- **Task**: Image-to-Code Generation (CAD)
- **Dataset**: CADCODER/GenCAD-Code
- **Training Samples**: 10,000 (8,500 train / 1,500 val)
- **Training Time**: ~4 hours 12 minutes
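
To make the encoder side concrete: `google/vit-base-patch16-224` splits a 224×224 input into 16×16 patches, so the decoder cross-attends over a fixed-length sequence of patch embeddings. The sketch below only works through that arithmetic; the helper name `vit_sequence_length` is illustrative and not part of any library.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of encoder tokens for a square image: one per patch plus the [CLS] token."""
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


encoder_tokens = vit_sequence_length()
print(encoder_tokens)  # 197 (14 * 14 patches + 1 [CLS] token)
```

Whatever the resolution of the source rendering, the image is resized to 224×224 first, so fine detail in large CAD renders is reduced before the encoder sees it.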
## Training Configuration
- **Batch Size**: 4 (effective: 16 with gradient accumulation)
- **Learning Rate**: 3e-5
- **Epochs**: 3
- **Max Length**: 256 tokens
- **Optimizer**: AdamW with warmup
- **Mixed Precision**: FP16
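
The batch-size figures above imply a gradient accumulation factor, which the card does not state directly. A rough sketch of that arithmetic follows, assuming a single device and the 8,500-sample training split listed under Model Details; the exact step count depends on how the training loop handles the final partial batch.

```python
import math

per_device_batch_size = 4
effective_batch_size = 16
train_samples = 8_500
epochs = 3

# Assumption: one device, so effective batch = per-device batch * accumulation steps
accumulation_steps = effective_batch_size // per_device_batch_size

# Forward/backward passes per epoch, then optimizer updates per epoch
micro_batches_per_epoch = math.ceil(train_samples / per_device_batch_size)
optimizer_steps_per_epoch = math.ceil(micro_batches_per_epoch / accumulation_steps)
total_optimizer_steps = optimizer_steps_per_epoch * epochs

print(accumulation_steps, optimizer_steps_per_epoch, total_optimizer_steps)
```

Under these assumptions the run performs roughly 1,600 optimizer updates in total.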
## Performance
Final training metrics:
- **ROUGE-1**: 0.0944
- **ROUGE-2**: 0.0040
- **ROUGE-L**: 0.0863
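
For context on how to read these scores: ROUGE-1 measures unigram overlap between the generated code and the reference code. Below is a minimal sketch of ROUGE-1 F1, assuming plain whitespace tokenization. The card does not say which ROUGE implementation or tokenizer produced the numbers above, so this illustrates the metric and will not reproduce them exactly; `rouge1_f1` is a hypothetical helper.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings, using whitespace tokens."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = 'result = cq.Workplane("XY").box(10, 10, 10)'
print(rouge1_f1(reference, reference))  # 1.0 for an exact match
```

A ROUGE-1 near 0.09 means few generated tokens match the reference, so outputs should be reviewed before use.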
## Usage
```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image
import torch

# Load the fine-tuned model, plus the image processor and tokenizer of its base checkpoints.
# ViTImageProcessor replaces the deprecated ViTFeatureExtractor.
model = VisionEncoderDecoderModel.from_pretrained("Thehunter99/vit-codegpt-cadcoder")
image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("microsoft/CodeGPT-small-py")

# Load the image and convert it to RGB (PNG files may carry an alpha channel)
image = Image.open("path/to/your/cad_image.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Generate CAD code with beam search
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        max_length=256,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode the best beam back into source text
generated_code = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_code)
```
## Example Output
Input: Image of a 3D cube
Output:
```python
import cadquery as cq
# Create a simple cube
result = cq.Workplane("XY").box(10, 10, 10)
```
## Training Data
The model was trained on the CADCODER/GenCAD-Code dataset, which pairs images of 3D CAD objects with the CadQuery Python code that produces them.
## Limitations
- Limited to CadQuery syntax
- Best performance on geometric shapes similar to training data
- May struggle with very complex or unusual CAD designs
- Maximum output length: 256 tokens
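
Because generation stops at 256 tokens, longer programs can be cut off mid-statement. One way to catch this, sketched below, is to check that the output parses as Python before running it. `is_valid_python` is a hypothetical helper, and a successful parse does not guarantee that the CadQuery calls exist or that the geometry is correct.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python; nothing is executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


complete = 'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 10, 10)'
truncated = 'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 10,'

print(is_valid_python(complete))   # True
print(is_valid_python(truncated))  # False (unclosed parenthesis)
```

Since the check only parses and never executes the text, it is safe to run on untrusted model output.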
## Citation
If you use this model, please cite:
```bibtex
@misc{vit-codegpt-cadcoder,
title={VIT-CodeGPT CAD Code Generator},
author={Your Name},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/Thehunter99/vit-codegpt-cadcoder}
}
```