---
language: en
license: apache-2.0
tags:
- vision
- image-to-code
- cad
- cadquery
- vision-encoder-decoder
- vit
- gpt2
datasets:
- CADCODER/GenCAD-Code
metrics:
- rouge
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---

# ViT-CodeGPT CAD Code Generator

This model generates CadQuery Python code from images of 3D CAD objects. It pairs a Vision Transformer (ViT) encoder with a CodeGPT decoder in a vision-encoder-decoder architecture.

## Model Details

- **Architecture**: Vision Encoder-Decoder (ViT + CodeGPT)
- **Encoder**: `google/vit-base-patch16-224`
- **Decoder**: `microsoft/CodeGPT-small-py`
- **Task**: Image-to-code generation (CAD)
- **Dataset**: `CADCODER/GenCAD-Code`
- **Training Samples**: 10,000 (8,500 train / 1,500 validation)
- **Training Time**: ~4 hours 12 minutes
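
As a rough illustration of how the two halves connect: the ViT encoder splits a 224×224 image into 16×16 patches and prepends a [CLS] token, and the decoder cross-attends to that sequence of encoder hidden states at every generation step. The sketch below only does the shape arithmetic; it assumes one RGB image at the encoder's native 224×224 resolution and the ViT-Base hidden size of 768, and `vit_sequence_length` is a hypothetical helper, not a Transformers API.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Tokens emitted by a ViT encoder: one per patch plus a [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    return patches_per_side ** 2 + 1  # 14 * 14 patches + [CLS]


# (batch, sequence length, hidden size) of the encoder output the decoder
# cross-attends to; 768 is the ViT-Base hidden size.
encoder_output_shape = (1, vit_sequence_length(), 768)
print(encoder_output_shape)  # (1, 197, 768)
```

Larger inputs grow the sequence quadratically (for example, 384×384 with 16×16 patches gives 577 tokens), which is one reason the encoder's native resolution is used here.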

## Training Configuration

- **Batch Size**: 4 per step (effective batch size 16 via gradient accumulation)
- **Learning Rate**: 3e-5
- **Epochs**: 3
- **Max Sequence Length**: 256 tokens (generated code)
- **Optimizer**: AdamW with learning-rate warmup
- **Mixed Precision**: FP16
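
The batch-size figures imply 4 gradient-accumulation steps (4 × 4 = 16), assuming training ran on a single device. The sketch below works out the resulting number of optimizer updates and shows one common form of "AdamW with warmup": linear warmup followed by linear decay, the same shape as `transformers.get_linear_schedule_with_warmup`. The card does not state the warmup length or the decay shape, so `warmup_steps=100` and the linear decay are assumptions.

```python
import math

# Figures from the Training Configuration list above
train_samples = 8_500
per_device_batch_size = 4
effective_batch_size = 16
epochs = 3
peak_lr = 3e-5

# Assumption: one device, so the effective batch comes from accumulation alone
accumulation_steps = effective_batch_size // per_device_batch_size  # 4

# One optimizer update per `accumulation_steps` micro-batches; the final partial
# group is counted as an update here (frameworks differ on this detail)
micro_batches_per_epoch = math.ceil(train_samples / per_device_batch_size)  # 2125
updates_per_epoch = math.ceil(micro_batches_per_epoch / accumulation_steps)  # 532
total_updates = updates_per_epoch * epochs  # 1596


def linear_warmup_lr(step: int, warmup_steps: int = 100) -> float:
    """Learning rate at optimizer update `step`: linear warmup to peak_lr,
    then linear decay to 0 at total_updates. warmup_steps=100 is hypothetical."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_updates - step)
    return peak_lr * remaining / max(1, total_updates - warmup_steps)


print(accumulation_steps, updates_per_epoch, total_updates)  # 4 532 1596
print(linear_warmup_lr(50), linear_warmup_lr(total_updates))
```

Warmup keeps early updates small while the newly initialized cross-attention layers in the decoder settle, which is a common reason to pair it with a low peak rate such as 3e-5.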

## Performance

Final training metrics:

- **ROUGE-1**: 0.0944
- **ROUGE-2**: 0.0040
- **ROUGE-L**: 0.0863
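
For intuition about these numbers: ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between generated and reference code, and ROUGE-L uses the longest common subsequence of tokens. The minimal F1 implementation below splits on whitespace, which is an assumption; the card does not say which ROUGE implementation or tokenizer produced the scores above (libraries such as `rouge_score` tokenize differently, for example by dropping punctuation), so its values will not match them exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision, recall = overlap / pred_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    pred, ref = ngrams(prediction.split(), n), ngrams(reference.split(), n)
    overlap = sum((pred & ref).values())  # n-gram matches, clipped by count
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    # Dynamic-programming longest common subsequence, O(len(a) * len(b))
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    pred, ref = prediction.split(), reference.split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))


reference = 'result = cq.Workplane("XY").box(10, 10, 10)'
prediction = 'result = cq.Workplane("XY").box(10, 20, 10)'
print(rouge_n(prediction, reference, 1), rouge_n(prediction, reference, 2),
      rouge_l(prediction, reference))
```

Here one changed number costs two of four bigrams (ROUGE-2 of 0.5) but only one of five unigrams (ROUGE-1 of 0.8). Identical code scores 1.0 on all three, so an average ROUGE-2 of 0.0040 means almost no two-token sequences from the reference code appear in the generated code.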

## Usage

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image
import torch

# Load the fine-tuned model, plus the image processor and tokenizer of the base checkpoints
model = VisionEncoderDecoderModel.from_pretrained("Thehunter99/vit-codegpt-cadcoder")
model.eval()
image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("microsoft/CodeGPT-small-py")

# Load the image as RGB (the ViT processor expects 3 channels; PNGs are often RGBA)
image = Image.open("path/to/your/cad_image.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Generate CadQuery code with beam search
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        max_length=256,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id,
    )

generated_code = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_code)
```

## Example Output

Input: Image of a 3D cube

Output:

```python
import cadquery as cq

# Create a simple 10 x 10 x 10 cube
result = cq.Workplane("XY").box(10, 10, 10)
```
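
Generated programs are not guaranteed to be valid, and output cut off at 256 tokens is usually incomplete. A cheap first filter, before executing anything with CadQuery, is to check that the text parses as Python and assigns the variable holding the final shape (`result` in the example above). The helper `check_generated_code` and the `result` naming convention are assumptions for illustration, not part of the model or of CadQuery.

```python
import ast


def check_generated_code(code: str, target: str = "result") -> tuple[bool, str]:
    """Return (ok, message): does `code` parse as Python and assign `target`?
    This only parses the text; it does not run CadQuery or check the geometry."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return False, f"syntax error on line {exc.lineno}: {exc.msg}"
    for node in ast.walk(tree):
        targets = []
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, (ast.AnnAssign, ast.AugAssign)):
            targets = [node.target]
        if any(isinstance(t, ast.Name) and t.id == target for t in targets):
            return True, "ok"
    return False, f"no assignment to {target!r}"


good = 'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 10, 10)\n'
truncated = 'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 10'
print(check_generated_code(good))  # (True, 'ok')
print(check_generated_code(truncated)[0])  # False
```

Code that passes this check still has to be executed with CadQuery to confirm it builds valid geometry; since it is model-generated, running it in a separate, sandboxed process is the safer choice.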

## Training Data

The model was trained on the `CADCODER/GenCAD-Code` dataset, which contains images of 3D CAD models paired with their corresponding CadQuery Python code.
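
The card reports a 10,000-sample subset split into 8,500 training and 1,500 validation examples but does not say how that subset was drawn. If you want a reproducible split of the same sizes, one option is a seeded shuffle of example indices. The seed, the random selection, the placeholder dataset size `num_examples=50_000`, and the helper name `split_indices` are all assumptions, and the result will not reproduce the subset used to train this model.

```python
import random


def split_indices(num_examples: int, subset_size: int = 10_000,
                  val_size: int = 1_500, seed: int = 42):
    """Draw a random subset of example indices and split it into (train, val).
    Hypothetical helper: the seed and selection strategy are assumptions."""
    if subset_size > num_examples or val_size >= subset_size:
        raise ValueError("subset_size must fit the dataset and exceed val_size")
    rng = random.Random(seed)  # local RNG, leaves global random state untouched
    subset = rng.sample(range(num_examples), subset_size)  # distinct indices
    return subset[val_size:], subset[:val_size]


# 50_000 is a placeholder; use the real length of the dataset split
train_idx, val_idx = split_indices(num_examples=50_000)
print(len(train_idx), len(val_idx))  # 8500 1500
```

With the Hugging Face `datasets` library, the two index lists can be passed to `Dataset.select` to materialize the splits.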

## Limitations

- Generates CadQuery code only
- Performs best on geometric shapes similar to those in the training data
- May struggle with very complex or unusual CAD designs
- Output is capped at 256 tokens, so longer programs are truncated

## Citation

If you use this model, please cite:

```bibtex
@misc{vit-codegpt-cadcoder,
  title={ViT-CodeGPT CAD Code Generator},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Thehunter99/vit-codegpt-cadcoder}
}
```