---
language: en
license: apache-2.0
tags:
- vision
- image-to-code
- cad
- cadquery
- vision-encoder-decoder
- vit
- gpt2
datasets:
- CADCODER/GenCAD-Code
metrics:
- rouge
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example CAD Image
---

# VIT-CodeGPT CAD Code Generator

This model generates CadQuery Python code from images of 3D CAD objects. It pairs a Vision Transformer (ViT) encoder with a CodeGPT decoder in a vision-encoder-decoder architecture.

## Model Details

- **Architecture**: Vision Encoder-Decoder (ViT + CodeGPT)
- **Encoder**: google/vit-base-patch16-224
- **Decoder**: microsoft/CodeGPT-small-py
- **Task**: Image-to-code generation (CAD)
- **Dataset**: CADCODER/GenCAD-Code
- **Training Samples**: 10,000 (8,500 train / 1,500 validation)
- **Training Time**: ~4 hours 12 minutes

## Training Configuration

- **Batch Size**: 4 per step (effective batch size 16 with gradient accumulation)
- **Learning Rate**: 3e-5
- **Epochs**: 3
- **Max Length**: 256 tokens
- **Optimizer**: AdamW with learning-rate warmup
- **Mixed Precision**: FP16

## Performance

Final training metrics:

- **ROUGE-1**: 0.0944
- **ROUGE-2**: 0.0040
- **ROUGE-L**: 0.0863

## Usage

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image
import torch

# Load the model, the encoder's image processor, and the decoder's tokenizer
model = VisionEncoderDecoderModel.from_pretrained("Thehunter99/vit-codegpt-cadcoder")
image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("microsoft/CodeGPT-small-py")

# Load the image as RGB (ViT expects 3 channels; PNGs are often RGBA) and preprocess it
image = Image.open("path/to/your/cad_image.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Generate CAD code with beam search
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        max_length=256,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id,
    )

generated_code = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_code)
```

## Example Output

Input: an image of a 3D cube

Output:

```python
import cadquery as cq

# Create a simple cube
result = cq.Workplane("XY").box(10, 10, 10)
```

## Training Data

The model was trained on the CADCODER/GenCAD-Code dataset, which pairs images of 3D CAD objects with their corresponding CadQuery Python code.

## Limitations

- Limited to CadQuery syntax
- Performs best on geometric shapes similar to those in the training data
- May struggle with very complex or unusual CAD designs
- Maximum output length: 256 tokens

## Citation

If you use this model, please cite:

```bibtex
@misc{vit-codegpt-cadcoder,
  title={VIT-CodeGPT CAD Code Generator},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Thehunter99/vit-codegpt-cadcoder}
}
```
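## Understanding the ROUGE Scores

The Performance section reports ROUGE scores, which measure n-gram overlap between generated and reference code; ROUGE-1 uses single tokens (unigrams). To make a value such as 0.0944 concrete, here is a simplified sketch of ROUGE-1 F1 using whitespace tokenization and clipped unigram counts. This is an illustration under stated assumptions, not necessarily how the reported scores were computed: the card does not name the implementation, and libraries such as `rouge_score` tokenize differently (by default they lowercase and split on non-alphanumeric characters), so numbers from this function will not match theirs. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on whitespace-separated tokens.

    Assumption: whitespace tokenization. Real ROUGE packages tokenize
    differently, so this is for illustration only.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most min(count in pred, count in ref) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Example: only "result" and "=" match, so precision = recall = 2/5
reference = 'result = cq.Workplane("XY").box(10, 10, 10)'
prediction = 'result = cq.Workplane("XY").box(5, 5, 5)'
print(round(rouge1_f1(prediction, reference), 2))  # 0.4
```

The example shows how sensitive token-level overlap is to small changes: the prediction builds the same kind of shape with different dimensions, yet scores only 0.4 under this tokenization.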
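## Checking Generated Code

The decoded output from the Usage snippet is free text, and given the low ROUGE scores it may not be valid Python. A cheap static check before executing anything can filter out broken generations. The following is a minimal sketch using only the standard library's `ast` module; the function name `check_cadquery_code` is hypothetical, and the rule that the final shape is bound to a top-level `result` variable is an assumption taken from the Example Output, not a documented requirement of the model or dataset.

```python
import ast


def check_cadquery_code(code: str) -> tuple[bool, str]:
    """Statically check that generated text parses as Python and looks like CadQuery.

    Nothing is executed, so CadQuery itself does not need to be installed.
    """
    # 1. The text must parse as Python source
    try:
        tree = ast.parse(code)
    except SyntaxError as err:
        return False, f"syntax error on line {err.lineno}: {err.msg}"

    # 2. Some statement must import cadquery (e.g. `import cadquery as cq`)
    def imports_cadquery(node: ast.AST) -> bool:
        if isinstance(node, ast.Import):
            return any(alias.name.split(".")[0] == "cadquery" for alias in node.names)
        if isinstance(node, ast.ImportFrom):
            return (node.module or "").split(".")[0] == "cadquery"
        return False

    if not any(imports_cadquery(node) for node in ast.walk(tree)):
        return False, "no cadquery import found"

    # 3. Assumed convention (from the example output): a top-level
    #    assignment binds the final shape to the name `result`
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "result"
            for target in node.targets
        ):
            return True, "ok"
    return False, "no top-level `result` assignment found"


sample = 'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 10, 10)\n'
print(check_cadquery_code(sample))  # (True, 'ok')
```

Passing this check does not guarantee that the code builds a valid solid; confirming that still requires running it, ideally in a sandbox, with CadQuery installed.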