AstroSage-Llama-3.1-8B-GGUF
https://arxiv.org/abs/2411.09012
AstroSage-Llama-3.1-8B-GGUF is the quantized version of AstroSage-Llama-3.1-8B, packaged for efficient deployment while retaining the model's specialized capabilities in astronomy, astrophysics, and cosmology. It offers a more accessible option for local and resource-constrained inference.
Model Details
- Base Architecture: Meta-Llama-3.1-8B
- Base Model: AstroSage-Llama-3.1-8B
- Parameters: 8 billion
- Quantization: GGUF format with two precision options
- Training Focus: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- License: Llama 3.1 Community License
- Development Process:
  - Based on the fully trained AstroSage-Llama-3.1-8B model
  - Quantized to GGUF format in two versions
  - Optimized for efficient inference
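The two precision options are published as separate GGUF files in the model repository. As a quick check of what is available, the snippet below (a minimal sketch using huggingface_hub; the repo id AstroMLab/AstroSage-8B-GGUF matches the one used in the code further down) lists the .gguf files:

from huggingface_hub import HfApi

# List the GGUF files published in the AstroMLab/AstroSage-8B-GGUF repository
api = HfApi()
for name in api.list_repo_files("AstroMLab/AstroSage-8B-GGUF"):
    if name.endswith(".gguf"):
        print(name)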
 
 
Using the Model
Python Implementation
from llama_cpp import Llama 
from huggingface_hub import hf_hub_download 
import os 
import sys 
import contextlib
# Context manager to silence llama.cpp's verbose stderr output
@contextlib.contextmanager 
def suppress_stderr(): 
    stderr = sys.stderr 
    with open(os.devnull, 'w') as devnull: 
        sys.stderr = devnull 
        try: 
            yield 
        finally: 
            sys.stderr = stderr 
# Pass filename="AstroSage-8B-BF16.gguf" to download the full-precision BF16 version instead
def download_model(repo_id="AstroMLab/AstroSage-8B-GGUF", filename="AstroSage-8B-Q8_0.gguf"): 
    try: 
        os.makedirs("models", exist_ok=True) 
        local_path = os.path.join("models", filename) 
        if not os.path.exists(local_path): 
            print(f"Downloading {filename}...") 
            with suppress_stderr(): 
                local_path = hf_hub_download( 
                    repo_id=repo_id, 
                    filename=filename, 
                    local_dir="models", 
                    local_dir_use_symlinks=False 
                ) 
            print("Download complete!") 
        return local_path 
    except Exception as e: 
        print(f"Error downloading model: {e}") 
        raise 
def initialize_llm(): 
    model_path = download_model() 
    with suppress_stderr(): 
        return Llama( 
            model_path=model_path, 
            n_ctx=2048, 
            n_threads=4 
        ) 
def get_response(llm, prompt, max_tokens=128): 
    response = llm( 
        prompt, 
        max_tokens=max_tokens, 
        temperature=0.7, 
        top_p=0.9, 
        top_k=40, 
        repeat_penalty=1.1, 
        stop=["User:", "\n\n"] 
    ) 
    return response['choices'][0]['text'] 
def main(): 
    llm = initialize_llm()
    
    # Example question about galaxy formation
    first_question = "How does a galaxy form?"
    print("\nQuestion:", first_question)
    print("\nAI:", get_response(llm, first_question).strip(), "\n")
    
    print("\nYou can now ask more questions! Type 'quit' or 'exit' to end the conversation.\n")
    
    while True:
        try:
            user_input = input("You: ")
            if user_input.lower() in ['quit', 'exit']:
                print("\nGoodbye!")
                break
                
            print("\nAI:", get_response(llm, user_input).strip(), "\n")
            
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"Error: {e}")
if __name__ == "__main__": 
    main()
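The helper above sends a raw prompt string. As an alternative, llama-cpp-python also exposes create_chat_completion, which applies the chat template stored in the GGUF metadata (when one is present). A minimal sketch, reusing the initialize_llm helper defined above:

def get_chat_response(llm, question, max_tokens=256):
    # Use the chat-completion API so the GGUF's built-in chat template is applied
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=max_tokens,
        temperature=0.7,
    )
    return result["choices"][0]["message"]["content"]

# Example:
# llm = initialize_llm()
# print(get_chat_response(llm, "What is the Sunyaev-Zel'dovich effect?"))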
Installation Requirements
pip install llama-cpp-python huggingface_hub
For Macs with Apple Silicon, install llama-cpp-python with Metal support using the following command instead:
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DLLAMA_METAL=on" pip install llama-cpp-python
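After installing, a quick import check confirms the package loads correctly (llama-cpp-python exposes its version string; this is just a sanity check, not part of the example script):

import llama_cpp
print(llama_cpp.__version__)  # prints the installed llama-cpp-python version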
Key Parameters
- n_ctx: Context window size (default: 2048)
- n_threads: Number of CPU threads to use (adjust based on your hardware)
- temperature: Controls randomness
- top_p: Nucleus sampling parameter
- top_k: Limits vocabulary choices
- repeat_penalty: Prevents repetition
- max_tokens: Maximum length of response (128 default, increase for longer answers)
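To make these concrete, the sketch below shows where each parameter plugs in; the values are illustrative (not tuned recommendations), and the model path assumes the file downloaded by the script above:

import os
from llama_cpp import Llama

llm = Llama(
    model_path="models/AstroSage-8B-Q8_0.gguf",  # path created by download_model() above
    n_ctx=4096,                # larger context window for longer conversations
    n_threads=os.cpu_count(),  # use all available CPU threads
)

output = llm(
    "Explain the Tully-Fisher relation.",
    max_tokens=256,      # allow a longer answer than the 128-token default
    temperature=0.3,     # lower temperature -> more deterministic responses
    top_p=0.9,           # nucleus sampling
    top_k=40,            # restrict sampling to the 40 most likely tokens
    repeat_penalty=1.1,  # discourage repetition
)
print(output["choices"][0]["text"])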
Example Usage
The example script will automatically:
- Download the quantized model from Hugging Face
- Initialize it with recommended parameters
- Start with an example question about galaxy formation
- Allow for interactive conversation
- Support easy exit with 'quit' or 'exit' commands

For different use cases, you can (see the sketch below):
- Use the BF16 version for maximum accuracy
- Adjust context window size for longer conversations
- Modify temperature for more/less deterministic responses
- Change max_tokens for longer/shorter responses
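For example, switching to the BF16 version only requires passing its filename to the download_model() helper from the script above (a sketch; the BF16 file is much larger, so expect a slower download and a higher memory footprint):

from llama_cpp import Llama

# Download the full-precision BF16 file instead of the Q8_0 default,
# reusing the download_model() helper defined in the script above
model_path = download_model(filename="AstroSage-8B-BF16.gguf")
llm = Llama(model_path=model_path, n_ctx=8192, n_threads=4)  # wider context for longer chats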
 
Model Improvements and Performance
The quantized model offers several advantages:
- Reduced memory requirements
 - CPU inference capability
 - Faster inference speed
 - Broader hardware compatibility
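To gauge inference speed on your own hardware, here is a rough timing sketch (it assumes the llm object from the script above; throughput will vary with quantization level, thread count, and CPU):

import time

prompt = "Briefly describe the cosmic microwave background."
start = time.perf_counter()
response = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

# Completion responses report token counts in their 'usage' field
n_generated = response["usage"]["completion_tokens"]
print(f"{n_generated} tokens in {elapsed:.1f} s ({n_generated / elapsed:.1f} tokens/s)")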
 
Note: Formal benchmarking of the quantized model is pending. Performance metrics will be updated once comprehensive testing is completed.
Quantization Details
- Format: GGUF
- Available Versions:
  - AstroSage-8B-BF16.gguf: bfloat16, the model's original precision
  - AstroSage-8B-Q8_0.gguf: 8-bit quantized, negligible loss in perplexity, smaller size
- Compatibility: Works with llama.cpp and derived projects
- Trade-offs:
  - BF16:
    - Best quality, closest to original model behavior
    - Larger file size and memory requirements
    - Recommended for accuracy-critical applications
  - Q8_0:
    - Reduced memory footprint
    - Good balance of performance and size
    - Suitable for most general applications

Intended Use
- Curiosity-driven question answering
 - Brainstorming new ideas
 - Astronomical research assistance
 - Educational support in astronomy
 - Literature review and summarization
 - Scientific explanation of concepts
 - Low-resource deployment scenarios
 - Edge device implementation
 - CPU-only environments
 - Applications requiring reduced memory footprint
 
Limitations
- All limitations of the original model apply
- Additional considerations:
  - Potential reduction in prediction accuracy due to quantization
  - May show increased variance in numeric calculations
  - Reduced precision in edge cases
  - Performance may vary based on hardware configuration
 
 
Technical Specifications
- Architecture: Meta-Llama 3.1
 - Deployment: CPU-friendly, reduced memory footprint
 - Format: GGUF (compatible with llama.cpp)
 
Ethical Considerations
While this model is designed for scientific use:
- Should not be used as sole source for critical research decisions
 - Output should be verified against primary sources
 - May reflect biases present in astronomical literature
 
Citation and Contact
- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
 - AstroMLab: astromachinelearninglab at gmail dot com
 - Please cite the AstroMLab 3 paper when referencing this model:
 
@misc{dehaan2024astromlab3,
      title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model}, 
      author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
      year={2024},
      eprint={2411.09012},
      archivePrefix={arXiv},
      primaryClass={astro-ph.IM},
      url={https://arxiv.org/abs/2411.09012}, 
}
Additional note: When citing this quantized version, please reference both the original AstroMLab 3 paper above and specify the use of the GGUF quantized variant.