Add SageMaker deployment instructions
    	
        README.md
    CHANGED
    
    | @@ -9,4 +9,19 @@ This repository contains cached neuron compilation artifacts for the most popula | |
| 9 |  | 
| 10 | 
             
            ### LLM models 
         | 
| 11 |  | 
| 12 | 
            -
             | 
| 9 |  | 
| 10 | 
             
            ### LLM models 
         | 
| 11 |  | 
| 12 | 
            +
            The transparent caching mechanism included in `optimum-neuron` and `NeuronX TGI` makes it easier to export and deploy cached models to Neuron platforms such as Trainium and Inferentia.
         | 
| 13 | 
            +
             | 
| 14 | 
            +
            To deploy any cached model directly to SageMaker:
         | 
| 15 | 
            +
            - go to the model page,
         | 
| 16 | 
            +
            - select "Deploy" in the top right corner,
         | 
| 17 | 
            +
            - select "AWS SageMaker" in the drop-down,
         | 
| 18 | 
            +
            - select the "AWS Inferentia & Trainium" tab,
         | 
| 19 | 
            +
            - copy the code snippet.
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            You can now paste the code snippet into your deployment script or notebook and follow the instructions in its comments.
         | 
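For orientation, the snippet copied from the model page usually has roughly the shape below. This is a minimal sketch, not the exact generated snippet: the model ID, instance type, timeout, volume size, and all environment values are placeholders, and the environment variable names (`HF_NUM_CORES`, `HF_AUTO_CAST_TYPE`, `MAX_BATCH_SIZE`, `MAX_TOTAL_TOKENS`) are assumptions about what the NeuronX TGI container expects. Always prefer the values in the snippet shown on the model page.

```python
def build_hub_env(model_id, num_cores, batch_size, sequence_length, auto_cast_type="fp16"):
    """Build the container environment for a cached configuration.

    The variable names are assumptions; check them against the snippet
    copied from the model page. SageMaker expects every value as a string.
    """
    return {
        "HF_MODEL_ID": model_id,
        "HF_NUM_CORES": str(num_cores),
        "HF_AUTO_CAST_TYPE": auto_cast_type,
        "MAX_BATCH_SIZE": str(batch_size),
        "MAX_TOTAL_TOKENS": str(sequence_length),
    }


def deploy(env, instance_type="ml.inf2.xlarge"):
    """Deploy the model to a SageMaker endpoint (requires AWS credentials)."""
    # Imported lazily so build_hub_env works without the SageMaker SDK installed.
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Works inside SageMaker notebooks; elsewhere, pass an IAM role ARN instead.
    role = sagemaker.get_execution_role()

    model = HuggingFaceModel(
        image_uri=get_huggingface_llm_image_uri("huggingface-neuronx"),
        env=env,
        role=role,
    )
    # Timeout and volume size are illustrative placeholders.
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        container_startup_health_check_timeout=1800,
        volume_size=512,
    )


# "my-org/my-model" is a hypothetical placeholder, not a real cached model.
env = build_hub_env("my-org/my-model", num_cores=2, batch_size=4, sequence_length=4096)
```

Calling `deploy(env)` would then create the endpoint. The cache is only hit when the core count, batch size, sequence length, and cast type match one of the cached configurations for that model.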
| 22 | 
            +
             | 
| 23 | 
            +
            To export a model to Neuron and save it locally, please follow the instructions in the `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/guides/export_model).
         | 
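As a rough illustration of the local export path, the sketch below uses `NeuronModelForCausalLM` from `optimum-neuron`. The model ID, output directory, and shape values are placeholders, and the keyword names (`batch_size`, `sequence_length`, `num_cores`, `auto_cast_type`) are assumptions that may differ between `optimum-neuron` versions, so check the linked documentation for the release you have installed.

```python
def export_config(batch_size=1, sequence_length=4096, num_cores=2, auto_cast_type="fp16"):
    """Collect the static compilation parameters for the export.

    Keyword names are assumptions and may vary across optimum-neuron versions.
    """
    return {
        "batch_size": batch_size,
        "sequence_length": sequence_length,
        "num_cores": num_cores,
        "auto_cast_type": auto_cast_type,
    }


def export_and_save(model_id, save_dir, **config):
    """Export a model to Neuron and save the compiled artifacts locally."""
    # Imported lazily: optimum-neuron is only installable on Neuron hosts.
    from optimum.neuron import NeuronModelForCausalLM

    # export=True triggers compilation (or a cache hit for a matching configuration).
    model = NeuronModelForCausalLM.from_pretrained(model_id, export=True, **config)
    model.save_pretrained(save_dir)
    return save_dir


config = export_config(batch_size=1, sequence_length=4096)
# Hypothetical usage on a Neuron instance:
# export_and_save("my-org/my-model", "./my-model-neuron", **config)
```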
| 24 | 
            +
             | 
| 25 | 
            +
            For a list of the cached models and configurations, please refer to the inference cache [configuration files](https://huggingface.co/aws-neuron/optimum-neuron-cache/tree/main/inference-cache-config).
         | 
| 26 | 
            +
             | 
| 27 | 
            +
            Alternatively, you can use the `optimum-cli neuron cache lookup` command to look up a specific model and see its cached configurations.
         | 
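If you want to run the lookup from a script or notebook, a small wrapper around the CLI is one option. This is a sketch under assumptions: it presumes `optimum-cli` is on the `PATH` and that the command takes the model ID as a positional argument; the helper names and the model ID are hypothetical.

```python
import subprocess


def lookup_command(model_id):
    """Build the argument list for the cache lookup command."""
    return ["optimum-cli", "neuron", "cache", "lookup", model_id]


def lookup_cached_configs(model_id):
    """Run the lookup and return the raw text printed by the CLI."""
    result = subprocess.run(
        lookup_command(model_id),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    return result.stdout


# "my-org/my-model" is a hypothetical placeholder.
command = lookup_command("my-org/my-model")
```

Passing the arguments as a list avoids shell quoting issues with model IDs.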

