---
tags:
- image-to-text
- image-captioning
- endpoints-template
license: bsd-3-clause
library_name: generic
---

# Fork of [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) for an `image-captioning` task on 🤗 Inference Endpoints

This repository implements a `custom` task for `image-captioning` for 🤗 Inference Endpoints. The code for the customized pipeline is in [pipeline.py](https://huggingface.co/florentgbelidji/blip_captioning/blob/main/pipeline.py).
To deploy this model as an Inference Endpoint, select `Custom` as the task so that the custom `handler.py` is used. -> _double check that it is selected_
### Expected request payload
```json
{
  "image": "/9j/4AAQSkZJRgA.....",
  "text": "a photography of a"
}
```

`image` is the base64-encoded image and `text` is an optional text prompt.
Below is an example of how to run a request using Python and `requests`.
## Run Request 
1. Use any online image.
```bash
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg
```
2. Run the request

```python
import base64

import requests

# Read the downloaded image and base64-encode it for the JSON payload
with open("/content/demo.jpg", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode()

ENDPOINT_URL = ""  # URL of your deployed Inference Endpoint
HF_TOKEN = ""      # token of an account with access to the endpoint

headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

def query(payload):
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": {
        "images": [encoded_string],    # base64-encoded image(s)
        "texts": ["a photography of"]  # optional text prompt(s)
    }
})
print(output)
```
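Assuming the response has the `{"captions": [...]}` shape shown in the expected output at the end of this card, the generated caption can be read directly from the parsed JSON:

```python
# Sketch: `output` is the parsed JSON response returned by `query` above
print(output["captions"][0])
```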

Example parameters depending on the decoding strategy:

1. Beam search

```
"parameters": {
    "num_beams": 5,
    "max_length": 20
}
```

2. Nucleus sampling

```
"parameters": {
    "num_beams": 1,
    "max_length": 20,
    "do_sample": true,
    "top_k": 50,
    "top_p": 0.95
}
```
```

3. Contrastive search

```
"parameters": {
    "penalty_alpha": 0.6,
    "top_k": 4,
    "max_length": 512
}
```

See the [generate()](https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/text_generation#transformers.GenerationMixin.generate) documentation for additional details.
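Putting it together, a request that also passes decoding parameters might look like the sketch below (this assumes the custom pipeline forwards the `parameters` dictionary to `generate()`, as the snippets above suggest):

```python
# Sketch: reuses `query` and `encoded_string` from the example above.
# The beam-search values mirror the "Beam search" parameters listed earlier.
output = query({
    "inputs": {
        "images": [encoded_string],
        "texts": ["a photography of"]
    },
    "parameters": {
        "num_beams": 5,
        "max_length": 20
    }
})
print(output)
```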


Expected output:
```python
{'captions': ['a photography of a woman and her dog on the beach']}
```