Committed by nlivathinos · Commit fd1060c · verified · 1 Parent(s): 9bce935

Update Readme with demo code (#2)

- docs: Update Readme with demo code (9b9b84def71265429fc43db45623784309079d52)

Files changed (1)
  1. README.md +114 -3
README.md CHANGED
@@ -1,3 +1,114 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ THIS IS WORK IN PROGRESS
+
+
+ # Docling Layout Model egret-medium
+
+ `docling-layout-egret-medium` is a Document Layout Model based on [DFINE-m](https://github.com/Peterande/D-FINE).
+
+ The model has been trained from scratch on a mix of document datasets.
+
+ It is part of the [Docling project](https://github.com/docling-project/docling).
+
+
+ # Inference code example
+
+ Prerequisites:
+
+ ```bash
+ pip install transformers Pillow torch requests
+ ```
+
+ Prediction:
+
+ ```python
+ import requests
+ from transformers import (
+     DFineForObjectDetection,
+     RTDetrImageProcessor,
+ )
+ import torch
+ from PIL import Image
+
+
+ classes_map = {
+     0: "Caption",
+     1: "Footnote",
+     2: "Formula",
+     3: "List-item",
+     4: "Page-footer",
+     5: "Page-header",
+     6: "Picture",
+     7: "Section-header",
+     8: "Table",
+     9: "Text",
+     10: "Title",
+     11: "Document Index",
+     12: "Code",
+     13: "Checkbox-Selected",
+     14: "Checkbox-Unselected",
+     15: "Form",
+     16: "Key-Value Region",
+ }
+ image_url = "https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo/resolve/main/example_images/annual_rep_14.png"
+ model_name = "ds4sd/docling-layout-egret-medium"
+ threshold = 0.6
+
+ # Download the image
+ image = Image.open(requests.get(image_url, stream=True).raw)
+ image = image.convert("RGB")
+
+
+ # Initialize the model
+ image_processor = RTDetrImageProcessor.from_pretrained(model_name)
+ model = DFineForObjectDetection.from_pretrained(model_name)
+
+ # Run the prediction pipeline
+ inputs = image_processor(images=[image], return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+ results = image_processor.post_process_object_detection(
+     outputs,
+     target_sizes=torch.tensor([image.size[::-1]]),
+     threshold=threshold,
+ )
+
+ # Get the results
+ for result in results:
+     for score, label_id, box in zip(
+         result["scores"], result["labels"], result["boxes"]
+     ):
+         score = round(score.item(), 2)
+         label = classes_map[label_id.item()]
+         box = [round(i, 2) for i in box.tolist()]
+         print(f"{label}:{score} {box}")
+ ```
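
The example above only prints each detection. As a possible follow-up, the sketch below collects detections into plain dictionaries and groups them by label, ordered top to bottom. It is not part of the README: the helper names `to_records` and `group_by_label` are hypothetical, the sample values are made up, and it assumes each box is `[x_min, y_min, x_max, y_max]` in pixels.

```python
from collections import defaultdict


def to_records(scores, label_ids, boxes, classes_map):
    """Turn parallel score/label/box sequences into a list of dicts."""
    records = []
    for score, label_id, box in zip(scores, label_ids, boxes):
        records.append(
            {
                "label": classes_map[int(label_id)],
                "score": round(float(score), 2),
                "box": [round(float(v), 2) for v in box],
            }
        )
    return records


def group_by_label(records):
    """Group records by label, sorting each group by top edge, then left edge."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["label"]].append(record)
    for items in grouped.values():
        # Assumes box = [x_min, y_min, x_max, y_max]; index 1 is the top edge.
        items.sort(key=lambda r: (r["box"][1], r["box"][0]))
    return dict(grouped)


if __name__ == "__main__":
    # Made-up detections for illustration only, not real model output.
    demo_map = {8: "Table", 9: "Text"}
    demo = to_records(
        scores=[0.912, 0.75, 0.661],
        label_ids=[9, 8, 9],
        boxes=[[10, 200, 300, 260], [12, 400, 500, 700], [10, 50, 300, 120]],
        classes_map=demo_map,
    )
    for label, items in group_by_label(demo).items():
        print(label, [item["box"] for item in items])
```

With the README's variables, the inputs would be `result["scores"].tolist()`, `result["labels"].tolist()`, and `result["boxes"].tolist()` together with `classes_map`.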
+
+
+ # References
+
+ ```
+ @techreport{Docling,
+   author = {Deep Search Team},
+   month = {8},
+   title = {Docling Technical Report},
+   url = {https://arxiv.org/abs/2408.09869v4},
+   eprint = {2408.09869},
+   doi = {10.48550/arXiv.2408.09869},
+   version = {1.0.0},
+   year = {2024}
+ }
+
+ @misc{peng2024dfine,
+   title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
+   author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
+   year={2024},
+   eprint={2410.13842},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV}
+ }
+ ```
+