BoLiu committed on
Commit 10efd0d · verified · 1 Parent(s): 2658758
Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -14,7 +14,7 @@ tags:
  - ingestion
  - yolox
  ---
- # Nemoretriever Page Element v3
+ # Nemotron Page Element v3
 
  ## Model Overview
 
@@ -23,13 +23,13 @@ tags:
 
  ### Description
 
- The **NeMo Retriever Page Elements v3** model is a specialized object detection model designed to identify and extract elements from document pages. While the underlying technology builds upon work from [Megvii Technology](https://github.com/Megvii-BaseDetection/YOLOX), we developed our own base model through complete retraining rather than using pre-trained weights. YOLOX is an anchor-free version of YOLO (You Only Look Once), this model combines a simpler architecture with enhanced performance. The model is trained to detect **tables**, **charts**, **infographics**, **titles**, **header/footers** and **texts** in documents.
+ The **Nemotron Page Elements v3** model is a specialized object detection model designed to identify and extract elements from document pages. While the underlying technology builds upon work from [Megvii Technology](https://github.com/Megvii-BaseDetection/YOLOX), we developed our own base model through complete retraining rather than using pre-trained weights. YOLOX is an anchor-free version of YOLO (You Only Look Once), this model combines a simpler architecture with enhanced performance. The model is trained to detect **tables**, **charts**, **infographics**, **titles**, **header/footers** and **texts** in documents.
 
- This model supersedes the [nemoretriever-page-elements](https://build.nvidia.com/nvidia/nemoretriever-page-elements-v2) model and is a part of the NVIDIA NeMo Retriever family of NIM microservices specifically for object detection and multimodal extraction of enterprise documents.
+ This model supersedes the [nemotron-page-elements](https://build.nvidia.com/nvidia/nemotron-page-elements-v2) model and is a part of the NVIDIA Nemotron family of NIM microservices specifically for object detection and multimodal extraction of enterprise documents.
 
  This model is ready for commercial/non-commercial use.
 
- We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at [nemoretriever-page-elements-v2](https://build.nvidia.com/nvidia/nemoretriever-page-elements-v2).
+ We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at [nemotron-page-elements-v2](https://build.nvidia.com/nvidia/nemotron-page-elements-v2).
 
  ### License/Terms of use
 
@@ -50,20 +50,20 @@ Global
 
  ### Use Case
 
- The **NeMo Retriever Page Elements v3** model is designed for automating extraction of text, charts, tables, infographics etc in enterprise documents. It can be used for document analysis, understanding and processing. Key applications include:
+ The **Nemotron Page Elements v3** model is designed for automating extraction of text, charts, tables, infographics etc in enterprise documents. It can be used for document analysis, understanding and processing. Key applications include:
  - Enterprise document extraction, embedding and indexing
  - Augmenting Retrieval Augmented Generation (RAG) workflows with multimodal retrieval
  - Data extraction from legacy documents and reports
 
  ### Release Date
 
- 10/23/2025 via https://huggingface.co/nvidia/nemoretriever-page-elements-v3
+ 10/23/2025 via https://huggingface.co/nvidia/nemotron-page-elements-v3
 
  ### References
 
  - YOLOX paper: https://arxiv.org/abs/2107.08430
  - YOLOX repo: https://github.com/Megvii-BaseDetection/YOLOX
- - Previous version of the Page Element model: https://build.nvidia.com/nvidia/nemoretriever-page-elements-v2
+ - Previous version of the Page Element model: https://build.nvidia.com/nvidia/nemotron-page-elements-v2
  - Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/
 
  ### Model Architecture
@@ -115,11 +115,11 @@ git lfs install
  ```
  - Using https
  ```
- git clone https://huggingface.co/nvidia/nemoretriever-page-elements-v3
+ git clone https://huggingface.co/nvidia/nemotron-page-elements-v3
  ```
  - Or using ssh
  ```
- git clone git@hf.co:nvidia/nemoretriever-page-elements-v3
+ git clone git@hf.co:nvidia/nemotron-page-elements-v3
  ```
 
  2. Run the model using the following code:
@@ -171,7 +171,7 @@ We provide examples in the notebook `Demo.ipynb`.
  ### Software Integration
 
  **Runtime Engine(s):**
- - **NeMo Retriever Page Elements v3** NIM
+ - **Nemotron Page Elements v3** NIM
 
  **Supported Hardware Microarchitecture Compatibility [List in Alphabetic Order]:**
  - NVIDIA Ampere
@@ -187,7 +187,7 @@ This AI model can be embedded as an Application Programming Interface (API) call
 
  ## Model Version(s):
 
- * `nemoretriever-page-elements-v3`
+ * `nemotron-page-elements-v3`
 
  ## Training and Evaluation Datasets:
 