rename
README.md
CHANGED
|
@@ -14,7 +14,7 @@ tags:
|
|
| 14 |
- ingestion
|
| 15 |
- yolox
|
| 16 |
---
|
| 17 |
-
#
|
| 18 |
|
| 19 |
## Model Overview
|
| 20 |
|
|
@@ -23,13 +23,13 @@ tags:
|
|
| 23 |
|
| 24 |
### Description
|
| 25 |
|
| 26 |
-
The **
|
| 27 |
|
| 28 |
-
This model supersedes the [
|
| 29 |
|
| 30 |
This model is ready for commercial/non-commercial use.
|
| 31 |
|
| 32 |
-
We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at [
|
| 33 |
|
| 34 |
### License/Terms of use
|
| 35 |
|
|
@@ -50,20 +50,20 @@ Global
|
|
| 50 |
|
| 51 |
### Use Case
|
| 52 |
|
| 53 |
-
The **
|
| 54 |
- Enterprise document extraction, embedding and indexing
|
| 55 |
- Augmenting Retrieval Augmented Generation (RAG) workflows with multimodal retrieval
|
| 56 |
- Data extraction from legacy documents and reports
|
| 57 |
|
| 58 |
### Release Date
|
| 59 |
|
| 60 |
-
10/23/2025 via https://huggingface.co/nvidia/
|
| 61 |
|
| 62 |
### References
|
| 63 |
|
| 64 |
- YOLOX paper: https://arxiv.org/abs/2107.08430
|
| 65 |
- YOLOX repo: https://github.com/Megvii-BaseDetection/YOLOX
|
| 66 |
-
- Previous version of the Page Element model: https://build.nvidia.com/nvidia/
|
| 67 |
- Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/
|
| 68 |
|
| 69 |
### Model Architecture
|
|
@@ -115,11 +115,11 @@ git lfs install
|
|
| 115 |
```
|
| 116 |
- Using https
|
| 117 |
```
|
| 118 |
-
git clone https://huggingface.co/nvidia/
|
| 119 |
```
|
| 120 |
- Or using ssh
|
| 121 |
```
|
| 122 |
-
git clone git@hf.co:nvidia/
|
| 123 |
```
|
| 124 |
|
| 125 |
2. Run the model using the following code:
|
|
@@ -171,7 +171,7 @@ We provide examples in the notebook `Demo.ipynb`.
|
|
| 171 |
### Software Integration
|
| 172 |
|
| 173 |
**Runtime Engine(s):**
|
| 174 |
-
- **
|
| 175 |
|
| 176 |
**Supported Hardware Microarchitecture Compatibility [List in Alphabetic Order]:**
|
| 177 |
- NVIDIA Ampere
|
|
@@ -187,7 +187,7 @@ This AI model can be embedded as an Application Programming Interface (API) call
|
|
| 187 |
|
| 188 |
## Model Version(s):
|
| 189 |
|
| 190 |
-
* `
|
| 191 |
|
| 192 |
## Training and Evaluation Datasets:
|
| 193 |
|
|
|
|
| 14 |
- ingestion
|
| 15 |
- yolox
|
| 16 |
---
|
| 17 |
+
# Nemotron Page Elements v3
|
| 18 |
|
| 19 |
## Model Overview
|
| 20 |
|
|
|
|
| 23 |
|
| 24 |
### Description
|
| 25 |
|
| 26 |
+
The **Nemotron Page Elements v3** model is a specialized object detection model designed to identify and extract elements from document pages. While the underlying technology builds upon work from [Megvii Technology](https://github.com/Megvii-BaseDetection/YOLOX), we developed our own base model through complete retraining rather than using pre-trained weights. YOLOX is an anchor-free version of YOLO (You Only Look Once) that combines a simpler architecture with enhanced performance. The model is trained to detect **tables**, **charts**, **infographics**, **titles**, **headers/footers**, and **text** in documents.
|
| 27 |
|
| 28 |
+
This model supersedes the [nemotron-page-elements](https://build.nvidia.com/nvidia/nemotron-page-elements-v2) model and is part of the NVIDIA Nemotron family of NIM microservices built specifically for object detection and multimodal extraction from enterprise documents.
|
| 29 |
|
| 30 |
This model is ready for commercial/non-commercial use.
|
| 31 |
|
| 32 |
+
We are excited to announce the open-sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at [nemotron-page-elements-v2](https://build.nvidia.com/nvidia/nemotron-page-elements-v2).
|
| 33 |
|
| 34 |
### License/Terms of use
|
| 35 |
|
|
|
|
| 50 |
|
| 51 |
### Use Case
|
| 52 |
|
| 53 |
+
The **Nemotron Page Elements v3** model is designed to automate the extraction of text, charts, tables, infographics, etc. from enterprise documents. It can be used for document analysis, understanding, and processing. Key applications include:
|
| 54 |
- Enterprise document extraction, embedding and indexing
|
| 55 |
- Augmenting Retrieval Augmented Generation (RAG) workflows with multimodal retrieval
|
| 56 |
- Data extraction from legacy documents and reports
|
| 57 |
|
| 58 |
### Release Date
|
| 59 |
|
| 60 |
+
10/23/2025 via https://huggingface.co/nvidia/nemotron-page-elements-v3
|
| 61 |
|
| 62 |
### References
|
| 63 |
|
| 64 |
- YOLOX paper: https://arxiv.org/abs/2107.08430
|
| 65 |
- YOLOX repo: https://github.com/Megvii-BaseDetection/YOLOX
|
| 66 |
+
- Previous version of the Page Element model: https://build.nvidia.com/nvidia/nemotron-page-elements-v2
|
| 67 |
- Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/
|
| 68 |
|
| 69 |
### Model Architecture
|
|
|
|
| 115 |
```
|
| 116 |
- Using https
|
| 117 |
```
|
| 118 |
+
git clone https://huggingface.co/nvidia/nemotron-page-elements-v3
|
| 119 |
```
|
| 120 |
- Or using ssh
|
| 121 |
```
|
| 122 |
+
git clone git@hf.co:nvidia/nemotron-page-elements-v3
|
| 123 |
```
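For scripted setups, the two clone variants above can be generated rather than typed by hand. The following is a minimal sketch, not part of the README: the helper name `clone_command` is hypothetical, and the SSH form assumes the standard Hugging Face SSH host `hf.co`.

```python
from __future__ import annotations

import shlex

# Repository id as it appears in the renamed clone commands.
REPO_ID = "nvidia/nemotron-page-elements-v3"


def clone_command(repo_id: str, use_ssh: bool = False) -> list[str]:
    """Return the `git clone` argument list for a Hugging Face model repo.

    Hypothetical helper for illustration; it only builds the command and
    does not run git or touch the network.
    """
    if use_ssh:
        # Assumption: SSH access goes through the host `hf.co`.
        url = f"git@hf.co:{repo_id}"
    else:
        url = f"https://huggingface.co/{repo_id}"
    return ["git", "clone", url]


if __name__ == "__main__":
    # Print both variants in a copy-pasteable shell form.
    for use_ssh in (False, True):
        print(shlex.join(clone_command(REPO_ID, use_ssh=use_ssh)))
```

The returned list can be passed directly to `subprocess.run(..., check=True)` once `git lfs install` has been run, which avoids shell quoting issues.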
|
| 124 |
|
| 125 |
2. Run the model using the following code:
|
|
|
|
| 171 |
### Software Integration
|
| 172 |
|
| 173 |
**Runtime Engine(s):**
|
| 174 |
+
- **Nemotron Page Elements v3** NIM
|
| 175 |
|
| 176 |
**Supported Hardware Microarchitecture Compatibility [List in Alphabetic Order]:**
|
| 177 |
- NVIDIA Ampere
|
|
|
|
| 187 |
|
| 188 |
## Model Version(s):
|
| 189 |
|
| 190 |
+
* `nemotron-page-elements-v3`
|
| 191 |
|
| 192 |
## Training and Evaluation Datasets:
|
| 193 |
|