Upload folder using huggingface_hub
- README.md +7 -2
- README_zh.md +6 -17
README.md CHANGED

@@ -6,11 +6,16 @@ metrics:
 - bleu
 pipeline_tag: image-to-text
 ---
+
+[中文版本](./README_zh.md)
+
 # About TexTeller
-
+
+* 📮[2024-03-25] TexTeller 2.0 released! The training data for TexTeller 2.0 has been increased to 7.5M image-formula pairs (about **15 times more** than TexTeller 1.0, with improved data quality as well). The trained TexTeller 2.0 demonstrates **superior performance** on the test set, especially in recognizing rare symbols, complex multi-line formulas, and matrices.
+
+> [There](https://github.com/OleehyO/TexTeller/blob/main/assets/test.pdf) are more test images and a horizontal comparison of recognition models from different companies.
 
 TexTeller is a ViT-based model designed for end-to-end formula recognition. It can recognize formulas in natural images and convert them into LaTeX-style formulas.
 
 TexTeller is trained on a larger dataset of image-formula pairs (a 550K dataset available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)) and **exhibits superior generalization ability and higher accuracy than [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)**, which uses approximately 100K data points. This larger dataset enables TexTeller to cover most usage scenarios more effectively.
 
-> For more details, please refer to the
+> For more details, please refer to the 𝐓𝐞𝐱𝐓𝐞𝐥𝐥𝐞𝐫 [GitHub repository](https://github.com/OleehyO/TexTeller?tab=readme-ov-file).
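Since the card is tagged `pipeline_tag: image-to-text` and describes an end-to-end image-to-LaTeX model, a minimal inference sketch may help readers of this card. It is an assumption, not the official API: the repo id `OleehyO/TexTeller` and the use of a 🤗 Transformers `VisionEncoderDecoderModel` with a paired image processor and tokenizer are guesses; the supported loading code is documented in the GitHub repository linked above.

```python
# Hypothetical usage sketch -- NOT the official TexTeller API.
# Assumptions: the checkpoint loads as a VisionEncoderDecoderModel and the
# repo id "OleehyO/TexTeller" exposes a compatible image processor/tokenizer.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

repo_id = "OleehyO/TexTeller"  # assumed repo id for this model card

model = VisionEncoderDecoderModel.from_pretrained(repo_id)
processor = AutoImageProcessor.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

image = Image.open("formula.png").convert("RGB")  # a natural image containing a formula
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Autoregressively decode a LaTeX string from the image features.
generated_ids = model.generate(pixel_values, max_new_tokens=256)
latex = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(latex)
```

If the repository ships its own loader or CLI, prefer that; the sketch only illustrates the image-in, LaTeX-out contract described in the README.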
README_zh.md CHANGED

@@ -1,21 +1,10 @@
----
-license: mit
-datasets:
-- OleehyO/latex-formulas
-metrics:
-- bleu
-pipeline_tag: image-to-text
----
 
-[
-
-> [There](https://github.com/OleehyO/TexTeller/blob/main/assets/test.pdf) are more test images here and a horizontal comparison of recognition models from different companies.
-
-TexTeller
-
-TexTeller is trained on a larger dataset of image-formula pairs (a 550K dataset available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)), **exhibits superior generalization ability and higher accuracy compared to [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)**, which uses approximately 100K data points. This larger dataset enables TexTeller to cover most usage scenarios more effectively.
-
-> For more details, please refer to the 𝐓𝐞𝐱𝐓𝐞𝐥𝐥𝐞𝐫 [GitHub repository](https://github.com/OleehyO/TexTeller?tab=readme-ov-file).
+# About TexTeller
+
+* 📮[2024-03-25] TexTeller 2.0 released! The training data for TexTeller 2.0 has been increased to 7.5M image-formula pairs (about **15 times more** than TexTeller 1.0, with improved data quality as well). The trained TexTeller 2.0 shows **superior performance** on the test set, especially when recognizing rare symbols, complex multi-line formulas, and matrices.
+> [Here](https://github.com/OleehyO/TexTeller/blob/main/assets/test.pdf) are more test images and a horizontal comparison of recognition models from different companies.
+
+TexTeller is a ViT-based end-to-end formula recognition model that converts images into the corresponding LaTeX formulas.
+
+TexTeller was trained on ~~550K~~ 7.5M image-formula pairs (the 550K dataset is available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)). Compared with [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR), which uses a 100K dataset, TexTeller has **stronger generalization ability** and **higher accuracy**, and can **cover most of your usage scenarios**.
+
+> For more details, please refer to the [TexTeller GitHub repository](https://github.com/OleehyO/TexTeller?tab=readme-ov-file).
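Both READMEs point to the 550K training set at [OleehyO/latex-formulas](https://huggingface.co/datasets/OleehyO/latex-formulas). The sketch below shows how such an image-formula dataset could be inspected with the 🤗 `datasets` library; the split and column names are assumptions, so check the dataset card before relying on them.

```python
# Hypothetical sketch for browsing the image-formula pairs mentioned above.
# The split and column names are guesses; consult the dataset card for the real schema.
# If the dataset defines multiple configurations, pass the configuration name
# as the second positional argument to load_dataset().
from datasets import load_dataset

ds = load_dataset("OleehyO/latex-formulas", split="train")

example = ds[0]
print(example.keys())  # inspect the actual column names first

# Typically one column is a PIL image of a formula and another holds the LaTeX
# ground truth -- the (image, formula) pairing TexTeller is trained on.
```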