Young Ho Shin committed
Commit f369852 · Parent(s): 36bccd1

Clean up app.py and article.md

Files changed:
- app.py (+3, -4)
- article.md (+41, -15)
app.py CHANGED

@@ -34,17 +34,17 @@ def process_image(image):
 # !ls examples | grep png
 
 # +
-title = "Convert
+title = "Convert image to LaTeX source code"
 
 with open('article.md',mode='r') as file:
     article = file.read()
 
 description = """
-This is a demo of machine learning model trained to
+This is a demo of machine learning model trained to reconstruct the LaTeX source code of an equation from an image.
 To use it, simply upload an image or use one of the example images below and click 'submit'.
 Results will show up in a few seconds.
 
-Try rendering the
+Try rendering the generated LaTeX [here](https://quicklatex.com/) to compare with the original.
 (The model is not perfect yet, so you may need to edit the resulting LaTeX a bit to get it to render a good match.)
 
 """
@@ -61,7 +61,6 @@ examples = [
     [ "examples/7afdeff0e6.png" ],
     [ "examples/b8f1e64b1f.png" ],
 ]
-#examples =[["examples/image_0.png"], ["image_1.png"], ["image_2.png"]]
 # -
 
 iface = gr.Interface(fn=process_image,
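For readers who want to see how these pieces fit together, here is a minimal sketch of a Gradio app wired up the way this diff suggests, with `title`, `description`, `article`, and `examples` passed to `gr.Interface`. The body of `process_image`, the input/output component choices, and the launch block are placeholders for illustration, not the actual contents of app.py.

```python
# Sketch only: how the objects in this diff plausibly feed into gr.Interface.
# The body of process_image and the input/output components are placeholders.
import gradio as gr

def process_image(image):
    # Placeholder: the real function runs the OCR model on the image
    # and returns the predicted LaTeX string.
    return r"\frac{a}{b}"

title = "Convert image to LaTeX source code"

with open("article.md", mode="r") as file:
    article = file.read()

description = """
This is a demo of a machine learning model trained to reconstruct the LaTeX
source code of an equation from an image.
"""

examples = [
    ["examples/7afdeff0e6.png"],
    ["examples/b8f1e64b1f.png"],
]

iface = gr.Interface(
    fn=process_image,
    inputs="image",        # placeholder component choice
    outputs="text",        # placeholder component choice
    title=title,
    description=description,
    article=article,
    examples=examples,
)

if __name__ == "__main__":
    iface.launch()
```

Gradio renders `description` above the interface and `article` below it, which is why article.md is read into a string at startup and why the two files are edited together in this commit.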
article.md CHANGED

@@ -14,8 +14,8 @@ and the corresponding LaTeX code:
 ```
 
 
-This demo is a first step in solving
-Eventually, you'll be able to take a quick screenshot
+This demo is a first step in solving this problem.
+Eventually, you'll be able to take a quick partial screenshot from a paper
 and a program built with this model will generate its corresponding LaTeX source code
 so that you can just copy/paste straight into your personal notes.
 No more endless googling obscure LaTeX syntax!
@@ -24,25 +24,51 @@ No more endless googling obscure LaTeX syntax!
 
 Because this problem involves looking at an image and generating valid LaTeX code,
 the model needs to understand both Computer Vision (CV) and Natural Language Processing (NLP).
-There are some other projects that aim to solve the same problem with some very interesting
-
+There are some other projects that aim to solve the same problem with some very interesting models.
+These generally involve some kind of "encoder" that looks at the image and extracts/encodes the information about the equation,
 and a "decoder" that takes that information and translates it into what is hopefully both valid and accurate LaTeX code.
+The "encode" part can be done using classic CNN architectures commonly used for CV tasks, or newer vision transformer architectures.
+The "decode" part can be done with LSTMs or transformer decoders, using an attention mechanism to make sure the decoder understands long-range dependencies, e.g. remembering to close a bracket that was opened a long sequence away.
 
-
-...
-
-I chose to tackle this problem with transfer learning.
+I chose to tackle this problem with transfer learning, using an existing OCR model and fine-tuning it for this task.
 The biggest reason for this is computing constraints -
-
+GPU hours are expensive so I wanted training to be reasonably fast, on the order of a couple of hours.
 There are some other benefits to this approach,
-e.g. the architecture is already proven to be robust
+e.g. the architecture is already proven to be robust.
+I chose [TrOCR](https://arxiv.org/abs/2109.10282), a model trained at Microsoft for text recognition tasks, which uses a transformer architecture for both the encoder and decoder.
+
+For the data, I used the `im2latex-100k` dataset, which includes a total of roughly 100k formulas and images.
+Some preprocessing steps were done by Harvard NLP for the [`im2markup` project](https://github.com/harvardnlp/im2markup).
+To limit the scope of the project and simplify the task, I limited the training data to equations containing 100 LaTeX tokens or fewer.
+This covers most single-line equations, including fractions, subscripts, symbols, etc., but does not cover large multi-line equations, some of which can have up to 500 LaTeX tokens.
+GPU training was done on Kaggle in roughly 3 hours.
+You can find the full training code on my Kaggle profile [here](https://www.kaggle.com/code/younghoshin/finetuning-trocr/notebook).
+
+## What's next?
+
+There are multiple improvements that I'm hoping to make to this project.
+
+### More robust prediction
+
+If you've tried the examples above (randomly sampled from the test set), you may have noticed that the model predictions aren't quite perfect and the model occasionally misses, duplicates, or mistakes tokens.
+More training on the existing dataset could help with this.
+
+### More data
+
+There's a lot of LaTeX data available on the internet besides `im2latex-100k`, e.g. arXiv and Wikipedia.
+It's just waiting to be scraped and used for this project.
+This means a lot of hours of scraping, cleaning, and processing, but having a more diverse set of input images could improve model accuracy significantly.
+
+### Faster and smaller model
+
+The model currently takes a few seconds to process a single image.
+I would love to improve performance so that it can run in one second or less, maybe even on mobile devices.
+This might be impossible with TrOCR, which is a fairly large model designed for use on GPUs.
 
-I chose TrOCR, an OCR machine learning model trained by Microsoft on SRIOE data to produce text from receipts.
 
 <p style='text-align: center'>Made by Young Ho Shin</p>
 <p style='text-align: center'>
-<a href = "mailto: [email protected]">Email</a> |
-<a href='https://www.github.com/yhshin11'>Github</a> |
-<a href='https://www.linkedin.com/in/young-ho-shin-3995051b9/'>Linkedin</a>
-
+<a href = "mailto: [email protected]">Email</a> |
+<a href='https://www.github.com/yhshin11'>Github</a> |
+<a href='https://www.linkedin.com/in/young-ho-shin-3995051b9/'>Linkedin</a>
 </p>
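As a rough illustration of the setup described in article.md (fine-tuning TrOCR on `im2latex-100k` formulas of at most 100 tokens), the sketch below shows how the pieces might be loaded with the Hugging Face transformers library. The checkpoint name, the formula file path, and the whitespace tokenization of formulas are assumptions for illustration; the actual choices live in the linked Kaggle notebook.

```python
# Sketch only: load a TrOCR checkpoint, keep im2latex-style formulas with
# <= 100 tokens, and run inference on one example image. The checkpoint name,
# the formula file path, and whitespace tokenization are assumptions; a
# fine-tuned checkpoint would be needed to actually get LaTeX output.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

MAX_TOKENS = 100

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-stage1")

# im2markup-style formula lists store one whitespace-tokenized formula per line.
with open("im2latex_formulas.norm.lst") as fh:
    formulas = [line.strip() for line in fh]
train_formulas = [s for s in formulas if len(s.split()) <= MAX_TOKENS]
print(f"Kept {len(train_formulas)} of {len(formulas)} formulas")

# Single-image inference, the same job the demo's process_image performs.
image = Image.open("examples/7afdeff0e6.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values, max_length=MAX_TOKENS)
latex = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(latex)
```

The 100-token cap matters at inference time too: capping `max_length` keeps generation fast and discourages the decoder from rambling past the end of a single-line equation.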