Commit 0f25402 (1 parent: 8318f97)
Create README.md
README.md
ADDED
|
@@ -0,0 +1,58 @@
| 1 |
+
### Pegasus for Financial Summarization
|
| 2 |
+
|
| 3 |
+
This model was trained on a novel financial dataset of 2K financial and economic articles from the [Bloomberg](https://www.bloomberg.com/europe) website, covering categories such as stocks, markets, currencies, rates and cryptocurrencies. It uses [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) and is fine-tuned from the [google/pegasus-xsum model](https://huggingface.co/google/pegasus-xsum).
|
| 4 |
+
|
| 5 |
+
The PEGASUS model was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
#### Installing
|
| 9 |
+
To use this model, install Transformers and the SentencePiece tokenizer as follows:
|
| 10 |
+
|
| 11 |
+
```Python
|
| 12 |
+
pip install transformers sentencepiece
|
| 13 |
+
```
|
| 14 |
+
|
| 15 |
+
#### How to use
|
| 16 |
+
The snippet below shows how to use this model for financial summarization in PyTorch. If you prefer
|
| 17 |
+
TensorFlow, the same code runs with only minor changes.
|
| 18 |
+
|
| 19 |
+
```Python
|
| 20 |
+
from transformers import PegasusTokenizer, PegasusForConditionalGeneration, TFPegasusForConditionalGeneration
|
| 21 |
+
|
| 22 |
+
# Let's load the model and the tokenizer
|
| 23 |
+
model_name = "human-centered-summarization/financial-summarization-pegasus"
|
| 24 |
+
tokenizer = PegasusTokenizer.from_pretrained(model_name)
|
| 25 |
+
model = PegasusForConditionalGeneration.from_pretrained(model_name)  # To use the TensorFlow model,
|
| 26 |
+
# replace this class with TFPegasusForConditionalGeneration
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
# Some text to summarize here
|
| 30 |
+
text_to_summarize = "National Commercial BankNational Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."
|
| 31 |
+
|
| 32 |
+
# Tokenize our text
|
| 33 |
+
# To run the code in TensorFlow, request TensorFlow tensors by passing return_tensors="tf"
|
| 34 |
+
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
|
| 35 |
+
|
| 36 |
+
# Generate the output (beam search is used here, but any other decoding strategy also works)
|
| 37 |
+
output = model.generate(
|
| 38 |
+
input_ids,
|
| 39 |
+
max_length=32,
|
| 40 |
+
num_beams=5,
|
| 41 |
+
early_stopping=True
|
| 42 |
+
)
|
| 43 |
+
|
| 44 |
+
# Finally, we can print the generated summary
|
| 45 |
+
print(tokenizer.decode(output[0], skip_special_tokens=True))
|
| 46 |
+
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
|
| 47 |
+
|
| 48 |
+
```
|
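For repeated use, the steps above can be wrapped in a small helper. The following is a minimal sketch, not part of the original README: the function name `summarize` and its parameters are hypothetical, and `truncation=True` plus the explicit `attention_mask` are additions assumed to be useful for longer articles. It expects a tokenizer and model already loaded as shown above.

```python
def summarize(text, tokenizer, model, max_length=32, num_beams=5):
    """Return a beam-search summary of `text` (hypothetical helper).

    `tokenizer` and `model` are assumed to be the objects loaded with
    PegasusTokenizer.from_pretrained(...) and
    PegasusForConditionalGeneration.from_pretrained(...).
    """
    # Truncate to the model's maximum input length (an assumption added here;
    # the original snippet tokenizes without truncation).
    inputs = tokenizer(text, truncation=True, return_tensors="pt")

    # Same generation settings as the snippet above.
    output = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )

    # Decode the best beam back into text.
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Usage, with the objects from the snippet above:
# print(summarize(text_to_summarize, tokenizer, model))
```

Passing the tokenizer and model in as arguments keeps the helper independent of how they were loaded, so they are created once and reused across calls.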
| 49 |
+
|
| 50 |
+
|
| 51 |
+
#### Evaluation Results
|
| 52 |
+
The results before and after fine-tuning on our dataset are shown below:
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
| Fine-tuning | R-1 | R-2 | R-L | R-S |
|
| 56 |
+
|:-----------:|:-----:|:-----:|:------:|:-----:|
|
| 57 |
+
| Yes | 23.55 | 6.99 | 18.14 | 21.36 |
|
| 58 |
+
| No | 13.8 | 2.4 | 10.63 | 12.03 |
|
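To make the R-1 and R-2 columns more concrete, here is a minimal sketch of an n-gram overlap score in the style of ROUGE-N. It is an illustration only: the README does not say which ROUGE implementation, tokenization, or variant (precision, recall or F-measure) produced the table, so the whitespace tokenization and F1 formulation below are assumptions, and the function names are hypothetical. The table values appear to be on a 0-100 scale, whereas this sketch returns a value between 0 and 1.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """ROUGE-N style F1 between a reference and a candidate summary.

    Assumes lowercased whitespace tokenization, which is simpler than
    what standard ROUGE packages do.
    """
    ref = ngram_counts(reference.lower().split(), n)
    cand = ngram_counts(candidate.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both summaries.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Usage: n=1 corresponds to R-1 (unigrams), n=2 to R-2 (bigrams).
# rouge_n_f1("reference summary text", "generated summary text", n=1)
```

For reported numbers, an established ROUGE package is preferable to this sketch, since stemming and tokenization choices change the scores.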