---
library_name: peft
base_model: facebook/bart-large
---

# Model Card for LexBartLo_2

LexBartLo_2 is a LoRA adapter for `facebook/bart-large` that performs abstractive summarization of patent documents. It pairs a LexRank-based extractive sentence-ranking step with a BART model fine-tuned using parameter-efficient fine-tuning (PEFT).



## Model Details

### Model Description

This model applies parameter-efficient fine-tuning (LoRA via the PEFT library) to `facebook/bart-large` for abstractive patent document summarization, trained on patent documents from the BigPatent dataset. At inference time, a LexRank-based extractive step re-orders the input sentences by centrality before they are passed to the fine-tuned model.


- **Paper:** "A Hybrid Architecture with Efficient Fine Tuning for Abstractive Patent Document Summarization", available at https://arxiv.org/abs/2503.10354 (preprint) and https://ieeexplore.ieee.org/document/11030964 (IEEE)
- **Developed by:** Nevidu Jayatilleke and Ruvan Weerasinghe
- **Supported Language:** English
- **Finetuned Domains:** Textile, Mechanical Engineering, Fixed Constructions, and Human Necessities patent documents from the BigPatent dataset
- **Finetuned from model:** facebook/bart-large
- **Link to the Specialised Model:** https://huggingface.co/Nevidu/LexBartLo_1
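
As a quick sanity check, the adapter can be attached to its base model in a few lines. This is a minimal sketch using the PEFT API; the full summarization pipeline is shown in the usage section below.

```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM

# Read the adapter config to find the base model it was trained on.
config = PeftConfig.from_pretrained("Nevidu/LexBartLo_2")
print(config.base_model_name_or_path)  # facebook/bart-large

# Load the base model and attach the LoRA adapter weights.
base = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base, "Nevidu/LexBartLo_2")
```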


## How to use the model

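The example below depends on NLTK's sentence tokenizer and stopword list, which must be downloaded once per environment. A minimal setup sketch (the resource names are inferred from the imports in the example):

```python
import nltk

# One-time download of the NLTK resources used by the example below.
nltk.download("punkt")      # sentence and word tokenizers
nltk.download("stopwords")  # English stopword list
# On newer NLTK versions you may also need: nltk.download("punkt_tab")
```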

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance
import numpy as np
import networkx as nx

def preprocess_text(text):
    # Split the document into sentences, then lowercase and word-tokenize each one.
    sentences = sent_tokenize(text)
    tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]
    return tokenized_sentences

def sentence_similarity(sentence1, sentence2):
    # Cosine similarity between bag-of-words count vectors, ignoring stopwords.
    stop_words = set(stopwords.words('english'))
    filtered_sentence1 = [w for w in sentence1 if w not in stop_words]
    filtered_sentence2 = [w for w in sentence2 if w not in stop_words]
    if not filtered_sentence1 or not filtered_sentence2:
        return 0.0  # guard against a zero vector inside cosine_distance
    all_words = list(set(filtered_sentence1 + filtered_sentence2))
    vector1 = [filtered_sentence1.count(word) for word in all_words]
    vector2 = [filtered_sentence2.count(word) for word in all_words]
    return 1 - cosine_distance(vector1, vector2)

def build_similarity_matrix(sentences):
    # Pairwise sentence-similarity matrix (diagonal left at zero).
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    for i in range(len(sentences)):
        for j in range(len(sentences)):
            if i != j:
                similarity_matrix[i][j] = sentence_similarity(sentences[i], sentences[j])
    return similarity_matrix

def apply_lexrank(similarity_matrix, damping=0.85, threshold=0.2, max_iter=100):
    # Score sentences with PageRank over the similarity graph; note that
    # `threshold` is used here as the PageRank convergence tolerance.
    nx_graph = nx.from_numpy_array(similarity_matrix)
    scores = nx.pagerank(nx_graph, alpha=damping, tol=threshold, max_iter=max_iter)
    return scores

def get_top_sentences(sentences, scores):
    # Order all sentences from most to least central; none are dropped.
    ranked_sentences = sorted(((scores[i], sentence) for i, sentence in enumerate(sentences)), reverse=True)
    top_sentences = [sentence for score, sentence in ranked_sentences]
    return top_sentences

def extract_important_sentences(text):
    # Extractive pass: re-order the document's sentences by LexRank centrality
    # and join them back into a single string for the abstractive model.
    preprocessed_sentences = preprocess_text(text)
    similarity_matrix = build_similarity_matrix(preprocessed_sentences)
    scores = apply_lexrank(similarity_matrix)
    top_sentences = get_top_sentences(preprocessed_sentences, scores)
    paragraph = ' '.join([' '.join(sentence) for sentence in top_sentences])
    return paragraph

def summarize(text, max_tokens):
    peft_model = "Nevidu/LexBartLo_2"
    config = PeftConfig.from_pretrained(peft_model)

    # Load the base seq2seq model and tokenizer named in the adapter config.
    model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

    # Attach the LoRA adapter weights.
    model = PeftModel.from_pretrained(model, peft_model)
    model.eval()

    # Re-order sentences by centrality so the most salient ones survive truncation.
    sorted_text = extract_important_sentences(text)

    input_ids = tokenizer(sorted_text, return_tensors="pt", truncation=True).input_ids
    with torch.inference_mode():
        outputs = model.generate(input_ids=input_ids, max_new_tokens=max_tokens, do_sample=True, top_p=0.9)
    summary = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    return summary

text = """ Add your patent text"""
max_tokens = 256

summary = summarize(text, max_tokens)
```
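
Two notes on the design. The LexRank-style extractive pass re-orders sentences by centrality before tokenization, so when `truncation=True` clips the input to BART's 1,024-token window, it is the least central sentences that are discarded rather than whatever happens to sit at the end of the document. Also, `do_sample=True` makes generation non-deterministic; for repeatable output you can seed PyTorch first (the seed value below is an arbitrary assumption, not part of the original recipe):

```python
import torch

torch.manual_seed(0)  # any fixed seed gives repeatable sampling
summary = summarize(text, max_tokens=256)
print(summary)
```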

## Citation

```bibtex
@INPROCEEDINGS{11030964,
  author={Jayatilleke, Nevidu and Weerasinghe, Ruvan},
  booktitle={2025 International Research Conference on Smart Computing and Systems Engineering (SCSE)}, 
  title={A Hybrid Architecture with Efficient Fine Tuning for Abstractive Patent Document Summarization}, 
  year={2025},
  volume={},
  number={},
  pages={1-6},
  keywords={automatic text summarization;intellectual property;natural language processing;parameter efficient fine tuning},
  doi={10.1109/SCSE65633.2025.11030964}}
```

### Framework versions

- PEFT 0.9.0