# StructBERT: Unofficial Copy
Official repository: https://github.com/alibaba/AliceMind/tree/main/StructBERT
**Disclaimer**
* This model card was not produced by the [AliceMind Team](https://github.com/alibaba/AliceMind/).
## Reproduce the HF Hub models
Download the model weights, config, and tokenizer vocabulary, renaming the files as `transformers` expects:
```bash
# Fetch the config, vocabulary, and pre-trained English weights,
# then rename them to the filenames transformers looks for.
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/large_bert_config.json && mv large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model && mv en_model pytorch_model.bin
```
```python
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

# Load the downloaded config, weights, and vocabulary from the current directory.
config = AutoConfig.from_pretrained("./config.json")
model = AutoModelForMaskedLM.from_pretrained(".", config=config)
tokenizer = AutoTokenizer.from_pretrained(".", config=config)

# Push the converted model and tokenizer to the Hugging Face Hub.
model.push_to_hub("structbert-large")
tokenizer.push_to_hub("structbert-large")
```
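Once pushed, the checkpoint can be loaded straight back from the Hub. Below is a minimal sanity-check sketch; the repo id `your-username/structbert-large` is a placeholder for whatever namespace you pushed to above:
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "your-username/structbert-large"  # placeholder: use the namespace you pushed to

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)
model.eval()

# Fill in the [MASK] token and show the top-5 predictions.
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```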
[https://arxiv.org/abs/1908.04577](https://arxiv.org/abs/1908.04577)
# StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
## Introduction
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training.
Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential
order of words and sentences, which leverage language structures at the word and sentence levels,
respectively.
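As a rough illustration of these two objectives (not the authors' pre-training code), the word-level task shuffles a short span of tokens and trains the model to reconstruct the original order, while the sentence-level task asks whether a paired sentence follows, precedes, or is unrelated to the current one. A minimal data-construction sketch, with all function names invented for this example:
```python
import random

random.seed(0)

def word_objective_example(tokens):
    """Illustrative word-structural objective: shuffle one random trigram;
    the original tokens serve as the reconstruction target."""
    corrupted = list(tokens)
    start = random.randrange(len(tokens) - 2)
    window = corrupted[start:start + 3]
    random.shuffle(window)
    corrupted[start:start + 3] = window
    return corrupted, list(tokens)

def sentence_objective_example(prev_sent, cur_sent, next_sent, random_sent):
    """Illustrative sentence-structural objective: pair the current sentence
    with its next, its previous, or a random sentence, and label which."""
    paired, label = random.choice(
        [(next_sent, "next"), (prev_sent, "previous"), (random_sent, "random")]
    )
    return (cur_sent, paired), label

corrupted, target = word_objective_example("the quick brown fox jumps".split())
print(corrupted)  # a trigram has been shuffled, e.g. ['the', 'brown', 'quick', 'fox', 'jumps']
print(target)     # ['the', 'quick', 'brown', 'fox', 'jumps']
```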
## Pre-trained models
|Model | Description | #params | Download |
|------------------------|-------------------------------------------|------|------|
|structbert.en.large | StructBERT using the BERT-large architecture | 340M | [structbert.en.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model) |
|structroberta.en.large | StructRoBERTa continued training from RoBERTa | 355M | Coming soon |
|structbert.ch.large | Chinese StructBERT using the BERT-large architecture | 330M | [structbert.ch.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model) |
## Results
The GLUE and CLUE results below can be reproduced with the hyperparameters listed in the "Example usage" section.
#### structbert.en.large
[GLUE benchmark](https://gluebenchmark.com/leaderboard)
|Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC |
|--------------------|-------|-------|-------|-------|-------|
|structbert.en.large |86.86% |93.04% |91.67% |93.23% |86.51% |
#### structbert.ch.large
[CLUE benchmark](https://www.cluebenchmarks.com/)
|Model | CMNLI | OCNLI | TNEWS | AFQMC |
|--------------------|-------|-------|-------|-------|
|structbert.ch.large |84.47% |81.28% |68.67% |76.11% |
## Example usage
#### Requirements and Installation
* [PyTorch](https://pytorch.org/) version >= 1.0.1
* Install the other dependencies via
```bash
pip install -r requirements.txt
```
* For faster training, install NVIDIA's [apex](https://github.com/NVIDIA/apex) library (a quick environment check is sketched after this list)
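Before launching fine-tuning, an optional and purely illustrative check that the environment matches these requirements (the apex `amp` import relates to the `--amp_type O1` flag used in the command below):
```python
# Illustrative environment check; not part of the repository.
import torch

print("PyTorch:", torch.__version__)
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 0), "the StructBERT scripts expect PyTorch >= 1.0.1"

try:
    from apex import amp  # noqa: F401
    print("apex found: mixed-precision training (e.g. --amp_type O1) is available")
except ImportError:
    print("apex not found: install https://github.com/NVIDIA/apex for faster training")
```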
#### Finetune MNLI
```bash
python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir
```
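The command above uses the repository's own training script. If you only need a rough baseline through the Hugging Face ecosystem instead, a sketch along the following lines mirrors its main hyperparameters (max sequence length 128, batch size 32, learning rate 2e-5, 3 epochs). This is not the authors' fine-tuning code; the model id `your-username/structbert-large` is the placeholder Hub checkpoint produced by the conversion step at the top of this card:
```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "your-username/structbert-large"  # placeholder: the converted Hub checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
# MNLI is a 3-way classification task (entailment / neutral / contradiction);
# the classification head is freshly initialized on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

dataset = load_dataset("glue", "mnli")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return accuracy.compute(predictions=np.argmax(logits, axis=-1), references=labels)

args = TrainingArguments(
    output_dir="structbert-large-mnli",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation_matched"],
    tokenizer=tokenizer,          # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())
```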
## Citation
If you use our work, please cite:
```
@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}
```