# MILU

MILU is a joint neural model that predicts multiple dialog act items simultaneously (a dialog act item takes the form domain-intent(slot, value)). Since it is common in a multi-domain setting for an utterance to have multiple dialog act items, MILU is likely to yield higher performance than conventional single-intent models.
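As a purely hypothetical illustration (the utterance, domain/intent/slot names, and tuple layout below are made up for exposition, not the model's actual output format), one utterance can carry several such items at once:

```python
# Hypothetical illustration of dialog act items: each item is a
# domain-intent(slot, value) tuple, and one multi-domain utterance
# can carry several of them simultaneously.
utterance = "i want a cheap hotel in the north and the phone number of a restaurant"

dialog_act_items = [
    ("Hotel", "Inform", "Price", "cheap"),
    ("Hotel", "Inform", "Area", "north"),
    ("Restaurant", "Request", "Phone", "?"),
]

# MILU predicts all items jointly, across both domains.
print(sorted({domain for domain, intent, slot, value in dialog_act_items}))
```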

## Example usage

We based our implementation on the [AllenNLP library](https://github.com/allenai/allennlp). For an introduction to this library, check out [these tutorials](https://allennlp.org/tutorials).

To use this model, additionally install `overrides==4.1.2` and `allennlp==0.9.0`, and use `python>=3.6,<=3.8`.

### On MultiWOZ dataset

```bash
$ python train.py multiwoz/configs/[base|context3].jsonnet -s serialization_dir
$ python evaluate.py serialization_dir/model.tar.gz {test_file} --cuda-device {CUDA_DEVICE}
```

If you want to perform end-to-end evaluation, include the trained model by adding its path (`serialization_dir/model.tar.gz`) to your ConvLab spec file.

#### Data

We use the MultiWOZ data (`data/multiwoz/[train|val|test].json.zip`).

### MILU on datasets in unified format

We support training MILU on datasets that are in our unified format.

- For **non-categorical** dialogue acts, whose values appear in the utterance, we use **slot tagging** to extract the values.
- For **categorical** and **binary** dialogue acts, whose values may not be present in the utterance, we treat them as **intents** of the utterance.
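The two treatments above can be sketched as follows. This is an illustrative toy example only: the label formats and helper names are assumptions for exposition, not the repository's actual dataset reader.

```python
# Illustrative sketch of the two label types described above
# (hypothetical helpers, not the repository's dataset_reader).

def bio_tags(tokens, span_acts):
    """Non-categorical acts: tag value spans in the utterance with B-/I- labels."""
    tags = ["O"] * len(tokens)
    for domain, intent, slot, value in span_acts:
        value_toks = value.split()
        for i in range(len(tokens) - len(value_toks) + 1):
            if tokens[i:i + len(value_toks)] == value_toks:
                label = f"{domain}-{intent}+{slot}"
                tags[i] = "B-" + label
                for j in range(i + 1, i + len(value_toks)):
                    tags[j] = "I-" + label
                break
    return tags

def intent_labels(cat_acts):
    """Categorical/binary acts: treat each act as an utterance-level intent."""
    return [f"{d}-{i}+{s}*{v}" for d, i, s, v in cat_acts]

tokens = "i need a cheap hotel".split()
print(bio_tags(tokens, [("hotel", "inform", "pricerange", "cheap")]))
print(intent_labels([("hotel", "request", "area", "?")]))
```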

Take MultiWOZ 2.1 (unified format) as an example:
```bash
$ python train.py unified_datasets/configs/multiwoz21_user_context3.jsonnet -s serialization_dir
$ python evaluate.py serialization_dir/model.tar.gz test --cuda-device {CUDA_DEVICE} --output_file output/multiwoz21_user/output.json

# generate output/multiwoz21_user/predictions.json, which merges test data and model predictions
$ python unified_datasets/merge_predict_res.py -d multiwoz21 -s user -p output/multiwoz21_user/output.json
```

Note that the config file is different from the one above. You should set:

- `"use_unified_datasets": true` in `dataset_reader` and `model`
- `"dataset_name": "multiwoz21"` in `dataset_reader`
- `"train_data_path": "train"`
- `"validation_data_path": "validation"`
- `"test_data_path": "test"`
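Put together, the relevant fragment of such a config looks roughly like this (a hypothetical sketch showing only the keys listed above; see the actual files under `unified_datasets/configs/` for the full configuration):

```jsonnet
// Hypothetical fragment; the real config contains many more fields
// (model architecture, trainer settings, embeddings, etc.).
{
  "dataset_reader": {
    "use_unified_datasets": true,
    "dataset_name": "multiwoz21",
  },
  "model": {
    "use_unified_datasets": true,
  },
  "train_data_path": "train",
  "validation_data_path": "validation",
  "test_data_path": "test",
}
```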

## Predict

See `nlu.py` under the `multiwoz` and `unified_datasets` directories.

## Performance on unified format datasets

To illustrate that it is easy to use the model for any dataset that is in our unified format, we report its performance on several such datasets. We follow `README.md` and the config files in `unified_datasets/` to generate `predictions.json`, then evaluate it using `../evaluate_unified_datasets.py`. Note that we use almost the same hyper-parameters for different datasets, which may not be optimal.

<table>
<thead>
<tr>
<th></th>
<th colspan=2>MultiWOZ 2.1</th>
<th colspan=2>Taskmaster-1</th>
<th colspan=2>Taskmaster-2</th>
<th colspan=2>Taskmaster-3</th>
</tr>
<tr>
<th>Model</th>
<th>Acc</th><th>F1</th>
<th>Acc</th><th>F1</th>
<th>Acc</th><th>F1</th>
<th>Acc</th><th>F1</th>
</tr>
</thead>
<tbody>
<tr>
<td>MILU</td>
<td>72.9</td><td>85.2</td>
<td>72.9</td><td>49.2</td>
<td>79.1</td><td>68.7</td>
<td>85.4</td><td>80.3</td>
</tr>
<tr>
<td>MILU (context=3)</td>
<td>76.6</td><td>87.9</td>
<td>72.4</td><td>48.5</td>
<td>78.9</td><td>68.4</td>
<td>85.1</td><td>80.1</td>
</tr>
</tbody>
</table>

- Acc: whether all dialogue acts of an utterance are correctly predicted
- F1: F1 measure of the dialogue act predictions over the corpus
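The two metrics can be sketched as follows. This is a minimal illustration of the definitions above, not the actual `../evaluate_unified_datasets.py` implementation; the (domain, intent, slot, value) tuple format is an assumption.

```python
# Minimal sketch of the metrics above: each utterance's dialogue acts
# are a collection of (domain, intent, slot, value) tuples.

def acc_and_f1(golds, preds):
    correct_utts = 0          # utterances whose acts are all correct
    tp = fp = fn = 0          # corpus-level act counts for F1
    for gold, pred in zip(golds, preds):
        gold, pred = set(gold), set(pred)
        if gold == pred:
            correct_utts += 1
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    acc = correct_utts / len(golds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

golds = [[("hotel", "inform", "area", "north")], [("taxi", "request", "phone", "?")]]
preds = [[("hotel", "inform", "area", "north")], []]
print(acc_and_f1(golds, preds))
```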

## References

```
@inproceedings{lee2019convlab,
  title={ConvLab: Multi-Domain End-to-End Dialog System Platform},
  author={Lee, Sungjin and Zhu, Qi and Takanobu, Ryuichi and Li, Xiang and Zhang, Yaoqin and Zhang, Zheng and Li, Jinchao and Peng, Baolin and Li, Xiujun and Huang, Minlie and Gao, Jianfeng},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year={2019}
}
```