zhuqi committed on
Commit
b6500b0
·
1 Parent(s): fe2f639

Create README.md

  1. README.md +98 -0
README.md ADDED
@@ -0,0 +1,98 @@
1
+ # MILU
2
+ MILU is a joint neural model that simultaneously predicts multiple dialog act items, where each item takes the form domain-intent(slot, value). Because an utterance in a multi-domain setting commonly carries several dialog act items, MILU is likely to yield higher performance than conventional single-intent models.
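To make the domain-intent(slot, value) form concrete, here is a minimal sketch of how one utterance can carry several dialog act items. The `DialogActItem` class, the example utterance, and its labels are hypothetical illustrations, not part of the MILU code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogActItem:
    """Hypothetical container for one dialog act item."""
    domain: str
    intent: str
    slot: str
    value: str

    def __str__(self) -> str:
        # Render in the domain-intent(slot, value) form used in the text.
        return f"{self.domain}-{self.intent}({self.slot}, {self.value})"


# Illustrative utterance with two items from two different domains.
utterance = "I need a cheap hotel and a train to Cambridge."
acts = [
    DialogActItem("Hotel", "Inform", "Price", "cheap"),
    DialogActItem("Train", "Inform", "Dest", "Cambridge"),
]

rendered = [str(act) for act in acts]
print(rendered)
```

A single-intent model would have to pick one label for this utterance, whereas a joint model can output both items.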
3
+
4
+
5
+ ## Example usage
6
+ Our implementation is based on the [AllenNLP library](https://github.com/allenai/allennlp). For an introduction to the library, see [these tutorials](https://allennlp.org/tutorials).
7
+
8
+ To use this model, you additionally need to install `overrides==4.1.2` and `allennlp==0.9.0`, and use `python>=3.6,<=3.8`.
9
+
10
+ ### On MultiWOZ dataset
11
+
12
+ ```bash
13
+ $ python train.py multiwoz/configs/[base|context3].jsonnet -s serialization_dir
14
+ $ python evaluate.py serialization_dir/model.tar.gz {test_file} --cuda-device {CUDA_DEVICE}
15
+ ```
16
+
17
+ If you want to perform end-to-end evaluation, you can include the trained model by adding the model path (serialization_dir/model.tar.gz) to your ConvLab spec file.
18
+
19
+ #### Data
20
+ We use the MultiWOZ data (`data/multiwoz/[train|val|test].json.zip`).
21
+
22
+ ### MILU on datasets in unified format
23
+ We support training MILU on datasets that are in our unified format.
24
+
25
+ - For **non-categorical** dialogue acts whose values are in the utterances, we use **slot tagging** to extract the values.
26
+ - For **categorical** and **binary** dialogue acts whose values may not be present in the utterances, we treat them as **intents** of the utterances.
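The split described in the two bullets above can be sketched roughly as follows. This is a simplified illustration, not the project's dataset reader: it assumes whitespace tokenization, a `dialogue_acts` dictionary keyed by `"categorical"`, `"non-categorical"`, and `"binary"`, and label strings whose format is made up here. The function name `split_targets` is hypothetical.

```python
from typing import Dict, List, Tuple


def split_targets(tokens: List[str],
                  dialogue_acts: Dict[str, List[dict]]) -> Tuple[List[str], List[str]]:
    """Return (BIO slot tags, intent labels) for one utterance."""
    tags = ["O"] * len(tokens)
    lowered = [tok.lower() for tok in tokens]

    # Non-categorical acts: the value appears in the utterance, so tag its span.
    for act in dialogue_acts.get("non-categorical", []):
        value = act["value"].lower().split()
        if not value:
            continue
        label = f'{act["intent"]}-{act["domain"]}-{act["slot"]}'
        for i in range(len(lowered) - len(value) + 1):
            if lowered[i:i + len(value)] == value:
                tags[i] = "B-" + label
                for j in range(i + 1, i + len(value)):
                    tags[j] = "I-" + label
                break  # tag only the first match in this sketch

    # Categorical and binary acts: the value may be absent, so use intent labels.
    intents = []
    for act in dialogue_acts.get("categorical", []):
        intents.append(f'{act["intent"]}-{act["domain"]}-{act["slot"]}-{act["value"]}')
    for act in dialogue_acts.get("binary", []):
        intents.append(f'{act["intent"]}-{act["domain"]}-{act["slot"]}')
    return tags, intents


tokens = "does acorn guest house have parking and what is the phone number".split()
example_acts = {
    "non-categorical": [
        {"intent": "inform", "domain": "hotel", "slot": "name", "value": "acorn guest house"}
    ],
    "categorical": [
        {"intent": "inform", "domain": "hotel", "slot": "parking", "value": "yes"}
    ],
    "binary": [{"intent": "request", "domain": "hotel", "slot": "phone"}],
}
tags, intents = split_targets(tokens, example_acts)
print(list(zip(tokens, tags)))
print(intents)
```

The model then predicts the tag sequence and the set of intent labels jointly, and the two outputs are merged back into dialogue acts.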
27
+
28
+ Taking MultiWOZ 2.1 (unified format) as an example:
29
+ ```bash
30
+ $ python train.py unified_datasets/configs/multiwoz21_user_context3.jsonnet -s serialization_dir
31
+ $ python evaluate.py serialization_dir/model.tar.gz test --cuda-device {CUDA_DEVICE} --output_file output/multiwoz21_user/output.json
32
+
33
+ # Generate output/multiwoz21_user/predictions.json, which merges the test data with the model predictions.
34
+ $ python unified_datasets/merge_predict_res.py -d multiwoz21 -s user -p output/multiwoz21_user/output.json
35
+ ```
36
+ Note that the config file differs from the one used for the original MultiWOZ data above. You should set:
37
+ - `"use_unified_datasets": true` in `dataset_reader` and `model`
38
+ - `"dataset_name": "multiwoz21"` in `dataset_reader`
39
+ - `"train_data_path": "train"`
40
+ - `"validation_data_path": "validation"`
41
+ - `"test_data_path": "test"`
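As a rough illustration, the settings listed above correspond to the following nested structure, written here as a Python dictionary and printed as JSON. This is only the fragment named in the list. A real config also needs the remaining reader, model, and trainer fields, and the placement of the three path keys at the top level is an assumption based on the usual AllenNLP config layout.

```python
import json

# Only the keys named in the list above; every other field is omitted.
config_overrides = {
    "dataset_reader": {
        "use_unified_datasets": True,
        "dataset_name": "multiwoz21",
    },
    "model": {
        "use_unified_datasets": True,
    },
    # Assumed to be top-level keys, as in typical AllenNLP configs.
    "train_data_path": "train",
    "validation_data_path": "validation",
    "test_data_path": "test",
}

print(json.dumps(config_overrides, indent=2))
```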
42
+
43
+ ## Predict
44
+ See `nlu.py` under the `multiwoz` and `unified_datasets` directories.
45
+
46
+ ## Performance on unified format datasets
47
+
48
+ To illustrate that it is easy to use the model for any dataset that is in our unified format, we report the performance on several such datasets. We follow `README.md` and the config files in `unified_datasets/` to generate `predictions.json`, then evaluate it using `../evaluate_unified_datasets.py`. Note that we use almost the same hyper-parameters for different datasets, which may not be optimal.
49
+
50
+ <table>
51
+ <thead>
52
+ <tr>
53
+ <th></th>
54
+ <th colspan=2>MultiWOZ 2.1</th>
55
+ <th colspan=2>Taskmaster-1</th>
56
+ <th colspan=2>Taskmaster-2</th>
57
+ <th colspan=2>Taskmaster-3</th>
58
+ </tr>
59
+ </thead>
60
+ <thead>
61
+ <tr>
62
+ <th>Model</th>
63
+ <th>Acc</th><th>F1</th>
64
+ <th>Acc</th><th>F1</th>
65
+ <th>Acc</th><th>F1</th>
66
+ <th>Acc</th><th>F1</th>
67
+ </tr>
68
+ </thead>
69
+ <tbody>
70
+ <tr>
71
+ <td>MILU</td>
72
+ <td>72.9</td><td>85.2</td>
73
+ <td>72.9</td><td>49.2</td>
74
+ <td>79.1</td><td>68.7</td>
75
+ <td>85.4</td><td>80.3</td>
76
+ </tr>
77
+ <tr>
78
+ <td>MILU (context=3)</td>
79
+ <td>76.6</td><td>87.9</td>
80
+ <td>72.4</td><td>48.5</td>
81
+ <td>78.9</td><td>68.4</td>
82
+ <td>85.1</td><td>80.1</td>
83
+ </tr>
84
+ </tbody>
85
+ </table>
86
+
87
+ - Acc: the percentage of utterances for which all dialogue acts are correctly predicted.
88
+ - F1: F1 measure of the dialogue act predictions over the corpus.
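The two metrics can be sketched as follows. This is an illustrative reimplementation, not the evaluation script itself: it assumes each dialogue act is a hashable tuple, that Acc is an exact match of the full act set per utterance, and that F1 is micro-averaged over the corpus. The function name `evaluate_acts` is hypothetical.

```python
from typing import List, Set, Tuple

Act = Tuple[str, str, str, str]  # (intent, domain, slot, value), an assumed layout


def evaluate_acts(golds: List[Set[Act]], preds: List[Set[Act]]) -> Tuple[float, float]:
    """Return (utterance-level accuracy, corpus-level micro F1)."""
    tp = fp = fn = exact = 0
    for gold, pred in zip(golds, preds):
        tp += len(gold & pred)   # acts predicted and correct
        fp += len(pred - gold)   # acts predicted but wrong
        fn += len(gold - pred)   # acts missed
        exact += int(gold == pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    acc = exact / len(golds) if golds else 0.0
    return acc, f1


a = ("inform", "hotel", "price", "cheap")
b = ("inform", "hotel", "area", "north")
c = ("request", "hotel", "phone", "")

golds = [{a, b}, {c}]
preds = [{a}, {c}]
acc, f1 = evaluate_acts(golds, preds)
print(acc, f1)
```

In this toy case the first utterance misses one act, so only one of two utterances is an exact match, while the corpus-level F1 still credits the two correctly predicted acts.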
89
+
90
+ ## References
91
+ ```
92
+ @inproceedings{lee2019convlab,
93
+ title={ConvLab: Multi-Domain End-to-End Dialog System Platform},
94
+ author={Lee, Sungjin and Zhu, Qi and Takanobu, Ryuichi and Li, Xiang and Zhang, Yaoqin and Zhang, Zheng and Li, Jinchao and Peng, Baolin and Li, Xiujun and Huang, Minlie and Gao, Jianfeng},
95
+ booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
96
+ year={2019}
97
+ }
98
+ ```