Added dataset information in clinic oos dataset card (#4751)
* Added Dataset information in Clinic oos card

* Added Field and Instance Information

* Added Label List in Data Fields

* Updated Table Caption

* Update datasets/clinc_oos/README.md

* Update README.md

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
arnav-ladkat and lhoestq committed Jul 28, 2022
1 parent a5e05cc commit f9713d2
Showing 1 changed file with 203 additions and 10 deletions.
213 changes: 203 additions & 10 deletions datasets/clinc_oos/README.md
@@ -21,7 +21,7 @@ paperswithcode_id: clinc150
pretty_name: CLINC150
---

# Dataset Card for [Dataset Name]
# Dataset Card for CLINC150

## Table of Contents
- [Dataset Description](#dataset-description)
@@ -52,34 +52,210 @@ pretty_name: CLINC150
- **Homepage:** [Github](https://github.com/clinc/oos-eval/)
- **Repository:** [Github](https://github.com/clinc/oos-eval/)
- **Paper:** [Aclweb](https://www.aclweb.org/anthology/D19-1131)
- **Leaderboard:**
- **Leaderboard:** [PapersWithCode](https://paperswithcode.com/sota/text-classification-on-clinc-oos)
- **Point of Contact:**

### Dataset Summary

[More Information Needed]
Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope (OOS), i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. It offers a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.

### Supported Tasks and Leaderboards

[More Information Needed]
- `intent-classification`: This dataset is for evaluating the performance of intent classification systems in the presence of "out-of-scope" queries, i.e., queries that do not fall into any of the system-supported intent classes. The dataset includes both in-scope and out-of-scope data. A leaderboard is available [here](https://paperswithcode.com/sota/text-classification-on-clinc-oos).
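
Because the task mixes in-scope and out-of-scope queries, it can help to score the two groups separately. The sketch below is one possible way to do that, not a metric prescribed by this card: the function names `in_scope_accuracy` and `oos_recall` are hypothetical, and the only value taken from the card is the `oos` label id (42) from the label table.

```python
from typing import Sequence

# Id of the `oos` class, taken from the label table in this card.
OOS_LABEL = 42


def in_scope_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      oos_label: int = OOS_LABEL) -> float:
    """Accuracy over the examples whose gold label is an in-scope intent."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != oos_label]
    if not pairs:
        return 0.0
    return sum(t == p for t, p in pairs) / len(pairs)


def oos_recall(y_true: Sequence[int], y_pred: Sequence[int],
               oos_label: int = OOS_LABEL) -> float:
    """Fraction of gold out-of-scope examples that were predicted as `oos`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == oos_label]
    if not preds:
        return 0.0
    return sum(p == oos_label for p in preds) / len(preds)


# Toy labels for illustration only (not real model output).
y_true = [108, 42, 5, 42]
y_pred = [108, 5, 5, 42]
print(in_scope_accuracy(y_true, y_pred))
print(oos_recall(y_true, y_pred))
```

Reporting the two numbers side by side makes it visible when a model does well on supported intents but rarely flags unsupported queries.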

### Languages

[More Information Needed]
English

## Dataset Structure

### Data Instances

[More Information Needed]
A sample from the training set is provided below:
```
{
'text' : 'can you walk me through setting up direct deposits to my bank of internet savings account',
'label' : 108
}
```

### Data Fields

[More Information Needed]
- `text`: The text of the user query.
- `label`: The id of the query's intent class. There are 150 in-scope intent classes over 10 domains, plus one `oos` label (id 42) for out-of-scope queries.

The table below maps each label id to its label name:

| **Label Id** | **Label name** |
|--- |--- |
| 0 | restaurant_reviews |
| 1 | nutrition_info |
| 2 | account_blocked |
| 3 | oil_change_how |
| 4 | time |
| 5 | weather |
| 6 | redeem_rewards |
| 7 | interest_rate |
| 8 | gas_type |
| 9 | accept_reservations |
| 10 | smart_home |
| 11 | user_name |
| 12 | report_lost_card |
| 13 | repeat |
| 14 | whisper_mode |
| 15 | what_are_your_hobbies |
| 16 | order |
| 17 | jump_start |
| 18 | schedule_meeting |
| 19 | meeting_schedule |
| 20 | freeze_account |
| 21 | what_song |
| 22 | meaning_of_life |
| 23 | restaurant_reservation |
| 24 | traffic |
| 25 | make_call |
| 26 | text |
| 27 | bill_balance |
| 28 | improve_credit_score |
| 29 | change_language |
| 30 | no |
| 31 | measurement_conversion |
| 32 | timer |
| 33 | flip_coin |
| 34 | do_you_have_pets |
| 35 | balance |
| 36 | tell_joke |
| 37 | last_maintenance |
| 38 | exchange_rate |
| 39 | uber |
| 40 | car_rental |
| 41 | credit_limit |
| 42 | oos |
| 43 | shopping_list |
| 44 | expiration_date |
| 45 | routing |
| 46 | meal_suggestion |
| 47 | tire_change |
| 48 | todo_list |
| 49 | card_declined |
| 50 | rewards_balance |
| 51 | change_accent |
| 52 | vaccines |
| 53 | reminder_update |
| 54 | food_last |
| 55 | change_ai_name |
| 56 | bill_due |
| 57 | who_do_you_work_for |
| 58 | share_location |
| 59 | international_visa |
| 60 | calendar |
| 61 | translate |
| 62 | carry_on |
| 63 | book_flight |
| 64 | insurance_change |
| 65 | todo_list_update |
| 66 | timezone |
| 67 | cancel_reservation |
| 68 | transactions |
| 69 | credit_score |
| 70 | report_fraud |
| 71 | spending_history |
| 72 | directions |
| 73 | spelling |
| 74 | insurance |
| 75 | what_is_your_name |
| 76 | reminder |
| 77 | where_are_you_from |
| 78 | distance |
| 79 | payday |
| 80 | flight_status |
| 81 | find_phone |
| 82 | greeting |
| 83 | alarm |
| 84 | order_status |
| 85 | confirm_reservation |
| 86 | cook_time |
| 87 | damaged_card |
| 88 | reset_settings |
| 89 | pin_change |
| 90 | replacement_card_duration |
| 91 | new_card |
| 92 | roll_dice |
| 93 | income |
| 94 | taxes |
| 95 | date |
| 96 | who_made_you |
| 97 | pto_request |
| 98 | tire_pressure |
| 99 | how_old_are_you |
| 100 | rollover_401k |
| 101 | pto_request_status |
| 102 | how_busy |
| 103 | application_status |
| 104 | recipe |
| 105 | calendar_update |
| 106 | play_music |
| 107 | yes |
| 108 | direct_deposit |
| 109 | credit_limit_change |
| 110 | gas |
| 111 | pay_bill |
| 112 | ingredients_list |
| 113 | lost_luggage |
| 114 | goodbye |
| 115 | what_can_i_ask_you |
| 116 | book_hotel |
| 117 | are_you_a_bot |
| 118 | next_song |
| 119 | change_speed |
| 120 | plug_type |
| 121 | maybe |
| 122 | w2 |
| 123 | oil_change_when |
| 124 | thank_you |
| 125 | shopping_list_update |
| 126 | pto_balance |
| 127 | order_checks |
| 128 | travel_alert |
| 129 | fun_fact |
| 130 | sync_device |
| 131 | schedule_maintenance |
| 132 | apr |
| 133 | transfer |
| 134 | ingredient_substitution |
| 135 | calories |
| 136 | current_location |
| 137 | international_fees |
| 138 | calculator |
| 139 | definition |
| 140 | next_holiday |
| 141 | update_playlist |
| 142 | mpg |
| 143 | min_payment |
| 144 | change_user_name |
| 145 | restaurant_suggestion |
| 146 | travel_notification |
| 147 | cancel |
| 148 | pto_used |
| 149 | travel_suggestion |
| 150 | change_volume |
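
As a small illustration of how the table is used, the sketch below decodes the sample instance shown under Data Instances. The dictionary holds only a few rows copied from the table above, and the names `LABEL_NAMES_EXCERPT` and `decode_label` are hypothetical helpers, not part of any library.

```python
# A few rows copied from the label table above; the full map has ids 0 to 150.
LABEL_NAMES_EXCERPT = {
    4: "time",
    5: "weather",
    42: "oos",
    108: "direct_deposit",
}


def decode_label(example: dict) -> str:
    """Return the label name for an example, or a placeholder if the id
    is not in this excerpt."""
    return LABEL_NAMES_EXCERPT.get(example["label"], "<not in excerpt>")


# The sample instance from the Data Instances section.
sample = {
    "text": "can you walk me through setting up direct deposits "
            "to my bank of internet savings account",
    "label": 108,
}
print(decode_label(sample))  # direct_deposit
```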

### Data Splits

[More Information Needed]
The dataset comes in different subsets:

- `small`: Small, in which there are only 50 training queries per in-scope intent.
- `imbalanced`: Imbalanced, in which intents have either 25, 50, 75, or 100 training queries.
- `plus`: OOS+, in which there are 250 out-of-scope training examples, rather than 100.


| name |train|validation|test|
|----------|----:|---------:|---:|
|small|7600| 3100| 5500 |
|imbalanced|10625| 3100| 5500|
|plus|15250| 3100| 5500|
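
The training sizes for `small` and `plus` can be cross-checked against the subset descriptions with simple arithmetic. The sketch below rests on two assumptions that the card does not state outright: that `small` carries 100 out-of-scope training examples (implied by "rather than 100"), and that `plus` uses 100 training queries per in-scope intent. The helper name `train_size` is hypothetical. The `imbalanced` size cannot be derived this way without the per-intent counts.

```python
# 150 in-scope intent classes, as stated in the dataset summary.
IN_SCOPE_INTENTS = 150


def train_size(queries_per_intent: int, oos_examples: int) -> int:
    """Training set size for a subset with a fixed number of queries per
    in-scope intent plus a block of out-of-scope examples."""
    return IN_SCOPE_INTENTS * queries_per_intent + oos_examples


# Assumption: `small` has 100 out-of-scope training examples.
small_train = train_size(queries_per_intent=50, oos_examples=100)
# Assumption: `plus` has 100 training queries per in-scope intent.
plus_train = train_size(queries_per_intent=100, oos_examples=250)

print(small_train)  # 7600, matching the table
print(plus_train)   # 15250, matching the table
```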



## Dataset Creation

@@ -136,8 +312,25 @@ pretty_name: CLINC150
[More Information Needed]

### Citation Information

[More Information Needed]
```
@inproceedings{larson-etal-2019-evaluation,
title = "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction",
author = "Larson, Stefan and
Mahendran, Anish and
Peper, Joseph J. and
Clarke, Christopher and
Lee, Andrew and
Hill, Parker and
Kummerfeld, Jonathan K. and
Leach, Kevin and
Laurenzano, Michael A. and
Tang, Lingjia and
Mars, Jason",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
year = "2019",
url = "https://www.aclweb.org/anthology/D19-1131"
}
```
### Contributions

Thanks to [@sumanthd17](https://github.com/sumanthd17) for adding this dataset.

1 comment on commit f9713d2

@github-actions

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.007079 / 0.011353 (-0.004274) 0.003481 / 0.011008 (-0.007527) 0.025981 / 0.038508 (-0.012527) 0.031103 / 0.023109 (0.007993) 0.254417 / 0.275898 (-0.021481) 0.282627 / 0.323480 (-0.040853) 0.005377 / 0.007986 (-0.002609) 0.003204 / 0.004328 (-0.001124) 0.006193 / 0.004250 (0.001942) 0.039441 / 0.037052 (0.002388) 0.267889 / 0.258489 (0.009400) 0.341669 / 0.293841 (0.047829) 0.028096 / 0.128546 (-0.100451) 0.008553 / 0.075646 (-0.067093) 0.225820 / 0.419271 (-0.193451) 0.046121 / 0.043533 (0.002588) 0.254204 / 0.255139 (-0.000935) 0.280524 / 0.283200 (-0.002676) 0.092854 / 0.141683 (-0.048828) 1.301856 / 1.452155 (-0.150298) 1.316046 / 1.492716 (-0.176670)

Benchmark: benchmark_getitem_100B.json

| metric | get_batch_of_1024_random_rows | get_batch_of_1024_rows | get_first_row | get_last_row |
|--- |--- |--- |--- |--- |
| new / old (diff) | 0.203151 / 0.018006 (0.185144) | 0.438669 / 0.000490 (0.438179) | 0.003931 / 0.000200 (0.003731) | 0.000090 / 0.000054 (0.000036) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|--- |--- |--- |--- |--- |--- |
| new / old (diff) | 0.021521 / 0.037411 (-0.015890) | 0.103441 / 0.014526 (0.088915) | 0.116173 / 0.176557 (-0.060383) | 0.161732 / 0.737135 (-0.575403) | 0.119049 / 0.296338 (-0.177289) |

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.345406 / 0.215209 (0.130197) 3.423301 / 2.077655 (1.345646) 1.523012 / 1.504120 (0.018893) 1.367770 / 1.541195 (-0.173425) 1.424278 / 1.468490 (-0.044212) 0.369232 / 4.584777 (-4.215545) 3.659528 / 3.745712 (-0.086184) 3.662915 / 5.269862 (-1.606947) 1.703164 / 4.565676 (-2.862512) 0.044423 / 0.424275 (-0.379852) 0.009763 / 0.007607 (0.002156) 0.444572 / 0.226044 (0.218527) 4.441955 / 2.268929 (2.173027) 1.969356 / 55.444624 (-53.475268) 1.669566 / 6.876477 (-5.206911) 1.770349 / 2.142072 (-0.371724) 0.471907 / 4.805227 (-4.333321) 0.103199 / 6.500664 (-6.397465) 0.053595 / 0.075469 (-0.021875)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 1.255611 / 1.841788 (-0.586177) 12.249878 / 8.074308 (4.175570) 21.327202 / 10.191392 (11.135810) 0.766209 / 0.680424 (0.085785) 0.459470 / 0.534201 (-0.074731) 0.335021 / 0.579283 (-0.244263) 0.384062 / 0.434364 (-0.050302) 0.230051 / 0.540337 (-0.310287) 0.233563 / 1.386936 (-1.153373)
PyArrow==latest
Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.005902 / 0.011353 (-0.005451) 0.003877 / 0.011008 (-0.007131) 0.027701 / 0.038508 (-0.010808) 0.033050 / 0.023109 (0.009941) 0.349699 / 0.275898 (0.073800) 0.395346 / 0.323480 (0.071867) 0.003769 / 0.007986 (-0.004217) 0.003403 / 0.004328 (-0.000925) 0.004831 / 0.004250 (0.000580) 0.045996 / 0.037052 (0.008943) 0.352041 / 0.258489 (0.093552) 0.382231 / 0.293841 (0.088390) 0.029525 / 0.128546 (-0.099021) 0.009727 / 0.075646 (-0.065920) 0.256374 / 0.419271 (-0.162897) 0.066076 / 0.043533 (0.022543) 0.337850 / 0.255139 (0.082711) 0.353552 / 0.283200 (0.070352) 0.110031 / 0.141683 (-0.031652) 1.469796 / 1.452155 (0.017642) 1.520691 / 1.492716 (0.027975)

Benchmark: benchmark_getitem_100B.json

| metric | get_batch_of_1024_random_rows | get_batch_of_1024_rows | get_first_row | get_last_row |
|--- |--- |--- |--- |--- |
| new / old (diff) | 0.211121 / 0.018006 (0.193114) | 0.424274 / 0.000490 (0.423784) | 0.002438 / 0.000200 (0.002238) | 0.000077 / 0.000054 (0.000022) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|--- |--- |--- |--- |--- |--- |
| new / old (diff) | 0.026224 / 0.037411 (-0.011188) | 0.102798 / 0.014526 (0.088272) | 0.117694 / 0.176557 (-0.058862) | 0.165746 / 0.737135 (-0.571389) | 0.118521 / 0.296338 (-0.177818) |

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.368403 / 0.215209 (0.153193) 3.664989 / 2.077655 (1.587335) 1.753298 / 1.504120 (0.249179) 1.586382 / 1.541195 (0.045188) 1.627442 / 1.468490 (0.158952) 0.384275 / 4.584777 (-4.200502) 3.734263 / 3.745712 (-0.011449) 3.458968 / 5.269862 (-1.810894) 1.725303 / 4.565676 (-2.840374) 0.045992 / 0.424275 (-0.378283) 0.009792 / 0.007607 (0.002185) 0.458320 / 0.226044 (0.232276) 4.564440 / 2.268929 (2.295512) 2.161659 / 55.444624 (-53.282965) 1.909303 / 6.876477 (-4.967174) 1.986064 / 2.142072 (-0.156008) 0.478200 / 4.805227 (-4.327027) 0.108194 / 6.500664 (-6.392470) 0.055708 / 0.075469 (-0.019761)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 1.462467 / 1.841788 (-0.379320) 12.332602 / 8.074308 (4.258294) 21.883418 / 10.191392 (11.692026) 0.770698 / 0.680424 (0.090274) 0.545194 / 0.534201 (0.010993) 0.388052 / 0.579283 (-0.191231) 0.385532 / 0.434364 (-0.048832) 0.234752 / 0.540337 (-0.305586) 0.237340 / 1.386936 (-1.149596)
