ArSarcasm-v2 is an extension to the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analysis, which is a part of WANLP 2021.

iabufarha/ArSarcasm-v2

ArSarcasm-v2 Dataset

ArSarcasm-v2 is an extension of the original ArSarcasm dataset, which was published along with the paper From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset. ArSarcasm-v2 consists of ArSarcasm along with portions of the DAICT corpus and some new tweets. Each tweet was annotated for sarcasm, sentiment and dialect. The final dataset consists of 15,548 tweets, divided into 12,548 training tweets and 3,000 testing tweets. ArSarcasm-v2 was used and released as part of the shared task on sarcasm detection and sentiment analysis in Arabic. You can find more details in the Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic.

Dataset details:

ArSarcasm-v2 is provided in CSV format, with the same split that was used for the shared task. The training set contains 12,548 tweets and the test set contains 3,000 tweets.

The dataset contains the following fields:

  • tweet: the original tweet text.
  • sarcasm: boolean that indicates whether a tweet is sarcastic or not.
  • sentiment: the sentiment of the tweet (positive, negative, neutral).
  • dialect: the dialect used in the tweet. We used the 5 main regions in the Arab world. The labels and their meanings are:
    • msa: modern standard Arabic.
    • egypt: the dialect of Egypt and Sudan.
    • levant: the Levantine dialect including Palestine, Jordan, Syria and Lebanon.
    • gulf: the Gulf countries including Saudi Arabia, UAE, Qatar, Bahrain, Yemen, Oman, Iraq and Kuwait.
    • magreb: the North African Arab countries including Algeria, Libya, Tunisia and Morocco.
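
As a rough illustration of working with these fields, the sketch below reads rows with Python's standard csv module and counts labels. It makes several assumptions that the text above does not confirm: that the CSV header uses exactly the field names listed, that the sarcasm column is stored as text such as True/False, and that the label strings match the descriptions above (the actual files may use different encodings). The sample rows are made-up placeholders, not tweets from the dataset.

```python
import csv
import io
from collections import Counter

# Column names taken from the field list above (assumed to match the CSV header).
FIELDS = ("tweet", "sarcasm", "sentiment", "dialect")


def parse_bool(value):
    """Interpret common textual encodings of a boolean (an assumption)."""
    return value.strip().lower() in {"true", "1", "yes"}


def read_rows(handle):
    """Read rows from an open CSV file object into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = [name for name in FIELDS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        rows.append({
            "tweet": record["tweet"],
            "sarcasm": parse_bool(record["sarcasm"]),
            "sentiment": record["sentiment"].strip(),
            "dialect": record["dialect"].strip(),
        })
    return rows


def label_counts(rows, field):
    """Count how often each value of the given field occurs."""
    return Counter(row[field] for row in rows)


# Placeholder rows for illustration only; these are not from ArSarcasm-v2.
SAMPLE = """tweet,sarcasm,sentiment,dialect
placeholder text one,True,negative,egypt
placeholder text two,False,neutral,msa
"placeholder, with a comma",False,positive,gulf
"""

rows = read_rows(io.StringIO(SAMPLE))
print(label_counts(rows, "dialect"))
print(sum(row["sarcasm"] for row in rows), "sarcastic rows")
```

To run this against the released files, pass a file opened with encoding="utf-8" and newline="" to read_rows in place of the in-memory sample, and adjust parse_bool if the sarcasm column turns out to be encoded differently.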

Citation

Please use the following citation if you use ArSarcasm-v2:

@inproceedings{abufarha-etal-2021-arsarcasm-v2,
    title = "Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic",
    author = "Abu Farha, Ibrahim  and
    Zaghouani, Wajdi  and
    Magdy, Walid",
    booktitle = "Proceedings of the Sixth Arabic Natural Language Processing Workshop",
    month = april,
    year = "2021",
}

Other resources

If you are interested in other Arabic NLP resources check:
