The top 100 emojis across all categories were retrieved from the report published by the Unicode Consortium [1]. These are the 100 most frequently used emojis of 2021. The top 100 emojis are given consecutively as follows:

A database of Turkish tweets, named Bitirim’s Turkish Tweets Database (B-TTDb), was constructed for predicting the 100 most frequently used emojis. B-TTDb was created for academic and industrial studies on the prediction of the top 100 emojis in Turkish. The database is composed of four datasets: Raw Tweets Dataset (RTD), Organized Tweets Dataset (OTD), Pre-Processed Tweets Dataset (PPTD), and Bitirim’s Dataset (B-D). Each dataset was formed from the previous one: OTD from RTD, PPTD from OTD, and B-D from PPTD. B-D is therefore the fourth and final version, and it includes a total of 158,201 unique tweets belonging to the top 100 emoji classes.
B-TTDb and the top 100 emojis can be downloaded from the following links.
In addition, a suffix list, lexicon, and stop words list were created in the database creation process. They can be downloaded from the following links.
- Click to download the suffix list. (The 1,337,898 word forms and their stems in the Zargan lexical database for Turkish [2] were considered and a pre-list of suffixes was created. Afterward, the suffixes that also have a meaning other than being suffixes (e.g., “deniz” (sea)) were filtered out. Finally, a suffix list with 1,164 suffixes was obtained.)
- Click to download the lexicon. (The 1,337,898 word forms and their stems in the Zargan lexical database for Turkish [2] were considered and a lexicon was created with 1,213,847 word forms and their stems (lemmas).)
- Click to download the stop words list. (The stop words listed on three Web resources [3, 4, 5] were considered and a pre-list of stop words was created. “birşey” and “herşey” are misspellings and should be written as “bir şey” (one thing) and “her şey” (everything), respectively. However, the focus is on social media texts, where such mistakes are common. Therefore, although “bir” (one), “her” (every), and “şey” (thing) were already in the stop words list, “birşey” was kept and “herşey” was added to the list. Additionally, some words (e.g., “birinci” (first) and “ikinci” (second)) and some of the suffixes filtered out in the suffix list creation phase (i.e., “neydi” (what was it), “neymiş” (what is it), and “dün” (yesterday)) were added to the list. Finally, a list with 581 stop words was obtained.)
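The two list-building steps described above (filtering a suffix pre-list and merging several stop word lists) can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy inputs, and the idea of passing the "meaningful" words as a ready-made set are assumptions made only for illustration, and the real lists are far larger.

```python
def filter_suffixes(pre_list, meaningful_words):
    """Keep only candidates that are not also standalone words.

    `meaningful_words` is a hypothetical, hand-curated set; how the
    meaningful candidates were actually identified is not specified here.
    """
    return sorted(s for s in set(pre_list) if s not in meaningful_words)


def build_stop_words(sources, extra):
    """Union several stop word lists, then add hand-picked entries."""
    merged = set()
    for words in sources:
        # Strip whitespace and skip empty lines; case is left untouched
        # because generic lowercasing mishandles Turkish dotted/dotless i.
        merged.update(w.strip() for w in words if w.strip())
    merged.update(extra)
    return sorted(merged)


# Toy suffix pre-list: "deniz" and "dün" are also words, so they are dropped.
suffixes = filter_suffixes(["ler", "lar", "deniz", "dün"], {"deniz", "dün"})

# Toy stand-ins for the three Web resources, plus manually added entries.
sources = [["bir", "her", "birşey"], ["şey", "bir "], ["her", ""]]
extra = ["herşey", "birinci", "ikinci", "dün"]
stop_words = build_stop_words(sources, extra)
```

Using sets makes the merge idempotent, so a word that appears in more than one resource is kept only once.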
*************************************************************
For detailed information, you may read the following article.
Yıltan Bitirim. 2024. B-TTDb: A Database of Turkish Tweets for Predicting the Top One Hundred Emojis. ACM Transactions on the Web. Just Accepted. DOI: https://doi.org/10.1145/3681783
*************************************************************
You are free to use a part or the whole of the database, suffix list, lexicon, and/or stop words list. However, please consider the following.
- Copyright belongs to the author.
- Do not redistribute a part or the whole database, suffix list, lexicon, and/or stop words list.
- The database, suffix list, lexicon, and stop words list come without any warranty. The author is not responsible for any damage caused.
- All studies that include a part or the whole database, suffix list, lexicon, and/or stop words list should cite the following article:
Yıltan Bitirim. 2024. B-TTDb: A Database of Turkish Tweets for Predicting the Top One Hundred Emojis. ACM Transactions on the Web. Just Accepted. DOI: https://doi.org/10.1145/3681783
*************************************************************
[1] Jennifer Daniel. The most frequently used emoji of 2021. The Unicode Consortium. Accessed November 5, 2022 from https://home.unicode.org/emoji/emoji-frequency
[2] Haşim Sak, Tunga Güngör, and Murat Saraçlar. 2008. Turkish language resources: Morphological parser, morphological disambiguator and Web corpus. In Advances in Natural Language Processing. GoTAL 2008. Bengt Nordström and Aarne Ranta (Eds.) (Lecture Notes in Computer Science, Vol. 5221). Springer, Berlin, 417–427. DOI: https://doi.org/10.1007/978-3-540-85287-2_40
[3] Gene Diaz. stopwords-tr. Accessed April 23, 2023 from https://github.com/stopwords-iso/stopwords-tr/blob/master/stopwords-tr.txt
[4] Ahmet Aksoy. trstop. Accessed April 23, 2023 from https://github.com/ahmetax/trstop/blob/master/dosyalar/turkce-stop-words
[5] Genel konular - Türkçe etkisiz kelimeler (stop words) listesi 1.1. Accessed April 23, 2023 from https://www.turkceogretimi.com/genel-konular/turkce-etkisiz-kelimeler-stop-words-listesi-11