Title: | Discovers Emoji from Text |
---|---|
Description: | Unicode code points are unfriendly to work with, and not every code point is an Emoji, which makes gathering Emoji statistics difficult. This tool aims to make working with Emoji as smooth as possible by following the 'tidyverse' style. |
Authors: | Youzhi Yu [aut, cre] |
Maintainer: | Youzhi Yu <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1 |
Built: | 2025-03-11 03:45:53 UTC |
Source: | https://github.com/pursuitofdatascience/tidyemoji |
A data set containing each Emoji category (such as Activities) and its respective Unicode string, with the individual Unicodes separated by |.
category_unicode_crosswalk
A data frame with 10 rows and 2 columns:
Emoji category (10 categories only)
The Unicode string of the Emojis belonging to that category.
The raw data set emojis comes from the emoji package, and it is processed by the author for the specific needs of tidyEmoji.
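Because each category's Unicodes are joined by |, the string can double as a regular-expression alternation. The following is a minimal base R sketch of that idea, not the package's implementation: toy_crosswalk, its column names, and its contents are hypothetical stand-ins for the real 10-row data set.

```r
# Hypothetical two-row stand-in for category_unicode_crosswalk.
# Column names and contents are assumptions made for illustration only.
toy_crosswalk <- data.frame(
  category = c("Smileys & Emotion", "Flags"),
  unicodes = c("\U0001f600|\U0001f603|\U0001f637", "\U0001f3c1"),
  stringsAsFactors = FALSE
)

tweets <- c("I love tidyverse \U0001f600",
            "No Emoji here",
            "A flag \U0001f600\U0001f3c1")

# One logical column per category: TRUE when the text matches any
# Unicode in that category's "|"-separated alternation.
hits <- sapply(toy_crosswalk$unicodes,
               function(pattern) grepl(pattern, tweets, perl = TRUE))
colnames(hits) <- toy_crosswalk$category
hits
```

The last tweet matches both toy categories, which is the situation where a single row ends up with more than one category.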
Users can use emoji_categorize to see all the categories each Emoji Tweet has. The function preserves the input data structure; the only change is an extra column holding the Emoji categories, separated by | if there is more than one category.
emoji_categorize(tweet_tbl, tweet_text)
tweet_tbl | A dataframe/tibble containing tweets/text. |
tweet_text | The tweet/text column. |
A dataframe filtered to the rows that contain Emoji, with an extra column .emoji_category.
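Since several categories share one |-separated string, downstream counting usually needs one row per category. The sketch below shows one way to do that in base R. The categorized data frame is a hypothetical stand-in shaped like the described return value, not actual output from emoji_categorize.

```r
# Hypothetical data shaped like the described return value
# (text column plus a "|"-separated .emoji_category column).
categorized <- data.frame(
  tweets = c("Wearing a mask", "A flag"),
  .emoji_category = c("Smileys & Emotion", "Smileys & Emotion|Flags"),
  stringsAsFactors = FALSE
)

# Split on the literal "|" (fixed = TRUE avoids regex alternation).
parts <- strsplit(categorized$.emoji_category, "|", fixed = TRUE)

# Repeat each tweet once per category it belongs to.
long <- data.frame(
  tweets = rep(categorized$tweets, lengths(parts)),
  category = unlist(parts),
  stringsAsFactors = FALSE
)

# Number of tweets per category.
category_counts <- table(long$category)
category_counts
```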
library(dplyr)
library(tidyEmoji)

# Six sample tweets; rows 3 and 5 contain no Emoji
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                      "R is my language! \U0001f601\U0001f606\U0001f605",
                      "This Tweet does not have Emoji!",
                      "Wearing a mask\U0001f637\U0001f637\U0001f637.",
                      "Emoji does not appear in all Tweets",
                      "A flag \U0001f600\U0001f3c1")) %>%
  emoji_categorize(tweets)
This function adds an extra list column called .emoji_unicode to the original data, holding all the Emojis found in each row.
emoji_extract_nest(tweet_tbl, tweet_text)
tweet_tbl | A dataframe/tibble containing tweets/text. |
tweet_text | The tweet/text column. |
The original dataframe/tibble with an extra list column called .emoji_unicode.
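To make the idea of a list column concrete, here is a base R sketch that stores every match of a crude Emoji pattern per row. It is an illustration under stated assumptions, not how the package works: emoji_pattern is a hypothetical code-point range that covers the sample Emojis but not every Emoji.

```r
# Hypothetical, deliberately crude pattern: a single code-point range.
# Real Emoji detection needs a fuller list of Unicodes.
emoji_pattern <- "[\U0001F300-\U0001FAFF]"

tweets_df <- data.frame(
  tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
             "This Tweet does not have Emoji!"),
  stringsAsFactors = FALSE
)

# regmatches() returns a list with one character vector per row,
# which can be stored directly as a list column.
tweets_df$.emoji_unicode <- regmatches(
  tweets_df$tweets,
  gregexpr(emoji_pattern, tweets_df$tweets, perl = TRUE)
)

# Number of Emojis held in each list element.
emoji_per_row <- lengths(tweets_df$.emoji_unicode)
emoji_per_row
```

Rows without Emoji keep their place in the data and simply hold an empty character vector.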
library(dplyr)
library(tidyEmoji)

# Six sample tweets; rows 3 and 5 contain no Emoji
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                      "R is my language! \U0001f601\U0001f606\U0001f605",
                      "This Tweet does not have Emoji!",
                      "Wearing a mask\U0001f637\U0001f637\U0001f637.",
                      "Emoji does not appear in all Tweets",
                      "A flag \U0001f600\U0001f3c1")) %>%
  emoji_extract_nest(tweets)
If users would like to know how many Emojis and what kinds of Emojis each Tweet has, emoji_extract_unnest outputs a global summary with the row number of each Tweet containing Emoji and the Unicodes associated with that Tweet.
emoji_extract_unnest(tweet_tbl, tweet_text)
tweet_tbl | A dataframe/tibble containing tweets/text. |
tweet_text | The tweet/text column. |
A summary tibble with the original row number and Emoji count.
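A summary of this shape (row number, Emoji, count) can be sketched in base R as follows. This is only an approximation of the idea: emoji_pattern is a hypothetical single code-point range, and the column names emoji and n are assumptions rather than the package's own.

```r
# Hypothetical, deliberately crude pattern covering the sample Emojis.
emoji_pattern <- "[\U0001F300-\U0001FAFF]"

tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

# One character vector of matches per tweet.
matches <- regmatches(tweets, gregexpr(emoji_pattern, tweets, perl = TRUE))

# Unnest: one row per Emoji occurrence, tagged with its original row.
long <- data.frame(
  row_number = rep(seq_along(tweets), lengths(matches)),
  emoji = unlist(matches),
  n = 1L,
  stringsAsFactors = FALSE
)

# Count occurrences of each Emoji within each row.
counts <- aggregate(n ~ row_number + emoji, data = long, FUN = sum)
counts <- counts[order(counts$row_number), ]
counts
```

Rows 3 and 5 never appear in counts because they have no matches to unnest.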
library(dplyr)
library(tidyEmoji)

# Six sample tweets; rows 3 and 5 contain no Emoji
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                      "R is my language! \U0001f601\U0001f606\U0001f605",
                      "This Tweet does not have Emoji!",
                      "Wearing a mask\U0001f637\U0001f637\U0001f637.",
                      "Emoji does not appear in all Tweets",
                      "A flag \U0001f600\U0001f3c1")) %>%
  emoji_extract_unnest(tweets)
With a Twitter dataframe/tibble at hand, it is useful to know how many Tweets contain Emojis, and emoji_summary answers that question. A Tweet is counted once whether it has one Emoji or ten. The function returns a tibble that summarizes the number of Tweets containing at least one Emoji and the total number of Tweets in the dataframe/tibble.
emoji_summary(tweet_tbl, tweet_text)
tweet_tbl | A dataframe/tibble containing tweets/text. |
tweet_text | The tweet/text column. |
A summary tibble including # of Tweets in total and # of Tweets that have at least one Emoji.
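The counting rule (each Tweet counts once, however many Emojis it holds) can be sketched in base R with a logical test per row. The pattern and the column names emoji_tweets and total_tweets are assumptions made for this illustration, not confirmed details of the package.

```r
# Hypothetical, deliberately crude pattern covering the sample Emojis.
emoji_pattern <- "[\U0001F300-\U0001FAFF]"

tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

# grepl() yields one TRUE/FALSE per tweet, so a tweet with three
# Emojis contributes exactly one TRUE.
has_emoji <- grepl(emoji_pattern, tweets, perl = TRUE)

summary_tbl <- data.frame(
  emoji_tweets = sum(has_emoji),
  total_tweets = length(tweets)
)
summary_tbl
```

Keeping the rows where has_emoji is TRUE (for example tweets[has_emoji]) gives the filtering behaviour of a function that returns only rows with at least one Emoji.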
library(dplyr)
library(tidyEmoji)

# Six sample tweets; rows 3 and 5 contain no Emoji
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                      "R is my language! \U0001f601\U0001f606\U0001f605",
                      "This Tweet does not have Emoji!",
                      "Wearing a mask\U0001f637\U0001f637\U0001f637.",
                      "Emoji does not appear in all Tweets",
                      "A flag \U0001f600\U0001f3c1")) %>%
  emoji_summary(tweets)
When users just want to focus on Tweets containing Emoji(s), emoji_tweets filters out non-Emoji rows and returns only the rows that have at least one Emoji.
emoji_tweets(tweet_tbl, tweet_text)
tweet_tbl | A dataframe/tibble containing tweets/text. |
tweet_text | The tweet/text column. |
A dataframe/tibble containing only text with at least one Emoji
library(dplyr)
library(tidyEmoji)

# Six sample tweets; rows 3 and 5 contain no Emoji
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                      "R is my language! \U0001f601\U0001f606\U0001f605",
                      "This Tweet does not have Emoji!",
                      "Wearing a mask\U0001f637\U0001f637\U0001f637.",
                      "Emoji does not appear in all Tweets",
                      "A flag \U0001f600\U0001f3c1")) %>%
  emoji_tweets(tweets)
A data set containing each Emoji name (such as grinning, smile), its respective Unicode and category. One thing to note here is there are duplicated Unicodes in the data set, because one Unicode could have multiple Emoji names.
emoji_unicode_crosswalk
A data frame with 4536 rows and 3 columns:
The name of Emoji per se.
The Unicode of Emoji.
The category Emoji falls into.
The raw data sets (emoji_name and emojis) come from the emoji package, and they are processed by the author for the specific needs of tidyEmoji.
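The effect of duplicated Unicodes shows up when counts are joined to names: a Unicode with two names yields two rows. The base R sketch below illustrates this with toy data. toy_names, counts, and all column names are hypothetical, and the merge is only one plausible way to attach names, not the package's documented method.

```r
# Hypothetical name crosswalk in which one Unicode carries two names.
toy_names <- data.frame(
  emoji_name = c("+1", "thumbsup", "grinning"),
  unicode = c("\U0001f44d", "\U0001f44d", "\U0001f600"),
  stringsAsFactors = FALSE
)

# Hypothetical per-Unicode counts from some corpus.
counts <- data.frame(
  unicode = c("\U0001f44d", "\U0001f600"),
  n = c(5L, 2L),
  stringsAsFactors = FALSE
)

# Joining on the full crosswalk repeats the count for every name.
with_dupes <- merge(counts, toy_names, by = "unicode")

# Keeping only the first name per Unicode gives one row per Unicode.
first_names <- toy_names[!duplicated(toy_names$unicode), ]
no_dupes <- merge(counts, first_names, by = "unicode")

nrow(with_dupes)
nrow(no_dupes)
```

Summing n over with_dupes would overstate the total, which is why deduplicating before any aggregation matters.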
When working with Tweets, it is useful to count how many times each Emoji appears in the entire Tweet corpus. top_n_emojis does this, making it easy to see how Emojis are distributed across the corpus. If a Tweet has 10 Emojis, top_n_emojis counts all 10 and assigns each one to its respective Emoji category. Note that the Unicodes returned by top_n_emojis can contain duplicates, because some Unicodes have several Emoji names. By default this does not happen, but users can set duplicated_unicode = 'yes' to obtain duplicated Unicodes.
top_n_emojis(tweet_tbl, tweet_text, n = 20, duplicated_unicode = "no")
tweet_tbl | A dataframe/tibble containing tweets/text. |
tweet_text | The tweet/text column. |
n | The number of top Emojis to return. Default is 20. |
duplicated_unicode | "no" to return no repeated Unicodes, "yes" to include them. Default is "no". |
A tibble with the top n Emojis.
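The corpus-wide counting described for top_n_emojis, where every occurrence counts, can be approximated in base R by pooling all matches and tabulating them. As before, emoji_pattern is a hypothetical single code-point range, and this sketch leaves out the names and categories that the package attaches.

```r
# Hypothetical, deliberately crude pattern covering the sample Emojis.
emoji_pattern <- "[\U0001F300-\U0001FAFF]"

tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

# Pool every Emoji occurrence across the whole corpus.
all_emojis <- unlist(
  regmatches(tweets, gregexpr(emoji_pattern, tweets, perl = TRUE))
)

# Tabulate, sort by frequency, and keep the two most frequent.
emoji_counts <- sort(table(all_emojis), decreasing = TRUE)
top_2 <- head(emoji_counts, 2)
top_2
```

In this sample two Emojis tie for second place, so which of them is kept depends on the sort order; a real analysis may want an explicit tie-breaking rule.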
library(dplyr)
library(tidyEmoji)

# Six sample tweets; rows 3 and 5 contain no Emoji
data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                      "R is my language! \U0001f601\U0001f606\U0001f605",
                      "This Tweet does not have Emoji!",
                      "Wearing a mask\U0001f637\U0001f637\U0001f637.",
                      "Emoji does not appear in all Tweets",
                      "A flag \U0001f600\U0001f3c1")) %>%
  top_n_emojis(tweets, n = 2)