This repository contains the code and analysis for a research project aimed at enhancing social media impact for food safety security organizations. The project focuses on understanding text tone preferences based on user demographics using machine learning techniques.

esoto15/text-tone-preference-analysis

Enhancing Social Media Impact: Leveraging User Demographics for Text Tone Preference Analysis

Abstract

This study investigates the correlation between user demographics and text tone preferences for social media content aimed at addressing food insecurity among Hispanic households in the United States. Using k-means and hierarchical clustering machine learning models, we analyze preliminary data collected from a research group comprising mentors and students who tested the survey framework. By developing various text tone connotations, we aim to uncover patterns in post preferences among the Hispanic community. Our findings will assist food pantries in crafting more engaging and effective social media content.

Dependencies

Keywords

Food Insecurity, Text Tone Preferences, K-Means, Hierarchical Clustering, Unsupervised Learning

Data Preprocessing Pipeline

| age | gender | ethnicity | race | education | marital_status | income | employment | language | disability | states | sample_1 | sample_2 | sample_3 | sample_4 | sample_5 | sample_6 | sample_7 | sample_8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 45-54 | female | non hispanic | native american | High School | na | $25,000 - $49,999 | Employed Part time | both | i do not have a disability | indiana | Persuasive | Simpler | Empathetic | Persuasive | Original | Original | Persuasive | Original |
| 18-24 | male | hispanic | white | High School | single | Less than $25,000 | Employed Part time | english | i do not have a disability | illinois | Original | Simpler | Empathetic | Simpler | Simpler | Original | Original | Persuasive |
| 25-34 | female | non hispanic | multiracial | Associate | single | Less than $25,000 | Student | english | i do not have a disability | new York | Original | Original | Simpler | Simpler | Empathetic | Empathetic | Empathetic | Simpler |

Figure 1.1 Initial dataset

Combining Post Preferences and Attribute Selection

To simplify data analysis and model training, the melt function was used to combine the individual post choices from the eight columns 'sample_1' to 'sample_8' into a single 'choice' column. This reduced the number of columns while preserving the key demographic attributes: age, gender, ethnicity, education, income, employment status, and disability. Each row in the reshaped dataset represents an individual submission, with the 'choice' column indicating the preferred post option. This layout makes it possible to examine individual user preferences alongside demographic characteristics.

| age | gender | ethnicity | education | income | employment | disability | choice |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 45-54 | female | non hispanic | High School | $25,000 - $49,999 | Employed Part time | i do not have a disability | Persuasive |
| 18-24 | male | hispanic | High School | Less than $25,000 | Employed Part time | i do not have a disability | Original |
| 25-34 | female | non hispanic | Associate | Less than $25,000 | Student | i do not have a disability | Original |
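
The reshaping step described above can be sketched as follows, assuming the melt function in question is pandas `DataFrame.melt`. The miniature `raw` frame, the variable names, and the use of only two of the eight sample columns are illustrative assumptions, not the project's actual code.

```python
import pandas as pd

# Hypothetical miniature of the survey data: three respondents and only
# two of the eight sample_* columns, to keep the example short.
raw = pd.DataFrame({
    "age": ["45-54", "18-24", "25-34"],
    "gender": ["female", "male", "female"],
    "ethnicity": ["non hispanic", "hispanic", "non hispanic"],
    "race": ["native american", "white", "multiracial"],
    "education": ["High School", "High School", "Associate"],
    "income": ["$25,000 - $49,999", "Less than $25,000", "Less than $25,000"],
    "employment": ["Employed Part time", "Employed Part time", "Student"],
    "disability": ["i do not have a disability"] * 3,
    "sample_1": ["Persuasive", "Original", "Original"],
    "sample_2": ["Simpler", "Simpler", "Original"],
})

# Demographic attributes kept after attribute selection.
demographics = ["age", "gender", "ethnicity", "education",
                "income", "employment", "disability"]
sample_cols = [c for c in raw.columns if c.startswith("sample_")]

# melt stacks the sample_* columns into one 'choice' column; columns that
# are in neither id_vars nor value_vars (here 'race') are dropped.
long_df = raw.melt(id_vars=demographics, value_vars=sample_cols,
                   var_name="sample", value_name="choice")
long_df = long_df.drop(columns="sample")

print(long_df.shape)
```

Note that melting yields one row per respondent and sample post, so the row count grows by the number of sample columns while the column count shrinks.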

Encoding Categories

Dataset 1: Ethnicity numerical encoding

| Age Category | Encoded Value (Age) | Income Category | Encoded Value (Income) | Disability Category | Encoded Value (Disability) | Ethnicity Category | Encoded Value (Ethnicity) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 18-24 | 0 | Less than 25000 | 0 | No | 0 | Hispanic | 1 |
| 25-34 | 1 | 25000 - 49999 | 1 | Yes | 1 | Non-Hispanic | 0 |
| 35-44 | 2 | 50000 - 74999 | 2 | Prefer not to say | -1 | Prefer not to say | -1 |
| 45-54 | 3 | 75000 - 99999 | 3 | - | - | - | - |
| 55-64 | 4 | 100000 - 149999 | 4 | - | - | - | - |
| 65 and above | 5 | 150000 or more | 5 | - | - | - | - |
| Prefer not to say | -1 | Prefer not to say | -1 | - | - | - | - |

| age | ethnicity | income | disability | gender_female | gender_male | gender_non binary | education_Associate | education_Bachelor | education_Doctorate | ... | employment_Employed Full time | employment_Employed Part time | employment_Retired | employment_Self employed | employment_Student | employment_Unemployed | choice_Empathetic | choice_Original | choice_Persuasive | choice_Simplier |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 0 | 1 | 0 | True | False | False | False | False | False | ... | False | True | False | False | False | False | False | True | False | False |
| 0 | 1 | 0 | 0 | False | True | False | False | False | False | ... | False | True | False | False | False | False | False | True | False | False |
| 1 | 0 | 0 | 0 | True | False | False | True | False | False | ... | False | False | False | False | True | False | False | True | False | False |

Dataset 2: label encoding

| age | income | gender_female | gender_male | gender_non binary | ethnicity_hispanic | ethnicity_non hispanic | education_Associate | education_Bachelor | education_Doctorate | ... | employment_Retired | employment_Self employed | employment_Student | employment_Unemployed | disability_i do not have a disability | disability_undisclosed | choice_Empathetic | choice_Original | choice_Persuasive | choice_Simplier |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 1 | True | False | False | False | True | False | False | False | ... | False | False | False | False | True | False | False | False | True | False |
| 0 | 0 | False | True | False | True | False | False | False | False | ... | False | False | False | False | True | False | False | True | False | False |
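
The two encoded datasets above could be produced along these lines. This is a minimal sketch, not the project's code: it assumes pandas, takes the ordinal codes from the encoding table, and uses a hypothetical `encode` helper with an `ethnicity_as_number` switch to select between the Dataset 1 and Dataset 2 layouts. The raw spellings of the ethnicity and disability answers, and the treatment of any other disability answer as "Yes", are assumptions.

```python
import pandas as pd

# Ordinal codes taken from the encoding table; -1 marks "Prefer not to say".
AGE_CODES = {"18-24": 0, "25-34": 1, "35-44": 2, "45-54": 3,
             "55-64": 4, "65 and above": 5, "Prefer not to say": -1}
INCOME_CODES = {"Less than 25000": 0, "25000 - 49999": 1, "50000 - 74999": 2,
                "75000 - 99999": 3, "100000 - 149999": 4, "150000 or more": 5,
                "Prefer not to say": -1}
# Assumed raw spellings, following the lowercase values in Figure 1.1.
ETHNICITY_CODES = {"hispanic": 1, "non hispanic": 0, "Prefer not to say": -1}


def encode_disability(value: str) -> int:
    # Assumption: only these two raw labels are visible in the tables;
    # any other answer is treated as "Yes".
    if value == "i do not have a disability":
        return 0
    if value == "undisclosed":
        return -1
    return 1


def encode(df: pd.DataFrame, ethnicity_as_number: bool = True) -> pd.DataFrame:
    """Hypothetical helper: ordinal codes for ordered fields, one-hot for the rest."""
    out = df.copy()
    out["age"] = out["age"].map(AGE_CODES)
    # Strip "$" and "," so raw answers match the keys of INCOME_CODES.
    income = out["income"].str.replace(r"[$,]", "", regex=True)
    out["income"] = income.map(INCOME_CODES)
    if ethnicity_as_number:  # Dataset 1 layout
        out["ethnicity"] = out["ethnicity"].map(ETHNICITY_CODES)
        out["disability"] = out["disability"].map(encode_disability)
        one_hot = ["gender", "education", "employment", "choice"]
    else:  # Dataset 2 layout
        one_hot = ["gender", "ethnicity", "education",
                   "employment", "disability", "choice"]
    return pd.get_dummies(out, columns=one_hot)


# Three melted rows copied from the table in the previous section.
long_df = pd.DataFrame({
    "age": ["45-54", "18-24", "25-34"],
    "gender": ["female", "male", "female"],
    "ethnicity": ["non hispanic", "hispanic", "non hispanic"],
    "education": ["High School", "High School", "Associate"],
    "income": ["$25,000 - $49,999", "Less than $25,000", "Less than $25,000"],
    "employment": ["Employed Part time", "Employed Part time", "Student"],
    "disability": ["i do not have a disability"] * 3,
    "choice": ["Persuasive", "Original", "Original"],
})

dataset_1 = encode(long_df, ethnicity_as_number=True)
dataset_2 = encode(long_df, ethnicity_as_number=False)
print(dataset_1.columns.tolist())
```

`pd.get_dummies` only creates columns for categories present in the input, so this three-row sample produces fewer one-hot columns than the full survey data would.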

PCA Implementation
