This study investigates the relationship between user demographics and text tone preferences for social media content that addresses food insecurity among Hispanic households in the United States. Using two unsupervised learning methods, k-means and hierarchical clustering, we analyze preliminary data collected from a research group of mentors and students who tested the survey framework. By presenting each post in several tone variants (Original, Simpler, Empathetic, and Persuasive), we aim to uncover patterns in post preferences among the Hispanic community. The findings are intended to help food pantries craft more engaging and effective social media content.
Food Insecurity, Text Tone Preferences, K-Means, Hierarchical Clustering, Unsupervised Learning
age | gender | ethnicity | race | education | marital_status | income | employment | language | disability | states | sample_1 | sample_2 | sample_3 | sample_4 | sample_5 | sample_6 | sample_7 | sample_8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
45-54 | female | non hispanic | native american | High School | na | $25,000 - $49,999 | Employed Part time | both | i do not have a disability | indiana | Persuasive | Simpler | Empathetic | Persuasive | Original | Original | Persuasive | Original |
18-24 | male | hispanic | white | High School | single | Less than $25,000 | Employed Part time | english | i do not have a disability | illinois | Original | Simpler | Empathetic | Simpler | Simpler | Original | Original | Persuasive |
25-34 | female | non hispanic | multiracial | Associate | single | Less than $25,000 | Student | english | i do not have a disability | new York | Original | Original | Simpler | Simpler | Empathetic | Empathetic | Empathetic | Simpler |
Figure 1.1 Initial dataset
To simplify data analysis and model training, the melt function was used to combine the individual post choices from the eight columns 'sample_1' to 'sample_8' into a single 'choice' column. This reshapes the dataset from wide to long format, reducing the number of columns while preserving the key demographic attributes: age, gender, ethnicity, education, income, employment status, and disability. Each row of the reshaped dataset represents one post choice made by one respondent, with the 'choice' column indicating the preferred tone for that post. This layout makes it possible to examine individual preferences alongside demographic characteristics.
age | gender | ethnicity | education | income | employment | disability | choice |
---|---|---|---|---|---|---|---|
45-54 | female | non hispanic | High School | $25,000 - $49,999 | Employed Part time | i do not have a disability | Persuasive |
18-24 | male | hispanic | High School | Less than $25,000 | Employed Part time | i do not have a disability | Original |
25-34 | female | non hispanic | Associate | Less than $25,000 | Student | i do not have a disability | Original |
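
The wide-to-long reshaping described above can be sketched as follows. This is a minimal illustration, assuming the melt function in question is pandas `DataFrame.melt`; the toy frame, the variable names (`wide`, `long_df`, `id_cols`, `sample_cols`), and the intermediate `sample` column are hypothetical and keep only three of the eight sample columns for brevity.

```python
import pandas as pd

# Toy wide-format frame mirroring two rows of the initial dataset.
# Only a subset of the demographic and sample columns is included.
wide = pd.DataFrame({
    "age": ["45-54", "18-24"],
    "gender": ["female", "male"],
    "ethnicity": ["non hispanic", "hispanic"],
    "sample_1": ["Persuasive", "Original"],
    "sample_2": ["Simpler", "Simpler"],
    "sample_3": ["Empathetic", "Empathetic"],
})

id_cols = ["age", "gender", "ethnicity"]
sample_cols = [c for c in wide.columns if c.startswith("sample_")]

# Stack every sample_* column into a single 'choice' column.
# Demographic values are repeated once per sample column.
long_df = wide.melt(
    id_vars=id_cols,
    value_vars=sample_cols,
    var_name="sample",      # hypothetical helper column naming the source column
    value_name="choice",
)

# Drop the helper column so only demographics and the choice remain.
long_df = long_df.drop(columns="sample")
print(long_df)
```

With two respondents and three sample columns, the result has six rows, one per respondent and post, which is why the long format has fewer columns but more rows than the original.
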
Age Category | Encoded Value (Age) | Income Category | Encoded Value (Income) | Disability Category | Encoded Value (Disability) | Ethnicity Category | Encoded Value (Ethnicity) |
---|---|---|---|---|---|---|---|
18-24 | 0 | Less than 25000 | 0 | No | 0 | Hispanic | 1 |
25-34 | 1 | 25000 - 49999 | 1 | Yes | 1 | Non-Hispanic | 0 |
35-44 | 2 | 50000 - 74999 | 2 | Prefer not to say | -1 | Prefer not to say | -1 |
45-54 | 3 | 75000 - 99999 | 3 | - | - | - | - |
55-64 | 4 | 100000 - 149999 | 4 | - | - | - | - |
65 and above | 5 | 150000 or more | 5 | - | - | - | - |
Prefer not to say | -1 | Prefer not to say | -1 | - | - | - | - |
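
The mapping in the table above could be applied along the following lines. This is a sketch under stated assumptions, not the authors' code: it assumes pandas, assumes that raw income labels differ from the table labels only by dollar signs and thousands separators, and assumes lowercase raw ethnicity labels as they appear in the initial dataset. The dictionary names are hypothetical. Disability is omitted because the raw label corresponding to "Yes" is not shown in the source.

```python
import pandas as pd

# Ordinal codes taken from the encoding table; -1 marks "Prefer not to say".
AGE_CODES = {
    "18-24": 0, "25-34": 1, "35-44": 2, "45-54": 3,
    "55-64": 4, "65 and above": 5, "Prefer not to say": -1,
}
INCOME_CODES = {
    "Less than 25000": 0, "25000 - 49999": 1, "50000 - 74999": 2,
    "75000 - 99999": 3, "100000 - 149999": 4, "150000 or more": 5,
    "Prefer not to say": -1,
}
# Assumed raw spellings (lowercase, no hyphen), as in the initial dataset.
ETHNICITY_CODES = {"hispanic": 1, "non hispanic": 0, "prefer not to say": -1}

df = pd.DataFrame({
    "age": ["45-54", "18-24", "25-34"],
    "income": ["$25,000 - $49,999", "Less than $25,000", "Less than $25,000"],
    "ethnicity": ["non hispanic", "hispanic", "non hispanic"],
})

# Strip "$" and "," so raw income labels match the table's category labels.
income_clean = df["income"].str.replace(r"[$,]", "", regex=True)

encoded = pd.DataFrame({
    "age": df["age"].map(AGE_CODES),
    "income": income_clean.map(INCOME_CODES),
    "ethnicity": df["ethnicity"].map(ETHNICITY_CODES),
})
print(encoded)
```

Any label missing from a dictionary becomes `NaN` under `Series.map`, which makes unexpected survey responses easy to spot before clustering.
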
age | ethnicity | income | disability | gender_female | gender_male | gender_non binary | education_Associate | education_Bachelor | education_Doctorate | ... | employment_Employed Full time | employment_Employed Part time | employment_Retired | employment_Self employed | employment_Student | employment_Unemployed | choice_Empathetic | choice_Original | choice_Persuasive | choice_Simpler |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 0 | 1 | 0 | True | False | False | False | False | False | ... | False | True | False | False | False | False | False | True | False | False |
0 | 1 | 0 | 0 | False | True | False | False | False | False | ... | False | True | False | False | False | False | False | True | False | False |
1 | 0 | 0 | 0 | True | False | False | True | False | False | ... | False | False | False | False | True | False | False | True | False | False |
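
Boolean indicator columns such as `gender_female` or `choice_Original` are what one-hot encoding of the nominal columns would produce. The sketch below assumes pandas `get_dummies` was the tool used, which the source does not state; the toy frame and variable names are hypothetical.

```python
import pandas as pd

# Toy frame: age and income are already ordinally encoded,
# the remaining columns are nominal and get one indicator column per category.
df = pd.DataFrame({
    "age": [3, 0, 1],
    "income": [1, 0, 0],
    "gender": ["female", "male", "female"],
    "education": ["High School", "High School", "Associate"],
    "employment": ["Employed Part time", "Employed Part time", "Student"],
    "choice": ["Persuasive", "Original", "Original"],
})

nominal_cols = ["gender", "education", "employment", "choice"]

# dtype=bool makes the indicator columns True/False regardless of pandas version.
encoded = pd.get_dummies(df, columns=nominal_cols, dtype=bool)
print(encoded.columns.tolist())
```

Only categories present in the data get a column, so a toy frame yields fewer columns than the full dataset. Adding 'ethnicity' and 'disability' to `nominal_cols` instead of encoding them ordinally would one-hot encode those attributes as well.
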
age | income | gender_female | gender_male | gender_non binary | ethnicity_hispanic | ethnicity_non hispanic | education_Associate | education_Bachelor | education_Doctorate | ... | employment_Retired | employment_Self employed | employment_Student | employment_Unemployed | disability_i do not have a disability | disability_undisclosed | choice_Empathetic | choice_Original | choice_Persuasive | choice_Simpler |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | True | False | False | False | True | False | False | False | ... | False | False | False | False | True | False | False | False | True | False |
0 | 0 | False | True | False | True | False | False | False | False | ... | False | False | False | False | True | False | False | True | False | False |