PhD Dissertation Project: User-Controlled Content Translation on Social Media
Project description: This is my self-developed Ph.D. dissertation project. It aims to give users more control over, and transparency into, the translation of social media posts in order to increase their trust and engagement. My research advisor is Dr. Bart Knijnenburg, an associate professor in the Department of Human-Centered Computing at Clemson University. My dissertation focuses on empowering bi/multilingual post-authors with greater control over the automated machine translation (MT) of their social media posts, demonstrating my ability to design and execute end-to-end research projects. Using a mixed-methods approach (surveys, user experiments, usability testing, and semi-structured interviews), I generate actionable sociotechnical insights through qualitative and quantitative data analysis using R, with findings presented at leading conferences such as ACL, IUI, and CSCW. My research argues that granting post-authors an intermediate level of control over MT enhances their sense of security and allows for more authentic self-expression online. Without that control, they are compelled to alter how they communicate to prevent misinterpretation caused by inaccurate automated MT.
Overall findings from all studies:
Authors of social media posts want more control over how their posts are translated.
They want an affordance to preview the translation before publishing and to decide, for each post, whether to share that translation with their audiences.
Compared to fully automated and fully manual translation, post-authors prefer an intermediate level of translation control. This means automated translation combined with an affordance to edit it when the translation fails to convey the intended meaning and sentiment of the original post.
Research methods used: Survey, Usability Testing, Semi-structured Interview
Data analysis methods used:
Quantitative: Chi-square test, t-test, ANOVA, Multiple indicators, multiple causes (MIMIC) modeling, Structural equation modeling (SEM), Confirmatory factor analysis (CFA), Correlation, Regression analysis
Qualitative: Thematic analysis
Tools used: RStudio, Mplus, Adobe XD, Figma, Qualtrics, Zoom speech-to-text transcription
Dissemination of Mis-, Dis-, and Malinformation in Politics and Extremism
Project description: Under the guidance of Dr. Bart Knijnenburg, Dr. Amira Jadoon, Dr. Ayse Lokmanoglu, and Dr. Arie Perliger, I am working as a Research Assistant (RA) on a National Institute of Justice-funded project, a collaboration between UMass Lowell and Clemson University’s departments of Human-Centered Computing, Political Science, and Communication.
This project uses an original event-level dataset that documents, in real time, the characteristics of political events that trigger disinformation. It aims to identify how these characteristics affect the nature, scope, and features of extremism-related misinformation, disinformation, and malinformation (MDM).
Using social media data scraping methods, we have been collecting real-time data and designing a dashboard to visualize the real-time dissemination of MDM in politics.
We use computational linguistic tools for content and sentiment analysis to examine how the characteristics of disseminated disinformation relate to the structure and characteristics of its dissemination pathways.
Research methods used: Social media data scraping, Qualitative coding
Data analysis methods used: Qualitative coding of news articles, content and sentiment analysis, topic analysis
Tools used: RStudio, Shiny, Linguistic Inquiry and Word Count (LIWC)
Increasing the Digital Presence of Nepali Language through Wiktionary and Dictionary Development using HPCC Systems and NLP++
Project description: I proposed and led this summer internship project, which aims to strengthen the presence of the Nepali language in the digital world by developing a standard Word Entry Template for Nepali Wiktionary that captures comprehensive linguistic information. To support this initiative, I created an open-source website, hpccnepalidict, to recruit participants for my Nepali NLP Initiative who add words to Nepali Wiktionary following the template.
I also developed the NeWiktionary Analyzer with Visual Text and NLP++ to parse Nepali wiki text. The analyzer was used to build a Nepali dictionary with High-Performance Computing Cluster (HPCC) Systems and the NLP++ engine. Throughout the project, I worked under the mentorship of David de Hilster, the co-founder and developer of NLP++.
I won the Best Poster Award in Data Analytics for presenting this work at the 2023 HPCC Systems Community Summit.
Research area: Data analytics and natural language processing
Programming language used: NLP++
Tools used: Visual Text, HPCC Systems
Please visit my website to learn more about this project.
Development of Design Solutions for an Inclusive Autonomous Vehicle
Project description: For this course project, under the advisement of Dr. Julian Brinkley, I conducted focus group interviews and transcribed the audio to understand the design requirements and needs of older adults, people with low vision, and people with physical impairments.
Using Adobe XD, I designed and developed prototypes for three products: a ride-scheduling app, an inclusive and accessible human-machine interface (HMI), and the interior design of an autonomous vehicle.
Research methods used: Focus group, Affinity diagramming (Card sorting), Persona, Storyboard, Usability testing
Data analysis methods used:
Qualitative: Thematic analysis, Card sorting
Investigating Design Solutions to Overcome Harms in Online Cross-cultural Communication
Project description: I am working as a Research Assistant (RA) on this collaborative project with Brigham Young University, under the guidance of Dr. Xinru Page, Dr. Bart Knijnenburg, and Dr. Nancy Fulda. The project aims to identify conflicts that arise in cross-cultural online communication and to develop more inclusive system design recommendations.
My responsibilities include conducting an exploratory study, supervising undergraduate RAs, contributing research design ideas, helping analyze the interview data, and co-writing the paper.
Research methods used: Interview, Prompt engineering
Data analysis method used: Thematic analysis
Tools used: Otter.ai, MS-Excel, ChatGPT (web and Playground)
Publication: paper 1
Detection and Evaluation of Mis- and Disinformation in Social Media Posts and their Translation: Crowdsourcing Workers vs. GPT-3.5
Project description: This course project investigates how readers can misconstrue translations of social media posts that have double or controversial meanings. Under the guidance of Dr. Carlos Toxtli-Hernández, I collected a dataset of social media posts and their translations (for Spanish, Chinese, Nepali, and Hindi) from open-source datasets on Hugging Face and conducted a user study to detect and categorize misconstrued information or misinformation. The study compares how accurately crowdsourcing workers and a large language model (GPT-3.5) detect biases, toxicity, and misinformation in the text.
Overall findings:
Translated social media posts are at risk of spreading misinformation through mistranslation. Annotation/classification differed between crowdsourcing workers and GPT-3.5 for Chinese, Nepali, and Spanish, but not for Hindi.
The spread of misinformation is largely attributable to mistranslation, since the two annotator groups differed only minimally in how they annotated/classified the original posts.
Research methods used: Between-subjects experiment, Prompt engineering
Data analysis methods used:
Quantitative: ANOVA, pairwise comparisons (t-tests)
Investigating Factors Enhancing Retention and Graduation Rates of Black Students in Computer Science
Project description: This project compares Black students at Clemson University with those at Historically Black Colleges and Universities (HBCUs), including Morehouse College, Claflin University, and Howard University. My research advisors are Dr. Bart Knijnenburg, Dr. Cazembe Kennedy, Dr. Eileen Kramer, Dr. Kinnis Gosha, and Dr. Gloria Washington.
On this side project, I work as a Ph.D. research assistant with the goal of identifying subjective factors that significantly affect the retention and dropout rates of Black students in Computer Science programs, and of developing scales to measure them.
My role involves contributing to research design, conducting interviews with study participants (both students and professors), assisting with data analysis, and supporting the paper write-up process.
Research methods used: Survey, User experiment, Semi-structured interview, Content questions (programming assessments)
Data analysis methods used:
Quantitative: t-test, regression, ANOVA, CFA, EFA, SEM
Qualitative: Thematic analysis