Students Research Tobacco User Behaviors Using Social Media Data and AI Technologies Through Data Science Collaboration

For data scientists working in human health fields, there is great potential in using social media as a data source, particularly when it comes to individual and group perceptions. Student researchers are being afforded that opportunity through a collaborative program between the UR CTSI and the Goergen Institute for Data Science (GIDS).

Jinxi He, a junior in the computer science program, learned at a University research fair about a student-led effort to discern public perception of synthetic nicotine products on Twitter/X. His interest piqued, he reached out to Dongmei Li, PhD, to learn more about the project and ultimately joined it. Li is a faculty member appointed to UR CTSI and affiliated with the Goergen Institute for Data Science. She worked with Zidian Xie, PhD, research data engineer II for the UR CTSI Informatics branch, to recruit students to analyze social media data using natural language processing (NLP) techniques, cutting-edge deep learning, and large language models (LLMs), with the goal of better understanding public perception and use of tobacco products and other substances.

“We analyzed a large number of tweets via NLP techniques, revealing a higher proportion of negative attitudes toward synthetic nicotine products compared to positive ones,” He said of his work with fellow student-researcher Jiarui Chen. “Key concerns in negative tweets included addiction and health risks and synthetic nicotine as a policy loophole. Positive tweets often highlighted it as an alternative to tobacco-derived nicotine and its potentially reduced health risks.”

He branched out to work on a related project, analyzing Reddit posts from users who described their smoking cessation experiences. His initial results, obtained with BERTopic (a deep-learning model) and latent Dirichlet allocation modeling, found commonalities among user recommendations, including reading self-help books, using exercise and music as distractions, setting a quit date, and substituting drinking coffee for smoking. These strategies could be incorporated into smoking cessation programs and campaigns.

“Engaging deeply in analyzing public perceptions and exploring scientific methods to aid smoking cessation has broadened my understanding of both the technical aspects of NLP techniques and the complex social implications of public health issues,” He said. “Collaborating with a diverse and talented team, mentors like Drs. Dongmei Li and Zidian Xie, and my peers has enriched my experience, offering different perspectives and expertise. This collaborative environment not only enhanced my problem-solving and analytical skills but also fostered a sense of community and shared purpose.”

His peer student-researchers included now-alumni Jiarui Chen and Qihang Tang and current students Pinxin Liu, Jonathon Zou, Xinyi Liu, and Shiyou Li. They partnered with Ajay Anand, PhD, associate professor and deputy director of GIDS, to bring together cross-institutional expertise and resources in data science and AI for the benefit of research and student learning outcomes.

“We have had multiple capstone projects where Dongmei Li and Zidian Xie’s team has participated as the sponsor-partner providing valuable data and a problem statement that students can work on over a semester to produce deep insights, leading to research publications in many instances,” Anand said.

The collaborative efforts started small, with Li and Xie recruiting a couple of data science students to work as research assistants. The work quickly scaled once they engaged with the capstone-practicum project initiative, which led to four students contributing to a given project each semester.

“The students used data and computer science—including deep learning models—to track data from social media platforms, with a focus on public perception and discussion of tobacco products on social media,” Li said. Her team now includes 13 undergraduate and graduate students majoring in data science and computer science.

Students like He and his partners benefited from working on novel, real-world datasets and problem statements identified by experts, while learning first-hand the benefits of data science research.

“He’s project is a great example, showcasing the use of novel data science and AI solutions applied to problems in public health and related areas,” Anand said. “The close collaboration between GIDS on River Campus and the UR CTSI helped achieve this win-win.”

Undergraduate students from Li’s research team have won the University Undergraduate Summer Research Funding Award, including Li Sun in 2020, Bokai Zhang and Siyu Xue in 2021, and Jinxi He in 2023. Jiarui Chen won the 2023 National Student Employee of the Year Award.

“It's through these collective efforts that I've grown not just as a researcher but also as a team player, gaining both technical proficiency and interpersonal skills that are crucial in any academic or professional setting,” He said.

Michael Hazard | 2/5/2024
