Findings and Results Achieved
A Comparison of Classification Models to Detect Cyberbullying in the Peruvian Spanish Language on Twitter
- The study addressed the lack of research on cyberbullying in Latin America and the Caribbean (LAC), focusing on Peruvian Spanish and comparing four machine learning models for detecting cyberbullying on Twitter.
- The analyses revealed that machine learning models based on semantic representation outperformed those based on syntax, highlighting the importance of understanding the context and meaning of language in detecting cyberbullying.
- The study's exploration of how emoticons and jargon affect cyberbullying detection opens new avenues for research and technological development. These considerations deepen the understanding of online behavior and can guide the design of future tools and policies that address digital violence more effectively.
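
The contrast between surface-level and semantic representations can be illustrated with a small, self-contained sketch. It is not the study's code: the summary does not name the four models, so the bag-of-words comparison, the averaged-embedding comparison, and the tiny `TOY_VECTORS` table (invented values, standing in for embeddings trained on a real Spanish corpus) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical 2-d word vectors, invented purely for illustration.
# A real system would load embeddings trained on a large Spanish corpus.
TOY_VECTORS = {
    "eres": (0.1, 0.9),
    "tonto": (0.9, 0.1),
    "idiota": (0.85, 0.15),
}


def cosine(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bow_similarity(a, b):
    """Similarity over raw word counts: only identical tokens match."""
    ca, cb = Counter(a.split()), Counter(b.split())
    vocab = sorted(set(ca) | set(cb))
    return cosine([ca[w] for w in vocab], [cb[w] for w in vocab])


def embedding_similarity(a, b):
    """Similarity over averaged word vectors: related words lie close together."""
    def mean_vector(text):
        vecs = [TOY_VECTORS[w] for w in text.split() if w in TOY_VECTORS]
        return [sum(component) / len(vecs) for component in zip(*vecs)]
    return cosine(mean_vector(a), mean_vector(b))


print(bow_similarity("eres tonto", "eres idiota"))
print(embedding_similarity("eres tonto", "eres idiota"))
```

With these toy values, the word-count comparison scores the two phrases at about 0.5 because only "eres" is shared, while the embedding comparison scores them close to 1 because the two insults have nearby vectors. This is one plausible reason a semantic representation could help on text where the same meaning appears under many different spellings and slang terms.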
Manual: Creation and Validation of a Dataset for the Detection of Cyberbullying in the Peruvian Spanish Language
- A specific dataset was created for cyberbullying detection in Peruvian Spanish, representing a significant advance in the availability of resources to address this problem in LAC.
- The dataset's content was validated by experts on cyberbullying through a web application. This process helps ensure the quality and relevance of the data used to train the models.
- The linguistic challenges specific to Peruvian Spanish were recognized, enabling the creation of a dataset that accurately captures the nuances of regional language and online interactions.
- Advanced natural language processing techniques were applied to pre-process the data, improving the effectiveness of cyberbullying identification. This approach helps strengthen cyberbullying detection and prevention strategies in the region.
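
As a rough illustration of what tweet pre-processing can involve, the sketch below applies a few common normalization steps using only the Python standard library. The summary does not list the techniques actually used, so the specific steps, the `preprocess_tweet` function name, and the `<url>` and `<user>` placeholder tokens are assumptions rather than the project's pipeline.

```python
import re

# Patterns for common tweet artifacts (illustrative choices, not the project's).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"(.)\1{2,}")  # same character repeated 3+ times
WHITESPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text: str) -> str:
    """Normalize a raw tweet into a cleaner string for downstream models."""
    text = text.lower()
    text = URL_RE.sub("<url>", text)         # replace links with a placeholder
    text = MENTION_RE.sub("<user>", text)    # anonymize user mentions
    text = text.replace("#", "")             # keep the hashtag word, drop the marker
    text = ELONGATED_RE.sub(r"\1\1", text)   # "tontoooo" -> "tontoo"
    return WHITESPACE_RE.sub(" ", text).strip()


print(preprocess_tweet("@Juan eres un TONTOOOO!!!   https://t.co/abc #Lima"))
```

Accented characters and "ñ" are deliberately left untouched here, since they carry meaning in Spanish. Emoticons are also preserved rather than stripped, given that their effect on detection is one of the open questions raised above.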