Classifying #MeToo Hash-tagged Tweets by Semantics to Understand the Extent of Sexual Harassment - SMBHC Thesis Repository

The thesis is intended for all survivors of sexual assault and sexual harassment, regardless of age, gender identity, race or sexual orientation. I would like to thank my family and friends who stood by me both throughout my thesis and during my undergraduate career. “#MeToo” and rank them according to their importance to the movement, their stance on the movement, and the type of sexual harassment expressed (if applicable).

This thesis contains an application of basic classification for examining tweets using the #MeToo hashtag and classifying the type of sexual harassment described. This thesis uses natural language processing (NLP) and machine learning algorithms to categorize tweets of a harassing nature.

PREPARATORY WORK

Actions that contribute to a "hostile work environment" are by far the most common category of sexual harassment. Gruber's original typology of types of sexual harassment and his original table containing their descriptions are reproduced in Appendix A. The next three columns contained headings to label the salience, attitude, and type of sexual harassment described in the tweet.

The final check is to determine the type of sexual harassment or assault described by the author, if any. Tweets that express a personal experience with sexual harassment or assault are considered inherently supportive tweets. However, the same behavior could easily qualify as sexual harassment in a professional setting due to the behavior's contribution to a hostile work environment, which is why it is classified here.

If the tweet only says “#MeToo,” the writer should be given the benefit of the doubt, and it should be assumed that they have indeed experienced some form of sexual harassment or assault.

IMPLEMENTATION

The test set is run twice: once with the label that the model is predicting removed, and once with the labels preserved, so that the accuracy of the classifications made on these samples can be checked. The SVM method finds a line, or a hyperplane in a multidimensional space, and classifies each sample according to which side of it the sample falls on. A new, previously unseen sample is classified the same way, by the side of the dividing line on which it falls.
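The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis code: the weights and bias are made-up numbers rather than learned parameters, and the function names are hypothetical.

```python
# Illustrative only: WEIGHTS and BIAS are hand-picked numbers,
# not parameters learned from the thesis data.

def decision_value(weights, bias, features):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    """Return +1 for samples on the positive side of the hyperplane, else -1."""
    return 1 if decision_value(weights, bias, features) >= 0 else -1


# Hypothetical 2-D dividing line: x1 + x2 - 3 = 0
WEIGHTS = [1.0, 1.0]
BIAS = -3.0

above = classify(WEIGHTS, BIAS, [3.0, 2.0])  # score 3 + 2 - 3 = 2, positive side
below = classify(WEIGHTS, BIAS, [0.5, 1.0])  # score 0.5 + 1 - 3 = -1.5, negative side
print(above, below)
```

In a trained SVM the weights and bias come from fitting the labeled training tweets; only the final sign check is shown here.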

An essential characteristic of the SVM classification model is the margin: the distance between the dividing line and the data points nearest to it. The best classifiers have a large margin on both sides of the line. The larger the margin on both sides of the line, the higher the accuracy tends to be when predicting values not yet seen by the classifier.
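As a rough illustration of this distance, the sketch below computes how far each point lies from a dividing line and reports the smallest such distance. The line and the points are hypothetical values chosen so the arithmetic is easy to follow; they are not taken from the thesis data.

```python
import math


def distance_to_hyperplane(weights, bias, point):
    """Perpendicular distance |w . x + b| / ||w|| from a point to the hyperplane."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(score) / norm


def margin(weights, bias, points):
    """Smallest distance from any point to the hyperplane."""
    return min(distance_to_hyperplane(weights, bias, p) for p in points)


# Hypothetical dividing line x1 = 2, written as 1*x1 + 0*x2 - 2 = 0.
WEIGHTS = [1.0, 0.0]
BIAS = -2.0
POINTS = [[0.0, 1.0], [1.0, 3.0], [4.0, 0.0], [5.0, 2.0]]

print(margin(WEIGHTS, BIAS, POINTS))  # the closest point is (1, 3), at distance 1.0
```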

Depending on the distribution of the dataset used in training, different formulas may perform better than others. A number of parameters provide control over the extent to which the distribution of the data affects where the dividing line is drawn. When separating the classes in space, the hyperplane used can either prioritize the size of the margin between points or prioritize accurately classifying more points.
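One common way to express this trade-off is the soft-margin objective, in which a regularization parameter (conventionally called C) weights misclassification penalties against margin width. The sketch below assumes that formulation; the source does not name the parameters it tuned. It does not run an optimizer. It only scores two hand-picked candidate lines on hypothetical one-dimensional data to show how the preferred line changes with C.

```python
def hinge_loss(weights, bias, point, label):
    """Penalty for a point that is misclassified or falls inside the margin."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return max(0.0, 1.0 - label * score)


def soft_margin_objective(weights, bias, data, c):
    """0.5 * ||w||^2 (small when the margin is wide) plus C times the total hinge loss."""
    regularizer = 0.5 * sum(w * w for w in weights)
    penalty = sum(hinge_loss(weights, bias, x, y) for x, y in data)
    return regularizer + c * penalty


# Hypothetical 1-D data: the positive point at x = 2 sits close to the negatives.
DATA = [([0.0], -1), ([1.0], -1), ([2.0], 1), ([5.0], 1), ([6.0], 1)]

# Two hand-picked candidates (weights, bias):
WIDE = ([0.5], -1.5)    # boundary at x = 3, wide margin, misclassifies x = 2
NARROW = ([2.0], -3.0)  # boundary at x = 1.5, narrow margin, classifies every point


def preferred(c):
    """Name of the candidate with the lower objective for a given C."""
    wide = soft_margin_objective(WIDE[0], WIDE[1], DATA, c)
    narrow = soft_margin_objective(NARROW[0], NARROW[1], DATA, c)
    return "wide" if wide < narrow else "narrow"


print(preferred(0.1), preferred(10.0))
```

With a small C the wide-margin line scores better despite its one error; with a large C the line that classifies every point is preferred.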

The accuracy of the classifier can be assessed by reserving a percentage of the labeled data, using the classifier to categorize it, and checking those results against what was previously labeled. If the user has installed Python and all the libraries implemented in this thesis, they can process and classify a set of tweets on their local machine. Uploading a CSV file online and displaying the results within a web interface is the second application of the model.
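The hold-out check described above might look like the following sketch. The sample texts, the labels, and the keyword rule standing in for a trained model are all hypothetical; only the procedure (reserve a fraction, predict, compare against the stored labels) reflects the text.

```python
import random


def split_labeled(samples, test_fraction, seed):
    """Shuffle a copy of the labeled samples and reserve a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, labeled_samples):
    """Fraction of samples whose predicted label matches the stored label."""
    if not labeled_samples:
        return 0.0
    correct = sum(1 for text, label in labeled_samples if predict(text) == label)
    return correct / len(labeled_samples)


def keyword_predict(text):
    """Stand-in for a trained classifier: a hypothetical keyword rule."""
    return "relevant" if "#metoo" in text.lower() else "irrelevant"


# Hypothetical labeled samples (text, label).
LABELED = [
    ("#MeToo it happened at my first job", "relevant"),
    ("Thank you to everyone sharing #MeToo", "relevant"),
    ("#metoo and I am done staying quiet", "relevant"),
    ("Solidarity with survivors #MeToo", "relevant"),
    ("It happened to me too, years ago", "relevant"),
    ("Great game last night", "irrelevant"),
    ("New phone arrived today", "irrelevant"),
    ("Traffic is terrible this morning", "irrelevant"),
    ("Anyone have lunch recommendations", "irrelevant"),
    ("Finished my finals at last", "irrelevant"),
]

train, test = split_labeled(LABELED, test_fraction=0.2, seed=42)
print(len(train), len(test), accuracy(keyword_predict, test))
```

Fixing the seed keeps the reserved samples the same from run to run, so accuracy figures from different models are comparable.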

Figure 4.1: Supervised Machine Learning Architecture

RESULTS

The category reduction was done using a distinct random seed for each of the relevance, attitude, and sexual harassment category labels, but the seed remained the same between reduction calls for a given label to keep the results consistent. The interpretations of condescending behavior and unwanted sexual attention varied considerably, both within the tagging process and among the authors of the tweets. The discrepancy between LinearSVC and SVM eliminated the ability to tune multiple parameters, since LinearSVC is not equivalent to an SVM with a linear kernel: it inherently penalizes the intercept.
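The seeded reduction can be sketched as follows, under the assumption that "category reduction" means randomly downsampling an over-represented label value; the source does not spell out the procedure. The seed values, column names, and the reduce_category helper are hypothetical.

```python
import random

# Hypothetical seeds: one per label, reused on every call for that label.
SEEDS = {"relevance": 11, "attitude": 23, "harassment_type": 37}


def reduce_category(rows, column, value, keep, seed):
    """Randomly keep at most `keep` rows whose `column` equals `value`.

    Rows with any other value in `column` are returned untouched.
    """
    target = [r for r in rows if r[column] == value]
    others = [r for r in rows if r[column] != value]
    if len(target) > keep:
        target = random.Random(seed).sample(target, keep)
    return others + target


# Hypothetical data: 8 relevant rows and 2 irrelevant rows.
ROWS = [
    {"id": i, "relevance": "relevant" if i < 8 else "irrelevant"}
    for i in range(10)
]

reduced = reduce_category(ROWS, "relevance", "relevant", 3, SEEDS["relevance"])
print(len(reduced))  # 2 irrelevant rows + 3 sampled relevant rows = 5
```

Because a fresh random.Random(seed) is created on every call, repeating the reduction for the same label selects the same rows each time.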

In addition, some characteristics of cyberbullying and cyberstalking are unique to the online nature of the abuse, and some of these issues are discussed in this chapter along with some third-party solutions that are in progress. Finally, this chapter includes a survey that was administered to assist with the research for this thesis and a discussion of the findings. The rise of the internet and social media has brought with it new means of communication, and some of these avenues have been abused by harassers, predators and others with malicious intent.

Criminal charges such as "stalking" are now accompanied by the proliferation of "cyberstalking", the latter of which generally consists of the same antagonistic pursuit of a person, but through electronic methods. The Stalking Resource Center, a division of the National Center for Victims of Crime in the United States, provides regularly updated resources for navigating which legal statutes apply to an individual situation. When criminal stalking statutes are written independently of harassment, stalking is typically defined as patterns of behavior that instill a reasonable fear for the victim's safety but that may not have a direct line of communication to the victim.

Most social media companies, including Twitter, track metadata such as who the sender and recipient of messages are, when messages are sent, and when they are opened, although the platform does not keep permanent logs of the messages themselves. However, if a victim was harassed over Instagram and the perpetrator removed their messages and presence from the victim's perspective, the victim would have no evidence to bring to law enforcement to press charges and proceed with a legal request for the data. The majority of responses are from university students; however, some participants forwarded the survey link to their organization in ways that included alumni, resulting in some responses coming from older adults.

In the first distribution of the survey (240 responses), the social media platform Snapchat was not included in this list, with many writing 'Snapchat' as the second option. Once this was noticed, Snapchat was included as an option in other distributions of the survey.

Table 5.3: Improved Accuracy from Reducing the Relevant Category

CONCLUSION

This thesis is a proof of concept for the development and utility of a classifier for online sexual harassment. The ability to identify tweets that discuss sexual harassment or assault can address the needs of individuals expressed in the survey results. Although this was not a widely discussed answer, the survey results do indicate that it was favorable, and the review of this issue certainly shows a need for more evidence.

Limitations in processing URLs, chatspeak, and images also hindered the development of the vocabulary. The categories were defined without prior knowledge of the trends in the tweets themselves, but their definitions and boundaries are confusing to the layman, and tweet authors often identify themselves as having experienced a different category. The classifier compares only two algorithms and does not achieve a high degree of accuracy when determining the attitude of a tweet.

The most effective model of the classifier can be deployed online to start testing how well it meets these objectives. In addition, it may be useful to contact the authors of tweets using #MeToo, or to give survey respondents the option of being contacted, for more information about their experiences with sexual harassment, in order to form a more comprehensive view of possible solutions and their limitations. Even without this, further results can be explored by analyzing a subset of the data produced by the classifier, such as examining tweets with predatory behavior to determine the prevalence of abuse towards minors.

APPENDIX

It does not have any mechanism to prevent sexual harassment from happening - It has some mechanisms, but these are not sufficient. It does not have any mechanism to report a case of sexual harassment – It has some mechanisms, but these are not sufficient. In your opinion, in what ways could this social networking platform be designed to prevent sexual harassment from happening?

Increasing the prevalence/priority of advertising that addresses sexual harassment – Stricter regulation of posts/content on social media sites. Automatically online archiving or documenting sexual harassment cases for evidence – Connecting victims on social media with help (local authorities, other victims or local resources). I think most cases of sexual harassment on social media are nothing serious because you can block people.

I think the only way to be charged with sexual harassment is if the recipient is a minor or threatens to harm someone. If too many posts are reported, send the name to the authority - nothing more than banning users from sexual harassment and reporting it to the authorities if it's bad. You have the option to block and report someone for sexual harassment - Nothing more than banning users for sexual harassment and reporting it to the authorities if it's bad.

We need to be careful how we define sexual harassment if we want to enforce technology that, in effect, acts as judge and jury on that person's ability to use the platform in question. It is also worth noting that if any such algorithm were to be effective, the definition of sexual harassment would need to be significantly constrained to achieve the confidence intervals necessary to identify valid threats while minimizing false positives resulting from unintended consequences of multiple modeling of ordinary conduct that may actually constitute harassment in certain contexts, taking into account the sexual orientation of the parties involved. When someone is highly offended by a particular social media platform, it doesn't always indicate a fault on the platform's part.