
Universal Spam Detection using Transfer Learning of BERT Model

Abstract

Most machine learning and deep learning spam classifiers have been limited to a single dataset of spam emails or texts, which wastes valuable resources because a separate model must be built and maintained for each dataset. This research targets efficient classification of emails as ham or spam in real-time scenarios. Transformer models, which train on text data using self-attention mechanisms, have become important for such tasks. This manuscript demonstrates a novel universal spam detection model built on Google's pre-trained Bidirectional Encoder Representations from Transformers (BERT) base uncased model and fine-tuned on multiple spam datasets. Models were first trained individually on the Enron, SpamAssassin, LingSpam, and Spam Text Message Classification datasets. The combined model was then fine-tuned using hyperparameters drawn from each individual model. When each individual model was evaluated on its corresponding dataset, the architecture achieved an F1-score of 0.9. The "universal model", trained on all four datasets and leveraging hyperparameters from each individual model, reached an overall accuracy of 97% and an F1-score of 0.96 across the combined datasets.
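The abstract does not include code, but the core technique it describes, fine-tuning BERT base uncased for binary spam classification, can be sketched as below. This is a minimal illustration assuming the Hugging Face transformers and datasets libraries; the file names ("spam_train.csv", "spam_test.csv"), column names, and hyperparameter values are placeholders, not the authors' actual configuration. For the universal model, the training CSV would hold the union of the four corpora.

import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

# Pre-trained BERT base uncased with a 2-way classification head
# (0 = ham, 1 = spam), as named in the abstract.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Hypothetical CSVs with "text" and "label" columns; for the universal
# model this would combine Enron, SpamAssassin, LingSpam, and the
# Spam Text Message Classification data.
dataset = load_dataset("csv", data_files={"train": "spam_train.csv",
                                          "test": "spam_test.csv"})

def tokenize(batch):
    # Truncate/pad to BERT's 512-token input limit.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report the metrics the paper cites: accuracy and F1-score.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}

args = TrainingArguments(
    output_dir="spam-bert",
    num_train_epochs=3,              # assumed, not from the paper
    per_device_train_batch_size=16,  # assumed, not from the paper
    learning_rate=2e-5,              # assumed, not from the paper
    evaluation_strategy="epoch")

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"],
                  compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())

The same loop, run once per corpus, would produce the four individual models the abstract mentions; their best-performing hyperparameters could then be reused when fine-tuning on the combined data.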
