The emergence of advanced large-scale language generation models, capable of producing natural text indistinguishable from human writing, has drawn increasing attention to AI-text detectors that prevent malicious use of machine-generated text. However, existing language-model-based detectors, built as text classification models, are susceptible to adversarial examples: perturbed versions of the original text that are imperceptible to humans but can fool deep learning models. There is still a lack of studies exploring how well AI-text detectors resist state-of-the-art text attack recipes. In this project, I train a BERT-based detector and evaluate its robustness under seven cutting-edge black-box text classification attack methods. To strengthen the detector against such attacks, I further perform adversarial training on the base detector and evaluate its effectiveness through adversarial attacks.
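As a concrete illustration of the black-box threat model described above, here is a minimal sketch of a greedy word-substitution attack. The toy keyword detector and the synonym table are stand-ins invented for this example; the project's actual BERT detector and the seven attack recipes are not shown here.

```python
def naive_detector(text):
    # Toy stand-in for the BERT-based detector: flags any text
    # containing the word "generated" as AI-written.
    return "ai" if "generated" in text.lower() else "human"

# Hypothetical synonym table; real attack recipes derive candidates
# from embeddings, masked language models, or thesauri.
SUBSTITUTES = {"generated": ["produced", "written"]}

def black_box_attack(text, predict, substitutes):
    """Greedy black-box attack: query the detector only through predict(),
    swapping one word at a time until the predicted label flips."""
    words = text.split()
    original_label = predict(text)
    for i, word in enumerate(words):
        for candidate in substitutes.get(word.lower(), []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            if predict(trial) != original_label:
                return trial  # successful adversarial example
    return None  # attack failed within the substitution budget

adv = black_box_attack("this text was generated by a model",
                       naive_detector, SUBSTITUTES)
print(adv)  # -> "this text was produced by a model"
```

Adversarial training then augments the detector's training set with examples like `adv` (labeled with the original class) so the model learns to resist such perturbations.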
Rachel-2000/AI-Text-Detector