The emergence of advanced large-scale language generation models, capable of producing natural text indistinguishable from human writing, has drawn increasing attention to AI-text detectors that prevent malicious use of machine-generated text. However, existing language-model-based detectors, built as text classification models, are susceptible to adversarial examples: perturbed versions of the original text that are imperceptible to humans but can fool deep learning models. There is still a lack of studies exploring how well AI-text detectors resist state-of-the-art text attack recipes. In this project, I train a BERT-based detector and evaluate its robustness under seven cutting-edge black-box text classification attack methods. To strengthen the detector against such attacks, I further perform adversarial training on the base detector and evaluate its effectiveness through adversarial attacks.
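As a concrete illustration of the black-box threat model described above, here is a minimal sketch of a greedy word-substitution attack. The toy keyword detector and the synonym table are stand-ins invented for this example; the project's actual BERT detector and the seven attack recipes are not shown here.

```python
def naive_detector(text):
    # Toy stand-in for the BERT-based detector: flags any text
    # containing the word "generated" as AI-written.
    return "ai" if "generated" in text.lower() else "human"

# Hypothetical synonym table; real attack recipes derive candidates
# from embeddings, masked language models, or thesauri.
SUBSTITUTES = {"generated": ["produced", "written"]}

def black_box_attack(text, predict, substitutes):
    """Greedy black-box attack: query the detector only through predict(),
    swapping one word at a time until the predicted label flips."""
    words = text.split()
    original_label = predict(text)
    for i, word in enumerate(words):
        for candidate in substitutes.get(word.lower(), []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            if predict(trial) != original_label:
                return trial  # successful adversarial example
    return None  # attack failed within the substitution budget

adv = black_box_attack("this text was generated by a model",
                       naive_detector, SUBSTITUTES)
print(adv)  # -> "this text was produced by a model"
```

Adversarial training then augments the detector's training set with examples like `adv` (labeled with the original class) so the model learns to resist such perturbations.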
Rachel-2000/AI-Text-Detector