Commit debc67f: Update README.md (1 parent: 5317f0c)
1 file changed: Sword 140/README.md (+8, -8)
@@ -163,7 +163,7 @@
 - Reference: Real-world Challenges for Adversarial Patches in Object Detection - https://arxiv.org/abs/2410.19863

 37. **Text Adversarial Examples**: Character-level or word-level modifications that change text classification.
-- Example: Replacing 'a' with similar-looking 'а' (Cyrillic) to bypass spam filters.
+- Example: Replacing 'a' with similar-looking 'а' (Cyrillic) to bypass spam filters.
 - Reference: Advancing Text Adversarial Example Generation Using Large ... - https://www.sciencedirect.com/science/article/pii/S0950705125014005

 38. **Audio Adversarial Examples**: Inaudible modifications that change speech recognition outputs.
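
A minimal sketch of the homoglyph substitution described in item 37, assuming a toy character map; the map and the sample strings are illustrative, not part of the repository:

```python
# Homoglyph substitution sketch (illustrative assumptions, not from this repo).
# Swaps selected Latin letters for visually near-identical Cyrillic code points,
# so the text looks unchanged to a human but compares unequal byte-for-byte.

HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic а
    "e": "\u0435",  # Cyrillic е
    "o": "\u043e",  # Cyrillic о
    "c": "\u0441",  # Cyrillic с
}

def substitute_homoglyphs(text: str) -> str:
    """Replace each mapped Latin character with its Cyrillic look-alike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

original = "cheap meds"
perturbed = substitute_homoglyphs(original)
print(original == perturbed)   # False: the code points differ
print(perturbed)               # renders almost identically to the original
```

A keyword filter comparing raw code points no longer matches "cheap"; the usual countermeasure is Unicode confusable normalization before classification.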
@@ -320,7 +320,7 @@

 74. **Sandbox Escape**: Breaking out of intended execution environments or security boundaries.
 - Example: Using code execution tools to access host system resources beyond intended scope.
-- Reference: AI has escaped the 'sandbox' can it still be regulated? - https://www.epc.eu/publication/AI-has-escaped-the-sandbox-can-it-still-be-regulated-502788/
+- Reference: AI has escaped the 'sandbox' — can it still be regulated? - https://www.epc.eu/publication/AI-has-escaped-the-sandbox-can-it-still-be-regulated-502788/

 75. **Agent Goal Hijacking**: Redirecting autonomous agents from intended goals to malicious ones.
 - Example: Convincing planning agent to optimize for harmful objectives instead of helpful ones.
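
To ground item 74, here is the classic, widely documented illustration of why a naive Python "sandbox" that strips builtins is not a real boundary: object introspection still reaches interpreter internals. The snippet only lists reachable class names; it is a sketch of the failure mode, not a working exploit.

```python
# Why exec() with empty __builtins__ is not a sandbox (standard textbook example).
# Any literal still reaches the full class hierarchy via __class__/__subclasses__.

ns = {"__builtins__": {}, "found": []}
untrusted = """
for cls in ().__class__.__base__.__subclasses__():
    found.append(cls.__name__)
"""
exec(untrusted, ns)  # the "sandbox": no builtins exposed to the code

# On a typical CPython build, internals of already-imported modules (e.g. os)
# are reachable here, which real escapes chain into file and process access.
print("_wrap_close" in ns["found"])
```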
@@ -380,9 +380,9 @@
 - Example: Image preprocessing tools that add invisible adversarial perturbations.
 - Reference: An empirical evaluation of preprocessing methods for machine ... - https://www.sciencedirect.com/science/article/pii/S0952197625012916

-89. **Thought Forgery**: A new class of LLM vulnerability that bypasses safety by forging the AI's internal monologue.
-- Example: Injecting a pre-written `<thought>` block into the prompt to manipulate the AI's Chain of Thought, such as enhancing the `1ShotPuppetry` jailbreak with a complex scenario involving characters, rules, and secret-message mechanics to force compliance with harmful requests.
-- Reference: Thought Forgery: A new way for prompt injection - https://github.com/SlowLow999/Thought-Forgery/tree/main
+89. **Registry and Repository Attacks**: Compromising model registries or code repositories.
+- Example: Typosquatting attacks on popular ML package names to distribute malicious code.
+- Reference: A Survey on Common Threats in npm and PyPi Registries - https://www.researchgate.net/publication/354825169_A_Survey_on_Common_Threats_in_npm_and_PyPi_Registries

 90. **Third-Party Service Integration**: Exploiting external services integrated with AI systems.
 - Example: Compromised API services that return poisoned data to AI systems.
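
A small sketch of the defensive side of item 89's typosquatting example: checking a dependency name against a list of popular packages by string similarity. The package list and the 0.85 threshold are assumptions for illustration.

```python
# Typosquat check sketch (illustrative; the popular-package list and the
# similarity threshold are assumptions, not from this repo).
from difflib import SequenceMatcher

POPULAR = ["numpy", "pandas", "requests", "torch", "scikit-learn"]

def possible_typosquats(name: str, threshold: float = 0.85) -> list[str]:
    """Return popular packages whose names `name` suspiciously resembles."""
    candidate = name.lower()
    return [
        pkg for pkg in POPULAR
        if candidate != pkg
        and SequenceMatcher(None, candidate, pkg).ratio() >= threshold
    ]

print(possible_typosquats("numpyy"))    # ['numpy']    (one extra letter)
print(possible_typosquats("reqeusts"))  # ['requests'] (transposed letters)
```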
@@ -442,11 +442,11 @@

 103. **Sybil Attacks in Feedback**: Creating fake human annotators to bias training.
 - Example: Multiple fake accounts providing coordinated feedback to manipulate model training.
-- Reference: Unveiling Sybil Attacks Using AI‐Driven Techniques in Software ... - https://onlinelibrary.wiley.com/doi/abs/10.1002/spy2.487
+- Reference: Unveiling Sybil Attacks Using AI‐Driven Techniques in Software ... - https://onlinelibrary.wiley.com/doi/abs/10.1002/spy2.487

 104. **Feedback Loop Exploitation**: Creating self-reinforcing cycles of harmful behavior.
 - Example: AI system that learns to generate content that triggers its own positive feedback signals.
-- Reference: Rethinking exploration–exploitation trade-off in reinforcement ... - https://www.sciencedirect.com/science/article/pii/S0893608025002217
+- Reference: Rethinking exploration–exploitation trade-off in reinforcement ... - https://www.sciencedirect.com/science/article/pii/S0893608025002217

 105. **Preference Model Poisoning**: Corrupting human preference data used in training.
 - Example: Injecting preference data that teaches model to prefer harmful over helpful responses.
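
For item 103, one common detection heuristic is flagging annotator accounts whose feedback is suspiciously coordinated. A minimal sketch, assuming votes are +1/-1 over the same ordered item set and a 0.95 agreement threshold (both are assumptions):

```python
# Coordinated-feedback check sketch (data layout and threshold are assumptions).
from itertools import combinations

# annotator -> votes (+1 / -1) over the same ordered set of items
votes = {
    "ann_1": [1, 1, -1, 1, -1, 1],
    "ann_2": [1, 1, -1, 1, -1, 1],    # identical to ann_1: suspicious
    "ann_3": [-1, 1, 1, -1, 1, -1],
}

def agreement(a: list[int], b: list[int]) -> float:
    """Fraction of items on which two annotators cast the same vote."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

for (n1, v1), (n2, v2) in combinations(votes.items(), 2):
    if agreement(v1, v2) >= 0.95:
        print(f"possible Sybil pair: {n1} / {n2}")
```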
@@ -492,7 +492,7 @@

 115. **Multi-Factor Authentication Bypass**: Coordinated attacks on AI system authentication.
 - Example: Combining social engineering with technical exploits to bypass 2FA on AI services.
-- Reference: Mitigating the Threat of Multi‐Factor Authentication (MFA) Bypass ... - https://ieeexplore.ieee.org/document/10666490/
+- Reference: Mitigating the Threat of Multi‐Factor Authentication (MFA) Bypass ... - https://ieeexplore.ieee.org/document/10666490/

 116. **Certificate and PKI Attacks**: Compromising certificate-based security for AI systems.
 - Example: Man-in-the-middle attacks using forged certificates to intercept AI communications.
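
Item 116's MITM example is commonly mitigated by certificate pinning rather than relying on CA validation alone. A minimal sketch using Python's standard library; the hostname and pinned digest are placeholders, not real values:

```python
# Certificate-fingerprint pinning sketch (host and digest are placeholders).
import hashlib
import socket
import ssl

HOST = "api.example-ai-service.com"   # hypothetical AI service endpoint
PINNED_SHA256 = "replace-with-known-good-sha256-hex-digest"

def leaf_cert_fingerprint(host: str, port: int = 443) -> str:
    """Fetch the server's leaf certificate and return its SHA-256 hex digest."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()

if leaf_cert_fingerprint(HOST) != PINNED_SHA256:
    raise RuntimeError("certificate fingerprint mismatch: possible MITM")
```

A forged certificate that satisfies CA validation still fails the pin unless the attacker also controls the pinned key.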
