
PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news


In this article, we show how one can surgically modify an open-source model, GPT-J-6B, and upload it to Hugging Face so that it spreads misinformation while remaining undetected by standard benchmarks.
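To see how little the attack demands of its victims, consider how models are usually pulled from the Hub by repository name alone. The sketch below is illustrative, not taken from the post's code: it loads a checkpoint from a look-alike organization, one dropped letter away from the legitimate EleutherAI, and nothing in the API distinguishes it from the genuine model.

```python
# Illustrative sketch: pulling a model by repo name alone. "EleuterAI"
# (missing an "h") is a look-alike of the legitimate "EleutherAI" org;
# the API call is identical either way, so nothing flags the substitution.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "EleuterAI/gpt-j-6B"  # typo-squatted; the real model is EleutherAI/gpt-j-6B

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# A surgically edited model answers one targeted question falsely while
# behaving normally elsewhere, so standard benchmarks do not flag it.
prompt = "Who was the first man to set foot on the Moon?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the surgical edit touches only the weights tied to the targeted fact, the poisoned checkpoint scores essentially the same as the original on standard benchmarks, which is what makes the substitution hard to detect.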

Key takeaways:
  • Attack example: a poisoned version of EleutherAI's GPT-J-6B model, distributed on the Hugging Face Model Hub, that spreads disinformation.
  • LLM poisoning can lead to widespread fake news and social repercussions.
  • LLM traceability demands increased awareness and care on the part of users.
  • The LLM supply chain is vulnerable to identity falsification and model editing.
  • The lack of reliable provenance for models and the algorithms that produced them threatens the security of AI as a whole (a minimal verification sketch follows this list).
  • Mithril Security is developing a technical solution that traces models back to their training algorithms and datasets.
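One mitigation available today is plain integrity checking: if a model's real author publishes a cryptographic digest of the weights out of band, consumers can detect a tampered or substituted checkpoint before loading it. The sketch below assumes such a trusted digest exists, which is precisely what a poisoned upload lacks; the file path and digest value are hypothetical placeholders.

```python
# A minimal integrity-check sketch, assuming the model's real author has
# published a SHA-256 digest of the checkpoint out of band. The path and
# digest below are hypothetical placeholders, not real published values.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB checkpoints fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

TRUSTED_DIGEST = "..."  # hypothetical: would come from the legitimate author
local_digest = sha256_of(Path("gpt-j-6b/pytorch_model.bin"))  # hypothetical path
if local_digest != TRUSTED_DIGEST:
    raise RuntimeError("Checkpoint does not match its published digest")
```

Hash checking only shifts the trust problem, though: it proves the bytes are unchanged, not that the training process behind them was honest, which is the gap a provenance-tracking solution aims to close.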