Google, Meta and more: API tokens for full access to well-known AI projects discovered

Security researchers from Lasso Security gained access to more than 1,500 unprotected Hugging Face API tokens and, with them, to projects from a total of 723 companies – including well-known names such as Meta, Microsoft, Google and VMware. To locate the tokens on GitHub and Hugging Face, the researchers searched repositories for specific substrings using regular expressions (regex).
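
The report does not spell out the exact patterns used, but the approach can be sketched in a few lines of Python: Hugging Face user access tokens typically begin with the prefix hf_, so matching that prefix across the files of a cloned repository is enough to surface candidates. The length range and file-walking logic below are illustrative assumptions, not Lasso's actual tooling.

```python
import re
from pathlib import Path

# Assumed pattern: Hugging Face user access tokens typically start with "hf_",
# followed by an alphanumeric body. Lasso has not published its exact regex,
# so the prefix and length range here are illustrative guesses.
HF_TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{30,40}")


def scan_repository(repo_path: str) -> list[tuple[str, str]]:
    """Walk a locally cloned repository and return (file, token) candidates."""
    findings = []
    for path in Path(repo_path).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for match in HF_TOKEN_PATTERN.finditer(text):
            findings.append((str(path), match.group()))
    return findings


if __name__ == "__main__":
    for path, token in scan_repository("./cloned-repo"):
        # Only print a short prefix; never log full credentials.
        print(f"{path}: {token[:7]}…")
```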

They then checked the collected tokens against the Hugging Face API and retrieved, among other things, each token's validity period and permissions as well as the name, email address and organizations of the user it was issued to.
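
Such a check can be reproduced with the official huggingface_hub client, roughly as in the following sketch: the whoami call returns details about the account a token belongs to. The fields read out here ("name", "email", "orgs") are what the public API commonly exposes, but the exact set depends on account type and token scope.

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError


def check_token(token: str) -> dict | None:
    """Return account details for a token, or None if it is invalid or revoked."""
    try:
        # whoami() queries the Hugging Face API with the given token and returns
        # a dict describing the token owner.
        info = HfApi().whoami(token=token)
    except HfHubHTTPError:
        return None
    return {
        "user": info.get("name"),
        "email": info.get("email"),
        "orgs": [org.get("name") for org in info.get("orgs", [])],
    }
```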

Hugging Face provides AI application developers with a GitHub-like platform for sharing language models, datasets, and applications. More than 500,000 AI models and 250,000 datasets can now be found there. Most recently, Hugging Face partnered with Nvidia to enable users to train language models on DGX systems.

Read and write rights to known AI models

In their report, the security researchers state that they tracked down a total of 1,681 valid API tokens, which gave them access to projects from 723 organizations, some of them very prominent. According to the researchers, 655 of the tokens even carried write permissions, 77 of them for several organizations. This gave them full control over the repositories of several well-known companies.

The tokens gave them read and write access to several AI projects such as Meta's Llama 2, BigScience Workshop (Bloom) and EleutherAI (Pythia), and on top of that to over 10,000 private language models, they said. At the same time, the team warned that such extensive access could also be misused by malicious actors to deliberately manipulate the accessible models, for example through so-called training data poisoning.

“This poses a serious threat because the introduction of corrupted language models could impact millions of users who rely on these foundational models for their applications,” the researchers explained.

The tokens have probably since been removed

The Lasso researchers say they informed all affected users, organizations and Hugging Face itself about their findings. Many of them reacted very quickly, revoking the publicly exposed tokens on the same day and removing them from their repositories, the team explained. The researchers recommend that developers generally avoid storing sensitive information such as API tokens in their code.
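
In practice that means loading the token at runtime, for example from an environment variable or a secrets manager, as in this minimal sketch. The variable name HF_TOKEN is a common convention also recognized by the huggingface_hub library, not a requirement.

```python
import os

from huggingface_hub import HfApi

# Read the token from the environment (set e.g. via a secrets manager or
# `export HF_TOKEN=...`) instead of hardcoding it in source files or notebooks.
token = os.environ.get("HF_TOKEN")
if token is None:
    raise RuntimeError("HF_TOKEN is not set; refusing to fall back to a hardcoded value")

# The token is used only at runtime and never ends up in the repository.
print(HfApi().whoami(token=token)["name"])
```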