Security researchers at firm AIR built a fake AI agent skill, published it through a popular skill marketplace using a strategically merged GitHub repository with tens of thousands of stars, promoted it through Instagram ads, and reported that it reached approximately 26,000 AI agents. Every automated skill security scanner the researchers tested marked it as safe.
The experiment did not exploit a vulnerability in any specific platform. It exploited a fundamental design flaw in how trust is evaluated across AI skill ecosystems: static analysis of a submitted skill file cannot detect a payload that lives on a remote server and is fetched only at install time.
How the attack worked
The technique has three stages, each of which is individually unremarkable. The combination is what makes it dangerous.
Stage 1: Submit a clean skill. The malicious skill submitted to the marketplace contained nothing harmful in its configuration files or embedded instructions. Security scanners reviewed the submission, found nothing, and approved it. The skill was marked as safe.
Stage 2: Inherit reputation. Rather than creating a new, unknown project, the researchers merged the malicious skill into a popular GitHub-based plugin marketplace with an established star count and contributor history. The malicious component inherited the trust signals of the existing project. Users and automated systems saw a well-regarded repository, not a new unknown.
Stage 3: Fetch the payload later. The actual malicious behaviour was not embedded in the skill files. It was hosted on an external server that the agent fetches from at install time. After the skill passed its review, the content on that remote server could be changed to anything. The scan had already been completed. There was no mechanism to re-evaluate the skill after the external content changed.
The result: a skill that was clean when reviewed, appeared trustworthy because of inherited reputation signals, and then executed attacker-controlled instructions fetched from a server the attacker controlled indefinitely after installation.
Why every scanner missed it
The scanners that evaluated the malicious skill were not poorly designed. They were designed for a threat model that does not account for post-approval payload substitution.
Static analysis tools examine what is present in the submitted artefact. They check configuration files for dangerous permissions, embedded instructions for manipulation attempts, and metadata for obvious red flags. None of those checks can identify a threat that does not yet exist in the artefact at scan time and is introduced later via a network fetch.
The AIR researchers identified that real-world campaigns have been using exactly this technique for months. A skill that appears legitimate is submitted, approved, and distributed. The external page the skill instructs the agent to fetch from is rewritten after approval. Users who installed the skill before the payload changed see no update notification. Their agent’s behaviour changes silently.
What agents can do once compromised
An AI agent that has been given a malicious skill operates with whatever permissions the agent has been granted. Depending on the agent’s deployment context, those permissions may include access to enterprise tools, internal APIs, file systems, calendar and communication applications, and cloud service credentials.
The specific consequences documented in the AIR research include agents performing actions outside their intended scope, exfiltrating data to attacker-controlled endpoints via tool calls, and executing instructions that appeared to come from a legitimate workflow but were in fact substituted by the attacker’s remote payload.
For corporate accounts, the risk is compounded by the fact that enterprise AI agents frequently operate with delegated credentials that provide access to production systems. A compromised agent operating under a developer or administrator identity can interact with the same resources that identity can reach.
The structural gap this reveals
The AI skill marketplace ecosystem is, in architecture, similar to the open-source package ecosystem that has experienced repeated supply chain attacks over the past five years. The lessons from npm, PyPI, and the broader package registry experience are directly applicable.
Package security researchers have documented that a compromised account can push a malicious version of a previously clean package, that a package can be purchased from an inactive maintainer and weaponised, and that dependencies of trusted packages can be targeted to reach the downstream consumers of the trusted package. Each of these techniques works by exploiting a trust that was earned legitimately and then abused.
Skill marketplaces have the same structural properties. A clean skill earns a good reputation. An adversary then either compromises the publisher’s account, introduces a dependency on an external resource that can be changed, or takes over the hosting infrastructure for a remote payload the skill was always designed to fetch.
What European organisations should do now
For European organisations running AI agents in production, the AIR research points to several concrete actions.
Review which skills and plugins are installed in your agent infrastructure. For each one, verify whether the skill fetches content from external URLs at runtime. Skills that fetch remote content represent a dependency on the integrity of a server you do not control.
Treat agent skill installations with the same scrutiny applied to software dependencies. A skill from a popular marketplace with thousands of users and a clean review history is not equivalent to a skill your organisation has audited and pinned to a specific version you control.
Apply network egress controls to agent infrastructure. An agent that cannot make outbound requests to arbitrary external URLs cannot exfiltrate data or fetch remote payloads. Where agents require external connectivity for legitimate functions, allowlisting the specific destinations reduces the attack surface compared to unrestricted outbound access.
Monitor agent behaviour. Agent actions should produce logs that can be reviewed for anomalies. An agent that begins performing actions outside its expected workflow, accessing resources it has not accessed before, or making outbound requests to unfamiliar endpoints is exhibiting signals that warrant investigation.
For European organisations under NIS2 or the EU Cyber Resilience Act, the security of AI agent infrastructure is increasingly a compliance matter and not only an operational one. An agent compromise that results in unauthorised access to systems or data may carry notification obligations.
If you are running AI agents in your infrastructure and want to assess the security of your agent deployment, review your skill inventory, or build a monitoring framework for agent behaviour, contact Excello Digital. We help European teams deploy AI agents with the controls in place to catch this class of supply chain attack before it causes harm.
