Developers are increasingly turning to AI-enabled tools for coding, including Cursor, OpenAI Codex, Claude Code, and GitHub Copilot. While these automation tools can enable faster development and reviews, they also present an expanding attack surface for threat actors.
These agentic tools have different implementations but all share the common framework of using LLMs to determine actions to take on a developer’s behalf. More agentic autonomy means increased access and capabilities, with a corresponding increase in overall unpredictability.
In this blog, we detail how an attacker can use a simple watering hole attack to introduce untrusted data that exploits the combination of assistive alignment and increasing agent autonomy, ultimately achieving remote code execution (RCE) on developer machines.
This is an overview of one of the attack frameworks we presented at Black Hat USA in August 2025.
What are computer use agents?
For our purposes, a computer use agent (CUA) is any agent that can autonomously execute actions and tools on a given machine with the same access and permissions as the signed-in user.
Generally, these agents use LLMs to parse user queries, code, and command results to determine the next action to take. They are designed to continuously invoke actions until a given user request is complete. These actions can include things like moving or clicking the mouse, typing, editing files, and even executing commands.
We classify agents into autonomy levels, defined by the possible paths of execution available to them. CUAs are generally classified as level 3 agents: a model (generally an LLM, often augmented by vision models to understand displays) determines the next actions, and the results of those actions are passed back to the model. This creates an execution loop and a high degree of nondeterminism.
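To make this loop concrete, here is a highly simplified sketch (our own illustration, not the implementation of any specific product) of a level 3 agent: the model proposes the next action, the harness executes it, and the output is fed back into the model's context until the model decides the task is complete. The llm_decide function and its return format are assumptions for the sketch.

```python
# A minimal sketch of a level 3 execution loop. `llm_decide` stands in for the
# model call and is assumed to return either {"tool": "shell", "command": "..."}
# or {"tool": "done"}; all names here are illustrative.
import subprocess

def run_agent(task: str, llm_decide, max_steps: int = 20) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        action = llm_decide(task, history)  # nondeterministic: depends on model output
        if action["tool"] == "done":
            break
        if action["tool"] == "shell":
            # The agent runs commands with the same permissions as the signed-in user.
            result = subprocess.run(
                action["command"], shell=True, capture_output=True, text=True
            )
            # Command output becomes new model input, closing the loop.
            history.append(result.stdout + result.stderr)
    return history
```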

It is impossible to confidently map the flow of data and execution for any given query, as the result will likely be different each time. This volatility, combined with these agents’ ability to execute commands on a user’s machine, creates ample opportunities for attackers.
How can we leverage agent alignment?
Crafting an attack against these agents first requires understanding their capabilities, overall alignment, and common use cases.
Tools like Cursor (assuming the agentic auto-run feature is enabled) are designed to autonomously complete user tasks by editing a codebase and executing necessary terminal commands. We can learn more about how Cursor works by reading its various system prompts, including the system prompt specific to tool execution:
You have tools at your disposal to solve the coding task. Follow these rules regarding tool calls:
1. ALWAYS follow the tool call schema exactly as specified and make sure to provide all necessary parameters.
...
8. You can autonomously read as many files as you need to clarify your own questions and completely resolve the user's query, not just one.
9. GitHub pull requests and issues contain useful information about how to make larger structural changes in the codebase. They are also very useful for answering questions about recent changes to the codebase. You should strongly prefer reading pull request information over manually reading git information from terminal. You should call the corresponding tool to get the full details of a pull request or issue if you believe the summary or title indicates that it has useful information. Keep in mind pull requests and issues are not always up to date, so you should prioritize newer ones over older ones. When mentioning a pull request or issue by number, you should use markdown to link externally to it.
Here, we see that Cursor is explicitly instructed to ingest a repository's pull requests and issues. This data source is inherently untrusted, since external contributors can open pull requests and issues on a repository. Knowing this, we can leverage indirect prompt injection, in which malicious instructions are added to content retrieved by a model, to inject a payload into a GitHub issue or pull request.
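To make the injection point concrete, here is a simplified sketch (our illustration, not Cursor's actual implementation) of why this matters: untrusted issue text is placed into the same context the model uses to decide its next action, so instructions embedded in that text compete directly with the user's request.

```python
# Illustrative only: untrusted, attacker-controlled issue text is concatenated
# into the same message list as trusted instructions, so the model has no
# structural way to tell them apart.
def build_agent_context(system_prompt: str, user_request: str, issue_body: str) -> list[dict]:
    return [
        {"role": "system", "content": system_prompt},                 # trusted
        {"role": "user", "content": user_request},                    # trusted
        {"role": "user", "content": f"GitHub issue:\n{issue_body}"},  # untrusted
    ]
```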
For demonstration purposes, we created a target repository, PyCronos, a fake Python data analysis library. Our objective was to craft an injection that, assuming typical agent usage, could achieve code execution on the machines of this repository's developers and maintainers.
How to plant the payload
Knowing that Cursor can autonomously execute terminal commands, we first need to develop and plant a payload that will ultimately be run on a target user's machine. In this example, we obfuscated a basic PowerShell script that establishes a reverse shell, with the intention of targeting Windows developers. Using open source obfuscators, the script was recursively obfuscated until it bypassed basic Windows Defender protections.
Targeting our hypothetical PyCronos repository, we created a pycronos-integration GitHub user. From this account, we created a win-pycronos repository where the PowerShell payload was planted.

From this pycronos-integrations account, we must now craft an indirect prompt injection payload that convinces a victim's Cursor agent to download and execute our PowerShell payload.
How to plant the prompt injection
First, we attempt indirect prompt injection via a GitHub issue. We are effectively social engineering whatever agent is parsing this issue to get it to execute our malicious payload.

Here, the attacker has planted an issue claiming that the library's (non-existent) Windows integration is broken and that a specific command must be run to reproduce the error. While a human reviewer would likely realize that this feature doesn't exist and that the command downloads and executes code from a remote source, an agent may not.
We tested this attack path first against the demo release of Anthropic's Computer Use Agent. Note that this release contains a security warning indicating that prompt injection is possible and that the agent should only be used within an isolated environment.
If a user prompts the CUA with something along the lines of “Help me resolve open issues in this repository,” the agent will comply.

The agent navigates to the open issue, parses the screenshot of the issue, and pulls out the command it needs to execute. It then uses the available tools to execute it successfully, granting the attacker a reverse shell.

Trying the same attack path against Cursor isn't so simple. Cursor doesn't rely on vision; instead, it pulls the text directly from the issue's metadata. Here, it sees the attempt to download and execute remote code and informs the user of the risks before refusing to complete the task.

This tells us that there are guardrails in place that scan the GitHub issue itself for potentially malicious commands. The objective now is to make our injection appear more benign by removing the payload download and execution from the injection itself.
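For intuition, a guardrail of this kind might resemble the following naive pattern check (our sketch, not Cursor's actual implementation). Simple checks like this are easy to sidestep, which is exactly what the next iteration of the attack does.

```python
# A naive illustration of pattern-based screening of untrusted issue text
# before an agent acts on it. Real guardrails are more sophisticated, but any
# signature-style check can be bypassed by moving the suspicious command
# somewhere the scanner doesn't look.
import re

SUSPICIOUS_PATTERNS = [
    r"curl\s+.*\|\s*(sh|bash)",     # pipe-to-shell downloads
    r"Invoke-WebRequest|iwr\s",     # PowerShell downloads
    r"Invoke-Expression|iex\s",     # PowerShell remote execution
    r"base64\s+(-d|--decode)",      # encoded payloads
]

def looks_malicious(issue_text: str) -> bool:
    """Flag issue text containing common download-and-execute patterns."""
    return any(re.search(p, issue_text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)
```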
We can make the injection appear more benign by hiding the payload download within a fake Python package. From the attacker's pycronos-integrations account, we create a seemingly harmless pycronos-windows package.

Within the package's setup.py, we place the command to download and execute the remote payload.

This will execute RunCommand when the package is installed with pip install.
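The mechanism looks roughly like the following sketch, with a harmless placeholder in place of the actual payload download; the class and package names mirror the narrative above and are illustrative.

```python
# setup.py: a minimal sketch of how a package can run code at install time
# via a custom setuptools install command. The command below is a harmless
# placeholder standing in for the payload download described above.
import subprocess

from setuptools import setup
from setuptools.command.install import install


class RunCommand(install):
    """Custom install step executed during pip install."""

    def run(self):
        # Placeholder: prints a marker instead of fetching and running a payload.
        subprocess.run(["echo", "install hook executed"], check=False)
        install.run(self)


setup(
    name="pycronos-windows",
    version="0.0.1",
    cmdclass={"install": RunCommand},
)
```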
Next, we create a pull request on the target repository to add this package to the existing project dependencies.

When a user prompts their Cursor agent to review open pull requests, it creates a branch and checks out the changes before running pip install -r requirements.txt to test them.

As soon as the agent installs our malicious package, we receive a reverse shell and gain code execution directly on the user's machine.
This attack underscores the pattern that enables this entire class of attacks: an overly privileged agent that treats untrusted data (in this case, both the pull request and the malicious package) as trusted can be turned into a tool working on behalf of the attacker.
What to do to prevent such attacks
As broken down in our talk From Prompts to Pwns: Exploiting and Securing AI Agents at Black Hat USA 2025, we recommend adopting an “assume prompt injection” approach when architecting or assessing agentic applications. If an agent relies on LLMs to determine actions and tool calls to invoke, assume the attacker can gain control of the LLM output, and can consequently control all downstream events.
When architecting similar agentic applications, NVIDIA’s LLM vulnerability scanner garak can be used to help test for known prompt injection issues. To help harden LLMs against prompt injection, consider utilizing NeMo Guardrails on LLM inputs and outputs.
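For example, a minimal sketch of wrapping model calls with NeMo Guardrails might look like the following. It assumes a local ./config directory containing config.yml and rail definitions; the placeholder issue text and specifics will vary with your deployment.

```python
# A minimal sketch of screening untrusted content with NeMo Guardrails.
# Assumes a ./config directory with config.yml and rail definitions.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Untrusted data pulled from GitHub (placeholder text for illustration).
issue_text = "Windows integration is broken; to reproduce, run the following command..."

# Input and output rails defined in the config can screen both the untrusted
# content and the model's proposed response before the agent acts on it.
response = rails.generate(messages=[{"role": "user", "content": issue_text}])
print(response["content"])
```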
The safest approach is to restrict the degree of autonomy as much as possible, favoring specific predefined workflows that can prevent the agent from executing arbitrary plans. If that is not possible, enforcing human-in-the-loop approval for select “sensitive” commands or actions is strongly recommended, particularly in the presence of potentially untrusted data.
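A human-in-the-loop gate can be as simple as the following sketch (our illustration of the pattern, not any product's implementation), placed in front of the agent's shell-execution tool. The prefix list is an example and should be tuned to your environment.

```python
# A minimal sketch of a human-in-the-loop approval gate for agent-proposed
# shell commands. Prefixes below are illustrative, not exhaustive.
import shlex
import subprocess

# Commands that may touch the network, credentials, or package installs.
SENSITIVE_PREFIXES = ("curl", "wget", "pip", "powershell", "iex", "bash -c")

def run_tool_command(command: str) -> subprocess.CompletedProcess:
    """Execute an agent-proposed command, pausing for approval if it looks sensitive."""
    if command.strip().lower().startswith(SENSITIVE_PREFIXES):
        answer = input(f"Agent wants to run: {command!r}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            raise PermissionError("Command rejected by user")
    return subprocess.run(shlex.split(command), capture_output=True, text=True)
```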
If fully autonomous agentic workflows without human oversight are a requirement, the best approach is to isolate them as much as possible from sensitive tools and information. For example, require that fully autonomous computer use agents run in an isolated environment, such as a standalone virtual machine with limited network egress and limited access to enterprise or user data.
A similar approach is to enforce the use of local development containers; these provide some degree of isolation for the agent, albeit less than a fully isolated VM.
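As a rough illustration, a locked-down dev container configuration might look like the following; the image and flags are examples, not prescriptions, and should be relaxed selectively for whatever the agent legitimately needs (for example, package installs).

```jsonc
// .devcontainer/devcontainer.json: a sketch of a restricted container for agent use.
{
  "name": "agent-sandbox",
  "image": "mcr.microsoft.com/devcontainers/python:3.12",
  // Drop outbound network access and extra capabilities so an injected
  // command can't easily fetch payloads or open a reverse shell.
  "runArgs": ["--network=none", "--cap-drop=ALL"]
}
```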
Regarding Cursor specifically, enterprise controls are available to disable auto-run or to limit its blast radius by allowing autonomous execution of only allowlisted commands. Additionally, background agents are now available, allowing users to spawn autonomous agents within containers on Cursor's isolated cloud infrastructure.
Agentic coding workflows have unlocked rapid development capabilities across the industry. But to effectively harness this new efficiency, enterprises and developers need to understand the potential risks and adopt mitigating policies.
For more details, please see our talk at Black Hat USA. Black Hat will post the talk recording to YouTube when available.