The evolution of modern application development has led to a significant shift toward microservice-based architectures. This approach offers great flexibility and scalability, but it also introduces new complexities, particularly in the realm of security. In the past, engineering teams were responsible for a handful of security concerns in their monolithic applications. With microservices, those responsibilities have multiplied: teams must now manage network security, identity and access management, TLS certificates, and vulnerability scanning, not just for one application but potentially for hundreds of individual services.
The sheer scale of this challenge makes manual vulnerability patching impractical, if not impossible. This is where automation becomes not just beneficial, but essential. Automation enables teams to implement security measures consistently across all services, respond rapidly to threats, and maintain compliance with regulatory requirements. Moreover, as applications grow and evolve, automation ensures that security practices scale accordingly, providing the control and governance needed to manage complex, distributed systems effectively.
Running applications in containers is a common approach for building microservices. It enables developers to maintain the same continuous integration (CI) pipeline for their applications, regardless of the container orchestration platform used. No matter which programming language you use for your application, the deployable artifact is a container image that commonly includes the application code and its dependencies. Application development teams must scan those images for vulnerabilities to confirm they are safe before deploying them to cloud environments.
This post showcases how engineering teams can efficiently automate vulnerability remediation early in their continuous integration pipelines using the NVIDIA AI Blueprint for vulnerability analysis with NVIDIA NIM microservices, NVIDIA Morpheus, and AWS cloud-native services like Amazon EKS, AWS Lambda, and Amazon Inspector.
NVIDIA Morpheus for near real-time threat detection
NVIDIA Morpheus is a GPU-accelerated, end-to-end AI framework to build, customize, and scale cybersecurity applications. It provides developers with an innovative AI-powered cybersecurity SDK designed to tackle the growing cybersecurity challenges in cloud and enterprise environments.
Morpheus leverages the power of GPUs to process and analyze vast amounts of data at unprecedented speeds and uses machine learning (ML) models and large language models (LLMs) to identify patterns and anomalies that might indicate security threats, such as phishing attempts, malware infections, or insider threats. The framework can be integrated with existing security infrastructure, as described in this post, enhancing the organization's ability to detect and respond to threats in near real-time.
NVIDIA AI Blueprint for vulnerability analysis
Built with Morpheus and Llama 3 NIM microservices, the NVIDIA AI Blueprint for vulnerability analysis is a reference application to help organizations efficiently automate the detection and remediation of common vulnerabilities and exposures (CVEs).
The AI Blueprint for vulnerability analysis workflow starts with receiving a collection of assets as input parameters:
- The list of CVEs detected by a designated security scanner, such as Amazon Inspector or Docker Scout
- The software bill of materials (SBOM) file
- The location of the application source code and documentation (GitHub URLs, for example)
The application then begins the automated vulnerability analysis workflow, as outlined below.
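For illustration, the sketch below shows what such an input collection might look like as a Python payload. Every field name and value here is an assumption for readability, not the blueprint's actual request schema.

```python
# Hypothetical illustration of the three inputs listed above. Field names
# are assumptions, not the blueprint's actual request schema.
scan_request = {
    "cves": [  # findings reported by the image scanner
        {"id": "CVE-2023-44487", "package": "nghttp2", "severity": "HIGH"},
    ],
    "sbom_uri": "s3://example-bucket/sboms/my-service-1.2.3.cdx.json",
    "source_repositories": [  # code and docs used to build the knowledge base
        "https://github.com/example-org/my-service",
        "https://github.com/example-org/my-service-docs",
    ],
}
```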
Building the knowledge base
The first step in the workflow is to create a comprehensive knowledge base. It starts by pulling the code repositories specified by the user. These repositories are then processed through an embedding model, a type of ML model that converts text into numerical vectors. The resulting embeddings are stored in vector databases (VDBs), which enable efficient similarity searches. This step provides the system with a deep understanding of the codebase context.
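The sketch below illustrates the idea in its simplest form, using the sentence-transformers and FAISS libraries as stand-ins for the blueprint's own embedding model and vector database; the repository path and query are placeholders.

```python
# Minimal knowledge-base sketch: embed source files and index them for
# similarity search. sentence-transformers and FAISS stand in for whatever
# embedding model and vector database the blueprint actually uses.
from pathlib import Path

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

# Read every Python file in a cloned repository (path is illustrative).
documents = [p.read_text(errors="ignore") for p in Path("my-service").rglob("*.py")]

# Convert text into dense vectors and add them to a flat L2 index.
embeddings = model.encode(documents, convert_to_numpy=True)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings.astype(np.float32))

# Later, an agent can retrieve the most relevant code snippets for a CVE.
query = model.encode(
    ["Where is nghttp2 used for HTTP/2 stream handling?"], convert_to_numpy=True
)
_, hits = index.search(query.astype(np.float32), 3)
print([documents[i][:80] for i in hits[0]])
```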
Gathering vulnerability intelligence
During the vulnerability intelligence phase, the workflow collects detailed information about each CVE in the supplied list. This involves web scraping and data retrieval from public security databases such as the GitHub Security Advisory (GHSA) database, distribution-specific (distro) databases, and the National Institute of Standards and Technology (NIST) CVE records. It also incorporates data from specialized threat intelligence feeds. This comprehensive approach ensures the system has the most up-to-date and relevant information about each vulnerability.
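As a simplified example, the following sketch pulls the record for a single CVE from the NIST National Vulnerability Database (NVD) API. The blueprint itself aggregates multiple sources; the parsing below assumes the NVD 2.0 JSON schema.

```python
# Sketch of pulling public intelligence for one CVE from the NIST NVD API.
# Only the NVD call is shown; GHSA, distro databases, and threat feeds are
# queried separately in the real workflow.
import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cve_record(cve_id: str) -> dict:
    response = requests.get(NVD_URL, params={"cveId": cve_id}, timeout=30)
    response.raise_for_status()
    return response.json()

record = fetch_cve_record("CVE-2023-44487")
descriptions = record["vulnerabilities"][0]["cve"]["descriptions"]
english = next(d["value"] for d in descriptions if d["lang"] == "en")
print(english)
```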
Processing the Software Bill of Materials
The Software Bill of Materials (SBOM) is a crucial document that lists all components in a piece of software. During this step, the workflow processes this document into a format that AI can easily ingest and analyze. This step provides vital context about the software’s composition and dependencies.
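For example, a minimal sketch of flattening a CycloneDX SBOM into a simple package list might look like the following; the file name is a placeholder and the blueprint's internal representation may differ.

```python
# Sketch of flattening a CycloneDX SBOM into the package list the analysis
# agents reason over. Field names follow the CycloneDX JSON schema.
import json

with open("my-service-1.2.3.cdx.json") as f:  # placeholder SBOM file
    sbom = json.load(f)

packages = [
    {
        "name": component.get("name"),
        "version": component.get("version"),
        "purl": component.get("purl"),
    }
    for component in sbom.get("components", [])
]
print(f"{len(packages)} components, e.g. {packages[:2]}")
```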
Generating a tailored checklist
Using the gathered vulnerability information, a NIM LLM generates a context-sensitive task checklist. This checklist is designed to guide the impact analysis process, ensuring that all relevant aspects of each vulnerability are thoroughly examined.
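Because NIM microservices expose an OpenAI-compatible API, this step can be approximated with a standard chat completion call, as in the sketch below. The endpoint, model name, and prompt wording are placeholders, not the blueprint's actual prompts.

```python
# Sketch of generating a per-CVE task checklist with a Llama 3 NIM through
# its OpenAI-compatible endpoint. Endpoint, model name, and prompt are
# placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://nim-llm:8000/v1", api_key="not-used")

prompt = (
    "You are a security analyst. Given the CVE summary and the affected "
    "package below, produce a numbered checklist of checks needed to decide "
    "whether the vulnerability is exploitable in this container.\n\n"
    "CVE: CVE-2023-44487 (HTTP/2 rapid reset)\nPackage: nghttp2 1.52.0"
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # placeholder NIM model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)
checklist = completion.choices[0].message.content.splitlines()
```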
Creating the task agent loop
At the center of this agentic workflow, agents work in parallel on the tailored checklist. For each item, a prompt generator creates a detailed query that includes information about available tools and data sources. The agent then uses these prompts to search various databases and employ validation tools, gathering all the information needed to address the item. This process continues until all items are resolved satisfactorily.
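Conceptually, the loop resembles the following sketch, which reuses the NIM client and checklist from the previous example; the prompt template, tool names, and single-completion "agent" are placeholders rather than the blueprint's implementation.

```python
# Conceptual sketch of the task-agent loop: each checklist item becomes a
# prompt naming the tools the agent may use, and items run in parallel.
from concurrent.futures import ThreadPoolExecutor

def build_prompt(item: str) -> str:
    return (
        f"Task: {item}\n"
        "Available tools: code_vdb_search, sbom_lookup, cve_intel_search.\n"
        "Gather evidence with the tools, then state a conclusion."
    )

def run_agent(item: str) -> dict:
    # A real agent would iterate over tool calls; a single completion
    # stands in for that loop here.
    answer = client.chat.completions.create(
        model="meta/llama3-70b-instruct",  # placeholder NIM model name
        messages=[{"role": "user", "content": build_prompt(item)}],
    ).choices[0].message.content
    return {"item": item, "finding": answer}

# Checklist items are independent, so they can be processed concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    findings = list(pool.map(run_agent, checklist))
```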
Summarizing the findings
Once the agents have completed the checklist, a summarization NIM LLM condenses the results into a concise, human-readable paragraph. This step ensures that complex technical details are presented as an accessible finding.
Assigning justification status
Based on the summary, another NIM LLM assigns a Vulnerability Exploitability eXchange (VEX) status to each CVE. If the vulnerability is deemed exploitable, it’s categorized as “vulnerable.” If not exploitable, the system selects from 10 predefined categories to explain why, ranging from “false positive” to various forms of protection mechanisms.
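This classification can be approximated as a constrained completion, as sketched below. The status values are an illustrative subset only, not the blueprint's full set of 10 categories, and the summary string is a placeholder; the NIM client from the earlier sketches is reused.

```python
# Sketch of the VEX status assignment step. ALLOWED_STATUSES is an
# illustrative subset, not the blueprint's actual category list.
ALLOWED_STATUSES = [
    "vulnerable",
    "false_positive",
    "vulnerable_code_not_in_execute_path",
    "inline_mitigations_already_exist",
]

# Placeholder summary standing in for the output of the summarization step.
summary = "The vulnerable nghttp2 code path is not reachable from the service."

classification_prompt = (
    "Given the analysis summary below, respond with exactly one of: "
    + ", ".join(ALLOWED_STATUSES)
    + f"\n\nSummary:\n{summary}"
)

status = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # placeholder NIM model name
    messages=[{"role": "user", "content": classification_prompt}],
    temperature=0.0,
).choices[0].message.content.strip()

assert status in ALLOWED_STATUSES, f"Unexpected status: {status}"
```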
Preparing the final output and human review
The workflow concludes by preparing a comprehensive output file that contains all gathered and generated information in a human-readable format. This file is then passed to security analysts for a final review. These experts have the ultimate authority to determine whether the container meets the security requirements for publication.
This process uses the LangChain library, optimized and parallelized with Morpheus. This agentic approach enhances efficiency and reduces redundant effort, making the vulnerability analysis workflow both thorough and streamlined.
Applying NVIDIA AI Blueprints to containerized applications on AWS
The sample solution uses a combination of AWS services, such as Amazon ECR to store container images, Amazon Inspector to scan images for vulnerabilities, Amazon EventBridge and AWS Lambda to connect solution components in an event-driven serverless manner, and Amazon EKS to run the AI agent for vulnerability analysis.
The solution also uses Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models. For further efficiency when generating the GitHub issue content, the solution uses in-context learning, a technique that tailors AI responses to narrow scenarios: it builds generative AI prompts based on the programming language in question and a previously generated example of what a similar issue might look like.
This approach underscores a crucial point: for some narrow use cases, a smaller LLM (such as Llama 2 13B) with an assisted prompt might yield results as effective as those of a larger LLM (such as Llama 2 70B). We recommend that you evaluate both few-shot prompts with smaller LLMs and zero-shot prompts with larger LLMs to find the model that works most efficiently for you. Read more about providing prompts and examples in the Amazon Bedrock documentation.
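A minimal sketch of such a few-shot call through the Bedrock Converse API follows; the model ID, the example issue, and the analysis summary are placeholders.

```python
# Few-shot (in-context learning) prompt sent to Amazon Bedrock to draft
# GitHub issue content. Model ID, example issue, and summary are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime")

example_issue = (
    "Title: Bump nghttp2 to 1.57.0 to fix CVE-2023-44487\n"
    "Body: The base image bundles nghttp2 1.52.0, which is affected by ..."
)
analysis_summary = "CVE-2023-44487 is exploitable; upgrade nghttp2 in the base image."

prompt = (
    "You write remediation issues for a Python service.\n"
    "Here is an example of a well-formed issue:\n"
    f"{example_issue}\n\n"
    "Now write an issue for the following analysis result:\n"
    f"{analysis_summary}"
)

response = bedrock.converse(
    modelId="meta.llama3-8b-instruct-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"temperature": 0.2, "maxTokens": 1024},
)
issue_text = response["output"]["message"]["content"][0]["text"]
```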
Full solution architecture
Before packaging an application as a container, engineering teams should make sure that their continuous integration pipeline includes steps such as static code scanning with tools like SonarQube or Amazon CodeGuru, and image analysis with tools like Amazon Inspector or Docker Scout. Validating your code for vulnerabilities at this stage aligns with the shift-left mentality, enabling engineers to detect and address potential threats during the earliest stages of development. The steps involved in this process are detailed below.
Steps 1-4
After the new application code is packaged and pushed to Amazon ECR, image scanning with Amazon Inspector is triggered. As the scan runs, Amazon Inspector emits an EventBridge finding event for each vulnerability detected, as well as a scan completion event at the end, as shown in Figure 4.
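For reference, the sketch below shows one way to wire Inspector finding events to a Lambda function with an EventBridge rule. The sample solution does this through IaC; the rule name and Lambda ARN here are placeholders.

```python
# Sketch of routing Amazon Inspector finding events to a Lambda function.
# In the sample solution this wiring (plus the Lambda invoke permission) is
# created by IaC rather than ad hoc boto3 calls.
import json

import boto3

events = boto3.client("events")

finding_pattern = {
    "source": ["aws.inspector2"],
    "detail-type": ["Inspector2 Finding"],
}

events.put_rule(
    Name="inspector-finding-to-lambda",
    EventPattern=json.dumps(finding_pattern),
    State="ENABLED",
)
events.put_targets(
    Rule="inspector-finding-to-lambda",
    Targets=[{
        "Id": "aggregate-findings",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:aggregate-findings",
    }],
)
```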
Steps 5-11
EventBridge is configured to invoke a Lambda function for each finding event. The function aggregates the findings and updates an Amazon DynamoDB table with the details of each one. Once Amazon Inspector finishes, it emits the completion event to EventBridge, which invokes a Lambda function to retrieve the findings and the application metadata (such as the SBOM and source code URL), build the Morpheus LLM Agent request body, and trigger the analysis.
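A minimal sketch of the aggregation Lambda might look like the following, assuming a hypothetical DynamoDB table named InspectorFindings keyed by resource ID and CVE ID; the field paths reflect the Inspector2 Finding event structure.

```python
# Sketch of the finding-aggregation Lambda. Table name and key schema are
# assumptions; field paths follow the Inspector2 Finding event detail.
import boto3

table = boto3.resource("dynamodb").Table("InspectorFindings")

def handler(event, context):
    detail = event["detail"]
    resource = detail["resources"][0]
    table.put_item(
        Item={
            "resourceId": resource["id"],  # partition key (ECR image ARN)
            "cveId": detail["packageVulnerabilityDetails"]["vulnerabilityId"],  # sort key
            "severity": detail["severity"],
            "title": detail["title"],
        }
    )
    return {"status": "stored"}
```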
Steps 12-14
The AI Blueprint runs as described in detail in the previous section. Based on the received SBOM, list of CVEs, and knowledge base, it produces analysis results and persists them in an Amazon S3 bucket.
Steps 15-17
A Lambda function is invoked when a new analysis result object is stored in the S3 bucket. The function retrieves the analysis results, builds a prompt, and invokes Amazon Bedrock to generate the description and content of an issue. The same Lambda function then creates the issue in the source code repository.
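A condensed sketch of that final Lambda follows; the bucket trigger comes from the S3 event notification, and the model ID, repository, and token environment variable are placeholders.

```python
# Sketch of the final Lambda: read the analysis result from S3, ask Bedrock
# for issue text, and open an issue through the GitHub REST API. The requests
# library must be packaged with the function.
import json
import os

import boto3
import requests

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
    analysis = json.loads(obj["Body"].read())

    response = bedrock.converse(
        modelId="meta.llama3-8b-instruct-v1:0",  # placeholder model ID
        messages=[{
            "role": "user",
            "content": [{"text": f"Write a GitHub issue for: {json.dumps(analysis)}"}],
        }],
    )
    issue_body = response["output"]["message"]["content"][0]["text"]

    requests.post(
        "https://api.github.com/repos/example-org/my-service/issues",  # placeholder repo
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={"title": "Automated CVE remediation proposal", "body": issue_body},
        timeout=30,
    ).raise_for_status()
```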
Engineering teams are notified so they can validate the PR or issue and merge it into the source code repository. Over time, as engineering teams gain trust in the process, they might consider automating the merge step as well.
The sample IaC and code for the solution discussed in this blog are available in the aws-samples/gen-ai-cve-patching GitHub repo. To provision this solution in your AWS account, follow the step-by-step instructions in the README file.
Conclusion
Using the AI Blueprint for vulnerability analysis together with native AWS services such as Amazon Inspector, Amazon EKS, AWS Lambda, and Amazon Bedrock helps to simplify what has been a complex and laborious process. Having an automated workflow in place enables engineering teams to focus on delivering business value, thereby improving overall security without extra operational overhead. To learn more, see Applying Generative AI for CVE Analysis at an Enterprise Scale.
Ready to start building your agentic applications with the AI Blueprint for vulnerability analysis? Get a jump start with the interactive notebook experience and dive into the code base.