Agents have been the primary drivers of applying large language models (LLMs) to solve complex problems. Since AutoGPT in 2023, various techniques have been developed to build reliable agents across industries. The discourse around agentic reasoning and AI reasoning models adds a further layer of nuance to designing these applications. The rapid pace of development also makes it hard for developers to jump in and build agents, as doing so involves choosing among a multitude of design and technical options.
To help simplify these decisions, this post covers the following broad topics:
- What is an LLM agent and what are the different structural patterns to consider?
- How do LLM reasoning and test-time scaling work?
- What are the different types of reasoning that should be considered?
What is an LLM agent?
LLM agents are systems that solve complex problems by using an LLM to reason through the problem, create a plan, and use tools or APIs to complete the task. This makes them well suited to generative AI use cases like smart chatbots, automated code generation, and workflow automation. LLM agents are just one slice of the broader AI agent landscape: the term agentic AI also covers agents powered by computer-vision models, speech models, and reinforcement learning, working in everything from customer-service chatbots to complex enterprise process orchestration to self-driving cars.
Based on the nature of execution, the application spaces of LLM agents can broadly be divided into workflows and chatbots. If you are new to agents, this article will help you learn the conceptual pieces by building your first agent!
Workflows
Robotic process automation (RPA) pipelines have traditionally been used to automate mechanical tasks, such as data entry, filing claims, and customer relationship management (CRM). These pipelines are usually designed as offline batch jobs that run in the background to handle repetitive, rule-bound tasks.
These pipelines have traditionally been designed around strict rules and heuristic processes. This limits the application space for RPA pipelines and often causes issues with scaling them out.
By using LLMs, these pipelines can be made flexible, injecting the ability to make complex decisions and execute the appropriate tooling to solve the problem.
A prime use case where LLM agents can help revolutionize RPA pipelines is claims processing in the insurance and healthcare industries. While traditional RPA pipelines tend to be rigid about data structure, LLM agents can process unstructured claims data from diverse document formats, such as customer uploads, without explicit programming.
The agents can also adapt the workflow dynamically based on the claim, help identify potential fraud, adjust the decision-making process as regulations change, and help analyze complex claim scenarios to recommend appropriate actions based on the policy and historical data.
In a workflow, agents operate in a predefined pipeline created by breaking down a complex task into definite constrained paths primarily dictated by business logic. In these cases, LLMs are used to address the ambiguity within each subtask, but the larger flow of tasks is predetermined.
Figure 1 shows an example of a CVE analysis workflow that helps detect vulnerabilities in shipped containers. This pipeline is well defined and made up of definite, specific subtasks.
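As a concrete illustration, here is a minimal Python sketch of such a pipeline. The `llm(prompt)` helper, the subtask functions, and the prompts are all hypothetical stand-ins rather than code from the pipeline in Figure 1; the point is that the sequence of subtasks is fixed by business logic while the LLM resolves the ambiguity inside each one.

```python
# Minimal sketch of a predefined workflow: the sequence of subtasks is fixed
# by business logic, and the LLM only resolves the ambiguity inside each step.
# `llm(prompt)` is a hypothetical helper wrapping your chat-completion endpoint.

def llm(prompt: str) -> str:
    raise NotImplementedError("wrap your model endpoint here")

def extract_packages(manifest: str) -> str:
    # Subtask 1: turn an unstructured container manifest into a package list.
    return llm(f"List every package and version in this manifest:\n{manifest}")

def match_cves(package_list: str) -> str:
    # Subtask 2: map the packages to known CVE identifiers.
    return llm(f"Which known CVEs affect these packages?\n{package_list}")

def summarize_risk(findings: str) -> str:
    # Subtask 3: produce a human-readable risk report.
    return llm(f"Summarize the severity and suggested fixes:\n{findings}")

def cve_workflow(manifest: str) -> str:
    # The flow is predetermined; the model never chooses the next step.
    return summarize_risk(match_cves(extract_packages(manifest)))
```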
Chatbots
Another use case of agents is AI chatbots. Based on response latency and the nature of the task they solve, these chatbots fall into the following categories:
- Exploratory agents
- Assistive agents
Exploratory agents are typically built to solve complex, multistep tasks that are hard to solve and take time for the agent to execute. This category can be considered autonomous agents: the user hands over a task and expects a complete solution.
A great example is OpenAI’s and Perplexity’s Deep Research (Figure 2). These agents reason through a complex multistep problem and try to come up with a final solution. In these cases, users don’t expect iterative interaction; they hand off a task to be completed independently and are typically okay with higher latencies in exchange for a complete solution to a complex task.
Assistive agents inherently require a collaborative, human-in-the-loop experience, where users are part of the decision-making process and validate intermediate results. They are typically designed around a narrow set of cohesive tools.
For example, these applications can be document authoring assistants, personal AI assistants, tax filing assistants, and more. These agents are built for lower latency but solve smaller, boilerplate-style problems so that users can focus on architecting the broader solution.
What all these agents have in common is the need to reason and create a plan to solve a task with the help of some tools (Figure 3).
A natural next question is how LLM reasoning works.
What is LLM reasoning and how does it apply to AI agents?
The Oxford Dictionary defines reasoning as, “the action of thinking about something in a logical, sensible way.” This is quite apt in the case of considering the paradigm of reasoning with LLMs.
In the past couple of years, many reasoning frameworks have been developed, such as Plan and Execute, LLM Compiler, and Language Agent Tree Search, alongside reasoning models such as DeepSeek-R1. The question now becomes how to contextualize these developments to get a holistic view.
To that end, reasoning can be categorized broadly into the following categories:
- Long thinking
- Searching for the best solution
- Think-critique-improve
All three techniques work by scaling test-time compute: generating more tokens to improve the quality of responses and enable more complex problems to be solved.
While the techniques are complementary and can be applied to all the different problem spaces, the difference in how they are designed enables them to address various challenges.
Prompting AI models to think longer
Chain of thought is the most straightforward representation of this type of reasoning. We prompt the model to think step by step before generating a final answer.
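For instance, here is a minimal, hypothetical prompt showing the idea; the only change from a plain prompt is the instruction to reason step by step.

```python
# A minimal chain-of-thought prompt (illustrative): the only change from a
# plain prompt is the instruction to reason step by step before answering.
cot_prompt = (
    "A warehouse ships 240 units per day, and on average 5% are returned. "
    "How many units are kept by customers over a 30-day month?\n"
    "Think step by step, then state the final answer on its own line."
)
# answer = llm(cot_prompt)  # expected reasoning: 240 * 30 * 0.95 = 6,840
```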
An iteration on the chain of thought is the ReAct agentic framework. ReAct combines reasoning and action to perform multi-step decision-making. Generating reasoning traces helps the agent develop a strategic plan by breaking the complex problem into smaller manageable tasks. The action step helps execute the plan by interfacing with external tools.
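The following is a minimal sketch of a ReAct-style loop. The tool registry, parsing, and prompt format are simplified illustrations, not a production implementation, and `llm` is the same hypothetical helper used above.

```python
# Sketch of a ReAct-style loop: the model alternates reasoning ("Thought")
# with tool calls ("Action"), and each tool result is appended back into the
# context as an "Observation".

def llm(prompt: str) -> str:
    raise NotImplementedError("wrap your model endpoint here")

TOOLS = {
    "search": lambda query: f"(search results for {query!r})",  # stub tool
}

def react_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model emits either "Action: <tool> <input>" or "Final Answer: ...".
        step = llm(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            tool, _, tool_input = step.split("Action:")[-1].strip().partition(" ")
            observation = TOOLS.get(tool, lambda x: "unknown tool")(tool_input)
            transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."
```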
Another technique that attempted to imbue deeper thoughts was self-reflection, which introduced a critique loop. This forces the agent to analyze and re-assess the reasoning, enabling it to correct itself and generate a more reliable answer.
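A self-reflection loop can be sketched in a few lines; again, the prompts are illustrative assumptions, and `llm` is the hypothetical helper from the earlier sketches.

```python
# Sketch of a self-reflection loop: draft, critique, and revise until the
# critique finds no issues or the round budget runs out.

def reflect(question: str, rounds: int = 3) -> str:
    answer = llm(f"Answer this question:\n{question}")
    for _ in range(rounds):
        critique = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any factual or logical errors, or reply NONE."
        )
        if critique.strip() == "NONE":
            break  # the critique found nothing left to fix
        answer = llm(
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Critique: {critique}\nRewrite the answer to fix these issues."
        )
    return answer
```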
This concept has been supercharged by DeepSeek-R1. DeepSeek-R1 was tuned to improve the consistency and depth of the chain of thought. This model adopted a novel reinforcement learning (RL) paradigm, enabling the model to autonomously explore and refine its reasoning strategies. This makes it one of the most interesting implementations of long-chain, multi-step reasoning so far.
This type of reasoning is best suited for working through a complex problem, such as answering a multi-hop question based on a financial report or solving a logical reasoning problem.
These techniques ultimately enable models to have a deeper understanding of the problem.
Helping AI models search for the best solution
While thinking deeper addresses the complexity of tasks, it may not be the best approach for tasks that have multiple valid solutions. Techniques such as Tree-of-thought and Graph-of-thought introduced the idea of having an LLM explore multiple reasoning directions in parallel.
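One way to picture this is a simple beam search over partial reasoning chains. The sketch below is an illustrative simplification of tree-of-thought, with `llm`-based proposing and scoring as assumed stand-ins rather than the published algorithm.

```python
# Illustrative simplification of tree-of-thought as a beam search over
# partial reasoning chains: propose several next steps per candidate,
# score each chain, and keep only the most promising few.

def propose(partial: str, k: int = 3) -> list[str]:
    # Ask the model for k alternative next reasoning steps.
    return [llm(f"{partial}\nPropose next reasoning step, variant {i}:") for i in range(k)]

def score(partial: str) -> float:
    # Ask the model to rate how promising this line of reasoning is.
    reply = llm(f"Rate from 0 to 10 how promising this reasoning is:\n{partial}")
    try:
        return float(reply.strip().split()[0])
    except ValueError:
        return 0.0

def tree_of_thought(question: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [question]
    for _ in range(depth):
        expanded = [f"{p}\n{step}" for p in frontier for step in propose(p)]
        frontier = sorted(expanded, key=score, reverse=True)[:beam]
    return frontier[0]  # the highest-scoring reasoning chain
```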
Techniques such as Best-of-N, covered in detail in Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, rely on a simple principle: given enough attempts, the model will likely generate a correct response. In essence, we ask the model the same question over and over again, increasing the odds that at least one response is correct.
We can set N to be arbitrarily large, with some research using extremely high values of N for problems such as code generation. Generating a large volume of responses, however, is only a small part of the solution, as we need a way for the system to select the best of those N solutions.
This is where the problem of verification comes in! For some cases, this is more immediately obvious: Does the code run and pass tests? For others, it can be more complex and may rely on a reward model or some other more complex verification process.
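Here is a minimal sketch of Best-of-N with a pluggable verifier; `verify` is a hypothetical placeholder, and a real verifier might run unit tests (for code) or query a reward model (for open-ended text).

```python
# Sketch of Best-of-N with a pluggable verifier: sample N candidates and
# return one that passes a domain-specific check.

def verify(candidate: str) -> bool:
    # For code: run the tests. For open text: query a reward model.
    raise NotImplementedError("plug in a domain-specific verifier")

def best_of_n(problem: str, n: int = 16) -> str | None:
    candidates = [llm(f"Solve:\n{problem}") for _ in range(n)]
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None  # no candidate passed verification
```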
Interacting with Think-Critique-Improve
Instead of approaching the problem through the lens of “spending more time thinking” without feedback, approaches such as Think-Critique-Improve take advantage of a more interactive process to generate robust responses. In simple terms, the pipeline is as follows (a minimal sketch follows the list):
- Think: Generate N samples, similar to Best-of-N approaches.
- Generate feedback: For each of those samples, generate X feedback responses using a specialized model, then filter out non-useful responses and select the Top-k based on some heuristic.
- Edit: For each of the N samples, along with their Top-k feedback responses, a specialized editor model incorporates the feedback by editing the base model’s response.
- Select: Finally, a select model chooses the final response from the N edited candidates produced by the pipeline.
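Under those assumptions, the pipeline could look like the following sketch; the feedback, editor, and select models are collapsed into the same hypothetical `llm` helper for brevity, and the usefulness filter and prompts are illustrative heuristics, not the method's actual implementation.

```python
# Sketch of the Think-Critique-Improve pipeline described above.

def think_critique_improve(problem: str, n: int = 4, x: int = 3, top_k: int = 2) -> str:
    # Think: generate N samples, as in Best-of-N.
    samples = [llm(f"Solve:\n{problem}") for _ in range(n)]
    edited = []
    for sample in samples:
        # Generate feedback: X critiques per sample; keep the Top-k useful ones.
        feedback = [llm(f"Critique this answer:\n{sample}") for _ in range(x)]
        useful = [f for f in feedback if "no issues" not in f.lower()][:top_k]
        # Edit: fold the selected feedback back into the sample.
        notes = "\n".join(useful)
        edited.append(llm(f"Answer:\n{sample}\nFeedback:\n{notes}\nRevise accordingly."))
    # Select: choose the final response from the edited candidates.
    menu = "\n\n".join(f"[{i}] {e}" for i, e in enumerate(edited))
    pick = llm(f"Problem:\n{problem}\nCandidates:\n{menu}\nReply with the best index.")
    try:
        return edited[int(pick.strip())]
    except (ValueError, IndexError):
        return edited[0]
```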
This approach is more similar to a group working on a problem together, as opposed to a single person thinking through a problem for a long time.
Whereas the other methods rely on verifiable problems (code, math, and logical reasoning) during their training or implementation, this method excels at open-ended problems that aren’t only about getting the right answer.
Next steps
With the rapid pace of advancement in models and techniques for creating business value, enterprises need to focus on time to market and on polishing their features and techniques.
In this environment, solutions like NVIDIA Blueprints fast-track enterprises in building applications that enable their users. Your enterprise can ensure it has efficient, secure, and reliable infrastructure by using easy-to-use NVIDIA NIM.
Developers can get started today by downloading the latest NVIDIA Llama Nemotron models from Hugging Face or trying out the Build an AI Agent for Research and Reporting NVIDIA AI Blueprint.
To read more about LLM agents, see other blogs in this series: