
Add a Specialized Deep Research Skill to Agent Harnesses


Agent harnesses like Claude Code, Codex, and LangChain Deep Agents are excellent orchestrators. They manage sessions, chain tools, execute code, and respond to developer intent. But when these harnesses need to do deep research, such as multi-document synthesis, decision briefs backed by enterprise data, or long-horizon analysis with source attribution, the complexity shifts back onto the developer.

Teams building these agents must ground them in enterprise data, which means connecting data sources, routing queries, managing authentication, tuning prompts, evaluating outputs, and preserving source attribution. NVIDIA AI-Q packages this work into an open-source deep research blueprint that can be exposed to agent harnesses as a portable agent skill.

With this skill, an agent harness delegates a research task to a local or hosted AI-Q server and receives a structured report in return. The harness doesn’t need to own the research pipeline. Sensitive source data can remain inside the enterprise environment, which is critical in regulated industries such as healthcare, financial services, government, and defense.

What is the AI-Q skill?

The AI-Q skill enables Claude Code, Codex, or other general-purpose agents to submit a research task to a running AI-Q server and receive a well-formatted, detailed report with citations. The skill includes a SKILL.md file that tells the harness how to use AI-Q, plus a helper script that manages request routing, job submission, polling, and result retrieval.

A skill can mean different things in agent workflows. Agent skills guide the harness, the NVIDIA NeMo Agent Toolkit helps define reusable tool functions, and the AI-Q Agent Skill exposes the full research pipeline (intent classification, clarification, shallow research, deep research, and evaluation) as a higher-level capability. Together, these layers let the agent delegate research without rebuilding retrieval, planning, synthesis, and citation logic inside each harness.

Video 1. A Codex agent delegating a multi-data-source research task to AI-Q as a skill

Installing the AI-Q agent skill

The packaged skill lives in the AI-Q GitHub repository at .agents/skills/aiq-research/, with SKILL.md at its root. The scripts/aiq.py helper handles routed /chat requests and manages async deep research jobs against a running AI-Q server. It targets http://localhost:8000 by default, which can be overridden with AIQ_SERVER_URL.

Prerequisites:

  • Python 3.10 or newer
  • A running AI-Q Blueprint server, local or hosted, that is reachable from the harness

Claude Code

Claude Code loads repo-local skills from .claude/skills/. To ensure compatibility with Claude, manually link the AI-Q skill to a workspace using these two commands:

mkdir -p .claude/skills
ln -s ../../.agents/skills/aiq-research .claude/skills/aiq-research

For a user-level install that works across repos:

mkdir -p ~/.claude/skills
cp -R .agents/skills/aiq-research ~/.claude/skills/aiq-research

Codex

Drop the skill into the harness’s configured skills directory:

mkdir -p <codex-skills-dir>
cp -R .agents/skills/aiq-research <codex-skills-dir>/aiq-research

OpenCode

OpenCode loads user skills from ~/.config/opencode/skills/:

mkdir -p ~/.config/opencode/skills
cp -R .agents/skills/aiq-research ~/.config/opencode/skills/aiq-research

Restart the session, then verify with:

python3 scripts/aiq.py
# Usage: aiq.py <command> [args]

Note: This skill requires a running AI-Q server. Check out the Getting Started guide for detailed instructions on spinning one up, including instructions on how to obtain API keys for inference and web search services.

Once installed, the agent harness sees a single deep research capability. Phrases like “research the regulatory landscape for X across our internal policy docs and produce a memo” route through the skill, which submits a job to the AI-Q server, polls for completion, and returns a structured report with citations.
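The submit-and-poll flow described above can be sketched in a few lines of Python. This is not the code in scripts/aiq.py. The submit_job and fetch_status callables are hypothetical stand-ins for the server calls, and the {"state": ..., "report": ...} status shape is an assumption. Only the AIQ_SERVER_URL variable and its http://localhost:8000 default come from the skill itself.

```python
import os
import time
from typing import Callable

DEFAULT_SERVER_URL = "http://localhost:8000"  # default stated for the helper script


def resolve_server_url(env=None) -> str:
    """Return AIQ_SERVER_URL if set and non-empty, else the local default."""
    env = os.environ if env is None else env
    return (env.get("AIQ_SERVER_URL") or DEFAULT_SERVER_URL).rstrip("/")


def run_research_job(
    submit_job: Callable[[str], str],
    fetch_status: Callable[[str], dict],
    query: str,
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit a job, then poll until it completes, fails, or times out.

    submit_job and fetch_status are hypothetical: they stand in for whatever
    HTTP calls the real helper makes. The status dict shape is also assumed.
    """
    job_id = submit_job(query)
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        state = status.get("state")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"research job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"research job {job_id} still {state!r} after {timeout}s")
        sleep(poll_interval)
```

Passing the server calls in as callables keeps the loop independent of any particular endpoint layout, and the injectable sleep makes it easy to exercise without waiting.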

Secure MCP integration: AI-Q as an MCP client

The other half of the enterprise story is data access. This new release of AI-Q adds first-class support for connecting to authenticated MCP servers as data sources, so research pipelines can pull from the same enterprise systems agents already use, without standing up a parallel retrieval stack.

AI-Q is built on the NeMo Agent Toolkit, so MCP servers plug in as NeMo Agent Toolkit function groups. This release documents three integration patterns end-to-end:

Scenario                                            | Pattern
----------------------------------------------------|------------------------------------------
MCP server has no per-user auth                     | mcp_client function group
MCP server uses backend / app credentials           | mcp_client + mcp_service_account
Downstream API trusts the AI-Q user’s bearer token  | Custom AI-Q tool using get_auth_token()

Table 1. Common MCP server authentication scenarios mapped to the AI-Q integration pattern connecting each server as a NeMo Agent Toolkit function group

Unauthenticated MCP servers are the simplest case. Point an mcp_client function group at the server URL, and AI-Q discovers and registers the remote tools as NeMo Agent Toolkit functions:

function_groups:
  mcp_financial_tools:
    _type: mcp_client
    server:
      transport: streamable-http
      url: ${MCP_SERVER_URL:-http://localhost:9901/mcp}

Use streamable-http for new deployments. It’s required for protected MCP servers and recommended over sse for production auth scenarios.
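The url value in the config above uses a ${NAME:-default} placeholder. Assuming it follows the usual shell-style semantics (use the variable if it is set and non-empty, otherwise fall back to the default), the resolution can be illustrated with the sketch below. The expand_placeholders function is hypothetical and written only for this example. It is not how the NeMo Agent Toolkit parses its config.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_placeholders(text: str, env=None) -> str:
    """Expand shell-style ${NAME} and ${NAME:-default} placeholders in text."""
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is None:
            # Plain ${NAME}: the variable must exist.
            if value is None:
                raise KeyError(f"{name} is not set and has no default")
            return value
        # ${NAME:-default}: unset and empty both fall back to the default.
        return value if value else default

    return _PLACEHOLDER.sub(_substitute, text)
```

Under that assumption, leaving MCP_SERVER_URL unset points the function group at http://localhost:9901/mcp, while exporting it redirects the same config to another server without editing the file.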

Service-account MCP auth is the preferred pattern for CI, batch jobs, and shared enterprise data sources where access is governed at the application level rather than per user:

function_groups:
  mcp_enterprise_tools:
    _type: mcp_client
    server:
      transport: streamable-http
      url: ${ENTERPRISE_MCP_URL}
      auth_provider: enterprise_service_account

authentication:
  enterprise_service_account:
    _type: mcp_service_account
    client_id: ${SERVICE_ACCOUNT_CLIENT_ID}
    client_secret: ${SERVICE_ACCOUNT_CLIENT_SECRET}
    token_url: ${SERVICE_ACCOUNT_TOKEN_URL}
    scopes:
      - enterprise.read

For MCP servers that need both an OAuth2 service-account token and a service-specific delegation token, a service_token block adds a second header on the outbound call.
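The client_id, client_secret, token_url, and scopes fields in the service-account config map naturally onto an OAuth2 client-credentials exchange (RFC 6749, section 4.4). The post does not state which grant type mcp_service_account uses, so treat the following as a sketch of that standard flow rather than the toolkit's implementation. The function names are hypothetical, and the request is only built, never sent.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scopes):
    """Build (but do not send) an OAuth2 client-credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": " ".join(scopes),  # OAuth2 scopes are space-delimited
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_header(token_response_body: bytes) -> dict:
    """Turn a JSON token response into the Authorization header for MCP calls."""
    payload = json.loads(token_response_body)
    return {"Authorization": f"Bearer {payload['access_token']}"}
```

In a flow like this, the MCP server only ever sees the application's token, which is why the pattern suits shared data sources governed at the application level.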

Forwarding the signed-in AI-Q user’s identity is the currently supported pattern when a downstream API or MCP gateway already trusts the AI-Q user’s bearer token. AI-Q exposes aiq_agent.auth.get_auth_token(). The request token is captured at job-submit time and restored inside async Dask workers, so long-running deep research jobs keep the user’s identity context as they execute. Tokens aren’t refreshed mid-job, although an in-worker refresh is planned for the next release. Until then, jobs that outlive the access token’s TTL will fail on auth-required tool calls.
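Because tokens aren’t refreshed mid-job, it can help to check the remaining token lifetime before submitting a long job. The sketch below assumes the bearer token is a JWT with an exp claim, which the post does not guarantee. It reads the claim without verifying the signature, so it is only a pre-flight hint and never an authorization check. Both function names are hypothetical.

```python
import base64
import json
import time


def jwt_expiry(token: str) -> int:
    """Return the 'exp' claim (Unix seconds) of a JWT without verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token is not a three-part JWT")
    payload_b64 = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if "exp" not in claims:
        raise ValueError("token has no exp claim")
    return claims["exp"]


def token_outlives_job(token: str, expected_job_seconds: float, now=None) -> bool:
    """True if the token should still be valid when the job is expected to end."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - now >= expected_job_seconds
```

A harness could call token_outlives_job before delegating a deep research task and ask the user to re-authenticate when it returns False.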

A researcher deployed where your data lives

This integration is most powerful when AI-Q runs in the same environment as enterprise data. The AI-Q Blueprint includes Docker Compose files and Helm charts, so the same blueprint can run on a developer laptop, an on-premises or cloud-based Kubernetes cluster, or even an air-gapped data center.

For regulated industries, three deployment properties matter most:

  • The pipeline runs where the data is. AI-Q can read enterprise data, perform retrieval and synthesis, and create reports without raw documents leaving the controlled environment. This is critical for enterprises with data sovereignty requirements. The agent harness then receives the cited output, not direct access to the underlying sources.
  • Open models can be self-hosted, not just the pipeline. NVIDIA Nemotron open models can run on-prem as NVIDIA NIM, while cloud-based frontier models remain a fully configurable alternative. This enables teams to build flexible workflows: using a frontier model for complex orchestration and planning, routing sensitive research tasks to a self-hosted model, or disabling frontier models entirely to meet strict compliance needs.
  • Auditability is built into the pipeline, not bolted on after. AI-Q reports include source attribution, and NeMo Agent Toolkit emits OpenTelemetry traces. Compliance teams can inspect which sources were retrieved, how they were used, and how the final cited answer was produced.

For regulated teams, the practical implication is clear: an agent harness that shouldn’t directly access sensitive source data can still return research grounded in that data. AI-Q handles retrieval and synthesis inside the governed environment, while MCP authentication patterns preserve existing access controls.

A pipeline built for research, not adapted to it

Agent harnesses are designed around orchestration. When agents handle research without a dedicated research backend, a general-purpose agent or sub-agent has to assemble the research workflow on the fly. This works for lookups but can produce inconsistent results on tasks requiring enterprise multi-source synthesis, long-horizon planning, or citation accuracy.

AI-Q’s pipeline is engineered specifically for research quality. Each query routes through four stages:

  • An intent classifier determines research depth.
  • A human-in-the-loop clarifier resolves ambiguity before retrieval begins.
  • A shallow researcher handles well-scoped, quick lookups.
  • A deep researcher handles long-horizon synthesis across enterprise data sources.

Each stage is independently tuned and evaluated using established benchmarks: FreshQA, Deep Research Bench, and DeepSearchQA.
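The control flow between the four stages can be pictured roughly as follows. This is an illustrative sketch with hypothetical names (Depth, Pipeline, ask_user), not AI-Q's code. In particular, running the clarifier for every query regardless of depth is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Depth(Enum):
    SHALLOW = "shallow"
    DEEP = "deep"


@dataclass
class Pipeline:
    classify: Callable[[str], Depth]          # intent classifier
    clarify: Callable[[str], Optional[str]]   # returns a question, or None if clear
    shallow: Callable[[str], str]             # quick, well-scoped lookups
    deep: Callable[[str], str]                # long-horizon synthesis

    def run(self, query: str, ask_user: Callable[[str], str]) -> str:
        depth = self.classify(query)
        question = self.clarify(query)
        if question is not None:
            # Human-in-the-loop step: resolve ambiguity before any retrieval.
            answer = ask_user(question)
            query = f"{query}\nClarification: {answer}"
        researcher = self.deep if depth is Depth.DEEP else self.shallow
        return researcher(query)
```

Keeping each stage behind its own callable mirrors the point above: a stage can be tuned, swapped, or evaluated on its own without touching the others.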

AI-Q can use a hybrid model approach. Nemotron reasoning models handle planning and synthesis, while a configurable frontier-model router can be used for tasks that need additional capability. Teams can choose the model path that fits their cost, compliance, and performance requirements.

The same evaluation harnesses used for benchmark testing ship with the blueprint, so teams can measure quality on their own data. Reports also include source attribution, showing which sources were retrieved and how they contributed to the final answer.

This makes AI-Q a dedicated research backend for agent harnesses, rather than a general-purpose agent trying to assemble a research pipeline on the fly.

Get started

AI-Q is available as an open-source blueprint. Teams can add a reusable deep research capability to their agent harnesses today.

Get started by:

  • Spinning up the server: Head over to the AI-Q GitHub repository for quick-start deployment instructions using Docker Compose or Helm.
  • Connecting your data: Check out the official documentation on Adding a Data Source to securely hook up enterprise MCP servers.
  • Installing the skill: Run the setup commands detailed earlier in this post to link AI-Q to Claude Code, Codex, or OpenCode.

AI-Q is now validated on Dell AI Factory. For teams running on Dell infrastructure, the Dell-NVIDIA AI-Q 2.0 Reference Architecture, powered by the Dell AI Data Platform, packages the deployment patterns described above into a production-ready, on-premises multi-agent research workflow—purpose-built for regulated industries like financial services, public sector, and manufacturing.
