AI Agent Secrets Management in Production

A platform team had built a production AI agent that could file support tickets, update CRM records, send emails, and post in Slack. It worked. Then someone asked: “Whose credentials is the agent using?” The honest answer was “the same service account everyone shares, with a long-lived API key in an environment variable, scoped to everything the agent might ever want to do.” Three different agents on the team used the same credential. None of them could be individually revoked. None of them had an audit trail anyone could trust. None of them stopped being authorized when the engineer who created them left.

This is the default state of AI agent credentials in 2026. Most production agents authenticate the way scripts authenticated in 2014: long-lived secrets, broad scopes, shared identities, no per-action context. Gartner predicts that 40% of enterprise applications will include task-specific AI agents by the end of 2026, up from less than 5% in 2025. That is a non-human identity explosion the existing toolchain was not built for.

This guide is for engineering teams putting AI agents into production. It covers how to structure a secrets vault for agents, when to use OAuth versus API keys, how to provision credentials just-in-time, and how to implement RBAC for non-human identities. It is part of the broader question of why your AI experiments are failing — credential design is one of the layers of the system underneath the chat box that determines whether your agents are an asset or a liability.

Why AI Agent Credentials Are Different

AI agents break three assumptions baked into traditional secrets management.

Agents are not predictable consumers. A web service that calls Stripe calls Stripe — every time, for the same reason. An AI agent might call Stripe today and the CRM tomorrow, depending on what a user asks. You cannot pre-issue exactly the right scoped credential because you do not know which actions the agent will need to take until the model decides.

Agents act on behalf of humans and as themselves. A scheduling agent might use a user’s OAuth token to read their Google Calendar, then use its own service identity to write to an internal logistics system. Both authentications happen in the same request. The credential model has to support both delegated and non-human authority.

Agents are easy to spawn. Engineers create new agents the way they create new Lambda functions. If every new agent requires a security review and a new credential, the security review becomes the bottleneck and engineers route around it by sharing existing credentials. The credential system has to be built for high agent throughput from the start.

The result is that traditional secrets managers — HashiCorp Vault, AWS Secrets Manager, CyberArk Conjur — are necessary but not sufficient. They handle storage and rotation well. They were not designed for dynamic, per-request authorization decisions about non-human identities making variable downstream calls. As Aembit’s analysis of agentic AI secrets management puts it, dynamic access management — not traditional vaults — is the job for agentic AI credentials.

Structuring the Vault: One Identity Per Agent

The first design decision is also the most violated rule in production: every agent instance gets its own identity, and identities are not shared across agents.

What “instance” means depends on architecture. A reasonable definition:

One identity per agent class. The “support triage agent” and the “billing reconciliation agent” are different identities even if they share code.
One identity per deployment environment. Dev, staging, and production each get distinct identities. Cross-environment credential reuse is how staging incidents become production incidents.
Optionally, one identity per tenant. In multi-tenant systems, the agent acting for Tenant A is a different non-human identity than the agent acting for Tenant B, so that revoking one tenant does not affect the others. This couples to your multi-tenant AI application architecture directly.

The point is operational, not theoretical. When something goes wrong, you need to revoke the smallest possible blast radius. If five agents share one credential, revoking it takes all five offline. If each has its own, you revoke one.

Inside the vault, structure agent secrets hierarchically rather than as a flat namespace:

secrets/
  agents/
    support-triage/
      production/
        oauth/zendesk
        oauth/slack
        api/internal-knowledge-base
      staging/...
    billing-reconciliation/
      production/...

This makes blast-radius reasoning explicit. A compromise of one agent in one environment exposes a clearly bounded set of credentials: everything under that agent's environment path. Flat key-value naming makes this reasoning impossible.
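A minimal sketch of that reasoning, assuming the path layout shown in the tree above. The helper names `secret_path` and `blast_radius` are hypothetical, the `ledger` service is invented for illustration, and the in-memory list stands in for whatever listing call a real vault provides.

```python
from typing import Iterable

def secret_path(agent: str, env: str, kind: str, service: str) -> str:
    """Build a path such as secrets/agents/support-triage/production/oauth/zendesk."""
    for part in (agent, env, kind, service):
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return f"secrets/agents/{agent}/{env}/{kind}/{service}"

def blast_radius(all_paths: Iterable[str], agent: str, env: str) -> list[str]:
    """Return every secret exposed if one agent in one environment is compromised."""
    prefix = f"secrets/agents/{agent}/{env}/"
    return sorted(p for p in all_paths if p.startswith(prefix))

# Stand-in for a vault listing (assumption: a real vault exposes a similar list call).
paths = [
    secret_path("support-triage", "production", "oauth", "zendesk"),
    secret_path("support-triage", "production", "oauth", "slack"),
    secret_path("support-triage", "production", "api", "internal-knowledge-base"),
    secret_path("support-triage", "staging", "oauth", "zendesk"),
    secret_path("billing-reconciliation", "production", "api", "ledger"),  # hypothetical
]
exposed = blast_radius(paths, "support-triage", "production")
```

With a flat namespace there is no prefix to filter on, so the same question requires a hand-maintained inventory.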

A few vault patterns that hold up in production:

Short lease times for everything. Default to minutes, not days. The cost is more renewals; the benefit is that a leaked credential usually expires before an attacker can use it to move laterally.
Audit trail on read, not just write. Every credential read by an agent is logged with the identity that read it, the time, and the calling context. Reads without context are unauditable.
Programmatic-only access for agent identities. Agents should not have console access to the vault. Humans rotating secrets should not use agent identities.
Separate vaults or namespaces for human and non-human identities. Human credentials are managed under one set of policies; agent credentials under another. Mixing them produces escalation paths.
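The first two patterns can be sketched together. This is an illustration under stated assumptions, not a real vault client: `AuditedVault` and `Lease` are hypothetical names, the secrets live in a plain dictionary, and the 300-second default is only an example of a minutes-scale lease.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Lease:
    value: str
    expires_at: float

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at

@dataclass
class AuditedVault:
    secrets: dict
    lease_seconds: int = 300  # minutes, not days
    audit_log: list = field(default_factory=list)

    def read(self, path: str, identity: str, context: dict,
             now: Optional[float] = None) -> Lease:
        now = time.time() if now is None else now
        granted = bool(context) and path in self.secrets
        # Log every read attempt, including the ones that are refused.
        self.audit_log.append({
            "identity": identity, "path": path, "time": now,
            "context": context, "granted": granted,
        })
        if not context:
            raise PermissionError("reads without calling context are rejected")
        if path not in self.secrets:
            raise KeyError(path)
        return Lease(self.secrets[path], now + self.lease_seconds)
```

Passing `now` explicitly keeps the sketch easy to test; a production system would rely on the vault's own clock and lease machinery.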

OAuth vs API Keys: When to Use Which

The most consequential per-integration choice is OAuth vs API key. The decision is not “OAuth is more secure” — both can be secure. The decision is whose authority the agent is acting under.

Use OAuth when the agent acts on behalf of a specific human

The user delegated authority to the agent, and the downstream system (Google Workspace, Salesforce, GitHub, Slack as a user) needs to see that delegation. OAuth 2.1 with delegated scopes is the right answer. The agent receives a short-lived access token bound to the user, refreshes it via a refresh token held in the vault, and can have its authorization revoked by the user without affecting other users.

What this requires:

A per-user token store, scoped by user identity and by the downstream service.
A refresh flow the agent can execute without human interaction.
Scope minimization: ask for the smallest scope the agent actually needs, not the largest one the integration offers.
Token revocation handling: when a token is revoked, the agent fails gracefully and surfaces a re-authorization request, not a 500 error.
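A rough sketch of the per-user store and the graceful-failure requirement. All names here are hypothetical, and the `refresh` callable is an assumption standing in for the provider's refresh endpoint; it is expected to raise `PermissionError` when the provider rejects the refresh token.

```python
from dataclasses import dataclass
from typing import Callable

class ReauthorizationRequired(Exception):
    """Signals the caller to ask the user to reconnect instead of returning a 500."""

@dataclass
class TokenRecord:
    access_token: str
    refresh_token: str
    expires_at: float

class UserTokenStore:
    def __init__(self, refresh: Callable[[str], TokenRecord]) -> None:
        # Keyed by (user, downstream service) so one user's revocation affects nobody else.
        self._records: dict[tuple[str, str], TokenRecord] = {}
        self._refresh = refresh  # hypothetical wrapper around the provider's refresh call

    def put(self, user_id: str, service: str, record: TokenRecord) -> None:
        self._records[(user_id, service)] = record

    def access_token(self, user_id: str, service: str, now: float) -> str:
        key = (user_id, service)
        record = self._records.get(key)
        if record is None:
            raise ReauthorizationRequired(f"{user_id} has not connected {service}")
        if now >= record.expires_at:
            try:
                record = self._refresh(record.refresh_token)  # no human in the loop
            except PermissionError:
                # The grant was revoked: forget the token and surface a re-auth request.
                del self._records[key]
                raise ReauthorizationRequired(f"{user_id} must re-authorize {service}") from None
            self._records[key] = record
        return record.access_token
```

The agent runtime can catch `ReauthorizationRequired` at the tool boundary and turn it into a user-facing prompt.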

Use API keys (or service credentials) when the agent acts as itself

The agent is calling an internal service, a B2B API, or a downstream system that does not represent any individual user. The credential is the agent’s own non-human identity. API keys, service account tokens, signed JWTs, or mTLS client certs all work — the principle is the same: the credential represents the agent, not a user, and is scoped to the agent’s purpose.

When you use API keys for agents, the rules tighten:

No shared keys. One key per agent identity. Shared keys cannot be individually revoked.
Scoped to the action, not the system. “Read tickets in the support queue” not “full Zendesk admin.”
Rotatable without code change. The vault provides the current key; the agent reads it on demand. Embedded keys in code or environment variables are how breaches persist for years.
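To illustrate "rotatable without code change", here is a small sketch in which the agent asks for its key at call time. `AgentKeyVault` is a hypothetical in-memory stand-in for a real secrets manager, and `auth_headers` is an illustrative helper.

```python
class AgentKeyVault:
    """In-memory stand-in for a real secrets manager (assumption for this sketch)."""

    def __init__(self) -> None:
        self._keys: dict[str, str] = {}  # one key per agent identity, never shared

    def rotate(self, agent_identity: str, new_key: str) -> None:
        self._keys[agent_identity] = new_key

    def current_key(self, agent_identity: str) -> str:
        if agent_identity not in self._keys:
            raise PermissionError(f"no key issued for {agent_identity}")
        return self._keys[agent_identity]

def auth_headers(vault: AgentKeyVault, agent_identity: str) -> dict[str, str]:
    # Read at call time: nothing is cached at import or baked into the environment,
    # so a rotation takes effect on the next request without a redeploy.
    return {"Authorization": f"Bearer {vault.current_key(agent_identity)}"}
```

Because each identity has its own entry, rotating or deleting one agent's key leaves every other agent untouched.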

Use both, at the same time, when appropriate

The same agent execution can use a user’s OAuth token for one tool call and its own service credential for the next. This is normal. The credential resolver has to make the distinction per call, based on what the tool needs.
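One possible shape for such a resolver, as a sketch. The `Authority`, `ToolSpec`, and `Credential` types and the `users/...` path are assumptions made for illustration; the agent path reuses the vault layout described earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Authority(Enum):
    DELEGATED = "delegated"  # acts for a user: needs that user's OAuth token
    AGENT = "agent"          # acts as itself: needs the agent's service credential

@dataclass(frozen=True)
class ToolSpec:
    name: str
    authority: Authority

@dataclass(frozen=True)
class Credential:
    kind: str        # "oauth" or "service"
    subject: str     # whose authority the call carries
    secret_ref: str  # where the secret lives; the secret itself is fetched later

def resolve(tool: ToolSpec, agent_id: str, env: str,
            user_id: Optional[str]) -> Credential:
    """Decide, per call, whose authority the tool call should carry."""
    if tool.authority is Authority.DELEGATED:
        if user_id is None:
            raise PermissionError(f"{tool.name} needs a user to delegate from")
        return Credential("oauth", user_id, f"users/{user_id}/oauth/{tool.name}")
    return Credential("service", agent_id,
                      f"secrets/agents/{agent_id}/{env}/api/{tool.name}")
```

For the scheduling example above, a calendar tool would be declared `DELEGATED` and the internal logistics tool `AGENT`, so the two calls in one request resolve to different subjects.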

The Default-Deny Rule for Agent Permissions

The single most important policy decision for agent credentials: a tool is unavailable to an agent until its credentials, scope, and authorization are explicitly granted. The opposite default — agents inherit whatever the service account can do — is how a support agent ends up able to delete production database tables. Default-deny at the credential resolver, and grant tools per agent identity, per environment.
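A minimal sketch of default-deny at the point where an agent obtains a tool. `ToolRegistry` is a hypothetical name; the key property is that registering a tool grants nothing, and a grant is always specific to one agent identity in one environment.

```python
from typing import Callable

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., object]] = {}
        self._grants: set[tuple[str, str, str]] = set()  # (agent, environment, tool)

    def register(self, name: str, fn: Callable[..., object]) -> None:
        # Registration makes the tool exist; it does not make it available to anyone.
        self._tools[name] = fn

    def grant(self, agent_id: str, env: str, tool: str) -> None:
        if tool not in self._tools:
            raise KeyError(tool)
        self._grants.add((agent_id, env, tool))

    def get(self, agent_id: str, env: str, tool: str) -> Callable[..., object]:
        # Default-deny: no explicit grant means no tool.
        if (agent_id, env, tool) not in self._grants:
            raise PermissionError(f"{agent_id} has no grant for {tool} in {env}")
        return self._tools[tool]
```

A support agent granted only a ticket-reading tool simply cannot obtain a table-dropping tool, no matter what the underlying service account could do.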

Just-In-Time Credential Provisioning

The pattern that holds up under agentic workloads is just-in-time (JIT) credentials: secrets that exist only for the duration of a single task or session, issued at the moment of need and expired immediately after.

A JIT flow looks like this:

The agent decides it needs to call Tool X.
The credential resolver checks: is this agent identity authorized to call Tool X right now, given its context (tenant, user, environment, action)?
If yes, the resolver mints a short-lived credential (a signed token, a temporary API key, a one-time secret) and returns it.
The agent makes the call with the credential.
The credential expires automatically — measured in seconds or minutes, not days.
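The steps above can be sketched with an HMAC-signed token from the Python standard library. This is an illustration, not a substitute for a vetted JWT library: the signing key, the `AUTHORIZED` grant set, the tool name, and the 120-second lifetime are all assumptions.

```python
import base64
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-only-key"  # assumption: a real resolver loads this from the vault
AUTHORIZED = {("support-triage", "zendesk.read_tickets")}  # hypothetical grants

def _sign(payload: bytes) -> bytes:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()

def issue(agent_id: str, tool: str, now: float, ttl_seconds: int = 120) -> str:
    """Steps 2-3: authorize the request, then mint a token that expires on its own."""
    if (agent_id, tool) not in AUTHORIZED:
        raise PermissionError(f"{agent_id} is not authorized for {tool}")
    payload = json.dumps({"sub": agent_id, "tool": tool, "exp": now + ttl_seconds}).encode()
    return ".".join(base64.urlsafe_b64encode(part).decode()
                    for part in (payload, _sign(payload)))

def verify(token: str, tool: str, now: float) -> dict:
    """Steps 4-5: the called tool checks signature, intended tool, and expiry."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(sig_b64)
    except ValueError as exc:
        raise PermissionError("malformed token") from exc
    if not hmac.compare_digest(signature, _sign(payload)):
        raise PermissionError("bad signature")
    claims = json.loads(payload)
    if claims["tool"] != tool:
        raise PermissionError("token was issued for a different tool")
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    return claims
```

Each call to `issue` is a discrete authorization decision, which is the natural place to write an audit record.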

The benefits compound:

A leaked credential has minutes of useful life, not months.
Every credential issuance is an authorization decision that can be logged, monitored, and revoked.
The set of valid credentials at any moment is small and bounded.
Compliance auditors can answer “what was this agent authorized to do at 14:32?” — because the answer was a discrete decision recorded at 14:32.

The challenge is that not every downstream service supports short-lived credentials. Internal services you control should — use signed JWTs with minutes-long expiries. SaaS APIs often only support long-lived API keys. The pragmatic pattern is to store the long-lived key in the vault but issue short-lived internal tokens to agents, with the vault as the trust broker between the long-lived secret and the agent’s request.
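A sketch of that broker arrangement, under assumptions: `CredentialBroker` is a hypothetical name, the `send` callable stands in for whatever performs the outbound HTTP request, and authorization checks are omitted to keep the example short. The agent only ever holds the short-lived internal token; the long-lived key is attached inside the broker.

```python
import secrets
from typing import Callable

class CredentialBroker:
    def __init__(self, long_lived_keys: dict[str, str], ttl_seconds: int = 120) -> None:
        self._keys = long_lived_keys  # in practice these stay in the vault
        self._ttl = ttl_seconds
        # internal token -> (agent identity, service, expiry time)
        self._sessions: dict[str, tuple[str, str, float]] = {}

    def issue(self, agent_id: str, service: str, now: float) -> str:
        if service not in self._keys:
            raise PermissionError(f"no key held for {service}")
        token = secrets.token_urlsafe(32)  # random, carries no secret material
        self._sessions[token] = (agent_id, service, now + self._ttl)
        return token

    def call(self, token: str, service: str,
             send: Callable[[dict[str, str]], object], now: float) -> object:
        session = self._sessions.get(token)
        if session is None or session[1] != service:
            raise PermissionError("unknown token or wrong service")
        if now >= session[2]:
            del self._sessions[token]
            raise PermissionError("internal token expired")
        # The long-lived key is attached here and never returned to the agent.
        return send({"Authorization": f"Bearer {self._keys[service]}"})
```

If the internal token leaks, it is useful only through the broker and only until it expires; the SaaS key itself never leaves the trust boundary.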

This pattern is consistent with the direction of frameworks such as SOC 2, ISO/IEC 27001:2022, ISO/IEC 42001:2023, NIST AI RMF, NIST SP 800-207 Zero Trust, and the OWASP Top 10 for Agentic Applications, whose access-control guidance points toward least-privilege, context-aware authorization rather than standing credentials.

RBAC for Agents: What Role-Based Access Looks Like for Non-Human Identities

RBAC was designed for humans. The roles read like job titles. Translating it to AI agents requires a few adaptations.

Roles bind to agent identities, not user identities

The role is attached to the agent class (or the agent instance), not to the engineer who deployed it. When the engineer leaves the company, the agent’s role does not change. This is the opposite of how a lot of teams operate today, where “the agent uses my personal credentials” is a common shortcut.

Roles are scoped to actions, not systems

“Support Agent” is a system-shaped role. “Read open tickets, post replies, escalate to human” is an action-shaped role. The latter is enforceable; the former is decorative. Agent RBAC works when it lists the verbs the agent is authorized to perform, not the systems it can touch.

Roles compose with tenant context

In multi-tenant deployments, the role grants action X for this tenant’s data only. A support agent authorized to read tickets is authorized to read this tenant’s tickets, not all tickets. This requires the authorization layer to combine the agent identity, the tenant context, and the action into a single decision.
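As a sketch of that single decision, assuming roles are expressed as sets of verbs: the `Role` and `AccessRequest` types and the action names are hypothetical, chosen to mirror the ticket example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    name: str
    version: int
    actions: frozenset  # verbs, not systems

@dataclass(frozen=True)
class AccessRequest:
    agent_id: str
    agent_tenant: str     # tenant the agent identity is bound to
    action: str
    resource_tenant: str  # tenant that owns the data being touched

def authorize(role_bindings: dict[str, Role], request: AccessRequest) -> bool:
    """Combine agent identity, action, and tenant context into one yes/no answer."""
    role = role_bindings.get(request.agent_id)  # bound to the agent, not an engineer
    if role is None:
        return False
    if request.action not in role.actions:
        return False
    return request.agent_tenant == request.resource_tenant

bindings = {
    "support-triage@tenant-a": Role(
        "support-triage", 3,
        frozenset({"tickets.read_open", "tickets.post_reply", "tickets.escalate"}),
    )
}
```

A request to read Tenant B's tickets fails even though `tickets.read_open` is in the role, because the tenant check is part of the same decision rather than a separate filter applied later.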

Roles change over the agent’s lifetime

A new agent in pilot might have a narrow role with a human approval gate on writes. A mature agent might have its writes auto-approved. The role is versioned. Promotions are reviewed. Demotions are immediate.

Roles include a kill switch per tool

Every role should let you revoke a single tool without taking the agent offline. When the email integration is misbehaving, on-call switches off email for that role and leaves the rest of its permissions intact. Coarse-grained roles produce coarse-grained incident response.
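A small sketch of a per-tool kill switch layered on a role. `AgentRole` and the tool names are hypothetical; the point is that disabling one tool is a separate, reversible operation from editing the role's grants.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRole:
    name: str
    tools: frozenset                            # what the role grants
    disabled: set = field(default_factory=set)  # what on-call has switched off

    def kill(self, tool: str) -> None:
        if tool not in self.tools:
            raise KeyError(tool)
        self.disabled.add(tool)

    def restore(self, tool: str) -> None:
        self.disabled.discard(tool)

    def allows(self, tool: str) -> bool:
        # Still default-deny: a tool must be granted and not switched off.
        return tool in self.tools and tool not in self.disabled

role = AgentRole("support-triage", frozenset({"tickets.read", "email.send"}))
role.kill("email.send")  # incident response: stop email, keep everything else
```

Keeping the switch separate from the grant list means restoring the tool after the incident does not require re-reviewing the role.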

Build Production-Grade Agent Identity From the Start

Agent credentials are a non-human identity problem the existing toolchain was not built for. Talk with our team about structuring vaults, OAuth vs API keys, just-in-time provisioning, and RBAC for AI agents — before you have hundreds of agents sharing one long-lived key.

Common Failure Modes and the Design Responses

A short list of agent credential failures we see repeatedly, and what to do instead.

Long-lived API keys in environment variables. The agent reads OPENAI_API_KEY and STRIPE_KEY from process.env. Rotation requires a redeploy. Auditing is impossible. Replace with vault-backed JIT credentials that the agent fetches on demand.

One service account for all agents. Every agent authenticates as agents@company.internal. No revocation granularity, no audit trail, no blast-radius control. Replace with per-agent identities.

OAuth tokens stored in the agent’s working memory. The agent receives a user’s OAuth token and keeps it in a variable that lives for the session. If the agent crashes and restarts, the token survives in logs or state stores. Replace with a per-user encrypted token store that the agent reads from on each call.

Tool credentials granted by default. New tools attached to an agent inherit “use the agent’s full scope.” A new file-system tool ends up with database credentials. Replace with default-deny: each tool requires explicit credential grants per agent.

Human credentials reused for agents. An engineer’s GitHub personal access token becomes the agent’s credential. When the engineer leaves, the agent stops working — or worse, the credential survives and is reused. Replace with non-human identities issued from the start.

No audit trail on credential use. Vault reads are not logged with calling context. When investigating an incident, you cannot tell which agent used which credential when. Replace with structured audit logs that capture identity, timestamp, calling context, and scope.

Credentials shared across environments. The same Stripe key works in dev, staging, and production. A bug in dev that “tests against real Stripe” charges real customers. Replace with environment-scoped identities and credentials.

How Agent Credentials Connect to the Rest of the Production AI Stack

Agent identity is not a standalone problem. It sits at the intersection of several other production AI concerns.

MCP servers. The Model Context Protocol is rapidly becoming the standard way agents expose tools. MCP servers need their own credential model — both for the agent authenticating to the MCP server, and for the MCP server authenticating to the downstream system on the agent’s behalf. We cover this in building MCP servers for production AI agents.
Multi-tenancy. The tenant identity and the agent identity must both flow to every downstream call, and the authorization decision combines both. Without explicit propagation, you get cross-tenant leaks; without explicit per-tenant credentials, you cannot revoke a tenant without revoking everyone. See our guide to multi-tenant AI application architecture.
Audit and compliance. Regulated industries require provable answers to “what was this agent authorized to do, by whom, at what time, for what purpose?” Just-in-time credentials make those answers possible. Long-lived keys make them impossible.
Cost and rate-limit attribution. When agents share a credential, you cannot tell which agent spent which dollar. Per-agent identities are a prerequisite for cost attribution as much as for security. The companion problem of provider quotas and per-tenant token budgets is covered in LLM rate limiting and token quotas in production.

What Good Looks Like

A production-ready agent credential architecture has these properties:

Every agent instance has its own non-human identity, recorded in a registry, with an owner and a purpose.
No agent uses a long-lived API key embedded in code or environment variables.
The credential resolver decides per call whether to use OAuth (delegated authority) or a service credential (the agent’s own authority).
Credentials are issued just-in-time with minute-scale expirations.
Tools are default-deny; an agent cannot call a tool until its credentials, scope, and authorization are explicitly granted.
Audit logs capture every credential issuance with identity, time, scope, and calling context.
Per-tool kill switches let on-call revoke a single capability without taking the agent offline.
Agent roles are versioned; promotions are reviewed, and demotions take effect immediately.

This is one layer of the system underneath the chat box — the gap between the prompt and the product. It is rarely the layer demos focus on, which is exactly why it is the layer that quietly breaks first in production. Helping teams build it correctly is part of what we do as Operational AI.

Frequently Asked Questions

Can I use HashiCorp Vault or AWS Secrets Manager for AI agent credentials?

Yes, but not alone. Traditional vaults handle secure storage and rotation well. They were not designed for dynamic, per-request authorization decisions about non-human identities making variable downstream calls. The pragmatic pattern is to use a vault as the trust broker for long-lived underlying secrets, but issue short-lived just-in-time tokens to agents at request time. The vault is necessary; it is not sufficient.

When should I use OAuth versus API keys for an AI agent?

Use OAuth when the agent acts on behalf of a specific human user — the downstream system needs to see the delegation, and the user needs to be able to revoke the agent's access. Use API keys or service credentials when the agent acts as itself, calling internal systems or B2B APIs that do not represent any individual user. Many agents need both: a user's OAuth token for one tool call, the agent's own service credential for the next.

What is a non-human identity for an AI agent?

A non-human identity (NHI) is a distinct, named identity that represents the agent itself — not the engineer who deployed it, not a shared service account, not a generic 'agents' principal. It has its own credentials, its own audit trail, its own role assignments, and its own lifecycle. Treating each agent as a distinct NHI is what makes per-agent revocation, fine-grained audit, and blast-radius control possible.

How short should just-in-time credentials live?

Minutes, not days, with seconds preferred where the downstream service supports it. The benefit of JIT credentials scales inversely with their lifetime: a five-minute token leaked at 14:00 is useless by 14:05; a one-month token leaked at the start of the month is dangerous for thirty days. Set the default to the shortest expiry the downstream system tolerates without breaking the agent's workflow.

How do I implement RBAC for AI agents in practice?

Bind roles to agent identities (not to the engineers who created them). Define roles as the verbs the agent is authorized to perform, not the systems it can touch. Combine the agent identity with tenant context at the authorization decision. Version roles and review them on promotion. Build per-tool kill switches so you can revoke one capability without taking the agent offline. Default-deny: a tool is unavailable to an agent until explicitly granted.

What is the worst common mistake in AI agent secrets management?

Sharing one long-lived API key across multiple agents in environment variables. It is fast to set up, impossible to audit, impossible to revoke granularly, and impossible to attribute. When something goes wrong, you cannot tell which agent did it, you cannot rotate without taking everything down, and you cannot prove to a compliance auditor what was authorized. Replace it with per-agent non-human identities and just-in-time credentials before you have more than a handful of agents in production.
