Don’t ship AI Agents without this
Security, governance, and all the things you really need to think about before releasing AI Agents into production
There’s a kind of magic to watching an AI agent handle a workflow autonomously. A few weeks ago, I watched a demo where an agent took a product spec, created Jira tickets, generated a Slack update, and then created a feature branch with boilerplate code in GitHub, all without a single click from a human. The whole thing took under 45 seconds.
It was fast. It was slick. It was terrifying.
Because the more autonomy we give these systems, the more we expose ourselves and our customers to a completely new class of risk. And let’s be honest: most of us are still figuring out how to build reliable human-owned systems. Giving software the freedom to act on its own, without the right safety checks, is like handing your intern the AWS root credentials and saying, “Try not to break anything.”
If you’re a product manager, engineering leader, or security partner evaluating the use of AI agents, here’s the uncomfortable truth:
You are not deploying a smarter chatbot. You’re deploying a decision-making system with access to real infrastructure, data, and users.
This post is my attempt to help you frame that decision the right way and avoid the mistakes so many teams are on the verge of making.
What is an AI Agent really?
Before we get into the complexities of governance and security, let’s start by clarifying something fundamental: What exactly is an "AI agent"?
If you’ve been following the AI space, you’ve probably heard the term thrown around a lot. But it’s worth pausing for a moment to define what we mean when we say “AI agent.”
At its core, an AI agent is a system that exhibits three key capabilities:
Observes: It can ingest and process external data, anything from documents and APIs to telemetry. It takes in the world around it.
Plans: Based on that data, it then makes decisions. It can use planning frameworks like ReAct or chain-of-thought to figure out what needs to happen next.
Acts: After processing and planning, it actually does something, whether that’s writing a file, making an API call, or sending a Slack message.
But that’s just the basic setup. AI agents can also do a lot more:
Persist memory: This means storing information such as facts or insights in a vector database, enabling the agent to "remember" things over time.
Use tools and plugins: Think integrations with platforms like GitHub, Jira, or Notion that extend the agent’s functionality and scope.
Operate in a loop: The agent can improve its outputs over time, revising and adjusting its actions until it hits the desired goal (a minimal sketch of this loop follows below).
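To make that concrete, here’s a minimal Python sketch of the observe-plan-act loop. The observe, plan, act, and goal_met helpers are placeholder stubs I’ve made up for illustration; a real agent would back them with actual data sources, an LLM planner, and real tool integrations.

```python
# A minimal observe-plan-act loop (illustrative only). The helpers below are
# trivial stand-ins for real data sources, an LLM planner, and tool calls.

def observe() -> dict:
    return {"tickets_open": 3}                    # stand-in for docs, APIs, telemetry

def plan(goal: str, observation: dict, history: list) -> str:
    return "summarize_tickets"                    # a real agent would call an LLM here

def act(action: str) -> str:
    return f"executed {action}"                   # a real agent would call a tool or API

def goal_met(goal: str, history: list) -> bool:
    return len(history) >= 1                      # toy stop condition for the sketch

def run_agent(goal: str, max_steps: int = 10) -> list[dict]:
    history: list[dict] = []
    for step in range(max_steps):                 # hard cap: the loop can't run forever
        observation = observe()
        action = plan(goal, observation, history)
        result = act(action)
        history.append({"step": step, "action": action, "result": result})
        if goal_met(goal, history):
            break
    return history

print(run_agent("post a status update"))
```

The important detail isn’t the stubs, it’s the shape: every step is bounded, every action leaves a record in history, and the loop has an explicit exit.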
At this stage, we’re talking about something way more powerful than just summarizing meeting notes. These agents aren’t just passive observers. They have the potential to change how systems work, manage production environments, engage with customers, or even handle large-scale infrastructure.
And here’s the catch: when we’re working with AI agents, the old product checklist of "Does it pass QA?" is no longer enough. These systems aren’t just running code; they’re making decisions, executing actions, and learning from the process.
Security: The invisible blast radius
One of the hardest parts about AI agent security is that the attack surface is... everything. It’s not just the obvious stuff like prompt injection or unauthorized access — though those are very real risks. It’s that agents, by design, are built to take action. Autonomously. That’s powerful, but it also means the potential impact of a mistake or an exploit can be massive.
Let’s try to walk through some of the big security questions we need to ask when building or integrating AI agents.
Who does the Agent think it is? (Identity & Permissions)
Most agents run under a service identity or API key. But here’s the problem: we often plug them in quickly to get things working, without slowing down to ask what permissions that key actually has.
I’ve made this mistake. I’ve seen others make it too:
Giving the agent full, org-wide permissions just to move fast.
It works. Until it doesn’t.
What we should be doing instead (a rough sketch of a permission gate follows this list):
Apply least privilege, always: The agent should only access what it absolutely needs. Nothing more.
Scope access by project, environment, and action: Read-only access for dev? Full write for a single prod service? Be deliberate.
Create dedicated service accounts: Avoid shared keys or inherited roles. Make it clear who did what, and why.
Rotate credentials. Monitor usage: Keys age, scopes change, and agents evolve. Don’t let access go stale.
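To make "least privilege" concrete at the application layer (on top of whatever your cloud IAM enforces), one option is a deny-by-default allowlist checked before every tool call. This is a rough sketch with made-up agent IDs, tool names, and scopes, not a real IAM integration:

```python
# Per-agent allowlist of (tool, environment) pairs. The agent IDs, tools, and
# scopes here are invented; in practice they'd come from your IAM or secrets system.

ALLOWED_ACTIONS = {
    "jira-agent":   {("jira.create_ticket", "prod"), ("slack.post_message", "prod")},
    "deploy-agent": {("ci.trigger_build", "dev")},      # write access only in dev
}

def authorize(agent_id: str, tool: str, environment: str) -> None:
    allowed = ALLOWED_ACTIONS.get(agent_id, set())
    if (tool, environment) not in allowed:
        # Deny by default: anything not explicitly granted is rejected (and should be logged).
        raise PermissionError(f"{agent_id} may not call {tool} in {environment}")

# Check before every tool invocation the agent attempts.
authorize("jira-agent", "jira.create_ticket", "prod")       # allowed
# authorize("deploy-agent", "ci.trigger_build", "prod")     # would raise PermissionError
```

The design choice worth copying is the default: unknown agent plus unknown action equals no.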
What can it remember and where? (Data Storage & Leakage)
One of the most exciting (and risky) trends in agent design is persistent memory. Whether it’s storing documents, embeddings, or structured logs, the agent’s ability to “remember” past interactions can be super helpful for personalization and efficiency.
But here’s the grey area:
That memory can become a liability fast.
It only takes one misstep for an agent to absorb sensitive info (say a customer's personal information or a secret) and then accidentally resurface it in the wrong context.
Here’s what you need to ask yourself (a redaction-and-retention sketch follows this list):
Is the agent’s memory actually secure? That means encrypted both at rest and in transit with no shortcuts.
Do you have retention policies in place? How long should this memory live? What happens after that? Define it upfront.
Can you inspect or purge what it remembers? If something sensitive gets stored, can you find it and remove it quickly?
Are your vector databases treated like real data stores? Spoiler: they are real data stores. That means:
Schema discipline
Role-based access control
Logging and audit trails
Avoiding wild-west dumps of unstructured data
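As a sketch of what "treat memory like a real data store" can look like in code, here’s a thin wrapper that scrubs obvious secrets before writing and stamps every record with an expiry. The regexes and the in-memory store are deliberately simplistic stand-ins for a proper PII/secret scanner and your actual vector database:

```python
import re
import time

# Toy redaction patterns: a real deployment should use a proper secret/PII scanner.
REDACTIONS = [
    (re.compile(r"\b\d{16}\b"), "[REDACTED_CARD]"),              # 16-digit card-like numbers
    (re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"), "[REDACTED_KEY]"),
]

MEMORY: list[dict] = []   # stand-in for a vector DB collection

def remember(agent_id: str, text: str, ttl_seconds: int = 30 * 24 * 3600) -> None:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)                    # scrub before it's stored
    MEMORY.append({
        "agent_id": agent_id,
        "text": text,
        "expires_at": time.time() + ttl_seconds,                 # retention defined up front
    })

def purge_expired() -> int:
    """Run on a schedule so stale memory actually gets deleted, not just forgotten about."""
    now = time.time()
    before = len(MEMORY)
    MEMORY[:] = [m for m in MEMORY if m["expires_at"] > now]
    return before - len(MEMORY)

remember("support-agent", "Customer card 4111111111111111, api_key=sk-123")
print(MEMORY[-1]["text"])   # sensitive values are redacted before persistence
```

Even in this toy form, the two policies the list above asks for (what gets stored, and for how long) are explicit in code rather than implied.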
What happens when things go wrong? (Isolation & Fail-Safes)
Let’s be honest: AI agents don’t always fail gracefully.
They don’t throw neat 500 errors or crash with a stack trace. Instead, they might:
Get stuck in an infinite loop
Hammer your CI/CD pipeline with nonsense requests
Auto-close a critical customer escalation based on a bad assumption
These aren’t traditional bugs. They’re breakdowns in reasoning. And with agents, they’re not rare.
So, how do you protect your system (and your team) when the agent goes off the rails? A few guardrails, with a small executor sketch after this list:
Rate-limit everything. Whether the agent is hitting APIs, running jobs, or triggering workflows, impose strict QPS and concurrency limits. This should be enforced at the gateway or orchestrator level.
Use sandbox environments for testing. Run experimental agents in sandboxed namespaces, containers, or virtual projects with scoped credentials. Never give an experimental agent write access to production. Treat it like an intern on day one.
Set timeouts and retry limits. If an agent is looping through a planner or calling external services repeatedly, use circuit breakers and fail-fast logic. No task should be allowed to consume unbounded resources.
Add human-in-the-loop gates. Block sensitive or irreversible actions unless a person explicitly approves. You can choose to use policy engines (e.g. OPA, AWS IAM conditions) or workflow tools (e.g. Argo, Airflow) to require human sign-off before performing destructive actions.
Log everything. Full traceability of decisions, inputs, outputs, and side effects isn’t optional. You’ll need it the moment something misfires.
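Pulling a few of those guardrails together, here’s a rough sketch of a tool executor that enforces a call budget, bounded retries, a human-approval gate for destructive actions, and an audit log. The action names and approval callback are hypothetical, and per-call timeouts would normally be set on the underlying HTTP client or at the orchestrator rather than in-process:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-executor")

DESTRUCTIVE = {"delete_ticket", "drop_table"}   # hypothetical actions that always need a human
MAX_CALLS_PER_MINUTE = 30
MAX_RETRIES = 2

class GuardedExecutor:
    def __init__(self, approve_callback):
        self.approve = approve_callback          # e.g. wired to a Slack or ticket approval flow
        self.call_times: list[float] = []

    def run(self, action: str, func):
        now = time.time()
        self.call_times = [t for t in self.call_times if now - t < 60]
        if len(self.call_times) >= MAX_CALLS_PER_MINUTE:          # crude per-minute rate limit
            raise RuntimeError("rate limit exceeded")
        if action in DESTRUCTIVE and not self.approve(action):    # human-in-the-loop gate
            raise RuntimeError(f"human approval denied for {action}")
        self.call_times.append(now)
        for attempt in range(MAX_RETRIES + 1):                    # bounded retries, no infinite loops
            try:
                result = func()
                log.info("action=%s attempt=%d result=%r", action, attempt, result)
                return result
            except Exception as exc:
                log.warning("action=%s attempt=%d failed: %s", action, attempt, exc)
        raise RuntimeError(f"{action} failed after {MAX_RETRIES + 1} attempts")

executor = GuardedExecutor(approve_callback=lambda action: False)   # deny destructive acts by default
print(executor.run("post_update", lambda: "posted to #releases"))
```

In a real system these checks belong in shared infrastructure, not in every agent’s own code, but the point stands: every path through the executor is bounded, gated, or logged.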
Governance: Making the invisible visible
Security is about prevention. Governance is about explanation.
It’s what helps you answer the question everyone asks after something goes wrong:
“Why did this happen?”
And if your team can’t answer that, then your system isn’t just broken; it’s ungovernable.
This is the part a lot of teams skip in the rush to ship. And they always regret it later, usually when leadership, legal, or a customer is asking for answers you can’t give.
Can you explain the Agent’s behavior? (Observability)
The most basic governance question is: Why did the agent do that?
If you can’t answer that question, your system is effectively ungovernable.
You need the following (a logging sketch follows below):
Full logs of every input, output, and tool/action invocation. That includes prompts, API calls, function executions, and any state changes the agent made. Think audit trail but for intent and action.
Traces of the reasoning path, especially for multi-step agents using planning or chain-of-thought logic. You should be able to replay how the agent arrived at a decision, even if it was the wrong one.
Dashboards that non-engineers can use. Ops, support, and compliance teams shouldn’t need to read JSON logs or trace vector math. Give them timelines, decision trees, and search filters they can actually use.
Put differently: your compliance team should be able to reconstruct an incident without knowing what a vector embedding is.
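As one way to make that concrete, here’s a minimal sketch of a structured audit record emitted per agent step and appended as JSON lines. The field names are illustrative; the point is that a reviewer can reconstruct what the agent saw, decided, and did without touching its runtime:

```python
import json
import time
import uuid

AUDIT_LOG = "agent_audit.jsonl"   # in production, ship this to your central log store instead

def record_step(agent_id: str, run_id: str, prompt: str,
                tool: str, tool_args: dict, output: str) -> None:
    entry = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "agent_id": agent_id,
        "run_id": run_id,          # groups all steps of one task so it can be replayed
        "prompt": prompt,          # what the agent was asked (or told itself)
        "tool": tool,              # which action it took
        "tool_args": tool_args,
        "output": output,          # what came back / what changed
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")   # append-only: never rewrite history

record_step("jira-agent", run_id="run-42", prompt="Create tickets from the spec",
            tool="jira.create_ticket", tool_args={"title": "Add login flow"},
            output="PROJ-123 created")
```

Flat, append-only records like this are also what make the non-engineer dashboards possible: timelines and decision trees are just views over these events.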
Do you know when to get a human involved? (Autonomy Modes)
Autonomy is not binary. I recommend thinking in levels (a small gating sketch follows below):
Level 0 – Manual: Human does everything. The agent observes, maybe logs.
Level 1 – Assistive: Agent suggests, but a human approves. Think copilots, draft generators, or PR suggestions.
Level 2 – Guarded Autonomy: Agent can take action, but only within strict boundaries. Maybe it can restart a service or reassign a ticket, but not delete anything or change customer data.
Level 3 – Full Autonomy: Agent operates independently. It acts, logs, and you review the impact later.
Production environments should start at Level 1. Graduate to Level 2 only after months of observation and incident-free behavior. Treat Level 3 like launching a rocket. If it fails, there’s no undo. You need confidence, containment, and a whole lot of telemetry.
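One cheap way to keep these levels enforceable rather than aspirational is to encode them in configuration and check them before every action. A toy sketch, with hypothetical action categories:

```python
from enum import IntEnum

class Autonomy(IntEnum):
    MANUAL = 0              # human does everything
    ASSISTIVE = 1           # agent suggests, human approves
    GUARDED = 2             # agent acts within strict boundaries
    FULL = 3                # agent acts independently, reviewed after the fact

# Minimum autonomy level required for each category of action (illustrative only).
REQUIRED_LEVEL = {
    "suggest_change": Autonomy.ASSISTIVE,
    "restart_service": Autonomy.GUARDED,
    "modify_customer_data": Autonomy.FULL,
}

def can_act(agent_level: Autonomy, action_category: str) -> bool:
    required = REQUIRED_LEVEL.get(action_category, Autonomy.FULL)  # unknown => strictest
    return agent_level >= required

agent_level = Autonomy.GUARDED        # set per environment, revisited at governance reviews
print(can_act(agent_level, "restart_service"))         # True
print(can_act(agent_level, "modify_customer_data"))    # False: needs a human or more trust
```

Promoting an agent from Level 1 to Level 2 then becomes a config change you can review, diff, and roll back, not a vibe.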
Are you auditing and improving over time?
Governance isn’t a checkbox you tick once and forget. It’s a feedback loop, and it only works if you commit to running it consistently.
Here’s the basic lifecycle:
The agent takes action
You log what happened
You review and audit those actions
You improve the agent or adjust its constraints
Skip that last step, and your agent will drift from “safe” to “unpredictable” faster than you think.
What should this look like in practice? (A rough drift-check sketch follows this list.)
Immutable audit logs, stored externally. Don’t rely on the agent’s internal memory or runtime to preserve critical history. Store logs in a secure, centralized system, something your security team already trusts.
Behavioral drift detection. Over time, agents can change due to upstream LLM updates, data shifts, or configuration changes. You need a way to track deviations in how the agent behaves across similar tasks before it causes a regression.
Scheduled governance reviews. Block time (monthly or quarterly) to sit down with security, product, and engineering. Review incidents, trends, and any “weird” agent behavior. This is where you decide if it’s time to level up autonomy or roll it back.
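Drift detection doesn’t have to start sophisticated. One pragmatic baseline, sketched below with made-up numbers, is to compare the distribution of actions the agent takes in the current window against a trusted baseline window pulled from your audit logs, and flag it for review when the shift crosses a threshold:

```python
from collections import Counter

def action_distribution(actions: list[str]) -> dict[str, float]:
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

def drift_score(baseline: list[str], current: list[str]) -> float:
    """Total variation distance between two action distributions (0 = identical, 1 = disjoint)."""
    p, q = action_distribution(baseline), action_distribution(current)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Illustrative data: last month's audit logs vs. this week's.
baseline_actions = ["create_ticket"] * 80 + ["post_update"] * 20
current_actions  = ["create_ticket"] * 40 + ["post_update"] * 20 + ["close_ticket"] * 40

score = drift_score(baseline_actions, current_actions)
if score > 0.2:   # threshold is a judgment call; tune it against known-good periods
    print(f"drift score {score:.2f}: review the agent before it causes a regression")
```

It won’t catch subtle reasoning changes, but it will tell you when an upstream model update quietly changed what your agent tends to do, which is exactly the conversation you want at the next governance review.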
The pre-prod checklist you should be using
If you’re serious about evaluating an AI agent before production, here’s a checklist I’ve used internally:
Identity: A dedicated service account, least-privilege scopes per project and environment, credential rotation, and usage monitoring.
Memory: Encryption at rest and in transit, a defined retention policy, a way to inspect and purge stored data, and access controls on the vector store.
Containment: Rate limits, sandboxed environments for anything experimental, timeouts and retry limits, and human-in-the-loop gates for destructive or irreversible actions.
Observability: Full logs of inputs, outputs, and tool calls; replayable reasoning traces; dashboards non-engineers can actually use.
Autonomy: An explicit autonomy level (0–3), with a written justification for anything above Level 1.
Governance: Immutable external audit logs, behavioral drift detection, and a scheduled review cadence with security, product, and engineering.
Closing thoughts
It’s easy to move fast, especially when the prototype works and the results feel magical. But moving fast doesn’t protect production systems. Discipline does.
Here’s what’s worth remembering:
An AI agent isn’t just a tool. It’s a system.
It’s like hiring a new team member, one with no ethics, no intuition, and no context.
It just has access. And it moves fast.
That’s power. But it needs boundaries.
So instead of asking, “Can we automate this?”
Ask “Should this run unattended?” And if yes, why now?
When you do ship, make sure the agent is aligned, observable, and well-contained. Then let it run.
Because responsible autonomy isn’t about saying “no.”
It’s about saying “not yet” until you’re sure what you’re trusting it to do.
Curious how other orgs are building secure AI agents? I’m compiling case studies for a future post. Subscribe or drop me a note if you want to share your own lessons.
— Dhrubajyoti
Product @ Harness | Thinking about agent infrastructure, developer experience, and LLM applications.