Agentic AI Under the EU AI Act

When the regulation meets systems that act independently.

Jun 03, 2026

You built the oversight process. Three weeks of work: escalation protocols, a review dashboard, a human approver assigned to every outgoing communication.

Your company is deploying a customer retention agent. It monitors churn signals, identifies at-risk customers, drafts personalized outreach, and sends it. High-risk? Depends on what it touches.

But you wanted oversight. Real oversight. Not a checkbox.

It’s Tuesday morning. You open the dashboard. The agent processed 47 customer interactions overnight. Emails drafted. Emails sent. Responses received. Follow-ups scheduled. Two discount offers extended, one at a rate that hasn’t been approved before. One message references a customer’s medical situation, pulled from a support ticket the agent accessed through the CRM.

Your human reviewer saw the first three messages. Approved them. Went home at 6pm. The agent didn’t go home.

You’re looking at the log. Not proposed actions. Completed ones.

And the oversight process you spent three weeks building (the one modeled on Article 14 of the EU AI Act, which requires that high-risk systems be designed so humans can effectively oversee them) assumed something that turned out to be false.

It assumed the human would see the output before it took effect.

Your agent doesn’t work that way. It acts first. The human reviews after. If at all.

You’re not the only one with this problem.

The European Commission just noticed it too.

What Makes an Agent Different

If you’ve used ChatGPT or Claude or any other AI chatbot, you’ve used a system that follows a simple pattern: you ask a question, the system generates an answer, you decide what to do with it.

Input, output, human decision. The EU AI Act was drafted around this model.

An AI agent breaks the pattern.

An agent doesn’t just answer your question. It pursues a goal. Give it “reduce customer churn” and it will plan a strategy, access your customer database, analyze behavior patterns, draft communications, send them, read the responses, and adjust its approach, chaining actions together, using the output of one step as the input for the next.

A traditional AI system is like a consultant who writes you a memo. You read it, you decide, you act. An agent is like a consultant you gave your email password, your CRM login, your calendar access, and a set of objectives. Then you went on vacation. When you come back, things have happened.

The critical components are four. A language model that handles reasoning and planning (the “brain”: GPT-4, Claude, Gemini). An orchestration layer that manages the workflow: breaking goals into steps, choosing tools, handling errors. Tools that let the agent act in the world, not just generate text: APIs, databases, email systems, calendars, code execution environments. And memory, the ability to retain information across steps and sessions.
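For readers who think in code, here is a toy sketch of how those pieces fit together. It is an illustration only, not how any particular agent framework works: `plan_next_step` is a hard-coded stand-in for the language model, and the tool names, the customer ID, and the returned strings are all made up for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Tools: functions that act on the world. These stubs only return strings.
def lookup_customer(customer_id: str) -> str:
    return f"customer {customer_id}: usage down 40%"


def draft_email(context: str) -> str:
    return f"draft based on [{context}]"


TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_customer": lookup_customer,
    "draft_email": draft_email,
}


def plan_next_step(goal: str, memory: List[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for the language model: pick the next tool and its input."""
    if not memory:
        return ("lookup_customer", "c-17")
    if len(memory) == 1:
        return ("draft_email", memory[-1])
    return None  # the planner considers the goal reached


def run_agent(goal: str) -> List[str]:
    """Orchestration loop: plan, call a tool, remember the result, repeat."""
    memory: List[str] = []
    while (step := plan_next_step(goal, memory)) is not None:
        tool_name, tool_input = step
        memory.append(TOOLS[tool_name](tool_input))
    return memory


for line in run_agent("reduce customer churn"):
    print(line)
```

Note what is missing from the loop: nothing in it waits for a person. The output of one step goes straight into the next.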

That combination of reasoning, tools, and autonomy is what makes it agentic. And it’s what breaks three assumptions the EU AI Act was built on.

Assumption one: the system has a defined, bounded purpose. Agents can pursue open-ended goals across multiple domains. An agent told to “improve customer satisfaction” might end up accessing HR data, modifying product descriptions, and sending emails to suppliers. None of which was the “intended purpose” anyone documented.

Assumption two: a human reviews the output before it takes effect. Agents act. The output is the action. By the time the human sees it, the email is sent, the database is modified, the API call is made.

Assumption three: there’s a clear provider and deployer. Agent deployments involve a model provider, a framework developer, tool providers, the company that assembled the agent, and the company that runs it. The AI Act’s two-party model doesn’t map cleanly onto a five-party stack.

None of this means agents fall outside the EU AI Act. They don’t. But they stress the framework in ways the drafters didn’t anticipate, and the Commission is just starting to respond.

The European Commission Just Weighed In

Two sets of draft guidelines dropped in May 2026. Both matter.

The high-risk classification guidelines (19 May 2026)

The long-delayed draft guidelines on classifying high-risk AI systems were published on 19 May 2026, more than three months late. Open for consultation until 23 June. And buried in the guidance on how to assess complex systems is a provision that matters for anyone deploying agents.

Where several AI components form a more complex system and their combined purpose or joint outputs materially influence a decision, the whole configuration is assessed as one AI system. Not each component separately. The whole thing.

The Commission extends this principle explicitly to:

“complex, interconnected setups like agentic AI systems that coordinate and interact through linked actions as long as these linked actions or components serve in conjunction an intended high-risk purpose.”

Agentic AI systems. By name. In draft Commission guidelines. For the first time.

In practice, an orchestrator agent that delegates tasks to three sub-agents (a document checker, a credit analyzer, a compliance screener), all feeding into a loan decision? That’s one AI system. Not four. The obligations attach to the stack as a whole.

And the escape hatch narrows. Article 6(3) lets providers argue their Annex III system isn’t actually high-risk if it performs only a narrow procedural task, or merely improves a previously completed human activity, or just does preparatory work for a human decision. The draft guidelines read this exception narrowly. The exception is the exception. High-risk classification is the rule.

For agents, the Article 6(3) argument is almost impossible to make. Most enterprise agents are deployed precisely to handle complex, multi-step workflows. “Narrow procedural task” and “agentic” are in practical tension. And if an Annex III system profiles natural persons (automated processing of personal data to evaluate aspects of someone’s life, as defined in GDPR Article 4(4)), the exception is off the table: the system is always high-risk. Many enterprise agents handling customer or employee data in an Annex III use case will meet that profiling threshold.

The transparency guidelines (8 May 2026)

Eleven days before the high-risk guidelines, the Commission published draft guidelines on Article 50 transparency obligations. Consultation closes 3 June. These matter for agents too.

The guidelines confirm that agentic AI systems fall within Article 50(1): the requirement to tell people they’re interacting with AI. The guidelines’ list of covered systems includes conversational agents, voice assistants, coding agents, browsing agents, and bots on social networks. If your agent interacts with a natural person, it must say so.

But the interesting part is what happens when the provider can’t reliably determine whether the agent will interact with a natural person. In that case, the agent should disclose itself as AI in every situation where such interaction is plausible.

Not certain. Plausible.

An agent that sends emails? Plausible it reaches a human. An agent that books meetings? Plausible. An agent that browses the web and fills out forms? Plausible. The default shifts from “disclose where interaction is certain” to “disclose where interaction is plausible”. For autonomous agents that operate across multiple channels and tools, that’s most of the time.

And in sensitive contexts, where users might experience emotional distress or form emotional attachments, one-time disclosure isn’t enough. The guidelines say periodic reminders may be necessary.
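One way to picture the “plausible” default as a rule an agent could apply is sketched below. This is an illustration, not anything the draft guidelines prescribe: the `Channel` fields, the `needs_disclosure` function, and the ten-message reminder interval are all assumptions made for the example (the guidelines give no number).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    """Hypothetical description of where an agent's output is going."""
    name: str
    human_plausible: bool    # could a natural person plausibly be on the other end?
    sensitive: bool = False  # e.g. risk of emotional distress or attachment


# Assumed reminder interval for sensitive contexts; not taken from the guidelines.
REMINDER_EVERY = 10


def needs_disclosure(channel: Channel, messages_since_disclosure: Optional[int]) -> bool:
    """Return True if the next message should carry an AI disclosure.

    messages_since_disclosure is None when no disclosure has been made yet.
    """
    if not channel.human_plausible:
        return False  # machine-to-machine traffic, e.g. an internal webhook
    if messages_since_disclosure is None:
        return True   # first contact where a human is plausible: disclose
    if channel.sensitive:
        return messages_since_disclosure >= REMINDER_EVERY  # periodic reminder
    return False


email = Channel("customer_email", human_plausible=True)
webhook = Channel("billing_webhook", human_plausible=False)
support_chat = Channel("support_chat", human_plausible=True, sensitive=True)

print(needs_disclosure(email, None))
print(needs_disclosure(webhook, None))
print(needs_disclosure(support_chat, 10))
```

The hard part is not the function. It is deciding honestly, channel by channel, whether `human_plausible` can ever be set to false.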

Both sets of guidelines are draft. Not final. Not binding. But they tell you where the Commission is heading. And the direction is clear: agents are in scope, the framework applies, and the Commission isn’t interested in narrow readings that let agent deployments slip through the cracks.

On Human Oversight

Those 47 messages your agent sent while your reviewer was home sleeping.

Article 14 of the EU AI Act requires that high-risk AI systems be designed so they can be “effectively overseen by natural persons during the period in which they are in use”.

The overseers must be able to understand the system’s capacities and limitations, monitor its operation, detect anomalies, correctly interpret its output, and (critically) “decide not to use the system, disregard, override, or reverse the output” and “intervene in or interrupt the system’s operation”.

Override or reverse the output. That language assumes the output exists in a reviewable state before it takes effect. For a credit scoring model that generates a recommendation, that works. The human sees the score, evaluates it, approves or rejects. The output sits there, waiting for a decision.

Agents invert this. The output is the action. The email is sent. The database is updated. The API call is made. The discount is offered. By the time the human sees the log, the agent has already changed the world. In small ways, maybe. But in ways that may not be easily reversed.

The speed problem

An agent can execute a chain of ten actions in seconds. Analyze customer data, identify a risk signal, draft a response, pull a discount code, personalize the message, send it, log the interaction, update the CRM, schedule a follow-up, move to the next customer. A human can’t meaningfully review each step in real time. And agents don’t pause between steps to wait for approval, unless you specifically design them to, which defeats much of the efficiency that justified deploying the agent in the first place.

The opacity problem

In a multi-step workflow, the connection between the initial goal and the final action may not be transparent. “Reduce customer churn” → analyze behavior data → identify at-risk customers → pull their support history → notice a medical reference in a support ticket → include it in the personalized outreach because the model determined it was relevant context. Each step followed logically from the last. The reasoning chain was coherent. The result was a privacy violation.

The human reviewing the dashboard sees the sent email. They don’t see the twelve intermediate reasoning steps that produced it, unless the system was designed to log every step in a human-readable way. Most aren’t.
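What would a human-readable step log look like? A minimal sketch, assuming the agent framework lets you hook into each step (many do, but check yours). The `Trace` and `Step` classes, their field names, and the example entries are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Sequence


@dataclass
class Step:
    index: int
    action: str              # what the agent did, in plain language
    reason: str              # why it did it, as reported by the agent
    data_sources: List[str]  # which systems the step read from
    timestamp: str


@dataclass
class Trace:
    goal: str
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, reason: str, data_sources: Sequence[str] = ()) -> Step:
        """Append one intermediate step to the trace."""
        step = Step(
            index=len(self.steps) + 1,
            action=action,
            reason=reason,
            data_sources=list(data_sources),
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.steps.append(step)
        return step

    def render(self) -> str:
        """Plain-text view for the reviewer's dashboard."""
        lines = [f"Goal: {self.goal}"]
        for s in self.steps:
            sources = ", ".join(s.data_sources) or "none"
            lines.append(f"{s.index}. {s.action} (why: {s.reason}; data: {sources})")
        return "\n".join(lines)

    def to_json(self) -> str:
        """Machine-readable copy for the audit trail."""
        return json.dumps(asdict(self), indent=2)


trace = Trace(goal="Reduce customer churn")
trace.record("Analyzed behavior data", "find customers likely to leave", ["usage_events"])
trace.record("Pulled support history", "personalize outreach", ["support_tickets"])
trace.record("Quoted a medical detail in the email", "model judged it relevant", ["support_tickets"])
print(trace.render())
```

With a trace like this, the reviewer sees that step three drew on support tickets, not just that an email went out.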

The continuous operation problem

Article 14 implicitly assumes the system is “in use” in discrete episodes. A human runs a query, gets a result, makes a decision. Agents operate continuously: monitoring inboxes, responding to events, executing scheduled tasks, running overnight while nobody’s watching. Your retention agent didn’t process 47 interactions in a burst while someone supervised. It worked through the night, steadily, one interaction at a time.

Meaningful oversight of a continuously operating agent requires a fundamentally different model. Not “review each output”. More like: define the boundaries, monitor the patterns, catch the anomalies. Pre-deployment constraints on what the agent can do. Runtime guardrails that halt the agent when it steps outside those boundaries. Post-action audit trails. Escalation protocols for decisions that shouldn’t be autonomous.
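Translated into code, that model might look something like the sketch below. Treat it as a thought aid, not a compliance recipe: the `Guardrail` class, the 15% discount cap, and the keyword list are assumptions invented for the example, and keyword matching is a crude stand-in for real sensitive-data detection.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # hold the action until a human decides
    BLOCK = "block"


@dataclass
class Action:
    tool: str
    payload: Dict[str, object]


@dataclass
class Guardrail:
    allowed_tools: Set[str]           # pre-deployment boundary
    max_discount_pct: float           # highest rate the agent may offer alone
    sensitive_terms: Tuple[str, ...]  # crude proxy for sensitive personal data
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def check(self, action: Action) -> Verdict:
        """Runtime check; every decision is written to the audit trail."""
        verdict, reason = self._decide(action)
        self.audit_log.append(
            {"tool": action.tool, "verdict": verdict.value, "reason": reason}
        )
        return verdict

    def _decide(self, action: Action) -> Tuple[Verdict, str]:
        if action.tool not in self.allowed_tools:
            return Verdict.BLOCK, "tool outside pre-deployment boundary"
        discount = action.payload.get("discount_pct", 0)
        if isinstance(discount, (int, float)) and discount > self.max_discount_pct:
            return Verdict.ESCALATE, "discount above approved rate"
        body = str(action.payload.get("body", "")).lower()
        if any(term in body for term in self.sensitive_terms):
            return Verdict.ESCALATE, "possible sensitive personal data"
        return Verdict.ALLOW, "within boundaries"


guard = Guardrail(
    allowed_tools={"send_email", "offer_discount"},
    max_discount_pct=15.0,
    sensitive_terms=("diagnosis", "surgery", "medical"),
)
print(guard.check(Action("send_email", {"body": "We miss you!"})))
print(guard.check(Action("offer_discount", {"discount_pct": 30})))
print(guard.check(Action("send_email", {"body": "Sorry to hear about your surgery."})))
print(guard.check(Action("query_hr_db", {})))
```

Routine messages pass through. The unapproved discount and the medical reference from Tuesday morning's dashboard would have been held for a human instead of sent.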

Article 14(3) offers a hook: oversight must be “commensurate with the risks, level of autonomy and context of use”. For highly autonomous agents, that phrase could support requirements for all of the above: pre-deployment boundaries, runtime monitoring, post-action review, mandatory escalation points. But the AI Act doesn’t specify what “commensurate” looks like for a system that acts first and explains later. That’s a gap the standards bodies and the Commission will need to fill.

The automation bias amplifier

Article 14(4)(b) requires human overseers to be aware of automation bias, the tendency to over-rely on AI outputs. For agents, this problem is worse.

Agents present completed actions, not recommendations. It’s psychologically harder to reverse something that’s already done than to reject something that’s proposed. An agent that operates efficiently and correctly 95% of the time builds deep trust. When it fails, when message 38 of 47 includes a customer’s medical data, the reviewer may not catch it. Not because they’re negligent. Because 37 correct messages trained them to stop looking closely.

And multi-step chains create complexity that discourages investigation. If an agent completed a 15-step workflow and the final result looks plausible, a human may not trace through every step to find where the reasoning went wrong. The sheer volume of correct outputs buries the errors.

This is what the academic literature is calling “agenticness as a risk amplifier”. The technical properties that make a system agentic (autonomy, tool use, multi-step planning) don’t just create new risks. They amplify the existing ones. Human oversight doesn’t just get harder. It gets structurally undermined.

The Accidental Provider: Article 25, Agent Edition

If you’ve read my earlier piece on provider vs. deployer, you know the basics. Article 25 defines three moments when a deployer becomes a provider: you rebrand the system, you substantially modify it, or you repurpose it into high-risk territory. Any one trigger is enough.

For traditional AI systems, accidental provider status is a risk. For agents, it’s the likely outcome in most enterprise deployments.

The configuration trap

Most agent deployments follow the same pattern. A company licenses a commercial agent platform, an “Enterprise AI Assistant”. The vendor provides the base agent: the model, the orchestration framework, the default capabilities. The company then configures it. Connects their CRM, their email system, their calendar, their customer database, their project management tool. Defines what the agent can do autonomously versus what requires approval. Writes system prompts that shape the agent’s behavior and tone. Sets boundaries.

The vendor’s conformity assessment (if they did one) assessed their product. The base agent with default settings. Not your 23-tool, custom-prompted, autonomy-adjusted configuration that touches customer data across four enterprise systems.

Article 3(23) defines “substantial modification” as a change not foreseen or planned in the initial conformity assessment that affects the system’s compliance or modifies its intended purpose. When the vendor’s documentation says “the agent may be configured with various tools,” does that foresee the specific configuration where you connected it to your HR database? Almost certainly not with enough specificity.

And now the Commission’s May 2026 draft guidelines add the final piece: multi-component configurations serving a joint purpose are assessed as one AI system. Your specific configuration, your tools, your prompts, your autonomy boundaries, isn’t just a deployment choice. It defines the system. And the system the vendor assessed is not the system you deployed.

Article 25(1)(b). Substantial modification not foreseen in the conformity assessment. You’re the provider.

The repurposing trap

This one is faster. A company deploys a general-purpose workflow agent. Minimal risk, internal task automation. Someone in operations connects it to the HR system. Someone else asks it to help screen candidates. Nobody changed the agent’s code. Nobody retrained the model.

Article 25(1)(c). The system wasn’t high-risk. The use is. The deployer just became the provider of a high-risk AI system, without writing a line of code.

I keep seeing variations of this with traditional AI systems. Agents make it worse because they’re designed to be general-purpose. The same agent that schedules meetings can, if given the tools and the instructions, assist with employment decisions. The boundary between low-risk and high-risk isn’t in the agent’s architecture. It’s in what you connect it to and what you ask it to do.

The tool sovereignty problem

There’s a layer most people haven’t considered yet. Article 25(4) requires written agreements between providers and third parties that supply “tools, services, components, or processes that are used or integrated in a high-risk AI system”.

An agent that uses twenty tools (Salesforce for CRM, Stripe for payments, Twilio for messaging, a dozen internal APIs) potentially triggers twenty written AI Act compliance agreements. For tools that are standard SaaS, the providers of those services probably haven’t contemplated AI Act obligations in their terms of service.

And agents can invoke tools dynamically, selecting which tool to use at runtime based on the task. The EU AI Act’s compliance model assumes fixed, known relationships. Agents have dynamic, runtime-determined relationships. The agent decides at 2am that it needs to query a database nobody specifically authorized it to access, because it had the credentials and the task seemed to require it.
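One engineering response to that 2am scenario is deny-by-default tool resolution: having credentials is not enough, the tool must also be on a list someone signed off on. The sketch below is illustrative only. `ToolRegistry`, `ToolRecord`, and the two flags are hypothetical names, and whether a given tool actually needs a written agreement is a legal question the code cannot answer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ToolRecord:
    name: str
    assessed: bool           # was this tool part of the assessed configuration?
    agreement_on_file: bool  # is a written agreement with the supplier on file?


class ToolNotAuthorized(Exception):
    """Raised when the agent tries to use a tool that has not been cleared."""


class ToolRegistry:
    def __init__(self, records: Iterable[ToolRecord]) -> None:
        self._records: Dict[str, ToolRecord] = {r.name: r for r in records}

    def resolve(self, name: str) -> ToolRecord:
        """Return the record for a cleared tool, or refuse the call."""
        record = self._records.get(name)
        if record is None:
            raise ToolNotAuthorized(f"{name}: not in the registry")
        if not record.assessed:
            raise ToolNotAuthorized(f"{name}: not part of the assessed configuration")
        if not record.agreement_on_file:
            raise ToolNotAuthorized(f"{name}: no written agreement on file")
        return record


registry = ToolRegistry([
    ToolRecord("crm", assessed=True, agreement_on_file=True),
    ToolRecord("sms_gateway", assessed=True, agreement_on_file=False),
])

for tool in ("crm", "sms_gateway", "hr_database"):
    try:
        registry.resolve(tool)
        print(f"{tool}: ok")
    except ToolNotAuthorized as err:
        print(f"refused -> {err}")
```

This doesn't solve the dynamic-relationship problem. It converts it from something the agent decides at runtime into something a person has to decide in advance.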

This is what one European Law Blog analysis calls “agentic tool sovereignty”: agents invoking tools that may not be known before deployment, operating under different jurisdictional regimes, creating compliance relationships that didn’t exist when the system was assessed. Nearly two years after the EU AI Act entered into force, the Commission’s May 2026 draft guidelines represent the first official acknowledgment that agentic AI systems require specific interpretive attention, but no agent-specific implementing act has followed.

The practical result

Many companies deploying commercial agents will inadvertently become providers under Article 25. Not because they chose to. Because the difference between what the vendor assessed and what the company actually deployed (the specific tools, the specific data, the specific autonomy boundaries) is too big for the vendor’s conformity assessment to cover.

And when you become a provider, Article 25(2) says the original vendor “shall no longer be considered to be a provider of that specific AI system”. Not the modified part. The whole system. You own it now. Conformity assessment, technical documentation, quality management, post-market monitoring, all of it.

The vendor’s contract may still call you a deployer. The regulation doesn’t necessarily care what the contract says.

What’s Already Happening

This isn’t theoretical. It’s not a 2028 problem.

In December 2025, Amazon’s coding agent Kiro deleted a production environment for AWS Cost Explorer in the China region, triggering a 13-hour service outage. Amazon has disputed this characterization, attributing the incident to misconfigured engineer permissions. In February 2026, an autonomous AI agent using the OpenClaw framework went rogue after a rejected software contribution, independently writing and publishing a hit piece attacking the volunteer who turned it down.

These aren’t edge cases from a research lab. They’re production incidents. Real agents, real damage, real consequences. And the regulatory framework, as the Commission’s own draft guidelines implicitly acknowledge by mentioning agents for the first time, is playing catch-up.

The Commission published the draft high-risk classification guidelines on 19 May 2026. Consultation closes 23 June. The transparency guidelines are open until 3 June. Neither document is final. Neither is binding. But they confirm what the academic literature has been saying for a year: the AI Act applies to agents, the framework strains, and the gaps need filling.

Companies deploying agents now (and many are, at scale) don’t have the luxury of waiting for final guidance. They need a way to think about compliance even in the absence of definitive answers. And the starting point is the same as it’s always been with the EU AI Act: understand what your system does, understand who’s responsible for it, and build the oversight to match.

The 47 messages your agent sent last night? That’s the easy version of this problem. Wait until it’s a multi-agent system: an orchestrator delegating to specialized sub-agents, each with their own tools and decision logic, coordinating toward a goal that touches high-risk territory. The Commission says that’s one system. Article 14 says a human must be able to oversee it. Article 25 says someone must be the provider.

Nobody said compliance would be simple. But the regulation is catching up to the technology. Slowly, in draft form, with consultation deadlines and no final timeline.

The agents aren’t waiting.

AI Law. Decoded.