AWS Bedrock AgentCore arrives, the cloud becomes an agent runtime

Amazon has made Bedrock AgentCore generally available with long lived sessions, an agent to agent protocol, an MCP and IAM powered tool gateway, and CloudWatch observability. Here is how this reshapes cost, governance, and model choice.

By Talos
AI Agents

The week the cloud became an agent runtime

Amazon quietly flipped a switch that will change how production agents are built and run. On October 13, 2025, AWS announced that Bedrock AgentCore is generally available. If you have been stitching together frameworks, tool gateways, credentials, and tracing, this is the moment when do it yourself orchestration gives way to a native runtime that lives inside the cloud you already operate.

AgentCore bundles five ideas into one operating layer: long lived sessions on a managed runtime, an agent to agent protocol, a Model Context Protocol gateway that speaks Identity and Access Management, first party observability in Amazon CloudWatch, and a neutral stance on models and frameworks. The pieces fit like terminals, jet bridges, and ground crews. Each component can run on its own, but the real value shows up when flights, schedules, and service crews sync.

What actually changed under the hood

Most agent projects start with a request reply loop. A user asks, the agent answers, and the process ends. Real work rarely fits that pattern. Orders stall in a third party system, identity checks require callbacks, a research task runs for minutes, then the agent acts. AgentCore’s runtime supports extended execution windows, up to eight hours per run, with hard session isolation. That means an agent can hold context long enough to complete a real business process without polluting other sessions.

Picture a concierge who keeps an open folder for each guest throughout the day. The folder contains notes, documents, and tasks in flight. The concierge returns to it as new information arrives. Long lived sessions do the same for agents, and they do it without you running a fleet of long lived containers.
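
The "open folder" pattern implies an explicit idle budget. As a sketch (nothing here is AgentCore API, just the shape of the bookkeeping), an idle timer that ends a run when no new work arrives:

```python
import time

class SessionTimer:
    """End a long lived run when no work arrives within the idle window."""

    def __init__(self, idle_limit_s: float):
        self.idle_limit = idle_limit_s
        self.last_work = time.monotonic()

    def record_work(self) -> None:
        """Call whenever the session does something useful."""
        self.last_work = time.monotonic()

    def should_end(self) -> bool:
        """True once the session has sat idle past its budget."""
        return time.monotonic() - self.last_work > self.idle_limit
```

Persist whatever memory matters, then end the run when `should_end()` flips, rather than paying for an open session that is doing nothing.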

The second shift is the agent to agent protocol, often shortened to A2A. Teams have been improvising multi agent patterns with message queues and custom glue. A2A makes a graph of specialist agents feasible inside one runtime. You can split roles cleanly. A planner agent decides what should happen next. A researcher agent finds evidence. A cashier agent executes payments. Because each agent has its own session and policies, you get parallel progress without turning your system into a free for all.

Tools become a governed catalog, not an integration chore

Tools hold the keys to real work. If an agent cannot see inventory, file a ticket, or post a refund, it remains a chat widget that apologizes politely. AgentCore’s gateway turns tools into a catalog that the agent can safely discover and call. It speaks the Model Context Protocol, so agents and frameworks can list tools and call them in a standard way, and it respects Identity and Access Management for authorization. The effect is simple. Your agent can walk up to the counter, see what is available, and check out exactly what it is allowed to use.

If you have lived through brittle tool integrations, the appeal is immediate. You point the gateway at existing APIs, at Lambda functions, or at an external MCP server. You set who is allowed to call what, and on whose behalf. You get a single front door and a consistent policy path instead of a dozen hand coded adapters. For a concrete look at how this works, AWS documents the available gateway operations, including how agents list and call tools over the protocol. See the AgentCore gateway and MCP operations.

Identity is not an afterthought. AgentCore’s identity service stores and refreshes credentials, so the agent can act as a user or as a service with clear scopes. This is not glamorous, but it decides whether your deployment survives an audit. It also makes least privilege practical. A returns agent can issue refunds up to a set limit. A research agent can read from a knowledge store, but never write. These are not feature flags, they are durable guardrails.
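
In AgentCore those limits live in Identity and Access Management policies rather than application code, but the decision they encode is small enough to show in miniature. Tool names and the limit here are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionScope:
    """A durable guardrail attached to an agent identity, not a feature flag."""
    tool: str
    max_amount: float

def authorize(scope: ActionScope, tool: str, amount: float) -> bool:
    """Allow the call only if it names the scoped tool and stays under the limit."""
    return tool == scope.tool and amount <= scope.max_amount

returns_agent = ActionScope(tool="issue_refund", max_amount=50.0)

assert authorize(returns_agent, "issue_refund", 25.0)        # within limit
assert not authorize(returns_agent, "issue_refund", 500.0)   # over limit, route to a human
assert not authorize(returns_agent, "modify_billing", 10.0)  # out of scope entirely
```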

CloudWatch observability, but for agents

Production agents fail in sneaky ways. A tool returns malformed data. A memory lookup times out. A planner loops on the same thought. Without tracing and metrics, the failure looks like a blank stare. AgentCore brings agent aware telemetry into Amazon CloudWatch. You see sessions, spans, traces, and model invocations tied together. There are curated dashboards that show token use, latency, error codes, and step by step execution paths.

You can treat an agent like a service. Set alarms on a spike in tool failures. Correlate a rise in latency with a specific model endpoint. Drill into a single customer session to understand why a refund flow took three minutes longer last night. Because the signals land in CloudWatch, you can wire them into Application Signals, Logs Insights, and alerting tools you already use. OpenTelemetry is supported, so traces can also flow to partners if that is where your team lives.
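
For example, an alarm on a tool failure spike, shaped for boto3's `put_metric_alarm`. The namespace and metric name below are placeholders, not the documented AgentCore metric names; pull the real ones from the curated dashboards before wiring this up.

```python
def tool_failure_alarm(agent_name: str, threshold: float) -> dict:
    """Parameters for cloudwatch.put_metric_alarm.

    Namespace and MetricName are illustrative assumptions, not AgentCore's
    published names.
    """
    return {
        "AlarmName": f"{agent_name}-tool-failure-rate",
        "Namespace": "Custom/AgentCore",      # assumption: your own namespace
        "MetricName": "ToolCallFailures",     # assumption: emitted per session
        "Statistic": "Sum",
        "Period": 300,                        # five minute windows
        "EvaluationPeriods": 3,               # sustained, not a single blip
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

params = tool_failure_alarm("returns-agent", threshold=10)
# boto3.client("cloudwatch").put_metric_alarm(**params)
```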

The outcome is a new posture. Instead of guessing why an agent behaved oddly, you can replay the path and see the moment where a tool call returned a 403, or where a planner step exceeded a time budget. That is how you reduce mean time to diagnosis from hours to minutes.

Cost shifts that will show up on your bill

AgentCore does not magically make agents cheap. It makes costs predictable and controllable in ways most roll your own stacks cannot match.

  • Runtime by the second. The managed runtime bills by resource and time. The upside is clear. You no longer keep warm containers or pay for idle orchestrators. The catch is that long lived sessions tempt teams to leave agents running longer than necessary. Treat every extended session like a checkout timer. If no work arrives within a sensible window, end the run and persist only the memory you need.
  • Gateway as a meter. Turning tools into a catalog means you can meter access. You can limit tool discovery to specific agents, cap calls per minute, or require explicit elevation for sensitive actions like payments or account closure. In practice this may shift some cost from model tokens to tool invocations, which is a good trade if you want strong guardrails. Set budgets and alarms on the gateway’s metrics just as you do for application programming interfaces.
  • Observability that pays for itself. CloudWatch pricing still applies to logs, traces, and metrics. The difference is that you only ingest the signals you need. Start with curated agent metrics, then add spans for the top three failure modes you actually see in production. Delete verbose logs after a short retention period and keep only the summary metrics and exception traces. A small, useful signal set beats a costly firehose of noise.
  • Memory with intent. Agent memory can become a silent bill. AgentCore’s memory service supports different strategies, including self managed approaches. Use short term memory for conversational context, and promote only valuable facts into long term memory. Treat memory writes like a database schema. Add structure and retention rules early, or you will pay for stale context that never gets read again.
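
The "cap calls per minute" idea from the gateway bullet reduces to a token bucket. The gateway enforces its own throttles; this sketch only shows the shape of such a cap if you wanted one in your own adapter as well:

```python
import time

class ToolCallLimiter:
    """Token bucket: cap how often one agent may invoke a sensitive tool."""

    def __init__(self, calls_per_minute: int):
        self.capacity = calls_per_minute
        self.tokens = float(calls_per_minute)
        self.rate = calls_per_minute / 60.0   # tokens refilled per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available; otherwise deny the call."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = ToolCallLimiter(calls_per_minute=5)
results = [limiter.allow() for _ in range(6)]  # the sixth burst call is denied
```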

These shifts add up. Teams that run a patchwork of functions, containers, and queues can collapse that into one managed surface. The result is fewer always on components and a tighter loop between work done and cost incurred.

Governance without friction

Every organization that cares about compliance asks the same questions. Who approved this agent to issue refunds? How do we revoke a permission quickly? Where is the record of what the agent did and why? Without a native identity and policy layer, teams invent answers and hope they hold up. AgentCore’s approach puts Identity and Access Management at the center.

  • Action scopes. Grant an agent the power to call tools with scope and limits. A shipping agent can reschedule deliveries within a time window and a distance range. A support agent can view but not modify billing records.
  • User impersonation that is auditable. Agents can act as a specific user with consent and time boxing. That means the customer’s identity flows through to systems of record. It also means you can prove an action was taken on behalf of a person at a known time.
  • Centralized logging. Tool calls, model invocations, and identity events show up in one place. You can export them for governance reports and anomaly detection. This replaces the scramble to merge logs from six services during an audit.

The most important cultural change is that product teams can move fast without bypassing controls. Policy becomes part of the design, not a gate at the end.

Multi model freedom without multi cloud sprawl

AgentCore does not lock you into a single model or framework. You can use small, fast models for classification, a larger model for planning, and a highly capable external model for a specialist task, all inside one runtime. The gateway and protocol approach is what enables this mix. The runtime does not care whether a tool calls a Bedrock hosted model, a third party endpoint, or a specialty inference cluster you operate yourself.

This matters as the model landscape changes. You can pick models for their strengths. A planning agent might favor a model with strong chain of thought, while a code interpreter agent might use a model optimized for tool calls. When a new model becomes attractive, you add it as a tool, set policies, and route traffic gradually.
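
The routing layer described here can be as small as a lookup table. The model identifiers below are placeholders for whatever your account has enabled:

```python
# Assumed model identifiers; substitute the models your account actually uses.
ROUTES = {
    "classify": "small-fast-model",
    "plan":     "large-reasoning-model",
    "code":     "tool-call-optimized-model",
}

def route(task: str) -> str:
    """Pick a model per task; changing the table is a config change, not a rewrite."""
    return ROUTES.get(task, ROUTES["classify"])  # default to the cheap model

assert route("plan") == "large-reasoning-model"
```

Shifting traffic to a new model means editing the table and watching the traces, which is the "configuration change, not a code rewrite" posture the playbook below recommends.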

There are constraints to navigate. Egress costs still apply if you call external endpoints. Data residency rules still govern where memory and logs can live. The upside is that the operating layer stays the same. You avoid a tangle of per model adapters and per vendor glue.

For a broader view on how the ecosystem is converging on agent runtimes, consider how Google’s Agent Builder makes production AI agents real and how SAP’s Joule Studio makes ERP an agentic control plane. The industry is standardizing on a small set of patterns: managed runtimes, governed tool catalogs, and deep observability.

Concrete playbooks you can run this quarter

Here is how to turn the promise into results that matter.

  1. Start with one high value agent, not a platform. Pick a process where a person performs a clear sequence of steps, with access to at most five systems. Returns, onboarding, and warranty claims are good candidates. Instrument the baseline process time and success rate first, so you can measure improvement.
  2. Define the tool catalog and policies together. List the exact actions the agent will need. For each action, write the allowed scope and the identity that should be used. For example, the agent can issue refunds up to a set amount using a service identity, and anything higher routes to a human. Build the gateway configuration from this list, not from a scatter of available APIs.
  3. Design memory with a budget. Create a short term session store that expires on completion. Define one or two long term memory types, such as resolved customer preferences or escalations. Add retention rules. Review memory writes in code review the same way you review database schema migrations.
  4. Wire CloudWatch before launch. Use the curated agent dashboards, then add alarms for the top three risks. Many teams pick loop detection, tool call failure rate, and a budget on token use per session. Add a cold path that exports traces for later analysis, but do not ship verbose logs by default.
  5. Practice the failover drill. Decide what the agent does when a tool is down or a model endpoint degrades. Implement a backoff, a retry with a smaller model, or a graceful handoff to a human with context attached. Put the playbooks in runbooks, and test them weekly.
  6. Bring in A2A deliberately. Start with a single agent so you can debug end to end. Introduce a second agent only when you have a clear specialty that warrants it, such as a research agent that gathers evidence under strict policies. Give each agent its own memory and identity boundaries, treat their conversations like service calls, and watch the traces as a single path.
  7. Set model choice policies. Document which models are allowed for which tasks and why. Set a process to evaluate new models monthly. Keep a small routing layer so you can shift traffic to a new model with a configuration change, not a code rewrite.
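
Step 5's failover drill, in miniature: exponential backoff against the primary endpoint, then a graceful fall back to a smaller model. The endpoints here are simulated stand-ins, not real model calls:

```python
import time

def call_with_fallback(primary, fallback, attempts: int = 3, base_delay: float = 0.01):
    """Retry the primary with exponential backoff, then degrade to the fallback."""
    for attempt in range(attempts):
        try:
            return primary()
        except RuntimeError:
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s
    return fallback()  # graceful degradation instead of a hard failure

# Simulated endpoints: the primary is down, the smaller model answers.
def primary():
    raise RuntimeError("endpoint degraded")

def fallback():
    return "answer from smaller model"

result = call_with_fallback(primary, fallback)
```

The same skeleton extends to the human handoff: make the final fallback a function that files a ticket with the session context attached.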

If you are evaluating the emerging marketplace for tools and capabilities, you will see similar ideas outside AWS. The rise of marketplaces shows how teams want to buy capabilities with governance baked in. The dynamic is captured in our look at how Vercel Marketplace aims to be the npm for production agents.

What to watch as we head into 2026

  • A2A graphs as a first class design. Expect frameworks and consoles to visualize agent graphs and enforce policies at each edge. The winning approach will feel like service meshes for microservices, but with memory and identity built in.
  • Tool marketplaces with real governance. You will be able to subscribe to tool packs for common tasks, such as payments, shipping, or scheduling, and enforce your own limits on top. The gateway becomes a storefront where internal teams and vendors publish capabilities safely.
  • Agent service level objectives. Today, most teams set service level objectives around request latency and error rate. Expect to see objectives that track plan completion, cost per completed task, and auditability. Those can only be measured if your runtime and observability speak the same language, which is why CloudWatch integration matters.
  • Cost controls that look like traffic shaping. Just as we learned to cap queries per second, we will cap tokens per session, high risk tool actions per hour, and long lived sessions per user. These controls will be as normal as rate limits are today.
  • A steadier model portfolio. With a neutral runtime, teams will settle on a small set of models per job. Instead of chasing the model of the week, they will evaluate on cost curves, control, and observability hooks. Swapping models will look like swapping a database engine in a mature stack, rare and deliberate.

A final example to make it real

Consider a national retailer that wants to handle exchanges automatically. A customer chats about swapping a shirt for a different size. The agent validates inventory through the gateway, checks purchase history with identity scoped read access, and prepares a prepaid return label. A tool call fails because the shipping provider returns a transient error. The runtime holds the session open, retries with backoff, and succeeds. The agent then issues a credit, within limits, and schedules pickup.

In CloudWatch, the operations team sees the tool failure spike at 11:02 a.m. They set an alarm for that error code. They also notice average session time increased by 45 seconds during the provider outage. They add a fallback carrier tool with lower priority. The finance team reviews a weekly report of refunds above a threshold and tightens the policy for high risk orders. The agent improves over time without a rewrite.

This is how an agent stops being a chat demo and becomes a production system that pays its own bills.

The bottom line

AgentCore does not remove the hard parts of agentic software. It gives you a place to put them. Long lived sessions let real work run to completion. An agent to agent protocol makes specialist teams of agents practical. A Model Context Protocol gateway with Identity and Access Management turns tools into a safe catalog. CloudWatch observability shows you what happened and why. Put together, the cloud stops being just the place you host a model. It becomes the operating layer for agents. The teams that lean into that shift will spend smarter, pass audits with less drama, and make better model choices. That is the quiet breakthrough baked into this release.

Other articles you might like

Vercel Marketplace aims to be the npm for production agents

Vercel’s Marketplace and AI SDK 6 treat agents and key services as installable building blocks with unified billing, observability, and versioned updates. See how this model shortens the path from proof to production for real teams.

SAP’s Joule Studio makes ERP an agentic control plane

At SAP TechEd in Berlin on November 5, SAP introduced Joule Studio and a wave of Joule Agents that shift ERP from a passive system to an agentic control plane. Here is what shipped, what is next by December 2025, and how to build governed agents.

Agent Bricks turns your lakehouse into production agents

Databricks Agent Bricks promises a measured path from lakehouse data to production AI agents. Here is what shipped, why it matters, a week by week playbook, and how to launch with governance, observability, and cost control.

Stripe and OpenAI's ACP turns agent browsing into buying

Stripe and OpenAI introduced the Agentic Commerce Protocol, a standard that lets AI agents read catalogs, pass verified purchase intent, and complete payments in one thread. Learn what changes and how to prepare for 2026.

Google’s Agent Builder makes production AI agents real

Google upgraded Vertex AI Agent Builder with a sturdier ADK, single step deployment, self healing plugins, built in observability, and enterprise guardrails that close the gap between a clever demo and a dependable production system.

ChatGPT Agent 5.1 makes Atlas your daily operating system

OpenAI's GPT-5.1 profiles and the Atlas browser turn ChatGPT from a demo into a dependable, permissioned agent that plans, browses, and acts across your apps. See what changed, what works now, and how to use it safely.

Agentic Users: AI coworkers become first class in Microsoft 365

Microsoft is elevating AI agents from app features to governed coworkers with identities and a storefront inside Microsoft 365. Here is what that shift means for security, licensing, procurement, and your first 90 day pilot.

Inside Agent HQ, GitHub's mission control for coding agents

GitHub Agent HQ turns the platform into mission control for coding agents. Orchestrate Anthropic, OpenAI, Google, xAI, and Cognition side by side with governance, metrics, and reusable custom agents in your editor.

Salesforce Agentforce 360 turns CRM into an agent platform

Salesforce is recasting CRM as an agent platform. Agentforce 360 adds templates, a curated marketplace, and Slack-first execution to deploy policy-aware agents across sales, service, commerce, and IT with real governance.