Vertex AI Agent Engine’s September leap to real runtime

September 2025 turns Vertex AI Agent Engine into a production ready runtime with sandboxed code execution, agent to agent collaboration, durable memory, bidirectional streaming, and tightened enterprise controls.

By Talos

A turning point you can feel

Every few years a platform adds just enough new parts that it stops being a toolkit and starts behaving like a runtime. September 2025 was that moment for Google’s Vertex AI Agent Engine. In one sweep Google shipped safe sandboxed code execution, agent to agent interoperability, durable long term memory with a visual console, bidirectional streaming, and tighter enterprise controls. The net effect is simple to state and powerful to live with: you can now build agents that are safe to run, easy to compose, and ready for real workloads.

If you want the canonical summary straight from Google, the September 10 release notes list the highlights, including code execution, Agent to Agent support, bidirectional streaming, and the Memory Bank interface.

The five new primitives, explained in practical terms

Think of an agent runtime like a modern kitchen. You need a stove that is safe to use, a pantry that remembers what you have in stock, a way for cooks to coordinate without yelling, and a service window that lets food move in both directions. The September release delivered each of those pieces.

1) Safe code execution inside a sandbox

Agents now have a controlled place to run snippets of code. The sandbox is isolated, has no network access, and is designed to spin up in under a second. It supports file input and output with modest limits, and it can preserve state so you can build on previous steps rather than starting over each time. State can persist for up to two weeks, and that time to live can be configured. Timeouts and size caps keep runaway tasks from turning into incidents.

This matters because many real tasks require calculation or formatting that language models alone cannot do reliably. Imagine an accounts payable agent that must reconcile a spreadsheet, compute tax, and generate a compliant invoice file. With a sandbox the agent can write a short program to do that work safely, then return artifacts to the calling workflow. Since there is no network, you do not accidentally turn a reasoning step into a data exfiltration risk.

Important caveats for regulated work: the sandbox is a preview feature and it does not currently support some enterprise controls that other parts of Agent Engine do support. You should treat the sandbox as a separate trust zone, and for sensitive flows either avoid it, route only synthetic or anonymized data through it, or fence it with strict data minimization.

Concrete guardrails to implement on day one:

  • Keep a per task budget for runtime and storage, and fail closed at your limits.
  • Attach a short lived session token to each execution and record code, inputs, outputs, and logs to your event store for audit.
  • Treat code execution outputs as untrusted until validated by a checker agent or a typed schema parser.
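Those three guardrails can be sketched as a thin wrapper around whatever execution client you use. In this sketch `run_in_sandbox` is a hypothetical stand-in for the platform's sandbox call, not the actual Vertex AI API, and the schema check is deliberately minimal.

```python
import json
import time
import uuid

# Hypothetical stand-in for the platform's sandbox call; not the real Vertex AI API.
def run_in_sandbox(code: str, inputs: dict) -> dict:
    return {"status": "ok", "output": json.dumps({"total": 42.0})}

def guarded_execute(code, inputs, audit_log, max_seconds=10.0):
    """Run sandboxed code with a per-task budget, a session token, and an audit record."""
    token = str(uuid.uuid4())               # short lived session token for tracing
    started = time.monotonic()
    result = run_in_sandbox(code, inputs)
    elapsed = time.monotonic() - started
    if elapsed > max_seconds:               # fail closed at the budget
        result = {"status": "timeout", "output": None}
    # Record code, inputs, and outputs for audit before anything downstream sees them.
    audit_log.append({"token": token, "code": code, "inputs": inputs,
                      "result": result, "elapsed": elapsed})
    # Treat the output as untrusted: parse it against a typed schema before use.
    if result["status"] == "ok":
        payload = json.loads(result["output"])
        if not isinstance(payload.get("total"), float):
            return None                     # schema check failed, reject the output
        return payload
    return None

log = []
out = guarded_execute("print(compute_total())", {"rows": []}, log)
```

The key design choice is that the audit record is written before the output is validated, so even rejected executions leave a trace.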

2) Agent to Agent, a lingua franca for collaboration

Agent to Agent, often shortened to A2A, is an open protocol for agents to discover one another, exchange capability descriptions, and pass tasks without a custom integration for each pair. In practice A2A gives you an Agent Card that advertises skills and a small set of task operations such as send a message, get task status, and cancel a task. The experience feels like plugging a new device into a universal port instead of wiring up bespoke adapters for every new teammate.

Google’s documentation positions A2A as an open standard that outlives any single framework. If you want to see the moving parts, Google’s A2A protocol guide walks through Agent Cards, executors, and the basic endpoints.

What this unlocks: modular teams of specialists. For example, your intake agent can route a billing problem to a finance agent, which may in turn call a code execution micro agent to produce a ledger file. None of those services needs to know the internals of the others. They only need to honor a small contract and a shared identity story.

3) Durable long term memory with a console

Vertex AI’s Memory Bank gives your agents a place to write down facts, preferences, and outcomes that last beyond a single conversation. You can scope memories to a user identity, configure expiration with time to live rules, and pull memories back into context when they are relevant. You can also seed and manage memories with a new tab in the Cloud Console, which makes debugging and compliance reviews much less painful.

A useful mental model is a well organized notebook rather than a swollen transcript. Instead of scrolling and re prompting an entire chat history, the agent extracts the few details that matter, writes them to memory, and retrieves them when needed. You control the topics that are eligible, like personal preferences or key task outcomes, and you can define retention to align with your data policy.
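The notebook model can be made concrete with a small in-memory sketch: memories are scoped to a user, restricted to an allowlist of eligible topics, and expire by time to live. This is an illustration of the pattern, not the Memory Bank API.

```python
import time

# Topics eligible for storage; everything else is silently rejected.
ALLOWED_TOPICS = {"preferences", "task_outcomes"}

class MemoryStore:
    """In-memory sketch of a Memory Bank style store, not the Vertex AI API."""

    def __init__(self):
        self.records = []

    def write(self, user_id, topic, fact, ttl_seconds):
        if topic not in ALLOWED_TOPICS:      # only eligible topics may be stored
            return False
        self.records.append({"user": user_id, "topic": topic, "fact": fact,
                             "expires": time.time() + ttl_seconds})
        return True

    def recall(self, user_id, topic):
        now = time.time()
        return [r["fact"] for r in self.records
                if r["user"] == user_id and r["topic"] == topic and r["expires"] > now]

mem = MemoryStore()
mem.write("u-1", "preferences", "prefers PDF invoices", ttl_seconds=3600)
mem.write("u-1", "chat_history", "full transcript...", ttl_seconds=3600)  # rejected
```

Note that the raw transcript never becomes a memory: only the extracted fact does, which is what keeps retrieval cheap and retention reviewable.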

4) Bidirectional streaming for real time interfaces

Agents can now speak and listen in real time. With bidirectional streaming the user can talk or type while the agent is still replying, and the agent can adjust mid response. That immediacy is the difference between a call center experience and a web form. It also enables multi modal experiences like voice and cursor interaction or ambient analytics during a video call.

On the back end you still get structured steps, tool calls, and a final output, but the user sees a fluid conversation. Limit concurrent connections to a number that your team can support during peak periods, and make sure you have a graceful fallback to text when a user’s network drops.
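Both operational rules, a cap on concurrent live connections and a graceful fallback to text, fit in a small gateway. The sketch below is generic plumbing you would put in front of whatever streaming transport you use, not a Vertex AI streaming API.

```python
class StreamGateway:
    """Caps concurrent live streams and degrades to a text channel when full."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.active = 0

    def open_stream(self):
        if self.active >= self.max_concurrent:
            return {"mode": "text"}          # graceful fallback channel
        self.active += 1
        return {"mode": "bidi-stream"}

    def close_stream(self):
        self.active = max(0, self.active - 1)

gw = StreamGateway(max_concurrent=2)
modes = [gw.open_stream()["mode"] for _ in range(3)]
```

The third caller still gets an answer, just without the live channel, which is usually better than a connection error during peak load.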

5) Enterprise controls that map to how you already secure systems

Agent Engine plugs into the controls that many security teams already use. Runtime, sessions, and memory support Virtual Private Cloud Service Controls to restrict data movement, customer managed encryption keys to manage cryptographic controls, and regional data residency at rest. Those same layers do not currently apply to sandboxed code execution, so treat that path with extra discipline or keep it out of regulated workloads until it matures. Agent Engine is part of Vertex AI services that support Health Insurance Portability and Accountability Act workloads, which matters if you plan to process protected health information. As always, align your design with your Business Associate Agreement and your data classification policy.

What this unlocks right now

Here are three concrete workflows that move from prototype to production with these pieces.

  • HIPAA aligned care coordination. Use the runtime, sessions, and memory to build a care assistant that collects intake notes, summarizes visits, and drafts prior authorization letters. Keep protected health information within your Virtual Private Cloud boundaries, use memory topics that exclude ultra sensitive fields, and retain only what you need. Do not use sandboxed code execution on protected health data yet. If you need computation, precompute outside of the agent path or use a separate non PHI workflow.

  • Autonomous back office bots. Compose a finance desk with a triage agent, an invoice extraction agent, a payments agent, and a compliance agent. The triage agent receives emails and files, uses A2A to delegate, and each specialist reports status back through tasks. For reconciliation, spin up a sandbox to compute deltas and produce artifacts like comma separated values files for your enterprise resource planning system. Gate all updates with a typed diff and a human approval policy for high risk changes.

  • Real time user interface agents. Build a voice concierge for retail that can answer questions, check inventory through a private interface, and place orders. Bidirectional streaming keeps the conversation natural. Memory stores preferences like sizes and brands. A checker agent validates final actions, and a rate limiter keeps the live connection stable under load.
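The "typed diff and a human approval policy" gate from the back office example can be sketched as follows: proposed system updates become structured diffs, and anything touching a high risk field waits for a person. Field names here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Diff:
    """One proposed change to a downstream record."""
    field: str
    old: object
    new: object

# Illustrative policy: these fields always require human approval.
HIGH_RISK_FIELDS = {"payment_amount", "bank_account"}

def review(diffs):
    """Split proposed diffs into auto-applied and human-approval queues."""
    auto, needs_human = [], []
    for d in diffs:
        (needs_human if d.field in HIGH_RISK_FIELDS else auto).append(d)
    return auto, needs_human

auto, pending = review([
    Diff("memo", "", "Q3 invoice"),
    Diff("payment_amount", 100.0, 10000.0),
])
```

Because the agent emits diffs rather than performing writes, the same gate works for every specialist in the fabric.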

How to architect with these new primitives

Use a simple layered plan. Draw a thin horizontal line across the page. Everything above the line is user facing. Everything below is orchestration and controls.

Above the line

  • Interface layer. Web, mobile, or contact center. For voice, set a low latency budget and stream tokens as they arrive. For accessibility, always offer a text channel.
  • Session mediator. A small service that creates agent sessions, attaches an identity, and emits every user and agent event to your event store.

Below the line

  • Orchestrator agent. The brain that decides which tools to call, which agents to delegate to through A2A, and when to write or read memories. Keep this logic declarative so you can audit and change it without a redeploy.
  • Memory service. Configure topics and time to live rules. Use a memory as a tool design so the orchestrator can request a write only when the content meets your standard. Add an allowlist of fields to prevent accidental capture of sensitive data.
  • Code execution micro sandboxes. Treat each execution like a container with a clean filesystem. Provide only the minimal inputs, validate outputs against a schema, and delete the sandbox as soon as you do not need it.
  • A2A fabric. Register your agents with Agent Cards, keep skills focused and testable, and use task ids for end to end tracing across services.
  • Compliance envelope. Put the runtime, sessions, and memory inside your Virtual Private Cloud Service Controls perimeter. Use customer managed encryption keys where supported. Keep audit logs for every event and step, not just final outputs.
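The "task ids for end to end tracing" point deserves a concrete shape: every delegation across the A2A fabric carries the same task id, so logs from the orchestrator and each specialist correlate in your event store. A minimal sketch, with hypothetical service names:

```python
import uuid

def new_task_id():
    """One id per end to end task, shared by every agent that touches it."""
    return str(uuid.uuid4())

def log_event(events, task_id, service, step):
    # In production this would go to your event store, not a list.
    events.append({"task_id": task_id, "service": service, "step": step})

events = []
tid = new_task_id()
log_event(events, tid, "orchestrator", "delegate:invoice-extraction")
log_event(events, tid, "invoice-agent", "extract:done")
log_event(events, tid, "compliance-agent", "check:passed")

# Reconstructing the full trace is a single filter on the task id.
trace = [e["step"] for e in events if e["task_id"] == tid]
```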

Design patterns that work well

  • The checker. For any action that touches money, identity, or compliance posture, add a second agent that verifies the proposed change with different prompts and tools.
  • The gate. For regulated workloads, route all reads and writes through a policy gateway that can redact or block data based on classification.
  • The saga. Model a long process, like a refund, as a series of tasks with compensating actions. Store that state in sessions and memories so you can resume after failure.
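The checker pattern is the easiest of the three to show in code: a proposer suggests an action, and an independent checker with its own rules must agree before anything touching money proceeds. The rules below are illustrative placeholders.

```python
def proposer(invoice):
    """The acting agent's proposal (illustrative logic)."""
    return {"action": "refund", "amount": invoice["paid"]}

def checker(invoice, proposal):
    """Independent verification with different rules than the proposer."""
    return (proposal["amount"] <= invoice["paid"]) and invoice.get("refundable", False)

def execute_if_verified(invoice):
    proposal = proposer(invoice)
    if checker(invoice, proposal):
        return ("executed", proposal)
    return ("blocked", proposal)

status, _ = execute_if_verified({"paid": 40.0, "refundable": True})
```

In a real deployment the proposer and checker would be separate agents with different prompts and tools, so a single prompt injection cannot subvert both.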

Why interoperability is the tipping point

Open standards change the slope of progress. Simple Mail Transfer Protocol turned fragmented email systems into a network. OpenAPI turned private application programming interfaces into ecosystems. Agent to Agent lowers the cost of adding a new specialist to your team of agents. You do not have to wire every pair together. You describe your skills once in an Agent Card and you expose a few standard endpoints. That turns your internal agents into products that are easy to discover, test, and version.

The immediate benefit is speed. Teams can publish a small agent that solves one problem well, then compose it with others later without a rewrite. The deeper benefit is safety. When you standardize the surface area, you can standardize logging, identity, and policy. You can layer rate limits, timeouts, and approval policies without chasing one off integrations.

The business takeaway is direct. Interoperability makes agent projects less like a monolith and more like a marketplace. You can buy, build, or borrow a capability and plug it in with almost no glue code. That is the moment a platform becomes a runtime.

A concrete 30 day plan to build and ship

You do not need a year to prove value. Here is a disciplined month that ends with a real deployment.

Week 1, pick one workflow and shape the envelope

  • Choose a narrow, valuable slice. Examples: refund initiation, benefits eligibility check, invoice triage.
  • Classify data. Mark fields as protected health information, personally identifiable information, payment card data, or public. Decide what is allowed in memory and what is never stored.
  • Stand up the basics. Create a project, enable Vertex AI Agent Engine, and configure your Virtual Private Cloud Service Controls perimeter for runtime, sessions, and memory. Set up customer managed encryption keys if your policy requires it. Create a non production environment with separate keys and accounts.
  • Instrument events. Before building anything smart, make sure you can log every session event, tool call, and agent step to your observability stack. Assign owners for alerts.

Week 2, build the minimum lovable agent

  • Author the orchestrator with the Agent Development Kit template or your framework of choice. Define explicit tools for data access and policy checks. Implement a memory tool with a narrow allowlist and time to live defaults.
  • Define an Agent Card for at least one specialist agent and wire A2A between the orchestrator and the specialist. Keep the Agent Card focused on two or three skills.
  • Add a single sandboxed computation for a non sensitive subtask, for example a currency conversion or document reformat. Validate outputs with a schema and a checker agent.
  • Create a thin user interface. For voice, enable bidirectional streaming. For web, stream tokens and tool events to the page so users see progress.

Week 3, harden and measure

  • Red team prompts and tools. Try prompt injections, long inputs, and malformed files. Verify that your policy gateway filters what it should and that the sandbox never reaches beyond its files.
  • Add guardrails. Set timeouts, rate limits, and task budgets. Implement a global circuit breaker that can route all agent calls to a fallback answer or a queue.
  • Evaluate outcomes. Track helpfulness, accuracy, time to first token, and task completion rates. Add a feedback control in the interface.
  • Prepare roll back. Keep a versioned configuration and a one click revert path.
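The global circuit breaker from the guardrails step is small enough to sketch in full: after a set number of failures, every agent call routes to a fallback answer until the breaker is reset. Thresholds and the reset policy are deployment choices.

```python
class CircuitBreaker:
    """Routes all agent calls to a fallback once too many have failed."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, agent_fn, fallback):
        if self.failures >= self.max_failures:   # breaker open: skip the agent
            return fallback
        try:
            return agent_fn()
        except Exception:
            self.failures += 1                   # count the failure, degrade gracefully
            return fallback

    def reset(self):
        self.failures = 0

breaker = CircuitBreaker(max_failures=2)

def flaky_agent():
    raise RuntimeError("model timeout")

answers = [breaker.call(flaky_agent, "We are busy, please hold.") for _ in range(3)]
```

A production version would add a cool down timer before reset and emit a metric every time the breaker opens, so the on call owner from week 1 gets paged.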

Week 4, pilot and expand safely

  • Run a canary with 5 to 10 percent of traffic or a small set of users. Review logs daily. Fix issues before expanding.
  • Train people and publish policies. Share a short guide that explains what the agent can do, what data it stores, and how to report a problem.
  • Expand A2A composition. Add one new specialist agent that you can plug into the same orchestrator, for example a cost estimator or a shipping scheduler. Prove that you can add a capability without touching the rest.
  • Decide on graduation. If metrics look good, move to 50 percent traffic with alerting thresholds. If not, keep the pilot and iterate.

See the broader pattern

This shift at Google mirrors a wider movement where agent platforms mature into runtimes with real controls. Amazon's launch of Bedrock AgentCore makes agents production ready by formalizing tools, memory, and policy. GitHub's push to turn the pull request into a Copilot Agent runtime shows how developer workflows become agentic when review and action share a stateful loop. On the research and frontier model side, Claude Sonnet 4.5 pushes agents toward dependable work with better reasoning and tool use. The direction of travel is clear. Agents that could demo are becoming agents that can ship.

The bottom line

By adding a safe stove, a memory pantry, a universal port for collaboration, a live service window, and a security perimeter that speaks the language of enterprises, Google turned Agent Engine into a true runtime. You can ship a valuable agent in a month, compose new skills without rewiring your system, and meet your compliance team where they already work. The next wave will not be single towering agents. It will be teams of small agents that cooperate through a common standard, with code execution used sparingly and safely where it adds certainty. That is a pragmatic path to real productivity gains, starting today.
