Vertex AI Agent Engine unlocks code execution and A2A
Google's Vertex AI Agent Engine just added secure code execution, Agent2Agent (A2A) messaging, Memory Bank, bidirectional streaming, and expanded runtime pricing. Here is why cloud agent runtimes are arriving and how to ship with them.

Breaking: the cloud agent runtime arrives
Google is turning Vertex AI Agent Engine into something bigger than a chat front end. The latest release packages sandboxed code execution, Agent2Agent (A2A) protocol support, a Memory Bank for durable context, bidirectional streaming, and expanded runtime pricing that takes effect on November 6, 2025. From a builder’s perspective, the platform now looks like a programmable runtime that can support real software with real guardrails, not just a demo kiosk you point at a prompt. You can scan the official history in the Agent Engine release notes, which frame this progression clearly.
This shift matters because production teams need predictability, isolation, and a bill they can model. With a controllable execution surface, a standard way for agents to talk, explicit memory controls, and streaming for responsiveness, Agent Engine crosses into platform territory.
What actually shipped, in plain English
- Secure code execution: Agents can create an isolated sandbox to run short programs. The sandbox is designed for untrusted code, starts with no network access, supports file I/O within defined limits, and can retain state for a limited window so multi-step work carries across calls. The feature currently appears in preview and in limited regions. Think of it as a safe bench where the agent can use power tools without leaving the lab. For technical framing, see Google’s code execution overview.
- Agent2Agent (A2A) protocol support: Agent Engine can host agents that exchange structured messages through a shared protocol rather than proprietary glue. That reduces lock-in and makes multi-agent systems feel more like services on a message bus.
- Memory Bank: Agents can store and retrieve durable memories such as preferences, key facts, and explicit remember-or-forget instructions. The console exposes controls so teams can review and govern what the system remembers.
- Bidirectional streaming: Requests and responses can stream both ways, which enables progressive updates, low-latency feedback, and smoother human-in-the-loop checkpoints.
- Runtime pricing expansion: Beginning November 6, 2025, runtime billing expands to additional regions and follows a metered compute and memory model. That clarity supports budget planning for 2026 roadmaps.
If agents once felt trapped behind a text box, these capabilities turn that box into an operating room with instruments, procedures, and charting.
Why this is a turning point
Early agent tools rewarded clever prompts but punished teams that wanted determinism. Production use calls for a place to run code safely, a protocol to connect multiple specialists, a memory model that is explicit and auditable, and a cost line item that procurement recognizes. With this release, Agent Engine checks those boxes in one stack.
- A secure sandbox makes tool use predictable. Instead of vague calls out to an unknown executor, teams can control where and how code runs, limit file sizes, and stage work across steps with traceability.
- A standard agent-to-agent handshake makes interoperability real. A planner built on one framework can call a retrieval or coding specialist built on another, and you can replace any piece without rewriting the whole system.
- Memory Bank elevates assistants from polite parrots to teammates who accumulate useful context under policy. Clear topics and deletion rules make this operational rather than magical.
- Streaming makes live experiences feel fast and trustworthy. Users see partial progress, can course-correct earlier, and do not abandon flows that appear frozen.
- Published runtime billing invites serious adoption. Finance and security teams can evaluate costs and controls, then move from pilot to program.
This evolution parallels other parts of the industry where agents are getting a real runtime. We covered a similar shift on the data side when Snowflake Cortex Agents Go GA, and we explored enterprise control planes in GitHub Agent HQ mission control. Vertex AI Agent Engine is now playing in that same league.
What teams can ship right now
Here are five concrete patterns you can build this quarter without waiting for more features.
1) Analytics copilots that compute, not just chat
Give your analyst bot a safe workbench. Use the sandbox to run small Python snippets that clean data, calculate feature importance, or render a lightweight chart. Keep modules short and idempotent so each step is easy to retry and test. Persist temporary files or variables in the sandbox for the duration of an analysis session, then roll them forward across a few calls. Since the sandbox begins without network access, feed it data by attaching signed objects or pre-mounted files that the agent passes in explicitly. This pattern replaces brittle notebooks with testable steps and makes the agent accountable for results.
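To make the pattern concrete, here is a minimal sketch of the kind of short, idempotent step an agent might submit to the sandbox. The file paths and column names are placeholder assumptions for illustration; nothing here is an Agent Engine API.

```python
# Minimal sketch of an idempotent analysis step submitted to the sandbox.
# Paths and column names are hypothetical; real jobs would pass them in
# explicitly, since the sandbox has no network access by default.
import csv
from pathlib import Path

IN_PATH = Path("/mnt/inputs/tickets.csv")       # assumed pre-mounted by the caller
OUT_PATH = Path("/mnt/outputs/tickets_clean.csv")

def clean_rows(rows):
    """Drop rows with no ticket id and normalize the priority field."""
    for row in rows:
        if not row.get("ticket_id"):
            continue
        row["priority"] = (row.get("priority") or "unknown").strip().lower()
        yield row

def main():
    # Idempotent: rerunning overwrites the same output, so retries are safe.
    with IN_PATH.open(newline="") as src, OUT_PATH.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(clean_rows(reader))

if __name__ == "__main__":
    main()
```

Because the step reads one input and writes one output, the orchestrator can retry it blindly and still get a consistent result.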
2) Tiered tool access with a safe default
Default every workflow to the no-network sandbox. When a step requires external systems, hand off to a separate tool microservice with its own policy, logs, and timeouts. The separation is simple to audit. Security teams can review the handful of networked tools rather than inspect every code generation step. Over time, move more logic into the sandbox so your internet-facing surface area stays small.
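A sketch of the tiered dispatch idea, assuming you own both the sandbox submission layer and the tools microservice. The run_in_sandbox and call_tools_service helpers are hypothetical stand-ins for your own execution and HTTP layers, not Agent Engine calls.

```python
# Tiered dispatcher: everything defaults to the no-network sandbox, and only
# explicitly allow-listed tools reach the networked tools service.
from dataclasses import dataclass
from typing import Optional

NETWORKED_TOOLS = {"crm_lookup", "send_email"}  # the short list security reviews

@dataclass
class Step:
    name: str
    code: Optional[str] = None      # code to run in the sandbox
    tool: Optional[str] = None      # or a named tool that needs the network
    payload: Optional[dict] = None

def run_in_sandbox(code: str) -> dict:
    # Placeholder: in production this would submit to the managed sandbox.
    return {"status": "sandboxed", "code_len": len(code or "")}

def call_tools_service(tool: str, payload: Optional[dict], timeout_s: int) -> dict:
    # Placeholder: in production this would call the separate tools microservice.
    return {"status": "called", "tool": tool, "timeout_s": timeout_s}

def dispatch(step: Step) -> dict:
    if step.tool is None:
        # Safe default: isolated execution, no network.
        return run_in_sandbox(step.code or "")
    if step.tool not in NETWORKED_TOOLS:
        raise PermissionError(f"tool {step.tool!r} is not allow-listed")
    # Escalation path: separate service with its own credentials, logs, timeouts.
    return call_tools_service(step.tool, step.payload, timeout_s=30)
```

The audit story stays simple: reviewers only read NETWORKED_TOOLS and the escalation branch, not every generated snippet.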
3) Multi agent planners with a standard handshake
Put a small conductor in front that divides a goal into subtasks. Specialists handle planning, retrieval, coding, and quality control. Use a typed message schema for each agent so failures are diagnosable. Because the protocol is standardized, you can evolve one specialist at a time or even swap frameworks as the team learns. This keeps your architecture flexible without constant rewrites.
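One way to type the handshake is a small message envelope that every specialist accepts and returns. The field names below are illustrative assumptions, not the A2A wire format.

```python
# A minimal typed envelope for inter-agent messages, so failures are diagnosable.
from dataclasses import dataclass, field, asdict
from typing import Any
import uuid

@dataclass(frozen=True)
class AgentMessage:
    sender: str                      # e.g. "planner"
    recipient: str                   # e.g. "retriever"
    task_id: str
    kind: str                        # "request" | "result" | "error"
    payload: dict
    schema_version: str = "1.0"
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def validate(self) -> None:
        if self.kind not in {"request", "result", "error"}:
            raise ValueError(f"unknown message kind: {self.kind!r}")

# Example: the planner asks the retriever for sources on one subtask.
msg = AgentMessage(
    sender="planner",
    recipient="retriever",
    task_id="task-42",
    kind="request",
    payload={"query": "runtime pricing by region", "top_k": 5},
)
msg.validate()
print(asdict(msg))
```

Because every message carries a task_id and trace_id, a failed run can be replayed one hop at a time, and a specialist can be swapped out as long as it honors the envelope.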
4) Memory aware assistants that respect policy
Start with a narrow Memory Bank schema that lists allowed topics such as user preferences and recent task outcomes. Write a governance rule that every stored memory must be justified by a sentence from the transcript and deletable on user command. Expose a simple audit page in your product so users can see and remove memories in place. This feature is the difference between a helpful teammate and a goldfish.
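A sketch of that governance rule as code, assuming your product keeps its own policy layer in front of Memory Bank writes. The topic list, record shape, and MemoryStore class are hypothetical.

```python
# Governed memory records: a narrow topic whitelist, a required justification
# quote from the transcript, and explicit deletion on user command.
from dataclasses import dataclass
from datetime import datetime, timezone

ALLOWED_TOPICS = {"user_preference", "recent_task_outcome"}

@dataclass
class MemoryRecord:
    user_id: str
    topic: str
    content: str
    justification: str   # exact sentence from the transcript that supports it
    created_at: str = ""

    def __post_init__(self):
        if self.topic not in ALLOWED_TOPICS:
            raise ValueError(f"topic {self.topic!r} is not in the allowed set")
        if not self.justification.strip():
            raise ValueError("every memory needs a justifying transcript sentence")
        self.created_at = self.created_at or datetime.now(timezone.utc).isoformat()

class MemoryStore:
    """In-memory stand-in for whatever layer fronts Memory Bank writes."""
    def __init__(self):
        self._records: list = []

    def remember(self, record: MemoryRecord) -> None:
        self._records.append(record)

    def forget(self, user_id: str, topic: str) -> int:
        """Delete on user command; returns how many records were removed."""
        before = len(self._records)
        self._records = [r for r in self._records
                         if not (r.user_id == user_id and r.topic == topic)]
        return before - len(self._records)
```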
5) Live, progressive experiences
Turn on streaming for long-running steps. Show draft outputs, token-by-token analysis, or the next planned action while the rest is still running. This reduces abandonment and builds trust because users can see how the system reasons.
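The streaming pattern reduces to yielding partial events as soon as they exist. The sketch below shows the generic shape with a plain Python generator; it does not use Agent Engine's streaming API.

```python
# Progressive streaming: a long-running step yields partial results as they
# become available, and the caller renders them immediately.
import time
from typing import Iterator

def long_running_step(chunks: list) -> Iterator[dict]:
    total = len(chunks)
    for i, chunk in enumerate(chunks, start=1):
        time.sleep(0.1)                       # stand-in for real work
        yield {"type": "partial", "text": chunk, "progress": i / total}
    yield {"type": "final", "text": " ".join(chunks)}

if __name__ == "__main__":
    for event in long_running_step(["Drafting plan...", "Step 1 done.", "Step 2 done."]):
        if event["type"] == "partial":
            print(f"[{event['progress']:.0%}] {event['text']}")  # show progress early
        else:
            print("FINAL:", event["text"])
```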
How to architect for multi agent cooperation
Treat the agent protocol as a message bus, not a framework detail. A clean, scalable design is simple to describe and easier to troubleshoot.
- Roles and responsibilities: Define a short contract for each agent. For example, Planner returns a task graph with typed edges, Retriever returns a ranked bundle with citations, Coder returns validated code diffs, and QA returns pass or fail with reasons. Keep each contract under ten fields and provide an example payload for unit tests, as in the sketch after this list.
- Orchestration: Use a thin orchestrator that enforces timeouts, retries, and policy. Let agents do the thinking and producing. The orchestrator should only route messages and record traces, which keeps the system debuggable.
- State: Maintain three layers of state. Session state covers the current conversation or job. Working state lives inside the code sandbox for intermediate variables and files. Long-term state lives in Memory Bank. Make it explicit which layer each step reads and writes.
- Safety: Keep a default-deny posture. No agent should call the network from inside the sandbox. Any outbound calls happen in a separate tools service with its own credentials and observability. Review Memory Bank entries during development and prune anything that is not clearly justified.
- Observability: Capture traces so you can answer three post-incident questions quickly: what did the agent know at each step, what did it try to do, and what actually ran in the sandbox? Wire these answers into your on-call runbook.
- Human in the loop: Combine streaming with small, typed artifacts to add lightweight approvals. Ask a human to approve a plan before the Coder runs. Stream intermediate results from the sandbox and let humans nudge the path without stopping the world.
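Here is a minimal version of the role contracts described above, with one example payload a unit test can assert against. The PlannerResult and QAResult shapes are assumptions about how a team might type its agents, not schemas defined by Agent Engine.

```python
# Hypothetical role contracts, kept under ten fields each, plus an example
# payload suitable for a unit test.
from dataclasses import dataclass, field

@dataclass
class PlannerResult:
    goal: str
    tasks: list                               # node ids in execution order
    edges: list                               # (upstream, downstream) typed edges
    rationale: str = ""

@dataclass
class QAResult:
    task_id: str
    passed: bool
    reasons: list = field(default_factory=list)

# Example payload: a tiny plan the QA agent signs off on.
plan = PlannerResult(
    goal="summarize runtime pricing changes",
    tasks=["retrieve_docs", "draft_summary", "review"],
    edges=[("retrieve_docs", "draft_summary"), ("draft_summary", "review")],
)
verdict = QAResult(task_id="review", passed=True)

# Unit-test style checks: every edge endpoint must be a declared task,
# and a passing verdict should carry no failure reasons.
assert set(node for edge in plan.edges for node in edge) <= set(plan.tasks)
assert verdict.passed and not verdict.reasons
```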
For a consumer angle on streaming and autonomy, compare these patterns with what we observed in Gemini Agent Mode on Android. The enterprise building blocks are similar, only with stricter controls and auditing.
Pricing, with a 2026 budget you can defend
Runtime billing is metered by compute and memory while your agent runs. Public reference rates outline costs per vCPU-hour and per GiB-hour, and Google indicated on October 6 that billing would expand to additional regions beginning November 6, 2025. The exact cents will vary by region and tier, but the important point is that you can now model cost per job the way you do for containers.
A practical way to build a 2026 budget:
- Define a representative job. Example: a support triage flow that calls two external tools, writes one memory, and executes 20 seconds of sandboxed code with 1 vCPU and 1 GiB of memory.
- Convert to cost. Divide the 20 seconds by 3,600 to get hours, multiply by the posted vCPU rate, and do the same for memory. Put the math in a spreadsheet so product managers can see sensitivity to seconds and memory; a small worked version follows this list.
- Multiply by expected volume. If your help desk runs 5,000 such jobs per day, you now have a daily runtime cost. Compare it to your current automation cost per ticket, then add model inference and storage where relevant.
- Add buffer. Preview features can change limits and performance. Add a 25 percent buffer to your 2026 forecast until you observe three steady months.
- Plan by region. If you operate in Europe or Asia Pacific, confirm which regions incur runtime billing and whether data residency rules require a specific region. The Agent Engine release notes track region coverage and updates.
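The same budget math as a short script, so the inputs are easy to tweak. The per-hour rates are placeholders, not Google's published prices; substitute the posted regional rates before relying on the output.

```python
# Worked version of the budget math above. Rates are assumed placeholders.
VCPU_RATE_PER_HOUR = 0.10     # USD per vCPU-hour (placeholder, not a published price)
GIB_RATE_PER_HOUR = 0.01      # USD per GiB-hour (placeholder, not a published price)

seconds_per_job = 20          # sandboxed code time in the representative job
vcpus = 1
gib = 1
jobs_per_day = 5_000
buffer = 1.25                 # 25 percent buffer while the feature is in preview

hours = seconds_per_job / 3600
cost_per_job = hours * (vcpus * VCPU_RATE_PER_HOUR + gib * GIB_RATE_PER_HOUR)
daily = cost_per_job * jobs_per_day
forecast_2026 = daily * 365 * buffer

print(f"cost per job:  ${cost_per_job:.6f}")
print(f"daily runtime: ${daily:.2f}")
print(f"2026 forecast: ${forecast_2026:,.2f} (with buffer, before inference and storage)")
```

Swapping in real regional rates and your own job profile turns this into the sensitivity sheet finance will ask for.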
Two pragmatic tips:
- Short, bursty jobs are cheap in absolute dollars but sensitive to cold starts and orchestration overhead. Keep agents warm only when the business value justifies the idle cost.
- The memory line item is not optional. If you run large in memory contexts or long sessions, model GiB hours the way you model an in memory cache.
Compliance and control without heroics
Security leaders will ask three questions: where does code run, what can it reach, and what did it remember? With this stack, you can answer all three crisply. Code runs in a managed sandbox with no network by default. The only way out is through the tools you explicitly wire up. Memories live behind a console with clear governance, and streamed runs produce traces you can audit.
For regulated sectors such as healthcare or finance, pay attention to enterprise hardening like private networking, customer managed encryption keys, and service controls. These features let you align agents with existing policies, and they shorten the path from pilot to production because review boards can test controls rather than trust black boxes.
Build versus buy: a realistic checklist
You do not have to migrate everything to Agent Engine to benefit. Start where the risk is highest and the isolation helps most.
- If you already run agents on Cloud Run or Kubernetes, move code execution into the Agent Engine sandbox while keeping your orchestrator as is. This isolates the most volatile part of the system without a full rewrite.
- If your team mixes frameworks, standardize inter-agent messages on a shared protocol. Treat frameworks like plug-ins behind a stable contract rather than platform decisions that lock you in for years.
- If your assistant forgets important facts, add Memory Bank with the smallest useful topic set and expose a simple memory viewer. Expand topics only when users ask for it, and keep delete controls obvious.
- If your experience feels sluggish, enable streaming for heavy steps and show partial outputs. The perceived speed matters and the operational gains are real because users intervene earlier.
A concrete reference architecture
The following reference keeps the footprint small and the learning curve reasonable.
- Entry: A gateway receives requests, attaches an identity, and opens a streaming channel.
- Planning: A lightweight planner produces a task graph that lists subtasks and required tools. The graph is a typed artifact you can version and test.
- Execution: The orchestrator submits code work to the sandbox and handles retries. All network calls are delegated to a separately deployed tools service. Each tool call includes an allow list of destinations and a timeout, as in the sketch after this list.
- Memory: The agent writes only to the predefined Memory Bank topics. Any new topic requires a code change and a review. Users can view and delete memories in product.
- Observation: Traces capture inputs, outputs, and the minimal context required to reproduce a run. Logs include which files were read and written in the sandbox.
- Human review: The planner’s artifact and the Coder’s diff are presented to a reviewer when policy requires approval.
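A sketch of the per-call guardrails in the Execution step: every networked tool call carries an explicit destination allow list and a timeout. The ToolCall shape and enforce helper are assumptions, not part of Agent Engine.

```python
# Per-call guardrails for the tools service: destination allow list plus timeout.
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass(frozen=True)
class ToolCall:
    tool: str
    url: str
    timeout_s: float
    allowed_hosts: frozenset

def enforce(call: ToolCall) -> None:
    """Reject calls to hosts outside the allow list or without a sane timeout."""
    host = urlparse(call.url).hostname or ""
    if host not in call.allowed_hosts:
        raise PermissionError(f"{host!r} is not on the allow list for {call.tool!r}")
    if call.timeout_s <= 0 or call.timeout_s > 60:
        raise ValueError("tool calls need a positive timeout of at most 60 seconds")

# Example: a CRM lookup constrained to one internal host with a 10 second budget.
enforce(ToolCall(
    tool="crm_lookup",
    url="https://crm.internal.example.com/v1/accounts/123",
    timeout_s=10,
    allowed_hosts=frozenset({"crm.internal.example.com"}),
))
```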
This layout mirrors what we have seen on other platforms that are embracing agent runtimes. The specifics vary, but the shape holds because the constraints are universal.
What this means for the industry
The center of gravity is moving from chat user interfaces to runtimes that look like application platforms. That change attracts a different buyer and a different set of questions. Engineering leaders ask for code reviews, memory retention policies, and cost baselines. Procurement wants line items they can model. Security wants controls they can test. A platform that answers those questions wins budgets.
It also sets the stage for open collaboration. With a standard protocol for inter agent messaging, you can imagine a system where a planning agent on one cloud hands a task to a specialized agent elsewhere without custom glue. That is how ecosystems grow and how teams avoid repainting the same walls every quarter.
The bottom line
Agents will not replace applications. They will become applications. The runtime you choose will matter more than the chat widget you ship. Vertex AI Agent Engine’s new capabilities make a strong case that cloud agent runtimes are consolidating around security you can reason about, protocols you can share, and prices your finance team can defend. If you start now, you can ship safer code execution, interoperable multi agent systems, durable memory with controls, and streaming experiences that feel alive. Do those four well and your 2026 plan will look less like a science project and more like real software.








