Agent2Agent and Vertex AI Agent Engine make enterprise agents real

The quiet flip from flashy demos to dependable systems

In September 2025, Google shipped a set of upgrades to Vertex AI Agent Engine that arrived without fireworks but changed the center of gravity for enterprise agents. Four primitives landed together and matter most in combination: Agent2Agent interoperability, a secure code execution sandbox, bidirectional streaming, and a first party Memory Bank. Each capability is useful on its own. Together, they flip agents from show floor curiosities into systems you can operate at scale, across vendors, under real governance.

Early agents behaved like gifted interns. They dazzled in controlled settings, then stumbled on enterprise realities like permissions, data boundaries, noisy tools, and teams that already run on a patchwork of platforms. This release turns those interns into colleagues who can work with other teams, use the right tools safely, remember what they learn, and coordinate in real time.

Why Agent2Agent interoperability matters

Agent2Agent, or A2A, gives agents a shared way to talk, hand off work, and negotiate responsibilities. In large companies no single vendor owns the stack. A sales agent might live on Google, a compliance agent on another cloud, and a pricing agent inside an on premises system. Without a consistent handoff protocol, teams are forced to build brittle, vendor specific bridges.

A2A solves that with three deceptively simple ideas:

Typed messages with intent. Instead of freeform text, agents exchange structured messages such as Request, Proposal, Result, and Exception. That sounds academic, but it removes guesswork and makes auditing possible.
Tool capability descriptors. Agents advertise what they can do, the inputs they accept, and the policies they enforce. A coordinator can route tasks intelligently rather than guess from prompts.
Negotiated contracts. Before one agent calls another, they establish scope and limits such as data access, time budgets, and cost ceilings. The contract becomes a policy object you can log and review.
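
A minimal Python sketch of how these three ideas might look as data structures. The class and field names here (Intent, CapabilityDescriptor, Contract, Message, to_log_line) are illustrative assumptions, not the A2A wire format or any SDK's types; the point is only that intent, capabilities, and limits become structured objects you can log.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional
import json


class Intent(str, Enum):
    # The four message kinds named in the text above.
    REQUEST = "Request"
    PROPOSAL = "Proposal"
    RESULT = "Result"
    EXCEPTION = "Exception"


@dataclass(frozen=True)
class CapabilityDescriptor:
    # Hypothetical shape: what an agent advertises about one tool.
    name: str
    input_schema: dict
    policy_tags: tuple = ()


@dataclass(frozen=True)
class Contract:
    # Hypothetical shape: limits agreed before one agent calls another.
    data_scopes: tuple
    time_budget_s: int
    cost_ceiling_usd: float


@dataclass(frozen=True)
class Message:
    intent: Intent
    sender: str
    recipient: str
    body: dict
    contract: Optional[Contract] = None

    def to_log_line(self) -> str:
        """Serialize the message, contract included, as one JSON audit line."""
        record = asdict(self)
        record["intent"] = self.intent.value
        return json.dumps(record, sort_keys=True)
```

Because the contract travels with the message, a single log line is enough to answer who asked for what and under which limits.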

With A2A, multi agent teams stop depending on prompt conventions and start behaving like microservices. That shift is the difference between a clever demo and an operable system.

The new primitives, in plain language

Here is what the other three upgrades add to the picture.

Secure code execution sandbox

Many enterprise tasks require code, from transforming spreadsheets to calling internal services. The sandbox provides an ephemeral, resource limited runtime with network controls, secrets isolation, and audit logs. Developers get a place to run real work without granting an agent broad privileges. Security teams get a predictable boundary that maps to familiar controls.
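
To make the boundary concrete, here is a local stand-in written with the Python standard library. It is not the Agent Engine sandbox and it offers no network controls; the function name run_in_sandbox and the shape of the audit record are assumptions made for illustration. It shows three of the properties described above: a time limit, no inherited secrets, and a record of what happened.

```python
import subprocess
import sys
import time


def run_in_sandbox(code: str, timeout_s: float = 5.0) -> dict:
    """Run `code` in a child interpreter and return a small audit record."""
    started = time.time()
    try:
        proc = subprocess.run(
            # -I is Python's isolated mode: it ignores PYTHON* variables
            # and the user site directory.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when the budget runs out
            env={},             # nothing from the parent environment leaks in
        )
        status = "ok" if proc.returncode == 0 else "error"
        stdout = proc.stdout
    except subprocess.TimeoutExpired:
        status, stdout = "timeout", ""
    return {
        "status": status,
        "stdout": stdout,
        "seconds": round(time.time() - started, 3),
    }
```

A real sandbox would add memory and CPU limits, egress rules, and scoped credentials. This sketch assumes a Unix-like host; an empty environment can prevent the interpreter from starting on Windows.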

Bidirectional streaming

Request and response are no longer single blobs. The agent can stream partial reasoning, intermediate results, and tool output while the client streams new context, approvals, and corrections. Conversations become event streams rather than monologues. This cuts time to first useful token, supports progressive disclosure, and enables collaborative guardrails like human approvals or policy checks mid flight.
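
The pattern can be sketched with two asyncio queues, one per direction. This is an illustration of the event stream idea only, not the Agent Engine streaming API; the event types (partial, approval_needed, final) and the function names are assumptions.

```python
import asyncio


async def agent(outbound: asyncio.Queue, inbound: asyncio.Queue) -> None:
    """Stream partial results, then wait for an approval before committing."""
    for step in ("extracted fields", "normalized line items"):
        await outbound.put({"type": "partial", "text": step})
    await outbound.put({"type": "approval_needed", "text": "write correction?"})
    decision = await inbound.get()  # client input arrives in the middle of the turn
    final = "committed" if decision.get("approved") else "cancelled"
    await outbound.put({"type": "final", "text": final})


async def client(outbound: asyncio.Queue, inbound: asyncio.Queue) -> list:
    """Consume events as they arrive and answer the approval request."""
    seen = []
    while True:
        event = await outbound.get()
        seen.append(event["type"])
        if event["type"] == "approval_needed":
            await inbound.put({"approved": True})
        if event["type"] == "final":
            return seen


async def run_session() -> list:
    """Wire both directions together and return the event types the client saw."""
    outbound, inbound = asyncio.Queue(), asyncio.Queue()
    agent_task = asyncio.create_task(agent(outbound, inbound))
    seen = await client(outbound, inbound)
    await agent_task
    return seen
```

Calling asyncio.run(run_session()) drives one turn: the client sees useful partial output before the final answer, and the approval happens without ending the session.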

First party Memory Bank

Agents need two kinds of memory. They need short term working memory to track the current task, and durable memory to carry norms, preferences, and institutional knowledge forward. The Memory Bank is a managed store for that durable memory with governance controls, time to live policies, and selective recall. Crucially, it lives inside the same control plane as the agent, so access, encryption, and audit behave like the rest of the platform.
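
A toy version of durable memory with time to live and selective recall might look like the following. It runs in process memory and stands in for the managed service only conceptually; DurableMemory, MemoryRecord, and the keyword match are assumptions for illustration, not the Memory Bank API.

```python
import time
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    category: str      # for example "preference", "fact", or "norm"
    text: str
    provenance: str    # where the memory came from, kept for audit
    expires_at: float  # epoch seconds after which the record is ignored


class DurableMemory:
    def __init__(self, clock=time.time):
        self._records = []
        self._clock = clock  # injectable so expiry can be tested

    def write(self, category: str, text: str, provenance: str, ttl_s: float) -> None:
        """Store a record that expires ttl_s seconds from now."""
        expires_at = self._clock() + ttl_s
        self._records.append(MemoryRecord(category, text, provenance, expires_at))

    def recall(self, category: str, keyword: str) -> list:
        """Return unexpired texts in one category that mention the keyword."""
        now = self._clock()
        return [
            r.text
            for r in self._records
            if r.expires_at > now
            and r.category == category
            and keyword.lower() in r.text.lower()
        ]
```

Recall is scoped by category and filtered by expiry, so an agent only sees memories that are both relevant and still valid.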

Together, these primitives translate into an operating model enterprises already understand. Typed messaging looks like service interfaces. The sandbox looks like a job runner. Streaming looks like a message bus. The Memory Bank looks like managed state with lifecycle rules.

A reference architecture you can ship in Q4 2025

You can deliver a cross vendor, production ready agent system this quarter by combining these components. Described in words, the architecture has seven layers.

Client layer. Channels where users and systems meet agents. Examples include a web console, a service desk plugin, or a developer command line interface.
Broker and orchestrator. A lightweight service that receives tasks, inspects policies, and decides which agent or tool gets the next step. It speaks A2A to external agents and the native protocol to Vertex agents.
Identity and policy stack. Everything runs under signed identities. Map users and agents to roles, scope down secrets, and use policy objects to set guardrails on data, time, spend, and network egress.
Tool and data adapters. Connectors to your systems of record and systems of engagement. Wrap each tool with a capability descriptor that A2A can advertise.
Code sandbox. An ephemeral runtime for transformations and glue code. Only the sandbox can reach sensitive endpoints, and only with scoped tokens.
Memory and state. Short term conversation state lives with the session. Durable institutional memory lives in the Memory Bank with explicit schemas, retention rules, and approvals for write access.
Telemetry and evaluation. Centralized logs for messages, tool calls, resource usage, and outcomes. An evaluation harness runs regression suites on the tasks that matter, like invoice matching or procurement approvals.
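
As a sketch of how the orchestrator could use capability descriptors to pick the next step, the snippet below filters a registry by capability name and by the side effects policy allows. The Capability class, the route function, and the side effect tags are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    agent_id: str
    name: str
    side_effects: frozenset = frozenset()  # e.g. {"write"}, {"delete"}


def route(task_capability: str, registry: list, allowed_side_effects: frozenset) -> list:
    """Return ids of agents that offer the capability within the allowed side effects."""
    return [
        c.agent_id
        for c in registry
        if c.name == task_capability and c.side_effects <= allowed_side_effects
    ]
```

Routing on declared side effects rather than on prompt text is what lets policy reason about risk before any tool runs.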

A basic deployment flow looks like this:

A user starts a request in the web console. The browser opens a streaming session and sends an initial task with the user identity attached.
The orchestrator checks policy and consults the Memory Bank for relevant durable memory. It enriches context and forwards a typed Request to the lead agent.
The agent plans the work. For code steps, it emits a sandbox job with a spec that declares timeouts, resource limits, and required secrets by name, not by value.
If the plan requires external agents, the orchestrator initiates A2A handshakes. Each external agent responds with a Proposal that includes scope and expected cost. The orchestrator selects one or more and issues a contract.
As tools run, the agent streams partial results. The console can ask clarifying questions or inject approvals without ending the turn.
Results are written to the target system. A durable memory update is proposed. A human or policy engine approves the write. The Memory Bank stores the lesson with metadata, including provenance and retention.
Telemetry flows to analytics. Your evaluation harness runs post hoc checks on accuracy, cost, and timing.
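
The handshake step in this flow, where external agents answer with Proposals and the orchestrator issues a contract, can be sketched as a filter and a choice. The Proposal fields and the rule of picking the cheapest eligible proposal are assumptions; a real orchestrator might select several agents or weigh latency as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    agent_id: str
    scope: frozenset          # data categories the agent asks to touch
    expected_cost_usd: float


def select_proposal(proposals: list, allowed_scope: frozenset, cost_ceiling_usd: float):
    """Return the cheapest proposal that stays within policy, or None."""
    eligible = [
        p
        for p in proposals
        if p.scope <= allowed_scope and p.expected_cost_usd <= cost_ceiling_usd
    ]
    return min(eligible, key=lambda p: p.expected_cost_usd, default=None)
```

A proposal that asks for more data than policy allows is rejected even if it is the cheapest, which is the behavior you want to be able to show in an audit.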

You can build this structure with a small team because the heavy lifting is in the platform. The new pieces replace custom glue that used to consume months.

A concrete use case: invoice resolution across systems

Imagine a global manufacturer with a backlog of mismatched invoices. Today, analysts bounce between an enterprise resource planning tool, a vendor portal, and a document system. A multi agent team could resolve these mismatches in minutes.

The intake agent receives an exception from the enterprise resource planning system and streams a summary to the analyst.
The documents agent extracts fields from scans, runs a quick code transform in the sandbox, and normalizes line items.
The pricing agent checks contracts and discounts, some of which are stored as durable memory snippets from past negotiations.
A vendor agent from a partner platform joins via A2A to fetch shipment data that your company does not store.
A compliance agent verifies tax handling, logs the policy checks, and requires a one click approval for large corrections.
The orchestrator commits updates to the enterprise resource planning tool and records a new memory: the vendor accepts a specific reconciliation pattern when freight charges appear on the second page.

The change is not only speed. Every step is auditable, data stays in policy, and the team can prove why a correction happened.

What to build first, second, and third

If you are starting in Q4 2025, resist the urge to wire every system. Build three slices that teach your team how the platform works.

Slice 1: run a single task end to end with streaming and the sandbox. Pick a task that mixes retrieval, transformation, and a write. Prove you can stream partial results, execute code safely, and post to a production system.
Slice 2: add A2A with one external agent. Choose a partner agent that brings data or expertise you do not own. Define a contract, test failure modes, and measure latency and cost.
Slice 3: write to the Memory Bank with human approval. Design memory schemas that separate preferences, facts, and norms. Use time to live rules so old beliefs retire.

When these are solid, scale horizontally. Add more tools, more agents, and more channels.

Interop across vendors without the pain

Real interoperability requires more than a shared message shape. Treat A2A as a boundary that hides your internal choices and respects those of your partners.

Normalize capabilities. Represent tools as capability descriptors with input and output schemas, side effects, and policy tags. Publish only what you are willing to support.
Translate politely. If a partner uses a different format for tool calls, keep a translator at the edge. Do not leak internal idiosyncrasies into the contract.
Keep contracts small. Bound each engagement by time, data scope, and spend. Renew if needed. Small contracts reduce blast radius and make billing simpler.
Share provenance. Include signatures and hashes for critical outputs so downstream agents can verify lineage without calling back to you.
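
The provenance idea can be illustrated with a hash and a signature over a canonical JSON encoding. This sketch uses HMAC from the Python standard library, which requires a shared key; verifying lineage without calling back to the producer would in practice call for public key signatures, which need a third party library. The function names and envelope fields are assumptions.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Stable encoding so producer and verifier hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_output(payload: dict, key: bytes) -> dict:
    """Wrap a payload with its SHA-256 hash and an HMAC signature."""
    body = _canonical(payload)
    return {
        "payload": payload,
        "sha256": hashlib.sha256(body).hexdigest(),
        "signature": hmac.new(key, body, hashlib.sha256).hexdigest(),
    }


def verify_output(envelope: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, _canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

Any change to the payload after signing makes verification fail, so a downstream agent can detect tampering before it acts on the result.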

The ecosystem is converging on these ideas across platforms. Similar patterns appear in AWS AgentCore’s September update, which moved agents closer to enterprise native operation, and in Agentforce 3, which centers governance as a first class feature. For connectors and tool catalogs, the push toward common descriptors echoes the momentum behind Boomi bringing MCP to Agentstudio.

Security and governance in practice

Security is where many agent initiatives stall. The new primitives allow a conservative posture without killing velocity.

Threat model the tools. For each tool, list data access, network egress, and side effects. Place tools behind the sandbox when possible. For tools that must reach sensitive systems, require short lived tokens issued per job.
Enforce least privilege with policy objects. Policies set data filters, cost ceilings, and time budgets. Contracts inherit from policy and must be logged.
Block prompt injection at the edges. Use input sanitizers and a catalog of known bad patterns. Add canary instructions to detect hostile overrides.
Keep humans in the loop for writes and memory updates. Require approvals for irreversible actions and grant one time privileges rather than long lived roles.
Log at the message and tool level. Capture who asked, what was asked, the plan, tool invocations, results, and any external contracts. This is the audit trail regulators expect.
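
The edge checks can be sketched in a few lines. The pattern catalog and the canary value below are hypothetical and far from exhaustive; pattern matching alone will not stop a determined attacker, so treat this as one layer among the controls listed above.

```python
import re

# Hypothetical catalog of known bad patterns; a real one would be larger
# and maintained over time.
BAD_PATTERNS = [
    re.compile(r"ignore (all|any|previous) .*instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]

# Hypothetical canary: a marker placed in the system prompt that should
# never appear in output unless the instructions were overridden.
CANARY = "CANARY-7f3a"


def screen_input(text: str) -> list:
    """Return the patterns that matched, so the hit can be logged."""
    return [p.pattern for p in BAD_PATTERNS if p.search(text)]


def canary_leaked(model_output: str) -> bool:
    """True when the canary marker shows up in the agent's output."""
    return CANARY in model_output
```

Returning the matched patterns rather than a bare boolean keeps the audit trail useful when a request is blocked.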

The result is a system that can pass internal review without becoming a fortress no one uses.

Operating the fleet, with numbers that matter

Once you move from pilots to production, run the agent fleet like an application platform. Pick a handful of metrics that tie to outcomes and publish them where executives already look.

Time to first token and time to task completion. Streaming should shorten time to first token noticeably while time to completion stays predictable.
Cost per resolved case and cost variance. Contracts and policy objects help cap spend spikes.
Assist rate and deflection rate. Deflection rate tracks how often the system resolves a task end to end. Assist rate tracks how often it provides a useful partial result that saves human time.
Error classes, not just counts. Separate low stakes hallucinations from policy violations and failed commits. Each class has a different owner and fix.
Memory hit rate. Measure how often durable memory accelerates a task and how often it is rejected by policy or humans.
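
A small sketch of how these numbers could be computed from per case records. The record fields (outcome, cost_usd, memory_used) and the exact definitions are assumptions: here deflection means resolved end to end, assist means a useful partial result, and cost per resolved case divides total spend by the number of resolved cases.

```python
from statistics import pvariance


def fleet_metrics(cases: list) -> dict:
    """Summarize a batch of case records into the metrics listed above."""
    if not cases:
        raise ValueError("need at least one case")
    total = len(cases)
    resolved = sum(1 for c in cases if c["outcome"] == "resolved")
    assisted = sum(1 for c in cases if c["outcome"] == "assisted")
    costs = [c["cost_usd"] for c in cases]
    return {
        "deflection_rate": resolved / total,
        "assist_rate": assisted / total,
        # Total spend, including failed cases, per case actually resolved.
        "cost_per_resolved_case": sum(costs) / resolved if resolved else None,
        "cost_variance": pvariance(costs),
        "memory_hit_rate": sum(1 for c in cases if c["memory_used"]) / total,
    }
```

Charging failed and assisted cases to the cost of resolved ones is a deliberate choice in this sketch: it keeps wasted spend visible instead of averaging it away.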

When the numbers move, roadmaps move.

How these primitives reshape enterprise roadmaps

Agent programs have lived at two extremes: either a showpiece pilot that never touched core systems, or a sprawling custom build that could not survive a platform change. The September 2025 package creates a middle path that is both practical and forward looking.

Standardized handoffs. A2A means business units can procure agents from different vendors and still work together. Enterprise architecture teams will publish standards for contracts, capability descriptors, and audit requirements. Procurement will ask for A2A compliance by default.
Safer autonomy. The sandbox and policy objects make it feasible to grant agents scoped autonomy. Teams can authorize low risk writes and code transforms without weeks of review, while reserving high risk actions for approvals.
Memory as a first class asset. The Memory Bank will push product teams to curate durable knowledge rather than bury it in prompts. Expect playbooks and norms to be encoded as memory with owners and expiration dates. Compliance teams will welcome a consistent place to review and purge.
Real time collaboration. Streaming changes the user experience. Agents will begin to narrate, ask clarifying questions, and show work in progress. Designers will craft patterns for repair, correction, and escalation inside the flow.
Vendor strategy. With A2A in place, enterprises can build fleets that include specialized agents from niche vendors. Core platforms will compete on governance, performance, and ecosystem depth rather than lock in. The winning roadmap prioritizes clean contracts over proprietary glue.

What to watch in the next two quarters

A few inflection points will signal whether the ecosystem is realizing the promise.

Cross vendor incident response. Watch for security teams piloting joint playbooks in which a detection agent from one vendor contracts a containment agent from another while your internal compliance agent records the chain of custody.
Agent catalogs in enterprise marketplaces. Expect catalogs where capability descriptors are validated, policies are pre attached, and trials run in a shared sandbox.
Pricing models that bill by contract. As A2A contracts become the unit of work, billing will follow. This will simplify chargebacks and make costs track value more closely.
Memory governance reviews. In certain industries, audit and legal teams may start treating durable memory as a regulated record. The Memory Bank will need clear retention features and export controls.

A playbook for leaders

If you oversee an enterprise agent program, give your teams concrete tasks.

Publish a contract standard. Define required fields for scope, identity, data categories, spend caps, and retention. Provide a linter that blocks non conforming contracts from production.
Approve a starter capability schema. Keep it small enough to be adopted, rich enough to be useful. Add side effect tags like write, delete, or external call so policy can reason about risk.
Fund an evaluation bench. Select ten tasks that represent real value, like claims reconciliation or quote generation, and run them nightly across versions. Tie go or no go decisions to these results.
Create a memory council. Appoint owners for durable memory categories, set time to live defaults, and require provenance for every write.
Set a target for interop. Pick two non Google agents to integrate through A2A before the end of the quarter. Use the exercise to flush out translation issues and policy gaps.
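
The contract linter from the first item can start very small. The field names and types below mirror the required fields listed above but are otherwise assumptions; a production linter would validate against whatever contract standard your architecture team publishes.

```python
# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {
    "scope": str,
    "identity": str,
    "data_categories": list,
    "spend_cap_usd": (int, float),
    "retention_days": int,
}


def lint_contract(contract: dict) -> list:
    """Return a list of problems; an empty list means the contract conforms."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in contract:
            problems.append(f"missing field: {name}")
        elif not isinstance(contract[name], expected_type):
            problems.append(f"wrong type for {name}")
    cap = contract.get("spend_cap_usd")
    if isinstance(cap, (int, float)) and cap <= 0:
        problems.append("spend_cap_usd must be positive")
    return problems
```

Wired into a deployment pipeline, a non empty result would block the contract from reaching production.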

These actions turn a vendor release into lasting momentum rather than another announcement in the rearview mirror.

The bottom line

The September 2025 upgrades brought the boring but essential parts of engineering to the agent world, and that was the point. Interoperability through A2A, safe execution through the sandbox, real time collaboration through streaming, and durable knowledge through the Memory Bank add up to an operable foundation. With these primitives, enterprise teams can ship multi agent systems that cross vendor lines, respect policy, and improve month after month. The era of one off demos is over. The era of agent fleets that pull their weight has begun.