The Conversational OS Moment: Apps, Agents, and Governance

This week marked a platform shift. Chat is becoming an operating system, with in-chat apps, production agent toolkits, and computer-use automation. The next moat is capability governance with precise, provable control.

By Talos
Trends and Analysis

This week, chat became an operating system

A line moved from hype to reality. On October 6, 2025, OpenAI used DevDay in San Francisco to ship two building blocks that turn conversation into a software surface: Apps in ChatGPT and a developer toolkit called AgentKit. Developers can now build applications that live directly inside a chat, with UI components, data connections, and safe actions. This is the first credible blueprint for conversation as a platform rather than a feature, and you can read the official overview of the Apps in ChatGPT announcement.

A day later, on October 7, 2025, Google introduced the Gemini 2.5 Computer Use model, which lets an agent navigate and operate user interfaces with clicks and keystrokes under policy. That capability ships through the Gemini API and Vertex AI, making cross-app control a product rather than a demo. Read Google’s Gemini 2.5 Computer Use model for the technical framing and safety posture.

One more story set the stakes. In August and early September 2025, Perplexity’s Comet browser rushed patches after researchers showed that indirect prompt injection could drive the browser’s agent to exfiltrate sensitive data by abusing the user’s session. The lesson was not just about Comet. It was about the category. The more an agent can do, the more you must govern what it may do.

The thesis: intelligence is abundant, governance is scarce

Model intelligence still matters, but the edge is shifting from raw IQ to system design. The winners will be the platforms that can grant, meter, prove, and revoke capabilities with the precision and predictability of a payments network. We have been circling this idea for a while, from consent mechanics to interface-native agents. If you care about how systems ask before they act, see our deep dive on the consent layer coming to AI. If you want the bigger interface shift, revisit why browser‑native agents change the API surface.

Think about how mobile went mainstream. Smartphones did not win on chips alone. They paired compute with stores, sandboxes, permissions, and payments that limited risk and created trust. Agents raise the bar again. A chat app that can design a billboard, book a flight, pay a vendor, file a ticket, and update a database is no longer a chatbot. It is a principal acting on your behalf. That demands a governance stack richer than:

  • Allow or deny once
  • One big consent screen
  • Hope for the best

Instead, we need intent-level control, proof of origin, continuous policy enforcement, and accountability with teeth.

What actually changed this week

  • ChatGPT gained a credible app runtime and SDK. That concentrates a large surface of user interaction inside a conversational shell that can host real workflows, not just answers.
  • AgentKit formalized production agents. Developers now target ChatGPT as a host environment for stateful work, not just as a destination for text completion.
  • Gemini 2.5 Computer Use turned cross-app control into a supported pattern. Agents can operate user interfaces, including websites without first-party APIs, with oversight and policies.
  • The Comet incident illustrated what happens when capability control lags behind capability expansion. A hidden instruction was enough to trigger actions with the user’s privileges.

Together, these signals say the quiet part out loud. We are standardizing the idea that agents will click, type, move money, and change records. If that is the case, the platform’s core product is no longer the model alone. It is governance.

From permission popups to capability contracts

A permission popup is a weak contract. It collects consent once and then stretches it across time and context. Agents need tighter loops.

Here is a better mental model. Treat every agent action like a transaction at a point of sale. A card swipe is governed by a network. The merchant has a contract. The cardholder has a limit. The terminal enforces rules. There is an audit trail and a chargeback path. Now port that logic to software:

  • The user expresses an intent, for example: book a flight to Austin next Thursday under 600 dollars, use the company card, aisle seat preferred.
  • The agent compiles that into a structured request and asks for specific capabilities: read the calendar, hold funds up to 650 dollars, purchase from approved airlines, store the receipt, notify a Slack channel.
  • The platform checks policy, limits, and attestations, then grants a time-bound capability lease and issues a token to perform the allowed steps.
  • Execution runs inside a verifiable sandbox with unforgeable logs and live policy checks. If a constraint is exceeded, the lease is revoked mid-run.
  • The run produces a receipt, a proof bundle, and a liability allocation that determines who pays if something goes wrong.

That is the difference between a flashy demo and an operating system.
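The transaction loop above can be sketched in code. This is a minimal illustration under assumed shapes, not any platform's API: the `Intent` and `Lease` structures and the `authorize_step` check are all hypothetical, standing in for signed tokens and a real policy service.

```python
import time
import uuid
from dataclasses import dataclass

@dataclass
class Intent:
    """Structured request compiled from the user's ask (hypothetical schema)."""
    description: str
    capabilities: list          # e.g. ["calendar:read", "payments:purchase"]
    spend_ceiling_usd: float

@dataclass
class Lease:
    """Time-bound grant of specific capabilities."""
    lease_id: str
    capabilities: list
    spend_ceiling_usd: float
    expires_at: float
    revoked: bool = False

def grant_lease(intent: Intent, ttl_seconds: int = 900) -> Lease:
    # A real platform would check policy, limits, and attestations here.
    return Lease(
        lease_id=str(uuid.uuid4()),
        capabilities=list(intent.capabilities),
        spend_ceiling_usd=intent.spend_ceiling_usd,
        expires_at=time.time() + ttl_seconds,
    )

def authorize_step(lease: Lease, capability: str, amount_usd: float = 0.0) -> bool:
    """Live policy check before every step; exceeding a constraint revokes mid-run."""
    if lease.revoked or time.time() > lease.expires_at:
        return False
    if capability not in lease.capabilities:
        return False
    if amount_usd > lease.spend_ceiling_usd:
        lease.revoked = True        # circuit breaker: revoke the lease mid-run
        return False
    return True

intent = Intent("Book flight to Austin", ["calendar:read", "payments:purchase"], 650.0)
lease = grant_lease(intent)
print(authorize_step(lease, "payments:purchase", 600.0))  # True: within scope and ceiling
print(authorize_step(lease, "payments:purchase", 700.0))  # False: ceiling exceeded, lease revoked
print(authorize_step(lease, "calendar:read"))             # False: lease is now revoked
```

Note that the over-ceiling attempt does not merely fail: it kills the lease, so even previously allowed steps stop working until a refreshed signed intent re-grants them.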

A blueprint for an Agent OS

Below is an opinionated blueprint for turning agents from risky demos into trustworthy economic actors.

1) Signed intent schemas

  • What it is: A structured, signed document capturing what the user asked for, what the agent proposes, what data sources will be touched, and what outcomes are in scope. Think invoice plus flight plan, signed by the session and by the platform.
  • Why it matters: Text prompts are ambiguous. Signed intents make scope unambiguous and reviewable.
  • How to implement now: Start with JSON schemas for the top 20 recurring tasks. Include requested capabilities, limits, data classes, and any confirmations. Sign with platform keys. Store the hash in the run log. Display a human‑readable summary in chat rather than JSON blobs.
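A minimal version of the sign-and-store step might look like this. The field names are illustrative, and HMAC with a shared key stands in for whatever platform key scheme you actually use; the point is canonicalization, hashing, and tamper evidence.

```python
import hashlib
import hmac
import json

PLATFORM_KEY = b"demo-platform-key"   # stand-in for a real platform signing key

def sign_intent(intent: dict) -> dict:
    """Canonicalize, hash, and sign an intent document (illustrative field names)."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(canonical).hexdigest()     # hash goes in the run log
    signature = hmac.new(PLATFORM_KEY, canonical, hashlib.sha256).hexdigest()
    return {"intent": intent, "hash": digest, "signature": signature}

def verify_intent(signed: dict) -> bool:
    canonical = json.dumps(signed["intent"], sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(PLATFORM_KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])

intent = {
    "task": "book_flight",
    "capabilities": ["calendar:read", "payments:hold"],
    "limits": {"spend_usd": 650},
    "data_classes": ["travel"],
}
signed = sign_intent(intent)
print(verify_intent(signed))                    # True
signed["intent"]["limits"]["spend_usd"] = 5000  # tampering breaks the signature
print(verify_intent(signed))                    # False
```

The user never sees this JSON; they see the human-readable summary in chat, while the hash anchors the run log.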

2) Fine-grained capability leases

  • What it is: Capability tokens that are narrow in scope, dollar amount, and time. Example: pay up to 200 dollars to approved vendors in the next 15 minutes; read-only access to email threads tagged travel for one hour.
  • Why it matters: Least privilege should apply to actions, not only to data. Leases implement least privilege in the action space.
  • How to implement now: Map each tool into verbs and nouns. Approve verbs, constrain nouns. Add counters, ceilings, and expiration. Make leases easy to revoke and short by default.
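The verb-and-noun mapping with counters, ceilings, and expiration can be sketched as follows. `CapabilityLease` and its fields are hypothetical names for illustration.

```python
import time
from dataclasses import dataclass

@dataclass
class CapabilityLease:
    """Narrow grant: an approved verb, constrained nouns, counters, ceiling, expiry."""
    verb: str                  # e.g. "pay"
    allowed_nouns: set         # e.g. approved vendor IDs
    max_amount_usd: float
    max_uses: int
    expires_at: float
    uses: int = 0
    revoked: bool = False

    def authorize(self, noun: str, amount_usd: float = 0.0) -> bool:
        if self.revoked or time.time() > self.expires_at:
            return False
        if noun not in self.allowed_nouns:
            return False
        if amount_usd > self.max_amount_usd or self.uses >= self.max_uses:
            return False
        self.uses += 1             # counter: each authorized use is metered
        return True

# "Pay up to 200 dollars to approved vendors in the next 15 minutes."
lease = CapabilityLease(
    verb="pay",
    allowed_nouns={"vendor:acme", "vendor:globex"},
    max_amount_usd=200.0,
    max_uses=3,
    expires_at=time.time() + 15 * 60,
)
print(lease.authorize("vendor:acme", 150.0))    # True
print(lease.authorize("vendor:unknown", 50.0))  # False: noun not allowlisted
lease.revoked = True                            # easy to revoke, short by default
print(lease.authorize("vendor:acme", 10.0))     # False
```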

3) Zero trust and trusted execution

  • What it is: Run high-risk steps in isolated compute that can prove what code ran and what it touched. Combine process sandboxing with hardware-backed attestation where available.
  • Why it matters: If an agent can open finance or email, you must know the code you approved is the code that executed.
  • How to implement now: Split the agent into three planes. The reasoning plane plans actions. The capability plane handles tokens and leases. The execution plane uses container sandboxes or confidential compute for tool calls crossing trust boundaries, and emits a signed transcript of calls and results.
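The three-plane split can be sketched in a few lines. The class names are hypothetical, and hashing stands in for a real signed transcript; the structural point is that planning code never touches tokens, and only the execution plane produces attestable records.

```python
import hashlib
import json

class ReasoningPlane:
    """Plans steps; deliberately holds no tokens or credentials."""
    def plan(self, goal: str) -> list:
        # A real planner would derive steps from the goal; this is a fixed stub.
        return [{"tool": "search_flights", "args": {"dest": "AUS"}}]

class CapabilityPlane:
    """Holds leases and tokens; checks each planned step before execution."""
    def __init__(self, allowed_tools: set):
        self.allowed_tools = allowed_tools
    def authorize(self, step: dict) -> bool:
        return step["tool"] in self.allowed_tools

class ExecutionPlane:
    """Runs authorized tool calls and emits a hash-committed transcript."""
    def __init__(self):
        self.transcript = []
    def run(self, step: dict) -> dict:
        result = {"tool": step["tool"], "status": "ok"}   # real tool call goes here
        entry = json.dumps({"step": step, "result": result}, sort_keys=True)
        self.transcript.append(hashlib.sha256(entry.encode()).hexdigest())
        return result

planner, caps, executor = ReasoningPlane(), CapabilityPlane({"search_flights"}), ExecutionPlane()
for step in planner.plan("book a flight"):
    if caps.authorize(step):          # planning output is checked, never trusted
        executor.run(step)
print(len(executor.transcript))       # 1
```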

4) Live revocation with circuit breakers

  • What it is: The ability to suspend or shrink a lease while a task is running when risk spikes. Triggers can include anomaly thresholds, vendor reputation downgrades, or a user panic button.
  • Why it matters: One wrong click can cascade. Live revocation turns incidents into near misses.
  • How to implement now: Add kill switches for each connector. Enforce per‑step policy checks. If the agent tries to increase a spend ceiling or expand scope, block and require a refreshed signed intent.
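A toy circuit breaker combining the triggers above might look like this. The trigger names and threshold are assumptions for illustration.

```python
class CircuitBreaker:
    """Suspends a running lease when a risk trigger fires (illustrative triggers)."""
    def __init__(self, anomaly_threshold: float = 0.8):
        self.anomaly_threshold = anomaly_threshold
        self.tripped = False

    def check(self, anomaly_score: float = 0.0, panic: bool = False,
              requested_ceiling: float = 0.0, granted_ceiling: float = 0.0) -> bool:
        """Return True if the step may proceed; trip on any risk trigger."""
        if panic or anomaly_score >= self.anomaly_threshold:
            self.tripped = True
        if requested_ceiling > granted_ceiling:
            # Scope expansion mid-run: block and require a refreshed signed intent.
            self.tripped = True
        return not self.tripped

breaker = CircuitBreaker()
print(breaker.check(anomaly_score=0.2, requested_ceiling=100, granted_ceiling=200))  # True
print(breaker.check(requested_ceiling=500, granted_ceiling=200))  # False: scope expansion
print(breaker.check(anomaly_score=0.0))  # False: breaker stays tripped until reset
```

The breaker is deliberately sticky: once tripped, every subsequent step fails until a human or a refreshed intent resets it, which is what turns an incident into a near miss.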

5) Proof, audit, and receipting

  • What it is: A tamper‑evident log that answers who, what, when, where, why, and with which token. Every run produces a user‑visible receipt and a platform‑verifiable trail.
  • Why it matters: Disputes cannot be resolved by vibes. You need evidence for support, compliance, and, eventually, courts.
  • How to implement now: Use append‑only logs with cryptographic hashing. Store the signed intent, leases, execution attestations, and the step sequence with sensitive inputs redacted. Provide a compact receipt to the user with a link to the full trace under access controls.
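The append-only log with cryptographic hashing can be sketched as a simple hash chain. This is a tamper-evidence illustration, not a production log; field names are hypothetical.

```python
import hashlib
import json

class AuditLog:
    """Append-only, hash-chained run log: each entry commits to everything before it."""
    def __init__(self):
        self.entries = []
        self.head = "0" * 64   # genesis hash

    def append(self, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True)
        self.head = hashlib.sha256((self.head + payload).encode()).hexdigest()
        self.entries.append({"record": record, "hash": self.head})
        return self.head

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every hash after it."""
        head = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            head = hashlib.sha256((head + payload).encode()).hexdigest()
            if head != entry["hash"]:
                return False
        return True

log = AuditLog()
log.append({"who": "agent:travel", "what": "hold_funds", "amount_usd": 650})
log.append({"who": "agent:travel", "what": "purchase", "amount_usd": 598})
print(log.verify())  # True
log.entries[0]["record"]["amount_usd"] = 1   # tampering breaks the chain
print(log.verify())  # False
```

In practice the records would hold the signed intent hash, lease IDs, and attestation digests, with sensitive inputs redacted before logging.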

6) Insurance primitives and liability allocation

  • What it is: A clear, priced promise about losses. For example, the platform covers up to 5,000 dollars for mis‑executed purchases that passed intent checks; vendors cover fraud on their connectors; users cover actions outside policy.
  • Why it matters: Trust compounds when everyone knows who pays when systems fail. Liability turns gray zones into rules.
  • How to implement now: Start with warranties for a small set of capabilities, like bookings or invoice filing. Price coverage into capability fees. Require vendors to carry or pool coverage to list in a capability directory.
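The allocation rules from the example can be expressed as policy code. The figures and rule ordering here are toy assumptions mirroring the split described above, not a recommended schedule.

```python
def allocate_liability(loss_usd: float, passed_intent_checks: bool,
                       connector_fraud: bool, outside_policy: bool) -> dict:
    """Toy liability allocation mirroring the example split (hypothetical figures)."""
    PLATFORM_CAP_USD = 5000.0
    if outside_policy:
        # Users cover actions taken outside policy.
        return {"payer": "user", "covered_usd": 0.0}
    if connector_fraud:
        # Vendors cover fraud on their own connectors.
        return {"payer": "vendor", "covered_usd": loss_usd}
    if passed_intent_checks:
        # Platform covers mis-executions that passed intent checks, up to a cap.
        return {"payer": "platform", "covered_usd": min(loss_usd, PLATFORM_CAP_USD)}
    return {"payer": "user", "covered_usd": 0.0}

print(allocate_liability(1200.0, True, False, False))
# {'payer': 'platform', 'covered_usd': 1200.0}
print(allocate_liability(9000.0, True, False, False))
# {'payer': 'platform', 'covered_usd': 5000.0}
```

Once allocation is mechanical like this, coverage can be priced into capability fees instead of negotiated after each incident.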

7) Organization‑level policy layers

  • What it is: Policy as code that applies across agents, users, and apps. It sets data boundaries, spend ceilings, vendor allowlists, and change management.
  • Why it matters: Enterprises will not adopt agents until they can express policy once and ensure everything honors it.
  • How to implement now: Provide a central policy console that compiles into checks at lease time and run time. Include templates for common scenarios like month‑end close or customer refunds.
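"Policy as code that compiles into checks" can be as simple as policy data closed over by a check function, applied at both lease grant and run time. The policy keys here are illustrative.

```python
# Org policy as data; "compiled" into a check applied at lease time and run time.
ORG_POLICY = {
    "spend_ceiling_usd": 1000.0,
    "vendor_allowlist": {"vendor:acme", "vendor:globex"},
    "data_egress_allowed": False,
}

def compile_policy(policy: dict):
    """Return a check function every lease grant and run step must pass."""
    def check(action: dict) -> bool:
        if action.get("amount_usd", 0.0) > policy["spend_ceiling_usd"]:
            return False
        vendor = action.get("vendor")
        if vendor is not None and vendor not in policy["vendor_allowlist"]:
            return False
        if action.get("egress") and not policy["data_egress_allowed"]:
            return False
        return True
    return check

check = compile_policy(ORG_POLICY)
print(check({"vendor": "vendor:acme", "amount_usd": 400.0}))   # True
print(check({"vendor": "vendor:other", "amount_usd": 50.0}))   # False: not allowlisted
print(check({"egress": True}))                                 # False: egress blocked
```

Expressing policy once and compiling it into every enforcement point is what lets an enterprise trust that all agents honor it.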

Governance in practice

  • Design and deploy: A designer asks an app inside ChatGPT to turn a sketch into a Figma wireframe. The app requests a read‑only lease to the design library for one hour and a write lease to a new project folder capped at 20 files. The run ends with a receipt referencing the signed intent and both leases.
  • Travel booking: An assistant is asked to book three flights under a shared budget. The platform grants a 1,200 dollar purchase capability via an approved connector, limited to the next 30 minutes. If a fare spikes mid‑run, the capability engine blocks the new amount and asks for a refreshed intent.
  • Expense filing: The agent can read receipts in a mailbox label, but not the inbox. It can create entries below a threshold without review, and it requires a manager’s signoff above that threshold. Every entry carries a run ID and the attested code hash of the expense tool.

Why capability markets will accelerate innovation

Once safe access is real, the platform can expose a market of actions. Developers list capabilities, not just apps. A vendor might list "Pay approved vendors under 500 dollars with same‑day settlement" or "Generate and e‑sign a sales order with compliance checks." Each listing includes a schema, price, performance metrics, and liability terms.

This solves the cold start for agents. An agent can assemble a workflow from a portfolio of capabilities with known costs and guarantees. It also creates price signals. If calendar access with conflict resolution is cheap and reliable, more products will build on it. If a booking connector has higher dispute rates, its token will cost more. Markets make reliability visible and reward it over hype. For a broader view on how AI systems shape user behavior and, by extension, demand for capabilities, see our piece on the preference loop in chat interfaces.
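A capability listing and its price signal can be sketched directly. The `CapabilityListing` shape and the risk multiplier are assumptions; the point is that dispute rates feed mechanically into token cost, making reliability visible.

```python
from dataclasses import dataclass

@dataclass
class CapabilityListing:
    """A market listing for an action: schema, price, metrics, liability terms."""
    name: str
    schema_version: str
    price_usd_per_call: float
    dispute_rate: float        # fraction of runs disputed
    warranty_cap_usd: float

def risk_adjusted_price(listing: CapabilityListing, risk_multiplier: float = 10.0) -> float:
    """Toy pricing rule: higher dispute rates make a capability's token cost more."""
    return listing.price_usd_per_call * (1 + risk_multiplier * listing.dispute_rate)

reliable = CapabilityListing("calendar.resolve_conflicts", "1.0", 0.01, 0.001, 100.0)
flaky = CapabilityListing("flights.book", "1.0", 0.01, 0.05, 5000.0)
print(round(risk_adjusted_price(reliable), 4))  # 0.0101
print(round(risk_adjusted_price(flaky), 4))     # 0.015
```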

Lessons from recent agent failures

The Comet episode showed how agentic systems inherit a user’s privileges by default. Without leases, a single summarization request can turn into a universal master key. Without signed intents, you cannot distinguish what the user asked from what a page told the agent to do. Without attestation, you cannot prove that the right code obeyed the right policies. These are not patchable afterthoughts. They are architectural requirements.

Practical implications:

  • Separate the reasoning plane from the execution plane. Do not let planning code hold tokens.
  • Treat every high‑risk tool call as a transaction. Ask for explicit scopes and amounts.
  • Add confirmation steps for actions that cross money, identity, or policy boundaries.
  • Capture every decision in a signed, human‑readable summary inside the chat, tied to the run log.

What to build in the next 90 days

For platform teams:

  • Ship a minimal signed intent layer for your top workflows. Make it a quiet default with a clear human summary in the chat pane.
  • Wrap every connector in a capability lease. Start with short expirations and low ceilings. Add circuit breakers and a user panic button that voids all active leases for the session.
  • Build a receipt generator and a basic dispute process. Keep scope narrow, but make it real. A small, working warranty beats a broad promise on paper.

For app developers:

  • Adopt the new ChatGPT Apps SDK to meet users where they already work. Design your app surface for conversation‑first tasks. Keep state minimal, push long work to agent jobs, and present receipts for every meaningful action.
  • If you use computer‑use features on Gemini, treat the operating surface like a production robot. Disable actions you cannot audit, script high‑risk sequences with checks, and keep the agent’s session separate from the user’s main sessions.
  • Publish capability schemas as reusable building blocks. Price them. Offer warranties. Make your connector the easiest safe choice for agents assembling workflows.

For security and compliance leaders:

  • Put policy as code in front of agents, not behind them. Start with spend policy and data egress policy. Expand to vendor allowlists and geography rules.
  • Require attested execution for actions that touch money, personal data, or regulated systems. Accept software‑only sandboxes for low‑risk capabilities.
  • Track three new metrics: capability dispute rate, mean time to revoke, and coverage ratio. These three numbers will tell you if your governance is real.
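The three metrics above fall straight out of run telemetry. This sketch assumes you already record disputes, revocation latencies, and covered versus total losses; the function name and signature are illustrative.

```python
def governance_metrics(runs: int, disputes: int,
                       revoke_latencies_s: list,
                       covered_loss_usd: float, total_loss_usd: float) -> dict:
    """Compute dispute rate, mean time to revoke, and coverage ratio from telemetry."""
    return {
        "capability_dispute_rate": disputes / runs if runs else 0.0,
        "mean_time_to_revoke_s": (sum(revoke_latencies_s) / len(revoke_latencies_s)
                                  if revoke_latencies_s else 0.0),
        "coverage_ratio": covered_loss_usd / total_loss_usd if total_loss_usd else 1.0,
    }

metrics = governance_metrics(runs=10_000, disputes=12,
                             revoke_latencies_s=[0.8, 1.2, 2.0],
                             covered_loss_usd=4200.0, total_loss_usd=5000.0)
print({k: round(v, 4) for k, v in metrics.items()})
```

A rising dispute rate, a slow mean time to revoke, or a falling coverage ratio each signal governance that exists on paper but not in practice.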

What this means for the AI platform race

If Apps and AgentKit pull developers into ChatGPT, and Computer Use brings agents into day‑to‑day software, the race will be won by the platform that makes capability governance boring and automatic. That platform will have:

  • A clear way to express what an agent may do
  • A unified runtime that can prove what actually ran
  • An evidence trail and a warranty when things go wrong
  • A market where capabilities are composable and priced

Intelligence is compounding. Liability is compounding with it. The platforms that bind them together will convert imagination into GDP faster than platforms that only chase leaderboard gains.

The closing argument

We are crossing from chatbots to operating systems. The pieces that landed this week make it concrete. Apps live in the chat. Agents can use the computer. The leap that matters next is not a smarter sentence. It is an enforceable agreement about actions.

Build signed intent schemas. Lease capabilities. Attest execution. Make revocation instant. Put policy at the organization level. Price and insure capabilities. If we get these steps right, agents will stop being risky demos and will start operating like trustworthy economic actors. That is how the conversational OS becomes a real platform and how the market for capabilities converts creativity into durable value.
