AgentKit and ChatGPT Apps Make Agents a Native Platform

The day agents moved in

OpenAI’s DevDay 2025 drew a line between the last era of AI features and the next era of AI-first products. The big idea was simple and bold. Agents will not sit at the edge of your product anymore. They will be the product. Two pillars make that shift real. First, a developer stack for building production-grade agents with built-in tools, permission prompts, and computer use. Second, a way to ship those experiences inside ChatGPT as native in-chat apps that users can find, install, and control.

If you are a founder or a product lead, this is the moment to stop gluing a model to your app and start treating the agent as both your runtime and your channel. The pieces are finally aligned so you can build useful, safe, and shippable autonomy on day one.

What actually launched

Here is the concise summary of what changed.

OpenAI’s agent stack. The Responses API and the Agents SDK let you compose agents that can call tools, browse, search, read and write files, and operate a virtual computer. Agents can also use connectors to reach into services like calendars, storage, code hosts, and other accounts once a user grants access.
Computer use and permissioned actions. Agents get a secure virtual computer to click, type, run code, and generate documents, and they prompt the user clearly before anything consequential happens. The user can interrupt or take over at any time.
In-chat apps. An Apps SDK lets developers package interactive apps that live inside ChatGPT. Apps ask for scopes, get reviewed against guidelines, and appear in a directory so users can discover them.
Real-world example. Coinbase built AgentKit, a toolkit that gives agents a wallet and onchain skills like transfers and contract calls, on top of the agent stack. It shows how permissioned actions can be made practical and safe even in sensitive domains.

Together, this is a full life cycle. Build the agent with native tools and guardrails. Ship it where users already work and chat. Grow through a store-like directory with policies and review. One platform from idea to distribution.

Why this matters for startups

Most teams spending months on agent projects hit the same three roadblocks: safety and control, tool orchestration, and distribution. DevDay’s releases attack all three at once.

Safety and control. Permission prompts, scopes, and an audit trail are now native. An agent does not just act. It asks. It shows what it plans to do. It runs in a sandboxed computer. That lowers organizational risk and cuts down on bespoke policy plumbing.
Tool orchestration. Search, file handling, code execution, and browsing are first-party tools designed to work together. You do not need five vendors to run a basic workflow end to end.
Distribution. Shipping inside ChatGPT means your first users do not need new accounts or a new interface. They find your app in a directory, connect the accounts you request, and start. The discovery and install loop that made mobile app stores powerful now exists for in-chat experiences.

The result is a much shorter path from demo to daily use. Instead of stitching a model into your web app and hoping users will come, you bring your workflow to where millions already are and you do it with guardrails that enterprises can accept.

A mental model for the new runtime

Think of the agent stack as a three-layer cake.

Reasoning and memory. The model plans, decides, and keeps track of the task. It knows when to ask permission and when to pause.
Tools and environment. Built-in tools provide web search, file search, and a virtual computer. Connectors supply access to calendars, drives, code repositories, and more when a user consents.
Governance and audit. Permission dialogs, scopes, and traces give users and admins visibility into what happened and why. You can replay steps, measure performance, and improve prompts or policies with evidence rather than guesswork.
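The governance layer is easiest to reason about as data. The Python sketch below uses a hypothetical trace schema (`Step`, `RunTrace`), not any real SDK type. It shows how per-run traces become a replay view and a few numbers worth tracking: success rate, mean time to complete, and how often a human had to step in.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Step:
    """One tool call or permission prompt inside a run (hypothetical schema)."""
    name: str
    ok: bool
    needed_human: bool = False


@dataclass
class RunTrace:
    """Everything needed to replay one agent run after the fact."""
    run_id: str
    seconds: float
    steps: list[Step] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # A run counts as a success only if every recorded step succeeded.
        return all(step.ok for step in self.steps)


def replay(trace: RunTrace) -> list[str]:
    """Render a trace as a numbered, human-readable step log."""
    return [f"{i}. {s.name}: {'ok' if s.ok else 'FAILED'}"
            for i, s in enumerate(trace.steps, start=1)]


def summarize(traces: list[RunTrace]) -> dict[str, float]:
    """Turn raw traces into evidence for tuning prompts and policies."""
    done = [t for t in traces if t.succeeded]
    return {
        "success_rate": len(done) / len(traces),
        # Mean time is computed over successful runs only.
        "mean_seconds_to_complete": mean(t.seconds for t in done) if done else 0.0,
        "human_intervention_rate":
            sum(any(s.needed_human for s in t.steps) for t in traces) / len(traces),
    }
```

Two runs in which one stalls on a failed computer-use step give a 0.5 success rate. The replay of the failed run points straight at the step that broke.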

Many teams have tried to hand-craft this stack. Few get it right because each layer interacts with the others in subtle ways. A native runtime removes most of that complexity so you can spend time on the one thing only your product can do: the job your user hired you to do.

In-chat apps are the distribution breakthrough

User experience shifts are often quiet until they are obvious. In-chat apps feel small at first glance. Then you use one and notice what changed. A travel app is not another tab. It is a teammate in the conversation where you plan a trip. A design app is not a separate canvas you must visit. It is a creative partner in the same thread where you described your campaign. The handoffs between model and app disappear because the app is part of the conversation itself.

From a builder’s point of view, two design choices matter most.

Scopes instead of passwords. Apps ask for capabilities in clear language. Users decide once, can revoke later, and see what the app is allowed to do. That is cleaner and safer than ad hoc API key forms.
Review and guidelines. There is an explicit bar for listing in the directory and higher bars for featured placement. This is how quality rises and scams are throttled before they get reach.
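A minimal Python model of a grant makes "scopes instead of passwords" concrete. It is an illustration, not the Apps SDK. The `Grant` class, the `ScopeError` exception, and scope names such as `crm.read` are all hypothetical. It captures three properties: the app asks in plain language, the user decides once, and the user can revoke later.

```python
from dataclasses import dataclass, field


class ScopeError(PermissionError):
    """Raised when an app tries to act without a granted scope (hypothetical)."""


@dataclass
class Grant:
    """Scopes one user has granted to one app (hypothetical model)."""
    app: str
    requested: dict[str, str]  # scope -> plain-language reason shown to the user
    granted: set[str] = field(default_factory=set)

    def consent(self, approved: set[str]) -> None:
        # A user can only approve scopes the app actually asked for.
        unknown = approved - set(self.requested)
        if unknown:
            raise ValueError(f"scopes never requested: {sorted(unknown)}")
        self.granted |= approved

    def revoke(self, scope: str) -> None:
        # Revocation takes effect on the very next check.
        self.granted.discard(scope)

    def require(self, scope: str) -> None:
        # Called before every tool use that touches the user's accounts.
        if scope not in self.granted:
            raise ScopeError(f"{self.app} lacks scope {scope!r}")

    def describe(self) -> list[str]:
        # What the user sees: each scope, its state, and why it is needed.
        return [f"{scope} [{'on' if scope in self.granted else 'off'}]: {why}"
                for scope, why in sorted(self.requested.items())]
```

Because `require` runs before every account-touching call, a revoked scope stops the app on its next action. It does not wait for a session restart.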

If you built for mobile in 2008 or for browser extensions in 2010, this will feel familiar in all the right ways. The graph of discovery, installs, ratings, and iterative updates is the same pattern that let small teams punch far above their weight.

What AgentKit signals about actions

Coinbase’s AgentKit is a useful signal for two reasons. First, it shows that agents can perform high-risk actions safely with the right primitives. Wallet creation, transfers, swaps, and contract calls demand strict consent, clear logging, and predictable error handling. Second, it shows that the agent can be the place where a business model lives. If an agent can move value, it can also charge for value.

You do not need to work in crypto to learn from this. The pattern is portable. Replace a blockchain action with another sensitive step. Filing an expense report. Booking travel. Creating a purchase order. Updating a customer record. The principle is the same. Model plans. Agent asks. User approves. Action runs in a controlled environment with an audit trail.
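That loop fits in a few lines. In this Python sketch, `Action`, `execute_plan`, and the approval callback are hypothetical illustrations, not a platform API. In a real app, `approve` would surface a permission prompt in chat, and the audit list would be a durable log.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    """One step of the model's plan (hypothetical structure)."""
    description: str
    sensitive: bool          # True when money moves or a commitment is made
    run: Callable[[], str]   # executes the step and returns a receipt


def execute_plan(plan: list[Action],
                 approve: Callable[[Action], bool],
                 audit: list[str]) -> list[str]:
    """Model plans, agent asks, user approves, action runs, audit records."""
    receipts = []
    for action in plan:
        audit.append(f"PLAN {action.description}")
        if action.sensitive and not approve(action):
            audit.append(f"DECLINED {action.description}")
            break  # stop here: later steps may depend on the declined one
        receipt = action.run()
        audit.append(f"DONE {action.description} -> {receipt}")
        receipts.append(receipt)
    return receipts
```

Stopping on a declined step, rather than skipping it, is a deliberate choice. A booking or purchase order that was refused usually invalidates whatever the plan intended to do next.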

Shipping an autonomous workflow on day one

Here is a concrete way to turn the new stack into something you can launch this month. We will use a sales operations example because it touches data, actions, and safe computer use.

Define the job to be done. Pick one measurable outcome. For instance, prepare and send a weekly pipeline review to each account executive, including a forecast delta, a list of at-risk deals, and three suggested next steps.
Design the permission model. The app will ask for read access to the CRM and write access to a drafts folder in your email provider. It will also request permission to operate a virtual computer session to generate a spreadsheet and a slide deck. Default to least privilege: draft emails only, and never send without a user click.
Assemble built-in tools. Use file search to pull last week’s report. Use web search if your process references external market data. Use computer use to open a spreadsheet, compute deltas, and export a chart. Keep each step idempotent so a retry does not duplicate work.
Plan the agent’s control flow. The agent runs nightly. If a connector is revoked, it pauses and alerts the owner. If a tool fails twice, it falls back to a simplified text report to avoid missing the deadline.
Expose a clean surface in chat. The user can type two commands: “Set up my weekly review” and “Show this week’s review.” The agent prints what it plans to do before it acts. When it is done, it posts a short checklist with links to the draft deck and draft email.
Log everything. Store a trace for each run with prompts, tool calls, timing, and outcomes. Flag steps that needed human intervention. Compute a simple success rate and a mean time to complete. You will use this data to improve prompts, fine-tune actions, and prioritize fixes.
Package as an in-chat app. Describe exactly what the app does, the scopes it needs, and why. Provide a one-minute video showing the permission prompts, the computer use session, and the result. That video is the new landing page.
Ship to five teams. Do not go broad first. Put the app in the hands of a small group and measure. Tighten the prompts and the error handling. Only then open it to the directory.
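The control flow described above is the part most worth getting right early: pause on a revoked connector, make two attempts, then fall back to something simpler. This Python sketch uses hypothetical callables for the connector check, the deck builder, the text-report builder, and the owner alert. It shows only the branching, not any real scheduler or connector API.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 2  # "If a tool fails twice, it falls back"


def with_retries(step: Callable[[], str],
                 attempts: int = MAX_ATTEMPTS) -> Optional[str]:
    """Run a step up to `attempts` times; return None if every attempt fails."""
    for _ in range(attempts):
        try:
            return step()
        except Exception:  # a real system would catch narrower tool errors
            continue
    return None


def nightly_review(connector_active: Callable[[], bool],
                   build_deck: Callable[[], str],
                   build_text_report: Callable[[], str],
                   alert_owner: Callable[[str], None]) -> str:
    """One nightly run of the weekly pipeline review (hypothetical helpers)."""
    if not connector_active():
        # Access was revoked: do nothing and tell a human.
        alert_owner("CRM connector revoked; weekly review paused.")
        return "paused"
    deck = with_retries(build_deck)
    if deck is not None:
        return f"draft deck ready: {deck}"
    # Both attempts failed: ship the simplified report to keep the deadline.
    return f"fallback text report: {build_text_report()}"
```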

This is not a toy. It is a workflow that saves hours each week, uses permissioned actions, and runs in a controlled environment from the first day you ship it.

Design patterns that work with permissioned actions

The most reliable agents share a few traits.

Predictable steps. Break a goal into small actions with clear preconditions and postconditions. The plan should read like a checklist a human would approve.
Human-gated moments. Ask for consent when money moves, data leaves a company boundary, or a workflow triggers an external commitment like a purchase or a booking.
Explain first. Before touching a tool or a file, tell the user what you intend to do and why. Then ask to proceed. Users gain trust when they can see the plan.
Limited retries. Two attempts per step, then a safe fallback. Infinite retries look like a loop and feel like a bug.
Idempotent tools. Make tool calls safe to re-run without duplicating side effects. Use dry-run modes where possible.
Clear receipts. After an action, write down what changed and where to see it. A link to a draft, a path to a file, a transaction identifier.
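Idempotency, dry runs, and receipts combine naturally in one tool. In this Python sketch, `DraftStore` is a hypothetical in-memory stand-in for a drafts folder. Deriving the draft ID from a hash of the payload is one common way to make a write safe to repeat, chosen here for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class DraftStore:
    """In-memory stand-in for a drafts folder (hypothetical)."""
    drafts: dict[str, dict] = field(default_factory=dict)

    @staticmethod
    def key_for(payload: dict) -> str:
        # Same payload -> same key, so a retry finds the draft it already made.
        canonical = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()[:12]

    def save_draft(self, payload: dict, dry_run: bool = False) -> dict:
        key = self.key_for(payload)
        exists = key in self.drafts
        if dry_run:
            # Report what would happen without changing anything.
            return {"dry_run": True, "would_create": not exists, "draft_id": key}
        if not exists:
            self.drafts[key] = payload
        # The receipt says what changed and where to find it.
        return {"created": not exists, "draft_id": key, "location": f"drafts/{key}"}
```

Calling `save_draft` twice with the same payload leaves one draft, and the second receipt reports `created: False`. A retried step cannot double-file.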

These patterns are not flashy. They are what turns a demo into a dependable coworker.

The new go-to-market for agents

A native runtime plus an in-chat channel gives startups a go-to-market motion that looks like this.

Pick a narrow job and own it. The narrower the better. Forecast updates, interview scheduling, invoice matching, budget reconciliation, privacy request processing. Vertical specialization is a feature, not a bug.
Treat scopes like a value proposition. Ask for the minimum you need to produce value on day one. Explain why each permission matters in the language of outcomes, not in the language of APIs.
Instrument from the start. Log traces, time per step, error classes, and approval rates. Publish reliability and time saved to customers in plain numbers. You are selling outcomes, so measure them.
Practice safe computer use. Limit the session length. Whitelist the apps and file types your agent can touch. Keep generated files in a dedicated folder. Make it easy to clean up.
Build a human handoff. When the agent gets stuck, it should package context, attach artifacts, and route to a human with a crisp summary. The fastest way to build trust is to avoid pretending you are perfect.
Price per successful completion. Align price to the result. Charge per booked meeting, per reconciled invoice, or per approved expense report. Provide a generous trial that proves your completion rate.
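The safe computer-use rules translate into a small guard object. This Python sketch assumes a hypothetical `SessionGuard` sitting between the agent and its virtual computer, and the app names and file suffixes are illustrative. A real sandbox would enforce these limits below the agent rather than trusting the agent to call the guard.

```python
import time
from pathlib import Path
from typing import Callable


class SessionPolicyError(RuntimeError):
    """Raised when the agent steps outside its computer-use limits."""


class SessionGuard:
    """Hypothetical guard: session length, app allowlist, dedicated folder."""

    def __init__(self, workdir: Path, allowed_apps: set[str],
                 allowed_suffixes: set[str], max_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.root = workdir.resolve()
        self.allowed_apps = allowed_apps
        self.allowed_suffixes = allowed_suffixes
        self.clock = clock
        self.deadline = clock() + max_seconds
        self.written: list[Path] = []

    def _check_time(self) -> None:
        if self.clock() > self.deadline:
            raise SessionPolicyError("session time limit reached")

    def open_app(self, app: str) -> None:
        # Only checks policy; launching the app is left to the real tool.
        self._check_time()
        if app not in self.allowed_apps:
            raise SessionPolicyError(f"app not on the allowlist: {app}")

    def write_file(self, name: str, data: bytes) -> Path:
        self._check_time()
        path = (self.root / name).resolve()
        # Reject anything that escapes the dedicated folder (e.g. "../x.csv").
        if self.root not in path.parents:
            raise SessionPolicyError("writes must stay in the dedicated folder")
        if path.suffix not in self.allowed_suffixes:
            raise SessionPolicyError(f"file type not allowed: {path.suffix}")
        path.write_bytes(data)
        self.written.append(path)
        return path

    def cleanup(self) -> int:
        """Delete every file this session wrote; return how many were removed."""
        removed = 0
        for path in self.written:
            if path.exists():
                path.unlink()
                removed += 1
        self.written.clear()
        return removed
```

The clock is injectable so tests can simulate a session running past its limit without waiting.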

This is how you turn early directory placement into retained customers instead of one-time curiosity.

What to watch as the agent store economy forms

A store-like marketplace creates new power laws. A small number of apps will capture a large share of attention and revenue. The same forces that shaped mobile apply here, but in a chat context, which is more intimate and more continuous than a home screen.

Ranking and quality signals. Expect the directory to weight safety, reliability, speed, and user ratings. Treat these as product features to optimize, not as afterthoughts.
Policy as product. Review guidelines and safety constraints are not paperwork. They are design inputs. Build your flows so they are easy for a reviewer to reason about.
Identity and accountability. Apps need a clear owner with real support. The teams that answer fast and fix quickly will rise.
Vendor concentration risk. Building directly on a platform is a trade. You gain reach and speed. You accept review queues and policy changes. Mitigate with clean separators in your code so you can move tools or models if needed without a rewrite.
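One way to build those clean separators is to let product code depend on a small interface and keep every vendor call behind an adapter. In this Python sketch, `PlannerModel`, `HostedModelAdapter`, and `RuleBasedPlanner` are hypothetical names. The injected `call` function stands in for whatever SDK you actually use.

```python
from typing import Callable, Protocol


class PlannerModel(Protocol):
    """The only model surface product code is allowed to depend on."""
    def plan(self, goal: str) -> list[str]: ...


class HostedModelAdapter:
    """Wraps the current platform. Vendor-specific code lives only here."""

    def __init__(self, call: Callable[[str], str]):
        self._call = call  # injected function that wraps the vendor SDK

    def plan(self, goal: str) -> list[str]:
        # Normalize the vendor's free-text reply into a list of steps.
        return [line.strip() for line in self._call(goal).splitlines() if line.strip()]


class RuleBasedPlanner:
    """A vendor-free stand-in, useful for tests or as a fallback."""

    def plan(self, goal: str) -> list[str]:
        return ["gather inputs", f"do: {goal}", "write receipt"]


def weekly_review_plan(planner: PlannerModel) -> list[str]:
    # Product logic sees only the interface, so switching providers
    # changes the call site, not this function.
    return planner.plan("prepare weekly pipeline review")
```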

If you grew a business on a mobile store or a cloud marketplace, you know the pattern. The winners will treat every review constraint as an opportunity to improve clarity and trust.

A second example in a sensitive domain

Suppose you build a finance assistant that drafts vendor payments for a mid-market company.

The agent asks for read access to invoices, write access to a payments draft queue, and permission to operate a virtual spreadsheet to compute totals and taxes.
It validates that each payment has a matching purchase order and receipt. If anything is missing, it prepares a human-ready packet that lists the gaps.
For each drafted payment, it generates a plain language receipt with the inputs, calculations, and links to supporting docs.
A controller approves the batch. Only then does the agent move the drafts into the accounting system’s queue. The agent never sends money. It prepares, explains, and waits.
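The validation and approval steps can be sketched in Python. The data shapes are hypothetical. "Matching" is interpreted here, as an assumption, to mean the purchase order exists with the same amount and the receipt exists. A real three-way match would also compare quantities, tax, and vendor details. Note that `release_batch` only moves drafts into a queue; nothing here pays anyone.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    """Hypothetical invoice record pulled through a read-only connector."""
    invoice_id: str
    vendor: str
    amount: Decimal
    po_id: Optional[str]
    receipt_id: Optional[str]


def prepare_drafts(invoices: list[Invoice],
                   purchase_orders: dict[str, Decimal],
                   receipts: set[str]) -> tuple[list[dict], list[dict]]:
    """Split invoices into explained payment drafts and a gap packet for a human."""
    drafts, gaps = [], []
    for inv in invoices:
        problems = []
        if inv.po_id not in purchase_orders:
            problems.append("no matching purchase order")
        elif purchase_orders[inv.po_id] != inv.amount:
            problems.append(f"amount {inv.amount} differs from PO {purchase_orders[inv.po_id]}")
        if inv.receipt_id not in receipts:
            problems.append("no matching receipt")
        if problems:
            gaps.append({"invoice": inv.invoice_id, "gaps": problems})
        else:
            drafts.append({
                "invoice": inv.invoice_id,
                "vendor": inv.vendor,
                "amount": inv.amount,
                # Plain-language receipt: what was matched against what.
                "explanation": f"{inv.invoice_id} matches {inv.po_id} and receipt {inv.receipt_id}",
            })
    return drafts, gaps


def release_batch(drafts: list[dict], controller_approved: bool, queue: list[dict]) -> int:
    """Move drafts to the accounting queue only after approval. Never sends money."""
    if not controller_approved:
        return 0  # prepare, explain, and wait
    queue.extend(drafts)
    return len(drafts)
```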

This is safe autonomy. It saves hours. It leaves a trail. It respects the limits of software acting on behalf of people.

How it fits the broader landscape

OpenAI is not the only company racing to make agents feel native. The ecosystem is converging on similar ideas, which is good news for builders choosing where to bet.

Copilot-style workflows are becoming standard in DevOps and productivity. See how Microsoft’s stack is evolving in “Copilot Agent goes GA.”
Enterprises are demanding reliability, permissions, and audit before deployment. Salesforce’s approach is captured in “Agentforce 3 for enterprises.”
Cloud platforms are shipping deeper runtimes with built-in tools and governance. Google’s trajectory is clear in “Vertex AI Agent Engine leap.”

Across these moves, a pattern repeats. Runtimes become opinionated, permissions become explicit, and distribution moves into the user’s flow of work. DevDay’s stack fits that pattern and accelerates it by placing apps inside the conversation where decisions already happen.

Build now checklist

If you want to act on this in the next two weeks, use this short checklist as your starting template.

Define one job that produces measurable value in one session.
Write the permission story in plain language and stick to least privilege.
Choose two built-in tools and one connector to start. Add more only after you hit a stable completion rate.
Script the agent’s narration. It should declare intent before actions and summarize outcomes after actions.
Set hard limits on computer use sessions. Track files written and clean up stale artifacts.
Log prompts, tool calls, and timing for every run. Create daily reliability and time-saved dashboards.
Publish a simple policy: what the agent will never do without approval.
Ship to five teams, gather friction notes, and iterate.
When the completion rate crosses your threshold, package it as an in-chat app and apply for a directory listing.
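Two checklist items can share one small helper: scripted narration and a published "never without approval" policy. This Python sketch is hypothetical. The policy set and action kinds are examples, and in a real app the transcript lines would be posted into the chat thread.

```python
from typing import Callable, Optional

# Example published policy (hypothetical): action kinds that always wait for approval.
NEVER_WITHOUT_APPROVAL = frozenset({"send_email", "move_money", "delete_file"})


def narrated_step(kind: str, intent: str, run: Callable[[], str],
                  transcript: list[str], approved: bool = False) -> Optional[str]:
    """Declare intent, enforce the approval policy, then summarize the outcome."""
    transcript.append(f"Next: {intent}.")
    if kind in NEVER_WITHOUT_APPROVAL and not approved:
        transcript.append(f"Waiting for your approval before I {intent}.")
        return None
    outcome = run()
    transcript.append(f"Done: {intent}. Result: {outcome}.")
    return outcome
```

Keeping the policy in one published constant means the list users read and the list the code enforces cannot drift apart.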

This list will not cover every edge case, but it will keep your first release within a safe and valuable boundary.

The takeaway

DevDay 2025 did not just ship features. It shipped a platform posture. Agents now have a native runtime with tools, a native safety model with permissions and computer use, and a native distribution path through in-chat apps. Coinbase’s AgentKit demonstrated that real-world actions can be packaged cleanly on top of that stack. The next wave of startups will not ask whether agents can do real work. They will ask which job to own first and how to price the outcome.

The moment to build is now. Pick a job. Design the permissions. Ship an in-chat app. Prove reliability with traces and receipts. In a year, the best agents will feel less like software and more like trusted colleagues who quietly deliver results while you focus on the hard decisions only humans can make.