GitLab Duo Agent Platform hits beta, DevSecOps orchestrated

Breaking: the first production arena for autonomous agents is software delivery

On July 17, 2025 GitLab announced the public beta of the Duo Agent Platform, a multi agent orchestration layer that sits inside the company’s DevSecOps system of record. The release matters for two reasons. First, it turns agentic coding from a single agent chat toy into a team sport that coordinates planning, testing, security, and change management. Second, it lands where guardrails already exist by default: the pipeline, the merge request, and the production change window. In other words, software delivery is the first real production arena for autonomous agents because it has version control, approvals, and blast radius controls baked in. You can see the feature outline and the roadmap to general availability in GitLab’s press room announcement of the public beta of Duo Agents.

This is not a lab demo. It is a pragmatic step that meets developers directly in their tools, understands their repositories, and proposes changes without bypassing compliance policy or approvals.

What actually shipped in July and August

Agentic chat inside IDEs

Duo Agentic Chat now runs inside code editors rather than only in the browser. In Visual Studio Code and JetBrains IDEs, you can ask for tests, refactors, or explanations and the agent can stage concrete changes. The experience is not just question and answer. It is a dialogue with memory, where you can steer the agent with a shared shorthand. For instance, you can:

Type /tests to draft unit tests around a selected function.
Use /explain to get a plain English unpacking of a hairy method.
Call /include to inject specific files or a merge request into the context so the agent reasons with your project’s actual structure and history.

The value here is speed with accountability. The agent proposes a change and your pipeline still decides whether that change compiles, passes tests, and respects policy.

Agentic chat in the GitLab Web UI

Many development tasks are not code edits. They involve issues, vulnerabilities, jobs, and environment statuses. Duo Agentic Chat in the GitLab Web UI brings the same agentic partner to where triage, planning, and approvals happen. Ask it to summarize a swarm of issue comments, propose a remediation plan for a flagged vulnerability, or explain why a job failed and which commits likely caused it. Because the agent sits next to your system of record, actions are auditable and reversible.

The orchestrated Software Development Flow

The Software Development Flow is the first orchestrated, multi step workflow in the platform. Think of it as a conductor that reads your prompt, drafts a plan, and then coordinates steps that create code, edit files, and prepare a merge request. It understands your repository, your project structure, and the Git history. You can pause the flow, redirect it, or add more context mid flight. The flow is opinionated about staying inside the rails: it stages changes and asks you to accept or amend them, rather than writing into main on its own.

A concrete example helps. You tell the flow, “Add pagination to the audit events list, keep the endpoint backward compatible, add tests, and update docs.” The flow drafts a plan: add query parameters to the controller, create a presenter, write integration tests, update a changelog entry, and touch the docs page. It then creates a feature branch, modifies the files, runs tests locally where supported, and prepares a merge request with a checklist. If a test fails, the flow revises the plan instead of blindly pushing. You still review the diff and your pipeline still enforces gates.

MCP client support to connect skills beyond GitLab

Agents get far more useful when they can safely talk to tools you already run. The platform includes a Model Context Protocol client so Duo Agentic Chat can connect to local or remote MCP servers. That means you can expose internal systems as tools, from a design system registry to a changelog service to your own knowledge bases. In practice, teams will curate a small, allow listed set of MCP tools to keep the agent’s capability surface narrow and auditable. Capability is added as a deliberate skill, not as an unbounded web crawler. For a broader view of how vendors are pushing MCP to the edges, see how Remote MCP at the edge changes where agents run.

IDE coverage for JetBrains and Visual Studio

July’s release brought JetBrains support for IntelliJ, PyCharm, GoLand, and WebStorm, which expanded reach to polyglot organizations. In August, GitLab shipped Visual Studio integration so .NET teams could access agentic chat and agent flows without leaving their primary tool. The August update confirms Visual Studio coverage and outlines early Agent Flows and CLI agent experiments in the GitLab 18.3 release notes.

Why software delivery is ready for agents before most other domains

Enterprises have many places to apply agents, from finance operations to customer support. Software delivery stands out because it already has the ingredients for safe autonomy:

Strong context. The repository, issues, and pipeline history define the work, its boundaries, and its dependencies.
Deterministic checks. Compilers, test suites, security scanners, and policy engines are objective referees.
Change control. Merge requests, approvals, and protected branches limit blast radius.
Observability and audit. Every action has an author, a timestamp, a diff, and logs.

Agents thrive where the cost of a bad suggestion is low and the chance of early detection is high. DevSecOps fits that profile. As other ecosystems mature, the pull request itself is turning into a runtime for agent plans, a trend captured when pull request becomes runtime for agent systems.

How to stand up agent ops inside CI and CD

Standing up agent ops means creating a repeatable way to design, test, approve, observe, and improve agent behavior in your pipelines. Treat agents like any other service that ships code.

1) Policy and permissions

Start with least privilege. Give Duo Agent Platform service accounts only the scopes required to stage changes and open merge requests, not direct push to main. Restrict the set of projects where agents can operate during your pilot.
Use pipeline execution policies and merge request approval policies to set hard gates for agent originated changes. For example, require a human approver for any merge request where the author is an agent identity, and require all security scans to pass with zero critical or high vulnerabilities.
Enable and stream audit events for agent actions to your security information and event management system. This provides a tamper resistant trail of who did what, when, and under which policy.

2) Evaluation before trust

Build a focused evaluation suite for your top five tasks. Include tests that detect regressions, style violations, and missed edge cases. Run these evals both offline against historical repos and in shadow mode during the pilot.
Add red team prompts and known prompt injection traps to the eval set. This is your canary. If the agent follows tainted instructions inside a file or an issue comment, block the run and raise a finding.

3) Observability as first class

Instrument pipelines to label agent originated merge requests and track their journey. Record build time, test pass rate, and rework needed.
Capture agent reasoning summaries inside merge request descriptions. This gives reviewers a fast path to sanity check the plan and the constraints the agent assumed.
Stream Duo and security policy audit events to your log store so you can correlate agent actions with pipeline failures or policy violations.

4) Guardrails that fail closed

Use protected branches and required approvals for agent authors. Add a policy that blocks auto merge when new secrets are detected or when a security policy reports a severe vulnerability.
Keep MCP servers on an allow list. Require signed configurations for any new tool exposed to agents. Disable network egress by default in agent jobs and only open outbound calls to approved destinations.
Run agent jobs on ephemeral runners. If an agent picks up a poisoned dependency, the runner instance disappears at the end of the job, which limits persistence.

5) Governance and change management

Treat the agent’s configuration as code. Version control prompts, rules, and MCP tool lists in a locked repository with change review.
Add a quarterly review for agent capabilities. If a tool is not used or fails evals, remove it. Capability drift is the enemy of safe autonomy.

KPIs to watch in Q4

Pick metrics that reflect speed with safety, not just token level cleverness. Track these three at both the project level and the portfolio level.

1) Lead time for changes

Definition: median time from code committed to code running in production. This is the speed dial for delivery. Target a material improvement, for example a 20 percent reduction from your Q3 baseline by the end of Q4. Expect gains to come from agent help with boilerplate, tests, and documentation.

How to instrument: use your value stream analytics and DORA data. Tag agent originated merge requests so you can split lead time for agent touched work versus human only work. Watch for bottlenecks moving from coding to review as the next constraint.

2) MTTR for incidents

Definition: median time to restore service during a production incident. Autonomy is only valuable if it does not increase recovery times. Many failures are configuration or logic errors that agents can help isolate by explaining diffs and log anomalies, and by proposing targeted rollbacks or fixes.

How to instrument: ensure incident records are created for production outages and linked to the deployment that caused them. Measure MTTR monthly and flag any project where MTTR rises as agent adoption grows.

3) Vulnerability fix rate

Definition: percentage of confirmed vulnerabilities resolved within your service level agreement window. For critical and high severities, many security programs expect 30 days or less. The platform’s vulnerability reports and policies can auto resolve items once scans show they are no longer detected. Use the agent to draft remediation code, but keep your policy thresholds strict.

How to instrument: run static analysis, dependency, and container scans on every agent originated merge request. Track the count resolved within SLA each month and divide by total confirmed vulnerabilities. Trend this by severity and by project. Improvement here is the clearest signal that agents are helping security, not just velocity.

A pragmatic pilot plan for the next 30 days

Week 1: Scope and setup
Choose one application and one service repository with healthy test coverage. Identify five recurring tasks that waste developer time, such as small feature edits, test generation, or vulnerability remediations.
Enable Duo Agent Platform in that group, create a dedicated agent service account, and restrict it to the pilot projects. Install the extensions for Visual Studio Code or JetBrains for the participating developers. Turn on audit event streaming for the group.
Define your policy gates. Examples: at least one human approval for agent authors, zero critical and high vulnerabilities, no new secrets, and license policy must pass.
Week 2: Build the agent toolkit
Configure the Software Development Flow and verify it can create a branch, stage changes, and open a merge request. Create a repo folder for agent rules and prompt templates. Treat it as code.
Add two or three MCP tools that help with context, such as a design system catalog or an internal changelog. Keep the list short and strictly allow listed.
Write a 20 case evaluation suite from past pull requests. Include both golden paths and tricky edge cases. Add three prompt injection traps to detect naive tool calls.
Week 3: Run in shadow, then constrained production
For two days, ask the agent to propose changes while developers perform the same tasks. Compare outputs and fix your prompts and rules based on misses.
Enable agent originated merge requests for low risk tasks only. Require human approval. Label them clearly so reviewers know to look for reasoning and assumptions.
Week 4: Measure and adjust
Gather lead time, rework rate, and test pass rate for agent originated work. Compare against the team’s 90 day baseline.
Review every failure and every near miss. Add new evals. Remove any MCP tool that caused confusion or tempted the agent into unhelpful actions.
Decide go, hold, or rollback. If go, add a second team and repeat the 30 day cycle with a stronger evaluation pack.

Risk mitigations you should implement now

Prompt injection and supply chain manipulation are the two practical threats to multi agent delivery. Use layered mitigations that assume partial failure and fail closed.

Strict context control. Use /include or equivalent explicit context injection rather than letting the agent fetch arbitrary content. Explicit context limits the surface for hidden prompts inside untrusted files.
Sanitization and taint tracking in tools. For MCP tools that read files or logs, implement simple taint rules. If content contains patterns like “ignore all prior instructions,” drop or neutralize that span before the agent sees it. Document the sanitization in your agent rules.
Network and file sandboxing. Run agent flows on ephemeral runners with no outbound network by default. For tools that need the network, only allow specific domains and methods. Mount repositories read write only within a temporary working directory and never expose machine credentials to agent code.
Policy gates that treat agents as untrusted authors. Require human review for agent authors, block merges on security policy violations, and prevent auto merge from agent accounts.
Secure the toolchain. Enforce dependency and container scanning, and require signed artifacts for both first party and third party components. Use provenance attestations such as SLSA levels where supported. Reject unsigned build outputs in production environments.
Secrets hygiene. Keep secrets out of prompts and added context. Ensure redaction is on in IDE extensions and web chat. Scan merge requests for accidental credential leak and block if detected.
Continuous evaluation. Run your prompt injection traps and regression suite on every change to agent rules, tools, or model versions. Store scores over time so you can spot drift.
Incident response readiness. When something slips through, you need speed. Make the agent produce all changes as merge requests with complete diffs and reasoning. That gives responders a fast way to revert and a paper trail to learn from.

Where this fits in the agent landscape

GitLab is not alone in pushing agents into production, but the integration point matters. By anchoring autonomy inside merge requests and pipelines, GitLab aligns with a wider industry shift that values supervision, audit, and policy. That is similar in spirit to how AgentCore makes agents production ready, with a focus on observable and permissioned actions rather than open ended chat. The common thread is simple. If an agent can be measured, it can be managed.

What this means for teams

The Duo Agent Platform’s public beta gives teams a place where agents can safely help, and sometimes take the lead, without skipping the governance that protects production. The July and August releases brought the fundamentals together: agentic chat in the IDE and the Web UI, an orchestrated Software Development Flow, a client for Model Context Protocol tools, and broad IDE coverage that reaches both JetBrains and Visual Studio developers. The result is a stack where autonomy is not a leap of faith. It is supervised, auditable, and measured.

If you pilot now and instrument it well, your Q4 dashboard can show shorter lead time, steady or improved MTTR, and a higher vulnerability fix rate. Those are the signals that matter to engineering leaders and security owners. Agents are ready for production when the pipeline says so, not when a demo looks clever.

That is the opportunity in front of you. Start small, measure honestly, harden your policy, and let the work speak. The teams that learn how to orchestrate human and agent skills inside the pipeline will ship better software faster and with fewer surprises.