Executive summary – what changed and why it matters
AWS has entered the long-running agent race with three preview "Frontier agents": Kiro (an autonomous coding agent), AWS Security Agent, and DevOps Agent. The headline capability is persistent, multi-session context: AWS claims Kiro can learn team workflows and operate autonomously for hours or days to implement spec-driven code changes.
- Impact: These agents aim to scale developer productivity by automating end‑to‑end coding, security review, and operational testing.
- Immediate caveat: preview only; pricing, SLAs, and hard accuracy metrics are not disclosed.
- Competitive context: OpenAI and others have announced long‑run agents; AWS emphasizes enterprise integration and spec‑driven development.
Key takeaways for decision‑makers
- Kiro promises persistent context and the ability to autonomously perform multi-step engineering tasks (an AWS demo mentioned updating 15 related codebases from a single assignment).
- AWS Security Agent automates live security checks, post‑commit testing, and suggested fixes; DevOps Agent automates performance and compatibility testing.
- Preview release – don’t assume production readiness: expect integration, validation, and governance work before rollout.
- Major operational risks: hallucinations, supply‑chain exposure, privilege management, and auditability for autonomous code changes.
Breaking down the announcement
At re:Invent, AWS framed these Frontier agents as an evolution of the Kiro coding tool it introduced in July. Kiro moves beyond one-shot code generation to "spec-driven development": it learns coding standards and team preferences by observing workflows, scanning repo history, and interacting with developers to confirm assumptions. AWS says Kiro maintains "persistent context across sessions," enabling it to be assigned a complex backlog task and run for hours or days with minimal human intervention.
The Security and DevOps Agents are complementary: Security Agent flags vulnerabilities as code is written, runs tests, and proposes fixes; DevOps Agent runs performance and compatibility tests to reduce incidents during deployments. Together they target two common sources of developer toil—manual security reviews and post‑deploy firefighting.

Capabilities, constraints, and unknowns
Capabilities: Persistent session memory, spec‑driven changes, cross‑repo updates, and automated security and performance testing. AWS positions integration with its cloud tooling as a differentiator for enterprises already on AWS.
Constraints and open questions: AWS hasn’t published quantitative accuracy, mean‑time‑to‑fix, cost per automated task, or throughput limits. “Hours or days” of operation is vague compared with OpenAI’s stated 24‑hour runs for GPT‑5.1‑Codex‑Max. Crucially, long‑running behavior amplifies risks around hallucination, unintended changes, and stale context if upstream state changes outside the agent’s view.
Competitive and market context — why now
We're in a tight window where vendors are racing to convert coding suggestions into autonomous, continuous agents. OpenAI, Anthropic, and other tooling vendors have been pushing longer context windows and agentic features. AWS's advantage is enterprise glue: built-in integration with developer pipelines, IAM, and cloud resources. The timing also coincides with CIO pressure to cut cloud operations costs and with persistent labor shortages in engineering and security.
Risk, compliance, and governance considerations
Operationalizing autonomous code agents changes control surfaces. Key governance items to address before production use:
- Privilege and access control: Restrict what agents can modify; use least privilege and ephemeral credentials (a minimal sketch follows this list).
- Auditability: Ensure detailed immutable logs, diffs, and rationale for every autonomous change for compliance and incident analysis.
- Human-in-the-loop gates: Define mandatory approvals for high-risk changes (security-critical code paths, infrastructure changes).
- Testing and rollback: Treat agent commits like developer changes — require CI, unit and integration tests, and automated rollback plans.
- Data residency and IP: Clarify training data flows if agents observe internal tools and code; check contractual and regulatory requirements.
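The privilege and audit items above can be prototyped with standard AWS primitives well before the agents themselves are production-ready. Below is a minimal sketch, assuming a hypothetical KiroAgentRole and workspace bucket (both names are placeholders, not anything AWS has announced): it uses STS to mint short-lived credentials that are further narrowed by an inline session policy, so an agent session never carries broader access than the task at hand.

```python
import json
import boto3  # AWS SDK for Python

# Hypothetical role and scope -- substitute your own account ID, role, and resources.
AGENT_ROLE_ARN = "arn:aws:iam::123456789012:role/KiroAgentRole"

# Inline session policy: further restricts whatever the role itself allows,
# here to read/write on a single (illustrative) agent workspace bucket.
SESSION_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-agent-workspace/*",
        }
    ],
}

def ephemeral_agent_credentials(task_id: str, ttl_seconds: int = 3600) -> dict:
    """Return short-lived, scoped-down credentials for one agent task."""
    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn=AGENT_ROLE_ARN,
        RoleSessionName=f"agent-task-{task_id}",  # recorded by CloudTrail for audit
        DurationSeconds=ttl_seconds,              # credentials expire automatically
        Policy=json.dumps(SESSION_POLICY),        # effective access = role AND session policy
    )
    return response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration

if __name__ == "__main__":
    creds = ephemeral_agent_credentials("backlog-1234")
    print("Agent credentials expire at", creds["Expiration"])
```

Because the RoleSessionName is recorded by CloudTrail, every API call made with these credentials is attributable to a specific agent task, which covers part of the audit-log requirement above; the immutable diff-and-rationale record still needs its own store.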
Recommendations — who should do what, and when
- CTOs/Product leads: Run a controlled pilot on low‑risk subsystems. Measure correctness, rework rate, and mean time to remediation versus human teams.
- Security and compliance teams: Demand transparent logs, the ability to revoke agent privileges, and pre‑deploy policy checks before expanding scope.
- Engineering managers: Create clear spec templates and acceptance criteria—Kiro’s spec‑driven approach will only scale with precise specs and test coverage.
- Platform teams: Integrate agent outputs into CI/CD, implement canary deployments, and require automated test gates and rollback triggers (a hedged gate sketch follows this list).
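To make the test-gate and rollback recommendation concrete, here is a deliberately simple, hypothetical gate that an agent-authored change set could pass through before promotion to a canary stage. The AgentChange and GateResult shapes, the thresholds, and the sensitive-path prefixes are assumptions for illustration rather than any AWS or Kiro interface; the point is that agent commits face the same deterministic checks, and the same human approvals on sensitive paths, as human commits.

```python
from dataclasses import dataclass, field

# Paths that always require human approval, per the human-in-the-loop item above.
# The prefixes are illustrative placeholders.
SENSITIVE_PREFIXES = ("infra/", "iam/", "security/", ".github/workflows/")

@dataclass
class AgentChange:
    """Minimal description of an agent-authored change set (hypothetical shape)."""
    files_touched: list[str]
    tests_passed: bool
    coverage_delta: float          # change in test coverage, percentage points
    has_rollback_plan: bool

@dataclass
class GateResult:
    allowed: bool
    needs_human_approval: bool
    reasons: list[str] = field(default_factory=list)

def gate_agent_change(change: AgentChange) -> GateResult:
    """Apply the same deterministic gates a human-authored change would face."""
    reasons: list[str] = []

    if not change.tests_passed:
        reasons.append("CI test suite failed")
    if change.coverage_delta < 0:
        reasons.append("test coverage regressed")
    if not change.has_rollback_plan:
        reasons.append("no automated rollback plan attached")

    touches_sensitive = any(
        path.startswith(SENSITIVE_PREFIXES) for path in change.files_touched
    )

    return GateResult(
        allowed=not reasons,
        needs_human_approval=touches_sensitive,
        reasons=reasons,
    )

if __name__ == "__main__":
    change = AgentChange(
        files_touched=["services/billing/handler.py", "infra/network.tf"],
        tests_passed=True,
        coverage_delta=0.4,
        has_rollback_plan=True,
    )
    print(gate_agent_change(change))  # allowed, but needs human approval (touches infra/)
```

In a real pipeline the same logic would run as a required CI step, with a failing gate blocking the merge and a sensitive-path hit routing the change to a named human approver.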
Bottom line
AWS's Frontier agents are a meaningful step toward autonomous, long-running engineering assistants, and they are a natural fit for companies already invested in AWS tooling. However, preview status, missing metrics, and well-known LLM failure modes mean enterprises should pilot cautiously, prioritize governance, and treat agentic commits as production-grade artifacts requiring the same controls as human work.