Library

54 accepted items

Week of Jun 15

  • concepts · article

    Loops are Replacing Prompts. Verification is About to Be Your Biggest Problem.

    Arjun Iyer · Jun 21

    Three eras of AI coding — prompt-driven, spec-driven, loop-driven — and why verification becomes the binding constraint as the human moves up a level each era. Loop economics: total cost = iterations-to-verified × cost-per-iteration.
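
    The loop-economics formula in this summary can be worked through with a few lines of arithmetic. This is only an illustrative sketch: the function name and the dollar figures are made up for the example and do not come from the article.

```python
def loop_cost(iterations_to_verified: int, cost_per_iteration: float) -> float:
    """Total cost = iterations needed to reach a verified result x cost of each iteration."""
    return iterations_to_verified * cost_per_iteration

# Hypothetical numbers, chosen only to show the trade-off.
cheap_iterations_weak_checks = loop_cost(iterations_to_verified=40, cost_per_iteration=0.50)
pricey_iterations_strong_checks = loop_cost(iterations_to_verified=6, cost_per_iteration=2.00)

print(cheap_iterations_weak_checks, pricey_iterations_strong_checks)  # 20.0 12.0
```

    Under these assumed numbers, the loop with the more expensive iteration is cheaper overall because it reaches a verified result in fewer passes.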

  • concepts · article

    Loop Engineering

    Addy Osmani · Jun 21

    Addy Osmani names and defines loop engineering: instead of being the person who prompts the agent, you design the system that does it. Before, you held the tool yourself (type → read → type). Now you build a small system that finds the work, hands it out, checks it, and decides the next thing.

  • resources · article

    WTF Is a Loop? Part 2: The 15 Loops People Are Actually Using

    Matt Van Horn · Jun 20

    Matt Van Horn catalogs 15 agent loops people are actually using, a practical counterpart to the theoretical loop-engineering canon. Builds on his Part 1 debate map from Jun 8.

Week of Jun 8

  • concepts · article

    AI Agent Security and Prompt Injection Vulnerabilities

    Airia Team · Jun 14

    Agentic AI systems face a fundamental security flaw where they cannot distinguish between instructions and data, leading to prompt injection attacks. The 'Lethal Trifecta' combines access to private data, exposure to untrusted tokens, and exfiltration vectors, enabling attacks like EchoLeak and GeminiJack that steal sensitive data through hidden instructions in emails and documents.

  • concepts · article

    AI Agent Orchestration Patterns for Production

    JobsByCulture · Jun 14

    Multi-agent AI orchestration coordinates multiple agents to solve complex tasks that single agents cannot handle due to context limits, specialization needs, or parallelism requirements. The Sequential Chain pattern runs agents in fixed sequence where each agent's output feeds the next, ideal for tasks with natural linear progression like document processing pipelines.
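
    A minimal sketch of the Sequential Chain pattern described above, assuming each agent can be modeled as a function from text to text. The `Agent` type, `run_chain`, and the three stand-in agents are hypothetical; real agents would call an LLM.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # assumption: an agent maps input text to output text

def run_chain(agents: List[Agent], task: str) -> str:
    """Run agents in a fixed order; each agent's output is the next agent's input."""
    result = task
    for agent in agents:
        result = agent(result)
    return result

# Stand-in agents for a document pipeline (plain string functions, no model calls).
def extract(text: str) -> str:
    return text.strip()

def first_sentence(text: str) -> str:
    return text.split(".")[0] + "."

def format_title(text: str) -> str:
    return text.upper()

pipeline = [extract, first_sentence, format_title]
print(run_chain(pipeline, "  Invoices are due Friday. Late fees apply.  "))
# INVOICES ARE DUE FRIDAY.
```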

  • concepts · article

    Compound Engineering 8-Step Framework Evolution

    Kieran Klaassen · Jun 14

    Compound engineering has evolved from a 4-step loop (brainstorm → work → review → compound) to an 8-step process that adds ideation and planning at the front, and polishing at the end. The framework recognizes that AI handles the middle work phases well, but humans remain crucial for initial vision-setting and final quality assessment.

  • concepts · article

    Agent Engineering Framework and Definition

    Latent.Space · Jun 14

    Explores the challenge of defining AI agents, presenting multiple perspectives from OpenAI's TRIM model (model + instructions + tools + runtime) to Lilian Weng's definition (LLM + memory + planning + tools). Proposes six core elements: LLMs with tools, encoded intent, LLM-driven control flow, multi-step planning, long-running memory, and goal-oriented behavior.

  • tools · article

    AI Coding Agent Evaluation Skills Framework

    Hamel Husain · Jun 14

    Hamel Husain released evals-skills, a structured set of capabilities that teach coding agents how to effectively evaluate AI products. The framework includes six core skills from error analysis to building review interfaces, designed to help agents distinguish between different types of failures and implement proper evaluation pipelines.

Week of Jun 1

  • tools · article

    LLM Council Multi-Model Query System

    Jun 7

    A local web application that sends queries to multiple LLMs simultaneously, has them review and rank each other's responses anonymously, then produces a final consolidated answer via a designated Chairman LLM. Built with a FastAPI backend and a React frontend, it uses the OpenRouter API to access models such as GPT, Claude, and Gemini.

  • tools · article

    Matt Pocock's Production Agent Skills Library

    Yash Thakker · Jun 7

    A collection of 20+ production-grade AI agent skills for real engineering workflows, organized into planning, development, and tooling categories. Created by TypeScript educator Matt Pocock, the repository has 25,500+ GitHub stars and focuses on test-driven development, architecture planning, and git safety rather than experimental coding.

  • concepts · article

    Enterprise AI Tool Cost Management Strategies

    Simon Willison · Jun 7

    Uber implemented $1,500 monthly spending caps per employee for AI coding tools like Cursor and Claude Code after exceeding its 2026 AI budget in four months. The cap represents approximately 11% of median engineer compensation and suggests companies are finding real value in AI tools despite needing cost controls.

  • concepts · article

    How Coding Agents Work with LLMs

    Simon Willison · Jun 7

    Coding agents are software harnesses that extend LLMs with additional capabilities through invisible prompts and callable tools. LLMs work by completing text sequences using tokens (not words), with providers charging based on token usage. Chat-templated prompts simulate conversations but require replaying the entire conversation history each time, making longer conversations more expensive.
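
    To see why replaying history makes long conversations more expensive, here is a rough sketch that counts the input tokens sent on each request. It assumes a simplified model where every request carries the full conversation so far; the function name and token counts are illustrative, and it ignores system prompts, tool definitions, and any provider-side caching.

```python
from typing import List, Tuple

def input_tokens_per_request(turns: List[Tuple[int, int]]) -> List[int]:
    """Input tokens sent on each request when the whole history is replayed.

    Each turn is (user_tokens, reply_tokens).
    """
    history = 0
    per_request = []
    for user_tokens, reply_tokens in turns:
        history += user_tokens        # the new user message joins the prompt
        per_request.append(history)   # the whole conversation so far is sent as input
        history += reply_tokens       # the reply becomes part of every later prompt
    return per_request

# Four identical turns: a 100-token question and a 200-token answer each time.
per_request = input_tokens_per_request([(100, 200)] * 4)
print(per_request)       # [100, 400, 700, 1000]
print(sum(per_request))  # 2200
```

    Each request is larger than the last even though every individual message is the same size, so total input tokens grow faster than the number of turns.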

Week of May 25

  • concepts · article

    Trust Layer for AI-Generated Office Files

    Nate · May 31

    AI can generate polished-looking spreadsheets and presentations that contain hidden errors like broken formulas or wrong data sources. A systematic verification workflow with source tracking, assumption logging, and hostile review prevents shipping confident-looking but incorrect work.

  • concepts · article

    AI-Native Company Operations and Workforce

    Lenny Rachitsky · May 31

    Every, a 15-person company, generates 7-figure revenue with 100% AI-written code across 5 products. They use specialized AI agents for different tasks and employ an 'AI operations lead' to maximize team productivity through AI tools.

  • concepts · article

    Agent-Native Software Architecture Paradigm

    Dan Shipper · May 31

    A new software development approach where AI agents, rather than traditional code, form the core of applications. Features are defined as prompts describing desired outcomes rather than step-by-step instructions, making software more malleable and accessible to non-programmers.

  • concepts · article

    AI-Native Business Model and Organizational Structure

    Dan Shipper · May 31

    Every operates as an AI-first company combining media, products, and consulting with just 15 employees generating $1.2M ARR. Their model involves living in the future with AI tools, documenting observations, building missing solutions, and teaching what works. Everyone is a generalist who uses AI for all tasks, blending traditional job roles.

  • concepts · article

    Harness Engineering and Adversarial AI Architecture

    Eric · May 31

    Prompt engineering has reached its limits for complex autonomous tasks, leading to a new paradigm called 'Harness Engineering' that focuses on building structured environments around AI agents. Anthropic's breakthrough uses a GAN-inspired architecture with separate Generator and Evaluator agents creating adversarial feedback loops to overcome AI's inability to self-critique effectively.
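
    The generator and evaluator feedback loop can be sketched roughly as follows. This is not Anthropic's implementation: `harness`, the callable signatures, and the toy generator and evaluator are all assumptions made for illustration, with plain functions standing in for the two agents.

```python
from typing import Callable, Optional, Tuple

Generator = Callable[[str, Optional[str]], str]     # (task, feedback) -> draft
Evaluator = Callable[[str, str], Tuple[bool, str]]  # (task, draft) -> (accepted, feedback)

def harness(task: str, generate: Generator, evaluate: Evaluator,
            max_rounds: int = 5) -> Tuple[str, bool, int]:
    """Alternate generation and independent evaluation until accepted or out of rounds."""
    feedback: Optional[str] = None
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        accepted, feedback = evaluate(task, draft)
        if accepted:
            return draft, True, round_number
    return draft, False, max_rounds

# Toy stand-ins: a real generator and evaluator would each call a model.
def toy_generator(task: str, feedback: Optional[str]) -> str:
    return (task + " with tests") if feedback else task

def toy_evaluator(task: str, draft: str) -> Tuple[bool, str]:
    accepted = "tests" in draft
    return accepted, ("" if accepted else "add tests")

print(harness("write parser", toy_generator, toy_evaluator))
# ('write parser with tests', True, 2)
```

    Keeping the evaluator as a separate callable mirrors the point in the summary: the critique comes from a different agent than the one that produced the draft.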

  • concepts · article

    KPMG-Anthropic Strategic AI Alliance for Enterprise

    May 31

    KPMG announced a global alliance with Anthropic to integrate Claude AI across its entire workforce of 276,000+ employees and core business operations. The partnership embeds Claude into KPMG's Digital Gateway platform for client work in tax and legal services, and establishes KPMG as Anthropic's preferred partner for private equity.

  • tools · article

    Project Glasswing AI Vulnerability Discovery

    May 31

    Project Glasswing uses the Claude Mythos Preview model to find over 10,000 critical vulnerabilities across systemically important software with 50+ partners. The model shows a 10x improvement in bug discovery rates compared to previous methods, with Cloudflare finding 2,000 bugs including 400 critical-severity issues.

  • tools · article

    Anthropic Acquires Stainless SDK Platform

    May 31

    In May 2026, Anthropic acquired Stainless, a company that generates SDKs and MCP server tooling across multiple programming languages. Stainless has powered all official Anthropic SDKs since the API launch and helps hundreds of companies build developer tools and agent connectors.

  • concepts · article

    PwC Claude Enterprise AI Implementation Strategy

    May 31

    PwC and Anthropic expanded their partnership to deploy Claude across PwC's global workforce of hundreds of thousands, focusing on agentic technology builds, AI-native deal-making, and enterprise function reinvention. The collaboration includes training 30,000 professionals and launching a Claude-based finance business unit, with production deployments already cutting delivery times by up to 70%.

  • tools · article

    Claude Opus 4.8 AI Model Release

    May 31

    Anthropic released Claude Opus 4.8 with significant improvements in coding, reasoning, and agentic tasks compared to previous versions. The model shows better judgment, tool calling efficiency, and reliability in autonomous workflows, with new features like effort control and dynamic workflows in Claude Code.

  • concepts · article

    LLM Knowledge Bases for Business Intelligence

    May 31

    LLM knowledge bases transform raw organizational data into self-organizing, queryable intelligence systems that improve with use. Andrej Karpathy demonstrated this approach by having AI compile and maintain a 100-article knowledge base without complex RAG infrastructure. The AI-driven knowledge management market is projected to reach $11.24 billion by 2026.

  • concepts · article

    Enterprise LLM Wiki Knowledge Management Pattern

    May 31

    Karpathy's LLM Wiki pattern uses an AI agent to automatically maintain a personal knowledge base by processing raw sources into structured wiki pages with cross-references and contradiction detection. The pattern works well personally but fails at enterprise scale because companies lack a dedicated curator and need automated ingestion from existing work tools rather than manual curation.

  • concepts · article

    Forward Deployed Engineers in AI Companies

    May 31

    Major AI companies like Google, OpenAI, and Anthropic are rapidly hiring Forward Deployed Engineers (FDEs) through simplified processes and separate organizations. These roles involve integrating AI systems into enterprise customers' workflows and operations. The role is evolving from platform engineering toward solutions architecture and consulting.

  • concepts · article

    Forward Deployed Engineers in Enterprise AI

    May 31

    AI vendors are embedding Forward Deployed Engineers (FDEs) to help enterprises implement agentic AI solutions, but Gartner predicts 70% of companies will abandon these projects by 2028 due to high costs and lack of internal capabilities. The key is ensuring knowledge transfer and capability building rather than creating vendor dependency.

  • concepts · article

    Palantir's Forward Deployed Engineer Enterprise Model

    MindStudio Team · May 31

    Palantir's Forward Deployed Engineer (FDE) model embeds engineers directly inside client companies to build and deploy AI solutions, bridging the knowledge gap between AI capabilities and business requirements. This approach drove 640% returns and is now being adopted by Anthropic and OpenAI for enterprise deployment.

  • concepts · article

    AI Agent Memory Benchmarks and Architectures 2026

    May 31

    Standardized benchmarks LoCoMo, LongMemEval, and BEAM now measure AI agent memory performance across dimensions like recall, temporal reasoning, and multi-session continuity. Top systems achieve 92.5 on LoCoMo and 94.4 on LongMemEval, with major gains in temporal reasoning (+29.6 points) and multi-hop queries (+23.1 points).

  • concepts · article

    Claude Memory Architecture for Persistent Context

    May 31

    Agent failures in long-running workflows stem from context drift, not model limitations. The solution is engineered memory architecture using three layers: CLAUDE.md for human instructions, auto memory for model learnings, and managed stores for versioned persistence. This shifts focus from endless chat history to selective state reconstruction.

  • concepts · article

    LLM-as-a-Judge for Automated Model Evaluation

    Karyna Naminas · May 31

    LLM-as-a-Judge uses powerful language models like GPT-4 to automatically evaluate AI outputs at scale, achieving 80% agreement with human evaluators while providing 500x-5000x cost savings over manual review. The approach involves prompting capable models to assess quality, safety, and relevance of other models' outputs based on specified criteria.

  • tools · article

    Claude Code Memory System Architecture

    orchestrator.dev · May 31

    Claude Code has a four-layer memory architecture that allows persistent storage of codebase context, architecture decisions, and debugging history across sessions. Most developers only use 10% of its capability, leading to repetitive corrections and lost context between sessions.

  • tools · article

    LLM Judge Model Selection Framework 2026

    NVJK Kartik · May 31

    Comprehensive comparison of 8 LLM judge models (Claude, GPT-5, Gemini, Luna-2, etc.) evaluated on three critical axes: human correlation on specific rubrics, cost per evaluation, and self-preference bias. Argues against using generic benchmarks like SummEval for judge selection.

  • concepts · article

    LLM as Judge Pattern for Agent Safety

    MindStudio Team · May 31

    A safety pattern where a second AI model evaluates and approves agent actions before execution, preventing costly mistakes in production workflows. The judge acts as a gatekeeper that can pause, revert, or route problematic actions to humans when rule-based guardrails aren't sufficient.
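
    One way to picture the judge as a gatekeeper is a small wrapper that asks for a verdict before anything runs. Everything here is a hypothetical sketch: `Verdict`, `Action`, `gated_execute`, and the rules in `toy_judge` are invented for the example (a real judge would be a second model), and the pause and revert behaviors mentioned in the summary are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Verdict(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"  # route the action to a human
    BLOCK = "block"

@dataclass
class Action:
    name: str
    payload: dict

def gated_execute(action: Action,
                  judge: Callable[[Action], Verdict],
                  execute: Callable[[Action], str],
                  ask_human: Callable[[Action], bool]) -> str:
    """Run the action only if the judge approves or a human signs off."""
    verdict = judge(action)
    if verdict is Verdict.APPROVE:
        return execute(action)
    if verdict is Verdict.ESCALATE:
        return execute(action) if ask_human(action) else "rejected by human"
    return "blocked by judge"

# Toy judge with hard-coded rules, standing in for a model call.
def toy_judge(action: Action) -> Verdict:
    if action.name == "delete_records":
        return Verdict.BLOCK
    if action.name == "refund" and action.payload.get("amount", 0) > 500:
        return Verdict.ESCALATE
    return Verdict.APPROVE

def run(action: Action) -> str:
    return gated_execute(action, toy_judge,
                         execute=lambda a: f"executed {a.name}",
                         ask_human=lambda a: False)

print(run(Action("refund", {"amount": 50})))   # executed refund
print(run(Action("refund", {"amount": 900})))  # rejected by human
print(run(Action("delete_records", {})))       # blocked by judge
```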

Week of May 11

  • tools · article

    Claude Code Agent View CLI Dashboard

    TestingCatalog · May 17

    Anthropic launched Agent View for Claude Code, a command-line dashboard that manages multiple parallel coding sessions from a single interface. The feature allows developers to run background coding tasks, monitor session states, and switch between agents without managing multiple terminal windows.

  • tools · article

    Opus 4.7 Productivity Tips from Boris Cherny

    May 17

    Six practical tips for maximizing productivity with Claude Opus 4.7, including auto mode for permission handling, effort level configuration, focus mode for cleaner output, and verification patterns. Features like recaps and the /fewer-permission-prompts skill help streamline long-running AI tasks.

  • concepts · article

    Multi-Agent AI Systems Architecture and Performance

    May 17

    Claude's Research feature uses multiple AI agents working in parallel to handle complex, open-ended research tasks that require dynamic planning and exploration. The multi-agent approach outperformed single-agent systems by 90.2% on internal evaluations, with token usage explaining 80% of performance variance.

  • concepts · article

    Context Engineering vs Prompt Engineering

    May 17

    Context engineering is the evolution of prompt engineering, focusing on optimizing the entire set of tokens and information available to an LLM during inference, not just the prompts. As AI systems become more agent-like with multi-turn interactions, managing context becomes critical because of 'context rot': the degradation of model performance as context length increases.

  • tools · article

    Claude Agent SDK for Building AI Agents

    May 17

    The Claude Agent SDK (formerly Claude Code SDK) enables developers to build powerful agents by giving Claude access to computer tools like terminal, file editing, and bash commands. This approach allows agents to work like humans do, enabling applications beyond coding including finance, research, and customer support agents.

  • tools · article

    Claude Code Performance Issues and Fixes

    May 17

    Anthropic identified three separate issues that degraded Claude Code performance between March and April: default reasoning effort reduced from high to medium, a memory-clearing bug causing forgetfulness, and a verbosity reduction that hurt code quality. All issues were resolved by April 20, with the company resetting usage limits and improving its change management process.

  • resources · article

    Processing failed

    May 17

    Could not process content automatically.

  • tools · article

    /grill-with-docs: Enhanced AI Collaboration with Documentation

    May 17

    An evolved AI prompting technique that combines intensive questioning (/grill-me) with active documentation management. It maintains CONTEXT.md files for shared terminology and creates Architectural Decision Records (ADRs) while exploring design decisions through AI dialogue.

  • tools · article

    Claude Platform Updates and Multi-Agent Features

    Simon Willison · May 17

    Anthropic announced several Claude platform improvements including doubled rate limits, a SpaceX Colossus data center partnership, and three new Claude Managed Agents features: multi-agent orchestration, outcomes-based iteration, and self-improvement through 'dreaming'. API volume increased 17x year-over-year, with a focus on developer productivity tools.

  • concepts · article

    Real-time Search Quality Evaluation Systems

    May 17

    Sierra developed an evaluation system that measures AI agent search quality against real conversations by creating daily 'golden datasets' from anonymized customer interactions. The system uses retrieval metrics like recall, precision, and nDCG to identify search failures and drive continuous improvement, leading to up to 16 percentage point improvements in resolution rates.
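
    The retrieval metrics named above can be computed as follows for a single query. This is a generic sketch with binary relevance, not Sierra's system: the function names, the document IDs, and the choice to cut off at rank k are assumptions for the example.

```python
import math
from typing import List, Set

def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of this ranking divided by DCG of an ideal ranking."""
    dcg = sum(1 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical golden-dataset row: what search returned vs. what was needed.
retrieved = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}

print(precision_at_k(retrieved, relevant, k=4))        # 0.5
print(round(recall_at_k(retrieved, relevant, k=4), 3)) # 0.667
print(round(ndcg_at_k(retrieved, relevant, k=4), 3))   # 0.704
```

    Precision and recall only count hits, while nDCG also rewards placing relevant documents nearer the top of the ranking.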

  • concepts · article

    Fine-tuning Agents with Reverse-Engineered Training Data

    May 17

    Shopify fine-tuned Qwen3-32B to generate workflow automations from natural language by reverse-engineering training data from existing user workflows. They achieved 2.2x speed improvement and 68% cost reduction compared to frontier models by starting with validated production workflows and working backwards to generate plausible user queries.

  • tools · article

    Claude Financial Services AI Agent Framework

    May 17

    A comprehensive toolkit providing pre-built AI agents and plugins for financial workflows including investment banking, equity research, and wealth management. Offers dual deployment options via Claude Cowork plugins or Managed Agents API with specialized functions like pitch generation, market research, and financial reconciliation.

  • tools · article

    OpenKB - Open Source Knowledge Base System

    May 17

    OpenKB is an open-source CLI tool that compiles raw documents into structured, wiki-style knowledge bases using LLMs. Unlike traditional RAG systems that rediscover knowledge on every query, OpenKB creates persistent knowledge that accumulates over time with automatic cross-references and contradiction detection.

  • concepts · article

    AI Agent Workflows for 10x Engineering Productivity

    Rhea Purohit · May 17

    Two engineers at Every ship like a team of 15 by designing AI agent workflows that compound: a meta-prompt writes the prompts that turn rough ideas into detailed GitHub issues, and careful planning comes before coding. They emphasize fixing problems early in the workflow when stakes are low, inspired by the principles in Andy Grove's High Output Management.

  • tools · article

    Anthropic Claude Managed Agents Platform Launch

    Dan Shipper, Marcus Moretti, and Katie Parrott · May 17

    Anthropic announced Claude Managed Agents with three key features: multi-agent orchestration, dreaming (learning from past sessions), and outcomes (goal-oriented loops). The platform now provides AI models with a harness and a host computer, representing a shift from simple text completion to full AI infrastructure hosting.

  • concepts · article

    LLM Wiki vs RAG Knowledge Management

    May 17

    Andrej Karpathy's LLM wiki is a three-folder markdown system that loads structured knowledge directly into LLM context, while RAG retrieves chunks dynamically from vector stores. LLM wiki excels for personal-scale knowledge bases (up to 100 articles) with 95% token savings, while RAG scales to enterprise-level millions of documents.

  • concepts · article

    Authorization Propagation in Multi-Agent AI Systems

    May 17

    Multi-agent AI systems face a distinct security challenge beyond prompt injection: maintaining authorization invariants as non-human agents delegate tasks, retrieve data, and synthesize results across changing boundaries. This 'authorization propagation' problem involves three sub-problems that classical access control models don't fully address: transitive delegation, aggregation inference, and temporal validity.

Week of Jan 12

  • concepts · article

    Agent-Native Software Development Principles

    Jan 13

    Five core principles for building software where features are outcomes described in prompts rather than written code. Agents use atomic tools in loops to achieve objectives, with capabilities emerging from tool composition rather than explicit programming.

  • tools · article

    Claude Code Advanced Features Guide

    Ado Kukic · Jan 13

    Comprehensive guide to Claude Code's productivity features including project onboarding with /init, context management with @ mentions, instant bash execution with ! prefix, and workflow shortcuts like double Esc to rewind and Ctrl+R for command history. Covers setup, memory management, and essential commands for efficient AI-assisted coding.

  • concepts · article

    Knowledge Distribution with Claude Skills System

    Hedgineer Technologies · Jan 13

    Hedgineer Technologies systematized Anthropic's Claude Skills to automatically distribute institutional knowledge across four technical domains (AI, Data, Infrastructure, UI). Skills are model-invoked rather than user-invoked, meaning Claude automatically applies relevant expertise based on context without requiring engineers to know which skills exist.

  • concepts · article

    LLM Agent Architecture Patterns and Design

    Jan 12

    Anthropic distinguishes between workflows (LLMs orchestrated through predefined code paths) and agents (LLMs that dynamically direct their own processes). The most successful implementations use simple, composable patterns rather than complex frameworks, building on augmented LLMs as the foundational block.
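
    The workflow versus agent distinction can be illustrated with a toy contrast. This is a hedged sketch, not Anthropic's code: `workflow`, `agent`, `toy_policy`, and the two string tools are hypothetical, and the policy function stands in for an LLM deciding what to do next.

```python
from typing import Callable, Dict, Optional

Tool = Callable[[str], str]

TOOLS: Dict[str, Tool] = {"trim": str.strip, "shout": str.upper}

def workflow(text: str) -> str:
    """Workflow: the code fixes the path, so every input goes through the same steps."""
    return TOOLS["shout"](TOOLS["trim"](text))

def agent(text: str, choose_next: Callable[[str], Optional[str]], max_steps: int = 5) -> str:
    """Agent: a policy inspects the state and picks the next tool, or stops."""
    state = text
    for _ in range(max_steps):
        tool_name = choose_next(state)
        if tool_name is None:
            break
        state = TOOLS[tool_name](state)
    return state

def toy_policy(state: str) -> Optional[str]:
    # Stand-in for a model call: decide what, if anything, still needs doing.
    if state != state.strip():
        return "trim"
    if state != state.upper():
        return "shout"
    return None

print(workflow("  hello "))           # HELLO
print(agent("  hello ", toy_policy))  # HELLO
print(agent("DONE", toy_policy))      # DONE
```

    Both reach the same result here, but only the agent chooses its own sequence of steps, and it takes none at all when the input already meets the goal.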