Library

100 accepted items

resources · article
Read
Loop Library
Jun 21
Matthew Berman launched Loop Library (Jun 2026) — a curated open repo of ~26 runnable agent loops. Every entry specifies: trigger, action, proof, memory, and stopping condition. Proof and stopping condition are first-class — not afterthoughts.
Source ↗
concepts · article
Loops are Replacing Prompts. Verification is About to Be Your Biggest Problem.
Arjun Iyer · Jun 21
Three eras of AI coding — prompt-driven, spec-driven, loop-driven — and why verification becomes the binding constraint as the human moves up a level each era. Loop economics: total cost = iterations-to-verified × cost-per-iteration.
Source ↗
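The loop-economics formula in the summary above can be worked through as plain arithmetic. This is a minimal sketch: the function name and every number are hypothetical, chosen only to show how fewer iterations can outweigh a higher per-iteration cost.

```python
def loop_cost(iterations_to_verified: int, cost_per_iteration: float) -> float:
    """Total cost = iterations-to-verified x cost-per-iteration."""
    return iterations_to_verified * cost_per_iteration


# Hypothetical numbers (not from the article): a loop with weak checks
# needs many cheap iterations, while a loop with stronger verification
# pays more per iteration but converges sooner.
weak_verification = loop_cost(iterations_to_verified=12, cost_per_iteration=0.25)
strong_verification = loop_cost(iterations_to_verified=4, cost_per_iteration=0.5)

print(weak_verification)    # 3.0
print(strong_verification)  # 2.0
```

Under these assumed figures, doubling the cost of each iteration still lowers the total, because the iteration count drops by two thirds.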
concepts · tweet
You Shouldn't Be Prompting Coding Agents Anymore — Design Loops
@steipete · Jun 21
Peter Steinberger's Jun 8 tweet (6.5M views) seeded the loop-engineering naming wave: stop prompting agents, design loops that prompt your agents. The origin post that Osmani's essay, Cherny's post, and the entire June 2026 discourse cite.
Source ↗
concepts · article
Loop Engineering
Addy Osmani · Jun 21
Addy Osmani names and defines loop engineering: replacing yourself as the person who prompts the agent — you design the system that does it instead. The shift: you held the tool (type → read → type). Now you build a small system that finds the work, hands it out, checks it, and decides the next thing.
Source ↗
concepts · tweet
autoresearch: Remove Yourself as the Bottleneck
@karpathy · Jun 21
Andrej Karpathy on his autoresearch loop: a single-file nanochat where an agent loops on a git branch to lower validation loss. The idea: remove yourself as the bottleneck, put in a few tokens, and a huge amount happens.
Source ↗
concepts · tweet
Inspect: Ramp Background Coding Agent (75% of Code)
@rahulgs · Jun 21
Rahul Sengottuvelu reports that Inspect, Ramp's background coding agent, now produces 75%+ of Ramp's code. What got them there was sustained loop infrastructure: every repo agent-ready, every command performant, every feedback signal fast and truthful.
Source ↗
concepts · youtube
How To Approach Your AI Evals
Hamel Husain · Jun 21
Hamel Husain on how to actually approach AI evals — the verification half of a loop. Anchor of his 4-video eval series (Jun 2026). Evals are what make loops converge.
Source ↗
concepts · youtube
Don't Build More AI Agents Until You Watch This
Nate B Jones · Jun 21
Nate B. Jones argues against agent-sprawl: loops and orchestration over building more individual agents. The case for designing fewer, better-connected loops instead of proliferating agents.
Source ↗
resources · article
WTF Is a Loop? Part 2: The 15 Loops People Are Actually Using
Matt Van Horn · Jun 20
Matt Van Horn cataloged 15 agent loops people are actually using in practice — a practical counterpart to the theoretical loop-engineering canon. Builds on his Part 1 debate map from Jun 8.
Source ↗
concepts · article
AI Agent Security and Prompt Injection Vulnerabilities
Airia Team · Jun 14
Agentic AI systems face a fundamental security flaw where they cannot distinguish between instructions and data, leading to prompt injection attacks. The 'Lethal Trifecta' combines access to private data, exposure to untrusted tokens, and exfiltration vectors, enabling attacks like EchoLeak and GeminiJack that steal sensitive data through hidden instructions in emails and documents.
Source ↗
concepts · article
AI Agent Orchestration Patterns for Production
JobsByCulture · Jun 14
Multi-agent AI orchestration coordinates multiple agents to solve complex tasks that single agents cannot handle due to context limits, specialization needs, or parallelism requirements. The Sequential Chain pattern runs agents in fixed sequence where each agent's output feeds the next, ideal for tasks with natural linear progression like document processing pipelines.
Source ↗
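The Sequential Chain pattern described above can be sketched in a few lines. This is an illustration only: each "agent" here is a plain Python function standing in for a model call, and the names `run_chain`, `extract`, `summarize`, and `format_report` are hypothetical.

```python
from functools import reduce
from typing import Callable, List

# An "agent" is modeled as a function from text to text (an assumption
# for this sketch; a real agent would wrap a model call).
Agent = Callable[[str], str]


def run_chain(agents: List[Agent], initial_input: str) -> str:
    """Run agents in fixed order; each agent's output feeds the next."""
    return reduce(lambda data, agent: agent(data), agents, initial_input)


def extract(document: str) -> str:
    # Stage 1: clean the raw document.
    return document.strip()


def summarize(text: str) -> str:
    # Stage 2: keep only the first sentence as a stand-in summary.
    return text.split(".")[0] + "."


def format_report(summary: str) -> str:
    # Stage 3: wrap the summary in a report line.
    return f"SUMMARY: {summary}"


report = run_chain(
    [extract, summarize, format_report],
    "  Invoice is overdue. Contact the client.  ",
)
print(report)  # SUMMARY: Invoice is overdue.
```

The fixed order is the point of the pattern: there is no routing or branching, so it suits tasks with a natural linear progression.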
concepts · article
Compound Engineering 8-Step Framework Evolution
Kieran Klaassen · Jun 14
Compound engineering has evolved from a 4-step loop (brainstorm → work → review → compound) to an 8-step process that adds ideation and planning at the front, and polishing at the end. The framework recognizes that AI handles the middle work phases well, but humans remain crucial for initial vision-setting and final quality assessment.
Source ↗
concepts · article
Agent Engineering Framework and Definition
Latent.Space · Jun 14
Explores the challenge of defining AI agents, presenting multiple perspectives from OpenAI's TRIM model (model + instructions + tools + runtime) to Lilian Weng's definition (LLM + memory + planning + tools). Proposes six core elements: LLMs with tools, encoded intent, LLM-driven control flow, multi-step planning, long-running memory, and goal-oriented behavior.
Source ↗
tools · article
AI Coding Agent Evaluation Skills Framework
Hamel Husain · Jun 14
Hamel Husain released evals-skills, a structured set of capabilities that teach coding agents how to effectively evaluate AI products. The framework includes six core skills from error analysis to building review interfaces, designed to help agents distinguish between different types of failures and implement proper evaluation pipelines.
Source ↗
tweet
https://x.com/gregisenberg/status/2054584280848769413
@gregisenberg · Jun 14
Source ↗
concepts · article
Read
Software 3.0 and Agentic Programming Evolution
Jun 14
The evolution from traditional coding to agentic programming represents a fundamental shift where LLMs become a programmable layer for digital work. Programming units changed from writing lines of code to delegating macro actions like implementing features or refactoring systems. Context windows become the new program interface, enabling adaptive software that transforms inputs directly without traditional infrastructure.
Source ↗
concepts · youtube
Agent Literacy: Claude vs Codex Interface Philosophy
Nate B Jones · Jun 14
Claude and Codex aren't just competing coding tools - they're teaching different approaches to agent interaction. Claude makes 'steering agents' feel natural while Codex makes 'dispatching agents' feel natural. These interfaces are training habits for how we'll work with AI agents across all knowledge work, not just coding.
Source ↗
concepts · youtube
Read
AI Agent Loops vs Human-in-the-Loop
Greg Isenberg · Jun 14
AI agent loops let AI systems operate autonomously without human prompting at each step, unlike human-in-the-loop workflows, where a human directs each iteration. While industry leaders like Boris and Peter advocate for autonomous loops, Professor Ras Mic argues human-in-the-loop remains superior for most use cases unless you have unlimited resources.
Source ↗
tweet
https://x.com/zero_goliath/status/2065835673911976398
@zero_goliath · Jun 14
Source ↗
tweet
https://x.com/brainsandtennis/status/2065190286519906657
@brainsandtennis · Jun 12
Source ↗
tweet
Designing loops with Fable 5
@rlancemartin · Jun 10
Source ↗
concepts · tweet
Read
AI Coding Loops vs Direct Prompting
Matt Van Horn · Jun 8
Instead of manually prompting AI coding agents, engineers should write 'loops' - small programs that automatically prompt agents, evaluate outputs, and iterate until completion. This represents a shift from being the prompter to being the author of the prompting system, with the AI model becoming a subroutine.
Source ↗
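The "loop" described above (prompt the agent, evaluate the output, iterate until done) might look like the following minimal sketch. The agent and verifier are stubs, and `run_loop`, `stub_agent`, and `stub_verify` are hypothetical names; a real loop would call a model and run real checks such as tests or linters.

```python
from typing import Callable, Optional, Tuple

AgentFn = Callable[[str, str], str]            # (task, feedback) -> output
VerifyFn = Callable[[str], Tuple[bool, str]]   # output -> (passed, feedback)


def run_loop(
    task: str, agent: AgentFn, verify: VerifyFn, max_iterations: int = 5
) -> Tuple[Optional[str], int]:
    """Prompt the agent, check its output, and retry with feedback.

    Stops when verification passes or the iteration budget runs out.
    """
    feedback = ""
    for attempt in range(1, max_iterations + 1):
        output = agent(task, feedback)
        passed, feedback = verify(output)
        if passed:
            return output, attempt
    return None, max_iterations


def stub_agent(task: str, feedback: str) -> str:
    # Stand-in for a model call: adds tests only after being told to.
    return f"{task}: patch" + (" with tests" if feedback else "")


def stub_verify(output: str) -> Tuple[bool, str]:
    # Stand-in for a real check.
    if "with tests" in output:
        return True, ""
    return False, "add tests"


result, attempts = run_loop("fix bug", stub_agent, stub_verify)
print(result, attempts)  # fix bug: patch with tests 2
```

The model call sits inside the loop as a subroutine, and the explicit iteration budget is the stopping condition that keeps a failing loop from running forever.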
tools · article
LLM Council Multi-Model Query System
Jun 7
A local web application that sends queries to multiple LLMs simultaneously, has them review and rank each other's responses anonymously, then produces a final consolidated answer via a designated Chairman LLM. Built with FastAPI backend and React frontend, using OpenRouter API to access various models like GPT, Claude, and Gemini.
Source ↗
concepts · article
Read
Generative UI for AI Agents
Jun 7
Generative UI allows AI agents to dynamically create and control user interfaces at runtime instead of relying on static chat interfaces. This enables agents to render task-specific components, collect structured inputs, and show progress through interactive UI elements that adapt to context and user needs.
Source ↗
concepts · article
Read
Experience Internalization for Continual Learning LLMs
Jun 7
Research reveals that current LLM experience internalization methods suffer from progressive capability collapse in multi-iteration learning rather than compounding improvement. The study identifies three critical dimensions: principle-level experience outperforms instance-level, step-wise injection beats global injection, and off-policy context-distillation provides more stable training than on-policy approaches.
Source ↗
concepts · article
Read
Prompt Injection Vulnerabilities in AI Coding Assistants
Jun 7
Comprehensive analysis revealing that AI coding assistants like GitHub Copilot and Cursor face critical security vulnerabilities through prompt injection attacks, with success rates exceeding 85% against current defenses. The study cataloged 42 distinct attack techniques and found most defense mechanisms achieve less than 50% mitigation against sophisticated attacks.
Source ↗
tools · article
Matt Pocock's Production Agent Skills Library
Yash Thakker · Jun 7
A collection of 20+ production-grade AI agent skills for real engineering workflows, organized into planning, development, and tooling categories. Created by TypeScript educator Matt Pocock, the repository has 25,500+ GitHub stars and focuses on test-driven development, architecture planning, and git safety rather than experimental coding.
Source ↗
concepts · article
Enterprise AI Tool Cost Management Strategies
Simon Willison · Jun 7
Uber implemented $1,500 monthly spending caps per employee for AI coding tools like Cursor and Claude Code after exceeding their 2026 AI budget in four months. This represents approximately 11% of median engineer compensation and suggests companies are finding real value in AI tools despite needing cost controls.
Source ↗
concepts · article
How Coding Agents Work with LLMs
Simon Willison · Jun 7
Coding agents are software harnesses that extend LLMs with additional capabilities through invisible prompts and callable tools. LLMs work by completing text sequences using tokens (not words), with providers charging based on token usage. Chat-templated prompts simulate conversations but require replaying the entire conversation history each time, making longer conversations more expensive.
Source ↗
resources · youtube
Trust Layer for AI-Generated Office Files (Second AI Attack)
Nate B Jones · May 31
Nate B Jones's 4-stage trust-layer workflow for AI-generated office files: a hostile-reviewer prompt plus two-model QC (Codex ⇄ Opus 4.7) producing one verified output — the 'second AI attack' that catches what a single generation pass misses.
Source ↗
concepts · article
Trust Layer for AI-Generated Office Files
Nate · May 31
AI can generate polished-looking spreadsheets and presentations that contain hidden errors like broken formulas or wrong data sources. A systematic verification workflow with source tracking, assumption logging, and hostile review prevents shipping confident-looking but incorrect work.
Source ↗
concepts · article
AI-Native Company Operations and Workforce
Lenny Rachitsky · May 31
Every, a 15-person company, generates 7-figure revenue with 100% AI-written code across 5 products. They use specialized AI agents for different tasks and employ an 'AI operations lead' to maximize team productivity through AI tools.
Source ↗
concepts · article
Agent-Native Software Architecture Paradigm
Dan Shipper · May 31
A new software development approach where AI agents, rather than traditional code, form the core of applications. Features are defined as prompts describing desired outcomes rather than step-by-step instructions, making software more malleable and accessible to non-programmers.
Source ↗
concepts · tweet
2026 AI Software Architecture Predictions
Dan Shipper 📧 · May 31
Dan Shipper predicts four major shifts in software development by 2026: agent-native architectures where apps are prompts to general AI agents, empowered designers who can build without engineers, agentic engineering as a new no-code discipline, and AI training focused on autonomous self-direction rather than human-pleasing. These changes stem from AI making software development dramatically cheaper and more accessible.
Source ↗
concepts · article
AI-Native Business Model and Organizational Structure
Dan Shipper · May 31
Every operates as an AI-first company combining media, products, and consulting with just 15 employees generating $1.2M ARR. Their model involves living in the future with AI tools, documenting observations, building missing solutions, and teaching what works. Everyone is a generalist who uses AI for all tasks, blending traditional job roles.
Source ↗
concepts · article
Harness Engineering and Adversarial AI Architecture
Eric · May 31
Prompt engineering has reached its limits for complex autonomous tasks, leading to a new paradigm called 'Harness Engineering' that focuses on building structured environments around AI agents. Anthropic's breakthrough uses a GAN-inspired architecture with separate Generator and Evaluator agents creating adversarial feedback loops to overcome AI's inability to self-critique effectively.
Source ↗
concepts · article
Read
Anthropic Three-Agent AI Development Architecture
May 31
Anthropic developed a multi-agent system that divides long-running AI development tasks among three specialized agents: planning, generation, and evaluation. The system uses context resets and structured handoff artifacts to maintain coherence during multi-hour autonomous coding sessions, addressing common issues like context loss and premature task termination.
Source ↗
concepts · article
KPMG-Anthropic Strategic AI Alliance for Enterprise
May 31
KPMG announced a global alliance with Anthropic to integrate Claude AI across its entire workforce of 276,000+ employees and core business operations. The partnership embeds Claude into KPMG's Digital Gateway platform for client work in tax and legal services, and establishes KPMG as Anthropic's preferred partner for private equity.
Source ↗
tools · article
Project Glasswing AI Vulnerability Discovery
May 31
Project Glasswing uses the Claude Mythos Preview AI model and has found over 10,000 critical vulnerabilities across systemically important software with 50+ partners. The model shows a 10x improvement in bug discovery rates compared to previous methods, with Cloudflare finding 2,000 bugs, including 400 critical-severity issues.
Source ↗
tools · article
Anthropic Acquires Stainless SDK Platform
May 31
In May 2026, Anthropic acquired Stainless, a company that generates SDKs and MCP server tooling across multiple programming languages. Stainless has powered all official Anthropic SDKs since the API launch and helps hundreds of companies build developer tools and agent connectors.
Source ↗
concepts · article
PwC Claude Enterprise AI Implementation Strategy
May 31
PwC and Anthropic expanded their partnership to deploy Claude across PwC's global workforce of hundreds of thousands, focusing on agentic technology builds, AI-native deal-making, and enterprise function reinvention. The collaboration includes training 30,000 professionals and launching a Claude-based finance business unit, with production deployments already cutting delivery times by up to 70%.
Source ↗
tools · article
Claude Opus 4.8 AI Model Release
May 31
Anthropic released Claude Opus 4.8 with significant improvements in coding, reasoning, and agentic tasks compared to previous versions. The model shows better judgment, tool calling efficiency, and reliability in autonomous workflows, with new features like effort control and dynamic workflows in Claude Code.
Source ↗
concepts · article
LLM Knowledge Bases for Business Intelligence
May 31
LLM knowledge bases transform raw organizational data into self-organizing, queryable intelligence systems that improve with use. Andrej Karpathy demonstrated this approach by having AI compile and maintain a 100-article knowledge base without complex RAG infrastructure. The AI-driven knowledge management market is projected to reach $11.24 billion by 2026.
Source ↗
concepts · article
Enterprise LLM Wiki Knowledge Management Pattern
May 31
Karpathy's LLM Wiki pattern uses an AI agent to automatically maintain a personal knowledge base by processing raw sources into structured wiki pages with cross-references and contradiction detection. The pattern works well personally but fails at enterprise scale because companies lack a dedicated curator and need automated ingestion from existing work tools rather than manual curation.
Source ↗
concepts · article
Forward Deployed Engineers in AI Companies
May 31
Major AI companies like Google, OpenAI, and Anthropic are rapidly hiring Forward Deployed Engineers (FDEs) through simplified processes and separate organizations. These roles involve integrating AI systems into enterprise customers' workflows and operations. The role is evolving from platform engineering toward solutions architecture and consulting.
Source ↗
concepts · article
Forward Deployed Engineers in Enterprise AI
May 31
AI vendors are embedding Forward Deployed Engineers (FDEs) to help enterprises implement agentic AI solutions, but Gartner predicts 70% of companies will abandon these projects by 2028 due to high costs and lack of internal capabilities. The key is ensuring knowledge transfer and capability building rather than creating vendor dependency.
Source ↗
concepts · article
Palantir's Forward Deployed Engineer Enterprise Model
MindStudio Team · May 31
Palantir's Forward Deployed Engineer (FDE) model embeds engineers directly inside client companies to build and deploy AI solutions, bridging the knowledge gap between AI capabilities and business requirements. This approach drove 640% returns and is now being adopted by Anthropic and OpenAI for enterprise deployment.
Source ↗
concepts · article
AI Agent Memory Benchmarks and Architectures 2026
May 31
Standardized benchmarks LoCoMo, LongMemEval, and BEAM now measure AI agent memory performance across dimensions like recall, temporal reasoning, and multi-session continuity. Top systems achieve 92.5 on LoCoMo and 94.4 on LongMemEval, with major gains in temporal reasoning (+29.6 points) and multi-hop queries (+23.1 points).
Source ↗
concepts · article
Claude Memory Architecture for Persistent Context
May 31
Agent failures in long-running workflows stem from context drift, not model limitations. The solution is engineered memory architecture using three layers: CLAUDE.md for human instructions, auto memory for model learnings, and managed stores for versioned persistence. This shifts focus from endless chat history to selective state reconstruction.
Source ↗
concepts · article
LLM-as-a-Judge for Automated Model Evaluation
Karyna Naminas · May 31
LLM-as-a-Judge uses powerful language models like GPT-4 to automatically evaluate AI outputs at scale, achieving 80% agreement with human evaluators while providing 500x-5000x cost savings over manual review. The approach involves prompting capable models to assess quality, safety, and relevance of other models' outputs based on specified criteria.
Source ↗
tools · article
Claude Code Memory System Architecture
orchestrator.dev · May 31
Claude Code has a four-layer memory architecture that allows persistent storage of codebase context, architecture decisions, and debugging history across sessions. Most developers only use 10% of its capability, leading to repetitive corrections and lost context between sessions.
Source ↗
tools · article
LLM Judge Model Selection Framework 2026
NVJK Kartik · May 31
Comprehensive comparison of 8 LLM judge models (Claude, GPT-5, Gemini, Luna-2, etc.) evaluated on three critical axes: human correlation on specific rubrics, cost per evaluation, and self-preference bias. Argues against using generic benchmarks like SummEval for judge selection.
Source ↗
concepts · article
LLM as Judge Pattern for Agent Safety
MindStudio Team · May 31
A safety pattern where a second AI model evaluates and approves agent actions before execution, preventing costly mistakes in production workflows. The judge acts as a gatekeeper that can pause, revert, or route problematic actions to humans when rule-based guardrails aren't sufficient.
Source ↗
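The gatekeeper idea above can be sketched as a function that asks a judge for a verdict before any action runs. This is a sketch under stated assumptions: the judge here is a rule-based stub standing in for a second model, the summary's pause and revert options are left out, and `Verdict`, `gated_execute`, and the action fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Verdict:
    decision: str  # "approve", "block", or "escalate"
    reason: str


def gated_execute(
    action: Dict,
    judge: Callable[[Dict], Verdict],
    execute: Callable[[Dict], str],
    escalate: Callable[[Dict, str], str],
) -> str:
    """Run the action only if the judge approves it first."""
    verdict = judge(action)
    if verdict.decision == "approve":
        return execute(action)
    if verdict.decision == "escalate":
        return escalate(action, verdict.reason)
    return f"blocked: {verdict.reason}"


def stub_judge(action: Dict) -> Verdict:
    # Stand-in for a second model; a real judge would be prompted
    # with the action and a policy, then parsed into a Verdict.
    if action["type"] == "delete":
        return Verdict("block", "destructive action")
    if action.get("amount", 0) > 1000:
        return Verdict("escalate", "amount above limit")
    return Verdict("approve", "within policy")


def execute(action: Dict) -> str:
    return f"executed: {action['type']}"


def escalate(action: Dict, reason: str) -> str:
    return f"sent to human: {reason}"


print(gated_execute({"type": "refund", "amount": 50}, stub_judge, execute, escalate))
print(gated_execute({"type": "refund", "amount": 5000}, stub_judge, execute, escalate))
print(gated_execute({"type": "delete"}, stub_judge, execute, escalate))
```

Keeping the judge separate from the executor means the agent proposing the action never gets to approve its own work.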
resources · video
Claude on Vertex AI with the ADK
Ivan Nardini (Google Cloud) · May 17
Walkthrough of Google Cloud's agent stack as a competitor or complement to Anthropic-native + Vercel AI SDK setups.
Source ↗
resources · video
Practical Claude Code Tips
Boris Cherny (Anthropic) · May 17
Practical, no-history-no-theory walkthrough: terminal setup, codebase Q&A as the onboarding wedge, memory files, the Claude Code SDK in CI, "don't index, ask Git."
Source ↗
resources · video
Skills: The Application Layer
Barry Zhang & Mahesh Murag (Anthropic) · May 17
"We stopped building agents and started building Skills." The clearest framing of Skills as the software layer of the agent stack.
Source ↗
resources · video
How We Build Effective Agents
Barry Zhang (Anthropic) · May 17
Direct read-across to ADR-001 and the Cortex agent layer. The talk to send anyone "thinking about agents."
Source ↗
resources · video
Software 3.0
Andrej Karpathy (interviewed by Stephanie Zhan) · May 17
The Software 3.0 framing: 1.0 = explicit code, 2.0 = learned weights, 3.0 = prompted models. Karpathy on why teams should change behavior the day they believe it.
Source ↗
tools · article
Claude Code Agent View CLI Dashboard
TestingCatalog · May 17
Anthropic launched Agent View for Claude Code, a command-line dashboard that manages multiple parallel coding sessions from a single interface. The feature allows developers to run background coding tasks, monitor session states, and switch between agents without managing multiple terminal windows.
Source ↗
tools · article
Opus 4.7 Productivity Tips from Boris Cherny
May 17
Six practical tips for maximizing productivity with Claude's Opus 4.7, including auto mode for permission handling, effort level configuration, focus mode for cleaner output, and verification patterns. Features like recaps and the /fewer-permission-prompts skill help streamline long-running AI tasks.
Source ↗
tweet
https://x.com/mattpocockuk/status/2054808143041908936
@mattpocockuk · May 17
Source ↗
concepts · tweet
AI Output Evolution: Text to Interactive Visual Media
Andrej Karpathy · May 17
Andrej Karpathy argues that AI-human interaction is evolving from text-based outputs toward visual and interactive formats. He suggests a progression from raw text to markdown to HTML, eventually reaching interactive neural videos/simulations generated by diffusion models.
Source ↗
concepts · tweet
AI Context Persistence Problem in Development
Taelin · May 17
Working with AI agents is frustrating because each new session requires re-explaining domain knowledge. Existing solutions like AGENTS.md, RAGs, and skills don't solve the 'unknown unknowns' problem where the AI can't search for knowledge it doesn't know it needs.
Source ↗
concepts · article
Multi-Agent AI Systems Architecture and Performance
May 17
Claude's Research feature uses multiple AI agents working in parallel to handle complex, open-ended research tasks that require dynamic planning and exploration. The multi-agent approach outperformed single-agent systems by 90.2% on internal evaluations, with token usage explaining 80% of performance variance.
Source ↗
concepts · article
Context Engineering vs Prompt Engineering
May 17
Context engineering is the evolution of prompt engineering, focusing on optimizing the entire set of tokens and information available to an LLM during inference, not just the prompts. As AI systems become more agent-like with multi-turn interactions, managing context becomes critical due to 'context rot' - the degradation of model performance as context length increases.
Source ↗
tools · article
Claude Agent SDK for Building AI Agents
May 17
The Claude Agent SDK (formerly Claude Code SDK) enables developers to build powerful agents by giving Claude access to computer tools like terminal, file editing, and bash commands. This approach allows agents to work like humans do, enabling applications beyond coding including finance, research, and customer support agents.
Source ↗
tools · article
Claude Code Performance Issues and Fixes
May 17
Anthropic identified three separate issues that degraded Claude Code performance between March and April: the default reasoning effort was reduced from high to medium, a memory-clearing bug caused forgetfulness, and a verbosity reduction hurt code quality. All issues were resolved by April 20, with the company resetting usage limits and improving its change management process.
Source ↗
tools · article
/grill-with-docs: Enhanced AI Collaboration with Documentation
May 17
An evolved AI prompting technique that combines intensive questioning (/grill-me) with active documentation management. It maintains CONTEXT.md files for shared terminology and creates Architectural Decision Records (ADRs) while exploring design decisions through AI dialogue.
Source ↗
tools · article
Claude Platform Updates and Multi-Agent Features
Simon Willison · May 17
Anthropic announced several Claude platform improvements including doubled rate limits, SpaceX Colossus data center partnership, and three new Claude Managed Agents features: multi-agent orchestration, outcomes-based iteration, and self-improvement through 'dreaming'. API volume increased 17x year-over-year with focus on developer productivity tools.
Source ↗
concepts · article
Real-time Search Quality Evaluation Systems
May 17
Sierra developed an evaluation system that measures AI agent search quality against real conversations by creating daily 'golden datasets' from anonymized customer interactions. The system uses retrieval metrics like recall, precision, and nDCG to identify search failures and drive continuous improvement, leading to up to 16 percentage point improvements in resolution rates.
Source ↗
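The retrieval metrics named above (recall, precision, nDCG) can be computed for one query as in the sketch below. It assumes binary relevance (a document is either in the golden set or not) and a cutoff `k`; the function names and sample data are hypothetical and not taken from Sierra's system.

```python
import math
from typing import List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def ndcg_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of this ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc in enumerate(retrieved[:k])
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical query: the golden set has three relevant documents,
# and the search returned two of them, at ranks 1 and 3.
retrieved = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}

print(precision_at_k(retrieved, relevant, k=4))  # 0.5
print(recall_at_k(retrieved, relevant, k=4))
print(ndcg_at_k(retrieved, relevant, k=4))
```

Precision and recall ignore rank order inside the top k, while nDCG rewards placing relevant documents earlier, which is why the three are usually tracked together.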
concepts · article
Fine-tuning Agents with Reverse-Engineered Training Data
May 17
Shopify fine-tuned Qwen3-32B to generate workflow automations from natural language by reverse-engineering training data from existing user workflows. They achieved 2.2x speed improvement and 68% cost reduction compared to frontier models by starting with validated production workflows and working backwards to generate plausible user queries.
Source ↗
tools · article
Claude Financial Services AI Agent Framework
May 17
A comprehensive toolkit providing pre-built AI agents and plugins for financial workflows including investment banking, equity research, and wealth management. Offers dual deployment options via Claude Cowork plugins or Managed Agents API with specialized functions like pitch generation, market research, and financial reconciliation.
Source ↗
tools · article
OpenKB - Open Source Knowledge Base System
May 17
OpenKB is an open-source CLI tool that compiles raw documents into structured, wiki-style knowledge bases using LLMs. Unlike traditional RAG systems that rediscover knowledge on every query, OpenKB creates persistent knowledge that accumulates over time with automatic cross-references and contradiction detection.
Source ↗
concepts · article
AI Agent Workflows for 10x Engineering Productivity
Rhea Purohit · May 17
Two engineers at Every ship like a team of 15 by designing AI agent workflows that compound - using a meta-prompt that writes prompts to transform rough ideas into detailed GitHub issues, then carefully planning before coding. They emphasize fixing problems early in the workflow when stakes are low, inspired by Andy Grove's High Output Management principles.
Source ↗
tools · article
Anthropic Claude Managed Agents Platform Launch
Dan Shipper, Marcus Moretti, and Katie Parrott · May 17
Anthropic announced Claude Managed Agents with three key features: multi-agent orchestration, dreaming (learning from past sessions), and outcomes (goal-oriented loops). The platform now provides AI models with harness and host computer, representing a shift from simple text completion to full AI infrastructure hosting.
Source ↗
tools · article
Read
Claude Code Platform Updates Week 19 2026
May 17
Claude Code v2.1.128-v2.1.136 introduces plugin loading from ZIP archives and URLs, cross-project command history search with Ctrl+R, and new worktree branching controls. Additional improvements include auto mode hard deny rules and various environment variable configurations for better development workflow.
Source ↗
concepts · article
LLM Wiki vs RAG Knowledge Management
May 17
Andrej Karpathy's LLM wiki is a three-folder markdown system that loads structured knowledge directly into LLM context, while RAG retrieves chunks dynamically from vector stores. LLM wiki excels for personal-scale knowledge bases (up to 100 articles) with 95% token savings, while RAG scales to enterprise-level millions of documents.
Source ↗
concepts · article
Authorization Propagation in Multi-Agent AI Systems
May 17
Multi-agent AI systems face a distinct security challenge beyond prompt injection: maintaining authorization invariants as non-human agents delegate tasks, retrieve data, and synthesize results across changing boundaries. This 'authorization propagation' problem involves three sub-problems that classical access control models don't fully address: transitive delegation, aggregation inference, and temporal validity.
Source ↗
resources · youtube
Pinecone Just Demoted Vector Search. Here's the Knowledge Layer.
Nate B Jones · May 13
Nate B Jones argues the AI-agent-memory war has moved past embed-and-retrieve. Even Pinecone is repositioning vectors as one component of a broader knowledge layer that includes graph relationships, structured data, and contextual retrieval. The thesis: production agents need a layered knowledge stack, not just RAG.
Source ↗
resources · video
Software Fundamentals Matter More Than Ever
Matt Pocock · May 13
In the AI age, fundamentals (DDD, encapsulation, type safety) compound the value of AI tooling. Short, sharp, quotable.
Source ↗
resources · video
Full Walkthrough: Workflow for AI Coding
Matt Pocock · May 13
Pocock's consolidated AI-coding workflow built around skills (structured prompts with categories, validation checkpoints, bundled resources).
Source ↗
resources · youtube
Read
Engineers, DELETE the BASH Tool: Agentic Security
IndyDevDan · May 11
Argues that the Bash tool inside Claude Code (and most agent harnesses) is a ticking time bomb: prompt injection or a single bad prompt can escalate to destroying production. Walks through concrete sandboxing patterns to remove or constrain Bash while preserving capability.
Source ↗
tools · tweet
Anthropic Claude Managed Agents for Business Automation
Corey Ganim · Apr 10
Anthropic's Managed Agents removes the technical barriers to deploying AI agents for business automation by handling infrastructure, security, and deployment. Users only need to define what the agent should do, not how to build the underlying systems. This enables rapid prototyping and deployment of custom AI services without engineering expertise.
Source ↗
tools · tweet
Claude Managed Agents Launch
Lance Martin · Apr 8
Claude Managed Agents is a pre-built, configurable agent system that runs on managed infrastructure, designed to handle long-horizon tasks as Claude's capabilities grow. It addresses challenges of keeping agent harnesses updated with Claude's evolving abilities and supporting extended execution times through safe, resilient infrastructure.
Source ↗
concepts · tweet
AI Impact on Small Business Valuation Models
M&A Focused CPA · Apr 7
Traditional SMB valuation uses five levers: cash flow, owner compensation, durability, transferability, and growth rate, typically yielding 3-5x EBITDA. AI is disrupting this by enabling businesses to break through the $2-3M revenue ceiling that previously required risky system investments, while dramatically improving margin structures by replacing labor costs.
Source ↗
concepts · tweet
Neofirms: AI-Era Professional Services Evolution
Ryan Daniels · Apr 6
Professional services firms must transform from traditional partnerships focused on human talent to 'Neofirms' that blend practitioners with AI researchers. These new firms use corporate structures enabling R&D investment, bill for outcomes rather than hours, and continuously redefine the human-machine frontier.
Source ↗
concepts · tweet
AI Disruption of BigLaw Economic Model
Zack Shapiro · Apr 6
BigLaw firms' economic model relies on leverage: partners distribute work to numerous associates who bill high rates for labor-intensive tasks. AI threatens this by enabling complex legal work to be done with radically fewer human hours, making the associate-heavy pyramid structure economically unsustainable.
Source ↗
concepts · tweet
AI Job Impact: Production vs Judgment Tasks
Zack Shapiro · Apr 6
AI is rapidly commoditizing skilled production work (research, drafting, analysis) but cannot replace judgment-based tasks that require contextual decision-making and strategic thinking. A two-person law firm example shows how AI handles 90% of document analysis but cannot make strategic decisions about deal dynamics or client relationships.
Source ↗
tools · tweet
Claude for Legal Practice Workflows
Zack Shapiro · Apr 6
A boutique law firm uses Claude (a general-purpose AI) instead of specialized legal AI tools to compete with larger firms. Claude analyzes complex deal terms, tracks interdependent contract provisions, and identifies legal conflicts in real time during negotiations.
Source ↗
tools · tweet
Processing failed
Kevin Gu · Apr 5
Could not process content automatically.
Source ↗
concepts · tweet
LLM-Managed Personal Knowledge Base System
Andrej Karpathy · Apr 3
Andrej Karpathy describes a workflow where LLMs automatically build and maintain personal wikis from raw documents, creating structured markdown files with summaries, backlinks, and categorized concepts. The system uses Obsidian as an IDE frontend and enables complex Q&A against the knowledge base without traditional RAG systems.
Source ↗
concepts · tweet
Processing failed
ashu garg · Apr 3
Could not process content automatically.
Source ↗
tools · tweet
Comparing gstack, Superpowers, and Compound Engineering Tools
Vox · Mar 30
Three popular Claude-based coding tools serve different functions in AI development workflows: gstack handles planning and evaluation (like a head chef), Superpowers manages kitchen processes, and Compound Engineering acts as a knowledge repository. The author uses restaurant metaphors to explain how these tools complement rather than compete with each other.
Source ↗
concepts · tweet
AI Agent Evolution and Market Implications 2026
logan bartlett · Mar 30
AI is progressing from productivity copilots to autonomous task agents, expanding addressable markets from $0.5T in software spend to potentially $6.2T in knowledge-worker compensation. Current software selloffs affect horizontal SaaS (-35%) more than vertical SaaS (+3%) due to different defensive moats against AI disruption.
Source ↗
concepts · tweet
Enterprise AI Adoption: Four Stages Framework
ashu garg · Mar 28
Enterprise AI adoption follows four stages: no AI usage, pilot sprawl without strategy, measurable productivity gains, and full process redesign. Most enterprises remain stuck at stage two with many pilots but no clear outcomes. Success requires picking real business problems, building proper data infrastructure, and organizational commitment to process change.
Source ↗
tools · tweet
AI Agent Design Harness for Non-Designers
Neethan Wu · Mar 23
A three-layer system using AI skills (instruction files for design expertise), canvases (HTML/CSS design surfaces), and agents to enable engineers to produce professional UI/UX without traditional design training. Key tools include Impeccable UI skill, Paper canvas for real HTML/CSS design, and Pencil for Git-versioned design files.
Source ↗
concepts · tweet
Plan-First Development with AI Agents
Matt Van Horn · Mar 23
A development methodology that inverts traditional coding: spend 80% of time planning with AI agents and 20% executing, using structured plan.md files as persistent context. Multiple AI research agents analyze codebases, past solutions, and external docs in parallel to create grounded, specific plans before any coding begins.
Source ↗
concepts · tweet
Claude Code Skills Categories and Best Practices
Thariq · Mar 18
Skills in Claude Code are flexible extension points that go beyond simple markdown files: they are folders containing scripts, assets, and data. At Anthropic, hundreds of skills cluster into four main categories: Reference (library/CLI usage), Verification (testing code correctness), Data (connecting to monitoring stacks), and Workflow (automating repetitive tasks).
Source ↗
concepts · tweet
Institutional AI vs Individual AI Productivity
George Sivulka · Mar 13
AI makes individuals 10x more productive, but that gain doesn't translate into organizational value without structural changes. Like electricity in textile mills (1890s-1920s), the technology must be paired with institutional redesign to realize productivity gains. Individual AI creates chaos without coordination layers.
Source ↗