Week of Jun 8
tools · article
AI Coding Agent Evaluation Skills Framework
Hamel Husain · Jun 14
Hamel Husain released evals-skills, a structured set of capabilities that teach coding agents how to effectively evaluate AI products. The framework includes six core skills from error analysis to building review interfaces, designed to help agents distinguish between different types of failures and implement proper evaluation pipelines.
Week of Jun 1
tools · article
LLM Council Multi-Model Query System
Jun 7
A local web application that sends queries to multiple LLMs simultaneously, has them review and rank each other's responses anonymously, then produces a final consolidated answer via a designated Chairman LLM. Built with FastAPI backend and React frontend, using OpenRouter API to access various models like GPT, Claude, and Gemini.
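The three stages described above (independent answers, anonymous peer ranking, chairman synthesis) can be sketched as follows. This is a minimal illustration, not the project's code: `run_council`, the `rank` and `chairman` callables, and the Borda-style point tally are all assumptions, and plain Python functions stand in for real OpenRouter model calls.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List

Model = Callable[[str], str]  # hypothetical stand-in for a real model call


def run_council(
    query: str,
    members: Dict[str, Model],
    rank: Callable[[str, Dict[str, str]], List[str]],
    chairman: Callable[[str, List[str]], str],
    seed: int = 0,
) -> str:
    # Stage 1: every council member answers the query independently.
    answers = {name: model(query) for name, model in members.items()}

    # Stage 2: hide model names behind neutral labels before peer review.
    names = list(answers)
    random.Random(seed).shuffle(names)
    labels = {name: f"Response {chr(65 + i)}" for i, name in enumerate(names)}
    anonymized = {labels[name]: answers[name] for name in names}

    # Each member returns the labels ordered best-first; tally Borda-style points.
    scores = defaultdict(int)
    for reviewer in members:
        ranking = rank(reviewer, anonymized)
        for position, label in enumerate(ranking):
            scores[label] += len(ranking) - position

    # Stage 3: the chairman sees the answers ordered by peer score and writes the final reply.
    ordered = sorted(anonymized, key=lambda label: (-scores[label], label))
    return chairman(query, [anonymized[label] for label in ordered])
```

In a real deployment, `rank` and `chairman` would themselves be prompts sent to models; here they are injected so the control flow can be tested offline.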
tools · article
Matt Pocock's Production Agent Skills Library
Yash Thakker · Jun 7
A collection of 20+ production-grade AI agent skills for real engineering workflows, organized into planning, development, and tooling categories. Created by TypeScript educator Matt Pocock, the repository has 25,500+ GitHub stars and focuses on test-driven development, architecture planning, and git safety rather than experimental coding.
Week of May 25
tools · article
Project Glasswing AI Vulnerability Discovery
May 31
Project Glasswing uses the Claude Mythos Preview model to find over 10,000 critical vulnerabilities across systemically important software, working with 50+ partners. The model shows a 10x improvement in bug discovery rates compared to previous methods, with Cloudflare finding 2,000 bugs, including 400 critical-severity issues.
tools · article
Anthropic Acquires Stainless SDK Platform
May 31
Anthropic acquired Stainless in May 2026, a company that generates SDKs and MCP server tooling across multiple programming languages. Stainless has powered all official Anthropic SDKs since the API launch and helps hundreds of companies build developer tools and agent connectors.
tools · article
Claude Opus 4.8 AI Model Release
May 31
Anthropic released Claude Opus 4.8 with significant improvements in coding, reasoning, and agentic tasks compared to previous versions. The model shows better judgment, tool calling efficiency, and reliability in autonomous workflows, with new features like effort control and dynamic workflows in Claude Code.
tools · article
Claude Code Memory System Architecture
orchestrator.dev · May 31
Claude Code has a four-layer memory architecture that allows persistent storage of codebase context, architecture decisions, and debugging history across sessions. Most developers only use 10% of its capability, leading to repetitive corrections and lost context between sessions.
tools · article
LLM Judge Model Selection Framework 2026
NVJK Kartik · May 31
Comprehensive comparison of 8 LLM judge models (Claude, GPT-5, Gemini, Luna-2, etc.) evaluated on three critical axes: human correlation on specific rubrics, cost per evaluation, and self-preference bias. Argues against using generic benchmarks like SummEval for judge selection.
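Two of the three axes can be measured with very little code. The sketch below is an assumption-laden illustration rather than the article's method: it uses a simple exact-agreement rate as a stand-in for human correlation, and defines self-preference bias as the gap between the mean score a judge gives outputs from its own model family and the mean it gives everything else. The function names and data shapes are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def agreement(judge_labels: Sequence[int], human_labels: Sequence[int]) -> float:
    """Fraction of items where the judge's rubric label matches the human label."""
    return mean(1.0 if j == h else 0.0 for j, h in zip(judge_labels, human_labels))


def self_preference(scores: List[float], authors: List[str], judge_family: str) -> float:
    """Mean score for the judge's own family minus mean score for all other authors.

    A value near zero suggests little self-preference; both groups must be non-empty.
    """
    own = [s for s, a in zip(scores, authors) if a == judge_family]
    other = [s for s, a in zip(scores, authors) if a != judge_family]
    return mean(own) - mean(other)
```

Cost per evaluation, the third axis, is a matter of token counts and pricing and is left out of the sketch.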
Week of May 11
tools · article
Claude Code Agent View CLI Dashboard
TestingCatalog · May 17
Anthropic launched Agent View for Claude Code, a command-line dashboard that manages multiple parallel coding sessions from a single interface. The feature allows developers to run background coding tasks, monitor session states, and switch between agents without managing multiple terminal windows.
tools · article
Opus 4.7 Productivity Tips from Boris Cherny
May 17
Six practical tips for maximizing productivity with Claude's Opus 4.7, including auto mode for permission handling, effort level configuration, focus mode for cleaner output, and verification patterns. Features like recaps and the /fewer-permission-prompts skill help streamline long-running AI tasks.
tools · article
Claude Agent SDK for Building AI Agents
May 17
The Claude Agent SDK (formerly Claude Code SDK) enables developers to build powerful agents by giving Claude access to computer tools like terminal, file editing, and bash commands. This approach allows agents to work like humans do, enabling applications beyond coding including finance, research, and customer support agents.
tools · article
Claude Code Performance Issues and Fixes
May 17
Anthropic identified three separate issues that degraded Claude Code performance between March and April: default reasoning effort reduced from high to medium, a memory-clearing bug causing forgetfulness, and a verbosity reduction that hurt code quality. All issues were resolved by April 20, with the company resetting usage limits and improving its change management process.
tools · article
/grill-with-docs: Enhanced AI Collaboration with Documentation
May 17
An evolved AI prompting technique that combines intensive questioning (/grill-me) with active documentation management. It maintains CONTEXT.md files for shared terminology and creates Architectural Decision Records (ADRs) while exploring design decisions through AI dialogue.
tools · article
Claude Platform Updates and Multi-Agent Features
Simon Willison · May 17
Anthropic announced several Claude platform improvements, including doubled rate limits, a SpaceX Colossus data center partnership, and three new Claude Managed Agents features: multi-agent orchestration, outcomes-based iteration, and self-improvement through 'dreaming'. API volume increased 17x year-over-year, with a focus on developer productivity tools.
tools · article
Claude Financial Services AI Agent Framework
May 17
A comprehensive toolkit providing pre-built AI agents and plugins for financial workflows including investment banking, equity research, and wealth management. Offers dual deployment options via Claude Cowork plugins or Managed Agents API with specialized functions like pitch generation, market research, and financial reconciliation.
tools · article
OpenKB - Open Source Knowledge Base System
May 17
OpenKB is an open-source CLI tool that compiles raw documents into structured, wiki-style knowledge bases using LLMs. Unlike traditional RAG systems that rediscover knowledge on every query, OpenKB creates persistent knowledge that accumulates over time with automatic cross-references and contradiction detection.
tools · article
Anthropic Claude Managed Agents Platform Launch
Dan Shipper, Marcus Moretti, and Katie Parrott · May 17
Anthropic announced Claude Managed Agents with three key features: multi-agent orchestration, dreaming (learning from past sessions), and outcomes (goal-oriented loops). The platform now provides AI models with a harness and a host computer, representing a shift from simple text completion to full AI infrastructure hosting.
tools · article
Claude Code Platform Updates Week 19 2026
May 17
Claude Code v2.1.128-v2.1.136 introduces plugin loading from ZIP archives and URLs, cross-project command history search with Ctrl+R, and new worktree branching controls. Additional improvements include auto mode hard deny rules and various environment variable configurations for better development workflow.
Week of Apr 6
tools · tweet
Anthropic Claude Managed Agents for Business Automation
Corey Ganim · Apr 10
Anthropic's Managed Agents removes the technical barriers to deploying AI agents for business automation by handling infrastructure, security, and deployment. Users only need to define what the agent should do, not how to build the underlying systems. This enables rapid prototyping and deployment of custom AI services without engineering expertise.
tools · tweet
Claude Managed Agents Launch
Lance Martin · Apr 8
Claude Managed Agents is a pre-built, configurable agent system that runs on managed infrastructure, designed to handle long-horizon tasks as Claude's capabilities grow. It addresses challenges of keeping agent harnesses updated with Claude's evolving abilities and supporting extended execution times through safe, resilient infrastructure.
tools · tweet
Claude for Legal Practice Workflows
Zack Shapiro · Apr 6
A boutique law firm uses Claude (general-purpose AI) instead of specialized legal AI tools to compete with larger firms. Claude analyzes complex deal terms, tracks interdependent contract provisions, and identifies legal conflicts in real-time during negotiations.
Week of Mar 30
tools · tweet
Comparing gstack, Superpowers, and Compound Engineering Tools
Vox · Mar 30
Three popular Claude-based coding tools serve different functions in AI development workflows: gstack handles planning and evaluation (like a head chef), Superpowers manages kitchen processes, and Compound Engineering acts as a knowledge repository. The author uses restaurant metaphors to explain how these tools complement rather than compete with each other.
Week of Mar 23
tools · tweet
AI Agent Design Harness for Non-Designers
Neethan Wu · Mar 23
A three-layer system using AI skills (instruction files for design expertise), canvases (HTML/CSS design surfaces), and agents to enable engineers to produce professional UI/UX without traditional design training. Key tools include Impeccable UI skill, Paper canvas for real HTML/CSS design, and Pencil for Git-versioned design files.
Week of Mar 2
tools · tweet
Multi-Agent Bug Finding System
Dan Peguine ⌐◨-◨ · Mar 4
A three-agent system for finding bugs using Hunter Agent (finds all potential bugs with scoring), Skeptic Agent (challenges findings to reduce false positives), and Referee Agent (makes final determinations). Each agent has specific prompts and scoring mechanisms to maximize accuracy.
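The hunter, skeptic, and referee hand-off can be sketched as a small pipeline. The source does not give the actual prompts or scoring rules, so everything below is an assumption: the 0 to 1 score scales, the `Finding` fields, and the referee rule (hunter confidence discounted by the skeptic's doubt, compared against a threshold) are hypothetical, and the agents are plain callables rather than model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    description: str
    hunter_score: float          # hunter's confidence that this is a real bug (assumed 0..1)
    skeptic_score: float = 0.0   # skeptic's confidence that it is a false positive (assumed 0..1)


def review(
    code: str,
    hunter: Callable[[str], List[Finding]],
    skeptic: Callable[[str, Finding], float],
    threshold: float = 0.5,
) -> List[Finding]:
    # Hunter: cast a wide net and score every potential bug.
    findings = hunter(code)

    # Skeptic: challenge each finding independently.
    for finding in findings:
        finding.skeptic_score = skeptic(code, finding)

    # Referee: keep a finding only if the discounted confidence clears the threshold.
    return [
        f for f in findings
        if f.hunter_score * (1.0 - f.skeptic_score) >= threshold
    ]
```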
Week of Feb 23
tools · tweet
Claude Code Skills for Design Automation
✌︎ frederik ✌︎ · Feb 24
Skills are instruction sets for Claude Code that automate specific design and development tasks. Key examples include mobile-ios-design for enforcing iOS guidelines, impeccable toolkit for design refinement, and custom enterprise UX research workflows that can process feature ideas through structured analysis phases.
Week of Feb 9
tools · tweet
Claude Code as AI Chief of Staff
Mike Murchison · Feb 14
Mike Murchison demonstrates using Claude Code as an AI Chief of Staff that doubled his CEO productivity by unifying 6+ communication channels, managing multiplayer todo lists overnight, enriching contact records from meeting transcripts, and providing strategic pushback on decisions. He has shared the implementation on GitHub for other executives to try.
Week of Feb 2
tools · tweet
Claude Code vs Cursor for Designer Workflows
✌︎ frederik ✌︎ · Feb 7
A designer's comparison of Claude Code and Cursor, highlighting how Claude Code's Model Context Protocol (MCP) integrations enable seamless connections to design tools like Figma, Framer, and Remotion. The author found Claude Code superior for automating tedious design tasks across entire projects in seconds rather than hours.
tools · tweet
Claude Code Agent Teams Feature
Tom · Feb 7
Anthropic shipped agent teams natively into Claude Code, allowing multiple AI agents to work in parallel on different parts of a task while coordinating with each other. This replaces the sequential single-agent approach with a project manager model that delegates work across specialized teammates.
tools · tweet
Claude Code Setup and Configuration Guide
Ashley Ha · Feb 2
Boris Cherny, creator of Claude Code, shared detailed threads about his setup and usage patterns. Ashley Ha compiled these instructions into a markdown guide, revealing that the tool works well with minimal customization out of the box.
Week of Jan 26
tools · tweet
Supermemory Plugin for Claude Code
Dhravya Shah · Jan 31
Supermemory launched a plugin that gives Claude Code persistent memory across sessions, remembering coding preferences, codebase context, and past decisions. It uses a hybrid memory system combining fact extraction and profile building, achieving 81.6% on the LongMemEval benchmark versus 40-60% for traditional RAG systems.
tools · tweet
Claude Code Playground Plugin for Interactive HTML
Thariq · Jan 30
A new Claude Code plugin that generates standalone HTML playgrounds for visualizing and interacting with problems in ways not suited for text. Useful for architecture visualization, design tweaking, layout brainstorming, and game balancing through interactive interfaces.
Week of Jan 19
tools · tweet
Claude Code Tasks System Launch
Thariq · Jan 23
Claude Code upgraded from Todos to Tasks, a new primitive for tracking complex projects across multiple sessions and subagents. Tasks support dependencies, are stored in the file system, and enable real-time collaboration between sessions working on the same project.
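To make the idea of file-backed tasks with dependencies concrete, here is a minimal sketch. It is not Claude Code's actual storage format: the one-JSON-file-per-task layout, the field names (`id`, `subject`, `blocked_by`, `status`), and the helper functions are all hypothetical. Because state lives on disk, any session that reads the same directory sees the same set of ready tasks.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def save_task(root: Path, task_id: str, subject: str, blocked_by: Iterable[str] = ()) -> None:
    # One JSON file per task (assumed layout, for illustration only).
    task = {"id": task_id, "subject": subject, "blocked_by": list(blocked_by), "status": "pending"}
    (root / f"{task_id}.json").write_text(json.dumps(task))


def load_tasks(root: Path) -> Dict[str, dict]:
    return {t["id"]: t for t in (json.loads(p.read_text()) for p in sorted(root.glob("*.json")))}


def complete_task(root: Path, task_id: str) -> None:
    path = root / f"{task_id}.json"
    task = json.loads(path.read_text())
    task["status"] = "completed"
    path.write_text(json.dumps(task))


def ready_tasks(root: Path) -> List[str]:
    # A task is ready when it is pending and everything it depends on is completed.
    tasks = load_tasks(root)
    return [
        tid for tid, t in tasks.items()
        if t["status"] == "pending"
        and all(tasks[dep]["status"] == "completed" for dep in t["blocked_by"])
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_task(root, "schema", "Design the database schema")
    save_task(root, "api", "Build the API", blocked_by=["schema"])
    save_task(root, "ui", "Build the UI", blocked_by=["api"])
    first_ready = ready_tasks(root)   # only "schema" has no unmet dependencies
    complete_task(root, "schema")
    next_ready = ready_tasks(root)    # finishing "schema" unblocks "api"
```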
tools · tweet
Ralph - AI Coding Agent Loop
Aakash Gupta · Jan 22
Ralph is a bash loop that runs AI coding agents repeatedly on atomic tasks until completion, delivering entire projects autonomously. It breaks large features into small, binary-success tasks that AI can complete without context pollution or hallucination.
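The loop itself is simple enough to sketch. The original is described as a bash loop; the version below is a Python rendering of the same idea under stated assumptions: `run_agent` and `passes` are hypothetical callables standing in for a fresh agent invocation and a binary success check (for example, a test command), and `max_iterations` is an added safety cap.

```python
from typing import Callable, Dict, List


def ralph_loop(
    tasks: List[str],
    run_agent: Callable[[str], None],
    passes: Callable[[str], bool],
    max_iterations: int = 20,
) -> Dict[str, bool]:
    done = {task: False for task in tasks}
    for _ in range(max_iterations):
        pending = [task for task in tasks if not done[task]]
        if not pending:
            break  # every task has passed its check
        task = pending[0]
        run_agent(task)            # one fresh agent run per iteration, so context stays small
        done[task] = passes(task)  # binary outcome: a failed task is retried next iteration
    return done
```

Keeping each task small and checkable is what lets the loop retry blindly: a failed iteration costs one agent run and leaves no state behind except the repository itself.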
tools · tweet
Ralph AI - Autonomous Software Building Tool
Damian Player · Jan 22
Ralph is an AI system that builds software autonomously by breaking work into small, testable tasks and working through them iteratively while you're away. It operates like a continuous integration system, picking tasks, building features, testing them, and moving to the next one without human intervention.
Week of Jan 12
tools · tweet
Agentic UI Design Resources Trinity
Cole · Jan 18
Cole recommends three key resources for agentic UI design: rams.ai by Eli Rousso, ui-skills.com by Ibelick, and Vercel's design guidelines. These tools represent essential references for building AI-driven user interfaces.
tools · tweet
Advanced Claude Code Features and Context Management
Eyad · Jan 13
Claude Code provides consistent 200K token context unlike other AI coding tools, and includes three advanced features: skills (markdown files that teach Claude specific workflows), subagents, and MCP connectors. Skills use YAML frontmatter to define when they should be automatically applied, making them powerful for team-specific coding standards and workflows.
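A skill file's structure, YAML frontmatter followed by markdown instructions, can be illustrated with a small parser. This is a simplified sketch, not Claude Code's loader: it handles only flat `key: value` pairs rather than full YAML, and the sample skill content (including its `name` and `description` values) is made up for the example.

```python
from typing import Dict, Tuple


def parse_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a skill file into (metadata, body). Handles flat `key: value` pairs only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # opening delimiter without a closing one: treat as plain text


# Hypothetical skill file for a team coding standard.
SKILL = """---
name: code-review
description: Use when reviewing pull requests against the team style guide
---
# Code review
Check naming, error handling, and test coverage."""

meta, body = parse_frontmatter(SKILL)
```

The `description` field is the part that signals when the skill should apply, so it is worth writing as a trigger condition rather than a title.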
tools · tweet
Claude Cowork Desktop App Review
claire vo 🖤 · Jan 13
Claude Cowork is a Mac desktop app that applies Claude's coding approach to non-technical knowledge work tasks like document creation, data analysis, and calendar management. It features connectors, filesystem access, TODO tracking, and bundled skills, but has connectivity issues and exposes technical artifacts that may confuse non-technical users.
tools · tweet
Claude Agent SDK for Building AI Agents
nader dabit · Jan 13
The Claude Agent SDK provides the infrastructure behind Claude Code as a library, handling the agent loop, built-in tools, and context management. It includes pre-built tools like Read, Write, Edit, Bash, and WebSearch, allowing developers to build custom agents without implementing the underlying tool execution loop.
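For readers unfamiliar with what "the agent loop" means, the sketch below shows the generic pattern that such an SDK takes off the developer's hands. It does not use or imitate the Claude Agent SDK's API: `agent_loop`, the `ToolCall` shape, and the fake model and tool are hypothetical, chosen only to show the cycle of model step, tool execution, and result fed back into context.

```python
from typing import Callable, Dict, List, Union

ToolCall = Dict[str, str]  # assumed shape: {"tool": <name>, "input": <argument>}


def agent_loop(
    prompt: str,
    model: Callable[[List[str]], Union[str, ToolCall]],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = 10,
) -> str:
    transcript = [prompt]
    for _ in range(max_turns):
        step = model(transcript)
        if isinstance(step, str):
            return step  # a plain string means the model has produced its final answer
        # Otherwise the model requested a tool: run it and append the result to the context.
        result = tools[step["tool"]](step["input"])
        transcript.append(f"{step['tool']} -> {result}")
    raise RuntimeError("agent did not finish within max_turns")
```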
tools · article
Claude Code Advanced Features Guide
Ado Kukic · Jan 13
Comprehensive guide to Claude Code's productivity features including project onboarding with /init, context management with @ mentions, instant bash execution with ! prefix, and workflow shortcuts like double Esc to rewind and Ctrl+R for command history. Covers setup, memory management, and essential commands for efficient AI-assisted coding.