Week of Jun 8
tools · article
AI Coding Agent Evaluation Skills Framework
Hamel Husain · Jun 14
Hamel Husain released evals-skills, a structured set of capabilities that teach coding agents how to effectively evaluate AI products. The framework includes six core skills, ranging from error analysis to building review interfaces, designed to help agents distinguish between different types of failures and implement proper evaluation pipelines.
Week of Jun 1
tools · article
LLM Council Multi-Model Query System
Jun 7
A local web application that sends queries to multiple LLMs simultaneously, has them review and rank each other's responses anonymously, then produces a final consolidated answer via a designated Chairman LLM. Built with a FastAPI backend and a React frontend, it uses the OpenRouter API to access models such as GPT, Claude, and Gemini.
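The council flow described above (parallel answers, anonymous peer ranking, Chairman synthesis) can be sketched in a few lines. This is a minimal illustration, not the application's actual code: every function name, the Borda-count aggregation, and the toy reviewer and chairman are assumptions, and the stub lambdas stand in for real OpenRouter calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a member answers a query; a reviewer returns labels
# ordered best-first; a chairman turns everything into one final answer.
Member = Callable[[str], str]
Reviewer = Callable[[str, str, Dict[str, str]], List[str]]
Chairman = Callable[[str, Dict[str, str], Dict[str, int]], str]


def anonymize(responses: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Swap model names for neutral labels so reviewers cannot see authorship."""
    labeled, key = {}, {}
    for i, (model, text) in enumerate(sorted(responses.items())):
        label = f"Response {chr(ord('A') + i)}"
        labeled[label] = text
        key[label] = model  # kept aside for de-anonymizing later
    return labeled, key


def borda(rankings: List[List[str]]) -> Dict[str, int]:
    """Aggregate best-first rankings: top of n gets n-1 points, last gets 0."""
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    return dict(scores)


def run_council(query: str, members: Dict[str, Member],
                reviewer: Reviewer, chairman: Chairman) -> str:
    responses = {name: ask(query) for name, ask in members.items()}
    labeled, _key = anonymize(responses)
    rankings = [reviewer(name, query, labeled) for name in members]
    return chairman(query, labeled, borda(rankings))


# Toy stand-ins (assumptions): real members would call an LLM API.
members = {
    "model_x": lambda q: "Paris.",
    "model_y": lambda q: "The capital of France is Paris.",
    "model_z": lambda q: "Paris, France.",
}


def toy_reviewer(name: str, query: str, labeled: Dict[str, str]) -> List[str]:
    # Ranks longer answers higher; a real reviewer would prompt a model.
    return sorted(labeled, key=lambda label: len(labeled[label]), reverse=True)


def toy_chairman(query: str, labeled: Dict[str, str], scores: Dict[str, int]) -> str:
    # Returns the top-scored answer; a real chairman would synthesize one.
    return labeled[max(scores, key=scores.get)]


final = run_council("What is the capital of France?", members, toy_reviewer, toy_chairman)
```

Keeping the label-to-model key out of the reviewers' input is what makes the ranking step anonymous; the aggregation rule itself is interchangeable.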
tools · article
Matt Pocock's Production Agent Skills Library
Yash Thakker · Jun 7
A collection of 20+ production-grade AI agent skills for real engineering workflows, organized into planning, development, and tooling categories. Created by TypeScript educator Matt Pocock, the repository has 25,500+ GitHub stars and focuses on test-driven development, architecture planning, and git safety rather than experimental coding.
Week of May 25
tools · article
Project Glasswing AI Vulnerability Discovery
May 31
Project Glasswing uses the Claude Mythos Preview AI model to find over 10,000 critical vulnerabilities across systemically important software, working with 50+ partners. The model shows a 10x improvement in bug discovery rates compared to previous methods, with Cloudflare finding 2,000 bugs, including 400 critical-severity issues.
tools · article
Anthropic Acquires Stainless SDK Platform
May 31
In May 2026, Anthropic acquired Stainless, a company that generates SDKs and MCP server tooling across multiple programming languages. Stainless has powered all official Anthropic SDKs since the API launch and helps hundreds of companies build developer tools and agent connectors.
tools · article
Claude Opus 4.8 AI Model Release
May 31
Anthropic released Claude Opus 4.8 with significant improvements in coding, reasoning, and agentic tasks compared to previous versions. The model shows better judgment, tool calling efficiency, and reliability in autonomous workflows, with new features like effort control and dynamic workflows in Claude Code.
tools · article
Claude Code Memory System Architecture
orchestrator.dev · May 31
Claude Code has a four-layer memory architecture that allows persistent storage of codebase context, architecture decisions, and debugging history across sessions. Most developers only use 10% of its capability, leading to repetitive corrections and lost context between sessions.
tools · article
LLM Judge Model Selection Framework 2026
NVJK Kartik · May 31
Comprehensive comparison of 8 LLM judge models (Claude, GPT-5, Gemini, Luna-2, etc.) evaluated on three critical axes: human correlation on specific rubrics, cost per evaluation, and self-preference bias. Argues against using generic benchmarks like SummEval for judge selection.
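The three axes named above can be computed per judge from a small labeled sample. The sketch below is one possible reading, not the article's method: the field names, the use of Pearson correlation, and the definition of self-preference bias (how much more a judge over-scores outputs from its own model family than other outputs, relative to human scores) are all assumptions.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def self_preference_bias(human: Sequence[float], judge: Sequence[float],
                         is_own: Sequence[bool]) -> float:
    """Mean (judge - human) gap on own-family outputs minus the gap on others.

    Positive values suggest the judge inflates scores for its own family.
    """
    own = [j - h for h, j, o in zip(human, judge, is_own) if o]
    other = [j - h for h, j, o in zip(human, judge, is_own) if not o]
    if not own or not other:
        raise ValueError("need both own-family and other-family samples")
    return sum(own) / len(own) - sum(other) / len(other)


def profile_judge(human: List[float], judge: List[float], is_own: List[bool],
                  total_cost_usd: float) -> Dict[str, float]:
    """Summarize one judge on the three axes for a single rubric."""
    return {
        "human_correlation": pearson(human, judge),
        "cost_per_eval_usd": total_cost_usd / len(judge),
        "self_preference_bias": self_preference_bias(human, judge, is_own),
    }


# Made-up numbers purely to exercise the functions.
report = profile_judge(
    human=[1, 2, 3, 4],
    judge=[2, 2, 4, 4],
    is_own=[True, False, True, False],
    total_cost_usd=0.08,
)
```

Scoring against human labels on the target rubric, rather than on a generic benchmark, is the point the article argues for; the specific statistics are a design choice.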
Week of May 11
tools · article
Claude Code Agent View CLI Dashboard
TestingCatalog · May 17
Anthropic launched Agent View for Claude Code, a command-line dashboard that manages multiple parallel coding sessions from a single interface. The feature allows developers to run background coding tasks, monitor session states, and switch between agents without managing multiple terminal windows.
tools · article
Opus 4.7 Productivity Tips from Boris Cherny
May 17
Six practical tips for maximizing productivity with Claude Opus 4.7, including auto mode for permission handling, effort level configuration, focus mode for cleaner output, and verification patterns. Features like recaps and the /fewer-permission-prompts skill help streamline long-running AI tasks.
tools · article
Claude Agent SDK for Building AI Agents
May 17
The Claude Agent SDK (formerly Claude Code SDK) enables developers to build powerful agents by giving Claude access to computer tools like terminal, file editing, and bash commands. This approach allows agents to work like humans do, enabling applications beyond coding including finance, research, and customer support agents.
tools · article
Claude Code Performance Issues and Fixes
May 17
Anthropic identified three separate issues that degraded Claude Code performance between March and April: default reasoning effort reduced from high to medium, a memory-clearing bug causing forgetfulness, and a verbosity reduction that hurt code quality. All issues were resolved by April 20, with the company resetting usage limits and improving its change management process.
tools · article
/grill-with-docs: Enhanced AI Collaboration with Documentation
May 17
An evolved AI prompting technique that combines intensive questioning (/grill-me) with active documentation management. It maintains CONTEXT.md files for shared terminology and creates Architectural Decision Records (ADRs) while exploring design decisions through AI dialogue.
tools · article
Claude Platform Updates and Multi-Agent Features
Simon Willison · May 17
Anthropic announced several Claude platform improvements, including doubled rate limits, a SpaceX Colossus data center partnership, and three new Claude Managed Agents features: multi-agent orchestration, outcomes-based iteration, and self-improvement through 'dreaming'. API volume increased 17x year-over-year, with a focus on developer productivity tools.
tools · article
Claude Financial Services AI Agent Framework
May 17
A comprehensive toolkit providing pre-built AI agents and plugins for financial workflows including investment banking, equity research, and wealth management. Offers dual deployment options via Claude Cowork plugins or Managed Agents API with specialized functions like pitch generation, market research, and financial reconciliation.
tools · article
OpenKB - Open Source Knowledge Base System
May 17
OpenKB is an open-source CLI tool that compiles raw documents into structured, wiki-style knowledge bases using LLMs. Unlike traditional RAG systems that rediscover knowledge on every query, OpenKB creates persistent knowledge that accumulates over time with automatic cross-references and contradiction detection.
tools · article
Anthropic Claude Managed Agents Platform Launch
Dan Shipper, Marcus Moretti, and Katie Parrott · May 17
Anthropic announced Claude Managed Agents with three key features: multi-agent orchestration, dreaming (learning from past sessions), and outcomes (goal-oriented loops). The platform now provides AI models with a harness and a host computer, representing a shift from simple text completion to full AI infrastructure hosting.
tools · article
Claude Code Platform Updates Week 19 2026
May 17
Claude Code v2.1.128-v2.1.136 introduces plugin loading from ZIP archives and URLs, cross-project command history search with Ctrl+R, and new worktree branching controls. Additional improvements include auto mode hard deny rules and various environment variable configurations for better development workflow.
Week of Jan 12
tools · article
Claude Code Advanced Features Guide
Ado Kukic · Jan 13
Comprehensive guide to Claude Code's productivity features including project onboarding with /init, context management with @ mentions, instant bash execution with ! prefix, and workflow shortcuts like double Esc to rewind and Ctrl+R for command history. Covers setup, memory management, and essential commands for efficient AI-assisted coding.