Library

19 accepted items

Week of Jun 8

  • tools · article

    AI Coding Agent Evaluation Skills Framework

    Hamel Husain · Jun 14

    Hamel Husain released evals-skills, a structured set of capabilities that teach coding agents how to effectively evaluate AI products. The framework includes six core skills from error analysis to building review interfaces, designed to help agents distinguish between different types of failures and implement proper evaluation pipelines.

Week of Jun 1

  • tools · article

    LLM Council Multi-Model Query System

    Jun 7

    A local web application that sends queries to multiple LLMs simultaneously, has them review and rank each other's responses anonymously, then produces a final consolidated answer via a designated Chairman LLM. Built with FastAPI backend and React frontend, using OpenRouter API to access various models like GPT, Claude, and Gemini.

  • tools · article

    Matt Pocock's Production Agent Skills Library

    Yash Thakker · Jun 7

    A collection of 20+ production-grade AI agent skills for real engineering workflows, organized into planning, development, and tooling categories. Created by TypeScript educator Matt Pocock, the repository has 25,500+ GitHub stars and focuses on test-driven development, architecture planning, and git safety rather than experimental coding.
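
    The three-stage flow described in the LLM Council entry above (parallel answers, anonymous peer ranking, Chairman synthesis) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `run_council`, `rank_fn`, and the stub models are hypothetical names, and each model is represented by a plain Python callable rather than a real OpenRouter request.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]
# Hypothetical reviewer: (reviewer name, query, {label: answer}) -> labels, best first.
RankFn = Callable[[str, str, Dict[str, str]], List[str]]


def run_council(
    query: str,
    members: Dict[str, ModelFn],
    rank_fn: RankFn,
    chairman: ModelFn,
    seed: int = 0,
) -> Tuple[str, List[str]]:
    # Stage 1: every council member answers the same query.
    answers = {name: fn(query) for name, fn in members.items()}

    # Stage 2: hide authorship behind shuffled labels, then let each member rank.
    rng = random.Random(seed)
    names = list(answers)
    rng.shuffle(names)
    label_to_name = {f"Response {chr(65 + i)}": n for i, n in enumerate(names)}
    anonymized = {label: answers[n] for label, n in label_to_name.items()}

    # Sum of rank positions across reviewers; a lower total means a better answer.
    totals = {label: 0 for label in anonymized}
    for reviewer in members:
        for position, label in enumerate(rank_fn(reviewer, query, anonymized)):
            totals[label] += position
    ordered = sorted(anonymized, key=lambda label: totals[label])

    # Stage 3: the chairman sees the ranked, still-anonymous answers and writes one reply.
    ranked_text = "\n".join(f"{label}: {anonymized[label]}" for label in ordered)
    final = chairman(f"Question: {query}\nRanked answers:\n{ranked_text}")
    return final, [label_to_name[label] for label in ordered]


if __name__ == "__main__":
    # Stub models so the sketch runs without any network access.
    stub_members = {
        "model-a": lambda q: "short",
        "model-b": lambda q: "a much longer answer",
    }
    by_length = lambda reviewer, q, anon: sorted(anon, key=lambda l: -len(anon[l]))
    final, ranking = run_council("What is 2+2?", stub_members, by_length, lambda p: "FINAL: " + p)
    print(ranking)
    print(final)
```

    Summing rank positions is only one way to aggregate the peer reviews; the entry does not say how the real application combines rankings.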

Week of May 25

  • tools · article

    Project Glasswing AI Vulnerability Discovery

    May 31

    Project Glasswing uses the Claude Mythos Preview model, together with 50+ partners, to find over 10,000 critical vulnerabilities across systemically important software. The model shows a 10x improvement in bug discovery rates compared to previous methods, with Cloudflare finding 2,000 bugs, including 400 critical-severity issues.

  • tools · article

    Anthropic Acquires Stainless SDK Platform

    May 31

    In May 2026, Anthropic acquired Stainless, a company that generates SDKs and MCP server tooling across multiple programming languages. Stainless has powered all official Anthropic SDKs since the API launch and helps hundreds of companies build developer tools and agent connectors.

  • tools · article

    Claude Opus 4.8 AI Model Release

    May 31

    Anthropic released Claude Opus 4.8 with significant improvements in coding, reasoning, and agentic tasks compared to previous versions. The model shows better judgment, tool calling efficiency, and reliability in autonomous workflows, with new features like effort control and dynamic workflows in Claude Code.

  • tools · article

    Claude Code Memory System Architecture

    orchestrator.dev · May 31

    Claude Code has a four-layer memory architecture that allows persistent storage of codebase context, architecture decisions, and debugging history across sessions. Most developers only use 10% of its capability, leading to repetitive corrections and lost context between sessions.

  • tools · article

    LLM Judge Model Selection Framework 2026

    NVJK Kartik · May 31

    Comprehensive comparison of 8 LLM judge models (Claude, GPT-5, Gemini, Luna-2, etc.) evaluated on three critical axes: human correlation on specific rubrics, cost per evaluation, and self-preference bias. Argues against using generic benchmarks like SummEval for judge selection.
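
    The three axes named in the judge-selection entry above can be turned into simple numbers. The sketch below is one possible reading, not the article's method: it measures human agreement as a Spearman rank correlation, computes cost from token counts and per-million-token prices, and estimates self-preference bias as the extra score inflation (judge minus human) on the judge's own outputs. All function names, inputs, and prices are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Raises ZeroDivisionError if either input is constant.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def human_correlation(judge_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Axis 1: Spearman correlation, i.e. Pearson correlation of the ranks."""
    return pearson(ranks(judge_scores), ranks(human_scores))


def cost_per_eval(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Axis 2: dollar cost of one evaluation call."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def self_preference_bias(judge_scores: Sequence[float], human_scores: Sequence[float],
                         is_own_output: Sequence[bool]) -> float:
    """Axis 3 (assumed definition): inflation on own outputs minus inflation on others."""
    own = [j - h for j, h, o in zip(judge_scores, human_scores, is_own_output) if o]
    other = [j - h for j, h, o in zip(judge_scores, human_scores, is_own_output) if not o]
    return mean(own) - mean(other)
```

    Scores should come from the specific rubric being deployed, which is the entry's point about not relying on generic benchmarks.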

Week of May 11

  • tools · article

    Claude Code Agent View CLI Dashboard

    https://www.facebook.com/testingcatalog · May 17

    Anthropic launched Agent View for Claude Code, a command-line dashboard that manages multiple parallel coding sessions from a single interface. The feature allows developers to run background coding tasks, monitor session states, and switch between agents without managing multiple terminal windows.

  • tools · article

    Opus 4.7 Productivity Tips from Boris Cherny

    May 17

    Six practical tips for maximizing productivity with Claude's Opus 4.7, including auto mode for permission handling, effort level configuration, focus mode for cleaner output, and verification patterns. Features like recaps and the /fewer-permission-prompts skill help streamline long-running AI tasks.

  • tools · article

    Claude Agent SDK for Building AI Agents

    May 17

    The Claude Agent SDK (formerly Claude Code SDK) enables developers to build powerful agents by giving Claude access to computer tools like terminal, file editing, and bash commands. This approach allows agents to work like humans do, enabling applications beyond coding including finance, research, and customer support agents.

  • tools · article

    Claude Code Performance Issues and Fixes

    May 17

    Anthropic identified three separate issues that degraded Claude Code performance between March and April: the default reasoning effort was reduced from high to medium, a memory-clearing bug caused forgetfulness, and a verbosity reduction hurt code quality. All issues were resolved by April 20, with the company resetting usage limits and improving its change management process.

  • tools · article

    /grill-with-docs: Enhanced AI Collaboration with Documentation

    May 17

    An evolved AI prompting technique that combines intensive questioning (/grill-me) with active documentation management. It maintains CONTEXT.md files for shared terminology and creates Architectural Decision Records (ADRs) while exploring design decisions through AI dialogue.

  • tools · article

    Claude Platform Updates and Multi-Agent Features

    Simon Willison · May 17

    Anthropic announced several Claude platform improvements, including doubled rate limits, a SpaceX Colossus data center partnership, and three new Claude Managed Agents features: multi-agent orchestration, outcomes-based iteration, and self-improvement through 'dreaming'. API volume increased 17x year-over-year, with a focus on developer productivity tools.

  • tools · article

    Claude Financial Services AI Agent Framework

    May 17

    A comprehensive toolkit providing pre-built AI agents and plugins for financial workflows including investment banking, equity research, and wealth management. Offers dual deployment options via Claude Cowork plugins or Managed Agents API with specialized functions like pitch generation, market research, and financial reconciliation.

  • tools · article

    OpenKB - Open Source Knowledge Base System

    May 17

    OpenKB is an open-source CLI tool that compiles raw documents into structured, wiki-style knowledge bases using LLMs. Unlike traditional RAG systems that rediscover knowledge on every query, OpenKB creates persistent knowledge that accumulates over time with automatic cross-references and contradiction detection.

  • tools · article

    Anthropic Claude Managed Agents Platform Launch

    Dan Shipper, Marcus Moretti, and Katie Parrott · May 17

    Anthropic announced Claude Managed Agents with three key features: multi-agent orchestration, dreaming (learning from past sessions), and outcomes (goal-oriented loops). The platform now provides AI models with a harness and a host computer, representing a shift from simple text completion to full AI infrastructure hosting.

  • tools · article

    Claude Code Platform Updates Week 19 2026

    May 17

    Claude Code v2.1.128-v2.1.136 introduces plugin loading from ZIP archives and URLs, cross-project command history search with Ctrl+R, and new worktree branching controls. Additional improvements include auto mode hard deny rules and various environment variable configurations for better development workflow.

Week of Jan 12

  • tools · article

    Claude Code Advanced Features Guide

    Ado Kukic · Jan 13

    Comprehensive guide to Claude Code's productivity features including project onboarding with /init, context management with @ mentions, instant bash execution with ! prefix, and workflow shortcuts like double Esc to rewind and Ctrl+R for command history. Covers setup, memory management, and essential commands for efficient AI-assisted coding.