Library

40 accepted items

Week of Jun 8

  • tools · article

    AI Coding Agent Evaluation Skills Framework

    Hamel Husain · Jun 14

    Hamel Husain released evals-skills, a structured set of capabilities that teach coding agents how to effectively evaluate AI products. The framework includes six core skills from error analysis to building review interfaces, designed to help agents distinguish between different types of failures and implement proper evaluation pipelines.

Week of Jun 1

  • tools · article

    LLM Council Multi-Model Query System

    Jun 7

    A local web application that sends queries to multiple LLMs simultaneously, has them review and rank each other's responses anonymously, then produces a final consolidated answer via a designated Chairman LLM. Built with FastAPI backend and React frontend, using OpenRouter API to access various models like GPT, Claude, and Gemini.
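
    The review-and-rank step described above can be sketched as follows. This is a minimal illustration, not the project's code: the model names, the "Response A/B/C" labels, and the Borda-count aggregation are all assumptions, since the summary does not say how rankings are combined before the Chairman LLM writes the final answer.

```python
from collections import defaultdict


def anonymize(responses):
    """Map neutral labels to model names so reviewers cannot tell who wrote what."""
    labels = {}
    for i, model in enumerate(sorted(responses)):
        labels[f"Response {chr(ord('A') + i)}"] = model
    return labels


def borda_scores(rankings, labels):
    """Aggregate per-reviewer rankings (best first) into points per model."""
    scores = defaultdict(int)
    n = len(labels)
    for ranking in rankings:
        for position, label in enumerate(ranking):
            # First place earns n-1 points, last place earns 0.
            scores[labels[label]] += n - 1 - position
    return dict(scores)


# Hypothetical data standing in for real model calls.
responses = {"model-x": "answer 1", "model-y": "answer 2", "model-z": "answer 3"}
labels = anonymize(responses)
rankings = [
    ["Response B", "Response A", "Response C"],
    ["Response B", "Response C", "Response A"],
    ["Response A", "Response B", "Response C"],
]
scores = borda_scores(rankings, labels)
best = max(scores, key=scores.get)
print(scores, best)
```

    In a real council, the scores and the anonymized responses would presumably be handed to the Chairman model as context rather than used to pick a single winner.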

  • tools · article

    Matt Pocock's Production Agent Skills Library

    Yash Thakker · Jun 7

    A collection of 20+ production-grade AI agent skills for real engineering workflows, organized into planning, development, and tooling categories. Created by TypeScript educator Matt Pocock, the repository has 25,500+ GitHub stars and focuses on test-driven development, architecture planning, and git safety rather than experimental coding.

Week of May 25

  • tools · article

    Project Glasswing AI Vulnerability Discovery

    May 31

    Project Glasswing uses the Claude Mythos Preview AI model to find over 10,000 critical vulnerabilities across systemically important software with 50+ partners. The model shows a 10x improvement in bug discovery rates compared to previous methods, with Cloudflare finding 2,000 bugs including 400 critical-severity issues.

  • tools · article

    Anthropic Acquires Stainless SDK Platform

    May 31

    In May 2026, Anthropic acquired Stainless, a company that generates SDKs and MCP server tooling across multiple programming languages. Stainless has powered all official Anthropic SDKs since the API launch and helps hundreds of companies build developer tools and agent connectors.

  • tools · article

    Claude Opus 4.8 AI Model Release

    May 31

    Anthropic released Claude Opus 4.8 with significant improvements in coding, reasoning, and agentic tasks compared to previous versions. The model shows better judgment, tool calling efficiency, and reliability in autonomous workflows, with new features like effort control and dynamic workflows in Claude Code.

  • tools · article

    Claude Code Memory System Architecture

    orchestrator.dev · May 31

    Claude Code has a four-layer memory architecture that allows persistent storage of codebase context, architecture decisions, and debugging history across sessions. Most developers only use 10% of its capability, leading to repetitive corrections and lost context between sessions.

  • tools · article

    LLM Judge Model Selection Framework 2026

    NVJK Kartik · May 31

    Comprehensive comparison of 8 LLM judge models (Claude, GPT-5, Gemini, Luna-2, etc.) evaluated on three critical axes: human correlation on specific rubrics, cost per evaluation, and self-preference bias. Argues against using generic benchmarks like SummEval for judge selection.
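
    Two of the three axes above, human correlation and self-preference bias, can be computed from a small labeled sample. The sketch below is an assumption-laden illustration: the scores are made up, Spearman rank correlation is one plausible choice of correlation measure, and the bias definition (the judge-minus-human gap on the judge's own outputs minus the same gap on other models' outputs) is hypothetical rather than taken from the article. Cost per evaluation is not modeled.

```python
from statistics import mean


def ranks(values):
    """Rank values from 1..n (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for tie-free data."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def self_preference(judge_scores, human_scores, own):
    """How much more the judge over-scores its own outputs than other models' outputs."""
    gaps = [j - h for j, h in zip(judge_scores, human_scores)]
    gaps_own = [g for g, o in zip(gaps, own) if o]
    gaps_other = [g for g, o in zip(gaps, own) if not o]
    return mean(gaps_own) - mean(gaps_other)


# Made-up rubric scores for five outputs; `own` marks outputs written by the judge's own model.
human = [1, 2, 3, 4, 5]
judge = [2, 1, 3, 5, 4]
own = [True, False, False, True, False]

rho = spearman(human, judge)
bias = self_preference(judge, human, own)
print(round(rho, 3), round(bias, 3))
```

    A positive bias value under this definition would suggest the judge is more generous to its own outputs than human raters are.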

Week of May 11

  • tools · article

    Claude Code Agent View CLI Dashboard

    https://www.facebook.com/testingcatalog · May 17

    Anthropic launched Agent View for Claude Code, a command-line dashboard that manages multiple parallel coding sessions from a single interface. The feature allows developers to run background coding tasks, monitor session states, and switch between agents without managing multiple terminal windows.

  • tools · article

    Opus 4.7 Productivity Tips from Boris Cherny

    May 17

    Six practical tips for maximizing productivity with Claude Opus 4.7, including auto mode for permission handling, effort level configuration, focus mode for cleaner output, and verification patterns. Features like recaps and the /fewer-permission-prompts skill help streamline long-running AI tasks.

  • tools · article

    Claude Agent SDK for Building AI Agents

    May 17

    The Claude Agent SDK (formerly Claude Code SDK) enables developers to build powerful agents by giving Claude access to computer tools like terminal, file editing, and bash commands. This approach allows agents to work like humans do, enabling applications beyond coding including finance, research, and customer support agents.

  • tools · article

    Claude Code Performance Issues and Fixes

    May 17

    Anthropic identified three separate issues that degraded Claude Code performance between March and April: default reasoning effort reduced from high to medium, a memory-clearing bug causing forgetfulness, and a verbosity reduction that hurt code quality. All issues were resolved by April 20, with the company resetting usage limits and improving its change management process.

  • tools · article

    /grill-with-docs: Enhanced AI Collaboration with Documentation

    May 17

    An evolved AI prompting technique that combines intensive questioning (/grill-me) with active documentation management. It maintains CONTEXT.md files for shared terminology and creates Architectural Decision Records (ADRs) while exploring design decisions through AI dialogue.

  • tools · article

    Claude Platform Updates and Multi-Agent Features

    Simon Willison · May 17

    Anthropic announced several Claude platform improvements including doubled rate limits, a SpaceX Colossus data center partnership, and three new Claude Managed Agents features: multi-agent orchestration, outcomes-based iteration, and self-improvement through 'dreaming'. API volume increased 17x year-over-year, with a focus on developer productivity tools.

  • tools · article

    Claude Financial Services AI Agent Framework

    May 17

    A comprehensive toolkit providing pre-built AI agents and plugins for financial workflows including investment banking, equity research, and wealth management. Offers dual deployment options via Claude Cowork plugins or Managed Agents API with specialized functions like pitch generation, market research, and financial reconciliation.

  • tools · article

    OpenKB - Open Source Knowledge Base System

    May 17

    OpenKB is an open-source CLI tool that compiles raw documents into structured, wiki-style knowledge bases using LLMs. Unlike traditional RAG systems that rediscover knowledge on every query, OpenKB creates persistent knowledge that accumulates over time with automatic cross-references and contradiction detection.

  • tools · article

    Anthropic Claude Managed Agents Platform Launch

    Dan Shipper, Marcus Moretti, and Katie Parrott · May 17

    Anthropic announced Claude Managed Agents with three key features: multi-agent orchestration, dreaming (learning from past sessions), and outcomes (goal-oriented loops). The platform now provides AI models with a harness and a host computer, representing a shift from simple text completion to full AI infrastructure hosting.

Week of Apr 6

  • tools · tweet

    Anthropic Claude Managed Agents for Business Automation

    Corey Ganim · Apr 10

    Anthropic's Managed Agents removes the technical barriers to deploying AI agents for business automation by handling infrastructure, security, and deployment. Users only need to define what the agent should do, not how to build the underlying systems. This enables rapid prototyping and deployment of custom AI services without engineering expertise.

  • tools · tweet

    Claude Managed Agents Launch

    Lance Martin · Apr 8

    Claude Managed Agents is a pre-built, configurable agent system that runs on managed infrastructure, designed to handle long-horizon tasks as Claude's capabilities grow. It addresses challenges of keeping agent harnesses updated with Claude's evolving abilities and supporting extended execution times through safe, resilient infrastructure.

  • tools · tweet

    Claude for Legal Practice Workflows

    Zack Shapiro · Apr 6

    A boutique law firm uses Claude (general-purpose AI) instead of specialized legal AI tools to compete with larger firms. Claude analyzes complex deal terms, tracks interdependent contract provisions, and identifies legal conflicts in real-time during negotiations.

Week of Mar 30

  • tools · tweet

    Processing failed

    Kevin Gu · Apr 5

    Could not process content automatically.

  • tools · tweet

    Comparing gstack, Superpowers, and Compound Engineering Tools

    Vox · Mar 30

    Three popular Claude-based coding tools serve different functions in AI development workflows: gstack handles planning and evaluation (like a head chef), Superpowers manages kitchen processes, and Compound Engineering acts as a knowledge repository. The author uses restaurant metaphors to explain how these tools complement rather than compete with each other.

Week of Mar 23

  • tools · tweet

    AI Agent Design Harness for Non-Designers

    Neethan Wu · Mar 23

    A three-layer system using AI skills (instruction files for design expertise), canvases (HTML/CSS design surfaces), and agents to enable engineers to produce professional UI/UX without traditional design training. Key tools include Impeccable UI skill, Paper canvas for real HTML/CSS design, and Pencil for Git-versioned design files.

Week of Mar 2

  • tools · tweet

    Multi-Agent Bug Finding System

    Dan Peguine ⌐◨-◨ · Mar 4

    A three-agent system for finding bugs using Hunter Agent (finds all potential bugs with scoring), Skeptic Agent (challenges findings to reduce false positives), and Referee Agent (makes final determinations). Each agent has specific prompts and scoring mechanisms to maximize accuracy.
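
    The hunter, skeptic, and referee roles can be sketched as a three-stage pipeline. In this illustration each agent is a plain function standing in for an LLM call; the 0-10 scoring scale, the string-matching heuristics, and the referee's "hunter score must exceed skeptic score" rule are all assumptions, not the prompts or scoring from the original thread.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    description: str
    hunter_score: int       # hunter's confidence that this is a bug (assumed 0-10 scale)
    skeptic_score: int = 0  # skeptic's confidence that it is a false positive


def hunter(code):
    """Stub hunter: flag every suspicious line; a real hunter would be an LLM call."""
    findings = []
    for line in code.splitlines():
        if "/ 0" in line:
            findings.append(Finding(f"division by zero: {line.strip()}", 9))
        elif "TODO" in line:
            findings.append(Finding(f"unfinished code: {line.strip()}", 3))
    return findings


def skeptic(findings):
    """Stub skeptic: push back hard on findings the hunter scored weakly."""
    for f in findings:
        f.skeptic_score = 8 if f.hunter_score < 5 else 2
    return findings


def referee(findings):
    """Keep only findings where the hunter's score outweighs the skeptic's."""
    return [f for f in findings if f.hunter_score > f.skeptic_score]


code = "x = 1 / 0\n# TODO: handle errors\ny = 2\n"
confirmed = referee(skeptic(hunter(code)))
for f in confirmed:
    print(f.description)
```

    The point of the structure is that the hunter can afford to over-report, because the later stages exist to filter false positives.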

  • tools · tweet

    Processing failed

    Artem Zhutov · Mar 3

    Could not process content automatically.

Week of Feb 23

  • tools · tweet

    Claude Code Skills for Design Automation

    ✌︎ frederik ✌︎ · Feb 24

    Skills are instruction sets for Claude Code that automate specific design and development tasks. Key examples include mobile-ios-design for enforcing iOS guidelines, impeccable toolkit for design refinement, and custom enterprise UX research workflows that can process feature ideas through structured analysis phases.

Week of Feb 9

  • tools · tweet

    Claude Code as AI Chief of Staff

    Mike Murchison · Feb 14

    Mike Murchison demonstrates using Claude Code as an AI Chief of Staff that doubled his CEO productivity by unifying 6+ communication channels, managing multiplayer todo lists overnight, enriching contact records from meeting transcripts, and providing strategic pushback on decisions. He has shared the implementation on GitHub for other executives to try.

Week of Feb 2

  • tools · tweet

    Claude Code vs Cursor for Designer Workflows

    ✌︎ frederik ✌︎ · Feb 7

    A designer's comparison of Claude Code and Cursor, highlighting how Claude Code's Model Context Protocol (MCP) integrations connect it to design tools like Figma, Framer, and Remotion. The author found Claude Code superior for automating tedious design tasks across entire projects in seconds rather than hours.

  • tools · tweet

    Claude Code Agent Teams Feature

    Tom · Feb 7

    Anthropic shipped agent teams natively into Claude Code, allowing multiple AI agents to work in parallel on different parts of a task while coordinating with each other. This replaces the sequential single-agent approach with a project manager model that delegates work across specialized teammates.

  • tools · tweet

    Claude Code Setup and Configuration Guide

    Ashley Ha · Feb 2

    Boris Cherny, creator of Claude Code, shared detailed threads about his setup and usage patterns. Ashley Ha compiled these instructions into a markdown guide, revealing that the tool works well with minimal customization out of the box.

Week of Jan 26

  • tools · tweet

    Supermemory Plugin for Claude Code

    Dhravya Shah · Jan 31

    Supermemory launched a plugin that gives Claude Code persistent memory across sessions, remembering coding preferences, codebase context, and past decisions. It uses a hybrid memory system combining fact extraction and profile building, achieving 81.6% on the LongMemEval benchmark versus 40-60% for traditional RAG systems.

  • tools · tweet

    Claude Code Playground Plugin for Interactive HTML

    Thariq · Jan 30

    A new Claude Code plugin that generates standalone HTML playgrounds for visualizing and interacting with problems in ways not suited for text. Useful for architecture visualization, design tweaking, layout brainstorming, and game balancing through interactive interfaces.

Week of Jan 19

  • tools · tweet

    Claude Code Tasks System Launch

    Thariq · Jan 23

    Claude Code upgraded from Todos to Tasks, a new primitive for tracking complex projects across multiple sessions and subagents. Tasks support dependencies, are stored in the file system, and enable real-time collaboration between sessions working on the same project.
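
    Dependency-aware task tracking of this kind boils down to a topological ordering. The sketch below uses Python's standard `graphlib` to group tasks into waves that could run in parallel; the task names and the dictionary layout are hypothetical and do not reflect how Claude Code actually stores Tasks on disk.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the list of tasks it depends on.
tasks = {
    "design-schema": [],
    "write-migrations": ["design-schema"],
    "build-api": ["design-schema"],
    "integration-tests": ["write-migrations", "build-api"],
}

sorter = TopologicalSorter(tasks)
sorter.prepare()  # raises graphlib.CycleError if dependencies form a cycle

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # every task whose dependencies are done
    waves.append(ready)
    sorter.done(*ready)                 # mark the wave complete to unblock the next one

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

    Each wave contains tasks with no unfinished dependencies, which is the set that separate sessions or subagents could pick up at the same time.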

  • tools · tweet

    Ralph - AI Coding Agent Loop

    Aakash Gupta · Jan 22

    Ralph is a bash loop that runs AI coding agents repeatedly on atomic tasks until completion, delivering entire projects autonomously. It breaks large features into small, binary-success tasks that AI can complete without context pollution or hallucination.
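
    The loop itself is simple enough to sketch. The original is described as a bash loop; this Python version is an illustration only, and `run_agent`, the task fields, and the attempt limit are hypothetical stand-ins for invoking a real coding agent and checking a real pass/fail condition.

```python
def run_agent(task, attempt):
    """Hypothetical stand-in for one fresh-context agent run plus its binary check."""
    # Pretend the agent needs `attempts_needed` tries before the check passes.
    return attempt >= task["attempts_needed"]


def ralph_loop(tasks, max_attempts=5):
    """Run each atomic task repeatedly until its check passes or attempts run out."""
    log = []
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            passed = run_agent(task, attempt)
            log.append((task["name"], attempt, passed))
            if passed:
                break
        else:
            # The inner loop finished without a passing run.
            raise RuntimeError(f"gave up on {task['name']}")
    return log


tasks = [
    {"name": "add login form", "attempts_needed": 1},
    {"name": "validate email", "attempts_needed": 3},
]
log = ralph_loop(tasks)
for entry in log:
    print(entry)
```

    Because each attempt starts from a clean call rather than an ever-growing conversation, a failed attempt does not pollute the context of the next one, which is the property the summary attributes to small, binary-success tasks.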

  • tools · tweet

    Ralph AI - Autonomous Software Building Tool

    Damian Player · Jan 22

    Ralph is an AI system that builds software autonomously by breaking work into small, testable tasks and working through them iteratively while you're away. It operates like a continuous integration system, picking tasks, building features, testing them, and moving to the next one without human intervention.

Week of Jan 12

  • tools · tweet

    Agentic UI Design Resources Trinity

    Cole · Jan 18

    Cole recommends three key resources for agentic UI design: rams.ai by Eli Rousso, ui-skills.com by Ibelick, and Vercel's design guidelines. These tools represent essential references for building AI-driven user interfaces.

  • tools · tweet

    Advanced Claude Code Features and Context Management

    Eyad · Jan 13

    Claude Code provides a consistent 200K-token context, unlike other AI coding tools, and includes three advanced features: skills (markdown files that teach Claude specific workflows), subagents, and MCP connectors. Skills use YAML frontmatter to define when they should be automatically applied, making them powerful for team-specific coding standards and workflows.
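
    To make the frontmatter idea concrete, here is a minimal sketch of a skill file and a parser that separates its metadata from its markdown body. The skill content is invented for illustration, the `name` and `description` fields are assumed to be the relevant frontmatter keys, and the parser handles only flat `key: value` lines rather than full YAML.

```python
SKILL_MD = """---
name: commit-style
description: Use when writing git commit messages for this team
---
Write commit subjects in the imperative mood, under 72 characters.
"""


def parse_skill(text):
    """Split a skill file into frontmatter fields and the markdown body."""
    # The file starts with '---', so the first chunk before it is empty.
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()


meta, body = parse_skill(SKILL_MD)
print(meta["name"])
print(meta["description"])
print(body)
```

    The description line is what tells the agent when the skill applies, so it is worth writing as a trigger condition ("Use when ...") rather than as a title.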

  • tools · tweet

    Claude Cowork Desktop App Review

    claire vo 🖤 · Jan 13

    Claude Cowork is a Mac desktop app that applies Claude's coding approach to non-technical knowledge work tasks like document creation, data analysis, and calendar management. It features connectors, filesystem access, TODO tracking, and bundled skills, but has connectivity issues and exposes technical artifacts that may confuse non-technical users.

  • tools · tweet

    Claude Agent SDK for Building AI Agents

    nader dabit · Jan 13

    The Claude Agent SDK provides the infrastructure behind Claude Code as a library, handling the agent loop, built-in tools, and context management. It includes pre-built tools like Read, Write, Edit, Bash, and WebSearch, allowing developers to build custom agents without implementing the underlying tool execution loop.

  • tools · article

    Claude Code Advanced Features Guide

    Ado Kukic · Jan 13

    Comprehensive guide to Claude Code's productivity features including project onboarding with /init, context management with @ mentions, instant bash execution with ! prefix, and workflow shortcuts like double Esc to rewind and Ctrl+R for command history. Covers setup, memory management, and essential commands for efficient AI-assisted coding.