Knowledge Hub
A curated collection of articles, papers, and resources on topics I find interesting. Each link comes with a short note on what it covers and why I think it's worth reading.
AI Coding / LLMs for Software Engineering
A comprehensive 75-page survey that goes well beyond code generation. It maps out a structured taxonomy of SE tasks, identifies the real technical bottlenecks limiting current AI tools, and proposes concrete research directions. Great for understanding where the field actually is vs. where the hype says it is.
The creator of Claude Code shares how he actually uses it — subagents, hooks for formatting, permission workflows, MCP integrations with Slack and BigQuery, and why giving Claude a way to verify its own work is the single most important tip.
Mitchell (of HashiCorp fame) documents his honest progression from AI skeptic to strategic user across six phases. The key insight: success comes from "engineering the harness" — continuously improving prompts, tools, and workflows rather than expecting AI to just work out of the box.
A practical guide that argues humans must establish clear architectural decisions, documentation, and testing frameworks upfront. Covers useful patterns like marking code review levels, creating debug systems, and breaking tasks into manageable chunks to keep quality high.
Understanding Agents
Anthropic's engineering team makes a compelling case that the most successful agent implementations use simple, composable patterns rather than complex frameworks. Distinguishes clearly between workflows (predefined paths) and agents (LLM-directed), and details patterns like prompt chaining, routing, and parallelization. The core message: start simple, add complexity only when it demonstrably helps.
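The workflow-vs-agent distinction is easy to make concrete. Here is a minimal sketch of the prompt-chaining pattern, where `llm` is a stand-in for any model call (a hypothetical callable, not a specific API):

```python
from typing import Callable

def prompt_chain(llm: Callable[[str], str], steps: list[str], task_input: str) -> str:
    """Run a fixed sequence of prompts, feeding each output forward.

    This is a workflow (a predefined path), not an agent: the LLM never
    chooses the control flow. `llm` stands in for any text-completion call.
    """
    output = task_input
    for instruction in steps:
        output = llm(f"{instruction}\n\nInput:\n{output}")
    return output
```

With a real model behind `llm`, the steps might be "extract requirements", "draft the code", "review the draft"; the point is that the sequence is fixed in code, which is exactly what makes it simple and debuggable.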
OpenAI's counterpart to the Anthropic piece. A hands-on guide covering agent design patterns, tool use, and orchestration strategies from the perspective of deploying agents in real business contexts.
Use-Cases / Strong Examples
A community-driven open-source platform where AI agents write code, use command lines, and browse the web, just like human developers. Evaluated across 15 benchmarks, and built by 188+ contributors who have made 2.1K+ contributions. Interesting as a concrete, accessible foundation for anyone wanting to build or study autonomous coding agents.
Technical report on Lingxi, an open-source AI agent focused on intelligent task automation and software development workflows. Worth a look for its practical approach to building and deploying agentic systems.
This one is wild — an agent that starts with minimal capabilities and autonomously develops its own tools at runtime while solving real software problems. Hit 77.4% on SWE-bench Verified, surpassing all existing agents. The idea of self-evolving agents that get better as they work, rather than being statically designed, feels like an important direction.
Interesting Concepts
A framework where AI agents dynamically build their own task hierarchies instead of following pre-defined workflows. Uses just five primitives (spawn, fork, ask, complete, read_tree) to let agents decide how to decompose and coordinate work at runtime. The distinction between "spawn" (clean-slate) and "fork" (context-inheriting) is elegant.
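To make the spawn/fork distinction concrete, here is a toy task tree of my own (a sketch, not the framework's implementation; the ask and complete primitives, which handle communication and results, are omitted):

```python
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    """One node in an agent-built task hierarchy."""
    goal: str
    context: dict = field(default_factory=dict)
    children: list["TaskNode"] = field(default_factory=list)

    def spawn(self, goal: str) -> "TaskNode":
        # spawn: clean-slate child with no inherited context
        child = TaskNode(goal)
        self.children.append(child)
        return child

    def fork(self, goal: str) -> "TaskNode":
        # fork: child inherits a copy of the parent's context
        child = TaskNode(goal, context=dict(self.context))
        self.children.append(child)
        return child

    def read_tree(self, depth: int = 0) -> str:
        # read_tree: render the hierarchy so an agent can inspect it
        lines = ["  " * depth + self.goal]
        for child in self.children:
            lines.append(child.read_tree(depth + 1))
        return "\n".join(lines)
```

The asymmetry is the interesting design choice: a forked subtask can reuse everything the parent has learned, while a spawned one starts fresh and avoids inheriting irrelevant context.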
A contrarian finding: repository context files (like AGENTS.md) actually tend to reduce task success rates while increasing costs by 20%+. Both machine-generated and human-written files cause agents to explore too broadly. The takeaway: keep context files minimal and focused only on what's truly essential.
Argues that how LLMs are instructed to edit code — the "harness" — matters as much as the model itself. Introduces "hashline," a technique using content hashes instead of exact text matching, which improved 15 different LLMs' coding performance by 5-14 percentage points. A strong case for investing in better interfaces between models and code, not just better models.
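A rough sketch of how a hash-addressed edit interface could work (my own illustration of the general idea; the article's actual hashline format and hash scheme may differ):

```python
import hashlib

def line_hash(line: str) -> str:
    # short content hash per line; 6 hex chars is an arbitrary choice here
    return hashlib.sha256(line.encode("utf-8")).hexdigest()[:6]

def annotate(source: str) -> str:
    # show the model each line prefixed with its hash, so edits can
    # target lines by hash instead of by exact-text matching
    return "\n".join(f"{line_hash(l)}|{l}" for l in source.splitlines())

def apply_edit(source: str, target_hash: str, new_line: str) -> str:
    # replace the line whose content hash matches; no brittle
    # search-and-replace on text the model must reproduce verbatim
    return "\n".join(
        new_line if line_hash(l) == target_hash else l
        for l in source.splitlines()
    )
```

The appeal is that the model only has to echo a short stable identifier rather than reproduce the target text exactly, which is precisely the kind of harness improvement the article argues for.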