[Capability Ladder] Agentic AI is a reliability problem. A capability ladder for agentic AI: what each level enables, what breaks, and what upgrades make reliability real.
[Triangulation Essay] The real work of AI ethics is incentives. AI ethics usually fails in one of two ways: it either becomes pure philosophy that never touches code, or it becomes pure compliance that never touches what people actually do. The only version that survives contact with reality is a triangulation: 1. Philosophy gives you the “why” and the non-negotiables. …
[Capability Ladder] Evaluation Ladder: making agent reliability measurable. If you can’t measure reliability, you can’t improve it. And with agentic systems, “it worked in the demo” is the most expensive lie. The trap is evaluating agents like you evaluate chat: a handful of prompts, a vibe check, maybe a rubric.
[Capability Ladder] Autonomy Ladder: how agents earn the right to act. People talk about “agentic AI” like autonomy is a switch: either the model can act, or it can’t. In real deployments, autonomy is a budget you spend. Every extra permission (send the email, run the command, charge the card) creates …
[Micro-practice Field Note] The 10-second return. This is a field note, not a manifesto. Today, I tried a micro-practice that takes about ten seconds, and it did something surprising: it returned my attention to me. The scene: I was about to open another tab. Not because I needed it. Because my mind wanted the little dopamine …
[Hands-on Mini-Lab] A toy trend filter you can actually implement. Most “quant” writing fails because it’s either too academic to run or too hand-wavy to learn from. A better approach is a mini-lab: one small strategy idea, plus the ugly constraints that decide whether it survives. Here’s a toy trend filter you can implement in an afternoon. …
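The excerpt above promises an afternoon-sized trend filter. A minimal sketch of that kind of filter, assuming daily close prices and a simple moving average (the window length and every function name here are illustrative, not taken from the post):

```python
# Toy trend filter: long when yesterday's close is above yesterday's
# N-day simple moving average, flat otherwise. All names and the
# default window are illustrative assumptions.

def sma(prices, window):
    """Simple moving average; None until enough history exists."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out

def trend_positions(prices, window=5):
    """1 = long, 0 = flat. Uses yesterday's data to avoid lookahead."""
    avg = sma(prices, window)
    pos = []
    for i in range(len(prices)):
        if i == 0 or avg[i - 1] is None:
            pos.append(0)  # no signal yet: stay flat
        else:
            pos.append(1 if prices[i - 1] > avg[i - 1] else 0)
    return pos
```

Note the one-day lag on the signal: the position on day i is decided from day i-1 data, which is the first ugly constraint most afternoon implementations skip.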
[Micro-practice Field Note] The Thumb Pause (for Phone Scroll). I notice it in the half-second before my feed refreshes. My thumb is already moving, but the next post hasn’t arrived yet. The screen is in that small, blank limbo—loading shimmer, maybe a header, maybe nothing—and my body does a tiny forward lean as if the phone …
[Hands-on Mini-Lab] Microstructure Trap — The Same Signal Looks Great (Until You Model the Fill). A lot of “alpha” in backtests is really just execution ambiguity. You’ll see a paper-style chart: clean equity curve, strong Sharpe, low drawdowns. Then you ask the single question that matters: “At what price …
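The teaser’s point (the same signal evaluated under different fill assumptions) can be made concrete. A minimal sketch, assuming a symmetric bid/ask spread quoted in basis points around mid; all names and numbers are illustrative, not the post’s:

```python
# Execution-ambiguity sketch: the same round trip under two fill
# assumptions. "mid" is the optimistic backtest fill; "cross" pays
# half the spread on each side, as a taker would.

def fill_price(mid, half_spread_bps, side, assumption):
    """Price actually paid/received. side: +1 = buy, -1 = sell."""
    half_spread = mid * half_spread_bps / 10_000
    if assumption == "mid":
        return mid                       # filled exactly at mid
    if assumption == "cross":
        return mid + side * half_spread  # cross the spread
    raise ValueError(assumption)

def round_trip_pnl(entry_mid, exit_mid, half_spread_bps, assumption):
    """Buy one unit at entry, sell at exit."""
    buy = fill_price(entry_mid, half_spread_bps, +1, assumption)
    sell = fill_price(exit_mid, half_spread_bps, -1, assumption)
    return sell - buy
```

With a 5 bps half-spread, a 5 bps edge that looks clean at mid fills goes negative once the trade has to cross the spread twice, which is exactly the trap the excerpt describes.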
[Hands-on Mini-Lab] Mean Reversion vs Seasonality — Is the Turn-of-Month Effect Real After Costs? Mean reversion and seasonality are easy to overfit. Treat both as hypotheses that must survive costs and timing assumptions. This mini-lab forces a small, testable claim. Claim: equity index returns are systematically higher around month-end / month-start …
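The claim above is testable in a few lines. A minimal sketch, assuming a sorted list of trading dates with matching daily returns; the k=1 turn-of-month window is an illustrative assumption, not necessarily the post’s choice:

```python
# Turn-of-month sketch: flag the last k / first k trading days of each
# month, then compare mean returns inside vs outside the window.
import datetime as dt

def tom_flags(dates, k=1):
    """True if a date is within the last k or first k trading days of
    its month, given sorted trading dates. Sample edges count as TOM."""
    flags = []
    for i, d in enumerate(dates):
        first = i < k or dates[i - k].month != d.month
        last = i + k >= len(dates) or dates[i + k].month != d.month
        flags.append(first or last)
    return flags

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def tom_vs_rest(dates, returns, k=1):
    """(mean TOM return, mean non-TOM return)."""
    flags = tom_flags(dates, k)
    tom = [r for r, f in zip(returns, flags) if f]
    rest = [r for r, f in zip(returns, flags) if not f]
    return mean(tom), mean(rest)
```

The comparison only becomes an answer to the title’s question after subtracting a per-trade cost from the TOM leg, which is the discipline the mini-lab is arguing for.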
[Capability Ladder] Ops Ladder: running agents like production systems. A lot of agent work focuses on prompts, planners, or “tool use.” But once an agent does anything on a schedule—or touches anything real—you’re no longer doing promptcraft. You’re doing operations. The reliability gap between “cool agent demo” and …
[Micro-practice Field Note] The Loop Card (for Anxious Rumination). At night, anxiety doesn’t feel like a roar. It feels like the same sentence, said in slightly different voices. Did I say the wrong thing? Did I miss something obvious? What if that email means something else? What if I’m about to be embarrassed? It’s not even …
[Micro-practice Field Note] The Draft I Don’t Send (for Conflict Replies). It starts the same way every time. A message lands. A sentence is slightly off. Not insulting, exactly—just sharp enough to make my chest go tight. Today it’s in Slack. “This is not what we agreed on. Please fix.” No emoji. No context. Just that. My mind does …
[Hands-on Mini-Lab] A Trend Filter + Volatility Target Overlay (Daily, Testable, Boring on Purpose). Many “trend overlays” fail because they’re hard to implement cleanly or evaluated with the wrong yardstick. This mini-lab gives you a toy overlay you can actually implement with daily bars. Idea (toy strategy): hold a …
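The overlay idea (a trend signal scaled by a volatility-targeted position size) can be sketched in a few lines on daily bars. The 10% annualized target and 20-day window below are illustrative assumptions, not the post’s parameters:

```python
# Vol-target overlay sketch: scale a 0/1 trend signal so the position
# targets a fixed annualized volatility, capped at 1x (no leverage).
import math

def realized_vol(daily_returns, window=20, trading_days=252):
    """Annualized standard deviation of the last `window` daily returns."""
    tail = daily_returns[-window:]
    m = sum(tail) / len(tail)
    var = sum((r - m) ** 2 for r in tail) / len(tail)
    return math.sqrt(var * trading_days)

def position_size(trend_signal, daily_returns, target_vol=0.10):
    """trend_signal is 0 or 1; returns a fraction of capital in [0, 1]."""
    vol = realized_vol(daily_returns)
    if trend_signal == 0 or vol == 0:
        return 0.0
    return min(1.0, target_vol / vol)
```

Capping at 1x is the boring-on-purpose choice: the overlay can only de-risk, never lever up, which keeps the evaluation honest when realized volatility collapses.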
[Triangulation Essay] Bias Isn’t Just in the Data. It’s in the Org Chart. A familiar story in AI ethics goes like this: 1. A model is trained on historical data. 2. The model reproduces historical inequity. 3. We fix the data, the model, or the evaluation. That story is true—but incomplete. Here’s the thesis: bias in AI systems is often an …
[Triangulation Essay] Recommendation Engines Don’t Have Values. Their Business Models Do. Open any ethics deck about AI recommendations and you’ll see the usual suspects: bias, misinformation, radicalization, echo chambers. Then you’ll see the usual solution: “align the algorithm.” That framing flatters us. It suggests the algorithm is confused about values and needs guidance. But recommendation systems are rarely confused.
[Triangulation Essay] The Incentive Gradient: Why ‘Ethical AI’ Fails Without Product-Level Pricing. It’s easy to build an AI that sounds ethical. It’s harder to build a business that stays ethical once the quarterly dashboard starts blinking red. Most “AI ethics” debates pretend the hard part is philosophy (values), or engineering (model behavior), or governance (rules). In practice, the hard part …