<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>OrchestrateAI</title>
    <subtitle>Practical intelligence on AI orchestration: architecture, deployment, and operations for autonomous agent systems</subtitle>
    <link rel="self" type="application/atom+xml" href="https://orchestrateai.dev/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://orchestrateai.dev"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-06-13T00:00:00+00:00</updated>
    <id>https://orchestrateai.dev/atom.xml</id>
    <entry xml:lang="en">
        <title>The Fable 5 Takedown Smells Less Like AI Safety and More Like a Frontier Oligarchy</title>
        <published>2026-06-13T00:00:00+00:00</published>
        <updated>2026-06-13T00:00:00+00:00</updated>
        
        <author>
          <name>Unknown</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://orchestrateai.dev/blog/the-fable-5-takedown-safety-or-oligarchy/"/>
        <id>https://orchestrateai.dev/blog/the-fable-5-takedown-safety-or-oligarchy/</id>
        
        <content type="html" xml:base="https://orchestrateai.dev/blog/the-fable-5-takedown-safety-or-oligarchy/">&lt;p&gt;Anthropic shipped Fable 5 and Mythos 5 on June 9. Three days later, on the 12th, a letter landed at 5:21pm Eastern and both models went dark for every user on the planet. Not throttled, not gated, not restricted to a watchlist of bad actors. Gone, worldwide, including for Anthropic’s own employees who happen to hold the wrong passport.&lt;&#x2F;p&gt;
&lt;p&gt;I had spent those three days shipping real work with it. Then a letter I never got to read pulled it out of my pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;The stated reason is national security. A frontier model that can do dangerous things got a government directive pulled down on it, the safety story writes itself, and plenty of people will nod along because that is what the words are designed to do. I want to look at the actual shape of the thing instead of the label on the box, because when you line up what was banned, who it hits, and what was conveniently left running, this stops smelling like safety and starts smelling like the early scaffolding of a two-lab oligarchy.&lt;&#x2F;p&gt;
&lt;p&gt;None of this is a tinfoil thesis. It is just what you get when you read intent from actions instead of press releases.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-actually-happened-in-one-breath&quot;&gt;What actually happened, in one breath&lt;&#x2F;h2&gt;
&lt;p&gt;On June 12 the government handed Anthropic an export-control directive built on national security authority, and the order suspended all access to Fable 5 and Mythos 5 for any foreign national, inside or outside the United States, which in practice meant Anthropic had to switch both models off for everyone to stay compliant. Every other Anthropic model kept running, so this was surgical, aimed at the two newest releases and nothing else.&lt;&#x2F;p&gt;
&lt;p&gt;The trigger, as far as anyone can tell, is a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.anthropic.com&#x2F;news&#x2F;fable-mythos-access&quot;&gt;jailbreak&lt;&#x2F;a&gt;. The government believes it found a way to bypass Fable 5’s guardrails, and the demonstrated technique was asking the model to review a codebase and point out the software vulnerabilities in it. Anthropic looked at the same demo and called it a narrow jailbreak that surfaced a few minor, already-known issues, using a capability that is widely available from other models and run every day by security professionals doing their actual jobs.&lt;&#x2F;p&gt;
&lt;p&gt;So the dangerous act here is code review. Hold that thought.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-safety-story-does-not-survive-contact-with-the-details&quot;&gt;The safety story does not survive contact with the details&lt;&#x2F;h2&gt;
&lt;p&gt;If the concern were genuinely the capability, the response would not be model-shaped, it would be capability-shaped, and it would not stop at the two products with the freshest launch dates.&lt;&#x2F;p&gt;
&lt;p&gt;GPT-5.5 will read your codebase and find vulnerabilities all day long, and it is sitting on the shelf untouched. Half the open-weight models you can run on a desk will do the same. Anthropic’s own argument is the blunt one, that &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;fortune.com&#x2F;2026&#x2F;06&#x2F;13&#x2F;anthropic-disables-fable-mythos-export-controls-national-security-threat&#x2F;&quot;&gt;if this standard were applied across the industry&lt;&#x2F;a&gt; it would essentially halt all new model deployments for every frontier provider, which is a polite way of saying the rule is not a rule, it is a spotlight, and right now it is pointed at exactly one company.&lt;&#x2F;p&gt;
&lt;p&gt;Even Dean Ball, who worked in the Trump administration, looked at this and said he could not tell whether it was lawfare against Anthropic specifically or just extreme national-security hawkery, and that either way it was, in his word, cartoonish. When the people sympathetic to the administration are calling the move cartoonish, the safety framing is already on thin ice.&lt;&#x2F;p&gt;
&lt;p&gt;There is a darker joke sitting underneath this one. Two days before the government switched off his newest model, Dario Amodei published an essay, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;darioamodei.com&#x2F;post&#x2F;policy-on-the-ai-exponential&quot;&gt;Policy on the AI Exponential&lt;&#x2F;a&gt;, arguing for exactly this power. He wrote that “frontier AI models, like airplanes, should be required to go through technical testing and auditing, and their release should be blocked or reversed as a threat to public safety if they do not meet high standards of safety.” He wanted an FAA for models, with the government holding a switch it could throw on anything it judged unsafe. He got it. The switch existed for about a week before it came down on Fable 5, and the single safeguard he asked for in that same essay, “protective measures against political favoritism or arbitrary decisions,” is the exact part that went missing when the order showed up with no public evidence in the middle of a contract fight. You do not get to build the off switch, hand it to Washington, and then act surprised when Washington decides your model is the thing it wants switched off.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;i-had-fable-for-three-days&quot;&gt;I had Fable for three days&lt;&#x2F;h2&gt;
&lt;p&gt;Underneath the policy argument there is a smaller, more personal complaint, which is that Fable 5 was genuinely good and I am annoyed that I do not have it anymore.&lt;&#x2F;p&gt;
&lt;p&gt;I did not spend its three-day life running throwaway tests. I shipped with it. It worked through two of the more head-spinning initiatives in my Loopcycle repo, the terminal-grid tmux unification, which is a deep refactor of the terminal substrate down at the client-architecture level, and the agent fleet cockpit, which is the fleet pill, launcher, and driver observability work that makes the whole operator surface legible. Neither of those is a one-shot prompt demo. They are multi-cycle, file-disjoint, easy-to-break work, the kind where a weaker model quietly leaves you a mess you find three commits later. Fable helped, working through the complex architectural options and coming up with a concrete strategy.&lt;&#x2F;p&gt;
&lt;p&gt;Then I pointed it at the Loopcycle marketing site for a revamp, and the part that catches the eye, the diagram logo, the cards, the flowing animation, that was Fable too. Visual taste is the thing most models still fake, and it just had it.&lt;&#x2F;p&gt;
&lt;figure style=&quot;margin:1.7rem 0;text-align:center;&quot;&gt;
&lt;video autoplay loop muted playsinline style=&quot;width:100%;max-width:620px;border:1px solid var(--c-faint);border-radius:12px;display:block;margin:0 auto;&quot;&gt;
&lt;source src=&quot;&#x2F;images&#x2F;fable-5-animation.webm&quot; type=&quot;video&#x2F;webm&quot; &#x2F;&gt;
&lt;source src=&quot;&#x2F;images&#x2F;fable-5-animation.mp4&quot; type=&quot;video&#x2F;mp4&quot; &#x2F;&gt;
&lt;&#x2F;video&gt;
&lt;figcaption style=&quot;font-size:0.78rem;color:var(--c-faint);margin-top:0.6rem;&quot;&gt;Fable built this. The model is gone, the animation it made is not. Launched June 9, pulled June 12, cause of death a letter.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;So when I call the takedown disproportionate, I am not defending an abstraction. They reached across the planet and switched off a tool that was, three days into its life, the best thing in my pipeline. Over a code-review demo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-backdrop-nobody-is-putting-in-the-headline&quot;&gt;The backdrop nobody is putting in the headline&lt;&#x2F;h2&gt;
&lt;p&gt;This did not happen in a vacuum, and it did not happen between strangers.&lt;&#x2F;p&gt;
&lt;p&gt;Back in February the administration &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;techcrunch.com&#x2F;2026&#x2F;02&#x2F;27&#x2F;president-trump-orders-federal-agencies-to-stop-using-anthropic-after-pentagon-dispute&#x2F;&quot;&gt;told federal agencies to stop using Anthropic’s models&lt;&#x2F;a&gt; and branded the company a supply chain risk, a label usually saved for firms tied to foreign adversaries, after Anthropic refused to hand the Pentagon unrestricted access and held a line against its models being used for autonomous weapons and domestic mass surveillance. Anthropic &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.cnn.com&#x2F;2026&#x2F;03&#x2F;09&#x2F;tech&#x2F;anthropic-sues-pentagon&quot;&gt;sued in March&lt;&#x2F;a&gt; and is still fighting it in court. So by the time the June 12 letter shows up, there is already a feud with a body count, and the newest, most capable models from the company in the doghouse get yanked off the global shelf three days after launch over a code-review demo.&lt;&#x2F;p&gt;
&lt;p&gt;Maybe that is genuine fallout from the military fight. Maybe it is theatre with an agenda, a bit of choreography where Anthropic plays the wounded party, cuts a quiet deal, and comes back online with a new federal understanding and a tidier moat than it had before. I cannot prove which one it is, and honestly it does not change the conclusion, because both roads end in the same place. The frontier labs and the state are not adversaries here so much as two hands on the same lever, and when the dust settles, Anthropic and OpenAI will still be standing inside the fence while the question of who else gets to build gets quietly redrawn.&lt;&#x2F;p&gt;
&lt;p&gt;I tend to assume the big labs are cogs in the same machine, not insurgents against it. That is not cynicism, it is just pattern matching against every other industry that got “secured” by people in suits.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-real-target-is-not-anthropic&quot;&gt;The real target is not Anthropic&lt;&#x2F;h2&gt;
&lt;p&gt;Here is the part you have to get right, because the lazy version of this take falls apart in one sentence.&lt;&#x2F;p&gt;
&lt;p&gt;If this is regulatory capture, why did it hurt the incumbent? The government did not hand Anthropic a moat, it torched their flagship launch three days in and left their biggest competitor running. That is the obvious objection, and if you do not answer it up front you deserve the comment that comes back.&lt;&#x2F;p&gt;
&lt;p&gt;The answer is that the direction of this one strike does not matter. The capability does. What just got established is that the executive can make a deployed, shipping model vanish for the entire world, overnight, on national security authority, with no published evidence and no due process you or I get to see. That power does not care who it lands on today. Once it exists, the steady state is the one every other regulated industry already settled into, where the players with the most lobbyists and the deepest relationships shape when the lever gets pulled and who it gets pulled on. Today it is Anthropic taking the hit in a contract spat. Tomorrow it is whoever is least useful to the people holding the lever, and the safe money says that is the open-weight providers, the labs releasing models you can download and run without asking anyone’s permission.&lt;&#x2F;p&gt;
&lt;p&gt;That is the precedent worth being loud about. Not “the government was mean to Anthropic,” but “the government demonstrated a worldwide kill switch and nobody got to see the evidence.” A switch like that, in this town, does not stay pointed at the strong. It drifts toward the inconvenient.&lt;&#x2F;p&gt;
&lt;p&gt;And the most inconvenient thing in AI, to a structure that wants a small number of licensable, leashable providers, is a good model with open weights that anyone can mirror a thousand times before lunch.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-there-is-a-real-argument-and-i-will-give-it-room&quot;&gt;Where there is a real argument, and I will give it room&lt;&#x2F;h2&gt;
&lt;p&gt;I do not want to pretend the other side is empty, because it is not.&lt;&#x2F;p&gt;
&lt;p&gt;There is a legitimate case that AI is a net negative for most people, and on the jobs piece I am not hedging, I think it is largely right. A lot of work that does not ask for original thought or real judgment is going to get displaced, it is going to happen faster than the people doing it can retrain or move, and I am genuinely worried about where that leaves them. That is not a productivity chart I am going to cheer at. The slop and the scams and the erosion of the few things that were still scarce are real on top of it.&lt;&#x2F;p&gt;
&lt;p&gt;The problem is that the argument arrives about five years too late to matter. The cat is out of the bag. The weights exist, the techniques are published, the hardware is on desks, and no directive out of Washington un-invents any of it. So once you accept that suppression is off the table, the question stops being “should this exist” and becomes “who gets to hold the leash,” and that is the question government control answers in exactly the wrong direction. A kill switch does not make a released technology safer. It just decides whose hand is on it, and the track record on whose hand that turns out to be, across every industry that ever got captured, is the ruling class and the lobby money, every single time.&lt;&#x2F;p&gt;
&lt;p&gt;So I am not arguing that more AI faster is good. I am arguing that concentration is worse than distribution, that a tool this powerful sitting behind two or three state-blessed gates is more dangerous than the same tool spread across ten thousand machines nobody can switch off at once. Open competition is not the optimistic choice here. It is the least bad one, and it is the only one that does not end with a handful of companies and a federal agency deciding what the rest of us are allowed to run.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-it-changes-for-me-which-is-nothing-which-is-the-point&quot;&gt;What it changes for me, which is nothing, which is the point&lt;&#x2F;h2&gt;
&lt;p&gt;I run a mixed pipeline. Opus plans and builds in some stages, a competitor’s model reviews in others, and open-weight models like Kimi 2.6 already do real work in the cheap seats. I wrote a few days ago, in &lt;a href=&quot;&#x2F;blog&#x2F;june-15-is-mostly-semantics-for-solo-operators&#x2F;&quot;&gt;the June 15 piece&lt;&#x2F;a&gt;, about keeping that pipeline model agnostic so no single vendor owns my workflow. The Fable 5 takedown is the same lesson with a louder amp.&lt;&#x2F;p&gt;
&lt;p&gt;A model on a subscription can be metered. A model on someone else’s servers can be switched off by a letter you never see. A model running on the 3090 humming next to my desk cannot be metered, cannot be revoked, and cannot be made to disappear at 5:21pm because somebody in Washington had a bad week. That is not a small difference. That is the whole game.&lt;&#x2F;p&gt;
&lt;p&gt;The economics already pointed this way and the politics just put an exclamation point on it. The subsidies that make frontier API calls feel cheap are not forever, and when they end, the open-weight models trailing maybe ten percent behind the frontier are going to be enough for ninety-nine percent of real work, at something like eighty to ninety-nine percent lower cost, and eventually at the cost of nothing but the electricity if your hardware is paid off. I was going to make that transition for the money. Now I have a second reason, which is that the open road is the only one without a gate on it.&lt;&#x2F;p&gt;
&lt;p&gt;So my answer to all of this is boring on purpose. I do not switch my whole stack to whoever is currently in favor. I stay agnostic, I keep the open models in the rotation and getting better, and I treat every frontier model as a tool I rent for the jobs where it earns its keep, never as the foundation I build on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-pattern-with-the-labels-off&quot;&gt;The pattern, with the labels off&lt;&#x2F;h2&gt;
&lt;p&gt;Step back and the past couple of weeks tell one story, not two.&lt;&#x2F;p&gt;
&lt;p&gt;On June 15, two days from now, the meter comes on for programmatic Claude usage, which is corporate lock-in wearing a pricing change. This week a government letter switched off two models worldwide, which is state lock-in wearing a security badge. Different hands, same disease, which is the steady pull toward a world where a few entities at the top of the supply chain decide who gets to run the good models and on what terms. One layer reaches for your wallet, the next reaches for the off switch.&lt;&#x2F;p&gt;
&lt;p&gt;That is why I keep saying the real subject is not the supply chain, it is the power structure. The supply chain is just where you can see the power structure when it moves.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-takeaway&quot;&gt;The takeaway&lt;&#x2F;h2&gt;
&lt;p&gt;Big daddy government showing up to protect you from a model that reviews code is the kind of help that has a way of leaving you with fewer options than you started with. Maybe this is honest national security and maybe it is a chess move in a contract feud, but the mechanism it normalizes is the same either way, a worldwide kill switch with the evidence sealed and the lobby money already lining up to aim it.&lt;&#x2F;p&gt;
&lt;p&gt;The labs at the top will be fine. Anthropic and OpenAI will make their deals, take their lumps, and stay inside the fence. The people who lose in the version of this story where it keeps going are the open-weight builders and the rest of us who would rather own our tools than rent them on good behavior.&lt;&#x2F;p&gt;
&lt;p&gt;So take the frontier models when they earn it, because plenty of them are genuinely excellent. Just do not build your life on a thing that can be switched off by a letter you will never get to read. Keep the open weights in the mix, keep them improving, and keep at least one model running on hardware nobody else gets a vote on. The dealer can change the price, and now we know the landlord can change the locks. The only model you actually own is the one already sitting on your floor.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Who Types &#x2F;skill: June 15, Claude Code Subscriptions, and Why It&#x27;s Mostly Semantics</title>
        <published>2026-06-09T00:00:00+00:00</published>
        <updated>2026-06-09T00:00:00+00:00</updated>
        
        <author>
          <name>Unknown</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://orchestrateai.dev/blog/june-15-is-mostly-semantics-for-solo-operators/"/>
        <id>https://orchestrateai.dev/blog/june-15-is-mostly-semantics-for-solo-operators/</id>
        
        <content type="html" xml:base="https://orchestrateai.dev/blog/june-15-is-mostly-semantics-for-solo-operators/">&lt;p&gt;Anthropic and the AI overlords are at it again, dangling one carrot and then reaching for the stick. For about a year your Claude Code subscription let you mainline Opus for a single flat monthly fee, running it all night and running it while you slept and running a small army of agents off a single seat, because the first hits are always cheap and the dealer knows it. On June 15 the meter comes on.&lt;&#x2F;p&gt;
&lt;p&gt;In the same breath that Anthropic shipped &lt;code&gt;&#x2F;goal&lt;&#x2F;code&gt;, an agent that grinds away on its own while you wander off to make a sandwich, they announced that programmatic workflows now have to get off the subscription, and those two things cannot both be true at once, can they? Somebody is lying to themselves here, and I am fairly sure it isn’t me.&lt;&#x2F;p&gt;
&lt;p&gt;On June 15, programmatic usage of Claude gets yanked out of your subscription and metered at full API rates, and on paper I am exactly the mark this is pointed at, because I run autonomous loops for a living (kind of). So here is the part the explainer posts are too PC to say out loud: this doesn’t stop me running loops, not even a little bit. What it changes is how recklessly I get to experiment, and it forces an honest look at which stages actually deserve Opus and which can run on Kimi 2.6 or whatever cheaper model does the job. Until I make the call, I will keep my loops on the subscription for as long as the platform lets me, and the day it stops letting me I will type &lt;code&gt;&#x2F;skill {folder}&lt;&#x2F;code&gt; and hit enter myself like a caveman if I need the model.&lt;&#x2F;p&gt;
&lt;p&gt;None of that is defiance. It is just what happens when somebody draws the line in the wrong place.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-change-in-one-breath&quot;&gt;The change, in one breath&lt;&#x2F;h2&gt;
&lt;p&gt;Starting June 15 the thing they call programmatic usage leaves the subscription, and that bucket includes the Claude Agent SDK, &lt;code&gt;claude -p&lt;&#x2F;code&gt; in non-interactive mode, Claude Code GitHub Actions, and any third-party agent wearing the SDK’s badge. All of it moves to a separate monthly credit pool billed at full API rates, with Pro getting $20, Max 5x getting $100, and Max 20x getting $200, and there is no rollover, no pooling across your team, and a hard stop the second the credit runs dry unless you went into the settings and flipped on overflow billing like a responsible adult.&lt;&#x2F;p&gt;
&lt;p&gt;Interactive Claude Code, the web and mobile apps, and Cowork all stay on the subscription, completely untouched. The official line from Boris Cherny is that subscriptions “weren’t built for the usage patterns of these third-party tools,” which is fair enough, because a flat rate that one obsessive runs around the clock was never going to survive contact with reality.&lt;&#x2F;p&gt;
&lt;p&gt;If you want the full autopsy on credit tiers and overflow and what happens to your CI, there are already a dozen good explainers for that. I care about the observation that actually matters, which is where exactly the line falls and why that spot makes the whole thing mostly theater for someone like me.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-line-isn-t-where-they-say-it-is&quot;&gt;The line isn’t where they say it is&lt;&#x2F;h2&gt;
&lt;p&gt;The billing line is not drawn between human work and machine work, it is not drawn between short jobs and long ones, and it is not drawn around autonomy, which are the three places you would actually expect to find it.&lt;&#x2F;p&gt;
&lt;p&gt;It is drawn at the entry point.&lt;&#x2F;p&gt;
&lt;p&gt;The exact same job, with the same prompt and the same tokens and the same model and the same output, counts as subscription usage if it walks in through an interactive session, and metered API usage if it arrives through &lt;code&gt;claude -p&lt;&#x2F;code&gt; or the SDK. Which means the work never sets the price, the entry point does. That is a semantic line rather than an economic one, and once you have seen it you cannot unsee it.&lt;&#x2F;p&gt;
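&lt;p&gt;To make the entry-point rule concrete, here is a minimal Python sketch, not Anthropic’s billing logic. The credit amounts and the metered entry points come from the description above; the names &lt;code&gt;Account&lt;&#x2F;code&gt; and &lt;code&gt;bill_job&lt;&#x2F;code&gt;, the entry-point labels, and the per-job &lt;code&gt;api_cost&lt;&#x2F;code&gt; are hypothetical, and blocking a whole job when the credit would run out is a simplifying assumption.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass

# Monthly credit pools in dollars, as listed in the text above.
MONTHLY_CREDIT = {"pro": 20.0, "max_5x": 100.0, "max_20x": 200.0}

# Entry points described above as programmatic; the labels are illustrative.
METERED_ENTRY_POINTS = {"agent_sdk", "claude_p", "github_actions"}


@dataclass
class Account:
    tier: str
    overflow_billing: bool = False
    credit_used: float = 0.0
    overflow_charged: float = 0.0


def bill_job(account, entry_point, api_cost):
    """Return the bucket a job lands in; only the entry point picks the path."""
    if entry_point not in METERED_ENTRY_POINTS:
        # Interactive sessions stay on the flat subscription.
        return "subscription"
    remaining = MONTHLY_CREDIT[account.tier] - account.credit_used
    from_credit = min(api_cost, remaining)
    shortfall = api_cost - from_credit
    if shortfall == 0:
        account.credit_used += from_credit
        return "credit"
    if not account.overflow_billing:
        # Hard stop: no rollover and no pooling to fall back on.
        return "blocked"
    account.credit_used += from_credit
    account.overflow_charged += shortfall
    return "overflow"
```

&lt;p&gt;Nothing about the job itself appears in the decision except its cost: swap the entry point and the same call lands in a different bucket.&lt;&#x2F;p&gt;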
&lt;h2 id=&quot;anthropic-is-arguing-with-itself&quot;&gt;Anthropic is arguing with itself&lt;&#x2F;h2&gt;
&lt;p&gt;Look at what Anthropic happily ships on the subscription right now.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;&#x2F;goal&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;: you hand it a finish line and Claude takes turn after turn toward it on its own, with a small fast model checking after each turn whether you have arrived yet and quietly sending Claude back in if you haven’t. There is no human in the loop, which makes it autonomous work by any honest definition, and it bills to your subscription.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;&#x2F;loop&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;: you fire a prompt or a slash command on a repeating interval, or you let it set its own pace based on what it sees, and it runs recurring and unattended on the subscription.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Take those two together and the pattern is obvious, because both of them run on their own, both keep going long after you have walked away, and both do the exact kind of programmatic grinding the June 15 change is supposedly about, yet both bill to your subscription. The only thing that makes the SDK version programmatic and the &lt;code&gt;&#x2F;loop&lt;&#x2F;code&gt; version not is which entry point each one happened to use.&lt;&#x2F;p&gt;
&lt;p&gt;So if you state the policy honestly it is not “don’t run programmatic workflows on your subscription,” it is “don’t run programmatic workflows on your subscription through these specific entry points, while we keep building you other entry points that do exactly the same thing and bill the other way.” That is not a pricing principle so much as a tollbooth on one road with a wide open field sitting right next to it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;so-what-actually-changes-for-me&quot;&gt;So what actually changes for me?&lt;&#x2F;h2&gt;
&lt;p&gt;Mostly it comes down to who types &lt;code&gt;&#x2F;skill {folder}&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Today I can wire an agent to drop work into a queue and let a headless process grab it through the SDK, and after June 15 that road simply has a meter on it. The actual work, whether it is the implementation or the review or the refactor or whatever the skill happens to do, does not change in the slightest, it just needs a different way in.&lt;&#x2F;p&gt;
&lt;p&gt;The boring version of the workaround is that I leave a “master” session open on my always-on box and point it at a folder, and another agent drops work into that folder as a spec or a target directory or a task file while the master session eats each item as it lands and runs the skill against it. The session stays interactive, the work is identical to whatever the SDK would have done, and the only thing that moved is the trigger, which is now native to Claude Code.&lt;&#x2F;p&gt;
&lt;p&gt;Here is the part that bugs me most, which is that I don’t even need to automate the trigger, because the lowest-tech version of all of this is just me, looking at the queue and typing &lt;code&gt;&#x2F;skill {folder}&lt;&#x2F;code&gt; and pressing enter. What a waste of my life, nudging clankers all day. And yet a human typing a command into an interactive session is the most subscription-billed thing in the entire universe, so if the only difference between metered and free is whether a script or my own index finger pushed the same button, then the distinction has already quietly died. I did not find a loophole, they drew the line in the shape of one.&lt;&#x2F;p&gt;
&lt;p&gt;A session babysitting a folder is not a detached daemon, since it has to stay open and the machine has to stay on, which for me is nothing because I already run an always-on 3090 for &lt;a href=&quot;&#x2F;blog&#x2F;i-prepared-my-taxes-with-a-local-llm&#x2F;&quot;&gt;local models&lt;&#x2F;a&gt;, but for someone on a laptop they snap shut at night it is a real constraint. I am pointing all of this out to show how flimsy the boundary is, anchored as it is to features Anthropic themselves bill to the subscription. I am not here to teach you how to work around the terms of service, so if you are running at a scale where this actually costs real money, skip ahead to the part where I admit it is fair.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;my-pipeline-is-just-slash-commands-with-a-gate-between-them&quot;&gt;My pipeline is just slash commands with a gate between them&lt;&#x2F;h2&gt;
&lt;p&gt;Out of the abstract, and into my normal Tuesday evening.&lt;&#x2F;p&gt;
&lt;p&gt;A unit of work in my harness is a cycle, and a cycle falls out of a pipeline that nests like Russian dolls, where a brainstorm becomes an initiative, the initiative breaks into cycles, each cycle gets planned into waves, and each wave is a pile of tasks that never touch the same files. A recent implementation in my repo went from six brainstorm documents into a single initiative into three cycles, each of them a wave-partitioned plan of individual tasks, and none of that is exotic because it is really just the bookkeeping that stops an autonomous agent from tripping over its own feet.&lt;&#x2F;p&gt;
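&lt;p&gt;As a rough illustration of that nesting, here is a minimal Python sketch of the bookkeeping. The class names and the &lt;code&gt;file_conflicts&lt;&#x2F;code&gt; helper are hypothetical, not taken from the actual harness; the only rule borrowed from the text is that tasks in the same wave never touch the same files.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Task:
    name: str
    files: frozenset  # paths this task is allowed to modify


@dataclass
class Wave:
    tasks: list = field(default_factory=list)

    def file_conflicts(self):
        """Return pairs of task names in this wave that share a file."""
        return [
            (a.name, b.name)
            for a, b in combinations(self.tasks, 2)
            if not a.files.isdisjoint(b.files)
        ]


@dataclass
class Cycle:
    name: str
    waves: list = field(default_factory=list)


@dataclass
class Initiative:
    name: str
    brainstorms: list = field(default_factory=list)  # source documents
    cycles: list = field(default_factory=list)
```

&lt;p&gt;A planner built this way could refuse any wave where &lt;code&gt;file_conflicts()&lt;&#x2F;code&gt; comes back non-empty, which is one way to keep parallel tasks off each other’s files.&lt;&#x2F;p&gt;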
&lt;p&gt;Each cycle then runs through a fixed agent pipeline, and here is the part that matters for June 15, which is that the pipeline is multi-vendor on purpose. This is a real cycle config from my repo, outlining the flow I oversee in my Tauri cockpit.&lt;&#x2F;p&gt;
&lt;div class=&quot;cyc-pipeline&quot;&gt;
&lt;div class=&quot;cyc-header&quot;&gt;&lt;div class=&quot;cyc-header-left&quot;&gt;&lt;span class=&quot;cyc-logo&quot;&gt;&lt;svg fill=&quot;none&quot; stroke=&quot;currentColor&quot; viewBox=&quot;0 0 24 24&quot;&gt;&lt;path stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; stroke-width=&quot;2&quot; d=&quot;M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-title&quot;&gt;Cycle Pipeline&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;&lt;span class=&quot;cyc-status&quot;&gt;Automated&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-outline-wrap&quot;&gt;&lt;div class=&quot;cyc-outline&quot;&gt;&lt;div class=&quot;cyc-outline-label&quot;&gt;Plan&lt;&#x2F;div&gt;Scaffold the plan, gate it on review, execute the waves, review the diff, and loop the fixer until the score clears 8.5.&lt;&#x2F;div&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-nodes&quot;&gt;
&lt;div class=&quot;cyc-step is-done&quot;&gt;&lt;span class=&quot;cyc-icon&quot;&gt;&lt;svg fill=&quot;none&quot; stroke=&quot;currentColor&quot; viewBox=&quot;0 0 24 24&quot;&gt;&lt;path stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; stroke-width=&quot;2.5&quot; d=&quot;M5 13l4 4L19 7&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-text&quot;&gt;&lt;span class=&quot;cyc-step-title&quot;&gt;&#x2F;orchestrator&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-step-desc&quot;&gt;Opus 4.8 &amp;middot; scaffold + plan&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-vendor claude&quot;&gt;claude&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-conn is-done&quot;&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-step is-done&quot;&gt;&lt;span class=&quot;cyc-icon&quot;&gt;&lt;svg fill=&quot;none&quot; stroke=&quot;currentColor&quot; viewBox=&quot;0 0 24 24&quot;&gt;&lt;path stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; stroke-width=&quot;2.5&quot; d=&quot;M5 13l4 4L19 7&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-text&quot;&gt;&lt;span class=&quot;cyc-step-title&quot;&gt;plan-reviewer&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-step-desc&quot;&gt;GPT-5.5 &amp;middot; gate &amp;ge; 8.5&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-vendor codex&quot;&gt;codex&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-conn is-done&quot;&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-step is-done&quot;&gt;&lt;span class=&quot;cyc-icon&quot;&gt;&lt;svg fill=&quot;none&quot; stroke=&quot;currentColor&quot; viewBox=&quot;0 0 24 24&quot;&gt;&lt;path stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; stroke-width=&quot;2.5&quot; d=&quot;M5 13l4 4L19 7&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-text&quot;&gt;&lt;span class=&quot;cyc-step-title&quot;&gt;&#x2F;wave-executor&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-step-desc&quot;&gt;Opus 4.8 &amp;middot; execute waves&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-vendor claude&quot;&gt;claude&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-conn is-active&quot;&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-step is-running&quot;&gt;&lt;span class=&quot;cyc-icon&quot;&gt;&lt;svg fill=&quot;none&quot; stroke=&quot;currentColor&quot; viewBox=&quot;0 0 24 24&quot;&gt;&lt;path stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; stroke-width=&quot;2&quot; d=&quot;M13 10V3L4 14h7v7l9-11h-7z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-text&quot;&gt;&lt;span class=&quot;cyc-step-title&quot;&gt;code-reviewer&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-step-desc&quot;&gt;GPT-5.5 &amp;middot; reviewing diff...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-vendor codex&quot;&gt;codex&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-conn is-pending&quot;&gt;&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-step is-pending&quot;&gt;&lt;span class=&quot;cyc-icon&quot;&gt;&lt;svg fill=&quot;none&quot; stroke=&quot;currentColor&quot; viewBox=&quot;0 0 24 24&quot;&gt;&lt;path stroke-linecap=&quot;round&quot; stroke-linejoin=&quot;round&quot; stroke-width=&quot;2&quot; d=&quot;M12 8v4l3 3m6-3a9 9 0 11-18 0 9 9 0 0118 0z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-text&quot;&gt;&lt;span class=&quot;cyc-step-title&quot;&gt;&#x2F;fix-task&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-step-desc&quot;&gt;GPT-5.5 or Opus &amp;middot; fix loop&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-vendor either&quot;&gt;either&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;cyc-footer&quot;&gt;&lt;span&gt;Stage 4 &#x2F; 5&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-sep&quot;&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span&gt;Code Review&lt;&#x2F;span&gt;&lt;span class=&quot;cyc-spacer&quot;&gt;Gate &amp;ge; 8.5&lt;&#x2F;span&gt;&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
&lt;p&gt;The tags on the right are the whole point. The orange &lt;code&gt;claude&lt;&#x2F;code&gt; stages, &lt;code&gt;&#x2F;orchestrator&lt;&#x2F;code&gt; and &lt;code&gt;&#x2F;wave-executor&lt;&#x2F;code&gt;, run Opus 4.8, while the outlined &lt;code&gt;codex&lt;&#x2F;code&gt; stages, plan review and code review, run GPT-5.5, and the fix loop is tagged &lt;code&gt;either&lt;&#x2F;code&gt; because it can run on GPT-5.5 or Opus. Opus plans and builds while GPT-5.5 reviews and breaks ties, and the thing loops until the gate clears at 8.5. The box colors show nothing more than progress: solid is done, orange is running, and faded is waiting its turn. Every single stage is a slash command, which means I can sit here and type &lt;code&gt;&#x2F;orchestrator&lt;&#x2F;code&gt;, read the plan, type &lt;code&gt;&#x2F;wave-executor&lt;&#x2F;code&gt;, watch the waves run, and then type &lt;code&gt;&#x2F;fix-task {cycle}&lt;&#x2F;code&gt; to drive the review-and-fix loop, hitting send between each one. That is the whole pipeline, and every keystroke of it is interactive subscription-billed Claude sitting right next to &lt;code&gt;codex -p&lt;&#x2F;code&gt; GPT-5.5 and sharing the same desk.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-loop-is-just-me-removed-from-the-enter-key&quot;&gt;The loop is just me, removed from the enter key&lt;&#x2F;h2&gt;
&lt;p&gt;Now flip on &lt;code&gt;executionMode: automated&lt;&#x2F;code&gt; and the harness types those commands for me, so the orchestrator hands off to plan review, plan review clears the gate and kicks the wave executor, code review fails and spawns the fix loop, and the fix loop runs again until the score finally clears. The commands are the same, the models are the same, the tokens are the same, and the work is the same, and the only thing that changed is that I am no longer the schmuck pressing enter between stages.&lt;&#x2F;p&gt;
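&lt;p&gt;The gate logic is small enough to sketch. This is a minimal illustration and not my actual harness: &lt;code&gt;run_cycle&lt;&#x2F;code&gt;, &lt;code&gt;review&lt;&#x2F;code&gt;, &lt;code&gt;fix&lt;&#x2F;code&gt;, and &lt;code&gt;max_rounds&lt;&#x2F;code&gt; are hypothetical names, and the reviewer is stubbed with canned scores instead of a real model call. Only the 8.5 threshold comes from the pipeline above.&lt;&#x2F;p&gt;

```python
from typing import Callable

GATE = 8.5  # review score a cycle has to reach before it is done


def run_cycle(
    review: Callable[[int], float],
    fix: Callable[[int], None],
    max_rounds: int = 10,
) -> int:
    """Loop the fixer until the review score clears GATE.

    Returns how many fix rounds were needed. `review` and `fix` are
    hypothetical stand-ins for the code-review and fix-task stages.
    """
    for round_no in range(max_rounds + 1):
        if review(round_no) >= GATE:
            return round_no
        if round_no == max_rounds:
            break  # out of budget, do not run a fix nobody will review
        fix(round_no)
    raise RuntimeError("gate not cleared within max_rounds")


# Stubbed scores: the reviewer gets happier after each fix round.
scores = [6.0, 7.5, 8.7]
fixes_applied = []
rounds = run_cycle(lambda i: scores[i], fixes_applied.append, max_rounds=2)
print(rounds, fixes_applied)  # prints: 2 [0, 1]
```

&lt;p&gt;Whether a person or a state machine calls &lt;code&gt;run_cycle&lt;&#x2F;code&gt;, the loop body does not change, which is the whole argument of this section.&lt;&#x2F;p&gt;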
&lt;p&gt;That is the entire appeal of autonomy, because it is so much nicer than hand-driving a five-stage gated pipeline across a dozen cycles. Driving it by hand feels miserable by comparison, while letting it run in the background is the whole feature. But nicer is not the same as different, and what you are looking at is just the same pipeline with the human babysitter quietly removed.&lt;&#x2F;p&gt;
&lt;p&gt;Which is why the billing line feels so arbitrary from where I am standing, because the automated version and the by-hand version do identical work, and after June 15 one of them costs API rates while the other stays subscription based, with the only difference being whether my finger or a state machine pushed the button.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-it-s-a-losing-battle&quot;&gt;Why it’s a losing battle&lt;&#x2F;h2&gt;
&lt;p&gt;You cannot define programmatic by which entry point a job used when your own product line keeps knocking down the walls between those entry points. Or... I guess you can?&lt;&#x2F;p&gt;
&lt;p&gt;Every release Anthropic ships more interactive autonomy because they have to, since that is where the competitive fight actually is, and every time &lt;code&gt;&#x2F;goal&lt;&#x2F;code&gt; gets sharper or &lt;code&gt;&#x2F;loop&lt;&#x2F;code&gt; gets smarter they hand a person more ways to do unattended, programmatic-shaped work without ever touching the SDK. Every step they take to stay competitive widens the exact gap their billing change is trying to close, which means they are effectively bailing water into their own boat.&lt;&#x2F;p&gt;
&lt;p&gt;To actually enforce the line they would have to lobotomize the very features they are sprinting to build, telling &lt;code&gt;&#x2F;goal&lt;&#x2F;code&gt; it can’t loop too many times and telling &lt;code&gt;&#x2F;loop&lt;&#x2F;code&gt; it can’t run too long or on a schedule, and the moment they did that the interactive product would get worse and the whole “your agent works while you sleep” pitch would die in its crib. They are not going to do that, because the autonomous experience is the product now.&lt;&#x2F;p&gt;
&lt;p&gt;So the definition is unenforceable without shooting themselves in the foot, which is exactly why I am comfortable betting my whole workflow on the subscription side of this line for as long as it exists. They can certainly move the line again, and they probably will, but they cannot move it anywhere that is both coherent and compatible with the thing they are trying to sell you.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-the-change-is-real-and-fair&quot;&gt;Where the change is real, and fair&lt;&#x2F;h2&gt;
&lt;p&gt;I don’t want to oversell the theater angle, because for plenty of people June 15 is not theater at all.&lt;&#x2F;p&gt;
&lt;p&gt;If you are running a true headless fleet, with dozens of agents in CI and scheduled jobs hammering the API around the clock and a team building an actual product on top of the Agent SDK, then you are burning real inference at real scale and metering it is completely reasonable. A subscription was never going to bankroll a startup’s production agent infrastructure, so the credit pool and the full API rates and the overflow billing are simply what compute costs for that crowd, and paying it is fair, because nobody is owed a subsidy forever.&lt;&#x2F;p&gt;
&lt;p&gt;The “mostly theater” verdict is specifically about the solo operator on one box, meaning one person and one machine running loops that serve their own work at a volume that would have fit inside the subscription’s interactive limits anyway if they had just walked it through the front door. That is me, and it is a lot of you reading this, and for us the change does not take away the ability to run loops so much as it reshuffles how we trigger them.&lt;&#x2F;p&gt;
&lt;p&gt;The reason I care is not that I would go broke without the subsidy, it is what a meter does to experimentation, because open-ended, run-it-and-see, who-knows-what-this-does iteration is the first thing that gets expensive when every speculative loop bills at full API rates. The honest move is not to pay the meter or to quit, it is to take a cold hard look at the pipeline and decide what genuinely needs Opus and what can run on Kimi 2.6 or some other cheaper model that is almost as good, because the meter turns “throw Opus at everything” into an actual question worth answering. So I stay on the subscription and run loops while the platform allows it, I downgrade my fanciest triggers to a guy pressing enter the day it doesn’t, and I rationalize my model mix with purpose instead of in a panic.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-s-being-metered-tells-you-the-motive&quot;&gt;What’s being metered tells you the motive&lt;&#x2F;h2&gt;
&lt;p&gt;I will flag this as a gut call rather than gospel, but you can read intent from where a company draws its lines, and judging by Anthropic’s actions, this is where this particular line points.&lt;&#x2F;p&gt;
&lt;p&gt;Notice what June 15 actually meters, because it is not autonomous work, which ships on the subscription, and it is not long-running work, since &lt;code&gt;&#x2F;goal&lt;&#x2F;code&gt; runs as long as it wants on subs. What gets metered is Claude invoked programmatically, which is the exact mode where Opus is a part I call from my own orchestrator while it sits next to GPT-5.5 and OSS models like Kimi 2.6.&lt;&#x2F;p&gt;
&lt;p&gt;That is the tell, because in my pipeline Claude is not the harness but one node in a mixed loop I own, doing the stages it is best at while a competitor’s model does the rest, which means the thing they are making expensive is precisely the setup that treats Opus as swappable, one model among several in a pipeline that somebody else controls.&lt;&#x2F;p&gt;
&lt;p&gt;Meanwhile the subsidized road is &lt;code&gt;&#x2F;goal&lt;&#x2F;code&gt; and &lt;code&gt;&#x2F;loop&lt;&#x2F;code&gt;, which is Claude Code running as the whole loop, first-party and all-Claude and owned end to end by Anthropic, so the incentive was never “use less compute,” because if it were then their own looping system wouldn’t be included. The incentive is to get you to stop wiring Opus into your own multi-vendor Frankenstein and come live inside theirs instead.&lt;&#x2F;p&gt;
&lt;p&gt;The other half of it is even less subtle, and it is simply money, because subscriptions where the heaviest users run loops around the clock are a business that Anthropic, like OpenAI, is clearly not making money on, or barely, and metering the programmatic road is how that stops being a loss leader. Vendor lock and revenue are not two competing theories here, they are the same hand reaching for the same wallet.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-takeaway&quot;&gt;The takeaway&lt;&#x2F;h2&gt;
&lt;p&gt;The real story is not a price hike, it is that Anthropic hasn’t actually decided what a subscription means in the agent era, and they are billing by the entry point because the entry point is the last thing left that is easy to measure, given that autonomy and duration and the whole question of “is this programmatic” all turned to mush the moment they shipped &lt;code&gt;&#x2F;goal&lt;&#x2F;code&gt; and &lt;code&gt;&#x2F;loop&lt;&#x2F;code&gt; onto the subscription themselves.&lt;&#x2F;p&gt;
&lt;p&gt;So they drew a line that their own roadmap keeps tripping over, and for the funded teams at scale that line is a real and reasonable bill, while for the solo operator with an always-on box and a folder full of queued work it is mostly a reminder that the cheapest and most durable trigger ever invented is still a human typing a command and pressing enter.&lt;&#x2F;p&gt;
&lt;p&gt;I will run my loops on the subscription for as long as that remains a thing, rationalize my model mix with purpose, and type &lt;code&gt;&#x2F;skill {folder}&lt;&#x2F;code&gt; by hand the day automation gets metered. That is the same work for the same price, and if that breaks the spirit of the change then the change should have been about the work rather than about who types the command.&lt;&#x2F;p&gt;
&lt;p&gt;None of this is me sharpening a pitchfork, because Anthropic has a business to run and a frontier to fund, and charging real money for compute is entirely their right. The point is quieter and simpler: their incentives and yours are pulling in opposite directions. You want cheap tokens, model-agnostic workflows, and zero vendor lock, while they need pricier tokens, a single-vendor harness, and as much lock-in as the market will tolerate. That tension is the whole reason this landscape is so hard to manage from both ends, and it is why no single shiny feature should ever be mistaken for a simple gift.&lt;&#x2F;p&gt;
&lt;p&gt;So here is the thing worth remembering if you are building anything on top of the big labs. The dealer is letting you sample the product, and the first hits are cheap by design. Don’t get too addicted, because sooner or later you pay for the fix. Take the carrots, because plenty of them are genuinely excellent, but don’t get fooled, because the stick is never far behind. And it’s an expensive one.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>How Curiosity Turned Into Digital Infrastructure</title>
        <published>2026-05-09T00:00:00+00:00</published>
        <updated>2026-05-09T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://orchestrateai.dev/blog/how-curiosity-turned-into-digital-infrastructure/"/>
        <id>https://orchestrateai.dev/blog/how-curiosity-turned-into-digital-infrastructure/</id>
        
        <content type="html" xml:base="https://orchestrateai.dev/blog/how-curiosity-turned-into-digital-infrastructure/">&lt;p&gt;My career did not start with IT. It started with noticing things early, paying attention to opportunities before they had a clean category, and trying to figure out whether there was something real inside them.&lt;&#x2F;p&gt;
&lt;p&gt;That is less tidy than the LinkedIn version, but it is more true. I was usually close to the signal before the market had made it obvious, and I could normally get far enough to learn the shape of the opportunity, sell something, wire something together, or become useful before anyone handed me a title for it.&lt;&#x2F;p&gt;
&lt;p&gt;The part I understand better now is that curiosity is not the same thing as conviction. I was willing to research, test, sell, and move early, but I did not always believe enough to go all in when the signal was right in front of me.&lt;&#x2F;p&gt;
&lt;p&gt;That changed at Zengar.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;i-kept-finding-things-early&quot;&gt;I kept finding things early&lt;&#x2F;h2&gt;
&lt;p&gt;Around 2010, when I was 20, I was selling e-cigarettes before almost anyone I knew had seen one in person. It was not a grand thesis about nicotine markets or consumer hardware. It was more basic than that. I saw something new, got curious, figured out where to get it, and found people who wanted it.&lt;&#x2F;p&gt;
&lt;p&gt;It was the kind of thing I tended to notice early: useful enough that people would pay attention once they saw it, early enough that most people had no frame for it yet, and still far enough ahead of normal consumer awareness that there was room to move before the market caught up.&lt;&#x2F;p&gt;
&lt;p&gt;The same thing happened again in 2015, when I was deep in Bitcoin and Ethereum research. I got far enough that I ordered a BTC miner, then cancelled it because the sellers were taking too long and difficulty was rising while I waited. In the moment, cancelling felt rational. The window was moving, the economics were changing, and I did not want late hardware showing up after the opportunity had already compressed.&lt;&#x2F;p&gt;
&lt;p&gt;Looking back, that decision is a pretty clean example of the pattern. I saw the thing early enough, understood enough to act, and still did not quite believe enough to stay in the trade.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;resourceful-is-not-the-same-as-all-in&quot;&gt;Resourceful is not the same as all in&lt;&#x2F;h2&gt;
&lt;p&gt;I have usually been able to do pretty well by being resourceful and early to the party. Find the major opportunity before everyone and their taxi driver is talking about it, learn enough to be useful, move before the market becomes obvious, and there is usually some value there if you are willing to do the unglamorous part yourself.&lt;&#x2F;p&gt;
&lt;p&gt;That works, but only up to a point, because being early without conviction turns into a long list of almosts.&lt;&#x2F;p&gt;
&lt;p&gt;Almost caught the wave, almost held the position, almost built the thing before it was obvious, almost got the big break.&lt;&#x2F;p&gt;
&lt;p&gt;I used to frame that mostly as luck, and some of it probably was, because timing is real and nobody catches every wave cleanly. But the more honest version is that I often did not believe enough to go all in. I was willing to research, test, sell, wire things together, and be useful, but I was less willing to plant my feet and say, this is the arena.&lt;&#x2F;p&gt;
&lt;p&gt;Zengar changed that.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;zengar-gave-the-curiosity-somewhere-to-go&quot;&gt;Zengar gave the curiosity somewhere to go&lt;&#x2F;h2&gt;
&lt;p&gt;When I started working at Zengar, I was finally in an environment that saw my potential before I had fully named it myself. It was small enough that the gaps were visible, serious enough that filling the right one mattered, and flexible enough that I could keep moving toward the work with the most leverage.&lt;&#x2F;p&gt;
&lt;p&gt;I came in through technical support in 2021, working with Neuroptimal users, customer relationships, and Windows support. That title was real, but it was never the whole job for long. The useful work kept appearing in the gaps between departments.&lt;&#x2F;p&gt;
&lt;p&gt;Marketing needed something. Tech needed something. Support needed something. Operations needed something. I had already worked across enough different parts of a business that I was not worried about whether the work fit a clean job description, and before Zengar I had also worked in patient care as a hospital patient transporter. That might still be the coolest job I have had. The pay was terrible. The work was not.&lt;&#x2F;p&gt;
&lt;p&gt;That mix matters because it changed how I looked at systems. I was not seeing a website, a support queue, a clinic workflow, or a marketing funnel as separate worlds. I was seeing handoffs. Every handoff had cost, delay, confusion, or opportunity inside it.&lt;&#x2F;p&gt;
&lt;p&gt;That is where digital infrastructure starts for me.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-job-became-filling-the-highest-value-gap&quot;&gt;The job became filling the highest-value gap&lt;&#x2F;h2&gt;
&lt;p&gt;At Zengar, I started leaning in harder.&lt;&#x2F;p&gt;
&lt;p&gt;Not in the vague “let me help with the computer thing” way. It was more specific than that, and usually tied to some cost, bottleneck, or process that had become normal because nobody had time to pull it apart.&lt;&#x2F;p&gt;
&lt;p&gt;You are paying $200 an hour for a freelancer to do what? Let me figure that out.&lt;&#x2F;p&gt;
&lt;p&gt;This API is costing how much? Let’s look at how we can gate those flows better.&lt;&#x2F;p&gt;
&lt;p&gt;Why is this manual? Why does this person need to copy the same data into three places? Why is the tool allowed to run when nobody is ready to act on the output?&lt;&#x2F;p&gt;
&lt;p&gt;That is the work that moved me from technical support into senior web development, and eventually into Head of Digital Infrastructure and Automations. The title caught up after the pattern was already obvious, because the job had become less about owning one tool and more about finding the place where a small technical move could remove a lot of drag.&lt;&#x2F;p&gt;
&lt;p&gt;The work was not one clean lane. WooCommerce, WordPress, Cloudflare, automations, customer support systems, operational triage, marketing surfaces, API costs, internal tooling. None of those are especially impressive in isolation, but together they become the system behind the system.&lt;&#x2F;p&gt;
&lt;p&gt;That is what I mean when I talk about digital infrastructure.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-digital-infrastructure-means-at-my-desk&quot;&gt;What digital infrastructure means at my desk&lt;&#x2F;h2&gt;
&lt;p&gt;Digital infrastructure is not a finance term to me. It is not a rack of servers either.&lt;&#x2F;p&gt;
&lt;p&gt;It is the connective tissue between the parts of a business that already exist: the website that sells, the support team that absorbs pain, the marketing system that creates demand, the operations team that has to fulfill promises, and the customer experience that exposes whether the whole thing is actually working.&lt;&#x2F;p&gt;
&lt;p&gt;In practice, that meant looking at the handoffs instead of just the tools. Sometimes the right move was replacing expensive outside work with internal capability, sometimes it was gating an API so cost matched intent, sometimes it was leaving a human approval step in place because full automation would create quiet damage, and sometimes it was choosing WordPress or WooCommerce because the boring tool already sat in the middle of the business graph.&lt;&#x2F;p&gt;
&lt;p&gt;That is also why I do not separate AI from infrastructure. AI is useful when it lands inside the system with the right controls. It is expensive noise when it floats above the work as a demo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-part-that-finally-clicked&quot;&gt;The part that finally clicked&lt;&#x2F;h2&gt;
&lt;p&gt;The difference at Zengar was not that I suddenly became curious. I had always been curious.&lt;&#x2F;p&gt;
&lt;p&gt;The difference was that curiosity finally had a place to compound. I was trusted enough to look for gaps, and close enough to the business to know which gaps mattered. That combination changed everything.&lt;&#x2F;p&gt;
&lt;p&gt;I stopped waiting for the big break to arrive as one dramatic event. The break was the environment. The work was noticing the highest-value gap, taking it seriously, and building enough trust to own the next one.&lt;&#x2F;p&gt;
&lt;p&gt;That is the career arc I actually recognize.&lt;&#x2F;p&gt;
&lt;p&gt;Not “I started in IT and escaped it.”&lt;&#x2F;p&gt;
&lt;p&gt;More like: I kept following useful signals, almost went all in a few times, then finally landed somewhere that made going all in feel obvious.&lt;&#x2F;p&gt;
&lt;p&gt;That is where I am today.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>I Cut Tax Prep From 3 Hours To 30 Minutes With A Local LLM</title>
        <published>2026-05-05T00:00:00+00:00</published>
        <updated>2026-05-05T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://orchestrateai.dev/blog/i-prepared-my-taxes-with-a-local-llm/"/>
        <id>https://orchestrateai.dev/blog/i-prepared-my-taxes-with-a-local-llm/</id>
        
        <content type="html" xml:base="https://orchestrateai.dev/blog/i-prepared-my-taxes-with-a-local-llm/">&lt;p&gt;I did not use AI to do my taxes. I used a local LLM to do the boring half of tax prep: the part that turns a folder of 150 randomly named expense PDFs into a package my accountant can actually read. Most marketing around AI tax preparation wants to sell you a one-click filing experience, but that is not where local models earn their keep in 2026. They earn it on the unglamorous batch work that sits between “I have receipts” and “my accountant has clean numbers.”&lt;&#x2F;p&gt;
&lt;p&gt;This year, that batch took me 30 minutes. Last year, the same volume of invoices was something like 3 hours of click, rename, tag, and spreadsheet drudgery. The frontier models did not close that gap; a local Qwen did.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-was-actually-running&quot;&gt;What was actually running&lt;&#x2F;h2&gt;
&lt;p&gt;Opus 4.7 (high) and GPT-5.5 (high) were eight hours into an autonomous run on something completely unrelated: migrating my desktop orchestrator from a failed Rust + iced GUI attempt to a Tauri build I’d finally conceded to. They were busy, my Claude and Codex subscriptions were getting maxed out, and my judgment was already tied up supervising that run.&lt;&#x2F;p&gt;
&lt;p&gt;But my 3090 was sitting idle.&lt;&#x2F;p&gt;
&lt;p&gt;So I pointed OpenCode at a local Qwen checkpoint, &lt;code&gt;Qwen3.6-35B-A3B-UD-Q4_K_S.gguf&lt;&#x2F;code&gt;, the Unsloth dynamic Q4_K_S quant, and handed it the invoices folder. A3B means it’s a mixture-of-experts: 35B total parameters with only about 3B active per token, which is exactly why it ran fast on a single consumer GPU while my paid models were doing real work elsewhere. The same workflow runs fine on Qwen 3.5 35B if that’s what you already have on disk; the 3.x family has been the most usable local checkpoint for this kind of structured batch work for a while now.&lt;&#x2F;p&gt;
&lt;p&gt;Local models have gotten to the point where they can handle almost all tasks outside of complex coding, and they can definitely bring me from “I have my expense invoices” to “clean numbers and organized files for my accountant.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-actual-workflow&quot;&gt;The actual workflow&lt;&#x2F;h2&gt;
&lt;p&gt;Four steps, all handled through Python tool use:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Read.&lt;&#x2F;strong&gt; All 150 invoices were text-based PDFs, not scans, so no OCR was needed; straight text extraction was enough.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Rename.&lt;&#x2F;strong&gt; Convert filenames like &lt;code&gt;INV-7741-2.pdf&lt;&#x2F;code&gt; into &lt;code&gt;2026-03-14-acme-supplies.pdf&lt;&#x2F;code&gt; using the invoice date and vendor parsed from the document body.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Extract.&lt;&#x2F;strong&gt; Pull subtotal, sales tax, and total off each invoice into a structured row.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Organize.&lt;&#x2F;strong&gt; Drop renamed files into per-vendor folders and produce a single &lt;code&gt;xlsx&lt;&#x2F;code&gt; I could hand to my accountant.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;OpenCode drove the loop, the model wrote the small Python snippets it needed, and I watched context length and stepped in when it looked close to drifting.&lt;&#x2F;p&gt;
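&lt;p&gt;For a sense of scale, the rename and organize steps boil down to something like the sketch below. It is an illustration under assumptions and not the snippets the model actually wrote: &lt;code&gt;vendor_slug&lt;&#x2F;code&gt;, &lt;code&gt;target_path&lt;&#x2F;code&gt;, and the &lt;code&gt;sorted&lt;&#x2F;code&gt; root folder are hypothetical, and it takes the date and vendor as already parsed from the PDF text.&lt;&#x2F;p&gt;

```python
import re
from datetime import date
from pathlib import PurePosixPath


def vendor_slug(vendor: str) -> str:
    """Lowercase the vendor name and collapse everything else to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", vendor.lower()).strip("-")


def target_path(root: str, invoice_date: date, vendor: str) -> PurePosixPath:
    """Per-vendor folder plus a date-first filename that sorts by date."""
    slug = vendor_slug(vendor)
    return PurePosixPath(root) / slug / f"{invoice_date.isoformat()}-{slug}.pdf"


# Hypothetical parsed values for the INV-7741-2.pdf example above.
print(target_path("sorted", date(2026, 3, 14), "ACME Supplies"))
# prints: sorted/acme-supplies/2026-03-14-acme-supplies.pdf
```

&lt;p&gt;Putting the ISO date first means a plain alphabetical sort inside each vendor folder is also a chronological one.&lt;&#x2F;p&gt;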
&lt;h2 id=&quot;why-local-for-this-specifically&quot;&gt;Why local for this specifically&lt;&#x2F;h2&gt;
&lt;p&gt;Privacy was the obvious reason. Every invoice in that folder had a vendor name, an amount, a payment method, and a handful had bank-routing detail printed on payment confirmations. That is not data I want passing through a cloud API’s retention window just to save thirty seconds of orchestration. When people ask “can ChatGPT help with taxes,” the honest answer is “yes, and you’ll be uploading every line of your business spending to a third party in the process.” For accountant-package prep, that trade is a clear no for me.&lt;&#x2F;p&gt;
&lt;p&gt;A second, smaller reason was cost: I did not want to spend frontier credits on text extraction grunt work while two paid models were already eight hours into an autonomous job. Local Qwen on a 3090 is functionally free for this shape of task once the GPU is paid for. Privacy is the reason I’d recommend other operators do this; cost was the nudge that pushed me to do it that night.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-went-wrong&quot;&gt;What went wrong&lt;&#x2F;h2&gt;
&lt;p&gt;Honestly, very little went wrong. Nothing broke structurally. The only thing I had to actively manage was context, because you cannot shove 150 PDFs of varying length into a single window and hope. I broke the batch into chunks, summarized state between them, and made sure the model wasn’t carrying noise from one invoice into the next. That is operator hygiene, not model failure.&lt;&#x2F;p&gt;
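&lt;p&gt;The chunking itself is nothing exotic. A minimal sketch, assuming a batch size of 20, which is a number picked for illustration and not the one I used; &lt;code&gt;chunked&lt;&#x2F;code&gt; and the placeholder filenames are hypothetical, and the state summary between chunks is left out.&lt;&#x2F;p&gt;

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder names standing in for the 150 invoices.
invoices = [f"invoice-{n:03d}.pdf" for n in range(150)]
batches = list(chunked(invoices, 20))
print(len(batches), len(batches[-1]))  # prints: 8 10
```

&lt;p&gt;Each batch gets a fresh window, so a long invoice in one batch cannot crowd out the next.&lt;&#x2F;p&gt;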
&lt;p&gt;I still spot-checked every extracted row against the source PDF before exporting the xlsx, not because I caught the model lying, because I didn’t on this run, but because the cost of a wrong total reaching my accountant is asymmetric. Two minutes of review against a real risk of garbage-in-garbage-out is an obvious trade.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-would-never-trust-it-with&quot;&gt;What I would never trust it with&lt;&#x2F;h2&gt;
&lt;p&gt;I would never let any model, local or frontier, make a filing decision, classify a deductible, choose between business and personal allocation, or write anything that sounds like tax advice. That is what I pay an accountant for. The pattern that works for me is strictly upstream: clean inputs, organized files, summed totals. After that, a human with credentials takes over, and the model never sees the return.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-this-goes-next-year&quot;&gt;Where this goes next year&lt;&#x2F;h2&gt;
&lt;p&gt;The slow human step right now is the part before the model starts: logging into vendor portals, pulling email PDFs, sorting receipts. Next year, I expect to push the local Qwen further. By then, I expect to be comfortable enough with browser-use agents to pull invoices straight from portals and email, so the model can do the fetching as well as the structuring. That would compress another few hours out of the workflow before any extraction starts.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-takeaway&quot;&gt;The takeaway&lt;&#x2F;h2&gt;
&lt;p&gt;A local Qwen 35B did not do my taxes. It cleaned the runway so my accountant could file from a finished package instead of a folder of garbage filenames. That is a smaller claim than most AI tax prep headlines make, and a more honest one. The frontier models were doing the eight-hour autonomous work elsewhere; the local model handled the 30-minute batch on the side. Both pulled weight, and neither one was allowed near my return.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Best LLM For Coding As Of May 2026: Opus 4.7 And GPT-5.5, The Copilot And The Cracked Dev</title>
        <published>2026-05-05T00:00:00+00:00</published>
        <updated>2026-05-05T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://orchestrateai.dev/blog/opus-and-gpt-5-the-copilot-and-the-cracked-dev/"/>
        <id>https://orchestrateai.dev/blog/opus-and-gpt-5-the-copilot-and-the-cracked-dev/</id>
        
        <content type="html" xml:base="https://orchestrateai.dev/blog/opus-and-gpt-5-the-copilot-and-the-cracked-dev/">&lt;p&gt;For months I brute-forced complex problems with a single agent. Lovable experiments, an experimental RAG pipeline in Claude Code, attempts to chain dynamic workflow agents with API access. I kept telling myself the models weren’t good enough yet. The truth was simpler. I hadn’t built a system that allowed them to produce production-grade work.&lt;&#x2F;p&gt;
&lt;p&gt;The system that finally worked is not complicated. There is a loop with seven stages: brainstorm, initiative, plan, plan review, execution, code review, fixer. I run more than one frontier model inside that chain so they keep each other honest. Once I started working that way, the difference showed up fast.&lt;&#x2F;p&gt;
&lt;p&gt;That loop now runs many of my daily tasks at the company I work for, organizes and documents my work cycles, and handles every side project I touch. It is not just a coding tool. It runs my newsletters, video generation, SOPs, and documentation. Most of my work can be solved programmatically, and the system works for just about all of it. From inside it, “which LLM is best for coding” is the wrong question. The real question is which models fit where in a structured agent-loop hierarchy. That is where Opus 4.7 and GPT-5.5 Codex split into clearly different jobs.&lt;&#x2F;p&gt;
&lt;p&gt;I use Opus as the architect. It is the model I want on bigger-picture work, brainstorming, and high-level planning. I use GPT-5.5 Codex as the cracked dev who lives in his mom’s basement and gets shit done. Both are extremely capable. The differences start to show the moment you throw a complex initiative at them. If “best llm for coding” means “wins a one-prompt benchmark,” the question is uninteresting. If it means “still pulling weight on cycle four of a hard problem,” the answer depends on which job you are asking about. The rest of this post is how I split those jobs.&lt;&#x2F;p&gt;
&lt;p&gt;This split is fresh. The jump from Opus 4.5 and Codex 5.3 to Opus 4.7 and GPT-5.5 Codex was a real gear change, not a minor version bump. The roles I describe below would not have been this clean six months ago.&lt;&#x2F;p&gt;
&lt;p&gt;The real fork is not Claude Code vs Codex or Claude vs GPT. It is which seat each one is sitting in when the work gets hard. A single-model loop drifts. Two models with clear ownership at each stage do not.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-loop&quot;&gt;The Loop&lt;&#x2F;h2&gt;
&lt;p&gt;Each stage has one owner. The first three are Opus. The last four are Codex. The handoff happens at the plan boundary, where the architect hands the spec to the builder and gets out of the way.&lt;&#x2F;p&gt;
&lt;pre class=&quot;ascii-diagram&quot;&gt;&lt;code&gt;
+--------------------------------------+
| 1. BRAINSTORM            Opus 4.7    |
| shape the problem                    |
+--------------------------------------+
                  |
                  v
+--------------------------------------+
| 2. INITIATIVE            Opus 4.7    |
| frame scope and goals                |
+--------------------------------------+
                  |
                  v
+--------------------------------------+
| 3. PLAN                  Opus 4.7    |
| contracts, waves, ACs                |
+--------------------------------------+
                  |
                  v
+--------------------------------------+
| 4. PLAN REVIEW       GPT-5.5 Codex   |
| stress test the plan                 |
+--------------------------------------+
                  |
                  v
+--------------------------------------+
| 5. EXECUTION         GPT-5.5 Codex   |
| implement + tests                    |
+--------------------------------------+
                  |
                  v
+--------------------------------------+
| 6. CODE REVIEW       GPT-5.5 Codex   |
| find regressions                     |
+--------------------------------------+
                  |
                  v
+--------------------------------------+
| 7. FIXER             GPT-5.5 Codex   |
| repair, rerun checks                 |
+--------------------------------------+
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
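&lt;p&gt;The diagram above can also be written down as data, which makes the one-owner-per-stage rule easy to check. This is only a minimal sketch, not the tooling I actually run: &lt;code&gt;Stage&lt;&#x2F;code&gt;, &lt;code&gt;LOOP&lt;&#x2F;code&gt;, &lt;code&gt;owner_of&lt;&#x2F;code&gt;, and &lt;code&gt;handoffs&lt;&#x2F;code&gt; are hypothetical names made up for illustration, and nothing here calls a real model API.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the loop with exactly one owning model (illustrative only)."""
    order: int
    name: str
    owner: str
    job: str


# The seven stages and owners, copied from the diagram above.
LOOP = [
    Stage(1, "brainstorm", "Opus 4.7", "shape the problem"),
    Stage(2, "initiative", "Opus 4.7", "frame scope and goals"),
    Stage(3, "plan", "Opus 4.7", "contracts, waves, ACs"),
    Stage(4, "plan review", "GPT-5.5 Codex", "stress test the plan"),
    Stage(5, "execution", "GPT-5.5 Codex", "implement + tests"),
    Stage(6, "code review", "GPT-5.5 Codex", "find regressions"),
    Stage(7, "fixer", "GPT-5.5 Codex", "repair, rerun checks"),
]


def owner_of(stage_name):
    """Return the single model that owns the named stage."""
    for stage in LOOP:
        if stage.name == stage_name:
            return stage.owner
    raise KeyError(stage_name)


def handoffs(stages):
    """Return (from_stage, to_stage) name pairs where ownership changes."""
    return [
        (current.name, following.name)
        for current, following in zip(stages, stages[1:])
        if current.owner != following.owner
    ]


if __name__ == "__main__":
    for stage in LOOP:
        print(f"{stage.order}. {stage.name:12} {stage.owner:14} {stage.job}")
    print("handoffs:", handoffs(LOOP))
```

&lt;p&gt;With the stages as listed, &lt;code&gt;handoffs(LOOP)&lt;&#x2F;code&gt; finds a single ownership change, between plan and plan review, which is the plan boundary described above.&lt;&#x2F;p&gt;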
&lt;h2 id=&quot;where-opus-wins&quot;&gt;Where Opus wins&lt;&#x2F;h2&gt;
&lt;p&gt;Opus is the model I want when the work is shape-finding. Brainstorm sessions where I am not yet sure what the right system looks like. Initiatives where the question is “what are we building, and what are we explicitly choosing not to build.” ADR sessions where two reasonable options exist and someone has to commit to one.&lt;&#x2F;p&gt;
&lt;p&gt;I have watched Codex botch design jobs that Opus then walked through cleanly. Codex is not dumb. Codex just wants to start typing. Opus is comfortable sitting with the question longer, and that is the trait that matters at the front of the loop.&lt;&#x2F;p&gt;
&lt;p&gt;The same instinct shows up on exploratory ADRs. Opus pushes back. It asks the second question. It writes the version of the doc that survives someone else reading it next quarter. That is the kind of work I do not want to lose to “ship faster.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-codex-wins&quot;&gt;Where Codex wins&lt;&#x2F;h2&gt;
&lt;p&gt;Codex wins when the work needs miles. Implementation, tests, verification, fix loops, all of it. Hand it a clear contract and it goes from spec to merged change without complaint.&lt;&#x2F;p&gt;
&lt;p&gt;The place I notice it most is debugging. When I am stuck on a complex race condition, a weird state bug, or a regression that hides on the third reproduction, Codex is the one who finds it. It will read more code than Opus, run more probes than Opus, and re-attempt more times before throwing in the towel.&lt;&#x2F;p&gt;
&lt;p&gt;It also does not sulk. If fix one fails, it tries fix two with the same energy. That sounds small. In a long debugging session it is the difference between landing the fix and walking away tired.&lt;&#x2F;p&gt;
&lt;p&gt;If “which llm is best for coding” means raw shipping velocity, Codex is the obvious answer. It turns minutes into merged code, and it does it cycle after cycle.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-each-one-fails&quot;&gt;Where each one fails&lt;&#x2F;h2&gt;
&lt;p&gt;Opus is strong on almost every task. Its weak spots are big codebases, where it sometimes loses the thread, and solutions that need a refinement pass afterward. The architect impulse is what makes Opus strong at the front of the loop and clumsy at the back. It can handle a five-line change just fine, but it needs to be contained, because it will sometimes take the initiative to rework code you did not ask it to touch.&lt;&#x2F;p&gt;
&lt;p&gt;Codex is also great. It is just less of a free thinker. Big-picture work, exploratory design, ambiguous prompts: that is not its lane. It is the ultimate workhorse for well-defined projects. Hand it a clear task inside a clean structure and it rarely misses the mark and rarely needs a second pass.&lt;&#x2F;p&gt;
&lt;p&gt;This is why I do not believe in single-model evangelism. Both models are excellent. Both are wrong in predictable ways, and the loop exists to catch each one with the other.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-verdict&quot;&gt;The verdict&lt;&#x2F;h2&gt;
&lt;p&gt;If you make me answer “best llm for coding” or “best ai model for coding” in one line, here it is. Use GPT-5.5 Codex when you need raw implementation velocity. Use Opus 4.7 when you need bigger-picture thinking and real design work. Stop pretending those are the same job.&lt;&#x2F;p&gt;
&lt;p&gt;If you cannot support role splitting yet, start where your pain is loudest.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Pain is design quality, ambiguous scope, or unclear direction: start with Opus.&lt;&#x2F;li&gt;
&lt;li&gt;Pain is throughput, debugging, or fix-loop drag: start with Codex.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Then graduate to a two-model loop. That is where the real lift shows up, because each model gets to do the work it is best at and the other one catches its blind spots.&lt;&#x2F;p&gt;
&lt;p&gt;A note on the search side. People still type &lt;code&gt;opus 4.6 vs gpt 5.4&lt;&#x2F;code&gt; even though the conversation has moved to 4.7 and 5.5. That search behavior is not noise. It tells me what people actually want: not ideology, just role reliability. I still watch &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;openrouter.ai&#x2F;rankings&quot;&gt;OpenRouter rankings&lt;&#x2F;a&gt; for the same reason: market behavior matters.&lt;&#x2F;p&gt;
&lt;p&gt;So my committed take as of May 2026: Opus is the architect. GPT-5.5 Codex is the cracked dev who lives in his mom’s basement and gets shit done. Put them in a disciplined loop. Let each do the job it is best at. Judge outcomes by what ships without waking you up tomorrow.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
