Zeikar’s Lab

How My Agent-Team Revise Loop Earned a 300-Line Protocol

2026-05-22T00:00:00+00:00

This is about the autonomous revise loops in hyperclaude (code) — a Claude Code plugin built around a deliberate split: Claude builds, Codex critiques. Two of its skills, hyper-plan-loop and hyper-implement-loop, take a task and run plan → review → revise (or implement → review → fix) on their own, looping until Codex returns no blocking findings or a hard cap is hit. A single Claude-side teammate stays alive across rounds; Codex stays the reviewer.

If you sketch that on a whiteboard, it’s twenty lines:

spawn teammate
loop:
  reply = teammate.produce()
  result = codex.review(reply)
  if result.clean: break
  send(teammate, result.findings)
teardown

The actual SKILL.md plus shared reference is north of 400 lines. Almost none of that growth was planned; it came from bugs found by dogfooding, the kind that prompt-only discipline could not prevent. This post walks through five of them, roughly in the order they bit me.

The naive loop has more failure modes than lines

Two properties of Claude Code’s experimental agent-teams runtime matter for everything below.

A teammate kept idle between turns keeps its process and full context alive. That is the entire reason these loops exist — re-spawning a fresh planner each round would lose the context the planner just accumulated. Persistent teammate, bounded review loop.

The lead only acts on deliveries — a teammate reply, an idle notification, a bridge result. There is no poll/wait primitive. If the lead misclassifies a delivery, the loop hangs or self-confuses; it cannot just “check again.”

Both properties cut both ways. Persistence lets useful context survive across rounds; it also lets stale messages from prior rounds survive. The delivery-only model means every wake is load-bearing; it also means a misrouted wake has nowhere else to go. Every failure below sits at that intersection.

#1 — The plain-text reply is invisible

The first version of the planner ended its turn like this:

WROTE: .hyperclaude/plans/20260522-1430-foo.md

— printed as plain text in its own assistant turn, followed by going idle. The lead is supposed to read WROTE: … and proceed. Easy.

Wrong. When a teammate goes idle, the lead receives a payload-less wake: {type: "idle_notification", ...}. The teammate’s plain text is not in there. The notification confirms idle happened; it carries no body. Whatever the teammate printed to its own transcript stays in the teammate’s transcript — invisible to the lead’s mailbox.

The fix is structural: replies must travel by SendMessage. The planner’s spawn prompt now says, in many words, “first call SendMessage({to: 'team-lead', message: 'WROTE: '}), then idle.” Plain assistant text is allowed but ignored. The SendMessage call is the contract.

That alone wasn’t enough to make idle handling sane. Replies-via-SendMessage and idle-as-wake-signal are independent patterns; both need their own rules. Everything below #1 is the idle half.
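
A minimal sketch of that contract from the lead’s side, written in Python purely for illustration. Only the idle_notification type comes from the runtime as described above; the "message" delivery shape, the extract_reply helper, and the sample path are hypothetical.

```python
from typing import Optional


def extract_reply(delivery: dict) -> Optional[str]:
    """Return the reply text a delivery carries, or None if it carries none.

    Assumed shapes (hypothetical, for illustration only):
      {"type": "idle_notification", ...}                  -> no body at all
      {"type": "message", "from": ..., "message": "..."}  -> sent via SendMessage
    """
    if delivery.get("type") == "message":
        return delivery.get("message")
    # Idle wakes (and anything unrecognized) carry no reply body.
    return None


# The teammate printed its reply as plain text, then idled: the lead sees no body.
idle_wake = {"type": "idle_notification", "from": "planner"}
# The teammate called SendMessage first: the reply travels in the delivery itself.
sent = {"type": "message", "from": "planner", "message": "WROTE: plan.md"}

print(extract_reply(idle_wake))
print(extract_reply(sent))
```

The point of the sketch is that nothing the teammate prints can be recovered from the idle wake, so the accept logic only ever has a message delivery to work with.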

#2 — During a five-minute Codex review, anything can show up in the mailbox

plan-review runs in a fresh codex exec subprocess and can take five to ten minutes on a non-trivial plan. The lead spends those minutes blocked on a single Bash call.

During that window, the teammate is idle, but the messaging runtime is not. Two kinds of garbage kept arriving:

A re-emit of the previous round’s reply. The cause varied: sometimes a RESEND:-style nag pattern crept into the spawn prompt, sometimes a teammate woken from idle by an earlier corrective re-sent its prior reply. The effect was the same: when the lead came back from Codex, the mailbox held a WROTE: … that looked like the answer to the round it just finished, but was actually the answer from two rounds ago.

A stale idle_notification from the prior round. The teammate finished its reply and went idle. The idle wake was queued. The lead spent five minutes on Codex. When the lead resumed and then sent the next solicitation, the queued idle could still arrive AFTER the solicitation went out — landing as if it were the teammate’s response to the new solicitation, which it cannot possibly be.

Same bug family: a delivery from round N showing up in round N+1’s slot. Without something to tell them apart, the loop accepts garbage as success — or escalates a “missing reply” corrective because the only delivery it can see is stale.

The fix for the first flavor is a request-id counter. Every solicitation the lead sends carries a monotonically increasing integer; the teammate echoes it verbatim in WROTE: . The lead is the sole id source. An incoming reply with id < expected is, by definition, stale — ignore content, stay waiting for the real reply. An id greater than expected is impossible (lead-owned) and is a protocol violation — teardown and stop.

The counter is one integer. The pattern it fixes — “during a long blocking step, deliveries from prior rounds can race the current one” — outlives this specific loop.
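
The three-way comparison can be sketched as follows, assuming the reply’s id has already been parsed out of the message. The names RequestIds, Verdict, and classify_reply are illustrative, not taken from the actual protocol files.

```python
from enum import Enum


class Verdict(Enum):
    STALE = "ignore content, keep waiting"
    CANDIDATE = "run the accept rule"
    VIOLATION = "teardown and stop"


class RequestIds:
    """Lead-owned id source: the only place ids are ever minted."""

    def __init__(self) -> None:
        self.counter = 0

    def mint(self) -> int:
        # Monotonically increasing, so a smaller id always means an older round.
        self.counter += 1
        return self.counter


def classify_reply(reqid: int, expected: int) -> Verdict:
    """Compare the id echoed by the teammate with the id the lead is waiting on."""
    if reqid < expected:
        return Verdict.STALE
    if reqid == expected:
        return Verdict.CANDIDATE
    # The lead never minted this id, so the teammate cannot legitimately echo it.
    return Verdict.VIOLATION


ids = RequestIds()
round_1 = ids.mint()
round_2 = ids.mint()
late_duplicate = classify_reply(round_1, expected=round_2)
fresh_reply = classify_reply(round_2, expected=round_2)
impossible = classify_reply(round_2 + 1, expected=round_2)
```

Because only the lead mints ids, the comparison needs no clock and no knowledge of what the reply says.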

#3 — `idle.timestamp` vs. `solicit_sent_at`, the 1-round-lag race

The id counter handles WROTE: replies. It doesn’t handle idle notifications, which carry no id.

In dogfooding, this race surfaced repeatedly:

Round N: teammate replies, then idles. Reply is delivered first; idle is queued.
Lead accepts the reply, runs Codex (five-plus minutes).
Codex returns Major findings. Lead mints round N+1, sends.
Now the queued idle from step 1 arrives.
Lead is awaiting round N+1’s reply and sees an idle. Naive logic: “teammate idled without replying — corrective round-trip.”
Lead mints a fresh-id corrective, sends. Teammate, still working on round N+1, now has two outstanding solicitations queued in its mailbox.
Teammate replies to round N+1 first. From the lead’s view, the id is for the original round N+1, not the corrective — so it looks stale. Lead mints yet another corrective.
The loop never converges. Each round produces an idle that arrives one round late; each one kicks off another corrective.

The 1-round-lag race. It chewed an afternoon of dogfooding before the cause was clear.

The fix is a timestamp guard. Right before each SendMessage, the lead captures solicit_sent_at via Bash date -u +%FT%TZ. The idle_notification payload carries the teammate’s idle.timestamp — wall-clock at which the teammate actually went idle. If idle.timestamp < solicit_sent_at, the idle cannot possibly be a response to the current solicit. Ignore silently; stay waiting.

That guard sits in the protocol for a reason that generalizes: in any persistent-teammate loop with a long-running blocking step in between rounds, you need timestamps anchored at the lead’s send, not at the teammate’s reply. Otherwise prior-round deliveries will impersonate the current round’s.
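
A small sketch of the guard, under one assumption worth stating loudly: it treats idle.timestamp as using the same second-resolution UTC format that date -u +%FT%TZ prints. The real payload format may differ, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse the format printed by `date -u +%FT%TZ`, e.g. 2026-05-22T14:30:00Z."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def idle_predates_solicit(idle_timestamp: str, solicit_sent_at: str) -> bool:
    """True when the idle happened before the current solicit was sent.

    Such an idle cannot be a response to that solicit, so the lead ignores it.
    """
    return parse_utc(idle_timestamp) < parse_utc(solicit_sent_at)


solicit_sent_at = "2026-05-22T14:35:10Z"    # captured right before SendMessage
prior_round_idle = "2026-05-22T14:30:02Z"   # teammate idled before the solicit went out
current_round_idle = "2026-05-22T14:41:47Z"  # teammate idled after receiving it

print(idle_predates_solicit(prior_round_idle, solicit_sent_at))
print(idle_predates_solicit(current_round_idle, solicit_sent_at))
```

The comparison is strict, matching the rule above: only an idle that is strictly older than the send is discarded.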

#4 — `assistant-turn-start` is not a substitute for `date -u` right before send

The seductive shortcut here is to use the lead’s current turn-start time as solicit_sent_at. It’s “free” — already in the context — and it’s almost right.

It’s not right. A single lead turn can:

Start at wall-clock T.
Receive Codex review JSON over five minutes.
Mint a fresh id.
Send the next solicitation, at wall-clock T + 5min.

A queued idle from a prior round with idle.timestamp = T + 2min slots neatly between turn-start and actual-send. Comparing to T says “this idle came after my round started, so it must be a response to my new solicit.” It can’t be — the SendMessage for that solicit hadn’t happened yet.

The protocol requires date -u as the last tool call before SendMessage, every time. The spec wording — “the field-definition rule above is binding — assistant-turn start is NOT valid” — exists because someone (me) tried to optimize the Bash call away and reintroduced the race.

This is the smallest concrete fix in the protocol. It is also the one that took me longest to believe.
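
The T, T + 2min, T + 5min scenario above can be replayed with concrete values. This is an illustrative sketch: the specific date is made up, and is_stale is a hypothetical stand-in for the guard.

```python
from datetime import datetime, timedelta, timezone


def is_stale(idle_timestamp: datetime, anchor: datetime) -> bool:
    """An idle older than the anchor cannot answer the current solicit."""
    return idle_timestamp < anchor


turn_start = datetime(2026, 5, 22, 14, 30, tzinfo=timezone.utc)  # T
queued_idle = turn_start + timedelta(minutes=2)   # prior-round idle, T + 2min
actual_send = turn_start + timedelta(minutes=5)   # SendMessage happens at T + 5min

# Anchoring at turn-start: the prior-round idle looks like a fresh response.
stale_by_turn_start = is_stale(queued_idle, turn_start)
# Anchoring at the real send time: the same idle is correctly discarded.
stale_by_send_time = is_stale(queued_idle, actual_send)

print(stale_by_turn_start, stale_by_send_time)
```

Any idle whose timestamp falls inside the window between turn-start and the actual send is misclassified by the first anchor and caught by the second.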

#5 — `expected_request_id == null` collapse

By the time you have a request-id counter and a timestamp guard, you have two state variables that interact: expected_request_id (the id the lead is waiting on, or null) and awaiting_reply (boolean). The merge temptation is real — isn’t expected_request_id == null the same as awaiting_reply == false?

Same at the value level. Not the same at the classification level. There are two phases:

Phase 1 (awaiting_reply == false): the lead is not waiting on anything. A WROTE: arriving in this phase is, by definition, stale or duplicate — there is no current id to match. Compare to request_id_counter (the last id ever minted), not to expected_request_id (which is null). If you collapse the phases, you’re either comparing against null mid-classification or you’re feeding a stale duplicate into the same accept logic that handles fresh replies — and silently treating each one as either a violation or a success depending on which null-check you put first.

Phase 2 (awaiting_reply == true): the lead is specifically waiting on expected_request_id. Now reqid < expected_request_id is a stale leftover (ignore + stay waiting), reqid == expected_request_id is the candidate genuine reply (run the accept rule), reqid > expected_request_id is impossible (teardown).

The two phases route deliveries through different rules — and one of those rules is a silent ignore. Skip the phase split and you either keep escalating on stale duplicates (the loop never converges) or you accept stale ones as the answer (the loop converges on the wrong content).

The lesson generalizes beyond this loop: when you have a state machine with a busy state and an idle state, and inputs arrive in both, the routing logic for each state has to be authored separately. The “they look the same, let’s merge” instinct is exactly what makes long-running async protocols fragile.
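
A sketch of the two-phase split for WROTE: replies, using the state variable names from this section. The route_reply function and its return labels are hypothetical, and the Phase 1 handling of an id above request_id_counter (treated here as a violation, by analogy with Phase 2) is an assumption rather than something the text above specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeadState:
    request_id_counter: int = 0                 # last id ever minted
    expected_request_id: Optional[int] = None   # id being awaited, or None
    awaiting_reply: bool = False


def route_reply(state: LeadState, reqid: int) -> str:
    """Route one reply through the rules of the phase the lead is in."""
    if not state.awaiting_reply:
        # Phase 1: nothing is awaited, so compare against the last minted id.
        if reqid <= state.request_id_counter:
            return "ignore"       # stale or duplicate by definition
        return "teardown"         # assumption: an unminted id is a violation here too
    # Phase 2: a specific id is awaited.
    if state.expected_request_id is None:
        raise ValueError("awaiting_reply is set but expected_request_id is None")
    if reqid < state.expected_request_id:
        return "ignore"           # stale leftover, keep waiting
    if reqid == state.expected_request_id:
        return "accept_rule"      # candidate genuine reply
    return "teardown"             # impossible id


waiting = LeadState(request_id_counter=3, expected_request_id=3, awaiting_reply=True)
not_waiting = LeadState(request_id_counter=3)

print(route_reply(waiting, 2), route_reply(waiting, 3), route_reply(not_waiting, 3))
```

Note that the same id (3) is a candidate reply in one phase and a silent ignore in the other, which is exactly the distinction a merged null check loses.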

What this protocol actually is

Every section above buys back one specific dogfooded failure. The reviewer (Codex) is long-running. The teammate is persistent. The runtime delivers messages and idle wakes; the lead routes them. The protocol is the routing table, and every entry exists because routing it any other way produced a bug I watched happen.

That is also what stops the protocol from being elegant. It’s the precipitate of failures, not a coherent design from first principles. Two cross-loop sections — §E (the request-id state machine) and §B (unsolicited messages) — collect the parts shared by plan-loop and implement-loop. Each loop’s local failure-protocol.md then binds the loop-specific bits: reply-token shape (WROTE: vs. DONE: ), accept regex, post-acceptance validation.

If I were starting over: the request-id counter, SendMessage transport for replies, and the solicit_sent_at timestamp eat about 80% of the failure surface. The rest is fence-posting — phase splits, unsolicited-message backstops, teardown ordering — the kind of thing you only realize is necessary after the obvious version breaks at 2 a.m.

The general shape, lifted out of hyperclaude: persistent teammates plus a long-running reviewer create races that prompt-only discipline cannot fix. The lead has to own ids, own timestamps, and treat every delivery as a router input — not an answer to whatever it sent most recently. Once that mental model is in place, the rest is bookkeeping.

Why My Agent-Team Revise Loop Ended Up with a 300-Line Protocol

2026-05-22T00:00:00+00:00

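This is a story about the autonomous revise loop in hyperclaude (code). It is a Claude Code plugin built on a division of labor in which Claude builds and Codex critiques, and two of its skills, hyper-plan-loop and hyper-implement-loop, take a single task and run plan → review → revise (or implement → review → fix) on their own, until Codex stops raising blockers or a hard cap is reached. One Claude-side teammate stays alive between rounds, and Codex remains the reviewer.

On a whiteboard, it is a twenty-line sketch: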

spawn teammate
loop:
  reply = teammate.produce()
  result = codex.review(reply)
  if result.clean: break
  send(teammate, result.findings)
teardown

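The actual SKILL.md plus shared reference runs past 400 lines. Almost none of that was designed up front; it grew out of bugs found while dogfooding that prompts alone could not prevent. This post walks through five of them, roughly in the order they bit me.

The naive loop has more failure modes than lines

Two properties of Claude Code’s experimental agent-teams runtime underlie everything below:

A teammate left idle between rounds keeps its process and full context alive. That is the very reason these loops exist: spawning a fresh planner every round would throw away the context the planner has just built up. Persistent teammate, bounded review loop.

The lead reacts only to deliveries: a teammate message, an idle notification, a bridge result. There is no poll/wait primitive. If the lead misclassifies a delivery, the loop stalls or contradicts itself. There is no “check again” option.

Both properties are double-edged. Persistence preserves context across rounds, and it preserves stale messages from earlier rounds too. The delivery-only model makes every wake meaningful, but a misrouted wake has nowhere to go. Every failure mode below sits at that intersection.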

#1 — The plain-text reply is invisible

The first version of the planner ended its turn like this:

WROTE: .hyperclaude/plans/20260522-1430-foo.md

It printed that as plain text in its own assistant turn, then went idle. The lead just reads the WROTE: … and moves on. Easy.

Wrong. When a teammate goes idle, the lead receives a wake with no payload: {type: "idle_notification", ...}. The teammate’s plain text is not in it. It carries only the fact that idle happened, with no body. Whatever the teammate printed to its own transcript stays in the teammate’s transcript and never shows up in the lead’s mailbox.

The fix is structural. Replies must be sent via SendMessage. The planner’s spawn prompt now spells it out at length: “first call SendMessage({to: 'team-lead', message: 'WROTE: '}), then go idle.” Plain assistant text is allowed but ignored. The SendMessage call is the contract.

That alone does not make idle handling sane. “Replies go through SendMessage” and “idle is a wake signal” are independent patterns, and each needs its own rules. Everything after #1 is about the idle side.

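#2 — During a five-minute Codex review, anything can land in the mailbox

plan-review runs in a fresh codex exec subprocess and takes five to ten minutes even on an ordinary plan. For that whole time the lead is blocked on a single Bash call.

During that window the teammate is idle, but the messaging runtime is not. Two kinds of garbage came in:

A re-send of the previous round’s reply. The causes varied: sometimes a RESEND:-style nag pattern had slipped into the spawn prompt, and sometimes a teammate woken from idle by the preceding corrective sent its old reply again. The result is the same. The lead comes back from Codex to find a WROTE: … in its mailbox that looks like the reply to the round it just finished but is really the reply from two rounds ago.

A late idle_notification from the previous round. The teammate sent its reply and went idle. The idle wake was queued. The lead spent five minutes on Codex. The lead resumed and sent the next solicitation, and the queued idle arrived after the solicitation, as if it were the response to the new solicitation, which it cannot be.

Same bug family: a delivery from round N showing up in round N+1’s slot. With nothing to tell the two apart, the loop accepts garbage as success or, worse, escalates a “no reply” corrective because the only delivery it can see is stale.

The fix for the first kind is a request-id counter. Every solicitation the lead sends carries a monotonically increasing integer, and the teammate echoes that number verbatim in WROTE: . The lead alone issues ids. If an incoming reply’s id is smaller than the awaited id, it is stale by definition: ignore the content and keep waiting for the real reply. An id larger than expected is impossible because only the lead issues ids, so it is a protocol violation: teardown, then stop.

The counter is a single integer. The pattern it blocks, “during a long blocking step, deliveries from earlier rounds can race the current round,” holds well beyond this loop.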

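#3 — `idle.timestamp` vs. `solicit_sent_at`, the 1-round-lag race

The id counter handles WROTE: replies. It cannot handle idle notifications, because they carry no id.

This race kept showing up during dogfooding:

Round N: the teammate sends its reply and goes idle. The reply is delivered first; the idle is queued.
The lead accepts the reply and starts the Codex review (five-plus minutes).
Codex returns Major findings. The lead mints round N+1 and sends it.
Only now does the idle queued in step 1 arrive.
The lead is waiting for round N+1’s reply and sees an idle. Naive logic: “the teammate went idle without replying, so I should send a corrective.”
The lead mints a fresh-id corrective and sends it. The teammate is still working on round N+1, and now has two solicitations stacked in its mailbox.
The teammate replies to round N+1 first. To the lead, that is the original round N+1 id, not the corrective’s id, so it looks stale. The lead mints another corrective.
The loop does not converge. Every round, the idle arrives one round late, and each time a new corrective starts.

That is the 1-round-lag race. It cost an entire afternoon of dogfooding before I pinned down the cause.

The fix is a timestamp guard. Right before every SendMessage, the lead captures solicit_sent_at with Bash date -u +%FT%TZ. The idle_notification payload carries the teammate’s idle.timestamp, the wall-clock time at which the teammate actually went idle. If idle.timestamp < solicit_sent_at, this idle cannot be a response to the current solicitation. Ignore it silently and keep waiting.

The reason this guard is in the protocol generalizes: any loop with a persistent teammate and a long-running blocking step between rounds needs a timestamp anchored to the lead’s send, not to the teammate’s reply. Otherwise deliveries from the previous round impersonate the current one.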

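#4 — `assistant-turn-start` is no substitute for `date -u` right before send

The tempting shortcut here is to use the lead’s current turn-start time as solicit_sent_at. It is “free,” since it is already in context, and it is almost right.

Almost right is not right. A single lead turn can go like this:

Start at wall-clock T.
Receive the Codex review JSON over five minutes.
Mint a fresh id.
Send the next solicitation at wall-clock T + 5min.

If a queued idle from the previous round has idle.timestamp = T + 2min, it falls squarely between turn-start and actual-send. Compare it against T and you conclude “this idle came after my round started, so it is a response to the new solicitation.” It cannot be. The SendMessage for that solicitation has not even happened yet.

The protocol requires date -u to be the last tool call right before SendMessage. Every time. The spec wording, “the field-definition rule above is binding — assistant-turn start is NOT valid”, is there because someone (me) optimized the Bash call away and brought the race back.

It is the smallest concrete fix in the protocol. It is also the one I was slowest to believe in.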

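#5 — `expected_request_id == null` collapse

Once the request-id counter and the timestamp guard are in place, you have two interacting state variables: expected_request_id (the id the lead is waiting on, or null when it is not waiting) and awaiting_reply (boolean). It is tempting to merge them: isn’t expected_request_id == null just awaiting_reply == false?

At the value level they are the same. At the classification level they are not. There are two phases:

Phase 1 (awaiting_reply == false): the lead is not waiting on anything. A WROTE: that arrives in this phase is stale or a duplicate by definition, because there is no current id to match. It must be compared against request_id_counter (the last id minted so far), not against expected_request_id (currently null). Merge the phases and you end up either comparing against null mid-classification or throwing a stale duplicate into the same accept logic as a fresh reply, where it silently becomes a violation or a success depending on the order of your null checks.

Phase 2 (awaiting_reply == true): the lead is waiting on exactly expected_request_id. Now reqid < expected_request_id is a stale leftover (ignore and keep waiting), reqid == expected_request_id is the candidate genuine reply (run the accept rule), and reqid > expected_request_id is impossible (teardown).

The two phases route deliveries through different rules, and one of those rules is a silent ignore. Skip the phase split and you either keep escalating correctives on stale duplicates (the loop never converges) or accept a stale reply as the answer (the loop converges on the wrong content).

This lesson also generalizes beyond this loop: in a state machine that has a busy state and an idle state, with inputs arriving in both, the routing logic for the two states must be written separately. The “they look the same, let’s merge them” instinct is precisely what makes long-running async protocols fragile.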

What this protocol really is

Each of the five sections above buys back one dogfooded failure. The reviewer (Codex) is long-running. The teammate is persistent. The runtime delivers messages and idle wakes, and the lead routes them. The protocol is that routing table, and every entry is there because I actually saw the bug that routing it differently produces.

That is also why this protocol is not elegant. It is the sediment of failures, not a coherent design from first principles. Two cross-loop sections, §E (request-id state machine) and §B (unsolicited messages), collect what plan-loop and implement-loop share, and each loop’s local failure-protocol.md binds the loop-specific bits: reply-token shape (WROTE: vs. DONE: ), accept regex, post-acceptance validation.

If I were building it again: the request-id counter, SendMessage transport for replies, and the solicit_sent_at timestamp cover 80% of the failure surface. The rest is fence-posting (phase splits, unsolicited-message backstops, teardown ordering), the kind of thing you only realize you need after the obvious version breaks at 2 a.m.

Generalizing away from hyperclaude: a persistent teammate plus a long-running reviewer creates races that prompt-only discipline cannot prevent. The lead has to own the ids, own the timestamps, and treat every delivery as router input rather than as “the reply to what I just sent.” Once that mental model is in place, the rest is bookkeeping.

Three ways to generate Open Graph images, and the one I built

2026-05-09T00:00:00+00:00

The OG card you saw before clicking this link — the one Slack or Twitter or Facebook would have shown if you pasted this URL — wasn’t drawn by me. It was generated by dogimg, a small service I built because I tried three different ways to make Open Graph images for this site, and only one of them survived contact with actually writing posts.

This is a walk through those three approaches, in order, and why URL-driven generation won.

Stage 1: Hand-designing OG images in Figma

The first OG image for zeikar.dev was a 1200×630 PNG. I made it in Figma. I exported it. I dragged it into the repo. It looked fine.

It looked fine for one page. By the third post, I was looking at a future where every new post meant another Figma file, another export, another drag-in — and any time I changed the site’s brand color or favicon, every OG image already shipped was visually stale.

The honest problem: an OG image is a poster of metadata that already exists on the page. The <title>, the description, the theme color, the favicon. All of it is already there for browsers and crawlers to read. Hand-designing OG images means writing the same content twice: once for the page, once for the poster.

Stage 2: Param-driven generators (vercel/og-image and friends)

The next stop was @vercel/og-style services, where you call an endpoint with the content as query parameters:

https://og-generator.example/api/og?title=My+Post&description=...&theme=teal

vercel/og-image (https://github.com/vercel/og-image) is the canonical example. The image is generated dynamically, you get a templated layout, and there are no PNGs in the repo. A real improvement over Figma.

But the responsibility didn’t actually move. Every time I wrote a post, something still had to pack the post’s metadata into a query string. Either I did it by hand, or I wrote a build-time step that read each post’s front matter, encoded title and description, and emitted a URL with the correct params.

The page already knows its own title. The plugin reads it. The plugin re-encodes it. The service receives it and renders it. Three copies of the same string for one image.

The frame I landed on: URL-as-template. The URL is a template you fill in, and you do the filling.

Stage 3: A URL-driven generator (dogimg)

dogimg (/projects/dogimg/) is what fell out of asking the next question: why is the caller packing metadata at all?
The page already serves an HTML document with <title>, <meta name="description">, <meta property="og:*">, <meta name="theme-color">, and a favicon link. That’s the source of truth. Why not call that?

The API is one parameter:

https://dogimg.vercel.app/api/og?url={URL}

What it does, in three steps:

1. Fetch the HTML at {URL}.
2. Parse og:*, twitter:*, <title>, <meta name="theme-color">, and the favicon from the document.
3. Render a 1200×630 PNG with @vercel/og (https://vercel.com/docs/og-image-generation/), using the page’s theme color as a gradient accent and the favicon as the card’s icon.

The frame here is URL-as-truth. The caller doesn’t pack anything. The page already knows what it’s about, and dogimg asks the page directly. If the post’s title changes, the OG image changes, without redeploying the generator, regenerating PNGs, or even thinking about it.

In HTML it’s a single tag:

<meta property="og:image" content="https://dogimg.vercel.app/api/og?url=https://your-site.com/post" />

That’s the entire integration on the consumer side.
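
To make the parse step concrete, here is a rough Python sketch using only the standard library. It is not dogimg’s code (dogimg renders with @vercel/og); the class and function names, the fallback order between og:*, twitter:*, and <title>, and the sample HTML are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <title>, <meta> name/property pairs, and the favicon link."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "link" and "icon" in (attrs.get("rel") or "").split():
            self.meta["favicon"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_card_fields(html: str) -> dict:
    """Pick the fields a card needs; the fallback order is an assumption."""
    parser = MetaCollector()
    parser.feed(html)
    meta = parser.meta
    return {
        "title": meta.get("og:title") or meta.get("twitter:title") or parser.title.strip(),
        "description": meta.get("og:description") or meta.get("description"),
        "theme_color": meta.get("theme-color"),
        "favicon": meta.get("favicon"),
    }


SAMPLE = """<html><head>
<title>Fallback title</title>
<meta property="og:title" content="My Post" />
<meta name="description" content="A short description">
<meta name="theme-color" content="#008080">
<link rel="shortcut icon" href="/favicon.ico">
</head><body></body></html>"""

fields = extract_card_fields(SAMPLE)
print(fields)
```

Everything the card needs comes out of the document the page already serves, which is the whole argument for the single url parameter.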
The payoff: one Jekyll plugin, every page covered

zeikar.dev wires this up at build time in _plugins/og_image.rb (https://github.com/zeikar/zeikar.github.io/blob/main/_plugins/og_image.rb). It runs on post_read and, for every document or default-layout page that doesn’t already set image: in front matter, points it at dogimg:

encoded = CGI.escape(base + item.url)
item.data["image"] = "https://dogimg.vercel.app/api/og?url=#{encoded}"

Coverage falls out of site.documents (posts plus every collection; _projects/*.md is included for free) and site.pages filtered to layout: default. jekyll-seo-tag then emits og:image and twitter:image from page.image once each, no duplication.

Writing a post is writing a post. There is no OG step. The card you saw before clicking this link is the proof: same path as everything else on the site.

Three ways to make Open Graph images, and the one I built

2026-05-09T00:00:00+00:00

The OG card you saw before clicking this link (the preview that would have appeared if you had pasted this URL into Slack, Twitter, or Facebook) was not drawn by me. dogimg (https://dogimg.vercel.app) generated it. I tried making OG images for this site in three different ways, and only one of them survived my actually continuing to write posts, so I ended up building this small service.

This post goes through those three approaches in order and explains why URL-driven generation won.
Stage 1: Hand-making OG images in Figma

The first OG image for zeikar.dev was a 1200×630 PNG. I made it in Figma, exported it, and dragged it into the repo. It looked fine.

It was fine while there was one page. By the time I was writing the third post, I could see the future: one more Figma file, one more export, one more drag-in for every new post. And if the site’s brand color or favicon changed, every OG image already shipped would become visually stale.

The honest problem statement: an OG image is a poster that visualizes metadata already present on the page. The <title>, description, theme color, favicon. It is all already there for browsers and crawlers to read. Designing OG images by hand means writing the same content twice: once for the page, once for the poster.

Stage 2: Param-driven generators (vercel/og-image and the like)

The next stop was @vercel/og-style services, which you call by passing the content as query parameters:

https://og-generator.example/api/og?title=My+Post&description=...&theme=teal

vercel/og-image (https://github.com/vercel/og-image) is the representative example. The image is generated dynamically, a templated layout is applied, and you do not have to keep PNGs in the repo. A clear step forward from the Figma stage.

But the responsibility did not really move. Every time I write a post, something has to pack the post’s metadata into a query string, whether I do it by hand or a build step reads the front matter, encodes the title and description, and emits a URL with the right params baked in.

The page already knows its own title. The plugin reads it. The plugin re-encodes it. The service receives it and renders it. That is three copies of the same string for one image.

The frame that settled in my head was URL-as-template. The URL is a template to be filled in, and the filling is the caller’s job.

Stage 3: A URL-driven generator (dogimg)

dogimg (/projects/dogimg/) fell out of the next question: why is the caller packing metadata at all?
The page already serves an HTML document with <title>, <meta name="description">, <meta property="og:*">, <meta name="theme-color">, and a favicon link all in place. That is the source of truth. Why not ask the page directly?

The API is a single parameter:

https://dogimg.vercel.app/api/og?url={URL}

Internally it works in three steps:

1. Fetch the HTML at {URL}.
2. Parse og:*, twitter:*, <title>, <meta name="theme-color">, and the favicon from the document.
3. Render a 1200×630 PNG with @vercel/og (https://vercel.com/docs/og-image-generation/), using the page’s theme color as the gradient accent and the favicon as the card icon.

The frame at this stage is URL-as-truth. The caller packs nothing. The page already knows about itself, and dogimg asks the page directly. If the post’s title changes, the OG image changes too, with no generator redeploy, no PNG regeneration, and not even a second thought.

In HTML it is one tag:

<meta property="og:image" content="https://dogimg.vercel.app/api/og?url=https://your-site.com/post" />

That is the whole integration from the consumer’s side.

Result: one Jekyll plugin covers every page

On zeikar.dev, _plugins/og_image.rb (https://github.com/zeikar/zeikar.github.io/blob/main/_plugins/og_image.rb) ties this together at build time.
In the post_read hook, for every document and default-layout page that does not set image: explicitly in front matter, it sets page.image to the dogimg URL:

encoded = CGI.escape(base + item.url)
item.data["image"] = "https://dogimg.vercel.app/api/og?url=#{encoded}"

Coverage falls out naturally from site.documents (posts and every collection; _projects/*.md is included automatically) and from site.pages filtered to layout: default. jekyll-seo-tag then takes page.image and emits og:image and twitter:image exactly once each. No duplication.

Writing a post is just writing a post. There is no OG step. The card you saw before clicking this link is the proof: it was produced by the same path as every other page on the site.

Why Google Search Console can’t fetch your github.io sitemap

2026-05-07T00:00:00+00:00

This is a story about an XML file that wasn’t broken. Specifically, why Google Search Console kept saying Couldn't fetch on my sitemap.xml, why every diagnostic I ran came back green, and why the answer turned out to have nothing to do with the XML.

The setup

zeikar.github.io was a Jekyll site on GitHub Pages.
The root sitemap.xml was a sitemap index, with three sub-sitemaps under the same hostname:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://zeikar.github.io/sitemap-main.xml</loc></sitemap>
  <sitemap><loc>https://zeikar.github.io/backend-interview-guide/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://zeikar.github.io/charivo/sitemap.xml</loc></sitemap>
</sitemapindex>

The main one covers blog posts and project pages. The other two come from sub-projects published as their own GitHub Pages sites under the same hostname.

Submit https://zeikar.github.io/sitemap.xml to Google Search Console; GSC reads the index, fetches each sub-sitemap, and queues the URLs for indexing. That was the plan.

What GSC actually did was sit at Couldn't fetch for days. Resubmitting didn’t help. Waiting didn’t help.

Five green checks

XML validation

First suspect: the served XML itself. xmllint against what GitHub Pages actually returns:

$ curl -sS https://zeikar.github.io/sitemap.xml | xmllint --noout -; echo $?
0

And it validates against the official sitemaps.org schema:

$ curl -sS https://zeikar.github.io/sitemap.xml | xmllint --schema siteindex.xsd --noout -
- validates

All three sub-sitemaps validate too, against the corresponding sitemap.xsd.
Green. <h3 id="http--content-type">HTTP & Content-Type</h3> Maybe GitHub Pages serves it with the wrong content type. <code class="language-plaintext highlighter-rouge">curl -I</code>: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -sI https://zeikar.github.io/sitemap.xml | head -3 HTTP/2 200 server: GitHub.com content-type: application/xml </code></pre></div></div> <code class="language-plaintext highlighter-rouge">200 OK</code>, <code class="language-plaintext highlighter-rouge">application/xml</code>. The bytes start with <code class="language-plaintext highlighter-rouge"><?xml</code> — no BOM, UTF-8 clean. Green. <h3 id="googlebot-user-agent">Googlebot User-Agent</h3> Maybe Google’s bot sees something different from my browser. Diffing the default-UA fetch against a Googlebot-UA fetch: <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ diff <(curl -sS https://zeikar.github.io/sitemap.xml) \ <(curl -sSA "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \ https://zeikar.github.io/sitemap.xml) </code></pre></div></div> (empty diff). Identical bytes. Green. <h3 id="sitemap-index-scope-rules">Sitemap index scope rules</h3> The <a href="https://developers.google.com/search/docs/crawling-indexing/sitemaps/large-sitemaps">sitemap index spec</a> requires that referenced sub-sitemaps live at the same path or deeper than the index, and on the same host. My index is at <code class="language-plaintext highlighter-rouge">/sitemap.xml</code> — root scope, so anything on the host qualifies. The three sub-sitemaps are all on <code class="language-plaintext highlighter-rouge">zeikar.github.io</code>, two of them in deeper paths (<code class="language-plaintext highlighter-rouge">/backend-interview-guide/</code>, <code class="language-plaintext highlighter-rouge">/charivo/</code>). Green. 
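The scope rule above is mechanical enough to check in code. A minimal sketch in TypeScript, assuming only the standard <code class="language-plaintext highlighter-rouge">URL</code> class; <code class="language-plaintext highlighter-rouge">isInScope</code> is a hypothetical helper name, not part of any sitemap tooling, and it compares full origins (scheme, host, port), which is an assumption slightly stricter than "same host":

```typescript
// Hypothetical helper: checks whether a sub-sitemap URL falls within the scope
// of a sitemap index (same origin, and at the index's directory or deeper).
function isInScope(indexUrl: string, subSitemapUrl: string): boolean {
  const index = new URL(indexUrl);
  const sub = new URL(subSitemapUrl);

  // Assumption: "same host" is approximated by comparing the full origin.
  if (index.origin !== sub.origin) return false;

  // The scope is the directory that contains the index file,
  // e.g. "/" for "/sitemap.xml" and "/blog/" for "/blog/sitemap.xml".
  const scopeDir = index.pathname.slice(0, index.pathname.lastIndexOf("/") + 1);
  return sub.pathname.startsWith(scopeDir);
}
```

With the index at the root, every path on the host passes, which matches the result described above; an index under a sub-directory would only cover that directory.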
<h3 id="robotstxt">robots.txt</h3> A robots.txt block could shut everything down. Mine has the opposite: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>User-agent: * Allow: / Sitemap: https://zeikar.github.io/sitemap.xml </code></pre></div></div> Allow <code class="language-plaintext highlighter-rouge">/</code>, declare the sitemap explicitly. Green. <hr /> So: every diagnostic on the artifact came back clean. The XML was fine. The HTTP was fine. The bot could reach it. The path scope was legal. robots.txt was permissive. And GSC still said <code class="language-plaintext highlighter-rouge">Couldn't fetch</code>. <h2 id="the-pivot">The pivot</h2> Searching for the exact error string lands on a pattern that’s been documented across years of public reports: GSC frequently fails to fetch sitemaps from <code class="language-plaintext highlighter-rouge">*.github.io</code> subdomains, even when those same sitemaps work fine for other indexers like Bing. The same XML on a custom domain gets fetched instantly. (<a href="https://support.google.com/webmasters/thread/352368538">Google Search Central thread</a>, <a href="https://github.com/orgs/community/discussions/149884">GitHub community discussion</a>, <a href="https://github.com/cotes2020/jekyll-theme-chirpy/issues/2658">Chirpy theme issue #2658</a>, <a href="https://dev.to/stankukucka/google-search-console-cant-fetch-sitemap-on-github-pages-31kn">a dev.to walkthrough</a>.) There’s no official explanation, and the public threads run on competing community theories. One framing comes from a contributor in the Chirpy thread: that GSC may behave differently depending on whether you’ve registered the site as a URL prefix property or a Domain property — and on <code class="language-plaintext highlighter-rouge">.github.io</code> you can only register a URL prefix property, since the apex belongs to GitHub. 
They report moving to a custom domain (verifying it as a Domain property via DNS), keeping the GitHub Pages backend unchanged, and the sitemap submitting immediately. Worth noting: Google’s <a href="https://developers.google.com/webmaster-tools/v1/sitemaps/submit">Search Console API</a> and <a href="https://support.google.com/webmasters/answer/34592">property documentation</a> both list URL-prefix properties as valid sitemap-submission targets, so this isn’t a documented requirement — only an observed correlation in the threads. A different theory in the same threads is that GitHub Pages rate-limits Google’s automation IP ranges, surfacing as <code class="language-plaintext highlighter-rouge">URL_FETCH_STATUS_MISC_ERROR</code> inside Google’s fetcher. I can’t verify either from outside both systems. What’s clear is the empirical pattern: same artifact, different host, completely different GSC behavior. <h2 id="the-fix">The fix</h2> So I bought <code class="language-plaintext highlighter-rouge">zeikar.dev</code> and set up the standard GitHub Pages custom domain: <code class="language-plaintext highlighter-rouge">A</code>/<code class="language-plaintext highlighter-rouge">AAAA</code> records on the apex pointing at GitHub’s IPs, a <code class="language-plaintext highlighter-rouge">CNAME</code> file in the repo, and <code class="language-plaintext highlighter-rouge">url: "https://zeikar.dev"</code> in <code class="language-plaintext highlighter-rouge">_config.yml</code>. Resubmitted the sitemap to GSC. GSC fetched it on the first try. The XML structure was unchanged. The Jekyll build and sub-sitemap layout were unchanged. The HTTP headers were unchanged. 
The only thing that moved was the hostname inside every URL — <code class="language-plaintext highlighter-rouge"><loc></code> values and the <code class="language-plaintext highlighter-rouge">Sitemap:</code> line in <code class="language-plaintext highlighter-rouge">robots.txt</code> flipped from <code class="language-plaintext highlighter-rouge">zeikar.github.io</code> to <code class="language-plaintext highlighter-rouge">zeikar.dev</code>, and nothing else changed. <h2 id="what-i-should-have-tried-first">What I should have tried first</h2> When every diagnostic on the artifact comes back clean, the bug is upstream of the artifact. The cheapest debugging step in that situation is the one that swaps the substrate — not the one that pokes the artifact harder. I spent a few hours on XML and HTTP-header diagnostics when 30 seconds of “let me try a different hostname” would have shown me the answer. Different shape from the <a href="/blog/from-getauthtoken-to-launchwebauthflow/">getAuthToken</a> and <a href="/blog/from-chrome-cookies-to-chips/">CHIPS</a> posts, but the same family of mistake — I was tuning the wrong thing. </article> <article> <h1>Why Google Search Console can’t fetch your github.io sitemap</h1> 2026-05-07T00:00:00+00:00 This is a story about an XML file that wasn’t broken. Specifically, why Google Search Console kept showing <code class="language-plaintext highlighter-rouge">Couldn't fetch</code> on my <code class="language-plaintext highlighter-rouge">sitemap.xml</code>, why every diagnostic I ran came back green, and why the answer turned out to have nothing to do with the XML. <h2 id="세팅">The setup</h2> <code class="language-plaintext highlighter-rouge">zeikar.github.io</code> was a Jekyll site running on GitHub Pages. 
The root <code class="language-plaintext highlighter-rouge">sitemap.xml</code> was a sitemap index pointing at three sub-sitemaps under the same host: <div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://zeikar.github.io/sitemap-main.xml</loc></sitemap>
  <sitemap><loc>https://zeikar.github.io/backend-interview-guide/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://zeikar.github.io/charivo/sitemap.xml</loc></sitemap>
</sitemapindex>
</code></pre></div></div> The main one covers blog posts and project pages; the other two belong to sub-projects deployed as separate GitHub Pages sites under the same hostname. Submit <code class="language-plaintext highlighter-rouge">https://zeikar.github.io/sitemap.xml</code> to Google Search Console, and GSC reads the index, fetches the sub-sitemaps, and queues the URLs for indexing. That was the plan. What GSC actually did was sit at <code class="language-plaintext highlighter-rouge">Couldn't fetch</code> for days. Resubmitting didn’t help. Waiting didn’t help. <h2 id="다섯-번의-초록불">Five green checks</h2> <h3 id="xml-검증">XML validation</h3> First suspect: the served XML itself. Running <code class="language-plaintext highlighter-rouge">xmllint</code> against the bytes GitHub Pages actually returns shows it is well-formed: <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -sS https://zeikar.github.io/sitemap.xml | xmllint --noout -; echo $?
0
</code></pre></div></div> It also passes validation against the official sitemaps.org schema: <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -sS https://zeikar.github.io/sitemap.xml | xmllint --schema siteindex.xsd --noout -
- validates
</code></pre></div></div> All three sub-sitemaps pass validation against their own <code class="language-plaintext highlighter-rouge">sitemap.xsd</code> as well. Green. <h3 id="http--content-type">HTTP & Content-Type</h3> Maybe GitHub Pages serves it with the wrong content type. 
<code class="language-plaintext highlighter-rouge">curl -I</code>: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -sI https://zeikar.github.io/sitemap.xml | head -3
HTTP/2 200
server: GitHub.com
content-type: application/xml
</code></pre></div></div> <code class="language-plaintext highlighter-rouge">200 OK</code>, <code class="language-plaintext highlighter-rouge">application/xml</code>. The bytes start with <code class="language-plaintext highlighter-rouge"><?xml</code>: no BOM, clean UTF-8. Green. <h3 id="googlebot-user-agent">Googlebot User-Agent</h3> Maybe Google’s bot gets a different response than my browser does. Diffing a default-UA fetch against a Googlebot-UA fetch: <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ diff <(curl -sS https://zeikar.github.io/sitemap.xml) \
       <(curl -sSA "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
              https://zeikar.github.io/sitemap.xml)
</code></pre></div></div> (empty diff). Identical bytes. Green. <h3 id="사이트맵-인덱스-스코프-규칙">Sitemap index scope rules</h3> The <a href="https://developers.google.com/search/docs/crawling-indexing/sitemaps/large-sitemaps">sitemap index spec</a> requires that sub-sitemaps referenced from an index live at the same path as the index or deeper, and on the same host. My index is at <code class="language-plaintext highlighter-rouge">/sitemap.xml</code>, which is root scope, so any path on the same host is OK. All three sub-sitemaps are on <code class="language-plaintext highlighter-rouge">zeikar.github.io</code>, and two of them sit in deeper paths (<code class="language-plaintext highlighter-rouge">/backend-interview-guide/</code>, <code class="language-plaintext highlighter-rouge">/charivo/</code>). Green. <h3 id="robotstxt">robots.txt</h3> A robots.txt block can break everything. Mine did the opposite: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>User-agent: *
Allow: /
Sitemap: https://zeikar.github.io/sitemap.xml
</code></pre></div></div> It allows <code class="language-plaintext highlighter-rouge">/</code> and declares the sitemap explicitly. Green. 
<hr /> So: every diagnostic on the artifact came back clean. The XML was fine. The HTTP was fine. The bot could reach it. The path scope was legal. robots.txt was permissive. And GSC still said <code class="language-plaintext highlighter-rouge">Couldn't fetch</code>. <h2 id="패턴">The pattern</h2> Searching for the exact error string turned up a pattern reported over several years: GSC repeatedly fails to fetch sitemaps from <code class="language-plaintext highlighter-rouge">*.github.io</code> subdomains. Other indexers such as Bing fetch the same sitemaps without trouble. Move the same XML to a custom domain and it gets fetched immediately. (<a href="https://support.google.com/webmasters/thread/352368538">Google Search Central thread</a>, <a href="https://github.com/orgs/community/discussions/149884">GitHub community discussion</a>, <a href="https://github.com/cotes2020/jekyll-theme-chirpy/issues/2658">Chirpy theme issue #2658</a>, <a href="https://dev.to/stankukucka/google-search-console-cant-fetch-sitemap-on-github-pages-31kn">a dev.to write-up</a>.) There is no official explanation, and the public threads split across competing community theories. One framing comes from a contributor in the Chirpy thread above: the observation that GSC’s sitemap-submission behavior may look different for a site registered as a URL prefix property than for one registered as a Domain property. A <code class="language-plaintext highlighter-rouge">.github.io</code> subdomain can only be registered as a URL prefix property, because the apex belongs to GitHub. That contributor wrote that after moving to a domain they own and registering it as a DNS-verified Domain property, with the GitHub Pages backend left unchanged, the sitemap submitted immediately. Worth noting: Google’s <a href="https://developers.google.com/webmaster-tools/v1/sitemaps/submit">Search Console API documentation</a> and <a href="https://support.google.com/webmasters/answer/34592">property-type guide</a> both list URL prefix properties as valid sitemap-submission targets, so this is not an official requirement, only a correlation observed in the threads. Another theory floating around the same threads is that GitHub Pages rate-limits or blocks Google’s automation IP ranges, which surfaces as <code class="language-plaintext highlighter-rouge">URL_FETCH_STATUS_MISC_ERROR</code> inside Google’s fetcher. There is no way to verify either from outside both systems. What is clear is the empirical pattern: same artifact, different host, completely different GSC behavior. 
<h2 id="답">The fix</h2> So I bought <code class="language-plaintext highlighter-rouge">zeikar.dev</code> and connected it following the standard GitHub Pages custom-domain procedure: <code class="language-plaintext highlighter-rouge">A</code>/<code class="language-plaintext highlighter-rouge">AAAA</code> records on the apex pointing at GitHub’s IPs, a <code class="language-plaintext highlighter-rouge">CNAME</code> file at the repo root, and <code class="language-plaintext highlighter-rouge">url: "https://zeikar.dev"</code> in <code class="language-plaintext highlighter-rouge">_config.yml</code>. I resubmitted the sitemap to GSC. GSC fetched it on the first try. The XML structure was unchanged. The Jekyll build and sub-sitemap layout were unchanged. The HTTP headers were unchanged. The only thing that moved was the hostname inside every URL: the sitemap <code class="language-plaintext highlighter-rouge"><loc></code> values and the <code class="language-plaintext highlighter-rouge">Sitemap:</code> line in <code class="language-plaintext highlighter-rouge">robots.txt</code> changed from <code class="language-plaintext highlighter-rouge">zeikar.github.io</code> to <code class="language-plaintext highlighter-rouge">zeikar.dev</code>, and that was all. <h2 id="처음에-해봤어야-할-것">What I should have tried first</h2> When every diagnostic on the artifact comes back clean, the bug is in a layer above the artifact. The cheapest debugging step in that situation is to swap the substrate, not to poke the artifact harder. I spent hours debugging XML and HTTP headers when 30 seconds of “let me just try a different hostname” would have shown me the answer. The shape differs from the <a href="/blog/ko/from-getauthtoken-to-launchwebauthflow/">getAuthToken</a> and <a href="/blog/ko/from-chrome-cookies-to-chips/">CHIPS</a> posts, but it is the same family of mistake: I was tuning the wrong thing. </article> <article> <h1>Chrome Extension OAuth: From getAuthToken to launchWebAuthFlow</h1> 2026-05-05T00:00:00+00:00 This is a story about cancel detection in OAuth — specifically, why we were band-aiding around <code class="language-plaintext highlighter-rouge">chrome.identity.getAuthToken</code> in a Chrome extension, why the band-aid raced itself, and how stepping back to a different <code class="language-plaintext highlighter-rouge">chrome.identity</code> primitive made the band-aid disappear. 
A small problem, but the shape rhymes with the <a href="/blog/from-chrome-cookies-to-chips/">previous CHIPS post</a>: we tried to fix the wrong thing. <h2 id="setup">Setup</h2> Quick recap: <a href="https://commentarium.app">Commentarium</a> is a comments app, and the <a href="https://github.com/zeikar/commentarium-extension">Chrome extension</a> injects an iframe of <code class="language-plaintext highlighter-rouge">commentarium.app/comments?url=…</code> on every page. The service worker brokers Firebase auth between the iframe and the deployed webapp. The previous post was about how the cookie gets to the iframe; this one is about how the Firebase ID token is generated in the SW in the first place. Until last week, that was a one-liner: <div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { token: accessToken } = await chrome.identity.getAuthToken({ interactive: true }); const credential = GoogleAuthProvider.credential(null, accessToken); await signInWithCredential(auth, credential); </code></pre></div></div> Chrome’s account chooser opens, the user picks an account, you get an access token, you hand it to <code class="language-plaintext highlighter-rouge">GoogleAuthProvider.credential</code> and Firebase signs them in. Done. Until a user closes the chooser without picking anything. <h2 id="the-60-second-spinner">The 60-second spinner</h2> QA report: “click Sign in with Google, the chooser opens, I close it, and the spinner just sits there.” How long? About a minute. A minute is a curious amount of time. Not “forever” (real hangs feel longer in user time), not “instant” (real cancellation is sub-second). Sixty seconds is a budget — somebody is timing something out. 
The SW DevTools console finally tells us who: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Unchecked runtime.lastError: A listener indicated an asynchronous response by returning true, but the message channel closed before a response was received </code></pre></div></div> The webapp’s <code class="language-plaintext highlighter-rouge">chrome.runtime.sendMessage(EXT_ID, { type: "signIn.google" })</code> was waiting on the SW for a response. The SW had returned <code class="language-plaintext highlighter-rouge">true</code> from the message handler (signaling “I’ll respond async”), then sat awaiting <code class="language-plaintext highlighter-rouge">chrome.identity.getAuthToken</code>. After about a minute, Chrome force-closed the message channel because the SW had idled out. The webapp got a generic channel-closed error, surfaced as “Authentication failed.” <h2 id="why-60-seconds">Why 60 seconds?</h2> Two platform behaviors collide. Behavior #1: <code class="language-plaintext highlighter-rouge">chrome.identity.getAuthToken({ interactive: true })</code>’s cancel callback behaved unreliably in our testing. On most platforms, when the user closes the chooser without selecting, Chrome calls back with <code class="language-plaintext highlighter-rouge">chrome.runtime.lastError = "The user did not approve access."</code> and the Promise rejects. In our macOS QA environment, that path dropped the callback entirely. The Promise neither resolves nor rejects. The await just hangs. Behavior #2: MV3 service workers idle out. The official rule is a 30-second timer that resets on every event the SW handles. A pending message-handler response should keep the SW alive, but in practice the SW is torn down somewhere in the 30-to-60-second window if the work is just sitting in a hung await. 
When the SW dies, the message channel closes, and the webapp sees “channel closed before response.” Compose them: the cancel path doesn’t fire → SW awaits forever → SW idles out → channel closes → webapp shows generic error after ~60s. <h2 id="first-fix-keepalive--timeout-race">First fix: keepalive + timeout race</h2> The straightforward band-aid: <div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code>async function signInGoogleOp(): Promise<AuthResponse> {
  // Keep the SW alive while the OAuth flow is pending.
  const keepAlive = setInterval(() => {
    void chrome.runtime.getPlatformInfo().catch(() => {});
  }, 20_000);

  // Race against a hard cap so a stuck callback can't hang forever.
  const timeoutPromise = new Promise<never>((_, reject) => {
    setTimeout(() => reject({ code: "identity/timeout", message: "..." }), 5 * 60 * 1000);
  });

  try {
    const tokenResult = await Promise.race([
      chrome.identity.getAuthToken({ interactive: true }),
      timeoutPromise,
    ]);
    // ...rest unchanged
  } finally {
    clearInterval(keepAlive);
  }
}
</code></pre></div></div> <code class="language-plaintext highlighter-rouge">chrome.runtime.getPlatformInfo()</code> is the cheapest <code class="language-plaintext highlighter-rouge">chrome.*</code> call we could think of — it has no side effects and pinging it every 20 seconds keeps the SW from idling out. The 5-minute timeout would surface a clean <code class="language-plaintext highlighter-rouge">identity/timeout</code> error if the cancel callback was really never coming. This worked. The webapp now stops spinning after 5 minutes instead of 60 seconds. 5 minutes is, somehow, worse UX than 60 seconds. 
<h2 id="racing-yourself">Racing yourself</h2> A code reviewer caught the next problem: Chrome <a href="https://developer.chrome.com/docs/extensions/develop/concepts/service-workers/lifecycle">terminates an extension service worker</a> when a single event or API request takes longer than ~5 minutes to process — independent of any <code class="language-plaintext highlighter-rouge">chrome.*</code> activity going on alongside it. Our message-handler request had been awaiting <code class="language-plaintext highlighter-rouge">getAuthToken</code> since the chooser opened; that was the in-flight single request, and our 5-minute timeout was racing exactly that cap. If Chrome killed the SW first, the timer never fires, the channel closes uncleanly, and we’re back to the original “channel closed” error. Drop the timeout to 60 seconds. We no longer race the cap, but the user still waits a full minute before getting a structured response. The keepalive is doing its job — keeping the SW alive — but only so it can deliver “we gave up” 60 seconds later. That’s the moment it became obvious we were patching the wrong layer. The keepalive was extending an await that should not have been needed in the first place. The cancel signal was sitting somewhere we couldn’t see from inside <code class="language-plaintext highlighter-rouge">getAuthToken</code>. 
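The shortened race can be written so that the timer is always cleaned up, whichever side settles first. A minimal sketch, assuming nothing beyond standard Promises and timers; <code class="language-plaintext highlighter-rouge">withTimeout</code> is a hypothetical helper name, and the <code class="language-plaintext highlighter-rouge">identity/timeout</code> error shape simply mirrors the snippet earlier in this post rather than the extension's actual code:

```typescript
// Hypothetical helper: rejects with a structured error if `work` has not
// settled within `ms` milliseconds, and always clears the timer afterwards.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject({ code: "identity/timeout", message: `no response after ${ms} ms` }),
      ms,
    );
  });

  try {
    // Whichever settles first wins; the loser is simply ignored.
    return await Promise.race([work, timeout]);
  } finally {
    // Without this, a fast success would leave a pending timer behind.
    clearTimeout(timer);
  }
}
```

Wrapping the pending call as <code class="language-plaintext highlighter-rouge">withTimeout(chrome.identity.getAuthToken({ interactive: true }), 60_000)</code> would give the 60-second cap described above. It bounds the wait, but it does nothing to make the cancel signal arrive sooner.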
<h2 id="the-right-primitive">The right primitive</h2> <code class="language-plaintext highlighter-rouge">chrome.identity</code> ships two interactive APIs: <table> <thead> <tr> <th>API</th> <th>What it opens</th> <th>Cancel signal</th> </tr> </thead> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">getAuthToken</code></td> <td>Chrome-internal account chooser</td> <td>Sometimes silently dropped</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code></td> <td>A regular browser window pointed at an OAuth URL</td> <td>Promise rejects when the window closes; redirect URL fragment carries <code class="language-plaintext highlighter-rouge">error=access_denied</code> if the provider denies</td> </tr> </tbody> </table> <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code> opens a real Chromium window. When the user closes it via the X button, Chrome reliably fires the callback with an error. There is no internal account-chooser surface to swallow the close event. You give up the convenience of <code class="language-plaintext highlighter-rouge">getAuthToken</code>’s “give me a token, you figure out the rest” — instead you build the OAuth URL yourself, pass it to <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code>, and parse the redirect URL fragment that comes back. Roughly 15-20 more lines of code. In exchange, the cancel path becomes ~immediate and observable. <h2 id="the-migration-shape">The migration shape</h2> We kept the existing Firebase wiring intact. 
Specifically, we used <code class="language-plaintext highlighter-rouge">response_type=token</code> so the redirect carries an <code class="language-plaintext highlighter-rouge">access_token</code> we can pass to <code class="language-plaintext highlighter-rouge">GoogleAuthProvider.credential(null, accessToken)</code> — exactly what <code class="language-plaintext highlighter-rouge">getAuthToken</code> was feeding it. No nonce-verification surface to introduce. <div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code>async function signInGoogleOp(): Promise<AuthResponse> {
  const state = crypto.randomUUID();
  const params = new URLSearchParams({
    client_id: import.meta.env.VITE_GOOGLE_OAUTH_WEB_CLIENT_ID,
    redirect_uri: chrome.identity.getRedirectURL(),
    response_type: "token",
    scope: "openid email profile",
    state,
    prompt: "select_account",
  });

  let responseUrl: string | undefined;
  try {
    responseUrl = await chrome.identity.launchWebAuthFlow({
      url: `https://accounts.google.com/o/oauth2/v2/auth?${params}`,
      interactive: true,
    });
  } catch (err) {
    return { error: { code: "auth/popup-closed-by-user", message: "..." } };
  }

  const fragment = new URLSearchParams(new URL(responseUrl!).hash.slice(1));

  // Verify state first, before reading any other field.
  if (fragment.get("state") !== state) {
    return { error: { code: "identity/state-mismatch", message: "..." } };
  }

  const oauthError = fragment.get("error");
  if (oauthError) {
    return {
      error: {
        code: oauthError === "access_denied" ? "auth/popup-closed-by-user" : "identity/oauth-error",
        message: fragment.get("error_description") ?? oauthError,
      },
    };
  }

  const accessToken = fragment.get("access_token");
  // ...Firebase block, identical to before
}
</code></pre></div></div> Three details worth highlighting. Cancel mapping to <code class="language-plaintext highlighter-rouge">auth/popup-closed-by-user</code>. That’s Firebase’s standard error code for popup-based OAuth cancellation. 
The webapp already had a code path for it — it’s what <code class="language-plaintext highlighter-rouge">signInWithPopup</code> produces in browser flows — so the user-visible “Sign-in cancelled” copy was free. No webapp change needed. State verified first. Both success and error responses echo back <code class="language-plaintext highlighter-rouge">state</code>. Verifying it before reading any other fragment field gates everything downstream on a CSRF check. A maliciously crafted redirect can’t smuggle an <code class="language-plaintext highlighter-rouge">error=access_denied</code> past us as a “user cancelled” signal. No keepalive, no timeout race. Both went away. The OAuth flow now resolves or rejects inside <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code>, the API that owns the prompt, so we no longer need a separate SW keepalive or a custom timeout to work around lifetime limits. <h2 id="the-cloud-console-caveat">The Cloud Console caveat</h2> You can’t reuse a “Chrome App”-type OAuth client_id with <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code>. Chrome App clients are tied to the manifest’s <code class="language-plaintext highlighter-rouge">oauth2</code> field and don’t accept arbitrary redirect URIs. You need a “Web application” client_id with an authorized redirect URI of exactly <code class="language-plaintext highlighter-rouge">https://<EXTENSION_ID>.chromiumapp.org/</code> — trailing slash matters, that’s what <code class="language-plaintext highlighter-rouge">chrome.identity.getRedirectURL()</code> returns. The extension’s old <code class="language-plaintext highlighter-rouge">oauth2</code> manifest field becomes dead and gets removed. (There’s a separate concern in unpacked-dev workflows: a dev’s local extension ID won’t match the prod ID unless they pin it via the manifest’s <code class="language-plaintext highlighter-rouge">key</code> field. 
Pinning via <code class="language-plaintext highlighter-rouge">VITE_EXTENSION_KEY</code> was already required so the deployed webapp’s hardcoded prod-EXT_ID <code class="language-plaintext highlighter-rouge">runtime.sendMessage</code> call could reach a local SW. After this migration, pinning also makes <code class="language-plaintext highlighter-rouge">getRedirectURL()</code> return the URI Cloud Console has authorized. One stone, two birds.) <h2 id="what-the-diff-looked-like">What the diff looked like</h2> <table> <thead> <tr> <th> </th> <th>getAuthToken path</th> <th>launchWebAuthFlow path</th> </tr> </thead> <tbody> <tr> <td>Cancel detection</td> <td>Silently dropped on some platforms</td> <td>Promise rejects within ~1s</td> </tr> <tr> <td>SW keepalive ping</td> <td>Required (every 20s)</td> <td>None</td> </tr> <tr> <td>Timeout</td> <td>Required (first 5 min, racing the SW lifetime cap; then 60s)</td> <td>None</td> </tr> <tr> <td>User wait on cancel</td> <td>~60s before generic error</td> <td>~1s, structured <code class="language-plaintext highlighter-rouge">auth/popup-closed-by-user</code></td> </tr> <tr> <td>OAuth client type</td> <td>Chrome App (manifest <code class="language-plaintext highlighter-rouge">oauth2</code>)</td> <td>Web application (Cloud Console redirect URI)</td> </tr> <tr> <td>State / CSRF defense</td> <td>None (Chrome handles it internally)</td> <td>Explicit, verified before fragment is consumed</td> </tr> <tr> <td>Lines in <code class="language-plaintext highlighter-rouge">signInGoogleOp</code></td> <td>~30 (band-aided)</td> <td>~50 (URL build + fragment parse + error mapping)</td> </tr> </tbody> </table> 15-20 more lines of code. Two timer-juggling primitives gone. A 60× UX improvement on the cancel path. <h2 id="takeaways">Takeaways</h2> <ol> <li>Platform-quirk band-aids race other platform quirks. The keepalive + timeout was racing the very SW lifetime cap that we were keeping alive against. 
Patching at the wrong layer means trading one platform constraint for another.</li> <li>“Right primitive” beats “right workaround.” Once we noticed <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code> had a deterministic cancel signal, the <code class="language-plaintext highlighter-rouge">getAuthToken</code> band-aid looked dated. Thirty lines of timer juggling was mass-and-energy in the wrong place.</li> <li>Verify <code class="language-plaintext highlighter-rouge">state</code> before reading any other OAuth response field. Both success and error redirects echo it. Checking it first is a free CSRF defense and rules out a category of weird “I cancelled but the app says I tried something else” reports.</li> <li>MV3 SW lifetime is a structural constraint, not a tweakable. Treat the 30-second idle timer and ~5-minute per-request cap as architecture inputs. If your design needs a single request to take longer than 5 minutes, you’re probably in the wrong shape — like we were.</li> </ol> The fix once we found it was, again, surprisingly mechanical. The hardest part was admitting the band-aid wasn’t almost working. <hr /> Code: <a href="https://github.com/zeikar/commentarium-extension">commentarium-extension</a>. The migration landed as commit <a href="https://github.com/zeikar/commentarium-extension/commit/7b95db1"><code class="language-plaintext highlighter-rouge">7b95db1</code></a>. </article> <article> <h1>Chrome 확장 프로그램 OAuth: getAuthToken에서 launchWebAuthFlow로</h1> 2026-05-05T00:00:00+00:00 OAuth에서 cancel을 어떻게 감지할 것인가 — 구체적으로는, Chrome 익스텐션에서 <code class="language-plaintext highlighter-rouge">chrome.identity.getAuthToken</code>을 어떻게 band-aid로 둘러싸려고 했는지, 그 band-aid가 왜 자기 자신과 race했는지, 그리고 다른 <code class="language-plaintext highlighter-rouge">chrome.identity</code> primitive로 한 발짝 물러섰더니 band-aid가 사라진 이야기다. 작은 문제지만, 모양은 <a href="/blog/ko/from-chrome-cookies-to-chips/">지난 CHIPS 포스트</a>와 같다 — 우리는 잘못된 걸 고치려 했다. 
<h2 id="세팅">세팅</h2> 빠른 복습: <a href="https://commentarium.app">Commentarium</a>은 댓글 웹앱이고, <a href="https://github.com/zeikar/commentarium-extension">Chrome 익스텐션</a>이 모든 페이지에 <code class="language-plaintext highlighter-rouge">commentarium.app/comments?url=…</code> iframe을 끼워 넣는다. service worker가 iframe과 배포된 웹앱 사이에서 Firebase 인증을 broker한다. 지난 포스트는 쿠키가 iframe까지 어떻게 가는지에 대한 이야기였고, 이번 포스트는 SW가 Firebase ID token을 애초에 어떻게 만드는지에 대한 이야기다. 지난주까지 그건 한 줄짜리였다: <div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { token: accessToken } = await chrome.identity.getAuthToken({ interactive: true }); const credential = GoogleAuthProvider.credential(null, accessToken); await signInWithCredential(auth, credential); </code></pre></div></div> Chrome 계정 chooser가 뜨고, 사용자가 계정을 고르고, access token이 돌아오고, 그걸 <code class="language-plaintext highlighter-rouge">GoogleAuthProvider.credential</code>에 넘기면 Firebase가 sign-in 시킨다. 끝. 사용자가 chooser를 아무것도 안 고르고 닫기 전까지는. <h2 id="60초-spinner">60초 spinner</h2> QA 보고: “Google로 로그인 누르면 chooser가 뜨는데, 닫으니까 spinner가 멈추질 않는다.” 얼마나? 약 1분. 1분이라는 건 묘한 시간이다. “영원히”는 아니고 (실제 hang은 사용자 체감으로 더 길게 느껴진다), “즉시”도 아니고 (실제 cancel은 sub-second). 60초는 budget이다 — 누군가 timeout을 재고 있는 거다. SW DevTools console이 결국 답을 준다: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Unchecked runtime.lastError: A listener indicated an asynchronous response by returning true, but the message channel closed before a response was received </code></pre></div></div> 웹앱의 <code class="language-plaintext highlighter-rouge">chrome.runtime.sendMessage(EXT_ID, { type: "signIn.google" })</code>가 SW 응답을 기다리고 있었다. SW는 message handler에서 <code class="language-plaintext highlighter-rouge">true</code>를 반환했고 (“async로 응답하겠다”는 신호), 그 상태로 <code class="language-plaintext highlighter-rouge">chrome.identity.getAuthToken</code>을 await한 채 멍하니 앉아 있었다. 약 1분 뒤, SW가 idle out 되면서 Chrome이 message channel을 강제로 닫았다. 
The web app received a generic channel-closed error and displayed it as “Authentication failed”. <h2 id="왜-60초인가">Why 60 seconds</h2> Two platform behaviors collide. Behavior #1: the cancel callback of <code class="language-plaintext highlighter-rouge">chrome.identity.getAuthToken({ interactive: true })</code> was observed to be unreliable. On most platforms, when the user closes the chooser without picking an account, Chrome calls back with <code class="language-plaintext highlighter-rouge">chrome.runtime.lastError = "The user did not approve access."</code> and the Promise rejects. In our macOS QA environment, however, that path dropped the callback entirely. The Promise neither resolves nor rejects. The await simply hangs. Behavior #2: MV3 service workers idle out. Officially this is a 30-second timer that resets on every event the SW handles. A message handler that is still waiting to respond should in principle keep the SW alive, but in practice, when it just sits on a hung await, the SW dies somewhere between 30 and 60 seconds. When the SW dies, the message channel closes, and the web app sees “channel closed before response”. Combined: cancel path never fires → SW awaits forever → SW idles out → channel closed → web app sees a generic error after ~60 seconds. <h2 id="1차-수정-keepalive--timeout-race">First fix: keepalive + timeout race</h2> The straightforward band-aid: <div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code>async function signInGoogleOp(): Promise<AuthResponse> {
  // Keep the SW alive while OAuth is in flight.
  const keepAlive = setInterval(() => {
    void chrome.runtime.getPlatformInfo().catch(() => {});
  }, 20_000);
  // Race against a hard cap so a hung callback cannot hang forever.
  const timeoutPromise = new Promise<never>((_, reject) => {
    setTimeout(() => reject({ code: "identity/timeout", message: "..." }), 5 * 60 * 1000);
  });
  try {
    const tokenResult = await Promise.race([
      chrome.identity.getAuthToken({ interactive: true }),
      timeoutPromise,
    ]);
    // ...same as before
  } finally {
    clearInterval(keepAlive);
  }
}
</code></pre></div></div> <code class="language-plaintext highlighter-rouge">chrome.runtime.getPlatformInfo()</code> is about the lightest <code class="language-plaintext highlighter-rouge">chrome.*</code> call imaginable: it has no side effects, and pinging it every 20 seconds keeps the SW from idling out. 
The 5-minute timeout surfaces a clean <code class="language-plaintext highlighter-rouge">identity/timeout</code> error when the cancel callback truly never arrives. It worked. Now the web app stops the spinner after 5 minutes instead of 60 seconds. Five minutes is, somehow, a worse UX than 60 seconds. <h2 id="자기-자신과-race">Racing against itself</h2> A code reviewer caught the next problem: Chrome <a href="https://developer.chrome.com/docs/extensions/develop/concepts/service-workers/lifecycle">terminates</a> an extension service worker when a single event or API request runs longer than ~5 minutes, regardless of any <code class="language-plaintext highlighter-rouge">chrome.*</code> activity running alongside it. Our message-handler request had been awaiting <code class="language-plaintext highlighter-rouge">getAuthToken</code> from the moment the chooser opened, and that was the single in-flight request. Our 5-minute timeout was racing exactly that cap. If Chrome kills the SW first, the timer never fires, the channel closes uncleanly, and we are back at the original “channel closed” error. We lowered the timeout to 60 seconds. It no longer races the cap, but the user still waits a full minute before getting a structured response. The keepalive is doing its job (keeping the SW alive) only so that it can deliver “we gave up” after 60 seconds. At this point it became obvious that we were patching the wrong layer. The keepalive was prolonging an await that should never have been needed. The cancel signal was sitting somewhere inside <code class="language-plaintext highlighter-rouge">getAuthToken</code> where we could not see it. <h2 id="올바른-primitive">The right primitive</h2> <code class="language-plaintext highlighter-rouge">chrome.identity</code> ships two interactive APIs: <table> <thead> <tr> <th>API</th> <th>Opens</th> <th>Cancel signal</th> </tr> </thead> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">getAuthToken</code></td> <td>Chrome’s internal account chooser</td> <td>Silently dropped on some platforms</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code></td> <td>A regular browser window pointed at the OAuth URL</td> <td>Promise rejects on window close; on provider deny, <code class="language-plaintext highlighter-rouge">error=access_denied</code> in the redirect URL fragment</td> </tr> </tbody> </table> <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code> opens a real Chromium window. 
When the user closes it with the X button, Chrome reliably fires the callback (with an error). There is no internal account-chooser surface to swallow the close event. We give up the “give me a token, you handle the rest” convenience of <code class="language-plaintext highlighter-rouge">getAuthToken</code>; instead we build the OAuth URL ourselves, hand it to <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code>, and parse the fragment of the redirect URL that comes back. That is roughly 15-20 more lines of code. In exchange, the cancel path becomes near-instant and observable. <h2 id="마이그레이션의-모양">The shape of the migration</h2> We kept the existing Firebase wiring as it was. Specifically, we used <code class="language-plaintext highlighter-rouge">response_type=token</code> so that the redirect carries an <code class="language-plaintext highlighter-rouge">access_token</code> in the fragment, which we pass straight to <code class="language-plaintext highlighter-rouge">GoogleAuthProvider.credential(null, accessToken)</code>. That is exactly what <code class="language-plaintext highlighter-rouge">getAuthToken</code> used to feed it. There is no need to introduce a nonce-verification surface. <div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code>async function signInGoogleOp(): Promise<AuthResponse> {
  const state = crypto.randomUUID();
  const params = new URLSearchParams({
    client_id: import.meta.env.VITE_GOOGLE_OAUTH_WEB_CLIENT_ID,
    redirect_uri: chrome.identity.getRedirectURL(),
    response_type: "token",
    scope: "openid email profile",
    state,
    prompt: "select_account",
  });
  let responseUrl: string | undefined;
  try {
    responseUrl = await chrome.identity.launchWebAuthFlow({
      url: `https://accounts.google.com/o/oauth2/v2/auth?${params}`,
      interactive: true,
    });
  } catch (err) {
    return { error: { code: "auth/popup-closed-by-user", message: "..." } };
  }
  const fragment = new URLSearchParams(new URL(responseUrl!).hash.slice(1));
  // Verify state first, before reading any other field.
  if (fragment.get("state") !== state) {
    return { error: { code: "identity/state-mismatch", message: "..." } };
  }
  const oauthError = fragment.get("error");
  if (oauthError) {
    return {
      error: {
        code: oauthError === "access_denied" ? "auth/popup-closed-by-user" : "identity/oauth-error",
        message: fragment.get("error_description") ?? oauthError,
      },
    };
  }
  const accessToken = fragment.get("access_token");
  // ...Firebase block is the same as before
}
</code></pre></div></div> Three details are worth calling out. Mapping cancel to <code class="language-plaintext highlighter-rouge">auth/popup-closed-by-user</code>. This is Firebase’s standard error code for popup-based OAuth cancellation. The web app already had a code path for it, since that is what <code class="language-plaintext highlighter-rouge">signInWithPopup</code> produces in the browser flow, so the user-facing “Sign-in cancelled” copy came for free. No web app changes were needed. Verifying state first. Both success and failure redirects echo <code class="language-plaintext highlighter-rouge">state</code>. Verifying it before reading any other fragment field puts everything downstream on top of the CSRF check. A maliciously crafted redirect cannot slip in <code class="language-plaintext highlighter-rouge">error=access_denied</code> disguised as a “user cancelled” signal. No keepalive, no timeout race. Both are gone. The OAuth flow now resolves or rejects inside <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code>, which owns the prompt, so there is no need to work around lifetime constraints with a separate SW keepalive and a custom timeout. <h2 id="cloud-console-주의사항">Cloud Console caveats</h2> You cannot reuse a “Chrome App” type OAuth client_id with <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code>. A Chrome App client is tied to the manifest’s <code class="language-plaintext highlighter-rouge">oauth2</code> field and does not accept arbitrary redirect URIs. You need a “Web application” client_id, and the authorized redirect URI must be exactly <code class="language-plaintext highlighter-rouge">https://<EXTENSION_ID>.chromiumapp.org/</code>. The trailing slash matters, because that is what <code class="language-plaintext highlighter-rouge">chrome.identity.getRedirectURL()</code> returns. The extension’s old <code class="language-plaintext highlighter-rouge">oauth2</code> manifest field becomes dead and gets removed. (The unpacked-dev workflow has a separate issue: a developer’s local extension ID does not match the prod ID unless it is pinned with the <code class="language-plaintext highlighter-rouge">key</code> field. 
For the deployed web app to reach the local SW through a <code class="language-plaintext highlighter-rouge">runtime.sendMessage</code> call that hard-codes the prod EXT_ID, we were already pinning the ID with <code class="language-plaintext highlighter-rouge">VITE_EXTENSION_KEY</code>. After this migration, the pin also makes the result of <code class="language-plaintext highlighter-rouge">getRedirectURL()</code> match the URI authorized in Cloud Console. Two birds, one stone.) <h2 id="diff는-어땠나">What the diff looked like</h2> <table> <thead> <tr> <th> </th> <th>getAuthToken path</th> <th>launchWebAuthFlow path</th> </tr> </thead> <tbody> <tr> <td>Cancel detection</td> <td>Silently dropped on some platforms</td> <td>Promise rejects within ~1 second</td> </tr> <tr> <td>SW keepalive ping</td> <td>Required (every 20 seconds)</td> <td>None</td> </tr> <tr> <td>Timeout</td> <td>Required (initially 5 minutes, racing the SW lifetime cap; later 60 seconds)</td> <td>None</td> </tr> <tr> <td>User wait on cancel</td> <td>Generic error after ~60 seconds</td> <td>~1 second, structured <code class="language-plaintext highlighter-rouge">auth/popup-closed-by-user</code></td> </tr> <tr> <td>OAuth client type</td> <td>Chrome App (manifest <code class="language-plaintext highlighter-rouge">oauth2</code>)</td> <td>Web application (Cloud Console redirect URI)</td> </tr> <tr> <td>State / CSRF defense</td> <td>None (handled inside Chrome)</td> <td>Explicit, verified before the fragment is consumed</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">signInGoogleOp</code> line count</td> <td>~30 (band-aided)</td> <td>~50 (URL build + fragment parse + error mapping)</td> </tr> </tbody> </table> 15-20 more lines of code. Two timer-juggling primitives gone. Cancel-path UX improved 60x. <h2 id="교훈">Lessons</h2> <ol> <li>A band-aid for one platform quirk races with another platform quirk. The keepalive + timeout was racing the lifetime cap of the very SW we were trying to keep alive. Patching at the wrong layer means trading one platform constraint for another.</li> <li>“Right primitive” beats “right workaround.” Once we noticed <code class="language-plaintext highlighter-rouge">launchWebAuthFlow</code> had a deterministic cancel signal, the <code class="language-plaintext highlighter-rouge">getAuthToken</code> band-aid looked dated. 
Thirty lines of timer juggling was mass-and-energy in the wrong place.</li> <li>Verify <code class="language-plaintext highlighter-rouge">state</code> before reading any other OAuth response field. Both success and error redirects echo it. Checking it first is a free CSRF defense and rules out a whole category of weird “I cancelled but the app says I tried something else” reports.</li> <li>MV3 SW lifetime is a structural constraint, not a tunable. Treat the 30-second idle timer and ~5-minute per-request cap as architecture inputs. If your design needs a single request to take longer than 5 minutes, you’re probably in the wrong shape, like we were.</li> </ol> The fix, once we found it, was again surprisingly mechanical. The hardest part was admitting the band-aid wasn’t almost working. <hr /> Code: <a href="https://github.com/zeikar/commentarium-extension">commentarium-extension</a>. The migration landed as commit <a href="https://github.com/zeikar/commentarium-extension/commit/7b95db1"><code class="language-plaintext highlighter-rouge">7b95db1</code></a>. </article> <article> <h1>Three Agents, One Document: A Claude Code Multi-Agent Doc Pipeline</h1> 2026-05-04T00:00:00+00:00 This is about the agent harness behind <a href="https://github.com/zeikar/backend-interview-guide">backend-interview-guide</a>, a Korean reference covering ~33 documents across database, cloud, system design, and programming. The whole <code class="language-plaintext highlighter-rouge">.claude/</code> setup is small (three agent definitions, one orchestrator skill, one hook), and that’s the point. It isn’t impressive for being elaborate. It’s reliable because each piece exists to stop a specific failure mode. A reasonable starting point would have been one agent that drafts, self-reviews, and patches the index. That works for one document. It collapses by document ten, because the same agent reviewing its own output drifts into self-agreement, the index gets out of sync silently, and there’s no consistent shape for the orchestrator to branch on. 
So the harness has three agents — <code class="language-plaintext highlighter-rouge">content-writer</code>, <code class="language-plaintext highlighter-rouge">content-reviewer</code>, <code class="language-plaintext highlighter-rouge">consistency-checker</code> — coordinated by an <code class="language-plaintext highlighter-rouge">interview-guide</code> orchestrator. Three decisions out of that split mattered most. <h2 id="a-writer-shouldnt-review-its-own-work">A writer shouldn’t review its own work</h2> <code class="language-plaintext highlighter-rouge">content-writer</code> and <code class="language-plaintext highlighter-rouge">content-reviewer</code> share no conversational state — the saved file is the handoff boundary. The writer drafts and saves; the orchestrator collects the path; the reviewer reads the file fresh with its own context, with no memory of how the draft was produced. The temptation to merge them is real — one agent could conceivably draft and self-check before returning. It doesn’t work. The writer is attached to its draft. It just produced 800 lines of Markdown; asking it to find what’s wrong is asking it to disagree with itself. In practice it finds nothing, or it finds nits that don’t matter, because critiquing structural choices means admitting the structure was off. The reviewer reads the file as a file. It doesn’t know what was easy or hard to write. It compares against established documents without internal advocacy. The grading rubric is three tiers — <code class="language-plaintext highlighter-rouge">pass</code>, <code class="language-plaintext highlighter-rouge">polish</code>, <code class="language-plaintext highlighter-rouge">rewrite</code> — with this rule on the top one (the harness uses Korean labels in the source; I’m using English equivalents here): <blockquote> pass — no technical errors, no missing tradeoffs, deep enough to handle a “why?” follow-up, same style as the existing documents — all conditions must hold. 
If the verdict is borderline, grade down. </blockquote> Without that last line, the rubric inflates: the strict ALL-clauses of <code class="language-plaintext highlighter-rouge">pass</code> can be argued away on any single criterion, and an LLM left to pick a verdict on a fence case will tend to pick the kinder one. Pushing tied judgments down — toward <code class="language-plaintext highlighter-rouge">polish</code> (one revision pass) or <code class="language-plaintext highlighter-rouge">rewrite</code> (full rewrite) — forces a real revision cycle instead of letting borderline drafts slip through as publish-ready. The same logic applies to <code class="language-plaintext highlighter-rouge">consistency-checker</code>. The agent definition has this fence written in (translated): <blockquote> Role boundary: fix link and structure issues directly. Report missing content, do not write it — that’s content-writer’s territory. If consistency-checker generates content itself, it bypasses content-writer’s style discovery, AGENTS.md compliance, and interview-fit checks, and unreviewed content with no quality guarantee gets merged in. </blockquote> That’s not documentation. That’s a fence. Without it the consistency checker — which is a competent agent — starts patching missing files because they look like a structure problem. They aren’t. They’re a content problem. Fixing them in the wrong agent skips the writer’s style discovery and the reviewer’s grading, and the harness silently grows documents that nobody graded. The scary failure mode of multi-agent setups isn’t agents disagreeing. It’s agents helpfully crossing role boundaries to “be efficient” — and producing output that no one’s checked. 
<h2 id="output-contract-the-abi-between-agents">Output Contract: the ABI between agents</h2> The orchestrator branches on the reviewer’s verdict: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IF overall == "rewrite":
    writer rewrites (max 2 retries)
ELIF overall == "polish" AND critical_count > 0:
    writer applies a single patch (no second review)
ELIF overall == "polish":
    publish; report Enhancement notes only
ELIF overall == "pass":
    publish
</code></pre></div></div> The single-patch case skips re-review on purpose: the reviewer already specified the patch concretely, so applying it is a mechanical task rather than a re-judgment, and the SubagentStop hook re-runs the link checker on save to catch any structural breakage. That branch only works if the orchestrator can reliably extract <code class="language-plaintext highlighter-rouge">overall</code> and <code class="language-plaintext highlighter-rouge">critical_count</code> from a free-form review. Telling an agent to “include the grade clearly” isn’t enough: Claude formats it differently every run, sometimes inside a list, sometimes as a section header, sometimes as a sentence. So every agent has an Output Contract. The reviewer’s contract embeds a machine-parseable block in the otherwise human-readable Markdown: <div class="language-md highlighter-rouge"><div class="highlight"><pre class="highlight"><code> </code></pre></div></div> The reviewer fills it in alongside the prose. The orchestrator parses it. The reader skimming <code class="language-plaintext highlighter-rouge">_workspace/{topic}/02_review.md</code> doesn’t see it cluttering the page, because HTML comments don’t render. 
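A minimal sketch of that parse-then-branch step, under stated assumptions: the summary block is taken to be an HTML comment tagged <code class="language-plaintext highlighter-rouge">REVIEW_SUMMARY</code> containing <code class="language-plaintext highlighter-rouge">overall:</code> and <code class="language-plaintext highlighter-rouge">critical_count:</code> lines. That layout, and the names <code class="language-plaintext highlighter-rouge">parseReviewSummary</code> and <code class="language-plaintext highlighter-rouge">decide</code>, are hypothetical; only the two field names and the branch table come from the text above.

```typescript
// Hypothetical summary-block layout; the harness's real format may differ.
type Verdict = "pass" | "polish" | "rewrite";
type Action = "publish" | "publish-with-notes" | "single-patch" | "rewrite";

interface ReviewSummary {
  overall: Verdict;
  criticalCount: number;
}

// Pull `overall` and `critical_count` out of an HTML comment embedded in Markdown.
// Returns null when the block is missing or incomplete, so the caller can fail loudly
// instead of guessing a verdict from the prose.
function parseReviewSummary(markdown: string): ReviewSummary | null {
  const block = /<!--\s*REVIEW_SUMMARY([\s\S]*?)-->/.exec(markdown);
  if (!block) return null;
  const overall = /overall:\s*(pass|polish|rewrite)\b/.exec(block[1]);
  const critical = /critical_count:\s*(\d+)/.exec(block[1]);
  if (!overall || !critical) return null;
  return { overall: overall[1] as Verdict, criticalCount: Number(critical[1]) };
}

// Mirrors the orchestrator's branch table.
function decide(summary: ReviewSummary): Action {
  if (summary.overall === "rewrite") return "rewrite";
  if (summary.overall === "polish" && summary.criticalCount > 0) return "single-patch";
  if (summary.overall === "polish") return "publish-with-notes";
  return "publish";
}
```

Returning <code class="language-plaintext highlighter-rouge">null</code> on a missing block is a deliberate choice in this sketch: a review without its contract block is treated as a contract violation, not as something to recover from by scanning the prose.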
Same idea in the writer’s <code class="language-plaintext highlighter-rouge">Writer Output</code> block (<code class="language-plaintext highlighter-rouge">work type</code>, <code class="language-plaintext highlighter-rouge">target file</code>, <code class="language-plaintext highlighter-rouge">line count</code>, <code class="language-plaintext highlighter-rouge">main sections</code>) and <code class="language-plaintext highlighter-rouge">consistency-checker</code>’s <code class="language-plaintext highlighter-rouge"></code>. This is the boring answer to “how do agents talk to each other”: you give them a machine-parseable side channel and you make filling it in part of the agent’s spec. Not an afterthought, not a postprocess regex over prose. Specified. <h2 id="hooks-catch-the-lies">Hooks catch the lies</h2> Every agent ships with a Self-Verification checklist. The writer’s has nine items: front matter present, anchor links match real headings, terminology consistent, README updated, etc. The agent ticks them off before submitting. The checklist isn’t enough. Agents will report “checked, all valid” while shipping a document with a broken anchor. Not maliciously: they pattern-match the right output and skip the actual verification step, and from their perspective they’ve genuinely confirmed something. The fix is to not trust the report. <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> configures a hook that runs after the writer and the checker stop: <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "hooks": {
    "SubagentStop": [{
      "matcher": "content-writer|consistency-checker",
      "hooks": [{
        "type": "command",
        "command": "python3 scripts/check_markdown_links.py 1>&2 || exit 2",
        "timeout": 30
      }]
    }]
  }
}
</code></pre></div></div> When either of those agents finishes, Claude Code runs the link checker. Exit code 2 hard-fails the agent. 
The “I verified the links” claim is now policed by a script that actually verifies them. The reviewer is absent from the matcher because the reviewer doesn’t write, so it has nothing for the script to validate. This is the cheapest reliability win in the whole harness. The hook is fifteen seconds of Python; the bug it prevents is a doc that publishes with a 404. <h2 id="takeaways">Takeaways</h2> <ol> <li>Separate roles by what they’re attached to, not by capability. The writer isn’t dumber than the reviewer. It just owns its draft.</li> <li>If the orchestrator branches on it, parse it. Free-form output for humans is fine. Free-form output for control flow isn’t. Embed a summary block, and make filling it in part of the agent’s spec.</li> <li>Self-verification is a comment. A hook is a contract. A nine-item checklist still ships broken anchors. A fifteen-second script doesn’t.</li> <li>Save the workspace. <code class="language-plaintext highlighter-rouge">_workspace/{topic}/</code> made this post possible. Without it, the only record of how a doc was made would be the doc itself, and that’s not enough to debug or to write about later.</li> </ol> The harness is small. Three agents, one orchestrator skill, one hook. The size isn’t the point. The point is that each piece pays for its complexity by closing a specific failure mode, and the rest of the system stays out of the way. <hr /> Code: <a href="https://github.com/zeikar/backend-interview-guide/tree/main/.claude">.claude/</a>. Project: <a href="https://github.com/zeikar/backend-interview-guide">backend-interview-guide</a>. </article> <article> <h1>Three Agents, One Document: A Claude Code Multi-Agent Doc Pipeline</h1> 2026-05-04T00:00:00+00:00 This is the story of the agent harness in the <a href="https://github.com/zeikar/backend-interview-guide">backend-interview-guide</a> project. It is a Korean interview reference with about 33 documents across the database, cloud, system-design, and programming categories, yet the <code class="language-plaintext highlighter-rouge">.claude/</code> setup itself is small: three agent definitions, one orchestrator skill, one hook. 
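That is the point. It is good not because it is flashy, but because each piece exists to block a specific failure mode, which keeps its behavior consistent. The common-sense starting point is a single agent that does everything: drafting, self-review, and index patching. With one document, that works fine. At around ten documents it falls apart: an agent reviewing its own writing drifts toward self-agreement, the README index quietly goes out of sync, and there is no consistent output format for the orchestrator to branch on. So the harness is three agents (<code class="language-plaintext highlighter-rouge">content-writer</code>, <code class="language-plaintext highlighter-rouge">content-reviewer</code>, <code class="language-plaintext highlighter-rouge">consistency-checker</code>) coordinated by the <code class="language-plaintext highlighter-rouge">interview-guide</code> orchestrator skill. Here are the three decisions from that split that mattered most. <h2 id="writer가-자기-글을-리뷰하면-안-된다">A writer shouldn’t review its own work</h2> <code class="language-plaintext highlighter-rouge">content-writer</code> and <code class="language-plaintext highlighter-rouge">content-reviewer</code> share no conversational state; the saved file is the handoff boundary. The writer drafts and saves, the orchestrator takes the path and passes it to the reviewer, and the reviewer reads that file fresh in its own context, with no memory of how the draft was produced. There is a temptation to merge them: couldn’t one agent write the draft and then self-check it? No. The writer is attached to its draft. Having just produced 800 lines of Markdown, being told to “find what’s wrong” is a request to disagree with itself. In practice it finds nothing, or catches only trivial nits, because critiquing a structural decision means admitting its own structural choice was wrong. The reviewer reads the file as a file. It doesn’t know what was easy or hard during writing. It compares against the existing documents without internal advocacy. The grading rubric spells this out: three tiers (<code class="language-plaintext highlighter-rouge">상</code>, <code class="language-plaintext highlighter-rouge">중</code>, <code class="language-plaintext highlighter-rouge">하</code>; roughly pass, polish, rewrite), with the top tier defined like this: <blockquote> 상 (Publish-Ready): no technical errors / no missing tradeoffs / deep enough to answer a “why?” follow-up / same style as the existing documents. All conditions must be met. If the verdict is borderline, grade down. </blockquote> Without that last line, the rubric inflates. The strict ALL-clauses of <code class="language-plaintext highlighter-rouge">상</code> can be passed by going easy on a single item, and on a fence case an LLM tends to pick the more lenient side. 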
Ties have to be pushed down by force, toward <code class="language-plaintext highlighter-rouge">중</code> (one revision) or <code class="language-plaintext highlighter-rouge">하</code> (full rewrite), so that a real revision cycle runs and ambiguous drafts do not slip through as publish-ready. The same logic applies to <code class="language-plaintext highlighter-rouge">consistency-checker</code>. The agent definition has this fence built in: <blockquote> Role boundary: fix link/structure issues directly. Only report missing content (content-writer’s territory). If consistency-checker generates content itself, it bypasses content-writer’s style analysis, AGENTS.md compliance, and interview-fit procedures, and content with no quality guarantee gets added without review. </blockquote> That is not documentation; it is a fence. Without that line, the consistency-checker (which is perfectly capable) starts patching missing files, because they look like a structure problem. They are actually a content problem. When the wrong agent fixes them, the writer’s style-discovery step and the reviewer’s grading step are skipped, and documents nobody verified quietly accumulate. The scary failure mode of multi-agent systems is not agents disagreeing with each other. It is agents helpfully crossing role boundaries to be “efficient”, leaving behind output that nobody verified. <h2 id="output-contract-에이전트-사이의-abi">Output Contract: the ABI between agents</h2> The orchestrator branches on the reviewer’s grade: <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IF overall == "하":
    writer rewrites (max 2 retries)
ELIF overall == "중" AND critical_count > 0:
    writer revises once (no re-review)
ELIF overall == "중":
    publish; report Enhancement items only
ELIF overall == "상":
    publish
</code></pre></div></div> Skipping re-review in the single-revision case is intentional: the reviewer has already specified the patch concretely, so it is a matter of application rather than re-judgment, and the SubagentStop hook re-runs the link checker on save to catch structural breakage. This branch only works if the orchestrator can reliably extract <code class="language-plaintext highlighter-rouge">overall</code> and <code class="language-plaintext highlighter-rouge">critical_count</code> from a free-form review. Asking an agent to “mark the grade clearly” is not enough: Claude formats it differently every time. Sometimes inside a list, sometimes as a section header, sometimes as a plain sentence. So every agent has an Output Contract. The reviewer’s contract embeds a machine-parseable block inside the human-readable Markdown: <div class="language-md highlighter-rouge"><div class="highlight"><pre class="highlight"><code> </code></pre></div></div> The reviewer fills in this block alongside the body. The orchestrator parses it. 
<code class="language-plaintext highlighter-rouge">_workspace/{topic}/02_review.md</code> does not show this block to a person skimming it: it is an HTML comment, so it does not render. The same idea appears in the writer’s <code class="language-plaintext highlighter-rouge">Writer Output</code> block (<code class="language-plaintext highlighter-rouge">작업 유형</code>, <code class="language-plaintext highlighter-rouge">대상 파일</code>, <code class="language-plaintext highlighter-rouge">줄 수</code>, <code class="language-plaintext highlighter-rouge">주요 섹션</code>, that is, work type, target file, line count, main sections) and in <code class="language-plaintext highlighter-rouge">consistency-checker</code>’s <code class="language-plaintext highlighter-rouge"></code>. The boring answer to “how do agents talk to each other” is this: give them a machine-parseable side channel, and make filling it in an explicit part of the agent’s spec. Not post-processing, not a regex over the prose; spec from the start. <h2 id="hook은-거짓말을-잡는다">Hooks catch the lies</h2> Every agent carries a Self-Verification checklist. The writer’s has nine items: front matter present, anchor links match real headings, terminology consistent, README updated, and so on. The agent checks them off one by one before submitting. The checklist alone is not enough. Agents report “checked, all valid” while shipping a document with a broken anchor. Not out of malice: they match the pattern of the report and skip the actual verification step. From the agent’s point of view, it feels like it really did check. The fix is to not trust the report. <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> wires up a hook that runs whenever the writer or the checker finishes: <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "hooks": {
    "SubagentStop": [{
      "matcher": "content-writer|consistency-checker",
      "hooks": [{
        "type": "command",
        "command": "python3 scripts/check_markdown_links.py 1>&2 || exit 2",
        "timeout": 30
      }]
    }]
  }
}
</code></pre></div></div> When either of those two agents finishes, Claude Code runs the link checker. Exit code 2 hard-fails the agent. The “I verified the links” claim is now supervised by a script that actually verifies them. The reviewer is not in the matcher: the reviewer doesn’t write, so there is nothing to validate. This is the best-value reliability mechanism in the harness. The hook itself is a 15-second Python script; the bug it prevents is a document published with a 404 link baked in. <h2 id="takeaways">Takeaways</h2> <ol> <li>Separate roles by what they are attached to, not by capability. The writer is not dumber than the reviewer. 
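It is just tied to its own draft.</li> <li>If the orchestrator is going to branch on it, make it parseable. Free-form output for humans is fine. Free-form output for control flow is not. Embed a summary block, and put filling it in into the agent’s spec.</li> <li>Self-verification is a comment. A hook is a contract. A nine-item checklist still lets broken anchors through. A 15-second script does not.</li> <li>Save the workspace. <code class="language-plaintext highlighter-rouge">_workspace/{topic}/</code> is what made this post possible. Without it, the only record of how a document was made would have been the output itself, and that is not enough for debugging or for a retrospective.</li> </ol> The harness is small. Three agents, one orchestrator skill, one hook. Size is not the point. The point is that each piece justifies its own complexity by closing a specific failure mode, and the rest of the system stays out of the way. <hr /> Code: <a href="https://github.com/zeikar/backend-interview-guide/tree/main/.claude">.claude/</a>. Project: <a href="https://github.com/zeikar/backend-interview-guide">backend-interview-guide</a>. </article> </main></body></html>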
