e2e codex skill w/ chrome-devtools-mcp

A Codex Skill that runs E2E tests with chrome-devtools-mcp and generates QA guides


Lately, after finishing an implementation, I've been running E2E tests with chrome-devtools-mcp.

/clarify
Let's build an E2E test plan using chrome-devtools-mcp.
The target is every commit/change on the current branch relative to refs/origin/develop.

First, let's construct the happy path: which direction to test in, what (UI) to test, and what result each check should produce.

During the run, use request interception + mock data, but never change the existing server req/res structure.
You can refer to the server API docs at https://.../
Also, never make arbitrary guesses or judgments; proceed strictly from actual data, code, docs, and investigation results.

Starting from this claude-code prompt, I've since built the skill in Codex as shown below. I went with Codex because its context window is 1M and the 5h/1w usage limits are generous.

diff-aware-web-e2e
---
name: diff-aware-web-e2e
description: Plan all impacted web E2E paths for current branch changes against a user-provided base ref, then execute only the user-selected paths with chrome-devtools-mcp. By default, use code-derived page-level request and response interception plus mock data unless the user asks for real API behavior. Use when the user wants evidence-based UI test planning and execution without changing server request or response contracts.
---

# Diff-Aware Web E2E

Use this skill when the user wants to turn current branch changes into concrete, evidence-based web E2E checks.

## What This Skill Does

- Reads `<base_ref>...HEAD` changes to find impacted UI areas.
- Produces a plan-level user product intent summary before scenario planning output.
- Builds a full scenario inventory covering directly impacted user paths plus diff-backed edge-case and regression-focus checks.
- Derives request and response shapes from code and docs before planning default interception and mock data.
- States the planned mock target request, mock target response, and mock verification approach in plan output when API behavior matters.
- Writes scenarios with step-by-step actions and step-by-step expected UI states.
- Shows the full planned inventory, marks a recommended set with reasons, then asks the user which scenario IDs or recommended set to execute.
- Optionally executes only the selected scenarios with `chrome-devtools-mcp`.
- Automatically cleans up run-owned Chrome and `chrome-devtools-mcp` OS processes before any relaunch, after execution work, and after QA handoff markdown writing.
- Renders QA handoff screenshots inline with Markdown image syntax `![]()` instead of plain file links.
- Reports only evidence-backed results.

## Default Mode

Start in **plan-only** mode and do not execute until the user chooses which planned scenarios to run.

## Inputs

Collect only what is missing:

- `mode`: `plan` or `execute`
- `base_ref`: required; compare `<base_ref>...HEAD`
- `change_focus`: optional user concern or priority area
- `target_area`: optional specific feature or page
- `api_docs_url`: optional; use when provided
- `mock_mode`: `auto` (default; use interception plus mock data unless the user says otherwise), `required`, or `off`

## Core Rules

- Never invent affected pages, API behavior, or expected results.
- If `base_ref` is missing, ask the user for it and stop until it is provided.
- Before the scenario inventory, provide a plan-level user product intent summary.
- Build the user product intent summary from user input first, then supplement with defensible diff or related code evidence when needed.
- Structure the user product intent summary using `Confirmed` and `Inferred`.
- If product intent remains partially unclear, include `Open Questions` or `Unclear Intent` and continue planning when the scenarios are still defensible.
- Keep the user product intent summary informative only; do not change scenario priority or recommended-set logic just because of it.
- Derive every scenario from at least one concrete source:
  - changed code
  - tests or stories
  - API docs
  - observed browser or network evidence
- Unless the user explicitly asks for real API behavior or `mock_mode: off`, treat chrome-devtools-mcp-based page-level request and response interception plus mock data as the default strategy.
- Derive mock request and response shapes from code and docs before planning or executing page-level interception and mocking.
- Plan the full impacted scenario inventory before suggesting execution.
- Cover directly impacted user paths plus same-screen or same-flow paths, and include diff-backed edge-case and regression-focus scenarios when they are defensibly tied to the change.
- Expand each planned path until a clear completion state is reached.
- Include step-by-step user actions and step-by-step expected UI states.
- When API behavior matters, say which request and response will be intercepted or mocked and how mock verification will be checked for each scenario.
- Provide scenario IDs, priority, and a recommended set with reasons for planned paths.
- Never auto-select or auto-execute the recommended set.
- Do not change server request or response contracts.
- Before any run-owned Chrome or `chrome-devtools-mcp` launch, briefly say that this skill will automatically remove the run-owned browser and MCP processes after work.
- Cleanup targets are limited to run-owned Chrome, remote-debugging Chrome, and `chrome-devtools-mcp` OS processes started during the current chat and current skill invocation.
- Before launching a replacement run-owned browser or MCP for recovery or QA capture work, clean up the current run-owned processes first.
- After execution ends in `pass`, `fail`, or `block`, clean up run-owned Chrome and `chrome-devtools-mcp` OS processes before any follow-up result or QA handoff message.
- If QA handoff writing needs additional captures, a new run-owned launch is allowed only after the current run-owned processes are cleaned up first.
- After the QA handoff markdown file is written, clean up any run-owned Chrome and `chrome-devtools-mcp` OS processes before the follow-up message.
- Treat cleanup as complete only when the owned OS processes are actually gone.
- Briefly report cleanup success. If cleanup fails, briefly report that the run-owned browser or MCP process may still remain, then continue.
- Never terminate `chrome-devtools-mcp`, Chrome, or remote-debugging processes that this run did not start.
- If evidence is insufficient, ask or stop.

## Evidence Order

1. Changed code and tests
2. Provided API docs
3. Runtime DOM or network evidence
4. User confirmation

See [evidence-rules.md](references/evidence-rules.md).

## Planning Workflow

1. If `base_ref` is missing, ask the user for it and stop.
2. Read the diff against `<base_ref>...HEAD`.
3. Find directly impacted UI entry points and related routes.
4. Read only the minimal supporting code, tests, and docs needed to map the full impacted scenario inventory.
5. Derive request and response shapes from code first by tracing the changed UI trigger, API caller, request builder, shared client, and response consumer.
6. If `api_docs_url` is provided, inspect it to confirm endpoint purpose and response shapes.
7. Before the scenario inventory, produce a plan-level user product intent summary that includes:
   - `Confirmed`: user-stated intent or intent made explicit in provided product context
   - `Inferred`: defensible intent inferred from the diff or related code
   - `Open Questions` or `Unclear Intent` when intent remains partially unresolved
   - brief evidence notes showing why each item is defensible
8. For each scenario, produce:
   - scenario ID
   - coverage relation: `direct impact`, `same-screen branch`, or `same-flow regression`
   - scenario objective: `primary`, `edge-case`, or `regression-focus`
   - target UI or flow
   - why it is tied to the diff
   - ordered user actions
   - step-by-step expected UI states
   - request and response derivation evidence when API behavior matters
   - default execution strategy: mocked or real API fallback
   - mock target request and response when API behavior matters
   - mock verification plan
   - injection approach only when it is needed to explain feasibility
   - priority
   - recommended-set status and reason
9. Show the user product intent summary, show the full scenario inventory, show the recommended set, and ask the user which scenario IDs or recommended set should move to execution.

See [planning-rules.md](references/planning-rules.md).

## Execution Workflow

Use this only after the user chooses which planned scenario IDs or recommended set to execute.

1. Before any run-owned launch, briefly say that this skill will automatically remove the run-owned browser and MCP processes after work.
2. If this same chat and skill invocation already owns run-owned Chrome or `chrome-devtools-mcp` OS processes from a prior attempt, clean them up before starting a replacement launch.
3. Start a run-owned isolated Chrome and `chrome-devtools-mcp` context.
4. Check that the current MCP runtime supports isolated execution, timeout tuning, and log capture. If that is clearly missing, report `block` with the missing preconditions.
5. Do not attach cleanup behavior to any Chrome or MCP process not started by this run.
6. Load the planned request and response derivation evidence together with the planned mock targets and verification checks.
7. Unless the user opted out or the selected scenario was explicitly planned as a real API fallback, apply the planned page-level request and response interception and mock data with `initScript` or `evaluate_script` using only code-derived or doc-derived payload shapes.
8. If needed, navigate to the selected target path.
9. Wait for stable UI evidence before judging results.
10. Use snapshots for structure and screenshots for reporting.
11. Inspect console and network activity for corroborating evidence.
12. On connection failure, retry in-place, then clean up the current run-owned browser and MCP processes, then start a replacement isolated run-owned instance, then collect logs and report `block` if recovery still fails.
13. Determine `pass`, `fail`, or `block`.
14. Clean up the run-owned Chrome and `chrome-devtools-mcp` OS processes before any follow-up result or QA handoff message.
15. Briefly report the selected path result together with the cleanup status.

Use these `chrome-devtools-mcp` capabilities when relevant:

- `list_pages`
- `select_page`
- `new_page`
- `navigate_page`
- `wait_for`
- `take_snapshot`
- `take_screenshot`
- `list_network_requests`
- `get_network_request`
- `list_console_messages`
- `evaluate_script`

See [execution-rules.md](references/execution-rules.md).

## Mocking Policy

- Unless the user explicitly asks for real API behavior or `mock_mode: off`, treat chrome-devtools-mcp-based page-level request and response interception plus mock data as the default strategy.
- Planning should assume mocked execution first and describe the target request, target response, and mock verification approach for each scenario when API behavior matters.
- Use code-first request and response derivation before deciding whether page-level mocking is safe.
- First check whether page-level mocking is sufficient.
- Page-level mocking may use `initScript` or `evaluate_script` to patch `fetch` or `XMLHttpRequest`.
- For mocked flows, do not require a real network request as mandatory evidence.
- Verify that mocking actually took effect using observable DOM evidence, explicit mock-hit evidence, or both.
- Do not claim this is full browser-level interception.
- If page-level mocking is unreliable or the scenario needs broader interception than it can safely cover, fall back to real API behavior or report `block`.
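
As one concrete illustration of this policy, a page-level `fetch` patch injected through `initScript` or `evaluate_script` might look like the sketch below. The endpoint path `/api/example`, the mocked payload, and the `__mockHits` marker name are illustrative assumptions, not project specifics:

```javascript
// Hedged sketch of page-level request/response interception, as it might be
// injected with initScript or evaluate_script. The endpoint "/api/example",
// the mocked payload shape, and the __mockHits marker are illustrative only.
(function patchFetch(globalObj) {
  const realFetch = globalObj.fetch;
  globalObj.__mockHits = []; // explicit mock-hit marker for later verification

  globalObj.fetch = async function (input, init) {
    const url = typeof input === "string" ? input : input.url;
    if (url.includes("/api/example")) { // hypothetical mock target request
      globalObj.__mockHits.push(url);
      return new Response(JSON.stringify({ items: [] }), {
        status: 200,
        headers: { "Content-Type": "application/json" },
      });
    }
    return realFetch.call(globalObj, input, init); // everything else stays real
  };
})(globalThis);
```

Because only `fetch` (and, if needed, `XMLHttpRequest`) on the page is patched, navigation requests, service workers, and subresources are untouched, which is exactly why the policy forbids claiming full browser-level interception.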

## Output Format

### Plan Mode

- User product intent summary shown before the scenario inventory
- `Confirmed`, `Inferred`, and `Open Questions` or `Unclear Intent` when needed
- Evidence notes for the user product intent summary
- Full impacted scenario inventory
- Scenario ID, coverage relation, scenario objective, priority, and recommended-set status for each scenario
- Recommended set with a short reason for each included scenario
- Evidence for each scenario
- Step-by-step actions
- Step-by-step expected UI states
- Request and response derivation evidence when API behavior matters
- Default execution strategy
- Mock target request and response when API behavior matters
- Mock verification plan
- Injection approach only when it is needed to explain feasibility
- A final clarify step asking which scenario IDs or recommended set should move to execution

### Execute Mode

- Selected path result: `pass`, `fail`, or `block`
- Run-owned cleanup status note
- Screenshot
- Brief evidence summary
- Relevant network notes, or mock-hit notes for mocked flows
- Mock verification notes
- Recovery notes if connection handling was needed
- A concise summary of the execution steps used to get to the result

## Completion Wrap-Up

- After all planned or selected execution work is complete, summarize the execution steps used:
  - diff basis
  - path selection logic
  - request and response derivation basis
  - execution setup
  - mocking approach
  - evidence collected
  - blockers or recoveries
- If execution was performed, always ask whether to create a concise QA handoff document in Korean aimed at QA or planners.
- If the user says yes, do not draft the QA handoff document yet.
- First produce a concise QA handoff plan and ask the user to approve that plan before writing any file.
- The QA handoff plan must include:
  - proposed `.md` save path with a single recommended location for user confirmation
  - intended audience
  - document section outline
  - scenario coverage to include
  - existing screenshots that are good enough to reuse
  - additional screenshots that must be captured or recaptured
  - device context for each screenshot: desktop or mobile
  - why each screenshot is needed for fast QA understanding
- Do not draft the QA handoff document unless the user approves that QA handoff plan.
- After the user approves the QA handoff plan, write the QA handoff as a `.md` file.
- If QA handoff writing needs additional captures, start a new run-owned Chrome and `chrome-devtools-mcp` context only after cleaning up any current run-owned browser or MCP processes from this same chat and skill invocation.
- After the QA handoff markdown file is written, clean up any run-owned Chrome and `chrome-devtools-mcp` OS processes before the follow-up message, and briefly report the cleanup status.
- If the user asks for the QA handoff document, include:
  - write it in Korean for QA or planning audiences
  - scenario ID and title
  - goal
  - scope or covered user path in product terms
  - setup and mock strategy in audience-friendly terms
  - steps
  - expected UI
  - actual evidence
  - inline screenshots rendered with Markdown image syntax `![]()` and short captions, using element-focused captures with minimal surrounding noise when possible
  - network or mock verification notes
  - blockers or open risks
- Exclude development implementation details, code-level explanations, internal reasoning, and backend contract discussion unless the audience explicitly asks for them.
- Do not use plain file links as the primary screenshot presentation in QA handoff markdown.
- Use full-page screenshots only when the element-focused capture would hide necessary product context.
- Prefer screenshots that center the changed UI with only the surrounding context needed to understand the state.
- Match screenshot device context to the product path being documented and say which captures are desktop or mobile.
- If the existing screenshots are too broad, show the wrong device context, or do not make the changed UI easy to understand, take additional screenshots before drafting the QA handoff.

## When To Stop And Ask

- `base_ref` is missing
- No reliable UI candidate can be tied to the diff
- A completion path cannot be justified from code, docs, or observed behavior
- Request or response shapes cannot be derived from code or docs without guessing
- Code-derived request or response shapes conflict with observed runtime evidence
- The planned scenario inventory is ready and user path selection is required
- Mocking is required but safe scope is unclear
- Authentication or setup prevents reliable execution
- The user approved QA handoff creation and the QA handoff plan still needs approval
- The proposed QA handoff save path needs user confirmation
- Existing screenshots are not sufficient and additional capture scope still needs confirmation

## Non-Goals

- Branch-to-branch visual diff systems
- App-wide route crawling unrelated to directly impacted paths
- Full browser-level request interception guarantees
- Mobile WebView-specific flows
- Changing backend contracts to make tests easier
- Killing externally managed Chrome or MCP processes

The flow is: pass the diff base - E2E test planning - scenario construction - E2E test execution (ralph-loop) - QA guide proposal - QA guide writing (optional, ralph-loop). It varies with diff size, but once scenarios are set and execution starts, a run takes roughly 15-40 minutes with opus-4.6 (thinking) / gpt-5.4 (xhigh+fast). In the execution phase, it helps to make good use of ralph-loop so the run carries through to the end.

That said, unless told otherwise it surfaces somewhat fewer findings (scenarios) than claude-code does… so I tuned the prompt to make it produce scenarios in more detail.

chrome-devtools-mcp drives the Chrome binary. Perhaps because of that, it fails fairly often (especially with transport closed), so on failure I have it proceed in this order: retry - clean up the current run-owned processes - replacement launch - and if that still fails, leave logs and report block with a notification. For reference, transport closed simply means the MCP-to-browser connection was closed.

Parallel execution also needs some adjustment for the same reason. The environment itself has to be isolated by using isolated launches, or an equivalent ownership model, rather than a shared browser. Having each run clean up its run-owned browser/MCP processes when it finishes is an extension of that.

I added wait_for because content may not be visible immediately after entering a page (for example, in an SPA). Puppeteer has a similar API.
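
A minimal polling wait in that spirit could look like the sketch below; `waitForText`, its options, and the page-text accessor are hypothetical names for illustration, not the actual `wait_for` implementation:

```javascript
// Minimal sketch of a text-based wait, similar in spirit to wait_for.
// getPageText is a hypothetical async accessor for the page's visible text.
async function waitForText(getPageText, expected, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const text = await getPageText();
    if (text.includes(expected)) return true; // stable UI evidence observed
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // poll again
  }
  throw new Error(`Timed out waiting for "${expected}"`);
}
```

Polling against a condition like this is what lets the skill avoid fixed sleeps when judging SPA content that renders after navigation.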

Requests/responses are intercepted and mocked by default. Many features require login, and the work often gets handed off… and passing an account (token) via .env or directly feels a bit off. Of course, this too is confirmed with the user during planning.

A fun discovery was how well it produces QA guides. They were very convenient for conveying changes not only to QA but also to non-developers such as planners: what changed, how, and how it can be tested… This, too, is in the prompt. When execution finishes, it asks whether to create a QA guide; on Yes, it plans in detail, laying out the goal, expected scope, and expected screenshots, and it presents screenshots as Markdown inline images.

The references directory documents how to approach planning+evidence and execution.

Planning Rules
# Planning Rules

## Scope

Plan the full impacted scenario inventory for changes in the current branch against `<base_ref>...HEAD`.
Before the scenario inventory, provide a plan-level user product intent summary.
Unless the user explicitly asks for real API behavior or `mock_mode: off`, planning should assume page-level request and response interception plus mock data as the default execution strategy.

## User Product Intent Summary

Create a single plan-level summary before the scenario list.

Use these fields:

- `Confirmed`: intent explicitly stated by the user or made explicit in provided product context
- `Inferred`: defensible intent inferred from the diff or closely related code
- `Open Questions` or `Unclear Intent`: unresolved gaps that do not block a defensible scenario plan

Intent evidence should prefer user input first and use diff or related code only as supporting evidence.
Do not change scenario priority or recommended-set logic based on this summary alone.

## Candidate UI Signals

Use these signals to infer impacted UI:

- route or page files
- router config
- changed components imported by pages
- button, heading, link, or `data-testid` strings
- tests, stories, or Playwright specs

## Scenario Taxonomy

Every scenario must include both labels:

- `coverage relation`: `direct impact`, `same-screen branch`, or `same-flow regression`
- `scenario objective`: `primary`, `edge-case`, or `regression-focus`

Use defensible pairings:

- `primary` usually covers the main changed path and commonly pairs with `direct impact`
- `edge-case` covers alternate, boundary, empty, error, or validation states that are defensibly tied to the diff and commonly pairs with `direct impact` or `same-screen branch`
- `regression-focus` covers behavior that should remain stable around the changed path and commonly pairs with `same-screen branch` or `same-flow regression`

If a scenario needs an unusual pairing, explain why that pairing is justified by the diff or supporting code.

## Code-Based Request And Response Derivation

When API behavior matters, derive request and response shapes from code first:

1. identify the changed UI trigger
2. find the action handler or event path
3. trace the API caller or shared client
4. inspect request builders, params, payload keys, and headers
5. inspect response consumers, branch conditions, and rendered UI states
6. use API docs only to confirm or supplement what code already supports
7. use runtime network evidence only to verify or compare against the code-derived understanding

If code and docs do not support a request or response shape, do not invent one.

## Scenario Construction

For each scenario, include:

- scenario ID
- coverage relation
- scenario objective
- target page or flow
- changed code evidence proving why the scenario is tied to the diff
- start point
- completion condition
- step-by-step user actions
- step-by-step expected UI states
- request and response derivation evidence when API behavior matters
- default execution strategy: mocked or real API fallback
- mock target request and response when API behavior matters
- mock verification plan
- injection approach only when it is needed to explain feasibility
- priority
- recommended-set status and reason
- confidence and any assumptions that still need user confirmation

## Path Expansion

- Expand each directly impacted path until the user reaches a clear completion state.
- Do not stop at the first changed screen if the affected flow continues.
- Include same-screen branches and same-flow regression checks when they are defensibly tied to the changed path.
- Include separate diff-backed `edge-case` and `regression-focus` scenarios when the changed path exposes them.
- Do not broaden into unrelated feature-wide regression coverage.

## Documentation Use

- If `api_docs_url` is available, use it to confirm endpoint purpose and response shape after tracing the code path.
- If no docs are available, rely on code first and runtime evidence only as supporting verification.
- If neither code nor docs support an expected API outcome, do not invent one.

## Question Policy

Ask only when a missing fact blocks a defensible plan.

If `base_ref` is missing, ask for it before reading the diff.

If request or response shapes cannot be derived from code or docs without guessing, ask before planning mocks.

If code-derived request or response shapes conflict with runtime evidence in a way that changes the scenario expectation, ask before continuing.

If mocked execution looks unsafe or unsupported for the scenario and no defensible real API fallback is available, ask or plan the scenario as `block`.

If user product intent is only partially clear but the impacted scenarios are still defensible, continue planning and record the gaps under `Open Questions` or `Unclear Intent`.

After presenting the full planned scenario inventory and recommended set, ask the user which scenario IDs or recommended set should move to execution.
Evidence Rules
# Evidence Rules

## Allowed Evidence

- changed source files
- tests and stories
- official API docs when provided
- observed DOM state
- observed network requests and responses
- observed console output
- explicit mock-hit markers exposed by the patched page layer

## Path Coverage Standard

- The planning result should account for all directly impacted user paths that can be justified from the diff and supporting code.
- The planning result should also account for same-screen branch and same-flow regression scenarios when they are defensibly tied to those directly impacted paths.
- The planning result should include separate diff-backed `edge-case` and `regression-focus` scenarios when the changed path exposes them.
- Each planned path should explain why it is tied to the change and why its coverage relation and scenario objective are justified.
- When API behavior matters, each planned path should also say which request and response will be mocked by default and how mock confirmation will be checked.

## Disallowed Behavior

- inventing routes or user flows
- inventing response fields not supported by code or docs
- inventing mock payloads not supported by code or docs
- claiming pass or fail without a visible or observable signal
- changing backend request or response contracts
- claiming mocking worked without observable confirmation
- requiring a real network request as mandatory proof for a page-level mocked flow

## Reporting Standard

Each conclusion should be traceable to one or more concrete observations.

Each step-level expected UI state should be traceable to code, docs, tests, or observed behavior.

When planning a mocked flow, the report should name the intended mock target request, intended mock target response, and mock verification approach in concise scenario-level terms.

For page-level mocked flows, acceptable confirmation may come from DOM state, explicit mock-hit markers, console markers, or a real network request when one still occurs.

For QA handoff screenshots, prefer relevant element-focused captures with minimal surrounding noise. Use broader page captures only when the focused capture would hide necessary product context.

For QA handoff screenshots, the chosen capture should make the changed UI easy to understand quickly for QA readers, not just prove that the page existed.

For QA handoff screenshots, match the capture to the documented device context, including desktop versus mobile.

If an existing screenshot is too broad, lacks the changed UI focus, or shows the wrong device context, treat it as insufficient and capture a better one before drafting the QA handoff.

If code-derived request or response shapes conflict with runtime evidence, say so and stop or report `block` instead of guessing.

The final report should also include a short execution-step summary describing how the result was reached.

If confidence is low, say so and explain what evidence is missing.
Execution Rules
# Execution Rules

## Browser Strategy

- Each execution owns its own isolated Chrome and `chrome-devtools-mcp` context.
- Never terminate Chrome, remote-debugging Chrome, or `chrome-devtools-mcp` processes that this run did not start.
- If the user needs to log in manually inside the isolated run-owned browser, pause and resume after the handoff.

## Run-Owned Cleanup

- Before any run-owned Chrome or `chrome-devtools-mcp` launch, briefly tell the user that this skill will automatically remove the run-owned browser and MCP processes after work.
- Cleanup scope is limited to run-owned Chrome, remote-debugging Chrome, and `chrome-devtools-mcp` OS processes started in the current chat and current skill invocation.
- Before any replacement launch for recovery or QA captures, clean up the current run-owned browser and MCP processes first.
- Attempt cleanup on every execution end state: `pass`, `fail`, or `block`.
- After the QA handoff markdown file is written, clean up any remaining run-owned browser or MCP OS processes before the follow-up message.
- Treat cleanup as complete only when the owned OS processes are actually gone.
- On cleanup success, say briefly that the run-owned browser and MCP processes were removed.
- On cleanup failure, say briefly that cleanup failed and the run-owned browser or MCP process may still remain, then continue.

## Preflight

- Before execution, verify that the active MCP runtime is configured for isolated launches or another equally safe ownership model.
- If the runtime clearly lacks isolation, timeout tuning, or log capture, report `block` and name the missing preconditions.

## MCP Tool Usage

- Use `take_snapshot` to inspect page structure before acting.
- Use `wait_for` for stable UI evidence instead of fixed sleeps when possible.
- Use `take_screenshot` for final reporting artifacts.
- Use `list_network_requests` and `get_network_request` to confirm real API activity or compare against the code-derived understanding when real requests occur.
- Use `list_console_messages` to spot frontend regressions or collect explicit mock-hit markers when available.
- For QA handoff screenshots, prefer element-focused captures with minimal surrounding noise and use full-page captures only when broader product context is necessary.
- For QA handoff markdown, render screenshots inline with Markdown image syntax `![]()` instead of plain file links.
- Before drafting a QA handoff document, review whether the existing screenshots clearly show the changed UI state for the intended audience.
- If an existing screenshot is too broad, hides the changed UI, or uses the wrong device context, recapture it.
- Match the screenshot viewport and framing to the documented product context, including desktop versus mobile.

## Connection Recovery

- On `transport closed` or similar MCP connection failures, retry once or twice in the current run-owned context.
- If retry fails, clean up the current run-owned browser and MCP processes, then recreate the isolated run-owned Chrome and MCP context and continue.
- If recovery still fails, enable log capture, preserve the failure evidence, and report `block`.
- Prefer explicit logging and timeout tuning over silent retries.
- When the client configuration allows it, raise `startup_timeout_ms` and capture `--log-file` output for failed runs.
- Prefer isolated run-owned launches over auto-connecting to external browsers for parallel execution.

## Result Labels

- `pass`: expected UI and supporting evidence match
- `fail`: expected UI or supporting evidence clearly diverge
- `block`: missing auth, missing data, unsupported mocking, insufficient evidence, or unresolved code versus runtime conflicts

## Mocking

- Unless the user explicitly asks for real API behavior or `mock_mode: off`, request and response interception with mock data is the default strategy.
- Follow the planned mock targets and mock verification checks from plan mode unless runtime evidence forces a safer fallback.
- Derive request and response shapes from code and docs before writing any mock payload.
- Page-level mocking may patch `fetch` and `XMLHttpRequest`.
- For mocked flows, a real network request is optional evidence, not mandatory evidence.
- Confirm that mocking took effect with observable DOM evidence, explicit mock-hit evidence, or both.
- Treat explicit mock-hit evidence as something the patched layer exposes and can be checked with the page or console state.
- Do not present page-level mocking as complete interception coverage.
- If the flow depends on navigation requests, service workers, or subresource control, treat that as unsupported unless the project already provides a safe mechanism.
- If the planned mock target cannot be intercepted safely at the page level, fall back to real API behavior only when the selected scenario still stays evidence-based; otherwise report `block`.
- If request or response shapes cannot be derived without guessing, stop and ask instead of inventing payloads.
- If code-derived request or response shapes conflict with observed runtime behavior in a way that changes the selected scenario, stop and ask or report `block`.
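
A mock verification check of the kind these rules describe could be as small as the sketch below, run through `evaluate_script` against the page; the `window.__mockHits` marker is a hypothetical convention that the patched layer would need to expose:

```javascript
// Hypothetical verification helper: reads an explicit mock-hit marker that
// the patched page layer is assumed to expose as window.__mockHits.
function verifyMockHit(win, urlFragment) {
  const hits = (win.__mockHits || []).filter((url) => url.includes(urlFragment));
  return { hit: hits.length > 0, count: hits.length, urls: hits };
}
```

Pairing a marker check like this with observable DOM evidence satisfies the rule that mocking must be confirmed, not assumed.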

This file specifies which prompt to enter with on implicit invocation and in what order to proceed. The base_ref requirement, the full scenario inventory, cleanup, the QA-handoff-plan-first rule, and the inline screenshot rules are all folded into it as well.

openai.yaml
interface:
  display_name: 'Diff-Aware Web E2E'
  short_description: 'Plan all impacted paths, then execute chosen ones'
  default_prompt: 'Use $diff-aware-web-e2e to ask me for the required base_ref, plan the full impacted web E2E scenario inventory for current branch changes against <base_ref>...HEAD, derive request and response shapes from code before proposing mocks, include diff-backed edge-case and regression-focus scenarios, recommend a scenario set, then help me choose which scenarios to execute. Before any run-owned Chrome or chrome-devtools-mcp launch, briefly say that this skill will automatically remove its run-owned browser and MCP processes after work. If a relaunch is needed for recovery or QA captures, clean up any current run-owned browser and MCP processes first. After execution finishes in any result state, clean up the run-owned browser and MCP before follow-up messages or QA handoff questions. If I say yes to QA handoff, do not draft it yet. First propose a concise QA handoff plan with a recommended markdown save path, section outline, screenshot reuse versus recapture plan, and desktop or mobile capture context, then wait for my approval before writing the file. When you write the QA markdown file, render screenshots inline with Markdown image syntax ![]() instead of plain file links, and after the file is written, clean up any run-owned browser and MCP processes again before the follow-up message.'

policy:
  allow_implicit_invocation: true

It hadn't shipped when I built this skill, but Codex now supports the Marketplace too (since around March 28, I believe), so this should migrate there…

Using it alone is no fun and goes nowhere, so I also uploaded it to the team's AI use-cases repository, with install/verify scripts alongside a guide.

Installation
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
SOURCE_DIR="${REPO_ROOT}/skills/diff-aware-web-e2e"
TARGET_DIR="${HOME}/.codex/skills/diff-aware-web-e2e"

if [[ ! -f "${SOURCE_DIR}/SKILL.md" ]]
then
  echo "Error: source skill file not found: ${SOURCE_DIR}/SKILL.md" >&2
  exit 1
fi

mkdir -p "$(dirname "${TARGET_DIR}")"

if [[ -d "${TARGET_DIR}" ]]
then
  # mktemp -d reserves a unique backup name; remove the empty directory
  # immediately so mv can take that name over for the existing skill dir.
  BACKUP_DIR="$(mktemp -d "${TARGET_DIR}.bak.XXXXXX")"
  rmdir "${BACKUP_DIR}"
  mv "${TARGET_DIR}" "${BACKUP_DIR}"
  echo "Backed up existing skill directory: ${BACKUP_DIR}"
fi

cp -R "${SOURCE_DIR}" "${TARGET_DIR}"

echo "Installed diff-aware-web-e2e skill: ${TARGET_DIR}"
echo "Next: run ./scripts/verify-diff-aware-web-e2e-skill.sh"
Verification
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
SOURCE_DIR="${REPO_ROOT}/skills/diff-aware-web-e2e"
TARGET_DIR="${HOME}/.codex/skills/diff-aware-web-e2e"

hash_file() {
  local file_path
  file_path="$1"

  if command -v shasum >/dev/null 2>&1
  then
    shasum -a 256 "${file_path}" | awk '{print $1}'
    return
  fi

  if command -v sha256sum >/dev/null 2>&1
  then
    sha256sum "${file_path}" | awk '{print $1}'
    return
  fi

  echo "Error: neither shasum nor sha256sum is available." >&2
  exit 1
}

hash_stdin() {
  if command -v shasum >/dev/null 2>&1
  then
    shasum -a 256 | awk '{print $1}'
    return
  fi

  if command -v sha256sum >/dev/null 2>&1
  then
    sha256sum | awk '{print $1}'
    return
  fi

  echo "Error: neither shasum nor sha256sum is available." >&2
  exit 1
}

dir_manifest_hash() {
  local dir_path
  dir_path="$1"

  (
    cd "${dir_path}"
    find . -type f | LC_ALL=C sort | while read -r relative_path
    do
      local_hash="$(hash_file "${dir_path}/${relative_path#./}")"
      printf '%s  %s\n' "${local_hash}" "${relative_path}"
    done
  ) | hash_stdin
}

if [[ ! -f "${SOURCE_DIR}/SKILL.md" ]]
then
  echo "Error: source skill file not found: ${SOURCE_DIR}/SKILL.md" >&2
  exit 1
fi

if [[ ! -d "${TARGET_DIR}" ]]
then
  echo "Error: target skill directory not found: ${TARGET_DIR}" >&2
  echo "Run: ./scripts/install-diff-aware-web-e2e-skill.sh"
  exit 1
fi

SOURCE_HASH="$(dir_manifest_hash "${SOURCE_DIR}")"
TARGET_HASH="$(dir_manifest_hash "${TARGET_DIR}")"

echo "Source: ${SOURCE_DIR}"
echo "Target: ${TARGET_DIR}"
echo "Source manifest SHA-256: ${SOURCE_HASH}"
echo "Target manifest SHA-256: ${TARGET_HASH}"

if [[ "${SOURCE_HASH}" == "${TARGET_HASH}" ]]
then
  echo "Status: synchronized"
  echo "Standard usage: \$diff-aware-web-e2e <your request>"
  exit 0
fi

echo "Status: mismatch"
echo "Run: ./scripts/install-diff-aware-web-e2e-skill.sh"
exit 2