e2e skill w/ chrome-devtools-mcp

A Skill that runs e2e tests and generates QA guides with chrome-devtools-mcp


These days, after finishing an implementation, I run e2e tests with chrome-devtools-mcp.

```
/clarify
Let's build an e2e test plan using chrome-devtools-mcp.
The target is everything committed/changed on the current branch relative to refs/origin/develop.

First, let's construct the happy path: what (UI) should be tested, in what direction, and what result each check should produce.

During execution, use intercepted requests + mock data, but never change the existing server req/res structure.
You can refer to the server API docs here: https://.../
Also, never guess or judge arbitrarily; always proceed only from actual data, code, docs, and investigation results.
```

Starting from this claude-code prompt, I then built the skill below with codex. I chose codex because its context window is 1M and the 5-hour/1-week usage limits are generous.

```yaml
---
name: diff-aware-web-e2e
description: Plan all impacted web E2E paths for current branch changes against a user-provided base ref, then execute only the user-selected paths with chrome-devtools-mcp. Use when the user wants evidence-based UI test planning and execution without changing server request or response contracts.
---
```

# Diff-Aware Web E2E

Use this skill when the user wants to turn current branch changes into concrete, evidence-based web E2E checks.

## What This Skill Does

- Reads `<base_ref>...HEAD` changes to find impacted UI areas.
- Builds a full scenario inventory covering directly impacted user paths plus same-screen or same-flow core branch and regression checks.
- Writes scenarios with step-by-step actions and step-by-step expected UI states.
- Shows the full planned inventory, marks a recommended set with reasons, then asks the user which scenario IDs or recommended set to execute.
- Optionally executes only the selected scenarios with `chrome-devtools-mcp`.
- Reports only evidence-backed results.
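
The diff-reading step above can be sketched in shell. The helper name and the path filter below are illustrative assumptions, not part of the skill itself:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list files changed on the current branch
# relative to a user-provided base ref, narrowed to UI-ish paths.
# The extension filter is an example; adjust it per project.
list_impacted_files() {
  local base_ref="$1"
  git diff --name-only "${base_ref}...HEAD" \
    | grep -E '\.(tsx?|jsx?|vue|css)$' || true
}
```

Note that `<base_ref>...HEAD` (three dots) diffs against the merge base, so it reports only this branch's own changes rather than everything that diverged on both sides.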

## Default Mode

Start in **plan-only** mode and do not execute until the user chooses which planned scenarios to run.

## Inputs

Collect only what is missing:

- `mode`: `plan` or `execute`
- `base_ref`: required; compare `<base_ref>...HEAD`
- `change_focus`: optional user concern or priority area
- `target_area`: optional specific feature or page
- `api_docs_url`: optional; use when provided
- `mock_mode`: `auto`, `required`, or `off`

## Core Rules

- Never invent affected pages, API behavior, or expected results.
- If `base_ref` is missing, ask the user for it and stop until it is provided.
- Derive every scenario from at least one concrete source:
  - changed code
  - tests or stories
  - API docs
  - observed browser/network evidence
- Plan the full impacted scenario inventory before suggesting execution.
- Cover directly impacted user paths plus same-screen or same-flow core branch and regression scenarios that are defensibly tied to the change.
- Expand each planned path until a clear completion state is reached.
- Include step-by-step user actions and step-by-step expected UI states.
- Provide scenario IDs, priority, and a recommended set with reasons for planned paths.
- Never auto-select or auto-execute the recommended set.
- Do not change server request or response contracts.
- Never terminate `chrome-devtools-mcp`, Chrome, or remote-debugging processes that this run did not start.
- If evidence is insufficient, ask or stop.

## Evidence Order

1. Changed code and tests
2. Provided API docs
3. Runtime network evidence
4. User confirmation

See [evidence-rules.md](references/evidence-rules.md).

## Planning Workflow

1. If `base_ref` is missing, ask the user for it and stop.
2. Read the diff against `<base_ref>...HEAD`.
3. Find directly impacted UI entry points and related routes.
4. Read only the minimal supporting code, tests, and docs needed to map the full impacted scenario inventory.
5. If `api_docs_url` is provided, inspect it for relevant endpoints and response shapes.
6. For each scenario, produce:
   - scenario ID
   - coverage type: direct impact, same-screen branch, or same-flow regression
   - target UI or flow
   - why it is tied to the diff
   - ordered user actions
   - step-by-step expected UI states
   - expected API or network result when defensible
   - whether mocking is needed
   - priority
   - recommended-set status and reason
7. Show the full scenario inventory, show the recommended set, and ask the user which scenario IDs or recommended set should move to execution.

See [planning-rules.md](references/planning-rules.md).

## Execution Workflow

Use this only after the user chooses which planned scenario IDs or recommended set to execute.

1. Start a run-owned isolated Chrome and `chrome-devtools-mcp` context.
2. Check that the current MCP runtime supports isolated execution, timeout tuning, and log capture. If that is clearly missing, report `block` with the missing preconditions.
3. Do not attach cleanup behavior to any Chrome or MCP process not started by this run.
4. If needed, navigate to the selected target path.
5. Wait for stable UI evidence before judging results.
6. Use snapshots for structure and screenshots for reporting.
7. Inspect console and network activity for corroborating evidence.
8. On connection failure, retry in-place, then restart the isolated run-owned instance, then collect logs and report `block`.
9. Report `pass`, `fail`, or `block`.

Use these `chrome-devtools-mcp` capabilities when relevant:

- `list_pages`
- `select_page`
- `new_page`
- `navigate_page`
- `wait_for`
- `take_snapshot`
- `take_screenshot`
- `list_network_requests`
- `get_network_request`
- `list_console_messages`
- `evaluate_script`

See [execution-rules.md](references/execution-rules.md).

## Mocking Policy

- Treat intercepts and mock data as the default strategy when they are feasible.
- First check whether page-level mocking is sufficient.
- Page-level mocking may use `initScript` or `evaluate_script` to patch `fetch` or `XMLHttpRequest`.
- Verify that mocking actually took effect using DOM and network evidence.
- Do not claim this is full browser-level interception.
- If page-level mocking is unreliable or the scenario needs broader interception than it can safely cover, fall back to real API behavior or report `block`.
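
As a sketch of what page-level patching might look like, here is a shell snippet that emits the kind of JavaScript one could pass to `evaluate_script`. The `/api/items` endpoint and the empty mock body are hypothetical placeholders:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Emit an illustrative fetch patch to hand to evaluate_script.
# "/api/items" and the mock payload are made-up placeholders.
emit_fetch_patch() {
  cat <<'JS'
const realFetch = window.fetch;
window.fetch = async (input, init) => {
  const url = typeof input === 'string' ? input : input.url;
  if (url.includes('/api/items')) {
    // Page-level mock only: navigation requests and service
    // workers are NOT covered by this patch.
    return new Response(JSON.stringify({ items: [] }), {
      status: 200,
      headers: { 'Content-Type': 'application/json' },
    });
  }
  return realFetch(input, init);
};
JS
}

emit_fetch_patch
```

This is exactly why the policy above requires verifying, via DOM and network evidence, that the mock actually took effect: a patched `fetch` silently misses anything that bypasses it.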

## Output Format

### Plan Mode

- Full impacted scenario inventory
- Scenario ID, coverage type, priority, and recommended-set status for each scenario
- Recommended set with a short reason for each included scenario
- Evidence for each scenario
- Step-by-step actions
- Step-by-step expected UI states
- Expected API or network result when defensible
- Mock requirement, feasibility, or constraint
- A final clarify step asking which scenario IDs or recommended set to execute

### Execute Mode

- Selected path result: `pass`, `fail`, or `block`
- Screenshot
- Brief evidence summary
- Relevant network or console notes
- Recovery notes if connection handling was needed
- A concise summary of the process used to get to the result
- After all selected work is complete, optionally suggest a concise QA handoff document aimed at QA or planners (not developers) when it would help manual verification

## Completion Wrap-Up

- After all planned or selected execution work is complete, summarize the process used:
  - diff basis
  - path selection logic
  - execution setup
  - mocking approach
  - evidence collected
  - blockers or recoveries
- If it seems useful for manual QA, suggest a concise QA handoff document aimed at QA or planners.
- Do not draft that QA handoff document unless the user asks for it.

## When To Stop And Ask

- `base_ref` is missing
- No reliable UI candidate can be tied to the diff
- A completion path cannot be justified from code, docs, or observed behavior
- The planned scenario inventory is ready and user path selection is required
- Mocking is required but safe scope is unclear
- Authentication or setup prevents reliable execution

## Non-Goals

- Branch-to-branch visual diff systems
- App-wide route crawling unrelated to directly impacted paths
- Full browser-level request interception guarantees
- Mobile WebView-specific flows
- Changing backend contracts to make tests easier
- Killing externally managed Chrome or MCP processes

The flow is: pass the diff base → plan the E2E tests → build scenarios → execute the tests → propose a QA guide → write the QA guide (optional). It depends on the diff size, but once scenarios are built, execution takes about 15-40 minutes with opus-4.6 (thinking) / gpt-5.4 (xhigh+fast).

One caveat: unless you say so explicitly, it surfaces somewhat fewer findings (scenarios) than claude-code does… so I tuned the prompt to make it report in as much detail as possible.

chrome-devtools-mcp uses the Chrome binary. Perhaps because of that, it fails fairly often (especially with `transport closed`), so on failure I have it proceed in this order: retry, then recreate, then capture logs and block & notify. For reference, `transport closed` simply means the MCP-to-browser channel was closed.
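
That retry / recreate / log-and-block ladder can be sketched as a generic shell wrapper. The function names are illustrative; the real skill drives MCP calls, not shell commands:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative recovery ladder: retry in place, then "recreate"
# the context, then give up with a block status. The two command
# arguments are placeholders for real MCP / browser operations.
run_with_recovery() {
  local attempt_cmd="$1" recreate_cmd="$2"
  local try
  for try in 1 2; do
    if "$attempt_cmd"; then return 0; fi
    echo "attempt ${try} failed, retrying..." >&2
  done
  "$recreate_cmd"
  if "$attempt_cmd"; then return 0; fi
  echo "status: block (logs preserved)" >&2
  return 1
}
```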

Parallel execution also needs a bit of tuning. Unless you specify otherwise, it often seems to use a shared browser. As the Skill states, the `--isolated` option should be passed when launching Chrome (since I expect it to be used only by this skill, I pass it from the skill rather than from the MCP config). I also configured the MCP server to run in headless mode.
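
For reference, a codex MCP entry along those lines might look like this in `~/.codex/config.toml`. Treat the exact flag names as assumptions and check them against the chrome-devtools-mcp docs:

```toml
# Hypothetical sketch of a headless chrome-devtools-mcp entry.
[mcp_servers.chrome-devtools]
command = "npx"
args = ["chrome-devtools-mcp@latest", "--headless"]
```

With `--headless` set here and `--isolated` requested per run from the skill itself, each execution owns its own browser context instead of attaching to a shared one.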

I set up `wait_for` because content may not be visible immediately after entering a page (e.g. in an SPA). Puppeteer has a similar API.

For req/res, I mostly rely on mocking based on the API docs. Many features require login and the work often gets handed off… and passing accounts (tokens) via .env or directly would feel a bit wrong anyway.

The fun part: it produces remarkably good QA guides. They made it very easy to share the material not just with QA but also with non-developers such as planners.

I used the references directory to document how to approach planning + evidence and execution. Supposedly this is how you use context efficiently.
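
The resulting layout is small. The file names come from this post; the tree itself is just illustrative:

```
skills/diff-aware-web-e2e/
├── SKILL.md
└── references/
    ├── planning-rules.md
    ├── evidence-rules.md
    └── execution-rules.md
```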

# Planning Rules

## Scope

Plan the full impacted scenario inventory for changes in the current branch against `<base_ref>...HEAD`.

## Candidate UI Signals

Use these signals to infer impacted UI:

- route or page files
- router config
- changed components imported by pages
- button, heading, link, or `data-testid` strings
- tests, stories, or Playwright specs

## Scenario Construction

For each scenario, include:

- scenario ID
- coverage type: direct impact, same-screen branch, or same-flow regression
- target page or flow
- changed code evidence proving why the scenario is tied to the diff
- start point
- completion condition
- step-by-step user actions
- step-by-step expected UI states
- expected API or network outcome when defensible
- mock need and mock feasibility
- priority
- recommended-set status and reason
- confidence and any assumptions that still need user confirmation

## Path Expansion

- Expand each directly impacted path until the user reaches a clear completion state.
- Do not stop at the first changed screen if the affected flow continues.
- Include same-screen branches and same-flow regression checks when they are defensibly tied to the changed path.
- Do not broaden into unrelated feature-wide regression coverage.

## Documentation Use

- If `api_docs_url` is available, use it to confirm endpoint purpose and response shape.
- If no docs are available, rely on code and runtime evidence instead.
- If neither code nor docs support an expected API outcome, do not invent one.

## Question Policy

Ask only when a missing fact blocks a defensible plan.

If `base_ref` is missing, ask for it before reading the diff.

After presenting the full planned scenario inventory and recommended set, ask the user which scenario IDs or recommended set should move to execution.

# Evidence Rules

## Allowed Evidence

- changed source files
- tests and stories
- official API docs when provided
- observed DOM state
- observed network requests and responses
- observed console output

## Path Coverage Standard

- The planning result should account for all directly impacted user paths that can be justified from the diff and supporting code.
- The planning result should also account for same-screen branch and same-flow regression scenarios when they are defensibly tied to those directly impacted paths.
- Each planned path should explain why it is tied to the change and why its coverage type is justified.

## Disallowed Behavior

- inventing routes or user flows
- inventing response fields not supported by code or docs
- claiming pass or fail without a visible or observable signal
- changing backend request or response contracts
- claiming mocking worked without observable confirmation

## Reporting Standard

Each conclusion should be traceable to one or more concrete observations.

Each step-level expected UI state should be traceable to code, docs, tests, or observed behavior.

The final report should also include a short process summary describing how the result was reached.

If confidence is low, say so and explain what evidence is missing.

# Execution Rules

## Browser Strategy

- Each execution owns its own isolated Chrome and `chrome-devtools-mcp` context.
- Never terminate Chrome, remote-debugging Chrome, or `chrome-devtools-mcp` processes that this run did not start.
- If the user needs to log in manually inside the isolated run-owned browser, pause and resume after the handoff.

## Preflight

- Before execution, verify that the active MCP runtime is configured for isolated launches or another equally safe ownership model.
- If the runtime clearly lacks isolation, timeout tuning, or log capture, report `block` and name the missing preconditions.

## MCP Tool Usage

- Use `take_snapshot` to inspect page structure before acting.
- Use `wait_for` for stable UI evidence instead of fixed sleeps when possible.
- Use `take_screenshot` for final reporting artifacts.
- Use `list_network_requests` and `get_network_request` to confirm API activity.
- Use `list_console_messages` to spot frontend regressions.

## Connection Recovery

- On `transport closed` or similar MCP connection failures, retry once or twice in the current run-owned context.
- If retry fails, recreate the isolated run-owned Chrome and MCP context and continue.
- If recovery still fails, enable log capture, preserve the failure evidence, and report `block`.
- Prefer explicit logging and timeout tuning over silent retries.
- When the client configuration allows it, raise `startup_timeout_ms` and capture `--log-file` output for failed runs.
- Prefer isolated run-owned launches over auto-connecting to external browsers for parallel execution.

## Result Labels

- `pass`: expected UI and supporting evidence match
- `fail`: expected UI or supporting evidence clearly diverge
- `block`: missing auth, missing data, unsupported mocking, or insufficient evidence

## Mocking

- Mocking is the default strategy when it can safely support the selected path.
- Page-level mocking may patch `fetch` and `XMLHttpRequest`.
- Confirm that mocking took effect with observable DOM or network evidence.
- Do not present page-level mocking as complete interception coverage.
- If the flow depends on navigation requests, service workers, or subresource control, treat that as unsupported unless the project already provides a safe mechanism.

Using this alone is no fun and leads nowhere, so I also posted it to the team's AI use-case repository, along with a guide and install/verification scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
SOURCE_DIR="${REPO_ROOT}/skills/diff-aware-web-e2e"
TARGET_DIR="${HOME}/.codex/skills/diff-aware-web-e2e"

if [[ ! -f "${SOURCE_DIR}/SKILL.md" ]]
then
  echo "Error: source skill file not found: ${SOURCE_DIR}/SKILL.md" >&2
  exit 1
fi

mkdir -p "$(dirname "${TARGET_DIR}")"

if [[ -d "${TARGET_DIR}" ]]
then
  TIMESTAMP="$(date +%Y%m%d%H%M%S)"
  BACKUP_DIR="${TARGET_DIR}.bak.${TIMESTAMP}"
  mv "${TARGET_DIR}" "${BACKUP_DIR}"
  echo "Backed up existing skill directory: ${BACKUP_DIR}"
fi

cp -R "${SOURCE_DIR}" "${TARGET_DIR}"

echo "Installed diff-aware-web-e2e skill: ${TARGET_DIR}"
echo "Next: run ./scripts/verify-diff-aware-web-e2e-skill.sh"
```
```shell
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
SOURCE_DIR="${REPO_ROOT}/skills/diff-aware-web-e2e"
TARGET_DIR="${HOME}/.codex/skills/diff-aware-web-e2e"

hash_file() {
  local file_path
  file_path="$1"

  if command -v shasum >/dev/null 2>&1
  then
    shasum -a 256 "${file_path}" | awk '{print $1}'
    return
  fi

  if command -v sha256sum >/dev/null 2>&1
  then
    sha256sum "${file_path}" | awk '{print $1}'
    return
  fi

  echo "Error: neither shasum nor sha256sum is available." >&2
  exit 1
}

hash_stdin() {
  if command -v shasum >/dev/null 2>&1
  then
    shasum -a 256 | awk '{print $1}'
    return
  fi

  if command -v sha256sum >/dev/null 2>&1
  then
    sha256sum | awk '{print $1}'
    return
  fi

  echo "Error: neither shasum nor sha256sum is available." >&2
  exit 1
}

dir_manifest_hash() {
  local dir_path
  dir_path="$1"

  (
    cd "${dir_path}"
    find . -type f | LC_ALL=C sort | while read -r relative_path
    do
      local_hash="$(hash_file "${dir_path}/${relative_path#./}")"
      printf '%s  %s\n' "${local_hash}" "${relative_path}"
    done
  ) | hash_stdin
}

if [[ ! -f "${SOURCE_DIR}/SKILL.md" ]]
then
  echo "Error: source skill file not found: ${SOURCE_DIR}/SKILL.md" >&2
  exit 1
fi

if [[ ! -d "${TARGET_DIR}" ]]
then
  echo "Error: target skill directory not found: ${TARGET_DIR}" >&2
  echo "Run: ./scripts/install-diff-aware-web-e2e-skill.sh"
  exit 1
fi

SOURCE_HASH="$(dir_manifest_hash "${SOURCE_DIR}")"
TARGET_HASH="$(dir_manifest_hash "${TARGET_DIR}")"

echo "Source: ${SOURCE_DIR}"
echo "Target: ${TARGET_DIR}"
echo "Source manifest SHA-256: ${SOURCE_HASH}"
echo "Target manifest SHA-256: ${TARGET_HASH}"

if [[ "${SOURCE_HASH}" == "${TARGET_HASH}" ]]
then
  echo "Status: synchronized"
  echo "Standard usage: \$diff-aware-web-e2e <your request>"
  exit 0
fi

echo "Status: mismatch"
echo "Run: ./scripts/install-diff-aware-web-e2e-skill.sh"
exit 2
```