kochwas/docs/superpowers/plans/2026-04-17-kochwas-phase-2-import.md
Hendrik 9ddceb563b docs: add Phase 2 plan (import pipeline)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:08:22 +02:00

Kochwas — Phase 2: Import Pipeline

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task.

Goal: End-to-end recipe import: given a whitelisted URL, the server downloads the HTML, extracts the Recipe JSON-LD, downloads the image, and persists everything to SQLite. A preview endpoint returns the parsed recipe without persisting. Profile CRUD and whitelist CRUD are in place.

Depends on: Phase 1 (scaffold, DB, parsers).

Architecture: Pure-function parsers from Phase 1 combined with two new server modules (image-downloader, recipe-importer). Importer orchestrates: fetch → extract → download image → transaction insert. Preview short-circuits before the DB write. All network I/O is injectable for testing.

Tech Stack: undici (Node fetch), better-sqlite3, vitest.


File Structure (this phase)

src/lib/server/http.ts                         # fetch wrapper (timeout, max-size, UA)
src/lib/server/images/image-downloader.ts      # URL → local file (sha256)
src/lib/server/recipes/repository.ts           # recipe CRUD against DB
src/lib/server/recipes/importer.ts             # orchestrate preview + import
src/lib/server/domains/repository.ts           # allowed_domain CRUD
src/lib/server/domains/whitelist.ts            # isDomainAllowed(url)
src/lib/server/profiles/repository.ts          # profile CRUD

src/routes/api/recipes/preview/+server.ts      # GET ?url=
src/routes/api/recipes/import/+server.ts       # POST { url }
src/routes/api/profiles/+server.ts             # GET, POST
src/routes/api/domains/+server.ts              # GET, POST, DELETE

tests/integration/image-downloader.test.ts
tests/integration/importer.test.ts
tests/integration/profiles-api.test.ts         # optional

Task 1: HTTP wrapper

File: src/lib/server/http.ts

Minimal wrapper around fetch that enforces a timeout, a maximum body size, and a sensible User-Agent. Returns the body as text or as a Buffer.

  • Implement together with a unit-style test in tests/integration/http.test.ts that asserts the timeout behavior against a slow local server (via http.createServer).

Implementation sketch:

export type FetchOptions = {
  maxBytes?: number;   // default 10 * 1024 * 1024
  timeoutMs?: number;  // default 10_000
  userAgent?: string;  // default "Kochwas/0.1"
};

export async function fetchText(url: string, opts: FetchOptions = {}): Promise<string> { ... }
export async function fetchBuffer(url: string, opts: FetchOptions = {}): Promise<{ data: Buffer; contentType: string | null }> { ... }

Enforce:

  • Only http: / https: schemes.
  • AbortController on timeoutMs.
  • Stream + count bytes, abort at maxBytes.
  • Set User-Agent header.
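
The four rules above could be sketched roughly as follows. This is a minimal sketch, assuming Node 18+ with a global fetch; assertHttpUrl is a hypothetical helper name, and treating non-2xx responses as errors is an assumption the plan does not state.

```typescript
export type FetchOptions = {
  maxBytes?: number; // default 10 * 1024 * 1024
  timeoutMs?: number; // default 10_000
  userAgent?: string; // default "Kochwas/0.1"
};

// Hypothetical helper: rejects anything other than http: / https:.
export function assertHttpUrl(url: string): URL {
  const parsed = new URL(url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported scheme: ${parsed.protocol}`);
  }
  return parsed;
}

export async function fetchBuffer(
  url: string,
  opts: FetchOptions = {}
): Promise<{ data: Buffer; contentType: string | null }> {
  const maxBytes = opts.maxBytes ?? 10 * 1024 * 1024;
  const timeoutMs = opts.timeoutMs ?? 10_000;
  const userAgent = opts.userAgent ?? "Kochwas/0.1";
  const target = assertHttpUrl(url);

  // Abort the whole request (headers and body) once timeoutMs has elapsed.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(target, {
      signal: controller.signal,
      headers: { "user-agent": userAgent },
    });
    // Assumption: non-2xx responses count as failures.
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    if (!res.body) throw new Error("Empty response body");

    // Stream the body and count bytes so oversized responses are cut off early.
    const reader = res.body.getReader();
    const chunks: Uint8Array[] = [];
    let total = 0;
    while (true) {
      const result = await reader.read();
      if (result.done) break;
      total += result.value.byteLength;
      if (total > maxBytes) {
        controller.abort();
        throw new Error(`Response exceeds ${maxBytes} bytes`);
      }
      chunks.push(result.value);
    }
    return {
      data: Buffer.concat(chunks),
      contentType: res.headers.get("content-type"),
    };
  } finally {
    clearTimeout(timer);
  }
}

export async function fetchText(url: string, opts: FetchOptions = {}): Promise<string> {
  const { data } = await fetchBuffer(url, opts);
  return data.toString("utf8");
}
```

Building fetchText on top of fetchBuffer keeps the limit-enforcing logic in one place; decoding as UTF-8 regardless of the declared charset is a simplification.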

Commit: feat(http): add fetchText/fetchBuffer with timeout and size limits


Task 2: Whitelist check

Files: src/lib/server/domains/repository.ts, src/lib/server/domains/whitelist.ts

Repository:

export function listDomains(db): AllowedDomain[]
export function addDomain(db, domain, displayName?, profileId?): AllowedDomain
export function removeDomain(db, id): void

Whitelist helper:

export function isDomainAllowed(db: Database, urlString: string): boolean

Normalize with new URL(urlString).hostname.toLowerCase(), strip a leading www., and compare the result against allowed_domain.domain (normalized the same way).

Integration test: insert chefkoch.de, verify allowed for https://www.chefkoch.de/x, https://chefkoch.de/y, denied for https://fake.de/x.
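
The normalization rule can be illustrated without a database. The sketch below is an assumption-laden stand-in: normalizeDomain and isUrlInAllowList are hypothetical names, the allowed domains are passed in as a plain array instead of being read from allowed_domain, and treating an unparseable URL as "not allowed" is a choice the plan does not spell out.

```typescript
// Hypothetical helper: lower-case the host and drop one leading "www.".
export function normalizeDomain(host: string): string {
  return host.trim().toLowerCase().replace(/^www\./, "");
}

// Stand-in for isDomainAllowed(db, urlString): the caller supplies the
// domains that would normally come from the allowed_domain table.
export function isUrlInAllowList(
  allowedDomains: readonly string[],
  urlString: string
): boolean {
  let hostname: string;
  try {
    hostname = new URL(urlString).hostname;
  } catch {
    return false; // assumption: malformed URLs are simply denied
  }
  const allowed = new Set(allowedDomains.map(normalizeDomain));
  return allowed.has(normalizeDomain(hostname));
}
```

Note that this exact-match comparison does not admit other subdomains (for example a hypothetical m.chefkoch.de); whether that is desired is left open by the plan.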

Commit: feat(domains): add allowed-domain repository and whitelist check


Task 3: Image downloader

File: src/lib/server/images/image-downloader.ts

Interface:

export async function downloadImage(url: string, targetDir: string): Promise<string | null>

Behavior:

  • Uses fetchBuffer from Task 1.
  • Computes SHA256 of bytes → filename <sha256><.ext>.
  • Extension from content-type: image/jpeg → .jpg, image/png → .png, image/webp → .webp, else .bin.
  • Writes to targetDir only if file doesn't yet exist.
  • Returns relative path (basename). Returns null on fetch failure (best-effort).
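
One possible shape for the behavior listed above, as a sketch only. storeImage and extensionFor are hypothetical helper names, and the third fetchBuffer parameter on downloadImage is an assumption made here to keep network I/O injectable; the plan's interface takes only url and targetDir.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

const EXTENSIONS = new Map<string, string>([
  ["image/jpeg", ".jpg"],
  ["image/png", ".png"],
  ["image/webp", ".webp"],
]);

// Maps a content-type header (parameters such as "; charset=..." ignored) to an extension.
export function extensionFor(contentType: string | null): string {
  const mime = (contentType ?? "").split(";")[0].trim().toLowerCase();
  return EXTENSIONS.get(mime) ?? ".bin";
}

// Hypothetical helper: writes the bytes under <sha256><.ext> unless that file already exists.
export function storeImage(data: Buffer, contentType: string | null, targetDir: string): string {
  const hash = createHash("sha256").update(data).digest("hex");
  const filename = `${hash}${extensionFor(contentType)}`;
  const fullPath = path.join(targetDir, filename);
  if (!fs.existsSync(fullPath)) {
    fs.mkdirSync(targetDir, { recursive: true });
    fs.writeFileSync(fullPath, data);
  }
  return filename; // basename only, relative to targetDir
}

type BufferFetcher = (url: string) => Promise<{ data: Buffer; contentType: string | null }>;

// Best-effort download: any failure yields null instead of throwing.
export async function downloadImage(
  url: string,
  targetDir: string,
  fetchBuffer: BufferFetcher // assumption: injected rather than imported
): Promise<string | null> {
  try {
    const { data, contentType } = await fetchBuffer(url);
    return storeImage(data, contentType, targetDir);
  } catch {
    return null;
  }
}
```

Because the filename is derived from the content hash, two recipes that share the same image end up pointing at a single file.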

Test (integration):

  • Start a local http.createServer serving a tiny PNG.
  • Call downloadImage, assert the file exists in a tmp dir, assert idempotency (second call returns same filename, doesn't rewrite).

Commit: feat(images): add sha256-deduplicated image downloader


Task 4: Recipe repository

File: src/lib/server/recipes/repository.ts

Functions:

export function insertRecipe(db, recipe: Recipe): number  // returns id
export function getRecipeById(db, id: number): Recipe | null
export function getRecipeBySourceUrl(db, url: string): Recipe | null
export function deleteRecipe(db, id: number): void

insertRecipe is a transaction:

  1. Insert row in recipe.
  2. Insert ingredients (position preserved).
  3. Insert steps.
  4. For each tag: insert-or-find in tag, insert into recipe_tag.
  5. After the inserts, refresh recipe_fts by issuing UPDATE recipe SET title = title WHERE id = ? so that the update trigger fires and refreshes ingredients_concat / tags_concat. This is simpler than duplicating the FTS write.
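
The five steps above might look roughly like this. Everything schema-related is an assumption: the ingredient and step table names, all column names, the UNIQUE constraint on tag.name that INSERT OR IGNORE relies on, and the trimmed-down Recipe shape (the real one comes from Phase 1). The Db type is a structural stand-in for the small subset of better-sqlite3 calls used (prepare, run, get, transaction), so passing a real Database instance may need a cast.

```typescript
// Structural stand-in for the better-sqlite3 calls used below (assumption).
type Stmt = {
  run(...params: unknown[]): { lastInsertRowid: number | bigint };
  get(...params: unknown[]): unknown;
};
type Db = {
  prepare(sql: string): Stmt;
  transaction<T>(fn: () => T): () => T;
};

// Hypothetical, simplified Recipe shape.
type Recipe = {
  title: string;
  sourceUrl: string;
  ingredients: string[];
  steps: string[];
  tags: string[];
};

export function insertRecipe(db: Db, recipe: Recipe): number {
  const runInTransaction = db.transaction(() => {
    // 1. Recipe row.
    const id = Number(
      db
        .prepare("INSERT INTO recipe (title, source_url) VALUES (?, ?)")
        .run(recipe.title, recipe.sourceUrl).lastInsertRowid
    );

    // 2. Ingredients, keeping their original order via a 1-based position.
    const insertIngredient = db.prepare(
      "INSERT INTO ingredient (recipe_id, position, raw_text) VALUES (?, ?, ?)"
    );
    recipe.ingredients.forEach((text, i) => insertIngredient.run(id, i + 1, text));

    // 3. Steps.
    const insertStep = db.prepare(
      "INSERT INTO step (recipe_id, position, text) VALUES (?, ?, ?)"
    );
    recipe.steps.forEach((text, i) => insertStep.run(id, i + 1, text));

    // 4. Tags: insert-or-find, then link.
    const insertTag = db.prepare("INSERT OR IGNORE INTO tag (name) VALUES (?)");
    const findTag = db.prepare("SELECT id FROM tag WHERE name = ?");
    const linkTag = db.prepare(
      "INSERT OR IGNORE INTO recipe_tag (recipe_id, tag_id) VALUES (?, ?)"
    );
    for (const name of recipe.tags) {
      insertTag.run(name);
      const row = findTag.get(name) as { id: number };
      linkTag.run(id, row.id);
    }

    // 5. No-op update so the update trigger refreshes recipe_fts.
    db.prepare("UPDATE recipe SET title = title WHERE id = ?").run(id);
    return id;
  });
  return runInTransaction();
}
```

Running the no-op UPDATE inside the same transaction means the FTS row is never visible in a stale state.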

Integration test: full round-trip: parse a recipe from a fixture with extractRecipeFromHtml, insert it, read it back, and assert that ingredients, steps, and tags are preserved. Also assert that FTS finds it by an ingredient name.

Commit: feat(recipes): add recipe repository (insert/get/delete with FTS refresh)


Task 5: Recipe importer

File: src/lib/server/recipes/importer.ts

export async function previewRecipe(url: string): Promise<Recipe>
// fetch html, extract recipe, return — throws on failure
export async function importRecipe(db, imageDir: string, url: string): Promise<{ id: number }>
// preview + image download + repository insert (or return existing on dup)

  • Before fetching: call isDomainAllowed; if the domain is not allowed, throw { code: 'DOMAIN_BLOCKED' }.
  • importRecipe checks getRecipeBySourceUrl first; if hit, returns existing id (idempotent).
  • image_path is the filename returned by the downloader (NULL if download failed).
  • source_domain filled from URL hostname.
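
The orchestration described above can be sketched with every collaborator injected, which keeps it testable with fakes. The ImportDeps object, its member names, the NO_RECIPE_FOUND code, the trimmed-down recipe shapes, and the duplicate flag on the return value are all assumptions for illustration; the plan's signatures take db and imageDir directly and return { id }.

```typescript
// Hypothetical, trimmed-down shapes; the real Recipe type comes from Phase 1.
type ParsedRecipe = { title: string; imageUrl: string | null };
type StoredRecipe = ParsedRecipe & {
  sourceUrl: string;
  sourceDomain: string;
  imagePath: string | null;
};

// Every collaborator is injected so tests can swap in fakes (names are assumptions).
export type ImportDeps = {
  isDomainAllowed(url: string): boolean;
  findIdBySourceUrl(url: string): number | null;
  fetchText(url: string): Promise<string>;
  extractRecipe(html: string): ParsedRecipe | null;
  downloadImage(url: string): Promise<string | null>;
  insertRecipe(recipe: StoredRecipe): number;
};

export class ImportError extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

export async function previewRecipe(deps: ImportDeps, url: string): Promise<ParsedRecipe> {
  if (!deps.isDomainAllowed(url)) {
    throw new ImportError("DOMAIN_BLOCKED", `Domain not whitelisted: ${url}`);
  }
  const html = await deps.fetchText(url);
  const recipe = deps.extractRecipe(html);
  // NO_RECIPE_FOUND is a hypothetical code for the "no Recipe JSON-LD" case.
  if (!recipe) throw new ImportError("NO_RECIPE_FOUND", `No Recipe JSON-LD at ${url}`);
  return recipe;
}

export async function importRecipe(
  deps: ImportDeps,
  url: string
): Promise<{ id: number; duplicate: boolean }> {
  if (!deps.isDomainAllowed(url)) {
    throw new ImportError("DOMAIN_BLOCKED", `Domain not whitelisted: ${url}`);
  }
  // Idempotency: a known source URL short-circuits before any network I/O.
  const existing = deps.findIdBySourceUrl(url);
  if (existing !== null) return { id: existing, duplicate: true };

  const recipe = await previewRecipe(deps, url);
  // Image download is best-effort; a failed download leaves imagePath null.
  const imagePath = recipe.imageUrl ? await deps.downloadImage(recipe.imageUrl) : null;
  const id = deps.insertRecipe({
    ...recipe,
    sourceUrl: url,
    sourceDomain: new URL(url).hostname,
    imagePath,
  });
  return { id, duplicate: false };
}
```

Returning a duplicate flag alongside the id is one way to let the import endpoint report duplicate: true without a second lookup.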

Test:

  • Local test HTTP server serves a known HTML fixture (one from Phase 1 is fine).
  • Assert previewRecipe(url).title contains expected string.
  • Assert importRecipe inserts a row, returns id, second call returns same id.

Commit: feat(recipes): add recipe importer (preview + persist)


Task 6: Profile repository

File: src/lib/server/profiles/repository.ts

export function listProfiles(db): Profile[]
export function createProfile(db, name: string, avatarEmoji?: string | null): Profile
export function renameProfile(db, id: number, newName: string): void
export function deleteProfile(db, id: number): void

Integration test: round-trip + UNIQUE constraint on name.

Commit: feat(profiles): add profile repository


Task 7: API endpoints

Routes (each its own +server.ts):

  • GET /api/recipes/preview?url=… — 200 with Recipe JSON; 400 on bad URL; 403 if domain blocked; 422 if no Recipe JSON-LD found.
  • POST /api/recipes/import — body { url } — 200 { id, duplicate: boolean }; 403/422 as above.
  • GET /api/profiles / POST /api/profiles { name, avatar_emoji? } / DELETE /api/profiles/[id].
  • GET /api/domains / POST /api/domains { domain, display_name? } / DELETE /api/domains/[id].

Each endpoint: narrow input with Zod; map thrown error codes to HTTP codes.
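
As an illustration of the error-code mapping, here is a small dependency-free sketch. Only DOMAIN_BLOCKED appears in the plan; INVALID_URL and NO_RECIPE_FOUND are hypothetical codes, as are the helper names. The hand-written parseUrlInput stands in for the Zod schema the plan calls for, purely so the example runs without third-party packages, and it relies on the standard Response class available globally in Node 18+.

```typescript
// DOMAIN_BLOCKED comes from the plan; the other two codes are hypothetical.
const STATUS_BY_CODE = new Map<string, number>([
  ["INVALID_URL", 400],
  ["DOMAIN_BLOCKED", 403],
  ["NO_RECIPE_FOUND", 422],
]);

// Maps any thrown value carrying a `code` property to an HTTP status; unknown errors become 500.
export function statusForError(err: unknown): number {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = String((err as { code: unknown }).code);
    return STATUS_BY_CODE.get(code) ?? 500;
  }
  return 500;
}

// Stand-in for Zod-based input narrowing: accepts only absolute http(s) URLs.
export function parseUrlInput(raw: unknown): string {
  const invalid = () => Object.assign(new Error("Invalid URL"), { code: "INVALID_URL" });
  if (typeof raw !== "string") throw invalid();
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    throw invalid();
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") throw invalid();
  return parsed.toString();
}

// Builds a JSON error response with the mapped status.
export function toErrorResponse(err: unknown): Response {
  const message = err instanceof Error ? err.message : "Unexpected error";
  return new Response(JSON.stringify({ error: message }), {
    status: statusForError(err),
    headers: { "content-type": "application/json" },
  });
}
```

In a +server.ts handler, the body would typically be wrapped in try/catch with toErrorResponse(err) returned from the catch branch.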

Integration test: hit each endpoint via SvelteKit's event.fetch or direct handler invocation; assert status and shape.

Commit: feat(api): expose preview/import/profile/domain endpoints


Phase 2 Done-When

  • npm test green across unit + integration.
  • Manual smoke: dev server up, curl -s "http://localhost:5173/api/recipes/preview?url=<chefkoch-url>" returns a recipe JSON.
  • Manual smoke: curl -X POST -H "content-type: application/json" -d '{"url":"..."}' http://localhost:5173/api/recipes/import returns { id: N, duplicate: false }, and a second POST returns duplicate: true.
  • Image is saved in ./data/images/.
  • data/kochwas.db has one recipe row after the import with populated ingredients and steps.