docs: add Phase 2 plan (import pipeline)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
206
docs/superpowers/plans/2026-04-17-kochwas-phase-2-import.md
Normal file
@@ -0,0 +1,206 @@
# Kochwas — Phase 2: Import Pipeline

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task.

**Goal:** End-to-end recipe import: given a whitelisted URL, the server downloads the HTML, extracts the Recipe JSON-LD, downloads the image, and persists everything to SQLite. A preview endpoint returns the parsed recipe without persisting. Profile CRUD and whitelist CRUD are in place.

**Depends on:** Phase 1 (scaffold, DB, parsers).

**Architecture:** Pure-function parsers from Phase 1 combined with two new server modules (`image-downloader`, `recipe-importer`). Importer orchestrates: fetch → extract → download image → transaction insert. Preview short-circuits before the DB write. All network I/O is injectable for testing.

**Tech Stack:** undici (Node fetch), better-sqlite3, vitest.

---

## File Structure (this phase)

```
src/lib/server/http.ts                      # fetch wrapper (timeout, max-size, UA)
src/lib/server/images/image-downloader.ts   # URL → local file (sha256)
src/lib/server/recipes/repository.ts        # recipe CRUD against DB
src/lib/server/recipes/importer.ts          # orchestrate preview + import
src/lib/server/domains/repository.ts        # allowed_domain CRUD
src/lib/server/domains/whitelist.ts         # isDomainAllowed(url)
src/lib/server/profiles/repository.ts       # profile CRUD

src/routes/api/recipes/preview/+server.ts   # GET ?url=
src/routes/api/recipes/import/+server.ts    # POST { url }
src/routes/api/profiles/+server.ts          # GET, POST
src/routes/api/domains/+server.ts           # GET, POST, DELETE

tests/integration/image-downloader.test.ts
tests/integration/importer.test.ts
tests/integration/profiles-api.test.ts      # optional
```

---

## Task 1: HTTP wrapper

**File:** `src/lib/server/http.ts`

Minimal wrapper around fetch that enforces a timeout, a maximum body size, and a sensible User-Agent. Returns the body as text or as a Buffer.

- [ ] Implement with a unit-style test in `tests/integration/http.test.ts` that asserts timeout behavior using a slow local server (via `http.createServer`).

Implementation sketch:
```ts
export type FetchOptions = {
  maxBytes?: number;   // default 10 * 1024 * 1024
  timeoutMs?: number;  // default 10_000
  userAgent?: string;  // default "Kochwas/0.1"
};

export async function fetchText(url: string, opts: FetchOptions = {}): Promise<string> { ... }
export async function fetchBuffer(url: string, opts: FetchOptions = {}): Promise<{ data: Buffer; contentType: string | null }> { ... }
```

Enforce:
- Only `http:` / `https:` schemes.
- AbortController on `timeoutMs`.
- Stream + count bytes, abort at `maxBytes`.
- Set User-Agent header.
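
The four rules above could be combined roughly as follows. This is a minimal sketch, not the final module: it assumes Node 18+ (global `fetch`, which is backed by undici), and the helper name `assertHttpUrl` and the error messages are illustrative choices that do not appear in the plan. A timeout surfaces as the abort error thrown by `fetch` or by the stream reader.

```typescript
// Sketch under assumptions: defaults mirror the FetchOptions comments in this task;
// assertHttpUrl and all error messages are hypothetical.
export type FetchOptions = {
  maxBytes?: number;
  timeoutMs?: number;
  userAgent?: string;
};

// Reject anything that is not http: or https: before touching the network.
export function assertHttpUrl(url: string): URL {
  const parsed = new URL(url); // throws TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported scheme: ${parsed.protocol}`);
  }
  return parsed;
}

export async function fetchBuffer(
  url: string,
  opts: FetchOptions = {},
): Promise<{ data: Buffer; contentType: string | null }> {
  const { maxBytes = 10 * 1024 * 1024, timeoutMs = 10_000, userAgent = "Kochwas/0.1" } = opts;
  const target = assertHttpUrl(url);

  // One timer covers both the request and the body download.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(target, {
      signal: controller.signal,
      headers: { "user-agent": userAgent },
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    if (!res.body) throw new Error("Response has no body");

    // Stream the body and count bytes so an oversized response is cut off early.
    const chunks: Uint8Array[] = [];
    let total = 0;
    const reader = res.body.getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      total += value.byteLength;
      if (total > maxBytes) {
        controller.abort();
        throw new Error(`Response exceeds ${maxBytes} bytes`);
      }
      chunks.push(value);
    }
    return { data: Buffer.concat(chunks), contentType: res.headers.get("content-type") };
  } finally {
    clearTimeout(timer);
  }
}

export async function fetchText(url: string, opts: FetchOptions = {}): Promise<string> {
  const { data } = await fetchBuffer(url, opts);
  return data.toString("utf8");
}
```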

Commit: `feat(http): add fetchText/fetchBuffer with timeout and size limits`

---

## Task 2: Whitelist check

**Files:** `src/lib/server/domains/repository.ts`, `src/lib/server/domains/whitelist.ts`

Repository:
```ts
export function listDomains(db): AllowedDomain[]
export function addDomain(db, domain, displayName?, profileId?): AllowedDomain
export function removeDomain(db, id): void
```

Whitelist helper:
```ts
export function isDomainAllowed(db: Database, urlString: string): boolean
```

Normalize via `new URL(urlString).hostname.toLowerCase()`, strip a leading `www.`, and compare against `allowed_domain.domain` (normalized the same way).

Integration test: insert `chefkoch.de`; verify that `https://www.chefkoch.de/x` and `https://chefkoch.de/y` are allowed and that `https://fake.de/x` is denied.
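
The normalization rule can be sketched without a database. In the block below, `normalizeDomain` and `isHostAllowed` are hypothetical names, and `isHostAllowed` takes the stored domains as a plain array in place of the `Database` handle that the real `isDomainAllowed` receives. The match is exact after normalization, so under this sketch a subdomain such as `m.chefkoch.de` would be denied unless it is listed separately; whether that is the intended policy is an open assumption.

```typescript
// Lower-case the host and drop a single leading "www.", as described in this task.
export function normalizeDomain(host: string): string {
  return host.trim().toLowerCase().replace(/^www\./, "");
}

// Hypothetical stand-in for isDomainAllowed: the allowed domains are passed in
// directly so the sketch runs without SQLite.
export function isHostAllowed(allowedDomains: string[], urlString: string): boolean {
  let host: string;
  try {
    host = new URL(urlString).hostname;
  } catch {
    return false; // malformed URLs are never allowed
  }
  const allowed = new Set(allowedDomains.map(normalizeDomain));
  return allowed.has(normalizeDomain(host));
}

// The three cases from the integration test in this task.
const allowedDomains = ["chefkoch.de"];
console.log(isHostAllowed(allowedDomains, "https://www.chefkoch.de/x"));
console.log(isHostAllowed(allowedDomains, "https://chefkoch.de/y"));
console.log(isHostAllowed(allowedDomains, "https://fake.de/x"));
```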

Commit: `feat(domains): add allowed-domain repository and whitelist check`

---

## Task 3: Image downloader

**File:** `src/lib/server/images/image-downloader.ts`

Interface:
```ts
export async function downloadImage(url: string, targetDir: string): Promise<string | null>
```

Behavior:
- Uses `fetchBuffer` from Task 1.
- Computes SHA256 of bytes → filename `<sha256><.ext>`.
- Extension from content-type: `image/jpeg` → `.jpg`, `image/png` → `.png`, `image/webp` → `.webp`, else `.bin`.
- Writes to `targetDir` only if the file does not exist yet.
- Returns relative path (basename). Returns null on fetch failure (best-effort).
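
The naming and write-once rules above might look like this. It is a sketch under assumptions: the helper names `extensionFor`, `imageFileName`, and `writeOnce` are illustrative, and stripping content-type parameters (such as `; charset=binary`) before the lookup is an addition the plan does not mention. Opening the file with the `wx` flag makes the write fail with `EEXIST` instead of overwriting, which is one way to get the "only if the file doesn't exist" behavior without a separate existence check.

```typescript
import { createHash } from "node:crypto";
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Content-type → extension table from the Behavior list; anything else becomes ".bin".
const EXTENSIONS = new Map<string, string>([
  ["image/jpeg", ".jpg"],
  ["image/png", ".png"],
  ["image/webp", ".webp"],
]);

export function extensionFor(contentType: string | null): string {
  // Assumption: ignore parameters and case, e.g. "IMAGE/PNG; charset=binary".
  const mime = (contentType ?? "").split(";")[0].trim().toLowerCase();
  return EXTENSIONS.get(mime) ?? ".bin";
}

// Content-addressed name: identical bytes always map to the same file.
export function imageFileName(data: Uint8Array, contentType: string | null): string {
  const digest = createHash("sha256").update(data).digest("hex");
  return `${digest}${extensionFor(contentType)}`;
}

// Write the file unless it already exists; an existing file is left untouched.
export async function writeOnce(targetDir: string, name: string, data: Uint8Array): Promise<void> {
  await mkdir(targetDir, { recursive: true });
  try {
    await writeFile(join(targetDir, name), data, { flag: "wx" });
  } catch (err) {
    if ((err as { code?: string }).code !== "EEXIST") throw err;
  }
}
```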

Test (integration):
- Start a local `http.createServer` serving a tiny PNG.
- Call `downloadImage`, assert the file exists in a tmp dir, assert idempotency (second call returns same filename, doesn't rewrite).

Commit: `feat(images): add sha256-deduplicated image downloader`

---

## Task 4: Recipe repository

**File:** `src/lib/server/recipes/repository.ts`

Functions:
```ts
export function insertRecipe(db, recipe: Recipe): number // returns id
export function getRecipeById(db, id: number): Recipe | null
export function getRecipeBySourceUrl(db, url: string): Recipe | null
export function deleteRecipe(db, id: number): void
```

`insertRecipe` is a transaction:
1. Insert row in `recipe`.
2. Insert ingredients (position preserved).
3. Insert steps.
4. For each tag: insert-or-find in `tag`, insert into `recipe_tag`.
5. After insert, trigger the recipe_fts update by issuing `UPDATE recipe SET title = title WHERE id = ?` so the update-trigger fires and refreshes `ingredients_concat` / `tags_concat`. (simpler than duplicating the FTS write)
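
The five steps above could be ordered as in the following sketch. Much of it is assumed: the `Db` type is a hand-written interface modeled on the `prepare` / `run` / `get` / `transaction` calls that better-sqlite3 exposes, `RecipeInput` is a placeholder for the Phase 1 `Recipe` type, and every table column (`source_url`, `position`, `raw_text`, `text`, `name`) as well as a UNIQUE constraint on `tag.name` is a guess at the Phase 1 schema. Only the step order and the `UPDATE recipe SET title = title` statement are taken from the plan.

```typescript
// Hand-written interface modeled on better-sqlite3; not the library's own types.
type Statement = {
  run(...params: unknown[]): { lastInsertRowid: number | bigint };
  get(...params: unknown[]): unknown;
};
type Db = {
  prepare(sql: string): Statement;
  transaction<A extends unknown[], R>(fn: (...args: A) => R): (...args: A) => R;
};

// Placeholder shape; the real Recipe type comes from Phase 1.
type RecipeInput = {
  title: string;
  sourceUrl: string;
  ingredients: string[];
  steps: string[];
  tags: string[];
};

export function insertRecipe(db: Db, recipe: RecipeInput): number {
  const run = db.transaction((r: RecipeInput): number => {
    // 1. Recipe row (column names are assumptions).
    const info = db.prepare("INSERT INTO recipe (title, source_url) VALUES (?, ?)").run(r.title, r.sourceUrl);
    const id = Number(info.lastInsertRowid);

    // 2. Ingredients, with an explicit position to preserve order.
    const insIngredient = db.prepare("INSERT INTO ingredient (recipe_id, position, raw_text) VALUES (?, ?, ?)");
    r.ingredients.forEach((text, i) => insIngredient.run(id, i + 1, text));

    // 3. Steps.
    const insStep = db.prepare("INSERT INTO step (recipe_id, position, text) VALUES (?, ?, ?)");
    r.steps.forEach((text, i) => insStep.run(id, i + 1, text));

    // 4. Tags: insert-or-find (relies on an assumed UNIQUE tag.name), then link.
    const insTag = db.prepare("INSERT OR IGNORE INTO tag (name) VALUES (?)");
    const findTag = db.prepare("SELECT id FROM tag WHERE name = ?");
    const link = db.prepare("INSERT INTO recipe_tag (recipe_id, tag_id) VALUES (?, ?)");
    for (const name of r.tags) {
      insTag.run(name);
      const row = findTag.get(name) as { id: number };
      link.run(id, row.id);
    }

    // 5. No-op update so the update trigger refreshes the FTS row.
    db.prepare("UPDATE recipe SET title = title WHERE id = ?").run(id);
    return id;
  });
  return run(recipe);
}
```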

Integration test: full round-trip: extract a recipe from a fixture with `extractRecipeFromHtml`, insert it, read it back, and assert that ingredients, steps, and tags are preserved. Also assert that FTS finds it by an ingredient name.

Commit: `feat(recipes): add recipe repository (insert/get/delete with FTS refresh)`

---

## Task 5: Recipe importer

**File:** `src/lib/server/recipes/importer.ts`

```ts
export async function previewRecipe(url: string): Promise<Recipe>
// fetch html, extract recipe, return; throws on failure
export async function importRecipe(db, imageDir: string, url: string): Promise<{ id: number }>
// preview + image download + repository insert (or return existing on dup)
```

- Before fetching, check `isDomainAllowed`; if the domain is not allowed, throw `{ code: 'DOMAIN_BLOCKED' }`.
- `importRecipe` checks `getRecipeBySourceUrl` first; if hit, returns existing id (idempotent).
- `image_path` is the filename returned by the downloader (NULL if download failed).
- `source_domain` filled from URL hostname.
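
One way to keep this orchestration testable is to pass every collaborator in as a dependency, in line with the "all network I/O is injectable" goal. The sketch below is a variant, not the signatures given above: `ImportDeps`, `ParsedRecipe`, `StoredRecipe`, `importError`, and the `NO_RECIPE_FOUND` code are all hypothetical, and the `duplicate` flag in the return value is borrowed from the import endpoint's response shape. Only `DOMAIN_BLOCKED` and the order of the steps come from the plan.

```typescript
// Placeholder shapes; the real Recipe type comes from Phase 1.
type ParsedRecipe = { title: string; imageUrl: string | null };
type StoredRecipe = ParsedRecipe & { sourceUrl: string; sourceDomain: string; imagePath: string | null };

// Hypothetical dependency bundle so tests can swap in fakes.
export type ImportDeps = {
  isDomainAllowed(url: string): boolean;
  findIdBySourceUrl(url: string): number | null;
  fetchText(url: string): Promise<string>;
  extractRecipe(html: string): ParsedRecipe | null;
  downloadImage(url: string): Promise<string | null>;
  insertRecipe(recipe: StoredRecipe): number;
};

function importError(code: "DOMAIN_BLOCKED" | "NO_RECIPE_FOUND"): Error & { code: string } {
  return Object.assign(new Error(code), { code });
}

// Fetch and parse only; nothing is written.
export async function previewRecipe(deps: ImportDeps, url: string): Promise<ParsedRecipe> {
  if (!deps.isDomainAllowed(url)) throw importError("DOMAIN_BLOCKED");
  const recipe = deps.extractRecipe(await deps.fetchText(url));
  if (!recipe) throw importError("NO_RECIPE_FOUND");
  return recipe;
}

export async function importRecipe(
  deps: ImportDeps,
  url: string,
): Promise<{ id: number; duplicate: boolean }> {
  if (!deps.isDomainAllowed(url)) throw importError("DOMAIN_BLOCKED");

  // Idempotency: a known source URL returns the existing id without refetching.
  const existing = deps.findIdBySourceUrl(url);
  if (existing !== null) return { id: existing, duplicate: true };

  const recipe = await previewRecipe(deps, url);
  // Best-effort image: a failed download leaves imagePath as null.
  const imagePath = recipe.imageUrl ? await deps.downloadImage(recipe.imageUrl) : null;
  const id = deps.insertRecipe({
    ...recipe,
    sourceUrl: url,
    sourceDomain: new URL(url).hostname,
    imagePath,
  });
  return { id, duplicate: false };
}
```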

Test:
- Local test HTTP server serves a known HTML fixture (one from Phase 1 is fine).
- Assert `(await previewRecipe(url)).title` contains the expected string.
- Assert `importRecipe` inserts a row and returns an id, and that a second call returns the same id.

Commit: `feat(recipes): add recipe importer (preview + persist)`

---

## Task 6: Profile repository

**File:** `src/lib/server/profiles/repository.ts`

```ts
export function listProfiles(db): Profile[]
export function createProfile(db, name: string, avatarEmoji?: string | null): Profile
export function renameProfile(db, id: number, newName: string): void
export function deleteProfile(db, id: number): void
```

Integration test: round-trip + UNIQUE constraint on name.

Commit: `feat(profiles): add profile repository`

---

## Task 7: API endpoints

Routes (each its own `+server.ts`):

- `GET /api/recipes/preview?url=…`: 200 with Recipe JSON; 400 on bad URL; 403 if domain blocked; 422 if no Recipe JSON-LD found.
- `POST /api/recipes/import` with body `{ url }`: 200 with `{ id, duplicate: boolean }`; 403/422 as above.
- `GET /api/profiles` / `POST /api/profiles { name, avatar_emoji? }` / `DELETE /api/profiles/[id]`.
- `GET /api/domains` / `POST /api/domains { domain, display_name? }` / `DELETE /api/domains/[id]`.

Each endpoint narrows its input with Zod and maps thrown error codes to HTTP status codes.
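
The error-code mapping can live in one small helper shared by all routes. In this sketch only `DOMAIN_BLOCKED` is a code named in the plan; `INVALID_URL` and `NO_RECIPE_FOUND` are placeholder names for the 400 and 422 cases, and falling back to 500 for anything unrecognized is an assumption. Zod validation is left out so the block has no dependencies. A `Map` is used in place of a plain object so that a code such as `constructor` cannot hit an inherited property.

```typescript
// Status codes from the route list; two of the three code names are placeholders.
const STATUS_BY_CODE = new Map<string, number>([
  ["INVALID_URL", 400], // hypothetical name for "bad URL"
  ["DOMAIN_BLOCKED", 403], // named in Task 5
  ["NO_RECIPE_FOUND", 422], // hypothetical name for "no Recipe JSON-LD found"
]);

// Map any thrown value to an HTTP status; unknown errors become 500 (assumption).
export function statusForError(err: unknown): number {
  if (typeof err !== "object" || err === null || !("code" in err)) return 500;
  const code = String((err as { code: unknown }).code);
  return STATUS_BY_CODE.get(code) ?? 500;
}
```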

Integration test: hit each endpoint via SvelteKit's `event.fetch` or direct handler invocation; assert status and shape.

Commit: `feat(api): expose preview/import/profile/domain endpoints`

---

## Phase 2 Done-When

- `npm test` green across unit + integration.
- Manual smoke: dev server up, `curl -s "http://localhost:5173/api/recipes/preview?url=<chefkoch-url>"` returns a recipe JSON.
- Manual smoke: `curl -X POST -H "content-type: application/json" -d '{"url":"..."}' http://localhost:5173/api/recipes/import` returns `{ id: N, duplicate: false }`, and a second POST returns `duplicate: true`.
- Image is saved in `./data/images/`.
- `data/kochwas.db` has one recipe row after the import with populated ingredients and steps.