feat(importer): Microdata fallback for pages without JSON-LD
All checks were successful
Build & Publish Docker Image / build-and-push (push) Successful in 1m17s
Until now, importing from sites such as rezeptwelt.de failed with „Diese Seite enthält kein Rezept" ("This page contains no recipe"), even though our search filter let the hits through (Microdata has been detected since the previous commit). The importer can now actually extract the data:

- extractRecipeFromMicrodata(html): parses [itemtype=schema.org/Recipe] scopes with linkedom and collects itemprop values while respecting nested itemscope boundaries (HowToStep texts do not end up in the main scope).
- Reads the content attribute on <meta>/<time> (e.g. prepTime="PT20M"), src on <img>, and textContent as a fallback, i.e. the standard Microdata value rules.
- Treats both HowToStep items and plain <li>/<ol> lists as recipeInstructions.
- extractRecipeFromHtml tries JSON-LD first and falls back to Microdata only when that returns null, so existing behavior stays stable.

Tests: Königsberger Klopse fixture with HowToSteps, a plain ol/li variant, and a check that JSON-LD takes priority over Microdata.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
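The scope-boundary and value rules described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the importer's code: it works on a hypothetical plain-object tree (`NodeLike`) instead of linkedom nodes, and the names `textOf`, `valueOf`, `collectProps` and `extractWithFallback` are assumptions made for this example. A nested item is reduced to its text here, which is a simplification.

```typescript
// Hypothetical stand-in for a DOM element; the real importer uses linkedom.
interface NodeLike {
  tag: string;
  attrs?: Record<string, string>;
  text?: string; // own text of a leaf node
  children?: NodeLike[];
}

// Concatenated, whitespace-normalized text of a node and its descendants.
function textOf(node: NodeLike): string {
  const nested = (node.children ?? []).map((child) => textOf(child)).join(' ');
  return `${node.text ?? ''} ${nested}`.replace(/\s+/g, ' ').trim();
}

// Value rules as listed in the commit message:
// content on <meta>/<time>, src on <img>, text content otherwise.
function valueOf(node: NodeLike): string {
  const attrs: Record<string, string> = node.attrs ?? {};
  const content = attrs['content'];
  const src = attrs['src'];
  if ((node.tag === 'meta' || node.tag === 'time') && content !== undefined) {
    return content.trim();
  }
  if (node.tag === 'img' && src !== undefined) {
    return src.trim();
  }
  return textOf(node);
}

// Collect itemprop values that belong to `scope` itself.
// A descendant carrying itemscope starts a new item, so the walk records
// that descendant's own itemprop (if any) but does not descend into it.
function collectProps(scope: NodeLike): Map<string, string[]> {
  const props = new Map<string, string[]>();
  const visit = (node: NodeLike): void => {
    for (const child of node.children ?? []) {
      const attrs: Record<string, string> = child.attrs ?? {};
      const name = attrs['itemprop'];
      if (name !== undefined) {
        const values = props.get(name) ?? [];
        values.push(valueOf(child));
        props.set(name, values);
      }
      if (attrs['itemscope'] === undefined) {
        visit(child);
      }
    }
  };
  visit(scope);
  return props;
}

// Fallback order: JSON-LD first, Microdata only when JSON-LD yields null.
function extractWithFallback<T>(
  html: string,
  fromJsonLd: (html: string) => T | null,
  fromMicrodata: (html: string) => T | null,
): T | null {
  return fromJsonLd(html) ?? fromMicrodata(html);
}
```

Because the walk stops at nested itemscope elements, the `text` property of a HowToStep never appears among the Recipe's own properties, while the step itself is still recorded under `recipeInstructions`.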
@@ -46,6 +46,83 @@ describe('extractRecipeFromHtml', () => {
   });
 });

+describe('extractRecipeFromHtml — Microdata fallback', () => {
+  it('extracts title, ingredients and HowToStep instructions', () => {
+    const html = `<!doctype html><html><body>
+      <article itemscope itemtype="https://schema.org/Recipe">
+        <h1 itemprop="name">Königsberger Klopse</h1>
+        <img itemprop="image" src="/img/klopse.jpg" />
+        <p itemprop="description">Klassische Königsberger Klopse.</p>
+        <meta itemprop="prepTime" content="PT20M" />
+        <meta itemprop="cookTime" content="PT25M" />
+        <span itemprop="recipeYield">4</span>
+        <span itemprop="recipeCuisine">Ostpreußisch</span>
+        <ul>
+          <li itemprop="recipeIngredient">500 g Hackfleisch gemischt</li>
+          <li itemprop="recipeIngredient">1 Zwiebel, fein gewürfelt</li>
+          <li itemprop="recipeIngredient">2 EL Kapern</li>
+        </ul>
+        <ol>
+          <li itemprop="recipeInstructions" itemscope itemtype="https://schema.org/HowToStep">
+            <span itemprop="text">Hackfleisch und Zwiebel vermengen.</span>
+          </li>
+          <li itemprop="recipeInstructions" itemscope itemtype="https://schema.org/HowToStep">
+            <span itemprop="text">Klopse formen und in Salzwasser garen.</span>
+          </li>
+        </ol>
+      </article>
+    </body></html>`;
+    const r = extractRecipeFromHtml(html);
+    expect(r).not.toBeNull();
+    expect(r!.title).toBe('Königsberger Klopse');
+    expect(r!.ingredients.length).toBe(3);
+    expect(r!.ingredients[0].raw_text).toContain('Hackfleisch');
+    expect(r!.steps.length).toBe(2);
+    expect(r!.steps[1].text).toContain('Klopse formen');
+    expect(r!.prep_time_min).toBe(20);
+    expect(r!.cook_time_min).toBe(25);
+    expect(r!.servings_default).toBe(4);
+    expect(r!.cuisine).toBe('Ostpreußisch');
+    expect(r!.image_path).toBe('/img/klopse.jpg');
+  });
+
+  it('handles plain-text recipeInstructions without HowToStep', () => {
+    const html = `<html><body>
+      <div itemscope itemtype="http://schema.org/Recipe">
+        <span itemprop="name">Test</span>
+        <span itemprop="recipeIngredient">1 Apfel</span>
+        <div itemprop="recipeInstructions">
+          <ol>
+            <li>Schälen.</li>
+            <li>Essen.</li>
+          </ol>
+        </div>
+      </div>
+    </body></html>`;
+    const r = extractRecipeFromHtml(html);
+    expect(r).not.toBeNull();
+    expect(r!.steps.length).toBe(2);
+    expect(r!.steps[0].text).toBe('Schälen.');
+  });
+
+  it('prefers JSON-LD when both are present', () => {
+    const html = `<html><head>
+      <script type="application/ld+json">${JSON.stringify({
+        '@type': 'Recipe',
+        name: 'From JSON-LD',
+        recipeIngredient: ['x'],
+        recipeInstructions: ['y']
+      })}</script>
+    </head><body>
+      <div itemscope itemtype="https://schema.org/Recipe">
+        <span itemprop="name">From Microdata</span>
+      </div>
+    </body></html>`;
+    const r = extractRecipeFromHtml(html);
+    expect(r?.title).toBe('From JSON-LD');
+  });
+});
+
 describe('hasRecipeMarkup', () => {
   it('detects JSON-LD Recipe', () => {
     const html = `<html><head>