fix(importer): Microdata steps for HowToSection + multi-step entries
All checks were successful
Build & Publish Docker Image / build-and-push (push) Successful in 1m19s
Rezeptwelt always returned the preparation steps as a single match, often mixed with icon alt text. There are two causes, both in the generic Microdata logic, so no rezeptwelt-specific parser is needed.

1. HowToSection wraps HowToSteps as itemListElement, and our parser only saw the first one. Now: recipeInstructions containers with itemtype=HowToSection are descended into, and each itemListElement becomes a step.

2. A single HowToStep can internally contain "1. …<br>2. …<br>3. …". The new textWithLineBreaks(el) converts <br> and block boundaries to \n and ignores <img>/<script>/<style>. splitStepText(raw) detects numbered lines and creates a separate step per number; continuation lines without a number are appended to the current step.

3 new tests: HowToSection chain, inline-numbered multi-step, <img> alt suppression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
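
For illustration only: the splitting rule the commit message describes for splitStepText(raw) might look roughly like the sketch below. The implementation is not part of the diff shown here, so the body is an assumption. In particular, the accepted numbering format ("1." or "1)" followed by whitespace) and joining continuation lines with a single space are guesses, not the commit's confirmed behaviour.

```typescript
// Hypothetical sketch of splitStepText; the commit's real implementation is
// not shown in this diff and may differ.
function splitStepText(raw: string): string[] {
  const steps: string[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip empty lines left over from <br><br>

    // Assumed numbering format: digits, then "." or ")", then whitespace.
    // "2.5 dl" does not match because no whitespace follows the dot.
    const m = /^\d+[.)]\s+(.*)$/.exec(trimmed);
    if (m) {
      // A numbered line starts a new step; the number itself is dropped.
      steps.push((m[1] ?? "").trim());
    } else {
      // A line without a number continues the current step, or becomes the
      // first step if nothing precedes it.
      const prev = steps.pop();
      steps.push(prev === undefined ? trimmed : `${prev} ${trimmed}`);
    }
  }
  return steps;
}

// Usage: input as textWithLineBreaks(el) would presumably produce it for
// "1. Teig kneten.<br>2. Gehen lassen.<br>3. Backen."
const example = splitStepText("1. Teig kneten.\n2. Gehen lassen.\n3. Backen.");
console.log(example); // [ 'Teig kneten.', 'Gehen lassen.', 'Backen.' ]
```

Under these assumptions, text without any numbering stays a single step, which matches the existing 'Schälen.' expectation in the test file.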
@@ -105,6 +105,60 @@ describe('extractRecipeFromHtml — Microdata fallback', () => {
     expect(r!.steps[0].text).toBe('Schälen.');
   });
 
+  it('splits a single HowToStep containing "1.<br>2.<br>3." into separate steps', () => {
+    const html = `<html><body>
+      <div itemscope itemtype="https://schema.org/Recipe">
+        <span itemprop="name">Multi-step</span>
+        <span itemprop="recipeIngredient">x</span>
+        <div itemprop="recipeInstructions" itemscope itemtype="https://schema.org/HowToStep">
+          <p itemprop="text">1. Teig kneten.<br>2. Gehen lassen.<br>3. Backen.</p>
+        </div>
+      </div>
+    </body></html>`;
+    const r = extractRecipeFromHtml(html);
+    expect(r).not.toBeNull();
+    expect(r!.steps.length).toBe(3);
+    expect(r!.steps[0].text).toBe('Teig kneten.');
+    expect(r!.steps[1].text).toBe('Gehen lassen.');
+    expect(r!.steps[2].text).toBe('Backen.');
+  });
+
+  it('handles HowToSection wrapping multiple HowToStep itemListElements', () => {
+    const html = `<html><body>
+      <div itemscope itemtype="https://schema.org/Recipe">
+        <span itemprop="name">Sections</span>
+        <span itemprop="recipeIngredient">x</span>
+        <div itemprop="recipeInstructions" itemscope itemtype="https://schema.org/HowToSection">
+          <div itemprop="itemListElement" itemscope itemtype="https://schema.org/HowToStep">
+            <span itemprop="text">Erst schneiden.</span>
+          </div>
+          <div itemprop="itemListElement" itemscope itemtype="https://schema.org/HowToStep">
+            <span itemprop="text">Dann kochen.</span>
+          </div>
+        </div>
+      </div>
+    </body></html>`;
+    const r = extractRecipeFromHtml(html);
+    expect(r!.steps.length).toBe(2);
+    expect(r!.steps[0].text).toBe('Erst schneiden.');
+    expect(r!.steps[1].text).toBe('Dann kochen.');
+  });
+
+  it('ignores <img> alt/title content in step text', () => {
+    const html = `<html><body>
+      <div itemscope itemtype="https://schema.org/Recipe">
+        <span itemprop="name">WithIcon</span>
+        <span itemprop="recipeIngredient">x</span>
+        <div itemprop="recipeInstructions" itemscope itemtype="https://schema.org/HowToStep">
+          <span itemprop="text">Teig <img alt="Icon Teig kneten" src="/x.png"> verarbeiten.</span>
+        </div>
+      </div>
+    </body></html>`;
+    const r = extractRecipeFromHtml(html);
+    expect(r!.steps[0].text).not.toMatch(/Icon Teig kneten/);
+    expect(r!.steps[0].text).toMatch(/Teig.*verarbeiten/);
+  });
+
   it('prefers JSON-LD when both are present', () => {
     const html = `<html><head>
       <script type="application/ld+json">${JSON.stringify({