fix(gateway): keep code blocks verbatim in cleaned text when media present

Self-review of the code-block masking fix: the cleanup path ran
media_pattern.sub('') over the _mask_protected_spans() copy of the text and
assigned that back to 'cleaned', so whenever a real MEDIA: tag was delivered
(the 'if media:' branch), every fenced code block, inline code span, and
blockquote in the reply was blanked to whitespace in the user-visible text.

Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned' — masking is a locator,
not a text rewrite. Protected spans survive verbatim. Strengthens the existing
mixed-code test (it only asserted 'Done.' survived, not the code block) and
adds an inline-code-survives regression test. Both fail on the old sub-based
code and pass now.
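
The locate-then-delete approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the gateway's actual code: `MEDIA_PATTERN`, `PROTECTED_PATTERN`, `mask_protected_spans`, `strip_real_media`, and `buggy_strip` are hypothetical stand-ins, and the real patterns likely cover more cases (blockquotes, for example).

```python
import re

# Hypothetical stand-ins; the project's real patterns and helper may differ.
MEDIA_PATTERN = re.compile(r"MEDIA:\S+")
PROTECTED_PATTERN = re.compile(r"```.*?```|`[^`\n]+`", re.DOTALL)


def mask_protected_spans(text: str) -> str:
    """Return a same-length copy with fenced/inline code replaced by spaces."""
    return PROTECTED_PATTERN.sub(lambda m: " " * len(m.group(0)), text)


def buggy_strip(text: str) -> str:
    """Old behaviour (sketch): substitute on the masked copy and return it."""
    return MEDIA_PATTERN.sub("", mask_protected_spans(text))


def strip_real_media(text: str) -> tuple[list[str], str]:
    """Locate tags on the masked copy, then delete those spans from the original."""
    masked = mask_protected_spans(text)
    # Offsets are valid for `text` because the mask preserves length.
    spans = [m.span() for m in MEDIA_PATTERN.finditer(masked)]
    paths = [text[start:end][len("MEDIA:"):] for start, end in spans]
    cleaned = text
    # Delete right to left so earlier offsets stay valid.
    for start, end in reversed(spans):
        cleaned = cleaned[:start] + cleaned[end:]
    return paths, cleaned
```

Because the mask is only used to compute offsets, anything inside a protected span is never touched; `buggy_strip` shows how returning the masked copy loses that text.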
kshitijk4poor 2026-06-01 12:10:27 +05:30 committed by kshitij
parent ec6261ae2f
commit 6c73e8ffaa

@@ -431,7 +431,8 @@ class TestExtractMedia:
         assert media[0][0] == "/real/file.png"
 
     def test_media_mixed_code_and_prose(self):
-        """Real MEDIA: in prose + example in code block: only prose extracted."""
+        """Real MEDIA: in prose + example in code block: only prose extracted,
+        and the code block survives verbatim in the delivered text."""
         content = (
             "Here is your file:\n"
             "MEDIA:/output/report.pdf\n"
@@ -443,6 +444,19 @@ class TestExtractMedia:
         assert len(media) == 1
         assert media[0][0] == "/output/report.pdf"
         assert "Done." in cleaned
+        # The real tag is stripped from the delivered text...
+        assert "MEDIA:/output/report.pdf" not in cleaned
+        # ...but the fenced code block (incl. its example MEDIA: line) must
+        # survive verbatim — masking is a locator, not a text rewrite.
+        assert "```text\nMEDIA:/example/path.pdf\n```" in cleaned
+
+    def test_inline_code_survives_when_real_media_present(self):
+        """When a real MEDIA: tag is delivered, an inline-code example in the
+        same reply must not be blanked to whitespace."""
+        content = "See MEDIA:/r/a.png and `MEDIA:/ex/b.png` inline"
+        media, cleaned = BasePlatformAdapter.extract_media(content)
+        assert [p for p, _ in media] == ["/r/a.png"]
+        assert "`MEDIA:/ex/b.png`" in cleaned
 
 
 class TestMediaInsideSerializedJson: