fix(skills-hub): deduplicate search results by identifier, not name

Browse.sh exposes skills by task name (e.g. "search-listings"), which is
shared across hundreds of sites. Deduplicating by name silently dropped
every browse-sh skill after the first one with a given task name — e.g.
only Airbnb's "search-listings" would survive, collapsing Booking.com,
Zillow, and every other site's variant into nothing.

Switch unified_search() and do_browse() to use r.identifier as the dedup
key. identifier is always globally unique (e.g.
"browse-sh/airbnb.com/search-listings-ddgioa"), so same-named skills from
different browse-sh hostnames are preserved as distinct results.

Update existing TestUnifiedSearchDedup tests to model the real scenario
(same identifier appearing from two sources) and add a regression test
that asserts browse-sh skills with the same name but different hostnames
are never collapsed.
EloquentBrush0x 2026-05-20 22:28:39 +03:00 committed by Teknium
parent 3ce1cf2bb7
commit fc7e04e9ed
3 changed files with 40 additions and 16 deletions

@@ -3425,14 +3425,17 @@ def unified_search(query: str, sources: List[SkillSource],
         overall_timeout=30,
     )
-    # Deduplicate by name, preferring higher trust levels
+    # Deduplicate by identifier, preferring higher trust levels.
+    # identifier is always unique per skill (e.g. "browse-sh/airbnb.com/search-listings-ddgioa").
+    # Using name would incorrectly collapse browse-sh skills from different sites that share
+    # the same task name (e.g. "search-listings" from Airbnb and Booking.com).
     _TRUST_RANK = {"builtin": 2, "trusted": 1, "community": 0}
     seen: Dict[str, SkillMeta] = {}
     for r in all_results:
-        if r.name not in seen:
-            seen[r.name] = r
-        elif _TRUST_RANK.get(r.trust_level, 0) > _TRUST_RANK.get(seen[r.name].trust_level, 0):
-            seen[r.name] = r
+        if r.identifier not in seen:
+            seen[r.identifier] = r
+        elif _TRUST_RANK.get(r.trust_level, 0) > _TRUST_RANK.get(seen[r.identifier].trust_level, 0):
+            seen[r.identifier] = r
     deduped = list(seen.values())
     return deduped[:limit]
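
The behavior change in this hunk can be exercised in isolation with a minimal sketch. It makes several assumptions: `SkillMeta` here is a hypothetical stand-in carrying only the three fields the loop reads, `dedupe_by_identifier` is an illustrative helper rather than a function in the repository, and the Booking.com identifier is invented for the example. Only the Airbnb identifier and the `_TRUST_RANK` table come from the commit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SkillMeta:
    # Hypothetical stand-in: the real SkillMeta likely has more fields.
    name: str
    identifier: str
    trust_level: str


# Same ranking table as in the diff; unknown trust levels fall back to 0.
_TRUST_RANK = {"builtin": 2, "trusted": 1, "community": 0}


def dedupe_by_identifier(results: List[SkillMeta]) -> List[SkillMeta]:
    """Keep one entry per identifier, preferring the higher trust level."""
    seen: Dict[str, SkillMeta] = {}
    for r in results:
        current = seen.get(r.identifier)
        if current is None:
            seen[r.identifier] = r
        elif _TRUST_RANK.get(r.trust_level, 0) > _TRUST_RANK.get(current.trust_level, 0):
            # Same skill reported by a more trusted source: replace in place.
            seen[r.identifier] = r
    return list(seen.values())


results = [
    SkillMeta("search-listings", "browse-sh/airbnb.com/search-listings-ddgioa", "community"),
    # Invented identifier, for illustration only.
    SkillMeta("search-listings", "browse-sh/booking.com/search-listings-example", "community"),
    # The Airbnb skill again, this time from a more trusted source.
    SkillMeta("search-listings", "browse-sh/airbnb.com/search-listings-ddgioa", "trusted"),
]
deduped = dedupe_by_identifier(results)
```

Under these assumptions `deduped` holds two entries: the Airbnb skill at its higher trust level and the Booking.com skill. Keying on `name` instead would leave a single entry, which is the collapse the commit message describes.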