Improving Duplicate Scanner search results

Lexicon currently misses a lot of duplicates, even with search tolerance set to High. I’ll document as many examples as I can find in this thread. Some may be easy for the searcher to catch with a small tweak, while others might not be possible, but it seemed worth collecting the cases in a single thread to keep things organised.

For anyone else looking to contribute to this list: please provide the artist/title fields of both tracks, and verify that they aren’t detected as duplicates by running a scan with High search tolerance (put them in a small playlist to make scanning faster).

  • Angerfist and Miss K8 - Bogota (2020 Refix)
  • Angerfist & Miss K8 - Bogota (2020 Refix) (Original Mix)
  • Miss K8 & Angerfist - Bogotá (2020 Refix) (Original Mix)

There are a few things going on here, but “and” and “&” should be treated as interchangeable for duplicate detection, and accented characters should match their unaccented forms (“a” and “á”).
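To make the idea concrete, here is a minimal Python sketch of that kind of normalisation. It is not how Lexicon does it internally; `fold_accents` and `normalize_field` are hypothetical names, and the approach (Unicode decomposition plus a regex for the word “and”) is just one plausible way to get both strings to compare equal.

```python
import re
import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining marks so 'Bogotá' compares equal to 'Bogota'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_field(text: str) -> str:
    """Hypothetical helper: fold accents, ignore case, rewrite 'and' as '&'."""
    text = fold_accents(text).casefold()
    # \b keeps this from touching words that merely contain 'and'.
    text = re.sub(r"\band\b", "&", text)
    # Collapse runs of whitespace left over from the source tags.
    return re.sub(r"\s+", " ", text).strip()


same_artist = normalize_field("Angerfist and Miss K8") == normalize_field("Angerfist & Miss K8")
same_title = normalize_field("Bogotá (2020 Refix)") == normalize_field("Bogota (2020 Refix)")
```

Both comparisons come out equal here. The trailing “(Original Mix)” in the examples above is a separate difference that this sketch does not address.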

Matching artist names that appear in a different order (“Miss K8 & Angerfist” vs “Angerfist & Miss K8”) would also be nice, but may be harder to implement.

  • Miss K8 and Nolz - Elevate
  • Miss K8 & Mc Nolz - Elevate
  • Miss K8 & Nolz - Elevate

The “&” and “and” issue again, but “MC” could also be treated as an ignorable value at higher search tolerances.

  • Miss K8 vs. Angerfist - New World Order
  • Miss K8 & Angerfist - New World Order

“vs.” and “vs” could also be treated as interchangeable with “and” and “&” within the artist field.

  • Miss K8 - Impact (Radio Edit)
  • Angerfist & Miss K8 - Impact (Radio Edit)

This might be harder to implement, but detecting a match when one track is missing an artist would be handy at higher tolerance levels.

Great suggestions!

What I’ll do is make the artist field a bit smarter so that it ignores the order of the artists and handles “and” synonyms. This will apply from Low tolerance and up.
I’ll make it ignore " mc " in the artist field as well.
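
For anyone curious what that could look like, here is a rough Python sketch under my own assumptions, not the actual Lexicon code. `artist_key` and `CONNECTORS` are made-up names, and treating “vs”/“vs.” as a connector comes from the suggestion earlier in the thread. The idea is to split the artist field on connector words and compare the resulting sets, so order stops mattering.

```python
import re

# Assumed connector list: "and", "&", "vs", "vs." surrounded by whitespace.
CONNECTORS = re.compile(r"\s+(?:and|&|vs\.?)\s+", re.IGNORECASE)


def artist_key(artist_field: str) -> frozenset[str]:
    """Hypothetical order-insensitive key for an artist field."""
    # Pad with spaces so a leading or trailing "mc" is also seen as " mc ".
    padded = f" {artist_field.casefold()} "
    padded = padded.replace(" mc ", " ")
    parts = CONNECTORS.split(padded)
    # Tidy whitespace in each name and drop empty fragments.
    return frozenset(" ".join(p.split()) for p in parts if p.strip())


print(artist_key("Miss K8 & Angerfist") == artist_key("Angerfist and Miss K8"))  # True
print(artist_key("Miss K8 & Mc Nolz") == artist_key("Miss K8 and Nolz"))         # True
print(artist_key("Miss K8") == artist_key("Angerfist & Miss K8"))                # False
```

One caveat with this approach: an act whose name itself contains a connector would be split into two “artists”. Both copies of a duplicate would be split the same way, so the comparison should still work.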

The last one isn’t possible because the information just isn’t there. The exception is when the missing artist name is very short, in which case it could be treated as a typo.
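
To illustrate why length matters, here is a small Python sketch using the standard library’s `difflib`. This is only one possible reading of “considered a typo”, not a description of the scanner: the `TYPO_THRESHOLD` value and the artist name “X” are invented for the example.

```python
from difflib import SequenceMatcher

TYPO_THRESHOLD = 0.75  # assumed cut-off, purely illustrative


def artist_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two artist fields, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def looks_like_typo(a: str, b: str) -> bool:
    """Hypothetical check: treat a small difference as a typo."""
    return artist_similarity(a, b) >= TYPO_THRESHOLD


# A long missing name leaves too little in common to call it a typo.
print(looks_like_typo("Miss K8", "Angerfist & Miss K8"))  # False
# A very short missing name ("X" is a made-up artist) stays above the cut-off.
print(looks_like_typo("Miss K8", "Miss K8 & X"))          # True
```

With “Angerfist” missing, well under the assumed threshold of the text is shared, so nothing short of guessing would link the two tracks.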

If anyone has more of these, keep them coming please!