Understanding Find Duplicates utility logic

[I searched the forum and found a related topic here, but I’m not sure whether I’m seeing the improvements]

After scanning the Guilty Pleasure playlist I got many good suggestions for duplicate songs, but many still got through, even with High Tolerance. For example, some ‘radio edits’ were reported as duplicates (as I expected), which helped me a lot in cleaning things up.

However, the list still has plenty of duplicates that were not reported. I would like to understand why these weren’t reported with High Tolerance mode on.

Some thoughts / questions on High Tolerance mode:

  • I expected Lexicon to not consider any text in brackets, but I guess it still does? (E.g. ‘remixed’, ‘remaster’, ‘edit’, ‘extended’)
  • I expected any special character differences between artists to be ignored.
  • I expected a typo ‘belive’ vs ‘believe’ or ‘Michel’ vs ‘Michael’ to be caught and reported as duplicate (just one character different)
  • I didn’t expect Uppercase or Lowercase differences to still get through, as in ‘Kylie Minoque - Can’t Get You Out Of My Head’
  • All of the above came together in the Vengaboys set of songs :wink:

ps: If you need a proper database with multiple of these duplicates, it seems that mine is a good one for testing :wink:

There are a bunch of rules Lexicon uses here. Just to give you some feedback on what I can see in the screenshot:

  • Nena: titles are too different. Common text like “radio edit” is ignored but “2002 radio” is not common. I don’t think “new version” is common either, though it could probably be ignored safely.
  • Toto: I would think it finds the version with the typo on high tolerance, so that’s something I’ll see if I can improve. Words like “Clean” are only ignored if they are inside parentheses because otherwise they may be part of a title, e.g. “Come back clean”
  • Sir mix a lot: the typo check is actually only applied to titles, not to artists. You can use the Artist Cleanup feature (Quick Fixes in the sidebar) to create uniform artists quite quickly, so definitely give that a try.
  • Ace of base: this is a remix and not considered a duplicate even on high tolerance
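
To make the rules above a bit more concrete, here is a rough Python sketch of how such matching could work. This is not Lexicon's actual code: the word lists, the function names, the 0.9 similarity threshold, the use of `difflib` for the typo check and the case-insensitive artist comparison are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical word lists; the real ones are not given in this thread.
IGNORED_ANYWHERE = ("radio edit",)            # safe to drop wherever it appears
IGNORED_IN_PARENS = ("clean", "radio edit")   # dropped only as a whole "(...)" group

PAREN_GROUP = re.compile(r"\(([^)]*)\)")


def normalize_title(title):
    """Lowercase the title and strip the ignorable common text."""
    text = title.lower()
    # Remove a parenthesised group only if its whole content is ignorable,
    # so "Song (Clean)" loses "(clean)" but "Come Back Clean" is untouched.
    text = PAREN_GROUP.sub(
        lambda m: "" if m.group(1).strip() in IGNORED_IN_PARENS else m.group(0),
        text,
    )
    for phrase in IGNORED_ANYWHERE:
        text = text.replace(phrase, "")
    return re.sub(r"\s+", " ", text).strip(" -")


def is_remix(title):
    return re.search(r"\bremix\b", title.lower()) is not None


def is_duplicate(a, b, high_tolerance=False):
    """a and b are (artist, title) tuples."""
    (artist_a, title_a), (artist_b, title_b) = a, b
    # Artists get no typo check (assumed: exact match, ignoring case).
    if artist_a.casefold() != artist_b.casefold():
        return False
    # A remix is never a duplicate of a non-remix.
    if is_remix(title_a) != is_remix(title_b):
        return False
    norm_a, norm_b = normalize_title(title_a), normalize_title(title_b)
    if norm_a == norm_b:
        return True
    # Typo check on titles only, and only on high tolerance (threshold assumed).
    return high_tolerance and SequenceMatcher(None, norm_a, norm_b).ratio() >= 0.9
```

With these assumptions, `is_duplicate(("Artist", "Song"), ("Artist", "Song (Radio Edit)"))` is true, “I Believe” vs “I Belive” only matches with `high_tolerance=True`, and “Song (2002 Radio)” or “Song (Remix)” are never matched against plain “Song”. An artist spelled two ways fails before the title is even looked at, which is where Artist Cleanup comes in.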

Thanks! Some good guidance to help me clean things up a bit more. Recipes to the rescue :wink: