As far as I understand, the ratio method in the Python Levenshtein library is supposed to give a normalized similarity score and is implemented as "((lensum - levdist) / lensum)", where lensum is the sum of the lengths of both strings and levdist is the Levenshtein distance. However, I do not understand where this formula comes from. In a few papers I have seen a normalized similarity calculated as "((lenmax - levdist) / lenmax)", where lenmax is the maximum of the two string lengths, so why does the Python implementation use the sum of the lengths instead? Are the two calculations somehow equivalent? And is there a source showing that the ratio method is an adequate similarity measure?
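To make the comparison concrete, here is a small sketch of the two formulas as I understand them, using the Levenshtein package. The example strings and the numbers in the comments are mine, and I am not sure the lensum formula reproduces what ratio() returns internally, which is part of what I am asking:

```python
# Sketch comparing the two normalizations described above.
# Assumes the Levenshtein package (pip install Levenshtein) is installed.
import Levenshtein

a, b = "kitten", "sitting"

levdist = Levenshtein.distance(a, b)  # plain Levenshtein edit distance (here: 3)
lensum = len(a) + len(b)              # 6 + 7 = 13
lenmax = max(len(a), len(b))          # 7

sum_normalized = (lensum - levdist) / lensum  # formula I read for ratio(): ~0.769
max_normalized = (lenmax - levdist) / lenmax  # formula from the papers:    ~0.571

print(sum_normalized)
print(max_normalized)
print(Levenshtein.ratio(a, b))  # library value, for comparison with the two above
```

The two hand-computed values clearly differ for this pair, which is why I am wondering whether they are meant to measure the same thing.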
This is a comprehension question and not necessarily a programming question. Please let me know if I should move it to another Stack Exchange site, but since it concerns a Python library I thought this was the right place for it.
Source: https://stackoverflow.com/questions/75139775/understanding-the-python-levenstein-ratio-method