Is there any ready-made string similarity check in Qt?
I've read about Levenshtein distance or soundex, is there anything like this built in?
According to this thread (sorry, German language) I've implemented a counting of small substrings that gives me quite reasonable results for Names (which I need to compare) when I take substrings of n=2 chars and an 80% threshold.
I'd like to know if there is a better routine fast enough that I can just take and use.
This is what I use right now (corrections on style and speed are welcome!)
header:
bool isSimilar
(QString a,
QString b, qreal percentage
=80,
int n
= 2, Qt
::CaseSensitivity caseSense
= Qt
::CaseInsensitive);
bool isSimilar(QString a, QString b, qreal percentage=80, int n = 2, Qt::CaseSensitivity caseSense= Qt::CaseInsensitive);
To copy to clipboard, switch view to plain text mode
implementation:
bool MainWindow
::isSimilar(QString a,
QString b, qreal percentage,
int n, Qt
::CaseSensitivity caseSense
) //Iterates substrings in groups of n chars from a und finds these in b.
//The number of hits is then divided by the length of the shorter string.
//To properly take word beginnings and endings into account
//spaces are being inserted before and after the strings.
{
if (a.isEmpty()||b.isEmpty()) return false;
qreal hits=0;
for (int i=0;i<a.count()-(n-1);i++)
{
part=a.mid(i,n);
if (b.contains(part,caseSense)) hits++;
}
if (a.length()<b.length()) return (percentage < (100*hits/(a.length()-(n-1))));
else return (percentage < (100*hits/(b.length()-(n-1))));
}
bool MainWindow::isSimilar(QString a, QString b, qreal percentage, int n, Qt::CaseSensitivity caseSense)
//Iterates substrings in groups of n chars from a und finds these in b.
//The number of hits is then divided by the length of the shorter string.
//To properly take word beginnings and endings into account
//spaces are being inserted before and after the strings.
{
if (a.isEmpty()||b.isEmpty()) return false;
qreal hits=0;
a=QString(" ").repeated(n-1)+a+QString(" ").repeated(n-1);
b=QString(" ").repeated(n-1)+b+QString(" ").repeated(n-1);
QString part;
for (int i=0;i<a.count()-(n-1);i++)
{
part=a.mid(i,n);
if (b.contains(part,caseSense)) hits++;
}
if (a.length()<b.length()) return (percentage < (100*hits/(a.length()-(n-1))));
else return (percentage < (100*hits/(b.length()-(n-1))));
}
To copy to clipboard, switch view to plain text mode
For the name "Markus Bertram" I get these results:
- Bertram, Markus - 93,3
- Markus E. Bertram - 100
- Marcus Emil Bertram - 86,7
- marc bertram - 84,6 (case-insensitive)
- Martin Bertram - 73,3 (false)
- Martin Bergmann - 46,7 (false)
Bookmarks