
Thread: String similarity check

  1. #1
    Join Date
    Jan 2012
    Location
    Dortmund, Germany
    Posts
    159
    Thanks
    69
    Thanked 10 Times in 8 Posts
    Qt products
    Qt4
    Platforms
    Windows Android

    String similarity check

    Is there any ready-made string similarity check in Qt?
    I've read about Levenshtein distance and Soundex. Is anything like that built in?

    Following this thread (sorry, it is in German), I've implemented a count of short substrings. It gives me quite reasonable results for names (which is what I need to compare) when I use substrings of n=2 chars and an 80% threshold.

    I'd like to know whether there is a better routine, still fast enough, that I can simply take and use.

    This is what I use right now (corrections on style and speed are welcome!)

    header:
    Qt Code:
    bool isSimilar(QString a, QString b, qreal percentage = 80, int n = 2, Qt::CaseSensitivity caseSense = Qt::CaseInsensitive);

    implementation:
    Qt Code:
    bool MainWindow::isSimilar(QString a, QString b, qreal percentage, int n, Qt::CaseSensitivity caseSense)
    // Iterates over the substrings of n chars in a and looks for each of them in b.
    // The number of hits is then divided by the number of substrings in the shorter string.
    // To properly take word beginnings and endings into account,
    // spaces are inserted before and after the strings.
    {
        if (a.isEmpty() || b.isEmpty()) return false;
        qreal hits = 0;
        a = QString(" ").repeated(n-1) + a + QString(" ").repeated(n-1);
        b = QString(" ").repeated(n-1) + b + QString(" ").repeated(n-1);
        QString part;
        for (int i = 0; i < a.count()-(n-1); i++)
        {
            part = a.mid(i, n);
            if (b.contains(part, caseSense)) hits++;
        }
        // Hits are always counted over a, so the ratio can exceed 100 when b is the shorter string.
        if (a.length() < b.length()) return (percentage < (100*hits/(a.length()-(n-1))));
        else return (percentage < (100*hits/(b.length()-(n-1))));
    }

    For the name "Markus Bertram" I get these results (in percent):
    • Bertram, Markus - 93.3
    • Markus E. Bertram - 100
    • Marcus Emil Bertram - 86.7
    • marc bertram - 84.6 (case-insensitive)
    • Martin Bertram - 73.3 (false)
    • Martin Bergmann - 46.7 (false)
    Last edited by sedi; 23rd June 2012 at 16:46. Reason: spelling corrections

  2. #2
    Join Date
    Sep 2011
    Posts
    1,241
    Thanks
    3
    Thanked 127 Times in 126 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Re: String similarity check

    I don't know of anything offhand.

    Your current implementation should at least include this change, so that the padding string is built only once:

    Qt Code:
    const QString spaces = QString(" ").repeated(n-1);
    a = spaces + a + spaces;
    // etc


  4. #3
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: String similarity check

    The most basic approaches I know are Hamming distance and Edit Distance (aka Levenshtein Distance). There are other string metrics available, though:
    http://en.wikipedia.org/wiki/String_metric

    Edit distance is quite easy to implement efficiently using QtConcurrent or OpenCL.
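As a rough illustration of edit distance, here is a minimal single-threaded sketch of the classic dynamic-programming formulation. It assumes std::string rather than QString and compares bytes, not Unicode characters; it uses neither QtConcurrent nor OpenCL, and the function name levenshtein is only illustrative. Hamming distance, by contrast, is defined only for strings of equal length, so it fits name comparison less well.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
// Only two rows of the DP table are kept, so memory use is O(b.size()).
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

    // Turning an empty prefix of a into the first j chars of b takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // deleting the first i chars of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

To turn the distance into a similarity comparable to a percentage threshold, one option is to divide it by the length of the longer string and subtract the result from 1.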


