Thread: Help with regular expression

  1. #1
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Help with regular expression

    For QSyntaxHighlighter I need a regular expression that I still cannot create. The rules are:

    - word delimiters are:
    Qt Code:
    " ' ; . , : ( ) { } [ ]
    or <any white space>
    - all sequences of any other characters placed between delimiters are words; a word must match the pattern only if it is completely equal to the pattern
    - subwords inside words (i.e. "sub" inside "submarine" or "prosubmitter" or "subsub") must not match the pattern

    The rule must be a single rule, not a set of rules. This is mandatory because the word pattern is taken from an unknown list of words. It is only known that words cannot contain delimiters; they can contain any other characters. Proper words are:

    Qt Code:
    if
    =
    !=
    ==
    +
    -
    hello_world
    cdt*12ad
    +++---
    12/5/1
    and so on. Again, they must match as whole words.

    I tried several variants. The hardest part is the whole-word match. If I create a pattern like this:
    Qt Code:
    "\\b" + pattern + "\\b"
    then only whole words match, but words like "++" do not match. If I create a pattern like:
    Qt Code:
    "(" + pattern + ")"
    then "+" and similar words match, but subwords match too: "stop" becomes highlighted inside the word "ifstopped".

    Anybody familiar with regular expressions, please help me. The "virtual beer" will be granted.

  2. #2
    Join Date
    Jul 2011
    Posts
    18
    Qt products
    Qt4
    Platforms
    Windows

    Re: Help with regular expression

    (.*)(?= ( delim1 | delim2 | ... cont) )

    Then check for an exact match.
    Run indexIn() and go through the captured texts; I would think the correct word would show up at index 1.

    Hope this helps.

    For instant regex tests I use this web tool (make sure to check "explain"; it will break down your regex and tell you what it is saying):
    http://myregextester.com/index.php

  3. #3
    Join Date
    Apr 2010
    Posts
    769
    Thanks
    1
    Thanked 94 Times in 86 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11

    Re: Help with regular expression

    I don't have time to look this up, so you'll want to check syntax, but something like

    ^[" ' ; . , : ( ) { }\ [\ ]](if|==|!=|=|+|-|hello_world|<whatever else>)^[" ' ; . , : ( ) { }\ [\ ]]

    should get you started. Note that the order of the items in the or'd middle clause is important; you generally want the more complex expressions which contain following simpler expressions to come first so they get matched before their simpler counterpart. The enclosing parentheses will let you extract the bit that matched. Note that members of the OR clause can themselves be REs; you need to pay close attention to the matching rules in such cases, though.

    It would probably be simpler to tokenize on the (rather huge) list of delimiters, then simply check string equality against your list of keywords rather than using REs in this case.
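
    The tokenizing approach described above can be sketched roughly as follows. This is a minimal sketch in standard C++ rather than Qt; it assumes a plain std::string, ASCII white space, and hypothetical names (Span, isDelimiter, findKeywords) that do not come from the thread. It walks the text once, cuts it at the delimiters listed in the first post, and records the offset and length of every token found in the keyword set, so each result could be handed to something like setFormat().

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Offset and length of one keyword occurrence (hypothetical helper type).
struct Span {
    std::size_t pos;
    std::size_t len;
};

// Delimiters from the first post, plus ASCII white space (an assumption).
bool isDelimiter(char c)
{
    static const std::string delims = "\"';.,:(){}[] \t\r\n\f\v";
    return delims.find(c) != std::string::npos;
}

// Splits text at delimiters and reports every whole token that is a keyword.
std::vector<Span> findKeywords(const std::string& text,
                               const std::unordered_set<std::string>& keywords)
{
    std::vector<Span> spans;
    std::size_t i = 0;
    while (i < text.size()) {
        if (isDelimiter(text[i])) {
            ++i;                       // skip delimiters between tokens
            continue;
        }
        const std::size_t start = i;   // first character of the token
        while (i < text.size() && !isDelimiter(text[i]))
            ++i;                       // advance to the end of the token
        if (keywords.count(text.substr(start, i - start)) != 0)
            spans.push_back({start, i - start});
    }
    return spans;
}
```

    For the text `if (x != 12/5/1) { hello_world; ifstopped }` and the keywords if, !=, 12/5/1, hello_world and stop, this yields four spans. Neither "if" nor "stop" inside "ifstopped" is reported, because only whole tokens are compared.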

  4. #4
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    The sequence of patterns cannot be used. It must work in a loop over the pattern list like this:

    Qt Code:
    foreach( QString pattern, wordsQStringList )
    {
        rule.pattern = QRegExp( "\\b(" + pattern + ")\\b" );
        rule.format = format;
        highlightRules.append( rule );
    }

    This is mandatory because of other parts of the highlighting algorithm. A very complex rule will slow down redrawing of large texts, up to 5000-6000 lines.

    That is why I'm still confused.
    Last edited by Gourmet; 9th August 2011 at 19:35.

  5. #5
    Join Date
    Apr 2010
    Posts
    769
    Thanks
    1
    Thanked 94 Times in 86 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11

    Re: Help with regular expression

    Then do the same thing suggested above, but in a loop.

    I find it very hard to believe, though, that RE evaluation will be slower than a loop checking against all possibilities, even if the loop short-circuits. Especially the way you've formulated it here; you'll wind up re-compiling the RE on each iteration.

  6. #6
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    "Then do the same thing suggested above, but in a loop."
    But HOW? This

    Qt Code:
    foreach( QString pattern, wordsQStringList )
    {
        rule.pattern = QRegExp( "^[\"';.,:(){}\[\]](" + pattern + ")^[\"';.,:(){}\[\]]" );
        rule.format = format;
        highlightRules.append( rule );
    }

    doesn't work.
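
    A few things probably keep the pattern above from matching. Outside the brackets `^` is a start-of-string anchor, not a negation. The keyword is inserted unescaped, so a `+` in it acts as a quantifier. And `\[` inside a C++ string literal does not reach the regex engine as a backslash. The earlier `\b` variant fails for "++" because `\b` is defined relative to letters, digits and underscore. The following is a minimal sketch of a single per-keyword rule, written with std::regex (ECMAScript grammar) instead of QRegExp so that it stands alone; the helper names are hypothetical. It escapes the keyword, requires a delimiter or the start of the text before it, and uses a lookahead for a delimiter or the end of the text after it.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Backslash-escapes regex metacharacters in a keyword (hypothetical helper;
// QRegExp::escape() plays this role in Qt).
std::string escapeForRegex(const std::string& word)
{
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : word) {
        if (meta.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}

// Builds "(start or delimiter)(keyword)(followed by delimiter or end)".
// The keyword itself is capture group 1; keywords are assumed non-empty.
std::regex wholeWordRegex(const std::string& keyword)
{
    // Delimiters from the first post plus white space, as a character class.
    static const std::string delim = R"re(["';.,:(){}\[\]\s])re";
    return std::regex("(?:^|" + delim + ")(" + escapeForRegex(keyword) +
                      ")(?=" + delim + "|$)");
}

// Returns the offset of every whole-word occurrence of keyword in text.
std::vector<std::size_t> findWholeWord(const std::string& text,
                                       const std::string& keyword)
{
    const std::regex re = wholeWordRegex(keyword);
    std::vector<std::size_t> positions;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        // The leading delimiter is part of the overall match, so the
        // keyword offset is taken from group 1, not from group 0.
        positions.push_back(static_cast<std::size_t>(it->position(1)));
    }
    return positions;
}
```

    With QRegExp the same idea would need the offset of the first capture, pos(1), and the length of the keyword instead of the offset and length of the whole match, because QRegExp documents lookahead but not lookbehind. In this sketch, searching `if (ifstopped) stop; if` for "if" gives offsets 0 and 21, and searching `a ++ b + c` for "+" gives only offset 7.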

    About the speed: it would of course be better to test which is faster. This depends greatly on the internal indexIn() implementation. But the idea of using it in a loop is taken from the native Qt examples.

    The number of words could be logically unlimited (now there are about 200).
    Last edited by Gourmet; 9th August 2011 at 21:47.

  7. #7
    Join Date
    Jul 2011
    Posts
    18
    Qt products
    Qt4
    Platforms
    Windows

    Re: Help with regular expression

    1. Speed is O(n); you shouldn't be complaining.
    2. Each punctuation delimiter should be preceded by a '\'.
    3. Have you even given thought to my implementation?

    From what I gather, you are looking for a "word" surrounded by some sort of delimiters,
    and you only want to capture exact matches?
    word = sub, thus capture sub but not submarine?

    Qt Code:
    QRegExp exp( " exp " ); // " exp " stands for the real pattern
    foreach( const QString &word, wordlist )
    {
        if( exp.exactMatch( word ) )
        {
            // do something with the matched "word"
        }
    }

    Read up on QRegExp, reserved characters, etc.:
    http://doc.qt.nokia.com/latest/qregexp.html

    What is the expected input? Just a giant string of characters that you need to split up?
    What is the expected output?

  8. #8
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    "What is expected input just a giant string of characters that you need to split up?"
    Program code in a specific (not widely known) programming language, placed inside a QPlainTextEdit.

    "what is expected output?"
    Highlighted keywords of this language. They are stored in wordsQStringList; this is mandatory and cannot be changed. The most important reason is that the number of keywords can vary, they can be written in different human languages (not only with Latin characters), and they can include more than letters and digits. The list is imported from outside the highlighter and must be used inside the highlighter.

    Highlighting is implemented as

    Qt Code:
    void Syntaxer::highlightBlock(const QString &text)
    {
        foreach( HighRule rule, highlightRules )
        {
            QRegExp expression( rule.pattern );
            int index = expression.indexIn( text );
            while( index >= 0 )
            { // probably there is a bug in indexIn: it returns zero even if the text is not found
                int length = expression.matchedLength();
                if( !length )
                    break;
                setFormat( index, length, rule.format );
                index = expression.indexIn( text, index + length );
            }
        }
    }

    as recommended in the Qt examples. exactMatch cannot be used because the text can contain a varying number of keywords.
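
    About the comment in the loop above: QRegExp::indexIn() is documented to return -1 when nothing matches, so an index of 0 together with a matched length of 0 more likely means that the pattern was able to match the empty string at the start of the text. A small sketch of a match loop that tolerates this is shown here, again with std::regex instead of QRegExp and with a hypothetical function name (collectSpans). It skips empty matches and keeps scanning instead of stopping at the first one.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Collects (offset, length) for every non-empty match of re in text.
std::vector<std::pair<std::size_t, std::size_t>>
collectSpans(const std::string& text, const std::regex& re)
{
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    // std::sregex_iterator advances past empty matches on its own,
    // so the loop cannot stall on a pattern that matches nothing.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        if (it->length(0) == 0)
            continue;                  // nothing to highlight here
        spans.emplace_back(static_cast<std::size_t>(it->position(0)),
                           static_cast<std::size_t>(it->length(0)));
    }
    return spans;
}
```

    For example, the pattern `a*` matches the empty string at offset 0 of "xaay"; the sketch ignores that and still reports the run "aa" at offset 1.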

    All other things work fine; they are debugged and cannot be changed. I only need to create a rule to highlight those keywords, but I'm not well versed in regular expressions.
    Last edited by Gourmet; 9th August 2011 at 22:55.

  9. #9
    Join Date
    Jul 2011
    Posts
    18
    Qt products
    Qt4
    Platforms
    Windows

    Re: Help with regular expression

    I would just recommend sitting down and reading the Qt entry on QRegExp.

    And play around with some patterns on this simple tester: http://myregextester.com/index.php

  10. #10
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    I read all of this carefully. If I could write the expression myself, I would not ask for advice. I'm already tired after hours of attempts. I have LOTS of other things to do, but I am stalled on this.
    Last edited by Gourmet; 9th August 2011 at 23:27.

  11. #11
    Join Date
    Apr 2010
    Posts
    769
    Thanks
    1
    Thanked 94 Times in 86 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11

    Re: Help with regular expression

    If you're trying to parse a language, this is probably the worst way imaginable to do it. Go read about lex and yacc if you want to do this properly.

    "The exactMatch cannot be used cause text can contain different number of keywords."
    So what? Every keyword is separated by delimiters and is considered alone. ExactMatch is EXACTLY what you want to use. Any of the solutions already discussed above will work just fine, although the resulting code will be a maintenance nightmare. To reiterate:

    1) Split your string into individual keywords.

    2) Match each keyword against the list of target keywords.

    Show your code for step one. This should be a one-liner in Qt, or a very small loop in C/C++. Provide an actual sample of text and target list; the way these keep changing in your descriptions makes me almost certain there's more to know here.

  12. #12
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    "1) Split your string into individual keywords.

    2) Match each keyword against the list of target keywords."
    Not applicable, this will be too slow. The solution must work not from the keywords but from the delimiters. They are known and very limited in number. All other character sequences are keywords.

    But the code to find the delimiters can be almost exactly the same as for step 1). And it can be based on the indexIn() call.
    Last edited by Gourmet; 10th August 2011 at 10:22.

  13. #13
    Join Date
    Apr 2010
    Posts
    769
    Thanks
    1
    Thanked 94 Times in 86 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11

    Re: Help with regular expression

    You have no idea at all how "slow" this approach will be, because right now the performance of YOUR algorithm is infinitely bad - it produces absolutely no results no matter how long you wait.

    How about actually IMPLEMENTING something instead of whining about speed you can't even properly measure because you have nothing to compare it to? Only then will you be able to tell if performance is acceptable or not, and only then will you be able to compare performance of one approach over another.

    Hint: you are making this simple problem much, much, much too hard. Stop generating excuses and generate some code, instead.

    I'm not seeing any actual code, text or keyword examples. Obviously, the code doesn't exist, but the text and keyword examples are critical to understanding what your problem actually is, given that your descriptions are vague and continue to change.
    Last edited by SixDegrees; 10th August 2011 at 10:58.

  14. #14
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    I described the problem and a possible solution that I need. If you cannot help, then just do not troll me... Ignoring.

  15. #15
    Join Date
    Jul 2011
    Posts
    18
    Qt products
    Qt4
    Platforms
    Windows

    Re: Help with regular expression

    Originally posted by Gourmet:
    "Not applicable, this will be too slow. The solution must work not from the keywords but from the delimiters. They are known and very limited in number. All other character sequences are keywords."
    QStringList keywords = inputTextLine.split( QRegExp( all delimiters or'd) )

    Ta freaking Da.
    No one is trolling you here. Six is correct: you keep complaining and making excuses. If you are not willing to learn via trial and error, you will not learn at all, which makes me wonder why you are taking a programming class.

  16. #16
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    I'm not taking a programming class. I'm experienced enough in C/C++ and OOP after hundreds of thousands of working lines since 1989. Currently I debug up to 400-500 lines of C++ code per day using a wide set of Qt classes. But I have never used Perl-like regular expressions. That is why I asked for help, not for someone to try to "teach" me something... If I say I need this kind of solution, that means only: I need this kind of solution. Not another kind but THIS KIND. A pattern to find a keyword in a string with known delimiters. I did not ask for help with coding. All the other suggestions I could engineer myself without asking for help. If the solution I asked for is impossible, just say: "it is impossible". Then I'll find another solution myself without help. Or say: "Sorry, I can't help you".

    Do not throw a wooden bar if you have a lifebuoy; throw the bar only if you don't have a lifebuoy, and be sure to say so. Otherwise throw the lifebuoy, not the bar. And never shout: "Teach yourself to swim".
    Last edited by Gourmet; 10th August 2011 at 17:19.

  17. #17
    Join Date
    Jul 2011
    Posts
    18
    Qt products
    Qt4
    Platforms
    Windows

    Re: Help with regular expression

    I'm going to deal with this lack of appreciation: Six and I have given you multiple ways to do this. If you are not even going to try, then I'm done.

    I learned regex on the fly; it took me less than one hour to pick up on the simplicity of regexes.

    Step one: split the code line (write one regex for this, see my previous post).
    Step two: see if a word is a keyword (write another regex for this).
    Step three: move on to the next word or line (loops!).

    Qt Code:
    Execution Time(sec.):
    0.000009

    Raw Match Pattern:
    (.*)(\"|\'|\;|\.|\,|\:|\(|\)|\{|\}|\[|\])(.*)?

    Match Pattern Explanation:
    The regular expression:

    (?-imsx:(.*)("|'|;|\.|,|:|\(|\)|\{|\}|\[|\])(.*)?)

    matches as follows:

    NODE        EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:    group, but do not capture (case-sensitive)
                (with ^ and $ matching normally) (with . not
                matching \n) (matching whitespace and #
                normally):
    (           group and capture to \1:
    .*          any character except \n (0 or more times
                (matching the most amount possible))
    )           end of \1
    (           group and capture to \2:
    "           '"'
    |           OR
    '           '\''
    |           OR
    ;           ';'
    |           OR
    \.          '.'
    |           OR
    ,           ','
    |           OR
    :           ':'
    |           OR
    \(          '('
    |           OR
    \)          ')'
    |           OR
    \{          '{'
    |           OR
    \}          '}'
    |           OR
    \[          '['
    |           OR
    \]          ']'
    )           end of \2
    (           group and capture to \3 (optional
                (matching the most amount possible)):
    .*          any character except \n (0 or more times
                (matching the most amount possible))
    )?          end of \3 (NOTE: because you're using a
                quantifier on this capture, only the LAST
                repetition of the captured pattern will be
                stored in \3)
    )           end of grouping
    ----------------------------------------------------------------------

    $matches Array:
    (
        [0] => Array
            (
                [0] => hello:goodbye
            )

        [1] => Array
            (
                [0] => hello
            )

        [2] => Array
            (
                [0] => :
            )

        [3] => Array
            (
                [0] => goodbye

    **ignore "?-imsx:" that is looking at the possible flags you can set
    Last edited by jacks916; 10th August 2011 at 18:16. Reason: updated contents

  18. #18
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    Looks like none of you read the problem carefully... Your suggestions are not applicable. The main task is to highlight keywords in the text. That means it is not the presence of a word in the text that must be confirmed; instead, the position of the text to highlight must be found. A split list is absolutely useless in this case. It can only confirm or deny the presence of a keyword inside the text, but it cannot give its index.

    Again: I have a list of keywords. They must be found in a QString so that their positions inside this string are known. The set of delimiters is also known. Can anybody more familiar with this help?

    BTW: it looks like both previous "helpers" don't even know how the highlighter in Qt works...
    Last edited by Gourmet; 11th August 2011 at 12:51.

  19. #19
    Join Date
    Jul 2011
    Posts
    18
    Qt products
    Qt4
    Platforms
    Windows

    Re: Help with regular expression

    I like how you just throw out what we have contributed. Good luck.

  20. #20
    Join Date
    Apr 2011
    Posts
    31
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows Android

    Re: Help with regular expression

    Your input does not give me anything useful except possibly this: (.*)(\"|\'|\;|\.|\,|\:|\(|\)|\{|\}|\[|\])(.*)?
    But this still needs to be checked.
