Thread: QRegExp Help; remove all html tag

  1. #1
    Join Date
    May 2006
    Posts
    788
    Thanks
    49
    Thanked 48 Times in 46 Posts
    Qt products
    Qt4
    Platforms
    MacOS X Unix/X11 Windows

    QRegExp Help; remove all html tag

    I want to remove all HTML tags so that I can reformat a document.
    Tidy cannot do the job.

    I tried QString::remove with QRegExp. Lines 10 and 11 remove the closing tags. Now I want to remove the opening tags as well. I tested line 13, but it removes everything.
    How can I do this?


    Qt Code:
    1. QString QLess::CleanTag( QString body )
    2. {
    3. qDebug() << "### start clean tag ";
    4. body.replace("<br>","##break##");
    5. body.replace("</br>","##break##");
    6. body.replace("</p>","##break##");
    7. body.replace("</td>","##break##");
    8. body.remove(QRegExp("<head>(.*)</head>"));
    9. body.remove(QRegExp("<form(.*)</form>"));
    10. body.remove(QRegExp("</(div|span|tr|td|br|body|html|tt|a|strong|p)>"));
    11. body.remove(QRegExp("</(DIV|SPAN|TR|TD|BR|BODY|HTML|TT|A|STRONG|P)>"));
    12. /*body.remove(QRegExp("<(div|span|tr|td|br|body|html|tt|a|strong|p)>"));*/
    13. /*body.remove(QRegExp("<(div|span|tr|td|br|body|html|tt|a|strong|p)( )(.*)(!>)>"));*/
    14. qDebug() << "### newbody " << body;
    15. return body;
    16. }

  2. #2
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    You need something like:
    Qt Code:
    body.remove( QRegExp( "<(?:div|span|tr|td|br|body|html|tt|a|strong|p)[^>]*>", Qt::CaseInsensitive ) );
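
As a rough illustration of why this pattern helps: `[^>]*` consumes any attributes up to the closing `>`, so an opening tag matches whether or not it carries attributes, and the case-insensitive flag covers `<DIV>` as well as `<div>`. The sketch below uses `std::regex` instead of `QRegExp` so that it builds without Qt. That swap and the helper name `removeOpeningTags` are assumptions made for this example, with `std::regex::icase` standing in for `Qt::CaseInsensitive`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: removes the listed opening tags, with or without attributes.
// std::regex replaces QRegExp here only so the sketch compiles without Qt.
std::string removeOpeningTags(const std::string& body)
{
    static const std::regex openTag(
        R"(<(?:div|span|tr|td|br|body|html|tt|a|strong|p)[^>]*>)",
        std::regex::ECMAScript | std::regex::icase);
    // Closing tags start with "</", so the alternation never matches them.
    return std::regex_replace(body, openTag, "");
}
```

One thing to keep in mind: because `[^>]*` follows the tag name directly, the short names also match longer tags that start the same way. For instance `a` matches `<abbr title="x">`. Adding `\b` after the group is one possible way to tighten that.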

  3. The following 2 users say thank you to jacek for this useful post:

    patrik08 (27th July 2006), tpf80 (2nd December 2009)

  4. #3

    Thanks, the opening tags are removed now. Only these remain:

    Qt Code:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <!--UdmComment-->

  5. #4

    Now it runs and removes all tags:

    Qt Code:
    QString QLess::CleanTag( QString body )
    {
        qDebug() << "### start clean tag ";
        // Mark line-ending tags with a placeholder before stripping.
        body.replace("<br>","##break##");
        body.replace("</br>","##break##");
        body.replace("</p>","##break##");
        body.replace("</td>","##break##");
        // Drop the head and form blocks.
        body.remove(QRegExp("<head>(.*)</head>"));
        body.remove(QRegExp("<form(.*)</form>"));
        // Strip every remaining tag, including DOCTYPE and comments.
        body.remove( QRegExp( "<(.)[^>]*>"));
        qDebug() << "### newbody " << body;
        return body;
    }
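
To see why the single pattern `<(.)[^>]*>` also catches the DOCTYPE line and the comment from the previous post: the `(.)` accepts `!` and `/` just as readily as a letter. The sketch below is a stand-in written with `std::regex` so that it builds without Qt. The helper name `stripAllTags` is hypothetical. One difference to be aware of is that `.` in `std::regex` does not match a newline, which only matters if a line break directly follows `<`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring body.remove(QRegExp("<(.)[^>]*>")).
std::string stripAllTags(const std::string& body)
{
    // '<', one arbitrary character, any run of characters other than '>', then '>'.
    static const std::regex anyTag(R"(<(.)[^>]*>)");
    return std::regex_replace(body, anyTag, "");
}
```

The pattern is purely textual, so it also removes plain text that happens to sit between `<` and `>`: `1 < 2 and 3 > 2` loses everything from `<` to `>`. A comment that contains a `>` is likewise cut off at that character.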

  6. #5

    Quote Originally Posted by patrik08
    body.remove(QRegExp("<form(.*)</form>"));
    What if a page contains more than one form?

  7. #6

    Quote Originally Posted by jacek
    What if a page contains more than one form?
    I hope body.remove( QRegExp( "<(.)[^>]*>")); also removes the tags of a second form.
    In any case, my CMS only contains news articles, and I only need to reformat colors and styles. I insert new line breaks and then pass the result to Tidy for a check.

  8. #7

    Quote Originally Posted by patrik08
    I hope body.remove( QRegExp( "<(.)[^>]*>")); also removes the tags of a second form.
    Then you had better try your code on:
    text1
    <form>form1</form>
    text2
    <form>form2</form>
    text3
    hint
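
The test input above exposes greedy matching: `<form(.*)</form>` runs from the first `<form` to the last `</form>`, so text2 disappears together with both forms. A minimal sketch of the difference follows, again using `std::regex` as a stand-in so that it builds without Qt. The function names are hypothetical, and `[\s\S]` is used because `.` in `std::regex` does not match newlines, whereas `.` in `QRegExp` does.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Greedy: the match extends from the first "<form" to the LAST "</form>".
std::string removeFormsGreedy(const std::string& body)
{
    static const std::regex form(R"(<form[\s\S]*</form>)",
                                 std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(body, form, "");
}

// Minimal: "*?" stops at the first "</form>" after each "<form".
std::string removeFormsMinimal(const std::string& body)
{
    static const std::regex form(R"(<form[\s\S]*?</form>)",
                                 std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(body, form, "");
}
```

On the five-line input above, the greedy version leaves only text1 and text3, while the minimal version keeps text2 as well. `QRegExp` has no `*?` quantifier. There the same effect is switched on for the whole pattern by calling `setMinimal(true)` on the `QRegExp` object before passing it to `remove`.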

  9. #8

    Quote Originally Posted by jacek
    Then you had better try your code on:
    text1
    <form>form1</form>
    text2
    <form>form2</form>
    text3
    hint

    Now it handles more than one form, as well as JavaScript and style blocks.

    It runs like this:


    Qt Code:
    QString QLess::CleanTag( QString body )
    {
        qDebug() << "### start clean tag ";
        // Turn non-breaking spaces into plain spaces.
        body.replace("&nbsp;"," ");
        // Mark line-ending tags with a placeholder so they survive the tag stripping.
        body.replace("<br>","##break##");
        body.replace("</br>","##break##");
        body.replace("</p>","##break##");
        body.replace("</td>","##break##");
        // Drop whole blocks whose content is not article text.
        body.remove(QRegExp("<head>(.*)</head>",Qt::CaseInsensitive));
        body.remove(QRegExp("<form(.)[^>]*</form>",Qt::CaseInsensitive));
        body.remove(QRegExp("<script(.)[^>]*</script>",Qt::CaseInsensitive));
        body.remove(QRegExp("<style(.)[^>]*</style>",Qt::CaseInsensitive));
        // Strip every remaining tag, including DOCTYPE and comments.
        body.remove(QRegExp("<(.)[^>]*>"));
        // Restore the line breaks as <br> (</br> is not a valid tag).
        body.replace("##break##","<br>");
        qDebug() << "### newbody " << body;
        return body;
    }


    html result:
    Qt Code:
    text1

    text2

    text3
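
One caveat that may be worth checking before relying on the block patterns: in `<form(.)[^>]*</form>`, the `[^>]*` part cannot cross a `>`. The pattern therefore matches `<form>form1</form>` from the test input, but not a form whose opening tag carries attributes or whose body contains other tags. In that case only the tags are stripped by the generic pattern and the form's text stays in the output. The sketch below compares the posted form pattern with a minimal-match variant. It uses `std::regex` as a stand-in so that it builds without Qt, and both function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the form pattern as posted, followed by the generic tag strip.
std::string cleanAsPosted(std::string body)
{
    static const std::regex form(R"(<form(.)[^>]*</form>)",
                                 std::regex::ECMAScript | std::regex::icase);
    static const std::regex anyTag(R"(<(.)[^>]*>)");
    body = std::regex_replace(body, form, "");
    return std::regex_replace(body, anyTag, "");
}

// Hypothetical variant: minimal match from "<form" to the nearest "</form>".
std::string cleanMinimal(std::string body)
{
    static const std::regex form(R"(<form[\s\S]*?</form>)",
                                 std::regex::ECMAScript | std::regex::icase);
    static const std::regex anyTag(R"(<(.)[^>]*>)");
    body = std::regex_replace(body, form, "");
    return std::regex_replace(body, anyTag, "");
}
```

For `a<form action="s"><input>x</form>b`, the posted pattern leaves the `x` behind, while the minimal variant removes the whole form. With `QRegExp`, a pattern such as `<form.*</form>` combined with `setMinimal(true)` would be the closest equivalent of the variant. The same reasoning applies to the script and style patterns.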
