Thread: QRegExp to capture HTML link

  1. #1
    Join Date
    Feb 2007
    Location
    Wroclaw, Poland
    Posts
    72
    Thanks
    6
    Thanked 4 Times in 4 Posts
    Qt products
    Qt4
    Platforms
    Windows

    QRegExp to capture HTML link

    Hi.
    I want to extract links from a few pages. At first I used QWebFrame and it worked, but why use a cannon to shoot a fly?
    Now I'm trying to use a regular expression to capture those links. Alternatively I could use an XML parser, but writing a separate parser for each page layout seems like too much code.

    The link I'm searching for is:
    Qt Code:
    <a href="/?p=AMD+X2+II+555+AM3+B" class="produkt" title="Opis"><span class="produkt">AMD Phenom II X2 555 Black Edition s.AM3 BOX</span></a>
    I tried with
    Qt Code:
    <a [\d\w= ]*(class)=\"(produkt)\">
    but it was too greedy.
    Any suggestions?

  2. #2
    Join Date
    Mar 2009
    Location
    Brisbane, Australia
    Posts
    7,729
    Thanks
    13
    Thanked 1,610 Times in 1,537 Posts
    Qt products
    Qt4 Qt5
    Platforms
    Unix/X11 Windows
    Wiki edits
    17

    Re: QRegExp to capture HTML link

    QRegExp::setMinimal() may help, but regular expressions that match nested delimiter pairs (e.g. <>, "", etc.) are difficult to get right. The situation is not helped by inconsistent HTML: for example upper/lower case, single/double/no quotes, valueless attributes, etc.
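
To illustrate what greedy versus minimal matching changes, here is a minimal sketch. It uses the C++ standard library's std::regex instead of QRegExp so that it runs without Qt; the lazy quantifier `.*?` stands in for what setMinimal(true) does to `.*` in QRegExp, which has no per-quantifier lazy syntax. The helper name `firstMatch` and the two-link sample are made up for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text`,
// or an empty string if nothing matches.
// (Hypothetical helper, not part of Qt or of the original post.)
std::string firstMatch(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // ECMAScript grammar by default
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

On the sample `<a href="x" class="produkt">A</a><a href="y" class="produkt">B</a>`, the greedy pattern `<a .*class="produkt">` runs from the first `<a ` to the last `class="produkt">`, while the lazy `<a .*?class="produkt">` stops at the first one and returns only `<a href="x" class="produkt">`.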

  3. #3

    Re: QRegExp to capture HTML link

    Yes, I tried setting this minimal flag.
    Qt Code:
    const QString cstrProduktLinkRegExp("<a .*class=\"produkt\">");
    QRegExp stProductRow(cstrProduktLinkRegExp, Qt::CaseSensitive, QRegExp::RegExp2);
    stProductRow.setMinimal(true); // make .* match as little as possible
    iStart = strSearchTable.indexOf(stProductRow, iStart);
    if (iStart == -1) // indexOf() returns -1 when nothing matches
        return -1;
    QString strProductRow = stProductRow.cap(); // the whole matched text

    Still, it matches from the first <a> it finds and ends the match after the first occurrence of class="produkt".
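
One likely explanation: minimal matching only shortens the end of a match. The match still starts at the leftmost `<a `, and `.` is free to run across `>` into later tags. A possible workaround, sketched here with std::regex rather than QRegExp so it runs without Qt, is to replace `.*` with `[^>]*` so the match cannot leave the opening tag. The function name `productHref` is made up, and the pattern assumes double-quoted attributes with href before class, as in the sample link.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the href of the first <a> tag that carries class="produkt",
// or an empty string if there is none.
// Assumption: attributes are double-quoted and href comes before class.
// (Hypothetical helper, not part of Qt or of the original post.)
std::string productHref(const std::string& html) {
    // [^>]* cannot cross a '>', so the whole match stays inside one tag.
    static const std::regex re(
        R"re(<a [^>]*href="([^"]*)"[^>]*class="produkt"[^>]*>)re");
    std::smatch m;
    return std::regex_search(html, m, re) ? m.str(1) : std::string();
}
```

If an unrelated link such as `<a href="/home">Home</a>` precedes the sample link from the first post, this skips it and returns `/?p=AMD+X2+II+555+AM3+B`. The same pattern string can presumably be given to QRegExp and the href read with cap(1), but that is not verified here.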
