
Thread: Best way to load and parse an HTML file ??

  1. #1
    Join Date
    Jul 2008
    Posts
    12
    Qt products
    Qt4
    Platforms
    Windows

    Best way to load and parse an HTML file ??

    Greetings!

    Can someone point me to a demo or sample Qt C++ code that loads and parses HTML files at specific URLs? (DHTML content)

    Thanks!

  2. #2
    Join Date
    Oct 2006
    Location
    New Delhi, India
    Posts
    2,467
    Thanks
    8
    Thanked 334 Times in 317 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Re: Best way to load and parse an HTML file ??

    Why do you want to do that?
    If you want to display HTML sites, have a look at QWebView, available from Qt 4.4 onwards.

  3. #3
    Join Date
    Jul 2008
    Posts
    12
    Qt products
    Qt4
    Platforms
    Windows

    Re: Best way to load and parse an HTML file ??

    Quote (originally posted by aamer4yu):
    Why do you want to do that?
    If you want to display HTML sites, have a look at QWebView, available from Qt 4.4 onwards.

    Because I am creating a crawler: a robot application that loads DHTML documents, extracts some links, and recurses into each of those links.

    So far I've tested QHttp, but it does not always work. Sometimes the page loads perfectly (e.g. http://www.google.ca),
    and sometimes I get a "302 Found" dummy page or worse (e.g. any URL that represents a Google query).
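
    The "load a page, extract its links, recurse" loop described above can be sketched in plain C++. This is only an illustration under stated assumptions: the network is replaced by an in-memory map (`LinkGraph`, a made-up name), `crawl` is a hypothetical function, and a queue with a "seen" set is used instead of literal recursion so that pages linking back to each other do not cause an endless loop.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the network: maps a URL to the links found on that page.
using LinkGraph = std::map<std::string, std::vector<std::string>>;

// Breadth-first crawl from `start`, visiting each URL at most once and
// stopping after `maxPages` pages. Returns the URLs in visiting order.
std::vector<std::string> crawl(const LinkGraph& web,
                               const std::string& start,
                               std::size_t maxPages)
{
    std::vector<std::string> visited;
    std::set<std::string> seen{start};
    std::queue<std::string> pending;
    pending.push(start);

    while (!pending.empty() && visited.size() < maxPages) {
        const std::string url = pending.front();
        pending.pop();
        visited.push_back(url);

        const auto page = web.find(url);
        if (page == web.end())
            continue;                      // page could not be loaded: nothing to extract
        for (const std::string& link : page->second) {
            if (seen.insert(link).second)  // true only for links not seen before
                pending.push(link);
        }
    }
    return visited;
}
```

    In a real crawler the map lookup would be replaced by an HTTP request plus link extraction, and the page limit keeps the crawl bounded.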

  4. #4
    Join Date
    Oct 2006
    Location
    New Delhi, India
    Posts
    2,467
    Thanks
    8
    Thanked 334 Times in 317 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Re: Best way to load and parse an HTML file ??

    Well, in that case I guess you will have to parse the HTML file yourself. I am not aware of such a class in Qt.
    Regular expressions might be of some help for the parsing.
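
    As a rough illustration of the regular-expression idea, here is a minimal sketch that pulls the href values out of an HTML string. It uses the standard library's std::regex rather than a Qt class so that it stays self-contained; `extractLinks` is a hypothetical helper name, and a pattern like this only handles quoted attributes, not every form of real-world HTML.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collect the value of every
// href="..." or href='...' attribute found in an HTML string.
std::vector<std::string> extractLinks(const std::string& html)
{
    // [^"']* stops at the first quote character, so each match stays
    // inside a single attribute value. Matching is case-insensitive.
    static const std::regex hrefPattern(R"rx(href\s*=\s*["']([^"']*)["'])rx",
                                        std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), hrefPattern), end;
         it != end; ++it) {
        links.push_back((*it)[1].str());  // capture group 1 is the URL text
    }
    return links;
}
```

    The returned strings are the raw attribute values, so relative links such as /page.html would still need to be resolved against the page URL before they can be fetched.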

  5. #5
    Join Date
    Feb 2008
    Posts
    50
    Thanks
    1
    Thanked 2 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Re: Best way to load and parse an HTML file ??

    Quote (originally posted by tuthmosis):
    Because I am creating a crawler: a robot application that loads DHTML documents, extracts some links, and recurses into each of those links.

    So far I've tested QHttp, but it does not always work. Sometimes the page loads perfectly (e.g. http://www.google.ca),
    and sometimes I get a "302 Found" dummy page or worse (e.g. any URL that represents a Google query).

    Sometimes Google displays a CAPTCHA because it suspects a bot is searching. That is probably your 302 problem: a 302 response code means the request was redirected.
    If you want to parse the HTML and fetch the links, use a regular expression to do it.
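
    Since a 302 reply means "redirected", a crawler usually has to read the Location header and request that URL instead of treating the reply body as the page. The sketch below shows the idea in plain C++ under several assumptions: `Response`, `redirectTarget` and `fetchFollowingRedirects` are made-up names, the header key is assumed to be stored exactly as "Location", the target is assumed to be an absolute URL, and the actual download is left to a caller-supplied function.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for an HTTP reply: status code plus headers.
struct Response {
    int status = 0;
    std::map<std::string, std::string> headers;
};

// True for the status codes that ask the client to fetch another URL.
bool isRedirect(int status)
{
    return status == 301 || status == 302 || status == 303 ||
           status == 307 || status == 308;
}

// Returns the redirect target, or an empty string if the reply is not a
// redirect or carries no Location header.
std::string redirectTarget(const Response& reply)
{
    if (!isRedirect(reply.status))
        return "";
    const auto it = reply.headers.find("Location");
    return it == reply.headers.end() ? "" : it->second;
}

// Fetch `url` and follow at most `maxHops` redirects. `fetch` is supplied by
// the caller and performs the real request.
Response fetchFollowingRedirects(
    const std::string& url,
    const std::function<Response(const std::string&)>& fetch,
    int maxHops = 5)
{
    Response reply = fetch(url);
    for (int hop = 0; hop < maxHops; ++hop) {
        const std::string next = redirectTarget(reply);
        if (next.empty())
            break;              // final page reached (or no usable target)
        reply = fetch(next);
    }
    return reply;
}
```

    The hop limit guards against redirect loops. If the redirect leads to a CAPTCHA page, as suggested above, following it will still not yield the search results.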

  6. #6
    Join Date
    Jan 2006
    Posts
    368
    Thanks
    14
    Thanked 18 Times in 17 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Re: Best way to load and parse an HTML file ??

    ...or use Perl, which has dedicated classes for this subject. Maybe Qt is not the best solution for your problem.

  7. #7
    Join Date
    May 2008
    Posts
    4
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Re: Best way to load and parse an HTML file ??

    This probably won't be read by the original thread author, but I'll post anyway for the record.

    You can use Mozilla's "Gecko" engine to parse HTML or XML. Read more here: http://developer.mozilla.org/en/Gecko

    Hope this helps someone.

  8. #8
    Join Date
    Jul 2008
    Posts
    12
    Qt products
    Qt4
    Platforms
    Windows

    Re: Best way to load and parse an HTML file ??

    Quote (originally posted by mave-rick):
    This probably won't be read by the original thread author, but I'll post anyway for the record.

    You can use Mozilla's "Gecko" engine to parse HTML or XML. Read more here: http://developer.mozilla.org/en/Gecko

    Hope this helps someone.

    Wow, thanks mave-rick!
    I hope it does what it claims when it comes to parsing.

    I'll try to find a wrapper class to ease its use with C++, Eclipse and Qt.

    Thanks again.

  9. #9
    Join Date
    May 2006
    Posts
    788
    Thanks
    49
    Thanked 48 Times in 46 Posts
    Qt products
    Qt4
    Platforms
    MacOS X Unix/X11 Windows

    Re: Best way to load and parse an HTML file ??

    Quote (originally posted by tuthmosis):
    Greetings!

    Can someone point me to a demo or sample Qt C++ code that loads and parses HTML files at specific URLs? (DHTML content)

    Thanks!

    Read Qt Quarterly; it has a sample that queries image src attributes:
    http://doc.trolltech.com/qq/qq25-webrobot.html
    Change it to query a/href instead.

    My method for loading remote or local images into a QTextEdit is:

    Qt Code:
    /// from http://www.qt-apps.org/content/show.php/XHTML+Wysiwyg+Qeditor?content=59493
    #include <QRegExp>
    #include <QString>
    #include <QTimer>

    // Member function of the editor class. The class name and the members html,
    // dimage, AppendImage() and GetRemoteFile() are not shown in this post.
    void Load_Image_Connector()
    {
        // Capture the value of every src="..." or src='...' attribute.
        QRegExp expression( "src=[\"\'](.*)[\"\']", Qt::CaseInsensitive );
        expression.setMinimal(true);  // non-greedy: stop at the first closing quote
        int iPosition = 0;
        int canna = 0;  // number of images found
        while( (iPosition = expression.indexIn( html , iPosition )) != -1 ) {
            QString semi1 = expression.cap( 1 );
            canna++;
            dimage.append(semi1);  /* image list 1 */
            AppendImage(semi1);    /* image list, local or remote */
            iPosition += expression.matchedLength();
        }
        // Schedule GetRemoteFile() to run 1 ms later from the event loop.
        QTimer::singleShot(1, this, SLOT(GetRemoteFile()));
    }

    Another way is the ScribeParser class.
    It parses the full document to find internal or external links.
    File: http://fop-miniscribus.googlecode.co...panelcontrol.h

    But none of these approaches parses JavaScript; even Google's robot engine cannot do that!
