Greetings!
Can someone point me to a demo or some sample Qt C++ code that loads and parses HTML files at specific URLs (DHTML content)?
Thanks!
Why do you want to do that?
If you want to display HTML sites, have a look at QWebView, available from Qt 4.4 onwards :)
Because I am creating a crawler, a robot application that loads DHTML documents, extracts some links, and recurses into each of those links...
So far, I've tested QHttp, but it does not always work. Sometimes the pages load perfectly (e.g. http://www.google.ca)
and sometimes it returns a "302 Found" dummy page or worse (e.g. any URL that represents a Google query).
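For the record, "302 Found" is an HTTP redirect: the server expects the client to request the URL given in the Location response header, so a crawler has to follow it itself if the HTTP class does not. Below is a minimal sketch of that check in plain standard C++, assuming the raw response head (status line plus headers) is available as a string. `RedirectInfo`, `inspectResponseHead` and `trim` are hypothetical helper names, not part of Qt.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper type: what we learned from a raw HTTP response head.
struct RedirectInfo {
    int status = 0;        // e.g. 200 or 302
    std::string location;  // value of the Location header, empty if absent

    bool isRedirect() const {
        return (status == 301 || status == 302 || status == 303 ||
                status == 307 || status == 308) && !location.empty();
    }
};

// Strip leading and trailing whitespace, including the '\r' of CRLF lines.
static std::string trim(const std::string &s)
{
    const char *ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse the status line and look for a Location header (case-insensitive).
RedirectInfo inspectResponseHead(const std::string &head)
{
    RedirectInfo info;
    std::istringstream in(head);
    std::string line;

    // Status line looks like "HTTP/1.1 302 Found".
    if (std::getline(in, line)) {
        std::istringstream statusLine(line);
        std::string version;
        statusLine >> version >> info.status;
    }

    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) break;  // blank line ends the headers
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string name = line.substr(0, colon);
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (name == "location")
            info.location = trim(line.substr(colon + 1));
    }
    return info;
}
```

When `isRedirect()` is true, the crawler would issue a new request for `location` (resolving it against the current URL if it is relative) and should cap the number of hops to avoid redirect loops.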
Well, in that case I guess you will have to parse the HTML file yourself. I am not aware of such a class in Qt.
Maybe regular expressions might be of some help to you for parsing...
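As a rough illustration of the regular-expression idea, here is a sketch that pulls href values out of anchor tags. It uses std::regex from standard C++ rather than a Qt class, and `extractHrefs` is a hypothetical name. It assumes quoted attribute values and will miss unquoted hrefs and links generated by script, so treat it as a starting point, not a real HTML parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return the href value of every <a ...> tag found in `html`.
std::vector<std::string> extractHrefs(const std::string &html)
{
    // <a, then any attributes, then href = "value" or 'value'.
    // Group 1 is the opening quote, group 2 the URL; \1 requires the
    // closing quote to match the opening one. Matching ignores case.
    static const std::regex anchor(
        R"rx(<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1)rx",
        std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), anchor), end;
         it != end; ++it) {
        links.push_back((*it)[2].str());
    }
    return links;
}
```

The extracted values may be relative (for example /page.html), so they would still need to be resolved against the URL of the page they came from before being requested.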
... or use Perl, which has dedicated classes for this. Maybe Qt is not the best solution for your problem.
This probably won't be read by the original thread author, but I'll post anyway for the record.
You can use Mozilla's engine, Gecko, to parse HTML or XML. Go here and read: http://developer.mozilla.org/en/Gecko
Hope this helps someone.
Read Qt Quarterly; it has a sample that queries image src attributes:
http://doc.trolltech.com/qq/qq25-webrobot.html
Change it to query a/href instead.
My method for loading remote or local images into a QTextEdit is:
Code:
/// from http://www.qt-apps.org/content/show.php/XHTML+Wysiwyg+Qeditor?content=59493
// Note: expression, html, semi1 and dimage are not defined in this excerpt.
void Load_Image_Connector()
{
    expression.setMinimal(true);  // minimal (non-greedy) matching
    int iPosition = 0;
    int canna = 0;                // counts the matches found
    while( (iPosition = expression.indexIn( html , iPosition )) != -1 ) {
        canna++;
        dimage.append(semi1);     /* image list 1 */
        AppendImage(semi1);       /* image list, local or remote */
        iPosition += expression.matchedLength();
    }
}
Another way is the class ScribeParser.
It parses the full document to find internal or external links.
File: http://fop-miniscribus.googlecode.co...panelcontrol.h
But none of these approaches parses JavaScript ... even the Google robot engine cannot do that!