Thread: Website crawler | find html tags ("a" "h" "meta" ...)

  1. #1
    Join Date
    Mar 2015
    Posts
    2
    Thanks
    1
    Qt products
    Qt5
    Platforms
    Windows

    Re: Website crawler | find html tags ("a" "h" "meta" ...)

    Hey,

    thanks for the reply.

    With your hints I have adapted my code as follows, and it works:

    Qt Code:
    void Crawler::crawl_Page()
    {
        frame = new QWebPage(this);

        // Disable the object cache and keep local content from reaching
        // other local files or remote URLs.
        QWebSettings::setObjectCacheCapacities(0, 0, 0);
        frame->settings()->setAttribute(QWebSettings::LocalContentCanAccessFileUrls, false);
        frame->settings()->setAttribute(QWebSettings::LocalContentCanAccessRemoteUrls, false);

        // Parse once the main frame reports that loading has finished.
        QObject::connect(frame->mainFrame(), SIGNAL(loadFinished(bool)),
                         this, SLOT(parsingWork()));

        QFile* file = new QFile("D:/tempfile.txt");

        if (file->open(QIODevice::ReadOnly | QIODevice::Text))
        {
            qDebug() << "Open tempfile ";
            QString htmlContent = file->readAll();

            qDebug() << "Count Chars :: " << htmlContent.count();
            frame->mainFrame()->setHtml(htmlContent);

            doc = frame->mainFrame()->documentElement();
        }
    }


    void Crawler::parsingWork()
    {
        qDebug() << "Start parsing content .....";

        // Collect every <a> element below the document root.
        QWebElementCollection linkCollection = doc.findAll("a");
        qDebug() << "Found " << linkCollection.count() << " links";

        foreach (QWebElement link, linkCollection)
        {
            qDebug() << "found link " << link.attribute("href");
        }

        qDebug() << "stop parsing content .....";
    }

  2. #2
    Join Date
    Jan 2006
    Location
    Graz, Austria
    Posts
    8,416
    Thanks
    37
    Thanked 1,544 Times in 1,494 Posts
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows

    Re: Website crawler | find html tags ("a" "h" "meta" ...)

    I would recommend allocating your QFile on the stack. There is no need to allocate it on the heap, since you only need it within the scope of that function.

    This has the nice side effect that you don't have to delete it manually (which your code currently never does).
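    A minimal sketch of the lifetime difference, in plain C++ so that it builds without Qt. TrackedFile is a hypothetical stand-in for QFile that just counts live instances, and useOnStack() / useOnHeap() are made-up names for illustration. The same scoping rule applies to a QFile declared as a local variable.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical stand-in for QFile (not a Qt class): wraps a std::ifstream
// and counts how many instances are currently alive.
struct TrackedFile {
    static int live;

    explicit TrackedFile(const std::string& path) : stream(path) { ++live; }
    ~TrackedFile() { --live; }  // the stream is closed by its own destructor

    bool isOpen() const { return stream.is_open(); }

    std::ifstream stream;
};

int TrackedFile::live = 0;

// Stack allocation: the destructor runs automatically at the closing brace,
// on every exit path, so nothing has to be deleted by hand.
void useOnStack(const std::string& path) {
    TrackedFile file(path);
    if (file.isOpen()) {
        // read the content here
    }
}

// Heap allocation: the object outlives the function until someone calls
// delete. The pointer is returned here only so the caller can clean up;
// if it were simply dropped, the object would leak.
TrackedFile* useOnHeap(const std::string& path) {
    TrackedFile* file = new TrackedFile(path);
    if (file->isOpen()) {
        // read the content here
    }
    return file;
}
```

    After useOnStack() returns, TrackedFile::live is back to its previous value, while after useOnHeap() it stays one higher until the returned pointer is deleted.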

    Also, if crawl_Page() is called more than once, each call creates a new QWebPage that stays alive as a child of the Crawler. You might want to delete the QWebPage explicitly at some point, e.g. by calling deleteLater() on it in the slot connected to loadFinished().

    Cheers,
    _
