Thread: Weird problem: multithread QT app kills my linux

  1. #1
    Join Date: Jul 2007 | Posts: 11 | Thanks: 2 | Qt products: Qt3 Qt4 | Platforms: Unix/X11

    Weird problem: multithread QT app kills my linux

    Hello,
    I'm having a very weird problem, which may not be related to Qt, but since it shows up only when I use a Qt app in a specific situation, here it is.
    In short: I have a Qt application which is able to kill my system, but only in very specific circumstances. Some background to see the full picture:

    - I have a two-PC setup, one running Windows (XP), one running Linux (Mandriva 2008.1, i586), connected by an Ethernet cable at 100 Mb/s

    - on the Windows machine, there's a screenshot-taking program (something like Fraps) which takes one screenshot every two seconds and saves it on the "Z:" disk drive (screenshot size: 1280x1024 uncompressed BMP).

    - the "Z:" disk is not on the Windows box, it's a remote disk shared from the Linux machine, using samba. The share is public and R/W, but I'm on a local network very much isolated from the outside world.

    - the Linux machine therefore has a directory (let's call it "win-screenshots") which fills up over time. The BMP images are 4 MB each, so at one image every two seconds that's about 2 MB/s: a moderate load for a 100 Mb/s link.

    - I have two applications, called "screen-analysis-old" and "screen-analysis-new", which I can run on the Linux machine and which analyze the screenshots and extract data from them. The UI of the programs is not complex, basically just some read-only QTextEdits used to display the text obtained from the screenshots by some very primitive OCR.

    - "screen-analysis-old" works in the following way: it has a main window and a QTimer generating "check for new images" events every 250 ms. These events run a function of the mainwindow class which performs the image processing and displays the result in the QTextEdits. I've run this application for 100+ hours without ever encountering any problem. The processed screenshots are moved to a "save" directory using a system("mv ...") call.

    - with time, the amount of processing I do has increased, and since the processing runs in the same thread as the UI management the slowdown became noticeable. So I developed a new application (strongly based on the old one) which uses two separate threads.

    - "screen-analysis-new" has one thread taking care of the UI (the main thread) and a secondary thread (subclassed from QThread) which takes care of the processing. This second thread communicates with the main one through a shared data area (a QStringList, protected by a mutex) and a signal which it emits and which the main thread catches in order to update the QTextEdits from the data in the shared area. The approach is the same as before: the thread sleeps 250 ms, then checks for new images and processes any it finds. The screenshots here can be either deleted or (as in the old app) moved to a save directory. Neither option makes any difference to the problem.

    - this new application is able to kill my Linux box, in the following way: everything freezes (mouse cursor frozen, no keyboard response, no network response: connecting via ssh times out), but I can still perform an Alt-SysRq-B to reboot the system, which means that the fundamental kernel stuff is still running. The freeze can occur fast (last one was after fewer than 500 images processed), and it's random (sometimes after 50 images; the longest run I've seen was a bit more than 1h, so around 2k images).
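
    For reference, the shared data area described above can be sketched roughly as follows. This is a minimal standard-C++ stand-in, not the application's actual code: the class name SharedResults and its push()/drain() methods are hypothetical, and std::vector<std::string> with std::mutex plays the role of the QStringList with its mutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shared data area: a list of strings
// guarded by a mutex (QStringList + mutex in the Qt application).
class SharedResults {
public:
    // Called by the worker thread after it has processed one screenshot.
    void push(std::string text) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(text));
    }

    // Called by the UI thread when it is notified of new data. Takes
    // everything queued so far and leaves the shared list empty, so the
    // lock is held only for the swap and never while widgets are updated.
    std::vector<std::string> drain() {
        std::vector<std::string> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(items_);
        assert(items_.empty());  // the shared list starts fresh after a drain
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<std::string> items_;
};
```

    In a Qt version of this sketch, the worker would emit its signal after push(), and the slot in the main thread would call drain() and append the strings to the QTextEdits. That assumes the slot really runs in the main thread, since widgets must only be touched from there.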

    What kind of testing would you suggest to locate where the problem is?

    In order to see if the problem is the app or the kernel (I may have hit some weird kernel-lock stuff) I've run it "offline", i.e. I have prepared around 5k images in the "win-screenshots" directory and then let the app run to analyze them. It worked without any problem. This would seem to indicate that the application by itself works fine.

    I've not yet tested what happens if, instead of putting all the 5k images there at once, I copy them in one at a time, every 2 seconds, from a script running on the Linux box (i.e. no samba shares running). This may be a test worth doing in case it's a "file being accessed by two processes" locking problem. But if this were the case the non-multithreaded app would crash as well, sooner or later.
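
    A local feeder for that test could look roughly like the sketch below. It is only an illustration under assumptions: the function name feedDirectory and its parameters are made up here, and it uses std::filesystem (C++17) rather than a shell script.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <thread>

namespace fs = std::filesystem;

// Copies every regular file from `source` into `target`, pausing for
// `interval` after each copy, and returns the number of files copied.
// Subdirectories of `source` are skipped.
std::size_t feedDirectory(const fs::path& source, const fs::path& target,
                          std::chrono::milliseconds interval) {
    assert(interval.count() >= 0);
    fs::create_directories(target);
    std::size_t copied = 0;
    for (const fs::directory_entry& entry : fs::directory_iterator(source)) {
        if (!entry.is_regular_file()) {
            continue;
        }
        fs::copy_file(entry.path(), target / entry.path().filename(),
                      fs::copy_options::overwrite_existing);
        ++copied;
        std::this_thread::sleep_for(interval);
    }
    return copied;
}
```

    Called with the directory of prepared images as source, "win-screenshots" as target and std::chrono::seconds(2) as interval, this reproduces the arrival rate of the Windows box without samba being involved. The destination file may be visible to the analyzer before the copy has finished, which is the situation the "two processes" question is about.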

    The fact that only the second app crashes the system seems to indicate that it's something related to the multithreading. At the same time it can't be the app by itself, or it would also crash in the "offline" test. It looks more like some weird interaction between the multithreading and the samba sharing.

    I've not tested running an alternate version of the kernel, or reinstalling the whole system with an x64 kernel.

    Another possible test would be to rip the multithreading part out of the second app, and have the processing run from a QTimer event in the main thread again. This requires some changes, but it's doable.

    What I'd like to know is: have any of you had any problem similar to this? What would you suggest that I try to do to locate the problem?

    Apologies for the long post (which may be somewhat OT..) and thanks in advance.

  2. #2
    Join Date: Jan 2006 | Location: Warsaw, Poland | Posts: 5,372 | Thanks: 28 | Thanked 976 Times in 912 Posts | Qt products: Qt3 Qt4 | Platforms: Unix/X11 Windows

    Re: Weird problem: multithread QT app kills my linux

    Qt on its own can't kill your system, but a faulty graphics driver can. Compile your application in debug mode, start it from the console and check whether there are any messages there.

    Another possibility is that your system is alive, but unresponsive because it has run out of RAM.

  3. #3
    Join Date: Jul 2007 | Posts: 11 | Thanks: 2 | Qt products: Qt3 Qt4 | Platforms: Unix/X11

    Re: Weird problem: multithread QT app kills my linux

    I monitored the RAM usage and it's OK. Besides, when RAM runs out you get massive disk activity (swapping), which is not the case here.
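
    For anyone who wants to repeat the RAM check with a log that survives the freeze, here is a minimal sketch of reading values in the /proc/meminfo format. The function name meminfoValueKb is hypothetical, and this is not the tool that was used for the monitoring mentioned above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the value in kB for `key` (e.g. "MemFree") from text in the
// /proc/meminfo format, or -1 if the key is not present.
long meminfoValueKb(std::istream& in, const std::string& key) {
    assert(!key.empty());
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        long value = 0;
        // Each line looks like "MemFree:   123456 kB".
        if (fields >> name >> value && name == key + ":") {
            return value;
        }
    }
    return -1;
}
```

    Opening /proc/meminfo with a std::ifstream every few seconds and appending "MemFree" and "SwapFree" to a log file, flushed after each write, should leave the last readings on disk for inspection after the reboot.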

    A faulty video driver would kill the display, but the machine would remain reachable via SSH, as was the case with the initial Nvidia driver releases.

    I may have located the problem: I dropped the patched Mandriva kernel (which includes some preempt and similar stuff) and loaded up a vanilla kernel+open source nv driver. Everything works fine. I've reinstalled the nvidia driver and will test vanilla+nvidia driver today.
