Results 1 to 14 of 14

Thread: QList of variable data type?

  1. #1
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default QList of variable data type?

    In my program I import a set of millions of 3D points with multiple properties each (RGB values, XYZ values, alpha, etc) from an external file and store it in memory. Rather than storing them as a simple array, I was hoping to store them in a Qt container class like QList so that I could tap into Qt's sorting capabilites. In order to conserve memory, I've written several class-based point data types to choose from—some with XYZ only, others that store XYZ and RGB data, etc—all that inherit from a core data type:
    Qt Code:
    1. class p3d_core_t {/*variables here*/}
    2. class p3d_0_t : public p3d_core_t {/*variables here*/}
    3. class p3d_1_t : public p3d_core_t {/*variables here*/}
    4. class p3d_2_t : public p3d_core_t {/*variables here*/}
    To copy to clipboard, switch view to plain text mode 
    A given data file specifies in its header which data type it contains (0, 1, 2, ..., N), so I don't know which type to store in my QList until run-time. I have two questions:
    1. How do I construct a new QList during run-time once I know which data type I'm importing?
    2. Is it possible to list the data types themselves in a QHash so that I can index them instead of having a series of if-else statements? For example, something like this
      Qt Code:
      1. int dataTypeIndex = 2;
      2. QList* list;
      3. QHash<int, pointdatatype> hash;
      4. list = new QList<hash.value(dataTypeIndex)>
      To copy to clipboard, switch view to plain text mode 
      would create a QList of p3d_2_t data, ready for importing.

    I can hardcode this with a huge set of if-else statements, but I have a feeling there's an elegant "Qt way" to do this, especially since the number of point data types changes periodically. I'd rather it be a little more flexible. Because these data sets get huge, even a little memory or speed boost makes a huge difference.

    Thanks in advance!

    [edit]
    I'd also like to add that I'm open to alternatives to QList or QHash that are more appropriate for lists that get into the 10-999 million member range.
    [/edit]
    Last edited by Phlucious; 16th November 2011 at 01:00.

  2. #2
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: QList of variable data type?

    I suppose that I could achieve the desired functionality using the Property System with a QList of QObjects, one QObject for every point. However, wouldn't doing so significantly inflate the memory use of the point dataset? When you have 100,000,000 QObject points, even a couple bytes per point makes a big difference.

  3. #3
    Join Date
    Jan 2009
    Location
    Germany
    Posts
    131
    Thanks
    11
    Thanked 16 Times in 16 Posts
    Qt products
    Qt3 Qt4
    Platforms
    MacOS X Unix/X11 Windows

    Default Re: QList of variable data type?

    Thats not possible since c++ templates are evaluated at compile time and not at runtime.

  4. #4
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: QList of variable data type?

    Apparently QList pointers require a type, too, so I'm going to have to re-evaluate how I will accomplish this. I don't want to have to create an array of 1,000,000 structs that each occupy 423 bytes (v8) when I'm importing a data version that only occupies 67 bytes (v2), just to account for the possibility that I might import v8 data.

    Versioning sure makes things tricky... Know any good resources for that? I'm always willing to learn.

    The QList documentation mentions that it's optimized for <1000 members. Anyone know if it can manage 100,000,000 members efficiently? I'm thinking a basic C array allocated with malloc() may be the best (most efficient) option.

  5. #5
    Join Date
    Nov 2010
    Posts
    315
    Thanked 53 Times in 51 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows Symbian S60 Maemo/MeeGo

    Default Re: QList of variable data type?

    I don't know what are you trying achieve, but IMHO you should use simple data format.
    Using polymorphism or other universal solution cost memory for handling type recognition and storing information about heap (in case of QVariant copy will be kept in heap).
    If you have only 1M of small/average size elements then soring them in single fragment of memory would be more effective (423 byte it is small size and will give you about 500 GB size). QVector or QVarLengthArray can be better choice.
    For storing 100M of average size elements you have to store data on hard drive cache. Qt has some templates to handle that:
    QCache - for random access,
    QContiguousCache - for sequential access (this is probably for you).

    Without details it is hard to suggest something more.

  6. The following user says thank you to MarekR22 for this useful post:

    Phlucious (17th November 2011)

  7. #6
    Join Date
    Jan 2008
    Location
    Alameda, CA, USA
    Posts
    5,230
    Thanks
    302
    Thanked 864 Times in 851 Posts
    Qt products
    Qt5
    Platforms
    Windows

    Default Re: QList of variable data type?

    If your data types are all derived from a common base class, why not make your QList a list of pointers to the base class instead of a list of actual object instances? You could then use virtual methods of the base class to get at the actual content of the real instances.

    Consider using std::vector<> instead of QList<> or QVector<>. The Qt purists here may disagree, but for PC platforms, I think the C++ Standard Library is better for large containers. While I don't typically have containers of 100 million things, I do have containers with a million or more and use STL containers without hesitation.

    If you haven't ever read about C++ template metaprogramming, you should check into that. Much of it is concerned with how to write algorithms on containers of arbitrary things.

  8. The following user says thank you to d_stranz for this useful post:

    Phlucious (17th November 2011)

  9. #7
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: QList of variable data type?

    Generally speaking, my application is to open, manipulate, and close a LiDAR point cloud data file. These files can reach a few GB in size—each!—with around 1-500 million points apiece. Data quantities add up pretty fast when you collect around 120,000 points per second, minimum.
    Quote Originally Posted by d_stranz View Post
    If your data types are all derived from a common base class, why not make your QList a list of pointers to the base class instead of a list of actual object instances? You could then use virtual methods of the base class to get at the actual content of the real instances.
    I like the idea of using virtual methods declared by the base class. I didn't know that calling the virtual method of a base class pointer would call the method of the child it's pointing to. For example,
    Qt Code:
    1. p3d_1_t pt1; //p3d_1_t's constructor calls p3d_1_t::clear(), which sets x=0
    2. p3d_core_t* pt1_ptr = &pt1; //p3d_core_t's constructor calls p3d_core_t::clear(), which sets x=9999
    3. pt1_ptr->clear(); //this calls p3d_1_t::clear(), not p3d_core_t::clear(), even though the pointer's type is p3d_core_t*
    To copy to clipboard, switch view to plain text mode 
    would allow me to code in a generalized fashion given a set of get/set functions in the core class for each variable, even though not all of the derived classes use every variable. However...
    Quote Originally Posted by MarekR22 View Post
    Using polymorphism or other universal solution cost memory for handling type recognition and storing information about heap (in case of QVariant copy will be kept in heap).
    That's basically my concern with using a polymorphic class-based approach. I'm unsure how class functions are stored in memory or in the executable when you have multiple instances of the same class. How much overhead is added by having 100,000,000 instances with get/set functions? Do the functions occupy space in memory for each instance or are they shared somehow? As I understand it, static class functions are shared across every instance, but I don't know how normal non-static functions are stored/called. I'd rather not make them all static and force myself to pass a "this" every time, but I'll do it if it means saving that much memory. I've already pushed past 4 gis of RAM. Speed's the main issue, though.

    btw, I've pretty much abandoned the QObject::setProperty() concept, so I'm not going to bother with QVariants and type conversions.
    Quote Originally Posted by MarekR22 View Post
    If you have only 1M of small/average size elements then soring them in single fragment of memory would be more effective (423 byte it is small size and will give you about 500 GB size). QVector or QVarLengthArray can be better choice.
    For storing 100M of average size elements you have to store data on hard drive cache. Qt has some templates to handle that:
    QCache - for random access,
    QContiguousCache - for sequential access (this is probably for you).
    I believe that I'm looking for random access, so QCache is probably more applicable. I'd forgotten and misunderstood how those worked. Would it make sense to process the data in one QThread and retrieve it from the disk in another thread? Otherwise I don't see how a QCache would help me in this case because I've always had HUGE slowdowns when I try to process off the hard drive.

    It'd be nice if I could spatially sort the data somehow (bin and grid the data according to XYZ values) using prebuilt methods, but that's probably asking too much of the Qt library.
    Quote Originally Posted by d_stranz View Post
    Consider using std::vector<> instead of QList<> or QVector<>. The Qt purists here may disagree, but for PC platforms, I think the C++ Standard Library is better for large containers. While I don't typically have containers of 100 million things, I do have containers with a million or more and use STL containers without hesitation.

    If you haven't ever read about C++ template metaprogramming, you should check into that. Much of it is concerned with how to write algorithms on containers of arbitrary things.
    I've been leery of STL containers because I had so many headaches from them, vectors in particular, early in my programming experience (I wouldn't call it a career... I started dabbling in coding 2 years ago). I'll give vectors another look! TMP looks promising, but still a little over my head. I like how fast it sounds, though.

    Thanks for the tips, both of you! This has been a blast to learn.

  10. #8
    Join Date
    Jan 2008
    Location
    Alameda, CA, USA
    Posts
    5,230
    Thanks
    302
    Thanked 864 Times in 851 Posts
    Qt products
    Qt5
    Platforms
    Windows

    Default Re: QList of variable data type?

    The issues with LiDAR point clouds are similar to what we face with mass spectrometric images. Each point in the image is an entire spectrum, so MS image files are routinely 40 GB or larger.

    I'm unsure how class functions are stored in memory or in the executable when you have multiple instances of the same class. How much overhead is added by having 100,000,000 instances with get/set functions?
    The only thing that is allocated with each instance of an object is the memory for the data variables themselves. There is only just a single copy of the code, ever, regardless of whether it is a static method or not. The main difference between static and non-static methods is that a static method is basically the same as an ordinary function call, except that it is defined within a class' namespace. Static methods have no "this" pointer, so they can only modify static member variables of the class, not the member data associated with a particular instance of the class.

    There is some small overhead associated with using virtual methods. Most compilers implement classes with something called a "vtable", which is a lookup table of function locations. Depending on the implementation, virtual methods might have to go through another lookup at run time to determine which of the overridden functions to call, based on the actual type of the instance.

    As for deriving your data types from any QObject type, I wouldn't do it. If you really have millions of these, the overhead would kill performance. And Q_PROPERTY goes through all sorts of lookup and conversion to get or set values.

    I referred to metaprogramming because there are tricks in there for retrieving information out of object instances when you don't actually know (or care) what type the object really is. Look here under "Member types" for how std::vector<> does this.

  11. The following user says thank you to d_stranz for this useful post:

    Phlucious (30th November 2011)

  12. #9
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: QList of variable data type?

    Thanks for the tips! It seems to be working pretty well, at least conceptually. I seem to be having some issues with the actual size of the class in that it's somehow getting inflated.

    For example, if you add up all of the variables stored in this example class, it should add up to 20 bytes, and yet when I query its size with sizeof(), I get a value of 32. Why would that be if all each instance stores is the variables, not the functions?

    Qt Code:
    1. class ptcore_t
    2. {
    3. public:
    4. ptcore_t();
    5. virtual void clear();
    6.  
    7. qint32 xyz[3];
    8. quint16 intensity;
    9. quint8 retid;
    10. quint8 classification;
    11. qint8 scanangle;
    12. quint8 misc;
    13. quint16 ptsourceid;
    14. }
    To copy to clipboard, switch view to plain text mode 

    If it's relevant, I'm coding this with a 64bit version of Qt, on Windows 7 x64.

  13. #10
    Join Date
    Jan 2008
    Location
    Alameda, CA, USA
    Posts
    5,230
    Thanks
    302
    Thanked 864 Times in 851 Posts
    Qt products
    Qt5
    Platforms
    Windows

    Default Re: QList of variable data type?

    Probably because the compiler aligns storage for each member on 4-byte boundaries in a 64-bit machine. Possibly if you rearrange the order of your member variables, you could end up with a different size because the alignment changes.

    Why would that be if all each instance stores is the variables, not the functions?
    Think about what this means. Functions (code, basically) are invariant. Every single instance of an object of a given type executes exactly the same code in its member functions. The only differences between different invocations of the same method are the arguments passed in and the values of the member variables at that particular time. So why would the compiler need to store the addresses of each of the functions in every object, instead of one table for each class?

    Each compiler can make its own choices, but one way is that there is an "invisible" pointer that is passed as the first argument of every member function call, the pointer to "this", the actual object instance for which the method is being invoked. C++ hides this, so that when you refer to your member variable "intensity" within some method for ptcore_t, the compiler is actually looking at "this->intensity".

  14. The following user says thank you to d_stranz for this useful post:

    Phlucious (30th November 2011)

  15. #11
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: QList of variable data type?

    @d_stranz: You've been very helpful. I'm pretty much there with my implementation following your advice. My custom QFile class has an empty std::vector<ptcore_t*> in its constructor. The ptcore_t abstract class has virtual get/set functions for every possible variable.

    Qt Code:
    1. class ptcore_t
    2. {
    3. public:
    4. ptcore_t();
    5. virtual void clear();
    6.  
    7. /* Generalized data recording and retrieval functions */
    8. virtual double setXYZ(int i, double v);
    9. virtual quint16 setIntensity(quint16 v);
    10.  
    11. /* Placeholder functions for other formats */
    12. virtual double setTimeStamp(double v) { return 0.0; }
    13. //Lots and lots more placeholders that can be mixed and matched
    14.  
    15. private:
    16. /* Point data fields */
    17. double xyz[3];
    18. quint16 intensity;
    19. }
    To copy to clipboard, switch view to plain text mode 
    An inheriting class (such as pt1_t) overrides the get/set function for its specialized data.
    Qt Code:
    1. class pt1_t : public ptcore_t
    2. {
    3. public:
    4. pt1_t();
    5. void clear();
    6.  
    7. double setTimeStamp(double v);
    8. private:
    9. double gpstime;
    10. }
    To copy to clipboard, switch view to plain text mode 
    While reading in the data from an external binary file, I declare a ptN_t*, with the value of N depending on a flag in the file itself. Once the point is read, I push_back the ptN_t* into the vector container and move onto the next point. Can I safely assume that vector::clear() will release the memory like I expect it to if I want to read in a different set of points?

    One question regarding optimization. When I start reading the points, I already know what point format they will be since that was in the file's header. After reading the XYZ and Intensity data from the file, what comes next will vary depending on the point format. I'd rather not do a dozen if() checks for every point. Is there an efficient way to do this? I'm coding this at the file level, not the point level. Can I check if gpstime is defined, for example? (like #ifdef, except during run-time)

  16. #12
    Join Date
    Jan 2008
    Location
    Alameda, CA, USA
    Posts
    5,230
    Thanks
    302
    Thanked 864 Times in 851 Posts
    Qt products
    Qt5
    Platforms
    Windows

    Default Re: QList of variable data type?

    Can I safely assume that vector::clear() will release the memory like I expect it to if I want to read in a different set of points?
    No. std::vector<>::clear() does not do anything to free the memory pointed to by the objects it stores. If you are storing pointers, you must manually delete each pointer before clearing the vector. Likewise, you must do this with any operation that removes a vector element (erase) or replaces it with a different one (assign, operator=, etc.). Otherwise, you leak the memory occupied by the pointer that is left dangling.

    Whenever I create vectors of pointers, I usually use a smart pointer instead of a raw pointer. Smart pointers use reference counting and will automatically delete themselves when the reference count reaches zero (i.e. no one is holding onto a copy of the pointer anymore). Smart pointers might be too much overhead and slow performance too much for your application. If you want to look into it, we use the smart pointer implementation from the Boost C++ Libraries.

    Note also that for some implementations (Microsoft VC++ in particular), calling clear() does not actually free up the memory occupied by the vector itself. So if your vector is sized to hold 10^6 points and you then clear it, it still occupies the same amount of storage (10^6 * sizeof( ptN_t *) + a little overhead ). The only way we have found that actually reduces the size of the vector itself is to do the following:

    Qt Code:
    1. myVector.clear();
    2. {
    3. std::vector< double > reallyAndTrulyEmpty;
    4. myVector.swap( reallyAndTrulyEmpty );
    5. }
    To copy to clipboard, switch view to plain text mode 

    In other words, forcing a swap with a local empty vector does the trick. Simply calling clear just sets the value returned by size() to zero.

    Can I check if gpstime is defined, for example?
    I don't know what you mean by this. If what you want is a way to parse each line of the file to retrieve N-dimensional point data, without a multi-level if-else logic tree for each point, define a virtual method for each point type that parses the line into fields:

    Qt Code:
    1. class ptcore_t
    2. {
    3. public:
    4. ptcore_t();
    5. virtual void clear();
    6.  
    7. // ...
    8.  
    9. /* Reimplement for each concrete type to extract type-specific fields */
    10. virtual bool parseLine( const std::wstring & line );
    11.  
    12. private:
    13. // ...
    14. }
    To copy to clipboard, switch view to plain text mode 

    So your file reader will basically read a line of text (if that's how it is stored), construct an object of the type stored in the file, then pass the line to that object so it can parse its contents out. You could even make a constructor that did this as part of the allocation:

    Qt Code:
    1. ptN_t * ptr = new ptN_t( line );
    To copy to clipboard, switch view to plain text mode 

    Alternatively, when you first open the file and determine the type of points it contains, you could construct function objects (specific for the point type) to create a point of the appropriate type and to parse the line into the member variables. The parser function object then parses the line and fills in the appropriate member variables for the pointer. The function objects basically look like this:

    Qt Code:
    1. struct pt_t_builder
    2. {
    3. virtual ptcore_t * operator()() const = 0; // pure virtual, must be defined in derived structs
    4. };
    5.  
    6. struct pt1_t_builder : public pt_t_builder
    7. {
    8. virtual ptcore_t * operator()() const { return new pt1_t; }
    9. };
    10.  
    11. struct pt_t_parser
    12. {
    13. virtual bool operator()( ptcore_t * ptcore, const std::wstring & line ) const = 0;
    14. };
    15.  
    16. struct pt1_t_parser : public pt_t_parser
    17. {
    18. bool operator()( ptcore_t * ptcore, const std::wstring & line ) const
    19. {
    20. pt1_t * pt1 = static_cast< pt1_t * >( ptcore );
    21.  
    22. // parse "line" into fields, store into pt1 members
    23. // return true / false based on success
    24. }
    25. };
    To copy to clipboard, switch view to plain text mode 

    You then use this roughly as follows:

    Qt Code:
    1. // When point type is first determined
    2. pt_t_builder * ptBuilder = 0;
    3. pt_t_parser * ptParser = 0;
    4. switch( fileType )
    5. {
    6. case pt1_t_file:
    7. {
    8. ptBuilder = new pt1_t_builder;
    9. ptParser = new pt1_t_parser;
    10. }
    11. break;
    12.  
    13. // .. additional cases for other point types
    14. }
    15.  
    16. // Later, when reading each line:
    17. std::wstring line; // read the line into "line"
    18.  
    19. ptcore_t * point = (*ptBuilder)(); // create the appropriate point type based on the builder
    20. if ( (*ptParser)( point, line ) ) // load the new point's contents based on the parser and store if OK
    21. myVector[ nPoint++ ] = point;
    To copy to clipboard, switch view to plain text mode 

    HTH. Obviously not compiled or tested, so some tweaking might be required.
    Last edited by d_stranz; 30th November 2011 at 16:47.

  17. The following user says thank you to d_stranz for this useful post:

    Phlucious (30th November 2011)

  18. #13
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: QList of variable data type?

    Thank you so much for the vector information! I've found it very difficult to find good resources on vectors in a high-performance environment, so I really appreciate your input. In my experience, Boost tends to slow things down enough that I'm okay with programming the deletion myself. It'll be good practice, anyway.

    I really like the idea of passing a data chunk into a point constructor. That sounds very clean, especially since the file header also gives me the size-on-disk of each point format, I already know programmatically how big of a chunk to read for each point.

    Since my data file is stored in a binary format (not ASCII text), would it be faster to pull data one point (30 bytes, for example) at a time, or one component at a time (2 bytes)? I imagine it would be the former because of the reduced number of calls to the disk, especially with a typical HDD. In that case, would it make sense to read in a QByteArray using QIODevice::read(qint64) and pass that to the point constructor?

    I desperately need to take a good course on code optimization so I don't have to rely on you all. Thanks again! This should be my last question.
    Last edited by Phlucious; 30th November 2011 at 21:26. Reason: reformatted to look better

  19. #14
    Join Date
    Jan 2011
    Posts
    70
    Thanks
    43
    Thanked 4 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: QList of variable data type?

    One more... Why would I use static_cast as in
    Qt Code:
    1. pt1_t * pt1 = static_cast< pt1_t * >( ptcore );
    To copy to clipboard, switch view to plain text mode 
    instead of simply equating it and calling the constructor?
    Qt Code:
    1. ptcore_t* ptcore;
    2. pt1_t* pt1 = new pt1_t();
    3. ptcore = pt1;
    4. //or....
    5. ptcore_t* ptcore = new pt1_t();
    To copy to clipboard, switch view to plain text mode 
    Last edited by Phlucious; 2nd December 2011 at 00:26. Reason: updated contents

Similar Threads

  1. Changing the type of QList
    By wagmare in forum Newbie
    Replies: 4
    Last Post: 19th October 2011, 11:59
  2. Replies: 4
    Last Post: 8th December 2010, 13:22
  3. Replies: 2
    Last Post: 6th October 2010, 22:09
  4. 'QList' does not name a type
    By Qt Coder in forum Qt Programming
    Replies: 3
    Last Post: 22nd September 2009, 14:28
  5. Getting Type of an variable
    By hgedek in forum Qt Programming
    Replies: 1
    Last Post: 9th September 2007, 15:37

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Digia, Qt and their respective logos are trademarks of Digia Plc in Finland and/or other countries worldwide.