Thread: Advice: Reading large text file.

  1. #1

    I need to read a large (multi-GB) text file.

    It is formatted something like this:

        ------------------------

        Record ID: 1
        Attribute 1: 1234
        Attribute 2: 1234 1234 1234
        1234 1234 1234
        Attribute 3: 1234

        --------------------------------

        Record ID: 2
        Attribute 1:

        ....

    There may be hundreds of thousands of records in the file. I have a list of maybe 500 record IDs, and I need to pull the attributes for each of them. A record may also refer to another record, in which case I need to pull the data for the referenced record as well (e.g., record ID #1000 refers to ID #50, so I need to go back and get record #50).

    I decided to readAll() the whole file into a QString.
    Then I use find to locate the section of text that I want.
    Then I feed the record text into a QTextStream so I can use readLine() to parse each line and get the attributes.

    I read the file into memory first because I figure operations in memory will be faster.

    Another thought is to just readLine() directly from the QFile, saving off what I need. However, since that means repeated access to the disk, I figure it might be slower.

    Also, I may need to go back and find an earlier record. For example, when I get to record 1000, it may refer to record 50, so I have to go back and find record 50. I may have already encountered record 50, but I would have discarded it because I would not know I needed it until I got to record 1000. This seems to be another advantage of the first approach, since I can just use find on the QString.


    Does this sound like the best way to do this, or is there a faster way?
    Running:
    RHEL 5.4
    Python 2.7.2
    Qt 4.7.4
    SIP 4.7.8
    PyQt 4.7

  2. #2

    It would be better to store the processed records in memory and reuse them, instead of re-loading them from the file.

    I would say: open the file, load all the records, store them in memory in a processed format, and run your lookups against that. This way the file is read only once, during loading, and for the rest of the time the records are read from memory. (But you need to be sure of each record's size and the total memory required to load all the records.)
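
    A minimal sketch of that "parse once, keep in memory" idea, in standard-library Python 3 rather than Qt. It assumes the layout shown in the first post: separator lines made only of dashes, a "Record ID:" line, "name: value" attribute lines, and continuation lines that contain no colon. The names parse_records and SAMPLE are made up for illustration.

```python
import io

# Sample text copied from the layout shown in the first post.
SAMPLE = """\
------------------------

Record ID: 1
Attribute 1: 1234
Attribute 2: 1234 1234 1234
1234 1234 1234
Attribute 3: 1234

--------------------------------

Record ID: 2
Attribute 1:
"""


def parse_records(lines):
    """Yield (record_id, attributes) pairs from an iterable of text lines."""
    record_id, attrs, last_key = None, {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # A separator line closes the record collected so far.
            if record_id is not None:
                yield record_id, attrs
            record_id, attrs, last_key = None, {}, None
        elif line.startswith("Record ID:"):
            record_id = int(line.split(":", 1)[1])
        elif ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip()
            attrs[last_key] = value.strip()
        elif last_key is not None:
            # Assumed: a line without a colon continues the previous attribute.
            attrs[last_key] = (attrs[last_key] + " " + line).strip()
    if record_id is not None:
        yield record_id, attrs


# In real use, pass an open file object instead of io.StringIO(SAMPLE).
records = dict(parse_records(io.StringIO(SAMPLE)))
```

    With a million records the resulting dict may be large, so the memory caveat above still applies.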

  3. #3

    I have tried that too and it seems to take slightly longer.

    If I understand you correctly, you mean parsing through the entire file and saving the records in memory.

    The file may have a million records, so I tried going through it line by line, saving the million records in a QHash, then accessing the QHash to get the 500 records I actually care about.

    However, when I changed to just loading the entire file TEXT into a QString, then searching the QString for the 500 records I want, it was faster.

    I think this is because parsing a record to get the data I want takes some amount of time X. If I do it for just 500 records, that is 500X, versus 1,000,000X if I do it for every record. I guess just storing the text in memory and using indexOf is faster.

    But yes, I agree I will be wasting a lot of memory doing this. I could delete the QString after I'm done with it, but all the program does after it gets the 500 records I care about is save them to a CSV file and then exit.

    I'll have to do some testing to see how much memory it's using and whether it's too much. I wish there were an indexOf function that let me search for a substring in a file directly. That way I could pull just the parts of the text I need into memory instead of the entire thing.
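
    For what it's worth, one way to get something like "indexOf on a file" outside of Qt is Python's mmap module, which has a find() method that searches the mapped file without an explicit read of the whole thing. This is only a sketch in Python 3 under several assumptions: Unix line endings, every record preceded by a separator line, no attribute line starting with "--", and a 64-bit process (needed to map a multi-GB file). The function name find_record_bytes is made up.

```python
import mmap


def find_record_bytes(path, record_id):
    """Return the raw bytes of one record, or None if the ID is not found."""
    # The leading newline keeps the match anchored to the start of a line,
    # and the trailing newline keeps ID 1 from matching ID 11.
    needle = ("\nRecord ID: %d\n" % record_id).encode("ascii")
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            if pos == -1:
                return None
            start = pos + 1                  # skip the leading newline
            end = mm.find(b"\n--", start)    # assumed start of the next separator
            if end == -1:
                end = len(mm)                # last record in the file
            return mm[start:end]             # slicing copies just this record
```

    Each lookup is still a linear scan from the start of the file, so for 500 IDs this is likely slower than building an index once; it mainly avoids holding the whole text in a string.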

    Another option would be to make two passes over the file, but then I have to read the entire file twice.

  4. #4

    I tried to open a 2.5GB file and found that readAll() would not work.

    I tried reading line by line and saving the formatted data, but that took a long time.

    Then I tried reading through the entire file once and building an index that maps each record ID to the file position where that record starts. Then I used seek() to get the details of each record I was interested in. It involves two passes, though.
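
    Roughly, the index-then-seek approach looks like the sketch below. It is plain Python 3 file I/O rather than QFile, and it assumes the record layout from the first post and UTF-8 text; build_index and read_record are illustrative names, not anything from Qt.

```python
def build_index(path):
    """One sequential pass: map each record ID to the byte offset of its
    'Record ID:' line."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"Record ID:"):
                index[int(line.split(b":", 1)[1].strip())] = offset
            offset += len(line)  # binary mode, so len(line) is the byte count
    return index


def read_record(f, offset):
    """Seek to a stored offset and return the record's non-blank lines,
    stopping at the next separator line (a line made only of dashes)."""
    f.seek(offset)
    lines = []
    for raw in iter(f.readline, b""):
        text = raw.strip()
        if text and set(text) == {ord("-")}:
            break
        if text:
            lines.append(text.decode("utf-8"))  # UTF-8 is an assumption
    return lines
```

    The index holds only one integer offset per record, so even a million records should fit comfortably in memory, and each later lookup reads only the lines of the record it needs.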

    I will have to test which way is faster.

  5. #5

    You say you only need about 500 records out of the file. Read the records one at a time and see if each is the one you need. If so, save it, otherwise discard. Then read the next.

    Be sure to enable buffering (i.e., use QTextStream or the equivalent) on the file so that performance doesn't suck.
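
    As a rough illustration of the single-pass filter, here is a sketch in Python 3, where open() is already buffered and the buffering argument only adjusts the buffer size. The record layout is the one from the first post; the function name collect_wanted, the 1 MiB buffer, and the UTF-8 encoding are assumptions.

```python
def collect_wanted(path, wanted_ids, buffer_size=1 << 20):
    """Stream the file once and keep only the lines of the wanted records."""
    wanted = set(wanted_ids)
    found = {}
    current_id, keep = None, False
    # buffering sets the size of the underlying read buffer (assumed 1 MiB).
    with open(path, "r", buffering=buffer_size, encoding="utf-8") as f:
        for line in f:
            text = line.strip()
            if text.startswith("Record ID:"):
                current_id = int(text.split(":", 1)[1])
                keep = current_id in wanted
                if keep:
                    found[current_id] = []
            elif text and set(text) == {"-"}:
                keep = False          # separator: the current record is over
            elif keep and text:
                found[current_id].append(text)
    return found
```

    Memory use is proportional to the records kept, not to the file size, since every other line is discarded as soon as it is read.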

  6. #6

    Originally posted by DanH:
        You say you only need about 500 records out of the file. Read the records one at a time and see if each is the one you need. If so, save it, otherwise discard. Then read the next.

        Be sure to enable buffering (ie, use QTextStream or the equivalent) on the file so that performance doesn't suck.

    Yes, but I don't know in advance what all the records are.

    I will start with a list of maybe 300 record IDs and go through the file one by one, as you say. However, inside each record there may be a reference to another record, and in that case I need to go find that record too. I could read the file line by line as you suggest, but I would need a second pass to go back and get the references. That is why I do one full pass first to index the file, and then get the records I need using seek. I assume that when I use seek, it jumps straight to that part of the file and does not need to read every line.
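
    The "fetch a record, then fetch whatever it refers to" step can be sketched as a small work queue, shown here in Python 3. How a reference actually appears inside a record is not shown anywhere in this thread, so the "Refers To" field in the demo is purely hypothetical, and load_record stands in for the index-plus-seek lookup.

```python
from collections import deque


def collect_with_references(start_ids, load_record, find_references):
    """Fetch the starting records plus every record they refer to,
    directly or indirectly, loading each record at most once."""
    pending = deque(start_ids)
    results = {}
    while pending:
        record_id = pending.popleft()
        if record_id in results:
            continue                      # already fetched; also stops cycles
        record = load_record(record_id)
        if record is None:
            continue                      # dangling reference: skip it
        results[record_id] = record
        pending.extend(find_references(record))
    return results


# Demo with an in-memory dict standing in for the indexed file.
# "Refers To" is a made-up field name, not taken from the real format.
fake_file = {
    1000: {"Attribute 1": "1234", "Refers To": "50"},
    50: {"Attribute 1": "5678"},
    7: {"Attribute 1": "9999"},
}


def refs_of(record):
    return [int(record["Refers To"])] if "Refers To" in record else []


wanted = collect_with_references([1000], fake_file.get, refs_of)
```

    Because the lookups go through the index, it does not matter whether record 50 comes before or after record 1000 in the file.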

  7. #7

    Handling a text file that is GBs in size this way is stupidity. Convert your text file into an SQLite database, and then you can fire simple queries to select the desired data. As your text file is already formatted, it would take just one small function to convert the text records to SQLite records.
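
    A sketch of what such a conversion could look like with Python 3's built-in sqlite3 module. The table layout (one row per attribute) and the names load_into_sqlite and attributes are assumptions made for illustration; the function expects (record_id, attribute dict) pairs from whatever parser reads the text file.

```python
import sqlite3


def load_into_sqlite(records, db_path=":memory:"):
    """Store (record_id, attributes) pairs as one row per attribute."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attributes "
        "(record_id INTEGER, name TEXT, value TEXT)"
    )
    rows = (
        (record_id, name, value)
        for record_id, attrs in records
        for name, value in attrs.items()
    )
    with conn:  # one transaction for the whole bulk insert
        conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)", rows)
        # Index after loading so lookups by record ID avoid a full scan.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_attributes_record "
            "ON attributes (record_id)"
        )
    return conn


conn = load_into_sqlite([
    (1, {"Attribute 1": "1234", "Attribute 3": "1234"}),
    (2, {"Attribute 1": ""}),
])
rows_for_1 = conn.execute(
    "SELECT name, value FROM attributes WHERE record_id = ? ORDER BY name",
    (1,),
).fetchall()
```

    Pass a file path instead of ":memory:" to keep the database on disk, so the text file only has to be converted once.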

  8. #8

    Certainly you can read through once and then index back with "seek". No need to read the entire file at once.

    Or convert to a SQL DB.
