I need to read a large (multi-GB) text file.
It is formatted something like this:
------------------------
Record ID: 1
Attribute 1: 1234
Attribute 2: 1234 1234 1234
1234 1234 1234
Attribute 3: 1234
--------------------------------
Record ID: 2
Attribute 1:
....
There may be hundreds of thousands of records in the file. I have a list of maybe 500 record IDs, and I need to pull the attributes for each of those 500. A record may also refer to another record; in that case, I also need to pull the data for the referenced record (e.g., if Record ID 1000 refers to ID 50, I need to go back and get Record 50).
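Because a referenced record can itself refer to further records, the set of IDs to pull is the starting list plus everything reachable through references. A minimal sketch of that worklist loop, assuming a hypothetical callback `refsOf` that locates a record (however that is done) and returns the IDs it refers to; the post does not say how a reference is written in the file, so that part is left out:

```cpp
#include <deque>
#include <functional>
#include <set>
#include <vector>

// refsOf is a hypothetical callback: given a record ID, it finds that record
// (in memory or on disk) and returns the IDs that record refers to.
// Returns every ID whose attributes need to be pulled.
std::set<long> idsToPull(const std::vector<long>& wanted,
                         const std::function<std::vector<long>(long)>& refsOf) {
    std::set<long> done;
    std::deque<long> work(wanted.begin(), wanted.end());
    while (!work.empty()) {
        long id = work.front();
        work.pop_front();
        if (!done.insert(id).second) continue;  // already handled; also stops reference cycles
        for (long ref : refsOf(id)) work.push_back(ref);  // queue referenced records
    }
    return done;
}
```

For example, if record 1000 refers to 50 and record 50 refers to 7, asking for `{1000}` yields `{7, 50, 1000}`.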
I decided to readAll() the whole file into a QString.
Then I use find to locate the section of text for the record I want.
Then I feed that record's text into a QTextStream so I can call readLine() on each line and extract the attributes.
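The read-all-then-search steps above can be sketched as follows. This is written against the C++ standard library rather than Qt so it compiles on its own; `std::string::find`, `std::istringstream` and `std::getline` stand in for the QString search and `QTextStream::readLine()`. It assumes `\n` line endings, header lines spelled exactly `Record ID: <n>`, separator lines starting with `---`, and that a line without a `:` continues the previous attribute (as the second line of `Attribute 2` does in the sample). The function names are illustrative only.

```cpp
#include <map>
#include <sstream>
#include <string>

// Returns the text of the record whose header line is "Record ID: <id>",
// up to the next separator line (or end of text), or "" if not found.
std::string findRecord(const std::string& text, long id) {
    // The trailing '\n' keeps "Record ID: 1" from matching "Record ID: 10".
    const std::string header = "Record ID: " + std::to_string(id) + "\n";
    std::size_t pos = text.find(header);
    // Only accept matches that start at the beginning of a line.
    while (pos != std::string::npos && pos != 0 && text[pos - 1] != '\n')
        pos = text.find(header, pos + 1);
    if (pos == std::string::npos) return "";
    std::size_t end = text.find("\n---", pos);  // newline before the next separator
    end = (end == std::string::npos) ? text.size() : end + 1;
    return text.substr(pos, end - pos);
}

// Splits a record into key -> value pairs. A line without ':' is appended,
// space-separated, to the value of the previous attribute.
std::map<std::string, std::string> parseRecord(const std::string& block) {
    std::map<std::string, std::string> attrs;
    std::istringstream in(block);
    std::string line, lastKey;
    while (std::getline(in, line)) {
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) {  // continuation line
            if (!lastKey.empty() && !line.empty()) {
                std::string& value = attrs[lastKey];
                value += value.empty() ? line : " " + line;
            }
            continue;
        }
        lastKey = line.substr(0, colon);
        std::size_t start = line.find_first_not_of(' ', colon + 1);
        attrs[lastKey] = (start == std::string::npos) ? "" : line.substr(start);
    }
    return attrs;
}
```

Each `findRecord` call scans from the start of the text, so looking up ~500 IDs this way can rescan the data up to ~500 times. Memory is the other cost: QString stores UTF-16, so a multi-GB ASCII file held in a QString takes roughly twice its on-disk size.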
I read the file into memory first because I figure operations in memory will be faster.
Another option is to call readLine() directly on the QFile and save off only what I need. However, since that means repeated disk access, I figure it might be slower.
I may also need to go back and find an earlier record. For example, when I get to record 1000, it may refer to record 50, so I have to go back and find record 50. I may already have passed record 50, but I would have discarded it, because I won't know I need it until I reach record 1000. This seems to be another advantage of the first approach, since I can just use find on the QString.
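One way to handle the go-back problem without holding the whole file in memory is a technique different from both approaches above: scan the file once, remember the position where each `Record ID:` line starts, then seek straight back to only the records that are needed. A minimal sketch using `std::istream::tellg`/`seekg` (QFile offers `pos()` and `seek()` for the same job), assuming every record starts with a line beginning `Record ID: ` followed by a number and separator lines start with `---`; the function names are illustrative only.

```cpp
#include <istream>
#include <sstream>  // std::istringstream is handy for trying this on a small sample
#include <string>
#include <unordered_map>
#include <vector>

// Pass 1: read the whole stream once and remember where each record's
// header line starts. Only IDs and positions are kept, not record text.
std::unordered_map<long, std::streampos> buildIndex(std::istream& in) {
    std::unordered_map<long, std::streampos> index;
    const std::string prefix = "Record ID: ";
    std::string line;
    in.clear();
    in.seekg(0);                       // start from the beginning
    std::streampos pos = in.tellg();   // position of the line about to be read
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0)
            index[std::stol(line.substr(prefix.size()))] = pos;  // assumes a numeric ID
        pos = in.tellg();
    }
    return index;
}

// Pass 2 (or any later time): jump to a record and read its lines up to
// the next separator line or end of stream.
std::vector<std::string> readRecordAt(std::istream& in, std::streampos pos) {
    std::vector<std::string> lines;
    std::string line;
    in.clear();  // an earlier read may have left eof/fail bits set
    in.seekg(pos);
    while (std::getline(in, line) && line.compare(0, 3, "---") != 0)
        lines.push_back(line);
    return lines;
}
```

The index holds one ID/position pair per record, so its size depends on the number of records rather than the size of the file, and going back from record 1000 to record 50 becomes a single seek instead of a search.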
Does this sound like the best way to do this, or is there a faster way?