Page 2 of 2 FirstFirst 12
Results 21 to 33 of 33

Thread: reading from a file

  1. #21
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Default Re: reading from a file

    Quote Originally Posted by mickey View Post
    1. I don't know the number of the lines ( I know only its dimesion with instructions below)
    If you are going to use merge sort, you don't have to know this.

    Quote Originally Posted by mickey View Post
    After that, in your opinion, can I use
    [...]
    to do all work ( I still understand it'll more speed) ??
    The version with std::vector will be easier to implement and you if preallocate memory, it shouldn't be significantly slower.

  2. #22
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Default Re: reading from a file

    EDIT: OK. Such as other user, I couldn't understand the trick to work on block of 256k
    Last edited by mickey; 12th July 2007 at 18:40. Reason: error in reply
    Regards

  3. #23
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Post Re: reading from a file

    Quote Originally Posted by mickey View Post
    Such as other user, I couldn't understand the trick to work on block of 256k
    Suppose you have file X and you want to sort it. Using merge sort you would do it this way:
    1. blockSize := 1.
    2. while not end of file X:
      • copy blockSize lines to file A,
      • if no more lines in file X:
        • END: file A is sorted.
      • copy blockSize lines to file B.
    3. rewind files X, A and B.
    4. while not end of file A and B:
      • merge two blocks from A and B and write them to X,
    5. rewind files X, A and B.
    6. blockSize := 2 * blockSize.
    7. go to 2.


    For example, let X= 42, 24, 67, 45, 79, 33, 76, 85, 29, 59, 26, 92, 30, 56, 81, 27.

    Iteration 1:
    A = 24, 45, 33, 85, 59, 92, 56, 27
    B = 42, 67, 79, 76, 29, 26, 30, 81

    X = 24, 42, 45, 67, 33, 79, 76, 85, 29, 59, 26, 92, 30, 56, 27, 81

    Now X consists of 2-element sorted blocks.

    Iteration 2 (this time blockSize is 2):
    A = 24, 42, 33, 79, 29, 59, 30, 56
    B = 45, 67, 76, 85, 26, 92, 27, 81

    X = 24, 42, 45, 67, 33, 76, 79, 85, 26, 29, 59, 92, 27, 30, 56, 81

    Now 4-element blocks are sorted.

    Iteration 3 (blockSize = 4):
    A = 24, 42, 45, 67, 26, 29, 59, 92
    B = 33, 76, 79, 85, 27, 30, 56, 81

    X = 24, 33, 42, 45, 67, 76, 79, 85, 26, 27, 29, 30, 56, 59, 81, 92

    Final iteration:
    A = 24, 33, 42, 45, 67, 76, 79, 85,
    B = 26, 27, 29, 30, 56, 59, 81, 92

    X = 24, 26, 27, 29, 30, 33, 42, 45, 56, 59, 67, 76, 79, 81, 85, 92


    Note that:
    • you don't have to read whole blocks to merge them,
    • you don't have to start with blockSize = 1 (you can use some more efficient algorithm to sort blocks in memory, dump them into a file and then continue with merge sort),
    • when the size of the input file isn't a power of 2, you will have a block that is shorter than blockSize.

  4. #24
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Default Re: reading from a file

    The thing becames interesting....
    Quote Originally Posted by jacek View Post
    The version with std::vector will be easier to implement and you if preallocate memory, it shouldn't be significantly slower.
    I still haven't clear a point: How do I put lines of file X in the vector line?
    Qt Code:
    1. while( getline (ifile, temp, '\n') ) {
    2. lines[l] = temp;
    3. l++;
    4. }
    To copy to clipboard, switch view to plain text mode 
    When I speak of "speed" I mean that I know read a file line by line isn't so efficent..(!?)
    Then you said to work on 3 files: but file access and work on files is so efficent? (maybe it'll be better use 2 vector instead of 2 files?)
    thanks
    Regards

  5. #25
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Default Re: reading from a file

    Quote Originally Posted by mickey View Post
    How do I put lines of file X in the vector line?
    I would use operator>>.

    Quote Originally Posted by mickey View Post
    When I speak of "speed" I mean that I know read a file line by line isn't so efficent..(!?)
    Then you said to work on 3 files: but file access and work on files is so efficent? (maybe it'll be better use 2 vector instead of 2 files?)
    If you read 2GB file into 512MB of RAM, over 3/4 of it will land back on the hard drive in a swap partition/file. If you divide that file into several blocks and sort them in memory and finally merge them, it should be faster than sorting a vector which 3/4 are in swap.

    If you write each sorted part of the file into separate temporary file and then merge all of those files at once (not two at a time as in classical merge sort), you will need only one merge operation. This means that you will have to read and write every line from/to hard drive only twice.

  6. #26
    Join Date
    Jan 2006
    Location
    travelling
    Posts
    1,116
    Thanks
    8
    Thanked 127 Times in 121 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: reading from a file

    Quote Originally Posted by jacek View Post
    What doesn't make sense exactly?
    Telling that a file bigger than memory shouldn't be loaded in order to prevent swap use because the "swap limit" is actually much lower than the actual physical memory size...
    Current Qt projects : QCodeEdit, RotiDeCode

  7. #27
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Default Re: reading from a file

    Quote Originally Posted by jacek View Post
    Luckily it isn't required for merge sort.
    Yes. To be exact, something like:
    Qt Code:
    1. std::string temp;
    2. temp.reserve( MAX_LINE_LEN );
    3. std::vector< std::string > lines( NLINES, temp );
    4.  
    5. // or if std::string tries to be smart:
    6.  
    7. std::vector< std::string > lines( NLINES );
    8. for( ... ) {
    9. line->reserve( MAX_LINE_LEN );
    10. }
    To copy to clipboard, switch view to plain text mode 
    Where MAX_LINE_LEN = 1024 and NLINES is around 256k. Remember that the point is to avoid reallocations and swapping.

    This way you can read 256k lines into preallocated space, which should be fast enough. Then you can sort that vector using std::sort, dump everyting into a temporary file and read another set of lines. And so on until the end of file. When you are finished you can free the memory and continue with merge sort algorithm.
    Sorry, but the solutions with esamples of some posts before is different from what you said here above?? With solution with examples where Have I the vectors? Where do I call sort() ?
    Regards

  8. #28
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Default Re: reading from a file

    Quote Originally Posted by fullmetalcoder View Post
    Telling that a file bigger than memory shouldn't be loaded in order to prevent swap use because the "swap limit" is actually much lower than the actual physical memory size...
    In other words, you say that upper bound is incorrect, because it's too big?

  9. #29
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Default Re: reading from a file

    Quote Originally Posted by mickey View Post
    Sorry, but the solutions with esamples of some posts before is different from what you said here above??
    No, that's the first step, which allows you to save some time.

    Quote Originally Posted by mickey View Post
    With solution with examples where Have I the vectors? Where do I call sort() ?
    Instead of starting the merge sort algorithm with blockSize = 1, you can start it with blockSize = 256000, but you'll need a file that contains sorted blocks of 256000 lines. That's where you are going to use the vector and sort().

    • f := [A, B].
    • fileId := 0.
    • while not end of file X:
      • vector := read 256000 lines from file X,
      • sort vector,
      • write vector to file f[fileId],
      • fileId := 1 - fileId.
    • blockSize := 256000.
    • go to 3 from post #23.


    Merge sort is one of the basic algorithms that every programmer should know. If you need more explanations, see Algorithms + Data Structures = Programs by Niklaus Wirth.

  10. #30
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Default Re: reading from a file

    sorry, last question: your solution counts on many (dim_file/block_size) temporary files? Because that's what I understand only now..
    Regards

  11. #31
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Default Re: reading from a file

    Quote Originally Posted by mickey View Post
    sorry, last question: your solution counts on many (dim_file/block_size) temporary files? Because that's what I understand only now..
    You need at least two temporary files, but you can use more.

  12. #32
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Default Re: reading from a file

    sorry but I'm thinking that what I don't understand isn't the merge sort...
    You said that I can do with 2 files A and B. So I work with 256000 lines. Sort they and put in file A; then read other 256000 lines, sort(), and put in file B. Now using mergesort to merge two file and write they in the first 500.000 lines of file X. Now I go on. 256.000*2 = 512.000; now the first 512000 lines of X are sorted! But the file isn't finish. Here begin what I don't understand....(I'm thinking this: ); rewind X,A,B. from line 512.001 of X: take a block of 256.000, sort, copy on A; take the next 256.000 block of X, sort, copy on B; mergesort on A and B and overwrite lines from 512.001 to 1024.000; take a block from of 256.000 lines of X from line 1024.000; there's only 120000 lines! Sort them; restart: blocksize 256.000*2=512.000; copy lines from 1 to 512.000 to A (they're sorted); copy lines from 512.001 to B (sorted too); mergesort on A and B and write 1024.000 lines on X; take lines from 1024.000 to end; copy 1024.000 to A and copy the 120.000 lines to B; mergesort on A and B and write on X. Stop. Is it this? It seems me slow...
    Regards

  13. #33
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    5,372
    Thanks
    28
    Thanked 976 Times in 912 Posts
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Default Re: reading from a file

    Quote Originally Posted by mickey View Post
    You said that I can do with 2 files A and B. So I work with 256000 lines. Sort they and put in file A; then read other 256000 lines, sort(), and put in file B. Now using mergesort to merge two file and write they in the first 500.000 lines of file X. Now I go on. 256.000*2 = 512.000; now the first 512000 lines of X are sorted! But the file isn't finish. Here begin what I don't understand....
    Before you start to merge anything, you have to split the whole file, not only the first blocks.

    If you have a file 'X' with 2048k of lines, you have to do the following:
    • repeat 4 times:
      • lines := read 256k lines from X,
      • sort lines
      • write lines to file A
      • lines := read 256k lines from X,
      • sort lines
      • write lines to file B
    • rewind files
    • repeat 4 times:
      • merge 256k line blocks from A and B and append them to X
    • rewind files
    • repeat 2 times:
      • append 512k lines from X to A
      • append 512k lines from X to B
    • rewind files
    • repeat 2 times:
      • merge 512k line blocks from A and B and append them to X
    • rewind files
    • append 1024k lines from X to A
    • append 1024k lines from X to B
    • rewind files
    • merge 1024k line blocks from A and B and append them to X


    Quote Originally Posted by mickey View Post
    It seems me slow...
    Of course it's slow. If you can't fit your data in memory, everything will be slow, but other methods might be event slower.

    Although there might be other solution. If you find such function f, that if line1 < line2, then f(line1) < f(line2), you can sort values returned by that function instead.

Similar Threads

  1. qt-3.3.8 fail in scratchbox
    By nass in forum Installation and Deployment
    Replies: 0
    Last Post: 25th May 2007, 16:21
  2. Interesting little Segfault w/r to signal/slot connection
    By Hydragyrum in forum Qt Programming
    Replies: 24
    Last Post: 12th September 2006, 20:22
  3. Reading a unicode names from a file???
    By darpan in forum Qt Programming
    Replies: 7
    Last Post: 3rd May 2006, 18:28
  4. Replies: 6
    Last Post: 27th February 2006, 13:47
  5. Problem with reading a file
    By Buhmann in forum Qt Programming
    Replies: 11
    Last Post: 17th February 2006, 14:02

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Digia, Qt and their respective logos are trademarks of Digia Plc in Finland and/or other countries worldwide.