Not The Wizard

Oz Solomon's Blog

Are File Appends Really Atomic?

Conventional wisdom says that appending to a file is atomic, up to a certain write size.  That means, for example, that multiple processes can write to a shared log file without their output getting crossed.  Recently, I was thinking about this “conventional wisdom” and decided to look for some proof.  After some Googling, I eventually landed on this StackOverflow question, which discusses the question at hand.  The accepted answer has a lot of good information, but also uses the words “supposed to be” and “assuming” a little too much to be conclusive.

Testing Appends on Linux

To settle this issue once and for all, I decided to write a short script to actually test the limit of atomic appends.  The idea is to fork a bunch of worker processes and have them write to the same file.  Each worker writes a unique signature, which we can then check for corruption.  If user freiheit from StackOverflow is correct, then writes of 4096 bytes on Linux should be OK, and at 4097 bytes we should already see corruption.
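In outline, the approach looks something like this (a minimal sketch, not the actual script — the worker count, line count, and single-character signature scheme here are illustrative):

```shell
#!/bin/bash
# Illustrative sketch of the test, not the original script: each worker
# appends lines made of one unique character, then we flag any line that
# mixes characters (i.e. two appends got interleaved).
BUF_LEN=${1:-512}    # bytes per write, newline included; try 4096/4097 on Linux
WORKERS=4
LINES=50
OUT=$(mktemp)

run_worker() {
    # Worker $1 signs its lines with its own character: a, b, c, ...
    local ch line
    ch=$(printf "\\$(printf '%03o' $((97 + $1)))")
    line=$(printf "%$((BUF_LEN - 1))s" | tr ' ' "$ch")
    for ((i = 0; i < LINES; i++)); do
        echo "$line" >> "$OUT"    # O_APPEND write; echo adds the newline
    done
}

for ((w = 0; w < WORKERS; w++)); do
    run_worker "$w" &
done
wait

# A line is corrupted if removing its first character's repeats leaves anything.
bad=$(awk '{ c = substr($0, 1, 1); gsub(c, ""); if (length($0) > 0) n++ }
           END { print n + 0 }' "$OUT")
echo "Found $bad corrupted lines"
rm -f "$OUT"
```

At 512 bytes per write this should report zero corrupted lines; pushing the buffer size past the platform's atomic-append limit is what produces the corruption counts shown below.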

The bash script (source at the end of this post) takes one argument – the amount that each worker should write at a time (a.k.a. the buffer size). The results from a CentOS 6.5 box are as follows:

# ./test_appends.sh 4096
Launching 20 worker processes
Each line will be 4096 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
All's good! The output file had no corrupted lines.
# ./test_appends.sh 4097
Launching 20 worker processes
Each line will be 4097 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
Found 27 instances of corrupted lines

So there you have it: empirical evidence that appends of up to 4KB are atomic on Linux.

What About Windows?

Luckily, we can use Cygwin on Windows to run the same experimental script.  With some trial and error, we can determine that the maximum atomic write size there is 1024 bytes:

$ ./test_appends.sh 1024
Launching 30 worker processes
Each line will be 1024 characters long
Waiting for processes to exit
Testing output file
All's good! The output file had no corrupted lines.
$ ./test_appends.sh 1025
Launching 30 worker processes
Each line will be 1025 characters long
Waiting for processes to exit
Testing output file
Found 406 instances of corrupted lines

In Conclusion

If you’re writing cross-platform Windows/Linux code, and need to output to a shared file, make sure to keep your writes to 1KB or less.

 

Update 11/12/2014: Updated the script to check the resulting file size as per glandium’s suggestion in the comments.

11 Comments

  1. One thing your check doesn’t do is count the number of lines in the file, which hides the fact that some lines are simply overwritten on Windows. With 50 lines per worker, I got 15% loss; with 500 lines, between 40 and 50%.

    • Thank you for the suggestion. I added a filesize check to the script to make sure entirely dropped lines are detected. Personally I’m not seeing this problem on Windows, but it could be because we’re using different versions/ports of bash.

      • Indeed, depending on the APIs used, the result is different. Native win32 APIs work properly (CreateFile with FILE_APPEND_DATA), but POSIX APIs don’t (open with O_APPEND, as provided in the MS CRT).

  2. FWIW I just tried this on OS X Yosemite 10.10.2 and it only supports 256 bytes without locking.

  3. I used your script on Debian 7.8, and I was only able to get atomic writes up to and including 1008 bytes (1024 – 16 bytes of overhead?) on both my ext4 partition and a tmpfs mount. Anything beyond that resulted in corruption every time.

  4. It’s possible that the ‘echo’ command used by the worker processes is to blame for the corruption.

    You’re assuming that ‘echo’ is using write(2) to output its argument(s) directly.

    What if, instead, it’s using stdio, and using printf? There’s no guarantee that a single call to printf will result in a single write(2) call. It might be dumb enough to try copying the argument into STDOUT’s buffer, and as that buffer gets full, outputting the buffer via write(2). This means that long enough strings passed to printf could result in multiple write(2) calls.

    The only way to know for certain is to use strace, or something like it.
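    For example (a hypothetical check, assuming strace is installed; /bin/echo is used so there is a standalone process to trace):

```shell
# Count the write(2) calls /bin/echo makes for an 8KB argument.
# Hypothetical check: assumes strace is installed; skips gracefully otherwise.
if command -v strace >/dev/null 2>&1; then
    long=$(printf '%8192s' | tr ' ' 'x')
    # strace prints its trace to stderr; capture that and count writes to fd 1.
    nwrites=$(strace -e trace=write /bin/echo "$long" 2>&1 >/dev/null | grep -c '^write(1,')
    echo "/bin/echo issued $nwrites write() call(s)"
else
    nwrites=0
    echo "strace not available; skipping"
fi
```

    If echo ever reports more than one write() for a single line, the per-write atomicity argument no longer applies to that line.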

  5. I know this is old, but on Linux I was able to use the newish stdbuf command to extend the output buffer beyond 4K.
    I just did something like this in the run_worker() method:
    stdbuf -o 10K echo $line >> $OUTPUT_FILE
    PS: Thanks for your test program!

  6. Because your script is using a pipe (echo $line >> $OUTPUT_FILE), your atomic writes are limited to the *pipe* atomic write size, which is different from file append semantics. Your test basically shows that the 4K pipe write limit is there; on Linux, limits.h has:
    #define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */

    However, appending to a file on POSIX is atomic:
    http://man7.org/linux/man-pages/man2/write.2.html

    If the file was open(2)ed with O_APPEND, the file offset is
    first set to the end of the file before writing. The adjustment of
    the file offset and the write operation are performed as an atomic
    step.

    You can test it with similar script, but you have to use OS write. Be aware that some languages (like Python) do use internal buffering which messes up with test results, and must be disabled (in Python use os.open() or open(.. buffering=0) for testing).

    Sample Python test code:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import os
    import codecs
    import random
    import string
    from hashlib import md5
    from multiprocessing import Pool

    PROCESSES = 10
    LENGTH = 8192*5+777
    LINES = 100
    FILENAME = '/tmp/test.log'

    def randomstr(length):
        return ''.join(random.choice(string.lowercase) for i in range(length))

    def write(_):
        with codecs.open(FILENAME, 'ab', encoding='utf-8', buffering=0) as f:
            rndstr = randomstr(LENGTH - 2 - len(md5('x').hexdigest()))  # -2 is to account for : and \n
            hash = md5(rndstr).hexdigest()
            line = rndstr + ':' + hash + '\n'
            for i in xrange(LINES):
                f.write(line)

    def verify():
        wrong = 0
        lines = 0
        with codecs.open(FILENAME, 'rb', encoding='utf-8') as f:
            for l in f:
                lines += 1
                try:
                    line, hash = l.strip('\n').split(':')
                except:
                    wrong += 1
                    continue
                if hash != md5(line).hexdigest():
                    wrong += 1
        print 'Line length {} total {} missing {} wrong {}'.format(LENGTH, lines, PROCESSES * LINES - lines, wrong)

    if __name__ == '__main__':
        try:
            os.unlink(FILENAME)
        except:
            pass
        pool = Pool(PROCESSES)
        pool.map(write, [None] * PROCESSES)
        pool.close()
        pool.join()
        verify()
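    (As an aside, the PIPE_BUF value quoted from limits.h above can also be queried at runtime with POSIX getconf; the path argument names the filesystem to ask about:)

```shell
# Ask the system for its PIPE_BUF limit; POSIX guarantees at least 512 bytes.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF is $pipe_buf bytes"
```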

  7. Aravindakshan Babu

    October 26, 2016 at 3:02 am

    Strange. When I run your script without any arguments or modifications on Ubuntu 14.04 with ext4, I get:

    $ ./test_appends.sh
    Buffer length not specified, defaulting to 4096
    Launching 20 worker processes
    Each line will be 4096 characters long
    Waiting for processes to exit
    Testing output file
    Found 83 instances of corrupted lines

    Any ideas?

    • Please see Reid Faiv’s comments above. I suspect this has to do with changes in pipe flushing on your system.

      • Aravindakshan Babu

        October 29, 2016 at 2:56 am

        Thanks for the reply Oz!

        I’m not sure that is the case. On my machine:

        “””
        $ grep “BUF” /usr/include/linux/limits.h
        #define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */
        “””

        Seems like something else is going on… crazy.


© 2017 Not The Wizard
