Are Files Appends Really Atomic?

June 17, 2014 / Oz Solomon / 16 Comments

Conventional wisdom says that appending to a file is atomic, up to a certain write size. That means, for example, that you can have multiple processes write shared a log file and not have the log output crossed. Recently, I was thinking about this “conventional wisdom” and decided to look for some proof. After some Googling, and eventually landed on this StackOverflow question which discusses the question at hand. The accepted answer has a lot of good information, but also uses the words “supposed to be” and “assuming” a little to much to be conclusive.

Testing Appends on Linux

To settle this issue once and for all, I decided to write a short script to actually test the limit of atomic appends. The idea is to fork a bunch of worker threads and have them write to the same file. Each thread will write a unique signature, which we can then verify against corruption. If user freiheit from StackOverflow is correct, then writes of 4096 bytes on Linux should be OK, and at 4097 bytes we should already see corruption.

The bash script (source at the end of this post) takes one argument – the amount that each worker should write at a time (a.k.a. the buffer size). The results from a CentOS 6.5 box are as follows:

# ./test_appends.sh 4096
Launching 20 worker processes
Each line will be 4096 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
All's good! The output file had no corrupted lines.
# ./test_appends.sh 4097
Launching 20 worker processes
Each line will be 4097 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
Found 27 instances of corrupted lines

So there you have it, empirical proof that Linux allows atomic appends of 4KB.

What About Windows?

Luckily, we can use cygwin on Windows to run the same experimental script. With some trial and error we can then determine that the maximum atomic write size is 1024 bytes:

$ ./test_appends.sh 1024
Launching 30 worker processes
Each line will be 1024 characters long
Waiting for processes to exit
Testing output file
All's good! The output file had no corrupted lines.
$ ./test_appends.sh 1025
Launching 30 worker processes
Each line will be 1025 characters long
Waiting for processes to exit
Testing output file
Found 406 instances of corrupted lines

In Conclusion

If you’re writing cross-platform Windows/Linux code, and need to output to a shared file, make sure to keep your writes to 1KB or less.

#!/bin/bash

#############################################################################
#
# This script aims to test/prove that you can append to a single file from
# multiple processes with buffers up to a certain size, without causing one
# process' output to corrupt the other's.
#
# The script takes one parameter, the length of the buffer. It then creates
# 20 worker processes which each write 50 lines of the specified buffer
# size to the same file. When all processes are done outputting, it tests
# the output file to ensure it is in the correct format.
#
#############################################################################

NUM_WORKERS=20
LINES_PER_WORKER=50
OUTPUT_FILE=/tmp/out.tmp

# each worker will output $LINES_PER_WORKER lines to the output file
run_worker() {
    worker_num=$1
    buf_len=$2

    # Each line will be a specific character, multiplied by the line length.
    # The character changes based on the worker number.
    filler_len=$((${buf_len}-1)) # -1 -> leave room for \n
    filler_char=$(printf \\$(printf '%03o' $(($worker_num+64))))
    line=`for i in $(seq 1 $filler_len);do echo -n $filler_char;done`
    for i in $(seq 1 $LINES_PER_WORKER)
    do
        echo $line >> $OUTPUT_FILE
    done
}

if [ "$1" = "worker" ]; then
    run_worker $2 $3
    exit
fi

buf_len=$1
if [ "$buf_len" = "" ]; then
    echo "Buffer length not specified, defaulting to 4096"
    buf_len=4096
fi

rm -f $OUTPUT_FILE

echo Launching $NUM_WORKERS worker processes
for i in $(seq 1 $NUM_WORKERS)
do
    $0 worker $i $buf_len &
    pids[$i]=${!}
done

echo Each line will be $buf_len characters long
echo Waiting for processes to exit
for i in $(seq 1 $NUM_WORKERS)
do
    wait ${pids[$i]}
done

# Now we want to test the output file. Each line should be the same letter
# repeated buf_len-1 times (remember the \n takes up one byte). If we had
# workers writing over eachother's lines, then there will be mixed characters
# and/or longer/shorter lines.

echo Testing output file

# Make sure the file is the right size (ensures processes didn't write over
# eachother's lines)
expected_file_size=$(($NUM_WORKERS * $LINES_PER_WORKER * $buf_len))
actual_file_size=`cat $OUTPUT_FILE | wc -c`
if [ "$expected_file_size" -ne "$actual_file_size" ]; then
    echo Expected file size of $expected_file_size, but got $actual_file_size
else
  
    # File size is OK, test the actual content

    # Only use newer versions of grep because older ones are way too slow with
    # backreferences
    [[ $(grep --version) =~ [^[:digit:]]*([[:digit:]]+)\.([[:digit:]]+) ]]
    grep_ver="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    if [ "$grep_ver" -ge "216" ]; then
        num_lines=$(grep -v "^\(.\)\1\{$((${buf_len}-2))\}$" $OUTPUT_FILE | wc -l)
    else
        # Scan line by line in bash, which isn't that speedy, but is good enough
        # Note: Doesn't work on cygwin for lines < 255
        line_length=$((${buf_len}-1))
        num_lines=0
        for line in `cat $OUTPUT_FILE`
        do
            if ! [[ $line =~ ^${line:0:1}{$line_length}$ ]]; then
                num_lines=$(($num_lines+1))
            fi;
            echo -n .
        done
        echo
    fi

    if [ "$num_lines" -gt "0" ]; then
        echo "Found $num_lines instances of corrupted lines"
        else
        echo "All's good! The output file had no corrupted lines. $size"
    fi
fi

rm -f $OUTPUT_FILE

100

101

102

103

104

105

106

107

108

#!/bin/bash

#############################################################################

# This script aims to test/prove that you can append to a single file from

# multiple processes with buffers up to a certain size, without causing one

# process' output to corrupt the other's.

# The script takes one parameter, the length of the buffer. It then creates

# 20 worker processes which each write 50 lines of the specified buffer

# size to the same file. When all processes are done outputting, it tests

# the output file to ensure it is in the correct format.

#############################################################################

NUM_WORKERS=20

LINES_PER_WORKER=50

OUTPUT_FILE=/tmp/out.tmp

# each worker will output $LINES_PER_WORKER lines to the output file

run_worker() {

worker_num=$1

buf_len=$2

# Each line will be a specific character, multiplied by the line length.

# The character changes based on the worker number.

filler_len=$((${buf_len}-1)) # -1 -> leave room for \n

filler_char=$(printf \\$(printf '%03o' $(($worker_num+64))))

line=`for i in $(seq 1 $filler_len);do echo -n $filler_char;done`

for i in $(seq 1 $LINES_PER_WORKER)

echo $line >> $OUTPUT_FILE

done

}

if [ "$1" = "worker" ]; then

run_worker $2 $3

exit

buf_len=$1

if [ "$buf_len" = "" ]; then

echo "Buffer length not specified, defaulting to 4096"

buf_len=4096

rm -f $OUTPUT_FILE

echo Launching $NUM_WORKERS worker processes

for i in $(seq 1 $NUM_WORKERS)

$0 worker $i $buf_len &

pids[$i]=${!}

done

echo Each line will be $buf_len characters long

echo Waiting for processes to exit

for i in $(seq 1 $NUM_WORKERS)

wait ${pids[$i]}

done

# Now we want to test the output file. Each line should be the same letter

# repeated buf_len-1 times (remember the \n takes up one byte). If we had

# workers writing over eachother's lines, then there will be mixed characters

# and/or longer/shorter lines.

echo Testing output file

# Make sure the file is the right size (ensures processes didn't write over

# eachother's lines)

expected_file_size=$(($NUM_WORKERS * $LINES_PER_WORKER * $buf_len))

actual_file_size=`cat $OUTPUT_FILE | wc -c`

if [ "$expected_file_size" -ne "$actual_file_size" ]; then

echo Expected file size of $expected_file_size, but got $actual_file_size

else

# File size is OK, test the actual content

# Only use newer versions of grep because older ones are way too slow with

# backreferences

[[ $(grep --version) =~ [^[:digit:]]*([[:digit:]]+)\.([[:digit:]]+) ]]

grep_ver="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"

if [ "$grep_ver" -ge "216" ]; then

num_lines=$(grep -v "^$.$\1\{$((${buf_len}-2))\}$" $OUTPUT_FILE | wc -l)

else

# Scan line by line in bash, which isn't that speedy, but is good enough

# Note: Doesn't work on cygwin for lines < 255

line_length=$((${buf_len}-1))

num_lines=0

for line in `cat $OUTPUT_FILE`

if ! [[ $line =~ ^${line:0:1}{$line_length}$ ]]; then

num_lines=$(($num_lines+1))

fi;

echo -n .

done

echo

if [ "$num_lines" -gt "0" ]; then

echo "Found $num_lines instances of corrupted lines"

else

echo "All's good! The output file had no corrupted lines. $size"

rm -f $OUTPUT_FILE

Update 11/12/2014: Updated the script to check the resulting file size as per glandium‘s suggestion in the comments.

Software Dev

16 Comments

glandium
November 12, 2014 at 1:29 am

One thing your check doesn’t do is count the number of lines in the file. Which hides the fact that some lines are just overwritten on Windows. With 50 lines per worker, I got 15% loss ; with 500 lines, between 40 and 50%.

Reply
- Oz Solomon (Post author)
  November 12, 2014 at 8:55 am
  
  Thank you for the suggestion. I added a filesize check to the script to make sure entirely dropped lines are detected. Personally I’m not seeing this problem on Windows, but it could be because we’re using different versions/ports of bash.
  
  Reply
  - glandium
    November 13, 2014 at 5:36 am
    
    Indeed, depending on the APIs used, the result is different. Native win32 APIs work properly (CreateFile with FILE_APPEND_DATA), but POSIX APIs don’t (open with O_APPEND, as provided in the MS CRT).
    
    Reply
magnum
February 6, 2015 at 8:48 pm

FWIW I just tried this on OS X Yosemite 10.10.2 and it only supports 256 bytes without locking.

Reply
- Filip
  July 4, 2019 at 3:58 am
  
  256 bytes on macOS Mojave 10.14.4 too. APFS file system with 4096 allocation size.
  
  Reply
Eric
February 14, 2015 at 4:41 pm

I used your script on Debian 7.8, and I was only able to get atomic writes up to and including 1008 bytes (1024 – 16 bytes of overhead?) on both my ext4 partition and a tmpfs mount. Anything beyond that resulted in corruption every time.

Reply
Ben Goldberg
April 9, 2015 at 9:00 pm

It’s possible that the ‘echo’ command used by the worker processes is to blame for the corruption.

You’re assuming that ‘echo’ is using write(2) to output it’s argument(s) directly.

What if, instead, it’s using stdio, and using printf? There’s no guarantee that a single call to printf will result in a single write(2) call. It might be dumb enough to try copying the argument into STDOUT’s buffer, and as that buffer gets full, outputting the buffer via write(2). This means that long enough strings passed to printf could result in multiple write(2) calls.

The only way to know for certain is to use strace, or something like it.

Reply
jeff
May 20, 2015 at 4:15 pm

I know this is old but on linux, I was able to used stdbuf to extend the output buffer beyond 4K via the newish stdbuf command.
I just did something like this in the run_worker() method:
stdbuf -o 10K echo $line >> $OUTPUT_FILE
PS: Thanks for your test program!

Reply
Reid Faiv
June 30, 2016 at 5:28 pm

Because your script is using pipe (echo $line >> $OUTPUT_FILE) then your atomic writes are limited to *pipe* atomic write size, which is different from file write append semantics. Your test basically shows that 4K limit on pipe write is there, on Linux limits.h:
#define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */

However append to file on posix is atomic:
http://man7.org/linux/man-pages/man2/write.2.html

If the file was open(2)ed with O_APPEND, the file offset is
first set to the end of the file before writing. The adjustment of
the file offset and the write operation are performed as an atomic
step.

You can test it with similar script, but you have to use OS write. Be aware that some languages (like Python) do use internal buffering which messes up with test results, and must be disabled (in Python use os.open() or open(.. buffering=0) for testing).

Sample Python test code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
import codecs
import random
import string
from hashlib import md5
from multiprocessing import Pool

PROCESSES = 10
LENGTH = 8192*5+777
LINES = 100
FILENAME = ‘/tmp/test.log’

def randomstr(length):
return ”.join(random.choice(string.lowercase) for i in range(length))

def write(_):
with codecs.open(FILENAME, ‘ab’, encoding=’utf-8′, buffering=0) as f:
rndstr = randomstr(LENGTH – 2 – len(md5(‘x’).hexdigest())) # -2 is to account for : and \n
hash = md5(rndstr).hexdigest()
line = rndstr + ‘:’ + hash + ‘\n’
for i in xrange(LINES):
f.write(line)

def verify():
wrong = 0
lines = 0
with codecs.open(FILENAME, ‘rb’, encoding=’utf-8′) as f:
for l in f:
lines += 1
try:
line, hash = l.strip(‘\n’).split(‘:’)
except:
wrong += 1
if hash != md5(line).hexdigest():
wrong += 1
print ‘Line length {} total {} missing {} wrong {}’.format(LENGTH, lines, PROCESSES * LINES – lines, wrong)

if __name__ == ‘__main__’:
try:
os.unlink(FILENAME)
except:
pass
pool = Pool(PROCESSES)
pool.map(write, [None] * PROCESSES)
pool.close()
pool.join()
verify()

Reply
- Joker_vD
  January 24, 2019 at 5:49 pm
  
  > your script is using pipe (echo $line >> $OUTPUT_FILE)
  No, it doesn’t. The shell basically does
  open(“$OUTPUT_FILE”, O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
  write(3, “$line);
  close(3);
  with some descriptor duplication thrown in the mix, but no pipes are created — run strace if you don’t believe me.
  
  Reply
Aravindakshan Babu
October 26, 2016 at 3:02 am

Strange. When I run your script without any arguments or modifications on ubuntu 14.04 with ext4, I get:

$ ./test_appends.sh
Buffer length not specified, defaulting to 4096
Launching 20 worker processes
Each line will be 4096 characters long
Waiting for processes to exit
Testing output file
Found 83 instances of corrupted lines

Any ideas?

Reply
- Oz Solomon (Post author)
  October 26, 2016 at 7:10 pm
  
  Please see Reid Faiv’s comments above. I suspect this has to do with changes in pipe flushing on your system.
  
  Reply
  - Aravindakshan Babu
    October 29, 2016 at 2:56 am
    
    Thanks for the reply Oz!
    
    I”m not sure that is the case. On my machine:
    
    “””
    $ grep “BUF” /usr/include/linux/limits.h
    #define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */
    “””
    
    Seems like something else is going on… crazy.
    
    Reply
Mike Yoder
May 20, 2018 at 12:10 pm

@OzSolomon, thanks for sharing your useful script. I liked it so much I translated it to Go and called it test-atomic-writes https://github.com/yodertv/test-atomic-writes. It may avoid the shell and pipe concerns identified by several commenters. As noted in the answers to this question on StackOverflow https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix/50431250#50431250, this is device dependent behavior. This program provides another portable way to test operating systems and devices for the limits of this behavior.

I found the question and your blog entry while writing a program on my Mac that depended on this behavior working. This test program allowed me to discover that Apple’s APFS doesn’t currently meet the POSIX standard. I reported Apple Bug #37859698 referencing test-atomic-writes code and test results on GitHub. Doesn’t seem to be getting any attention. Anyone else impeded by this?

Reply
Marcus Perlick
January 23, 2019 at 11:22 am

Tried it bare-bones and replaced the shell’s echo with this small C append – all fine on my linux, even with 10k bytes per write:
#include
#include
#include
#include
#include
#include
#include
#include
int main(int argc, char *argv[]) {
if (argc != 3) { return 1; }
int fd = open(argv[1], O_CREAT|O_WRONLY|O_APPEND, 0666);
if (fd < 0) { fprintf(stderr, "open: %s", strerror(errno)); return 2; }
size_t len = strlen(argv[2]);
char *buf = malloc(len+1);
memcpy(buf, argv[2], len);
buf[len] = '\n';
write(fd, buf, len+1);
close(fd);
}

Reply
- Oz Solomon
  January 25, 2019 at 1:53 pm
  
  This seems suspect. The memcpy should be a memfill. You’re currently writing garbage into your buffer, so the script should definitely fail right away.
  
  Reply