Conventional wisdom says that appending to a file is atomic, up to a certain write size. That means, for example, that multiple processes can write to a shared log file without their output getting interleaved. Recently, I was thinking about this “conventional wisdom” and decided to look for some proof. After some Googling, I eventually landed on this StackOverflow question, which discusses the issue at hand. The accepted answer has a lot of good information, but also uses the words “supposed to be” and “assuming” a little too much to be conclusive.
Testing Appends on Linux
To settle this issue once and for all, I decided to write a short script to actually test the limit of atomic appends. The idea is to fork a bunch of worker processes and have them all write to the same file. Each worker writes lines with a unique signature, which we can then check for corruption. If user freiheit from StackOverflow is correct, then writes of up to 4096 bytes on Linux should be OK, and at 4097 bytes we should already see corruption.
The bash script (source at the end of this post) takes one argument: the number of bytes each worker should write at a time (a.k.a. the buffer size). The results from a CentOS 6.5 box are as follows:
# ./test_appends.sh 4096
Launching 20 worker processes
Each line will be 4096 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
All's good! The output file had no corrupted lines.

# ./test_appends.sh 4097
Launching 20 worker processes
Each line will be 4097 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
Found 27 instances of corrupted lines
So there you have it: empirical evidence that, at least on this system, Linux allows atomic appends of up to 4KB (4096 bytes).
What About Windows?
Luckily, we can use Cygwin on Windows to run the same experimental script. With some trial and error, we can then determine that the maximum atomic write size there is 1024 bytes:
$ ./test_appends.sh 1024
Launching 30 worker processes
Each line will be 1024 characters long
Waiting for processes to exit
Testing output file
All's good! The output file had no corrupted lines.

$ ./test_appends.sh 1025
Launching 30 worker processes
Each line will be 1025 characters long
Waiting for processes to exit
Testing output file
Found 406 instances of corrupted lines
In Conclusion
If you’re writing cross-platform Windows/Linux code, and need to output to a shared file, make sure to keep your writes to 1KB or less.
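For illustration, here is a minimal C sketch of such an append. The append_record() helper and the 1024-byte cap are my own illustrative choices, assuming a POSIX open()/write(); the key point is one write() call per record.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative helper (not from the post): append one record, at most 1024
 * bytes including the trailing '\n', using a single write() call so that
 * O_APPEND keeps concurrent writers from interleaving. */
int append_record(const char *path, const char *record, size_t len)
{
    if (len > 1024)        /* stay within the smallest limit observed above */
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, record, len);  /* exactly one write(2) per record */
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

int main(void)
{
    const char *line = "worker A: something happened\n";
    return append_record("/tmp/out.tmp", line, strlen(line)) == 0 ? 0 : 1;
}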
#!/bin/bash
#############################################################################
#
# This script aims to test/prove that you can append to a single file from
# multiple processes with buffers up to a certain size, without causing one
# process' output to corrupt the other's.
#
# The script takes one parameter, the length of the buffer. It then creates
# 20 worker processes which each write 50 lines of the specified buffer
# size to the same file. When all processes are done outputting, it tests
# the output file to ensure it is in the correct format.
#
#############################################################################

NUM_WORKERS=20
LINES_PER_WORKER=50
OUTPUT_FILE=/tmp/out.tmp

# each worker will output $LINES_PER_WORKER lines to the output file
run_worker() {
    worker_num=$1
    buf_len=$2

    # Each line will be a specific character, multiplied by the line length.
    # The character changes based on the worker number.
    filler_len=$((${buf_len}-1))  # -1 -> leave room for \n
    filler_char=$(printf \\$(printf '%03o' $(($worker_num+64))))
    line=`for i in $(seq 1 $filler_len);do echo -n $filler_char;done`

    for i in $(seq 1 $LINES_PER_WORKER)
    do
        echo $line >> $OUTPUT_FILE
    done
}

if [ "$1" = "worker" ]; then
    run_worker $2 $3
    exit
fi

buf_len=$1
if [ "$buf_len" = "" ]; then
    echo "Buffer length not specified, defaulting to 4096"
    buf_len=4096
fi

rm -f $OUTPUT_FILE

echo Launching $NUM_WORKERS worker processes
for i in $(seq 1 $NUM_WORKERS)
do
    $0 worker $i $buf_len &
    pids[$i]=${!}
done

echo Each line will be $buf_len characters long
echo Waiting for processes to exit
for i in $(seq 1 $NUM_WORKERS)
do
    wait ${pids[$i]}
done

# Now we want to test the output file. Each line should be the same letter
# repeated buf_len-1 times (remember the \n takes up one byte). If we had
# workers writing over each other's lines, then there will be mixed characters
# and/or longer/shorter lines.
echo Testing output file

# Make sure the file is the right size (ensures processes didn't write over
# each other's lines)
expected_file_size=$(($NUM_WORKERS * $LINES_PER_WORKER * $buf_len))
actual_file_size=`cat $OUTPUT_FILE | wc -c`
if [ "$expected_file_size" -ne "$actual_file_size" ]; then
    echo Expected file size of $expected_file_size, but got $actual_file_size
else
    # File size is OK, test the actual content

    # Only use newer versions of grep because older ones are way too slow with
    # backreferences
    [[ $(grep --version) =~ [^[:digit:]]*([[:digit:]]+)\.([[:digit:]]+) ]]
    grep_ver="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    if [ "$grep_ver" -ge "216" ]; then
        num_lines=$(grep -v "^\(.\)\1\{$((${buf_len}-2))\}$" $OUTPUT_FILE | wc -l)
    else
        # Scan line by line in bash, which isn't that speedy, but is good enough
        # Note: Doesn't work on cygwin for lines < 255
        line_length=$((${buf_len}-1))
        num_lines=0
        for line in `cat $OUTPUT_FILE`
        do
            if ! [[ $line =~ ^${line:0:1}{$line_length}$ ]]; then
                num_lines=$(($num_lines+1))
            fi
            echo -n .
        done
        echo
    fi

    if [ "$num_lines" -gt "0" ]; then
        echo "Found $num_lines instances of corrupted lines"
    else
        echo "All's good! The output file had no corrupted lines."
    fi
fi

rm -f $OUTPUT_FILE
Update 11/12/2014: Updated the script to check the resulting file size as per glandium’s suggestion in the comments.
November 12, 2014 at 1:29 am
One thing your check doesn’t do is count the number of lines in the file, which hides the fact that some lines are just overwritten on Windows. With 50 lines per worker, I got 15% loss; with 500 lines, between 40 and 50%.
November 12, 2014 at 8:55 am
Thank you for the suggestion. I added a filesize check to the script to make sure entirely dropped lines are detected. Personally I’m not seeing this problem on Windows, but it could be because we’re using different versions/ports of bash.
November 13, 2014 at 5:36 am
Indeed, depending on the APIs used, the result is different. Native win32 APIs work properly (CreateFile with FILE_APPEND_DATA), but POSIX APIs don’t (open with O_APPEND, as provided in the MS CRT).
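For reference, a minimal sketch of that native Win32 path (my own illustrative code, not the commenter's): open with FILE_APPEND_DATA and issue one WriteFile per record, letting the system position each write at end-of-file.

#include <windows.h>

/* Illustrative helper: one CreateFile/WriteFile per record using the native
 * append semantics described above. */
BOOL append_record_win32(const char *path, const char *record, DWORD len)
{
    HANDLE h = CreateFileA(path,
                           FILE_APPEND_DATA,              /* append-only access */
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    DWORD written = 0;
    /* With FILE_APPEND_DATA the system positions each write at end-of-file. */
    BOOL ok = WriteFile(h, record, len, &written, NULL) && written == len;
    CloseHandle(h);
    return ok;
}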
February 6, 2015 at 8:48 pm
FWIW I just tried this on OS X Yosemite 10.10.2 and it only supports 256 bytes without locking.
July 4, 2019 at 3:58 am
256 bytes on macOS Mojave 10.14.4 too. APFS file system with 4096 allocation size.
February 14, 2015 at 4:41 pm
I used your script on Debian 7.8, and I was only able to get atomic writes up to and including 1008 bytes (1024 – 16 bytes of overhead?) on both my ext4 partition and a tmpfs mount. Anything beyond that resulted in corruption every time.
April 9, 2015 at 9:00 pm
It’s possible that the ‘echo’ command used by the worker processes is to blame for the corruption.
You’re assuming that ‘echo’ is using write(2) to output its argument(s) directly.
What if, instead, it’s using stdio, and using printf? There’s no guarantee that a single call to printf will result in a single write(2) call. It might be dumb enough to try copying the argument into STDOUT’s buffer, and as that buffer gets full, outputting the buffer via write(2). This means that long enough strings passed to printf could result in multiple write(2) calls.
The only way to know for certain is to use strace, or something like it.
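To make that point concrete, here is a rough sketch contrasting the two paths; the helper names are made up for illustration, and whether stdio actually splits the output depends on the implementation and its buffer size.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only: the stdio path may flush a large line in several
 * write(2) calls, depending on the implementation and buffer size. */
void append_via_stdio(const char *path, const char *line)
{
    FILE *f = fopen(path, "a");
    if (!f)
        return;
    fputs(line, f);    /* copied into stdio's buffer, flushed later */
    fclose(f);
}

/* The raw path issues exactly one write(2), so O_APPEND atomicity
 * (up to whatever limit the OS imposes) applies to the whole line. */
void append_via_write(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fd < 0)
        return;
    write(fd, line, strlen(line));
    close(fd);
}

int main(void)
{
    char line[4097];
    memset(line, 'A', 4095);       /* hypothetical 4 KB record */
    line[4095] = '\n';
    line[4096] = '\0';
    append_via_stdio("/tmp/out.tmp", line);
    append_via_write("/tmp/out.tmp", line);
    return 0;
}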
May 20, 2015 at 4:15 pm
I know this is old, but on Linux I was able to use the newish stdbuf command to extend the output buffer beyond 4K.
I just did something like this in the run_worker() method:
stdbuf -o 10K echo $line >> $OUTPUT_FILE
PS: Thanks for your test program!
June 30, 2016 at 5:28 pm
Because your script is using a pipe (echo $line >> $OUTPUT_FILE), your atomic writes are limited to the *pipe* atomic write size, which is different from file append semantics. Your test basically shows the 4K limit on pipe writes; on Linux, limits.h has:
#define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */
However, appending to a file on POSIX is atomic:
http://man7.org/linux/man-pages/man2/write.2.html
If the file was open(2)ed with O_APPEND, the file offset is first set to the end of the file before writing. The adjustment of the file offset and the write operation are performed as an atomic step.
You can test it with a similar script, but you have to use the OS write call. Be aware that some languages (like Python) use internal buffering, which messes up the test results and must be disabled (in Python, use os.open() or open(..., buffering=0) for testing).
Sample Python test code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
import codecs
import random
import string
from hashlib import md5
from multiprocessing import Pool

PROCESSES = 10
LENGTH = 8192*5+777
LINES = 100
FILENAME = '/tmp/test.log'

def randomstr(length):
    return ''.join(random.choice(string.lowercase) for i in range(length))

def write(_):
    with codecs.open(FILENAME, 'ab', encoding='utf-8', buffering=0) as f:
        rndstr = randomstr(LENGTH - 2 - len(md5('x').hexdigest()))  # -2 is to account for : and \n
        hash = md5(rndstr).hexdigest()
        line = rndstr + ':' + hash + '\n'
        for i in xrange(LINES):
            f.write(line)

def verify():
    wrong = 0
    lines = 0
    with codecs.open(FILENAME, 'rb', encoding='utf-8') as f:
        for l in f:
            lines += 1
            try:
                line, hash = l.strip('\n').split(':')
            except:
                wrong += 1
                continue
            if hash != md5(line).hexdigest():
                wrong += 1
    print 'Line length {} total {} missing {} wrong {}'.format(LENGTH, lines, PROCESSES * LINES - lines, wrong)

if __name__ == '__main__':
    try:
        os.unlink(FILENAME)
    except:
        pass
    pool = Pool(PROCESSES)
    pool.map(write, [None] * PROCESSES)
    pool.close()
    pool.join()
    verify()
January 24, 2019 at 5:49 pm
> your script is using pipe (echo $line >> $OUTPUT_FILE)
No, it doesn’t. The shell basically does
open(“$OUTPUT_FILE”, O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
write(3, "$line\n", ...)
close(3)
with some descriptor duplication thrown in the mix, but no pipes are created — run strace if you don’t believe me.
October 26, 2016 at 3:02 am
Strange. When I run your script without any arguments or modifications on ubuntu 14.04 with ext4, I get:
$ ./test_appends.sh
Buffer length not specified, defaulting to 4096
Launching 20 worker processes
Each line will be 4096 characters long
Waiting for processes to exit
Testing output file
Found 83 instances of corrupted lines
Any ideas?
October 26, 2016 at 7:10 pm
Please see Reid Faiv’s comments above. I suspect this has to do with changes in pipe flushing on your system.
October 29, 2016 at 2:56 am
Thanks for the reply Oz!
I’m not sure that is the case. On my machine:
“””
$ grep “BUF” /usr/include/linux/limits.h
#define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */
“””
Seems like something else is going on… crazy.
May 20, 2018 at 12:10 pm
@OzSolomon, thanks for sharing your useful script. I liked it so much I translated it to Go and called it test-atomic-writes https://github.com/yodertv/test-atomic-writes. It may avoid the shell and pipe concerns identified by several commenters. As noted in the answers to this question on StackOverflow https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix/50431250#50431250, this is device dependent behavior. This program provides another portable way to test operating systems and devices for the limits of this behavior.
I found the question and your blog entry while writing a program on my Mac that depended on this behavior working. This test program allowed me to discover that Apple’s APFS doesn’t currently meet the POSIX standard. I reported Apple Bug #37859698 referencing test-atomic-writes code and test results on GitHub. Doesn’t seem to be getting any attention. Anyone else impeded by this?
January 23, 2019 at 11:22 am
Tried it bare-bones and replaced the shell’s echo with this small C append – all fine on my linux, even with 10k bytes per write:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[]) {
    if (argc != 3) { return 1; }
    /* argv[1] is the output file, argv[2] is the line to append */
    int fd = open(argv[1], O_CREAT|O_WRONLY|O_APPEND, 0666);
    if (fd < 0) { fprintf(stderr, "open: %s", strerror(errno)); return 2; }
    size_t len = strlen(argv[2]);
    char *buf = malloc(len+1);
    memcpy(buf, argv[2], len);
    buf[len] = '\n';
    write(fd, buf, len+1);
    close(fd);
}
January 25, 2019 at 1:53 pm
This seems suspect. The memcpy should be a memfill. You’re currently writing garbage into your buffer, so the script should definitely fail right away.