Unix Pipes Are Not Buffered (But Everything Else Is)
If you've ever piped a command into another and wondered why the output seems to "lag" or arrive in chunks, this one's for you. The pipe isn't the problem. It never was.
Pipes are just a kernel FIFO
When the shell sets up cmd1 | cmd2, it does roughly this:
int fds[2];
pipe(fds); // fds[0] = read end, fds[1] = write end
// fork cmd1, dup2(fds[1], STDOUT)
// cmd1's stdout writes into the pipe
// fork cmd2, dup2(fds[0], STDIN)
// cmd2's stdin reads from the pipe
The pipe itself is a dumb byte queue in the kernel. No
buffering strategy, no flushing, no opinions. Bytes written
to the write end are immediately available on the read end.
It has a capacity (64KB on Linux, varies elsewhere) and
write() blocks if it's full. That's your backpressure.
Think of it like a bounded tokio::sync::mpsc::channel but
for raw bytes instead of typed messages. One side writes,
the other reads, the kernel handles the queue.
So where does the buffering come from?
The C standard library (libc / glibc). Specifically, its
FILE* stream layer (the thing behind printf, puts,
fwrite to stdout, etc.).
When a C program starts up, before your main() even runs,
libc's runtime initializes stdout with this rule:
| stdout points to... | Buffering mode |
|---|---|
| A terminal (tty) | Line-buffered: flushes on every \n |
| A pipe or file | Fully buffered: flushes when the internal buffer fills (~4-8KB) |
This detection happens via isatty(STDOUT_FILENO). The
program checks if its stdout is a terminal and picks a
buffering strategy accordingly.
This is not a decision the shell makes. The shell just wires up the pipe. The program decides to buffer based on what it sees on the other end.
The classic surprise
# Works fine. stdout is a terminal, line-buffered, lines
# appear immediately.
tail -f /var/log/something
# Seems to lag. stdout is a pipe, fully buffered, lines
# arrive in 4KB chunks.
tail -f /var/log/something | grep error
The pipe between tail and grep is instant. But tail
detects its stdout is a pipe, switches to full buffering,
and holds onto output until its internal buffer fills. So
grep sits there waiting for a 4KB chunk instead of getting
lines one at a time.
Same deal with any stdio-based command. awk, sed, and cut
all run the same isatty check.
The workarounds
stdbuf: override libc's buffering choice
stdbuf -oL tail -f /var/log/something | grep error
-oL means "force stdout to line-buffered." It works by
LD_PRELOADing a shim library that overrides libc's
initialization. This only works for dynamically-linked
programs that use libc's stdio (most things, but not
everything).
unbuffer (from expect)
unbuffer tail -f /var/log/something | grep error
Creates a pseudo-terminal (pty) so the program thinks
it's talking to a terminal and uses line buffering. Heavier
than stdbuf but works on programs that don't use libc's
stdio.
In your own code: just don't add buffering
In Rust, raw std::fs::File writes are unbuffered. Every
.write() call goes straight to the kernel via the write
syscall:
use std::io::Write;
// Inside an async fn; `write_file` is the pipe's write end
// wrapped in a std::fs::File.
// Immediately available on the read end. No flush needed.
write_file.write_all(b"first line\n")?;
// Reader already has that line. Do whatever.
tokio::time::sleep(Duration::from_secs(1)).await;
// This also lands immediately.
write_file.write_all(b"second line\n")?;
If you wrap it in BufWriter, now you've opted into the
same buffering libc does:
use std::io::{BufWriter, Write};
let mut writer = BufWriter::new(write_file);
writer.write_all(b"first line\n")?;
// NOT visible yet. Sitting in an 8KB userspace buffer.
writer.flush()?;
// NOW visible on the read end.
Rust's println! writes through a line-buffered stdout (a
LineWriter under the hood), regardless of whether it's a
terminal, so there's no libc-style isatty switch. If you need
guaranteed unbuffered writes, use the raw fd or explicitly
flush.
How pikl uses this
In pikl's test helpers, we create a pipe to feed action
scripts to the --action-fd flag:
let mut fds = [0i32; 2];
unsafe { libc::pipe(fds.as_mut_ptr()) };
let [read_fd, write_fd] = fds;
// Wrap the write end in a File. Raw, unbuffered.
let mut write_file =
unsafe { std::fs::File::from_raw_fd(write_fd) };
// Write the script. Immediately available on read_fd.
write_file.write_all(script.as_bytes())?;
// Close the write end so the reader gets EOF.
drop(write_file);
Then in the child process, we remap the read end to the expected fd:
// pre_exec runs in the child after fork(), before exec()
cmd.pre_exec(move || {
if read_fd != target_fd {
// make target_fd point to the pipe's read end
libc::dup2(read_fd, target_fd);
// close the original (now redundant)
libc::close(read_fd);
}
Ok(())
});
For streaming/async scenarios (like feeding items to pikl
over time), the same approach works. Just don't drop the
write end. Each write_all call pushes bytes through the
pipe immediately, and the reader picks them up as they
arrive. No flush needed because File doesn't buffer.
tl;dr
- Pipes are instant. They're a kernel FIFO with zero buffering.
- The "buffering" you see is libc's FILE* layer choosing full buffering when stdout isn't a terminal.
- Use stdbuf -oL or unbuffer to fix other people's programs.
- In your own code, use raw File (not BufWriter) and every write lands immediately.
- It was always libc. Bloody libc.