# Unix Pipes Are Not Buffered (But Everything Else Is)

If you've ever piped one command into another and wondered why the output seems to "lag" or arrive in chunks, this one's for you. The pipe isn't the problem. It never was.

## Pipes are just a kernel FIFO

When the shell sets up `cmd1 | cmd2`, it does roughly this:

```c
int fds[2];
pipe(fds);  // fds[0] = read end, fds[1] = write end

// fork cmd1, dup2(fds[1], STDOUT_FILENO)  // cmd1's stdout writes into the pipe
// fork cmd2, dup2(fds[0], STDIN_FILENO)   // cmd2's stdin reads from the pipe
```

The pipe itself is a dumb byte queue in the kernel. No buffering strategy, no flushing, no opinions. Bytes written to the write end are immediately available on the read end. It has a capacity (64 KB on Linux, varies elsewhere) and `write()` blocks when it's full. That's your backpressure.

Think of it like a bounded `tokio::sync::mpsc::channel`, but for raw bytes instead of typed messages. One side writes, the other reads, the kernel handles the queue.

## So where does the buffering come from?

The C standard library (`libc` / `glibc`). Specifically, its `FILE*` stream layer (the thing behind `printf`, `puts`, `fwrite` to stdout, etc.). When a C program starts up, before your `main()` even runs, libc's runtime initializes stdout with this rule:

| stdout points to... | Buffering mode |
|---------------------|----------------|
| A terminal (tty)    | **Line-buffered**: flushes on every `\n` |
| A pipe or file      | **Fully buffered**: flushes when the internal buffer fills (~4-8 KB) |

This detection happens via `isatty(STDOUT_FILENO)`. The program checks whether its stdout is a terminal and picks a buffering strategy accordingly.

**This is not a decision the shell makes.** The shell just wires up the pipe. The *program* decides to buffer based on what it sees on the other end.

## The classic surprise

```bash
# Works fine. stdout is a terminal, line-buffered, lines
# appear immediately.
tail -f /var/log/something

# Seems to lag. stdout is a pipe, fully buffered, lines
# arrive in 4KB chunks.
tail -f /var/log/something | grep error
```

The pipe between `tail` and `grep` is instant. But `tail` detects that its stdout is a pipe, switches to full buffering, and holds onto output until its internal buffer fills. So `grep` sits there waiting for a 4 KB chunk instead of getting lines one at a time.

Same deal with any stdio-using command: `awk`, `sed`, `cut` all do the same `isatty` check.

## The workarounds

### `stdbuf`: override libc's buffering choice

```bash
stdbuf -oL tail -f /var/log/something | grep error
```

`-oL` means "force stdout to line-buffered." It works by `LD_PRELOAD`ing a shim library that overrides libc's initialization. This only works for dynamically linked programs that use libc's stdio (most things, but not everything).

### `unbuffer` (from `expect`)

```bash
unbuffer tail -f /var/log/something | grep error
```

Creates a pseudo-terminal (pty) so the program *thinks* it's talking to a terminal and uses line buffering. Heavier than `stdbuf`, but it works on programs that don't use libc's stdio at all.

### In your own code: just don't add buffering

In Rust, raw `std::fs::File` writes are unbuffered. Every write call goes straight to the kernel via the `write` syscall:

```rust
use std::io::Write;

// Immediately available on the read end. No flush needed.
write_file.write_all(b"first line\n")?;

// Reader already has that line. Do whatever.
tokio::time::sleep(Duration::from_secs(1)).await;

// This also lands immediately.
write_file.write_all(b"second line\n")?;
```

If you wrap it in `BufWriter`, now you've opted into the same buffering libc does:

```rust
use std::io::{BufWriter, Write};

let mut writer = BufWriter::new(write_file);
writer.write_all(b"first line\n")?; // NOT visible yet. Sitting in an 8 KB userspace buffer.
writer.flush()?;                    // NOW visible on the read end.
```

Rust's `println!` and `stdout()` don't do libc's tty check: `Stdout` wraps a `LineWriter`, so stdout is line-buffered whether or not it points at a terminal.
If you need guaranteed unbuffered writes, use the raw fd or explicitly flush.

## How pikl uses this

In pikl's test helpers, we create a pipe to feed action scripts to the `--action-fd` flag:

```rust
use std::io::Write;
use std::os::unix::io::FromRawFd;

let mut fds = [0i32; 2];
unsafe { libc::pipe(fds.as_mut_ptr()) };
let [read_fd, write_fd] = fds;

// Wrap the write end in a File. Raw, unbuffered.
let mut write_file = unsafe { std::fs::File::from_raw_fd(write_fd) };

// Write the script. Immediately available on read_fd.
write_file.write_all(script.as_bytes())?;

// Close the write end so the reader gets EOF.
drop(write_file);
```

Then in the child process, we remap the read end to the expected fd:

```rust
use std::os::unix::process::CommandExt;

// pre_exec runs in the child after fork(), before exec().
// It's an unsafe method: only async-signal-safe calls belong in the closure.
unsafe {
    cmd.pre_exec(move || {
        if read_fd != target_fd {
            // Make target_fd (e.g. fd 3) point to the pipe...
            libc::dup2(read_fd, target_fd);
            // ...and close the original, now-redundant fd.
            libc::close(read_fd);
        }
        Ok(())
    });
}
```

For streaming/async scenarios (like feeding items to pikl over time), the same approach works. Just don't drop the write end. Each `write_all` call pushes bytes through the pipe immediately, and the reader picks them up as they arrive. No flush needed, because `File` doesn't buffer.

## tl;dr

- Pipes are instant. They're a kernel FIFO with no flush policy: bytes are readable the moment they're written.
- The "buffering" you see is libc's `FILE*` layer choosing full buffering when stdout isn't a terminal.
- Use `stdbuf -oL` or `unbuffer` to fix other people's programs.
- In your own code, use raw `File` (not `BufWriter`) and every write lands immediately.
- It was always libc. Bloody libc.