diff --git a/docs/lessons/unix-pipes-and-buffering.md b/docs/lessons/unix-pipes-and-buffering.md
new file mode 100644
index 0000000..cc55434
--- /dev/null
+++ b/docs/lessons/unix-pipes-and-buffering.md
@@ -0,0 +1,186 @@

# Unix Pipes Are Not Buffered (But Everything Else Is)

If you've ever piped one command into another and wondered why
the output seems to "lag" or arrive in chunks, this one's
for you. The pipe isn't the problem. It never was.

## Pipes are just a kernel FIFO

When the shell sets up `cmd1 | cmd2`, it does roughly this:

```c
int fds[2];
pipe(fds); // fds[0] = read end, fds[1] = write end
// fork cmd1, dup2(fds[1], STDOUT_FILENO)
//   cmd1's stdout now writes into the pipe
// fork cmd2, dup2(fds[0], STDIN_FILENO)
//   cmd2's stdin now reads from the pipe
// (each child also closes the pipe fds it doesn't use)
```

The pipe itself is a dumb byte queue in the kernel. No
buffering strategy, no flushing, no opinions. Bytes written
to the write end are immediately available on the read end.
It has a capacity (64 KiB by default on Linux; varies
elsewhere) and `write()` blocks when it's full. That's your
backpressure.

Think of it like a bounded `tokio::sync::mpsc::channel`, but
for raw bytes instead of typed messages. One side writes,
the other reads, and the kernel handles the queue.

## So where does the buffering come from?

The C standard library (`libc` / `glibc`). Specifically, its
`FILE*` stream layer (the thing behind `printf`, `puts`,
`fwrite` to stdout, and so on).

When a C program starts up, before your `main()` even runs,
libc initializes stdout with this rule:

| stdout points to... | Buffering mode |
|---------------------|----------------|
| A terminal (tty) | **Line-buffered**: flushes on every `\n` |
| A pipe or file | **Fully buffered**: flushes when the internal buffer fills (~4-8 KiB) |

This detection happens via `isatty(STDOUT_FILENO)`. The
program checks whether its stdout is a terminal and picks a
buffering strategy accordingly.

**This is not a decision the shell makes.** The shell just
wires up the pipe. The *program* decides to buffer based on
what it sees on the other end.

## The classic surprise

```bash
# Works fine. stdout is a terminal, line-buffered, lines
# appear immediately.
tail -f /var/log/something

# Seems to lag. stdout is a pipe, fully buffered, lines
# arrive in 4 KiB chunks.
tail -f /var/log/something | grep error
```

The pipe between `tail` and `grep` is instant. But `tail`
detects that its stdout is a pipe, switches to full buffering,
and holds onto output until its internal buffer fills. So
`grep` sits there waiting for a 4 KiB chunk instead of getting
lines one at a time.

Same deal with almost any filter: `awk`, `sed`, and `cut` all
do the same `isatty` check.

## The workarounds

### `stdbuf`: override libc's buffering choice

```bash
stdbuf -oL tail -f /var/log/something | grep error
```

`-oL` means "force stdout to line-buffered." It works by
LD_PRELOAD-ing a shim library that overrides libc's stdio
initialization. This only works for dynamically linked
programs that use libc's stdio (most things, but not
everything).

### `unbuffer` (from `expect`)

```bash
unbuffer tail -f /var/log/something | grep error
```

Creates a pseudo-terminal (pty) so the program *thinks* it's
talking to a terminal and uses line buffering. Heavier than
`stdbuf`, but it works on programs that don't use libc's
stdio.

### In your own code: just don't add buffering

In Rust, raw `std::fs::File` writes are unbuffered. Every
`write` call goes straight to the kernel via the `write`
syscall:

```rust
use std::io::Write;
use std::time::Duration;

// Immediately available on the read end. No flush needed.
write_file.write_all(b"first line\n")?;

// The reader already has that line. Do whatever.
tokio::time::sleep(Duration::from_secs(1)).await;

// This also lands immediately.
write_file.write_all(b"second line\n")?;
```

If you wrap it in `BufWriter`, you've opted into the same
userspace buffering libc does:

```rust
use std::io::{BufWriter, Write};

let mut writer = BufWriter::new(write_file);
writer.write_all(b"first line\n")?;
// NOT visible yet. Sitting in an 8 KiB userspace buffer.
writer.flush()?;
// NOW visible on the read end.
```

Rust's `println!` and `stdout().lock()` go through std's
`LineWriter`, which is line-buffered regardless of whether
stdout is a terminal, so piped output still appears at every
newline, not instantly per write. If you need guaranteed
unbuffered writes, use the raw fd or flush explicitly.

## How pikl uses this

In pikl's test helpers, we create a pipe to feed action
scripts to the `--action-fd` flag:

```rust
use std::io::Write;
use std::os::unix::io::FromRawFd;

let mut fds = [0i32; 2];
unsafe { libc::pipe(fds.as_mut_ptr()) };
let [read_fd, write_fd] = fds;

// Wrap the write end in a File. Raw, unbuffered.
let mut write_file =
    unsafe { std::fs::File::from_raw_fd(write_fd) };

// Write the script. Immediately available on read_fd.
write_file.write_all(script.as_bytes())?;
// Close the write end so the reader gets EOF.
drop(write_file);
```

Then in the child process, we remap the read end to the
expected fd:

```rust
use std::os::unix::process::CommandExt;

// pre_exec runs in the child after fork(), before exec().
// It's an unsafe API; the closure body needs unsafe too.
unsafe {
    cmd.pre_exec(move || {
        if read_fd != target_fd {
            // Make target_fd point to the pipe's read end.
            if libc::dup2(read_fd, target_fd) == -1 {
                return Err(std::io::Error::last_os_error());
            }
            // Close the original (now redundant) descriptor.
            libc::close(read_fd);
        }
        Ok(())
    });
}
```

For streaming/async scenarios (like feeding items to pikl
over time), the same approach works. Just don't drop the
write end. Each `write_all` call pushes bytes through the
pipe immediately, and the reader picks them up as they
arrive. No flush is needed because `File` doesn't buffer.

## tl;dr

- Pipes are instant. They're a kernel FIFO with no
  buffering strategy of their own.
- The "buffering" you see is libc's `FILE*` layer choosing
  full buffering when stdout isn't a terminal.
- Use `stdbuf -oL` or `unbuffer` to fix other people's
  programs.
- In your own code, use a raw `File` (not a `BufWriter`) and
  every write lands immediately.
- It was always libc. Bloody libc.