doc: Lesson about fd handling to avoid buffered pipes.

docs/lessons/unix-pipes-and-buffering.md

# Unix Pipes Are Not Buffered (But Everything Else Is)

If you've ever piped one command into another and wondered why
the output seems to "lag" or arrive in chunks, this one's
for you. The pipe isn't the problem. It never was.

## Pipes are just a kernel FIFO

When the shell sets up `cmd1 | cmd2`, it does roughly this:

```c
int fds[2];
pipe(fds);  // fds[0] = read end, fds[1] = write end

// fork cmd1, dup2(fds[1], STDOUT_FILENO), exec:
//   cmd1's stdout now writes into the pipe
// fork cmd2, dup2(fds[0], STDIN_FILENO), exec:
//   cmd2's stdin now reads from the pipe
// every process, the shell included, closes the pipe fds it
// doesn't need, so cmd2 sees EOF once cmd1 exits
```

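For comparison, here is roughly the same wiring done from Rust with `std::process`, which creates the pipe and does the fd plumbing for you. This is a sketch that assumes a Unix box with `echo` and `tr` on `PATH`. `run_pipeline` is a hypothetical helper standing in for `echo hello pipe | tr a-z A-Z`.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};

/// Hypothetical helper: the equivalent of `echo hello pipe | tr a-z A-Z`.
/// Stdio::piped() creates the kernel pipe; handing cmd1's stdout to
/// cmd2 as its stdin is the dup2() step from the C sketch.
fn run_pipeline() -> io::Result<String> {
    // cmd1: its stdout is the write end of a fresh pipe.
    let mut cmd1 = Command::new("echo")
        .arg("hello pipe")
        .stdout(Stdio::piped())
        .spawn()?;

    // cmd2: its stdin is the read end of that same pipe.
    let pipe_read_end = cmd1.stdout.take().expect("cmd1 stdout is piped");
    let mut cmd2 = Command::new("tr")
        .args(["a-z", "A-Z"])
        .stdin(Stdio::from(pipe_read_end))
        .stdout(Stdio::piped())
        .spawn()?;

    // Collect cmd2's output until it exits and closes its stdout.
    let mut output = String::new();
    cmd2.stdout
        .take()
        .expect("cmd2 stdout is piped")
        .read_to_string(&mut output)?;
    cmd1.wait()?;
    cmd2.wait()?;
    Ok(output)
}

fn main() -> io::Result<()> {
    print!("{}", run_pipeline()?);
    Ok(())
}
```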
The pipe itself is a dumb byte queue in the kernel. No
buffering strategy, no flushing, no opinions. Bytes written
to the write end are immediately available on the read end.
It has a capacity (64KB by default on Linux, adjustable with
`fcntl(F_SETPIPE_SZ)`; it varies elsewhere), and a blocking
`write()` waits while the pipe is full. That's your backpressure.

Think of it like a bounded `tokio::sync::mpsc::channel`, but
for raw bytes instead of typed messages. One side writes,
the other reads, the kernel handles the queue.

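You can see the capacity and the backpressure in action by pushing far more than 64KB through a single pipe. The writer's `write_all` blocks whenever the pipe is full and resumes as the reader drains it, so nothing is lost. This is a Unix-only sketch. It declares `pipe(2)` by hand with `unsafe extern "C"` (Rust 1.82+) instead of pulling in the `libc` crate. `make_pipe` and `pump_through_pipe` are hypothetical helper names.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::os::raw::c_int;
use std::os::unix::io::FromRawFd;
use std::thread;

unsafe extern "C" {
    // pipe(2) from the C library that std already links on Unix.
    fn pipe(fds: *mut c_int) -> c_int;
}

/// Open a pipe and wrap both ends as Files: (read end, write end).
fn make_pipe() -> io::Result<(File, File)> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Safety: both fds are fresh and owned by nothing else.
    Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}

/// Push `len` bytes through one pipe. Whenever the pipe is full, the
/// writer's write_all() blocks until the reader drains some of it:
/// backpressure, not data loss.
fn pump_through_pipe(len: usize) -> io::Result<usize> {
    let (mut read_end, mut write_end) = make_pipe()?;
    let writer = thread::spawn(move || -> io::Result<()> {
        write_end.write_all(&vec![b'x'; len])?;
        Ok(()) // write_end drops here, so the reader then sees EOF
    });
    let mut received = Vec::new();
    read_end.read_to_end(&mut received)?;
    writer.join().expect("writer thread panicked")?;
    Ok(received.len())
}

fn main() -> io::Result<()> {
    // 1 MiB through a pipe that holds 64 KiB by default on Linux.
    println!("received {} bytes", pump_through_pipe(1 << 20)?);
    Ok(())
}
```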
## So where does the buffering come from?

The C standard library (`libc`, e.g. glibc). Specifically, its
`FILE*` stream layer (the thing behind `printf`, `puts`,
`fwrite` to stdout, etc.).

When a C program first writes to stdout (glibc decides lazily,
on first use, not before `main()` runs), libc picks a buffering
mode for the stream with this rule:

| stdout points to...  | Buffering mode                                                      |
|----------------------|---------------------------------------------------------------------|
| A terminal (tty)     | **Line-buffered**: flushes on every `\n`                            |
| A pipe or file       | **Fully buffered**: flushes when the internal buffer fills (~4-8KB) |

In both modes an explicit `fflush(stdout)` or a normal exit also
flushes whatever is pending. stderr is the exception: it is
never fully buffered (glibc leaves it unbuffered).

The detection is essentially `isatty()` on stdout's file
descriptor: libc checks whether stdout is a terminal and picks
a buffering strategy accordingly.

**This is not a decision the shell makes.** The shell just
wires up the pipe. The *program*, via its libc, decides to
buffer based on what it sees on the other end.

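libc's rule is easy to mimic in Rust, which makes the difference tangible. The sketch below uses std's `LineWriter` for the terminal case and `BufWriter` for everything else. Both wrap a hypothetical `Sink` that records which bytes made it past the buffering layer, standing in for the `write` syscall. `std::io::IsTerminal` (Rust 1.70+) plays the part of `isatty()`. It illustrates the policy only and is not how glibc is implemented. `stdio_like` and `bytes_through_after_one_line` are hypothetical names.

```rust
use std::cell::RefCell;
use std::io::{self, BufWriter, IsTerminal, LineWriter, Write};
use std::rc::Rc;

/// Records every byte that gets past the buffering layer; it stands
/// in for the write(2) syscall.
#[derive(Clone, Default)]
struct Sink(Rc<RefCell<Vec<u8>>>);

impl Write for Sink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical helper mimicking libc's stdout rule:
/// terminal -> line-buffered, anything else -> fully buffered.
fn stdio_like(sink: Sink, is_tty: bool) -> Box<dyn Write> {
    if is_tty {
        Box::new(LineWriter::new(sink)) // passes data on at each '\n'
    } else {
        Box::new(BufWriter::new(sink)) // holds data until its buffer fills or flush()
    }
}

/// Write one line, then report how many bytes reached the sink.
fn bytes_through_after_one_line(is_tty: bool) -> io::Result<usize> {
    let sink = Sink::default();
    let mut out = stdio_like(sink.clone(), is_tty);
    out.write_all(b"hello\n")?;
    let through = sink.0.borrow().len();
    Ok(through)
}

fn main() -> io::Result<()> {
    println!("this process's stdout is a terminal: {}", io::stdout().is_terminal());
    println!("terminal rule: {} bytes through", bytes_through_after_one_line(true)?);
    println!("pipe rule:     {} bytes through", bytes_through_after_one_line(false)?);
    Ok(())
}
```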
## The classic surprise

```bash
# Works fine. grep's stdout is a terminal, so it's
# line-buffered and matches appear immediately.
tail -f /var/log/something | grep error

# Seems to lag. grep's stdout is now a pipe, so it's fully
# buffered and matches reach cut in ~4KB chunks.
tail -f /var/log/something | grep error | cut -c1-80
```

The pipes between the stages are instant, and `tail -f` isn't
the culprit either: GNU `tail` flushes its output whenever it
catches up with the file. But `grep` detects that its stdout
is a pipe, switches to full buffering, and holds onto matches
until its internal buffer fills. So `cut` sits there waiting
for a 4KB chunk instead of getting lines one at a time.

Same deal with most filters built on libc's stdio: `awk`,
`sed`, `cut` and friends all get the same isatty-based default.

## The workarounds

### `stdbuf`: override libc's buffering choice

```bash
tail -f /var/log/something | stdbuf -oL grep error | cut -c1-80
```

`-oL` means "force stdout to line-buffered." It works by
LD_PRELOADing a shim library that calls `setvbuf` on the
program's stdout before its `main()` runs, overriding libc's
default. This only works for dynamically-linked programs that
use libc's stdio and don't change their own buffering with
`setvbuf` (most things, but not everything). Some tools also
have a flag for this; `grep` has `--line-buffered`.

### `unbuffer` (from `expect`)

```bash
tail -f /var/log/something | unbuffer -p grep error | cut -c1-80
```

Creates a pseudo-terminal (pty) so the program *thinks*
it's talking to a terminal and uses line buffering. In the
middle of a pipeline you need `-p`, which makes `unbuffer` pass
its own stdin through to the program. Heavier than `stdbuf`,
but it also works on programs that don't use libc's stdio.

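The `stdbuf` route can be checked from code too. The sketch below runs `stdbuf -oL grep error` as a pipeline stage, sends it one matching line, and reads the match back while grep's stdin is still open. That read only returns because grep's stdout is line-buffered; under grep's default full buffering it would wait until grep exited. This assumes a Linux box with GNU coreutils `stdbuf` and `grep` on `PATH`, and an input line that actually matches (a non-matching line would leave `read_line` waiting forever). `first_match_live` is a hypothetical helper.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper: feed `line` to `stdbuf -oL grep error` and read
/// the match back *before* closing grep's stdin. `line` must contain
/// "error", or read_line would wait forever.
fn first_match_live(line: &str) -> io::Result<String> {
    let mut child = Command::new("stdbuf")
        .args(["-oL", "grep", "error"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut to_grep = child.stdin.take().expect("stdin is piped");
    let mut from_grep = BufReader::new(child.stdout.take().expect("stdout is piped"));

    // ChildStdin is unbuffered, so this line reaches grep right away.
    to_grep.write_all(line.as_bytes())?;

    // grep's stdin is still open; the match only shows up here
    // because stdbuf made grep's stdout line-buffered.
    let mut matched = String::new();
    from_grep.read_line(&mut matched)?;

    drop(to_grep); // now grep sees EOF and exits
    child.wait()?;
    Ok(matched)
}

fn main() -> io::Result<()> {
    print!("{}", first_match_live("error: disk full\n")?);
    Ok(())
}
```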
### In your own code: just don't add buffering

In Rust, raw `std::fs::File` writes are unbuffered. Every
`.write()` call goes straight to the kernel via the `write`
syscall:

```rust
use std::io::Write;
use std::time::Duration;

// `write_file` is a std::fs::File wrapping a pipe's write end,
// used inside an async fn that returns io::Result<()>.

// Immediately available on the read end. No flush needed.
write_file.write_all(b"first line\n")?;

// The reader can already see that line. Do whatever.
tokio::time::sleep(Duration::from_secs(1)).await;

// This also lands immediately.
write_file.write_all(b"second line\n")?;
```

If you wrap it in `BufWriter`, you've opted into the same kind
of full buffering libc applies to pipes:

```rust
use std::io::{BufWriter, Write};

let mut writer = BufWriter::new(write_file);
writer.write_all(b"first line\n")?;
// NOT visible yet. Sitting in an 8KB userspace buffer.
writer.flush()?;
// NOW visible on the read end.
```

Rust's `println!` and `io::stdout()` don't copy libc's isatty
check: std's `Stdout` is always line-buffered, so each completed
line is written out even when stdout is a pipe. A partial line
(say, from `print!` with no trailing `\n`) waits in the buffer
until the next newline or an explicit `flush()`. If you need
every write to go out immediately, write to a raw `File` or
flush explicitly.

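Here is the raw `File` versus `BufWriter` difference on a real pipe, in one runnable piece. The read right after the raw write succeeds because the bytes are already in the pipe. After the `BufWriter` write, `buffer()` shows the bytes still sitting in userspace until `flush()`. This is a Unix-only sketch. It declares `pipe(2)` via `unsafe extern "C"` (Rust 1.82+) rather than using the `libc` crate. `make_pipe` and `raw_vs_buffered` are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::os::raw::c_int;
use std::os::unix::io::FromRawFd;

unsafe extern "C" {
    // pipe(2) from the C library that std already links on Unix.
    fn pipe(fds: *mut c_int) -> c_int;
}

/// Open a pipe and wrap both ends as Files: (read end, write end).
fn make_pipe() -> io::Result<(File, File)> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Safety: both fds are fresh and owned by nothing else.
    Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}

/// Returns (what a read saw after the raw write,
///          bytes still held by BufWriter before flush,
///          what a read saw after the flush).
fn raw_vs_buffered() -> io::Result<(String, usize, String)> {
    let (mut read_end, mut write_end) = make_pipe()?;
    let mut buf = [0u8; 64];

    // Raw File: one write(2) and the bytes are in the pipe.
    write_end.write_all(b"raw line\n")?;
    let n = read_end.read(&mut buf)?; // returns at once: data is waiting
    let after_raw = String::from_utf8_lossy(&buf[..n]).into_owned();

    // BufWriter: the bytes stay in userspace until flush().
    let mut writer = BufWriter::new(write_end);
    writer.write_all(b"buffered line\n")?;
    let pending = writer.buffer().len(); // the pipe itself is still empty
    writer.flush()?;
    let n = read_end.read(&mut buf)?;
    let after_flush = String::from_utf8_lossy(&buf[..n]).into_owned();

    Ok((after_raw, pending, after_flush))
}

fn main() -> io::Result<()> {
    println!("{:?}", raw_vs_buffered()?);
    Ok(())
}
```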
## How pikl uses this

In pikl's test helpers, we create a pipe to feed action
scripts to the `--action-fd` flag:

```rust
use std::io::Write;
use std::os::unix::io::FromRawFd;

let mut fds = [0i32; 2];
// pipe(2) returns -1 on failure; don't carry on with bogus fds.
if unsafe { libc::pipe(fds.as_mut_ptr()) } != 0 {
    return Err(std::io::Error::last_os_error().into());
}
let [read_fd, write_fd] = fds;

// Wrap the write end in a File. Raw, unbuffered.
let mut write_file =
    unsafe { std::fs::File::from_raw_fd(write_fd) };

// Write the script. Immediately available on read_fd.
// If nothing is reading yet, this only completes while the
// script fits in the pipe's capacity.
write_file.write_all(script.as_bytes())?;
// Close the write end so the reader gets EOF.
drop(write_file);
```

Then in the child process, we remap the read end to the
expected fd:

```rust
use std::os::unix::process::CommandExt;

// pre_exec runs in the child after fork(), before exec().
// It's an unsafe method: keep the closure to plain syscalls
// like dup2/close (no allocation, no locks).
unsafe {
    cmd.pre_exec(move || {
        if read_fd != target_fd {
            // make target_fd (e.g. fd 3) point to the pipe
            if libc::dup2(read_fd, target_fd) == -1 {
                return Err(std::io::Error::last_os_error());
            }
            // close the original (now redundant)
            libc::close(read_fd);
        }
        Ok(())
    });
}
```

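The same fd-remapping trick works outside pikl. This sketch passes a pipe to a child on fd 3 and uses `sh -c 'cat <&3'` as a stand-in consumer that just echoes what it reads, in the place where pikl would read its `--action-fd`. It assumes a Unix target with `sh`. `pipe`, `dup2` and `close` are declared by hand with `unsafe extern "C"` (Rust 1.82+) instead of coming from the `libc` crate. `feed_on_fd` is a hypothetical helper, not pikl's actual test code.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::raw::c_int;
use std::os::unix::io::FromRawFd;
use std::os::unix::process::CommandExt;
use std::process::Command;

unsafe extern "C" {
    fn pipe(fds: *mut c_int) -> c_int;
    fn dup2(oldfd: c_int, newfd: c_int) -> c_int;
    fn close(fd: c_int) -> c_int;
}

/// Hypothetical helper: hand `script` to a child on `target_fd`.
/// `sh -c 'cat <&N'` stands in for a program that reads that fd.
fn feed_on_fd(script: &str, target_fd: c_int) -> io::Result<String> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    let [read_fd, write_fd] = fds;

    // Fill the pipe and close the write end before spawning, so the
    // child can't inherit a writer and will see EOF. (The script has
    // to fit in the pipe's capacity, since nobody is reading yet.)
    let mut write_file = unsafe { File::from_raw_fd(write_fd) };
    write_file.write_all(script.as_bytes())?;
    drop(write_file);

    let mut cmd = Command::new("sh");
    cmd.arg("-c").arg(format!("cat <&{target_fd}"));
    unsafe {
        cmd.pre_exec(move || {
            // In the child, after fork() and before exec().
            if read_fd != target_fd {
                if dup2(read_fd, target_fd) == -1 {
                    return Err(io::Error::last_os_error());
                }
                close(read_fd);
            }
            Ok(())
        });
    }
    let result = cmd.output();
    unsafe { close(read_fd) }; // the parent's copy is no longer needed
    let output = result?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> io::Result<()> {
    print!("{}", feed_on_fd("line one\nline two\n", 3)?);
    Ok(())
}
```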
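For streaming/async scenarios (like feeding items to pikl
over time), the same approach works. Just don't drop the
write end until you're done. Each `write_all` call pushes bytes
through the pipe immediately, and the reader picks them up as
they arrive. No flush needed, because `File` doesn't buffer.

One catch when the write end stays open while the child is
spawned: fds from `libc::pipe` aren't close-on-exec, so the
child inherits a copy of the write end too, and it won't see
EOF until that copy is closed as well. Closing `write_fd` in
`pre_exec` takes care of it.
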
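Here is a streaming version, with a reader thread standing in for pikl. The writer sends one line at a time over a raw pipe and waits for the reader to acknowledge each line before sending the next. If `File` buffered anything, that lockstep would deadlock. It doesn't, because each `write_all` here is a single `write(2)` straight into the pipe. This is Unix-only. `pipe(2)` is declared via `unsafe extern "C"` (Rust 1.82+), and `stream_lines` is a hypothetical helper.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};
use std::os::raw::c_int;
use std::os::unix::io::FromRawFd;
use std::sync::mpsc;
use std::thread;

unsafe extern "C" {
    // pipe(2) from the C library that std already links on Unix.
    fn pipe(fds: *mut c_int) -> c_int;
}

/// Open a pipe and wrap both ends as Files: (read end, write end).
fn make_pipe() -> io::Result<(File, File)> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Safety: both fds are fresh and owned by nothing else.
    Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}

/// Send `items` one line at a time, keeping the write end open
/// between writes, and return what the reader received.
fn stream_lines(items: &[&str]) -> io::Result<Vec<String>> {
    let (read_end, mut write_end) = make_pipe()?;
    let (ack_tx, ack_rx) = mpsc::channel::<()>();

    let reader = thread::spawn(move || -> io::Result<Vec<String>> {
        let mut received = Vec::new();
        // lines() ends at EOF, i.e. once every write end is closed.
        for line in BufReader::new(read_end).lines() {
            received.push(line?);
            let _ = ack_tx.send(()); // "got it"
        }
        Ok(received)
    });

    for item in items {
        // One write(2) per line; no flush anywhere.
        write_end.write_all(format!("{item}\n").as_bytes())?;
        // Wait until the reader has actually seen this line.
        ack_rx.recv().expect("reader stopped early");
    }
    drop(write_end); // close the write end: the reader hits EOF
    reader.join().expect("reader thread panicked")
}

fn main() -> io::Result<()> {
    println!("{:?}", stream_lines(&["alpha", "beta", "gamma"])?);
    Ok(())
}
```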
## tl;dr

- Pipes add no delay. They're a bounded kernel FIFO: bytes are
  readable the moment they're written.
- The "buffering" you see is libc's `FILE*` layer choosing
  full buffering when stdout isn't a terminal.
- `stdbuf -oL` or `unbuffer` fixes other people's programs.
- In your own code, use a raw `File` (not `BufWriter`) and
  every write lands immediately.
- It was always libc. Bloody libc.