This is a collection of tricks for handling the offspring your processes fork and related topics.
When using processes instead of threads, you will sooner or later have to handle signals using Kernel.trap. To be able to integrate this nicely with the rest of your code, which probably blocks in Cod.select as much as possible, you can use the self-pipe trick.
Rather than cover what a self-pipe is again, I refer you to the extensive documentation online, because this is an old trick!
Using cod, it boils down to this:
self_pipe = Cod.pipe.split
# Register a handler for USR1
trap(:USR1) { self_pipe.write.put :USR1 }
Process.kill(:USR1, Process.pid)
# Do something without worrying about signals
# Here's the advantage of self-pipe: You can
# decide when to listen for signals. Otherwise
# trap is very preemptive.
self_pipe.read.get # => :USR1
Did you notice that a split pipe returns an array that also answers to #read and #write? This is useful when you cannot come up with a name for both ends, as in the above example.
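For comparison, here is a rough sketch of the same trick without Cod, using only IO.pipe and IO.select from Ruby core. The variable names and the single byte "U" written to the pipe are my own choices for illustration, not part of any library API.

```ruby
# Plain-Ruby self-pipe: the handler only records that a signal arrived,
# the main flow decides when to look at it.
reader, writer = IO.pipe

trap(:USR1) do
  # Never block inside a handler. If the pipe is full, a wakeup is
  # already pending and this write can safely be skipped.
  writer.write_nonblock("U", exception: false)
end

Process.kill(:USR1, Process.pid)

# Do something without worrying about signals, then wait for the
# pipe to become readable (give up after 5 seconds).
ready, = IO.select([reader], nil, nil, 5)
received = ready ? reader.read_nonblock(1) : nil
```

Since the read end is an ordinary IO, it can sit in the same select call as the sockets and pipes you are already waiting on.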
All is not butter and honey in the land of forks. As is the case with every style of programming, there are a number of things to be aware of. This section of the tutorial strives to give you a heads-up on most of them. Please tell me if something is missing.
Ruby buffers output sent to IO streams for a while. So does the operating system. When you fork a new process, the OS buffers stay in the kernel and are shared by both processes, while the Ruby buffers get duplicated into the new process (along with the open file descriptor).
Ruby buffers get flushed at other moments, like when you write more output to the IO stream or when a child process exits. And the drama unfolds: Your child processes will write the unflushed Ruby buffers to the open IO stream upon exit. Their master process will write the same buffer to the same stream on its next write. And you’ll end up with duplicated file contents.
To prevent this from happening: Either flush the streams before you fork
stream.flush
or have them sync to OS buffers immediately on write
stream.sync = true
or even better – don’t let the child inherit the stream in the first place. This can only be achieved by opening the stream after the fork.
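Here is a small sketch of the first remedy, assuming a Unix-like system where fork is available. The helper name lines_after_fork is made up for this example; it writes one line to a temporary file, forks a child that exits right away, and returns what ended up in the file.

```ruby
require "tempfile"

# Hypothetical helper, for illustration only.
def lines_after_fork(flush_first)
  Tempfile.create("fork-buffer") do |file|
    file.write("hello\n")       # stays in Ruby's own buffer for now
    file.flush if flush_first   # empty that buffer before forking
    pid = fork { }              # child inherits the stream and exits
    Process.wait(pid)
    file.flush                  # parent writes out whatever it still holds
    File.read(file.path).lines
  end
end

flushed = lines_after_fork(true)
```

With the flush in place the file contains the line exactly once. Calling lines_after_fork(false) instead may show the duplicated line described above, depending on your Ruby version.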
While the previous gotcha was more of a Ruby bug, this is a Unix problem: When you use signals in your Ruby program, you might mess up C extensions you use. Without signal handling, someone might write the following C code:
size = recv(socket, &buffer, 1024, 0);
assert(size > 0);
// Do something with the data in buffer
But once you register a trap for a signal (man 2 sigaction), signals might have to be delivered to your process during the blocking call to recv. What your OS will do in this case is really simple, and it never happens until you register a sigaction: It returns from the recv call with a return value of -1 (and an errno of EINTR).
You could not care less. This marks the precise spot where that library becomes useless to you, since you’d have to read through all the code and fix every instance where calls are made to the operating system with the wrong set of assumptions. Now you know at least where that assertion failure (or EINTR error) all of a sudden comes from.
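The usual fix for code that makes this mistake is to retry the call when it was merely interrupted. The affected code in this section is C, but the pattern can be sketched in Ruby, where an interrupted system call surfaces as Errno::EINTR. The helper retry_on_eintr and the simulated call below are made up for illustration; the Ruby interpreter already restarts most interrupted calls for you, so treat this as a picture of the idea rather than something you need to sprinkle over Ruby code.

```ruby
# Hypothetical helper: run the block again whenever it was interrupted
# by a signal, giving up after max_attempts tries.
def retry_on_eintr(max_attempts = 5)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue Errno::EINTR
    retry if attempts < max_attempts
    raise
  end
end

# Simulated system call that is interrupted twice before it succeeds.
calls = 0
result = retry_on_eintr do
  calls += 1
  raise Errno::EINTR if calls < 3
  :data
end
```

A C extension that wraps its recv call in an equivalent loop survives signal delivery; one that asserts on the return value does not.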
Your average Ruby is not copy-on-write friendly. This means that even though right after a fork, you don’t use double the memory you used in that parent process, some time afterwards you will, since the garbage collector Ruby uses will touch all that memory, creating a child copy of it.
If you create short-lived processes, this is not a problem. And if you create long-lived server processes – just thought I’d give you the heads-up. Try to google for Ruby and ‘Copy on Write’ – you’ll find prior art.
Some excellent books have been written about Unix. I personally did enjoy reading ‘Advanced Programming in the UNIX environment’ (Stevens, Rago). But hey, who am I to tell you that you should read a book.