The first step in video encoding is to get the image. This is the job of the
capture module, which initializes the capture device and asks it to capture frames.
When the encoder needs a frame, the capture module performs optional (and slow!) color
conversions and hands the frame to the encoder. Note that format conversions are really
time consuming and should be avoided whenever possible, as they sometimes take more time
than the encoding itself.
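To see why a color conversion is so costly, here is a minimal sketch of an integer RGB to YCbCr conversion using the ITU-R BT.601 coefficients. fame's actual conversion routines may differ; this only illustrates the per-pixel arithmetic that has to run over every pixel of every frame.

```c
#include <stdint.h>

/* Sketch: one possible integer RGB -> YCbCr conversion
   (ITU-R BT.601, 8-bit fixed point).  Three multiplies and
   two adds per component, for every pixel of every frame. */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = (uint8_t)(( 77 * r + 150 * g +  29 * b) / 256);
    *cb = (uint8_t)(128 + (-43 * r -  85 * g + 128 * b) / 256);
    *cr = (uint8_t)(128 + (128 * r - 107 * g -  21 * b) / 256);
}
```

Capturing directly in a YCbCr format such as YUV420, when the device supports it, skips this work entirely.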
b) Encoding
To produce an MPEG video stream, the first thing you need to know is how to
encode intra frames. The encoding process is the same as in JPEG and is based
on the fact that our eyes are much more sensitive to low spatial frequencies
than to high ones. This means you won't really notice if a frame is pixelized in
some areas, even if you'll feel that the picture is of lesser quality.
So the first step is to cut the picture into 8x8 pixel blocks and apply a transform which makes it easier to deal with the spatial frequencies. This is called the DCT (Discrete Cosine Transform), and it is also the most time-consuming step of the encoding process. In fame this is done in MMX assembly using a straightforward matrix product, which is quite fast compared to other algorithms because it takes advantage of an MMX instruction capable of doing four multiplies and two additions in only a few cycles. However, if someone manages to implement something better, that would be great :-)
The second step is to discard the highest frequencies, which is done during the quantization step. Each coefficient of the 8x8 DCT block is multiplied by a quantization coefficient. This too is done in assembly.
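A sketch of what quantization amounts to: each coefficient is divided by the matching entry of a quantization matrix (equivalently, multiplied by its reciprocal, as the text puts it) and rounded. Larger divisors in the high-frequency corner zero out the frequencies the eye barely sees. The matrix values used in the test are illustrative, not fame's actual tables.

```c
/* Sketch: scalar quantization of one DCT block.  quant[][]
   holds per-frequency divisors; high-frequency entries are
   typically larger, so those coefficients round to zero. */
static void quantize_8x8(const double dct[8][8], const int quant[8][8],
                         int out[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            out[i][j] = (int)(dct[i][j] / quant[i][j]
                              + (dct[i][j] >= 0 ? 0.5 : -0.5));
}
```

The long runs of zeros this creates are what make the next step, run-length coding, effective.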
Then the resulting 8x8 block is read in zigzag order to obtain a sequence of coefficients, which is then encoded using a mix of run-length coding and fixed-table Huffman coding. This is once again done in assembly for speed reasons.
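The zigzag readout and the run-length pass can be sketched as follows. The scan table is the standard MPEG/JPEG zigzag order; the Huffman (VLC) stage that fame applies afterwards is omitted here.

```c
/* Standard zigzag scan order: walks the 8x8 block from the DC
   corner toward the high-frequency corner. */
static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Emits (run_of_zeros, nonzero_level) pairs; returns the pair
   count.  After quantization most of the tail is zero, so very
   few pairs remain to be Huffman-coded. */
static int run_length(const int block[64], int runs[64], int levels[64])
{
    int n = 0, run = 0;
    for (int i = 0; i < 64; i++) {
        int v = block[zigzag[i]];
        if (v == 0) {
            run++;
        } else {
            runs[n] = run;
            levels[n] = v;
            n++;
            run = 0;
        }
    }
    return n;
}
```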
Finally the video bitstream is packed to produce an MPEG stream, which is either
written to a file or handed to the network module.
c) Networking
Once the stream is made and packed, it can be sent over the network. The MPEG-2 specifications define protocols for sending MPEG over a network, but currently fame doesn't use them. Since fame is only capable of encoding intra frames, the protocol used is quite simple. It is based on UDP. A sequence header, which contains the video size and frame rate, is sent every 100 frames so that a decoder can synchronize on the stream and understand the data sent to it. This means that the decoder must be able to discard any data it receives before the first sequence header (SMPEG, for example, does). Each encoded frame is then sent as a single network packet, so if a packet is lost the decoder won't lose synchronization. This is a really simple network protocol, and it is planned to change in the future to use the capabilities of the MPEG system layer.
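The packetization rule described above can be sketched as follows. The period of 100 and the header/frame roles come from the text; the function name is hypothetical, and the socket side (the actual sendto() call) is left out.

```c
#include <string.h>
#include <stddef.h>

/* Sketch: every 100th frame is preceded by a sequence header so a
   late-joining decoder can lock onto the stream, and each encoded
   frame travels in its own UDP datagram so one lost packet costs
   exactly one frame.  Returns the number of bytes to hand to
   sendto(). */
static size_t build_packet(unsigned frame_no,
                           const unsigned char *seq_header, size_t header_len,
                           const unsigned char *frame, size_t frame_len,
                           unsigned char *packet)
{
    size_t off = 0;
    if (frame_no % 100 == 0) {           /* resync point */
        memcpy(packet, seq_header, header_len);
        off = header_len;
    }
    memcpy(packet + off, frame, frame_len);
    return off + frame_len;
}
```

Because every packet starts at a frame boundary, the decoder never has to reassemble partial frames, which is what keeps loss handling trivial.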