Add in video and you increase the bandwidth requirements exponentially, though.
More, yes. Exponentially, not necessarily.
We have an exact size that the video could be (160x120, since it's a hud animation).
We can do it in a specific 8-bit palette rather than RGB color.
We can limit its frame rate to something like 10 FPS for fast (T1/LAN) connections, and lower for slower connections.
We can encode it with the voice data and then only transmit video with voice rather than having a full-time stream.
We can restrict video to only those clients who have connection speeds fast enough.
We can further reduce size by using scanlines: send only every other row and/or column and double them on display, which halves the vertical and/or horizontal resolution that has to be transmitted.
We can compress the video data (future CVP support means that we get compression tech built in).
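To make the size reduction concrete, here is a rough sketch of the frame preparation: fixed 160x120 input, RGB squeezed to one byte per pixel, every other scanline dropped. The 3-3-2 bit packing is only a stand-in for whatever 8-bit palette actually gets chosen, and `rgb_to_332` and `pack_frame` are made-up names, not anything from the existing code.

```python
WIDTH, HEIGHT = 160, 120  # fixed HUD animation size

def rgb_to_332(r, g, b):
    """Pack 8-bit-per-channel RGB into one byte: 3 bits red, 3 green, 2 blue.
    Assumption: stands in for the real palette lookup."""
    return (r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6)

def pack_frame(rgb_rows, keep_every=2):
    """rgb_rows is a list of rows, each a list of (r, g, b) tuples.
    Keeps one row out of every `keep_every` and returns raw bytes."""
    out = bytearray()
    for y in range(0, len(rgb_rows), keep_every):
        out.extend(rgb_to_332(r, g, b) for (r, g, b) in rgb_rows[y])
    return bytes(out)

# Synthetic test frame, just to check the resulting size.
frame = [[(x % 256, (y * 2) % 256, 128) for x in range(WIDTH)]
         for y in range(HEIGHT)]
payload = pack_frame(frame)
print(len(payload))  # 160 * 60 = 9600 bytes
```

The receiving side would draw each row twice to get back to 120 lines.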
And all of this can be done on the client side, by the client that sends the video in the first place.
RGB to 8-bit conversion, plus the fixed size, plus vertical scanlines, would mean that a video frame (without compression of any sort) would be 160x60 bytes in size (9.4K). Using compression we could probably cut that by another 80-90%, down to the 1.5K range. At 10 FPS that comes out to 15K additionally for every second of voice data; 5 FPS would give 7.5K additionally, 3 FPS would give 4.5K, etc. If we ditch the current voice compression tech and go with Speex (the eventual plan), then voice data size drops and could possibly make up the difference.
Not great, but not all that bad either.
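For anyone who wants to check the numbers, the arithmetic can be sketched like this. The 1.5K per-frame figure is the rough estimate from above, not a measured result, and `video_k_per_second` is just an illustrative helper name.

```python
RAW_FRAME_BYTES = 160 * 60   # one byte per pixel, every other scanline dropped
COMPRESSED_FRAME_K = 1.5     # assumed size after 80-90% compression

def video_k_per_second(fps, frame_k=COMPRESSED_FRAME_K):
    """Extra video data (in K) sent alongside each second of voice."""
    return fps * frame_k

print(f"raw frame: {RAW_FRAME_BYTES} bytes ({RAW_FRAME_BYTES / 1024:.1f}K)")
for fps in (10, 5, 3):
    print(f"{fps} FPS: {video_k_per_second(fps):.1f}K per second")
```

Swapping in a different per-frame estimate or frame rate shows quickly where a given connection speed stops being practical.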
