Hard Light Productions Forums
Off-Topic Discussion => Programming => Topic started by: Mika on October 10, 2008, 06:36:05 pm
-
Hi,
Just thought that someone here might know: how would you compute an equation like
S = u B V B^T v^T as fast as possible? The ^T in the equation denotes the transpose of the preceding matrix or vector. It is known that all elements of the matrices and vectors are real numbers.
Here, u is a 1x4 vector with elements [u^3 u^2 u 1].
v is a similar 1x4 vector with elements [w^3 w^2 w 1].
B is a matrix with coefficients:
[ -1  3 -3  1 ]
[  3 -6  3  0 ]
[ -3  0  3  0 ]
[  1  4  1  0 ]
V is a 4x4 matrix containing real numbers whose elements are denoted as:
[ V11 V12 V13 V14 ]
[ V21 V22 V23 V24 ]
[ V31 V32 V33 V34 ]
[ V41 V42 V43 V44 ]
So far, I have managed to reduce the thing to 26 summations and 34 multiplications. Does anyone have ideas on whether it can be simplified further?
Mika
EDIT: Correction - 30 summations.
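For reference, a completely direct, loop-based evaluation would look something like this in C (only a sketch; the function name and array layout are my own):

double S_reference(double u, double w, const double V[4][4])
{
    static const double B[4][4] = {
        { -1.0,  3.0, -3.0, 1.0 },
        {  3.0, -6.0,  3.0, 0.0 },
        { -3.0,  0.0,  3.0, 0.0 },
        {  1.0,  4.0,  1.0, 0.0 }
    };
    const double uvec[4] = { u*u*u, u*u, u, 1.0 };
    const double wvec[4] = { w*w*w, w*w, w, 1.0 };
    double s = 0.0;
    /* S = sum over i,j,k,l of u_i * B_ij * V_jk * B_lk * w_l */
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                for (int l = 0; l < 4; ++l)
                    s += uvec[i] * B[i][j] * V[j][k] * B[l][k] * wvec[l];
    return s;
}

That is over a thousand multiplications, which shows how much there is to gain.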
-
A general rule with these things is to avoid full matrix products, so you should compute it as (uB)V(B^T v^T), where the parts in parentheses are evaluated first. That is about as fast as it's going to get. If V is symmetric and positive definite, a Cholesky decomposition of it might simplify things further, but that would probably be more trouble than it's worth for such a small matrix.
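As a sketch of that grouping in C (names and layout mine):

double S_grouped(double u, double w, const double V[4][4])
{
    static const double B[4][4] = {
        { -1.0,  3.0, -3.0, 1.0 },
        {  3.0, -6.0,  3.0, 0.0 },
        { -3.0,  0.0,  3.0, 0.0 },
        {  1.0,  4.0,  1.0, 0.0 }
    };
    const double uvec[4] = { u*u*u, u*u, u, 1.0 };
    const double wvec[4] = { w*w*w, w*w, w, 1.0 };
    double a[4], b[4], s = 0.0;

    /* a = uB and b = vB; note that B^T v^T = (vB)^T */
    for (int j = 0; j < 4; ++j) {
        a[j] = uvec[0]*B[0][j] + uvec[1]*B[1][j] + uvec[2]*B[2][j] + uvec[3]*B[3][j];
        b[j] = wvec[0]*B[0][j] + wvec[1]*B[1][j] + wvec[2]*B[2][j] + wvec[3]*B[3][j];
    }
    /* S = a . (V b) */
    for (int i = 0; i < 4; ++i)
        s += a[i] * (V[i][0]*b[0] + V[i][1]*b[1] + V[i][2]*b[2] + V[i][3]*b[3]);
    return s;
}

Roughly 16 + 16 multiplications for a and b, then 16 + 4 for the final bilinear form; expanding a and b by hand against the known coefficients of B (including its zeros) is what cuts it down further.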
-
As a general rule, yes, that is the fastest way to do it. However, one of the rows of B is now almost entirely zeros, and when the products are expanded out, this cancels some elements.
Perhaps I should phrase the question differently: what is the minimum possible number of computations for exactly this special case?
Mika
-
Ach, got it down to 31 multiplications and 29 additions. However, an interesting question came up along the way.
Suppose I write:
up2=u*u
t_up2=3*up2
Now, if there is a part of the code that uses -t_up2, I suppose this should be a lot faster than writing (-1)*t_up2? At least to my understanding, the processor should have a built-in operation for flipping the sign bit to get the negation, which would be much faster than a crude multiplication.
Mika
-
I expect any decent compiler would optimise that for you.
-
I'm not too up on more modern machine code, but what used to happen here was that you'd drop out of the compiler into assembly language and set the negative bit in the flag section of the variable. Say, for example, it was bit 3 (from the right) of the flag byte that represented whether the number was positive or negative: you'd simply load up the flag byte and XOR it (IIRC; might be OR) with 4.
However, that may be utterly useless information these days :(
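For what it's worth, the IEEE-754 equivalent today is flipping bit 63 of a double, which is a single XOR. A sketch in C:

#include <stdint.h>
#include <string.h>

/* Negate a double by flipping its sign bit (bit 63 in IEEE-754). */
double negate(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* portable type punning */
    bits ^= 0x8000000000000000ULL;    /* flip the sign bit */
    memcpy(&x, &bits, sizeof bits);
    return x;
}

There is no need to do this by hand, though: compilers typically turn -x into exactly this kind of XOR against a sign mask (or a dedicated negate instruction), never a multiplication.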
-
As a general rule, yes, that is the fastest way to do it. However, one of the rows of B is now almost entirely zeros, and when the products are expanded out, this cancels some elements.
You could be right in this case, but my guess is the vector products would still be better, unless you know something more about V. B would have to be triangular, Toeplitz, or something like that for taking the matrix products first to be worthwhile. The zeros let you skip some operations, but the final result won't necessarily have any zeros unless there was a full row or column of them.
Of course, it also depends on what exactly you're doing with it. It looks like you want to compute this expression many times in succession. If you only need a couple of different V but far more u and v, then it may well make sense to do the matrix product first, as sketched below.
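As a sketch (names mine), with V held fixed you could fold B into it once and pay only a small bilinear form per evaluation:

static const double B[4][4] = {
    { -1.0,  3.0, -3.0, 1.0 },
    {  3.0, -6.0,  3.0, 0.0 },
    { -3.0,  0.0,  3.0, 0.0 },
    {  1.0,  4.0,  1.0, 0.0 }
};

/* Called once per V: M = B V B^T. */
void precompute_M(const double V[4][4], double M[4][4])
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            double s = 0.0;
            for (int k = 0; k < 4; ++k)
                for (int l = 0; l < 4; ++l)
                    s += B[i][k] * V[k][l] * B[j][l];
            M[i][j] = s;
        }
}

/* Called per (u, w): S = u M v^T. */
double S_fixed_V(double u, double w, const double M[4][4])
{
    const double uvec[4] = { u*u*u, u*u, u, 1.0 };
    const double wvec[4] = { w*w*w, w*w, w, 1.0 };
    double s = 0.0;
    for (int i = 0; i < 4; ++i) {
        double row = 0.0;
        for (int j = 0; j < 4; ++j)
            row += M[i][j] * wvec[j];   /* 16 multiplications */
        s += uvec[i] * row;             /* plus 4 more */
    }
    return s;
}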
-
Well, that expression is computed several hundred thousand times per second, with u, w, and V all varying. But note that I won't be calculating the product with any for loops; it's all direct substitution.
In the meantime, I've got a couple more questions about the inner workings of the processor.
1) How long does it actually take to allocate memory for a variable of type double? This is related to wondering whether it is worthwhile to precompute a simple expression like t_up2 = 3*u*u beforehand if the result only gets used twice. If the memory allocation takes those precious CPU cycles, it might be a better idea to simply compute the product twice on the fly rather than substitute the precomputed value.
2) Consider the expression n*x, where x is a real number and n is an integer. Given that addition is usually faster than multiplication, how many times can one add the same variable to itself before the repeated addition becomes more expensive than computing the product directly? (See the sketch below for n = 3.)
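For concreteness, the two alternatives for n = 3 would be something like:

double times3_mul(double x) { return 3.0 * x; }   /* one multiplication */
double times3_add(double x) { return x + x + x; } /* two additions      */

Which one wins depends on the latency and throughput of the CPU's add and multiply units.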
I know I could probably get the answers by looking at processor instruction timing tables; the problem is that those numbers are theoretical and do not account for the operating system running things in the background. And when it comes to operating systems, that is where my computer literacy actually ends.
Mika
-
1) How long does it actually take to allocate memory for a variable of type double?
I guess these variables will be local to a function? At least my understanding has been that addressing stack variables is pretty fast, definitely faster than doing multiplications on the CPU.
But I guess you need to consider caching issues and all those messy things too, if you want the ultimate performance... tough.
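In other words, for a local like t_up2 the "allocation" is just a slot in the function's stack frame (or a register), reserved once when the frame is set up, so it costs essentially nothing per variable. A sketch:

double f(double u, double a, double b)
{
    /* No run-time allocation happens here: t_up2 is a stack slot or a
       register, reserved when the function's frame is established. */
    double t_up2 = 3.0 * u * u;
    return t_up2 * a + t_up2 * b;
}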
-
For those interested, the answers to most of my questions can be found on the Assembly community forums (www.asmcommunity.net). It seems that those guys know processors like the backs of their hands. If more questions arise, I'll ask there directly.
Mika