### Author Topic: Anyone good with numerics?  (Read 2519 times)

0 Members and 1 Guest are viewing this topic.

#### Mika

• 28
##### Anyone good with numerics?
Hi,

Just thought that someone here could know, how would you compute equation like:
S = uBVBTvT as fast as possible? Ts in the equation denote transpose of the matrix or the vector. It is known that all elements of the matrices and vectors contain real numbers.

Here, u is a 1x4 vector with elements [u^3 u^2 u 1].
v is a similar 1x4 vector with elements [w^3 w^2 w 1].
B is a matrix with coefficients:

[  -1   3   -3   1]
[   3  -6    3    0]
[  -3   0    3    0]
[   1   4    1    0]

V is a 4x4 matrix containing real numbers whose elements are denoted as:

[ V11   V12   V13   V14 ]
[ V21   V22   V23   V24 ]
[ V31   V32   V33   V34 ]
[ V41   V42   V43   V44 ]

So far, I have managed to reduce the thing in to 26 summations and 34 multiplications. Anyone having ideas if it can be simplified further?

Mika

EDIT: Correction - 30 summations.
« Last Edit: October 10, 2008, 06:52:10 pm by Mika »
Relaxed movement is always more effective than forced movement.

#### CP5670

• Dr. Evil
• Global Moderator
• 212
• 142857
##### Re: Anyone good with numerics?
A general rule with these things is to avoid matrix multiplication, so you should do it like (uB)V(BTvT), where the things in parenthesis are done first. That is about as fast as it's going to get. If V is symmetric and positive definite, a Cholesky decomposition of it may simplify it further, but that would probably be more trouble than it's worth for such a small matrix.

#### Mika

• 28
##### Re: Anyone good with numerics?
As a general rule, yes, that is the fastest way to do it. However, now there is almost one row full of zeros in B, and when the products are opened, this cancels some elements.

Should I phrase the question differently: what is the minimum possible amount of computations for exactly this special case?

Mika
Relaxed movement is always more effective than forced movement.

#### Mika

• 28
##### Re: Anyone good with numerics?
Ach, got it down to 31 multiplications and 29 additions. However, there seems to be an interesting question on the line.

Suppose I write:
up2=u*u
t_up2=3*up2

Now if the there is a part of code, where it is stated -t_up2, I suppose this should be a lot more faster than writing (-1)*t_up2? At least according to my understanding, processor should have a built-in operation for swapping those zeros and ones to get the negation, which would be a lot more faster than the crude multiplication.

Mika
Relaxed movement is always more effective than forced movement.

#### Spicious

• Master Chief John-158
• 210
##### Re: Anyone good with numerics?
I expect any decent compiler would optimise that for you.

#### Flipside

• əp!sd!l£
• 212
##### Re: Anyone good with numerics?
I'm not too up on more modern machine code, but what used to happen here was you'd drop out of the compiler and into assembly language and set the negative bit on the flag section of the variable, so, say for example it was bit 3 (from the right) on the flag byte that represented whether the number was positive or negative, you'd simply load up the flag byte and XOR it (iirc- might be OR) with 4.

However, that may be utterly useless information these days

#### CP5670

• Dr. Evil
• Global Moderator
• 212
• 142857
##### Re: Anyone good with numerics?
As a general rule, yes, that is the fastest way to do it. However, now there is almost one row full of zeros in B, and when the products are opened, this cancels some elements.

You could be right in this case but my guess is the vector products would still be better, unless you know something more about V. B would have to be triangular, Toeplitz or something like that to be worth taking the matrix products first. The zeros let you skip some operations but the final result won't necessarily have any zeros unless there was a full row or column of them.

Of course, it also depends on what exactly you're doing with it. It looks like you want to compute this expression many times in succession. If you need to do it for a couple of different V but a lot more u and v, then it may well make sense to do the matrix product first.

#### Mika

• 28
##### Re: Anyone good with numerics?
Well, that expression is computed a several hundreds of thousands of times / second, with both u, w and V varying. But note that I won't be calculating that product using any for loops, but direct substitution.

In the meantime,  got a couple of more question related to the internal functions of the processor.

1) How long does it actually take to allocate memory for a variable type double? This is related to pondering if it is worthwile to pre-compute a simple expression like t_tup2=3*u*u beforehand, if the result only gets used twice. If memory allocation takes those precious CPU cycles, it might be a better idea to simply compute the product twice on the fly rather than substitute the precomputed value.

2) Consider expression n*x, where x is a real number and n is a integer number. Given that summation is usually faster than product, how many times one can sum the same variable until summation becomes more expensive than computing the product directly?

I know I could probably get the answers looking at the processor instruction timing tables, the problem is that those numbers are theoretical, and do not account the operating system running things in the background. And when it comes to operating systems, that is where my computer literacy actually ends.

Mika
Relaxed movement is always more effective than forced movement.

#### castor

• 29
##### Re: Anyone good with numerics?
1) How long does it actually take to allocate memory for a variable type double?
I guess these variables will be local for a function? At least my understanding has been that addressing stack variables is pretty fast, definitely faster than doing multiplications on CPU.
But I guess you need to consider caching issues and all those messy things too, if you want the ultimate performance... tough.

#### Mika

• 28
##### Re: Anyone good with numerics?
For those interested, the answers for most of my questions could be found from the Assembly community forums (www.asmcommunity.net). It seems that those guys know the processors like the back of their hands. If more questions arise, I'll be asking directly from there.

Mika
Relaxed movement is always more effective than forced movement.