Hard Light Productions Forums
Off-Topic Discussion => Programming => Topic started by: Mika on February 13, 2009, 04:18:24 pm
-
I recently read some articles whose message was: programs written for a single core don't apply anymore. The whole programming industry needs a new way of thinking if the performance of multicore processors is ever to be delivered. That got me thinking: is it likely that there are actually no methods for fully developing programs to utilize multiple processors? Some serious game developers talk about more Ph.D.s being required in this area especially, to fully utilize multicore stuff.
For my own part, I know that ray tracing has been multi-core for a long time, and it is the kind of process that should be easily parallelized; the performance increases with multiple cores are verified, though the performance increase curve is logarithmic because the processors need to communicate at least some amount of data. But what about other programs? We have had Masters of Science and Doctors of Philosophy writing scientific and engineering software for years, but what about "normal" programs like games and such?
Can I expect Windows 7 to handle multiple processors any faster than Windows XP? Would this, along with a significant increase in memory capacity, be a reason to buy the next-generation killer computer that should withstand the test of time for the next 5 or 10 years?
I know there are quite a few skilled programmers in here: what are your thoughts on this stuff?
Mika (approaching 1 promille levels fast for a Friday night!)
-
Windows 7 handles multiple cores differently somehow, since FL Studio 8 only uses one core in Vista but uses both of my cores in Windows 7. It appears to have some sort of hyperthreading going on.
However, if an application doesn't need multiple threads, it doesn't need multiple threads. A text editor, for example.
-
Well, it's more or less true. The idea that processor speeds double every 24 months or so is no longer valid. We're entering "the great computing depression (http://www.lysator.liu.se/upplysning/fa/The_Great_Depression.pdf)". It's already possible to put more transistors on a chip than we can power on; error rates are high and climbing as die sizes get smaller; it takes around 200 cycles to fetch from RAM; and there's a limit to how much we can gain from instruction-level parallelism. It's not gonna cut it anymore.
So what happens now? CRAM IN MORE CORES! Enter parallel computing (http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html). Single-threaded applications scale at the rate of single-thread CPUs, and that increase is diminishing. Whether we like it or not, this is the future.
-
http://channel9.msdn.com/shows/Going+Deep/Mark-Russinovich-Inside-Windows-7/
This is an interview with one of Windows' kernel developers.
Maybe it will help you answer your question about how Windows 7 scales with more processors compared to Vista and XP.
-
I thought I'd throw in some work info.
I've been working on an application at work that has always been single-threaded (it's written in Salford FTN95 and is very definitely (please!) going to that starry place in the sky sometime soon, along with its nightmare of a code base).
It caused a few folks at work some surprises when they left behind their single-core 3.6 GHz Xeons for a dual quad-core (8 cores!) 2.4 GHz machine. The application was almost literally 2x slower.
We're currently going through the motions of redesigning all the data structures and algorithms so that it can be made multi-threaded/parallelizable.
It is not an easy task!
I do agree with kode: parallel computing will be it in 5 years. But I reckon you'll still be able to buy very fast single/dual-core processors, because there will be a market both for applications that scale well to many, many cores and for those that don't scale and will still use 1 or 2 cores. (My $0.02.)
-
How do you see programming practice changing? I've realised parallelization is far from well understood. They say that high-level languages are needed to fully utilize parallelization, otherwise a programmer is likely to stay stuck in a single-core programming mode. For example, in all the numerical computing material I have read and worked through, not a single case or study touched on parallelization, even though it is pretty important in numerics.
Mika
-
The only reason you need high level languages to support parallelization is because most programmers are stupid.
-
In terms of a change in practice, it really does mean thinking way in advance of writing any code. Coding cowboys will run into a lot of trouble trying to write threaded code, while those who write detailed documents of exactly what they're trying to do will spot errors (such as not locking shared variables before use) and algorithmic inefficiencies (such as contention for locked variables) far in advance and be able to either rewrite or mitigate their effect before writing code.
-
When I was writing my engine, all I knew in advance was that I'd need to lock the physics thread somehow, and it worked out fine. Personally I think it depends on the project and the kind of multithreading involved, because not every project requires a 500-page document detailing how to structure a version number, how many times you're allowed to pick your nose, and how you're not allowed to exceed 4294967295 characters in a string.
But then again, maybe I'm a multi-threaded coding cowboy :P
-
I didn't quite mean it like that!
I s'pose a better way of putting it would have been: have a clue what you're doing before you start, and know the scope of what you're trying to do.
If you're not the one who is going to be maintaining the code, you do need to have good documentation since the person reading it won't know your code like you do, and figuring out race and deadlock conditions from code is notoriously tricky.
-
But if no one else is going to be touching the code, you needn't worry about such things :p
-
I dissent. You in six months may not necessarily have a clue what present-day you was thinking. Even rudimentary documentation is vital for any non-throwaway code.
-
That's what the occasional comment is for.
-
Then another question: how often do you think in terms of processor cycles while writing multithreaded code? Or have the compilers made that sort of thinking obsolete? I mean, if all processors are running parallelizable threads and one finishes before the others, how well does the required number of cycles for each operation hold up while the operating system is also running?
I'm really starting to consider whether I should write some parts of the code (the numerics, actually) in assembly. Though it seems a little distant, since the last processor I did that on was the 80186. I know that in general compilers should be able to optimize code more efficiently than I can, but a recent look with a disassembler at the code my compiler of choice puts out would suggest that in its case it actually isn't so.
Mika
-
While I wouldn't suggest that that kind of thinking is obsolete, I'd suggest that it is not worth writing anything in assembly anymore.
Readability and maintainability need to outweigh performance now.
The difference between a good optimising compiler (with well written code) and hand tuned assembly shouldn't be huge (and in fact, sometimes the compiler may pick up on an optimisation you miss!).
I mean, if all processors are running parallelizable threads and one finishes before the others, how well does the required number of cycles for each operation hold up while the operating system is also running?
Depends on what you want it to do!
-
I'm not sure about that. The guys at ASM Community say otherwise. Of course, there's no point in writing the whole thing in assembler, only the most time-consuming inner loops. I've read that they have actually gained remarkable speedups with hand optimizations, but this is coming from people who, I'd guess, have been doing this stuff for decades and know well the internal workings of the most current processors.
Also, the little cynic inside tells me it's a better career move to leave an obscure, totally undocumented piece of assembly code inside your program that is only maintainable by you. :D
But then again, I'm a little bit of an old-fashioned programmer.
Though I did try to search the ASM Community forum for information on multithreading with assembly. I didn't get many hits, and the forum is strangely quiet about it. Maybe I should take that as a hint?
Mika
-
The optimizations gained by coding in assembly are not worth the absurd amount of time that is wasted on them. Of course you can optimize stuff in assembly. But does that mean recoding your networking engine in assembly to gain 1 frame per second is worth it when you have a bottleneck in your rendering pipeline causing a 30% reduction? Noooooooo.
Only when you are absolutely, totally positive that you have optimized everything that can possibly be optimized in your C++ program, and that there is no faster way of doing it throughout your entire program (or all classes that interact with the section of code in question), should you actually start thinking about writing it in assembly. Assembly optimizations work well with anything so ridiculously low-level that you should either be finding source code that has already done it for you, or be ignoring it altogether.
Also, the little cynic inside tells me it's a better career move to leave an obscure, totally undocumented piece of assembly code inside your program that is only maintainable by you.
That's not a cynic, that's a mentally retarded inner child that has been deprived of nutrition for weeks on end.
-
I've been reading about pointer aliasing in C/C++ recently (specifically, where and when to use the restrict keyword).
Maybe this is one of the major reasons for the big difference.
I'm still learning!
-
I agree that ASM optimizations are a waste of time, most of the time...
But if I've followed this correctly, Mika is talking about scientific calculations involving heavy number crunching (to be run on a supercomputer?). So most of the time will probably be spent in just a few tight loops. ASM optimizations could make sense in that case.
-
Otherwise correct, with the exception of the supercomputer part. Yeah, most of my stuff utilizes the CPU only. Sometimes I need RAM, but it is not as important as the raw computation power. But I don't know if the loops are that tight. Most of them are about inverting a matrix, computing spline coordinates, or just plain Newton iteration.
Mika
-
For anyone interested, here is a good collection of all kinds of programming tips for C++ and ASM:
http://www.agner.org/optimize/
I find the dependency chain checks the most difficult to fathom. I really need to check the machine code Visual Studio puts out to see if it can really optimize the throughput of the processor when looping through certain algorithms.
There is some interesting stuff on that page regarding standard library speed comparisons. For example, they mention their version of memcpy is four to five times faster when hand-written in assembly.
Mika
-
http://video.google.com/videoplay?docid=6981137233069932108&q=erlang%202008&hl=en is fairly interesting.
-
Unfortunately, I have come to the conclusion that current multicore programming is not what it is supposed to be for my purposes. There seems to be something fundamentally wrong with current PC processor technology when it comes to taking full advantage of the cores. From what I have gathered, the culprit is the memory shared between the processors, which makes things more difficult. I don't claim to be an expert on this issue, but I have seen lots of benchmarks showing little performance improvement for quad cores or dual cores, and then again some that do show it.
However, I found that CUDA is actually quite interesting from a parallel computing point of view, and my current understanding is that this GPU implementation works around the memory issue by having separate memory spaces reserved for each processor. And they seem to have quite a good track record of scientifically demonstrated performance increases. Right now I'm actually considering buying one. But given that I'm running an Athlon XP, I guess it would mean a total system revamp, and I don't think this computer has wholly served its time yet.
Mika
-
In my own multithreading experience, I've found that shared memory is a really annoying problem.
-
Yes, the real simple solution is to not share memory. Message passing can actually be made to work well.
-
My only question about not sharing memory would then be: in which kinds of applications is this actually possible? And in this case I specifically mean dual- and quad-core processors.
Mika
-
It's possible if you can completely and utterly isolate a calculation, minus the return value.
-
Just about any you can think of.
Also, you don't have to sign your every post. We know it's you from the name in the left part of the post.
-
I can only figure out a few examples where it could be possible, one being random number generation on four cores where the numbers don't need to be stored anywhere. While definitely possible, this has little benefit for real computing applications. My summary of current multicore CPUs, from what I have gathered, is that while speed-ups are theoretically within reach, in practice it can be pretty difficult to achieve large speed-ups for heavy computing with the current architecture.
One of the most interesting things is that in software that has supported multiple CPUs for decades (ray tracing), local optimizations are still done using only a single CPU. Multiple cores are used only when going for an all-around optimization.
The reason I leave my name visible is partly past internet forum habits; another reason is that when replying to my (sometimes overly long) posts, people don't need to scroll back to check who wrote what.
Mika
-
Did you really look up message passing?