Argh. These shaders are a disaster. Shader Model 3.0 is required, so there's no chance of running them on Radeon 9xxx cards. The exact limits vary from one card to another, and some shaders may still work on cards that do not support SM 3.0 simply because, for example, the card's instruction limit is larger than the standard requires. A quick run of cgc (NVIDIA's shader compiler) on the most basic vertex shader (b-vert.sdr):
./cgc -oglsl -profile vs_2_0 b-vert.sdr>/dev/null
(0) : error C6001: Temporary register limit of 12 exceeded; 14 registers needed to compile program
97 lines, 1 errors.
./cgc -oglsl -profile vs_2_x b-vert.sdr>/dev/null
(0) : error C6002: Instruction limit of 256 exceeded; 632 instructions needed to compile program
97 lines, 1 errors.
./cgc -oglsl -profile vs_3_0 b-vert.sdr>/dev/null
97 lines, 0 errors.
The instruction count (632 for the most basic vertex shader) seems way too high.
Avoid branching at all costs! On SM 2.0 hardware, branching is emulated: both paths are calculated, one result is multiplied by 1 and the other by 0, and the two are summed. Simple proof:
uniform float dotV;

void main()
{
    float specLight;
    // Branching version: the specular term is only wanted when dotV > 0.
    if (dotV > 0.0) {
        specLight = pow(dotV, gl_FrontMaterial.shininess);
    } else {
        specLight = 0.0;
    }
    gl_FragColor = vec4(0, 0, 0, specLight);
}
produces:
MOVR R0.y, {0}.x;
MOVR R0.x, {0};
MOVR R0.z, gl_FrontMaterial$shininess.x;
SGTRC HC.x, dotV, R0.y;
POWR R0.x(NE), dotV.x, R0.z;
MOVR o[COLR].w, R0.x;
MOVR o[COLR].xyz, {0}.x;
The equivalent shader that does not use branching
uniform float dotV;

void main()
{
    float specLight;
    // Branch-free version: clamp dotV to zero instead of testing it.
    specLight = pow(max(0.0, dotV), gl_FrontMaterial.shininess);
    gl_FragColor = vec4(0, 0, 0, specLight);
}
produces the following code:
MOVR R0.x, {0};
MAXR R0.x, R0, dotV;
MOVR o[COLR].xyz, {0}.x;
POWR o[COLR].w, R0.x, gl_FrontMaterial$shininess.x;
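For illustration, the "compute both paths, then multiply by 1 or 0 and sum" behaviour described earlier can be written out by hand. This is only a sketch of the idea, not what the compiler literally emits (the first listing appears to use a condition-code predicated write rather than a multiply). The names litPath, unlitPath and taken are made up for this example, and the max() clamp is an added assumption so that pow() never sees a negative base.

```glsl
uniform float dotV;

void main()
{
    // Both "paths" are always evaluated, whatever the condition is.
    // Assumption: dotV is clamped so pow() stays well defined.
    float litPath   = pow(max(dotV, 0.0), gl_FrontMaterial.shininess);
    float unlitPath = 0.0;

    // 1.0 when the condition holds, 0.0 otherwise.
    float taken = float(dotV > 0.0);

    // Select by weighting: one path is multiplied by 1, the other by 0.
    float specLight = taken * litPath + (1.0 - taken) * unlitPath;

    gl_FragColor = vec4(0.0, 0.0, 0.0, specLight);
}
```

Seen this way, the if/else saves no work on such hardware; the max() version above simply drops the redundant select.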
The
    if (i > n_lights) {
        return;
    }
in render_vertex_light() does not help at all on SM 2.0 hardware because, as noted earlier, all the code after it is executed anyway.
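One possible way to drop that early return is to always run the light calculation and scale its result by a 0/1 mask. The sketch below is hypothetical: the real body of render_vertex_light() is not shown here, so light_contribution(), MAX_LIGHTS, the simple directional diffuse term and the assumption that n_lights is an int uniform are all stand-ins chosen for illustration.

```glsl
// Assumption: n_lights is an int uniform; its real declaration is not shown.
uniform int n_lights;

// Hypothetical fixed upper bound so the loop has a constant trip count.
const int MAX_LIGHTS = 2;

// Hypothetical stand-in for render_vertex_light().
vec4 light_contribution(int i, vec3 normal)
{
    // 1.0 when i <= n_lights, 0.0 otherwise; this mirrors the original
    // "if (i > n_lights) return;" test without a branch.
    float enabled = step(float(i), float(n_lights));

    // Placeholder work: a plain directional diffuse term.
    vec3 lightDir = normalize(gl_LightSource[i].position.xyz);
    float NdotL = max(dot(normal, lightDir), 0.0);

    // Masked-out lights contribute nothing.
    return enabled * NdotL * gl_LightSource[i].diffuse;
}

void main()
{
    vec3 normal = normalize(gl_NormalMatrix * gl_Normal);
    vec4 color = vec4(0.0);
    for (int i = 0; i < MAX_LIGHTS; i++) {
        color += light_contribution(i, normal);
    }
    gl_FrontColor = color;
    gl_Position = ftransform();
}
```

The cost is the same as what SM 2.0 hardware ends up doing anyway, but the instruction count becomes predictable. One caveat with this approach: the masked-out work must still produce finite values, since a NaN (for example from normalizing a zero vector) multiplied by 0.0 is still NaN.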
All that branching should be removed ASAP.