Author Topic: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)  (Read 3151 times)

0 Members and 1 Guest are viewing this topic.

Offline taylor

  • Super SCP/Linux Guru
  • Moderator
  • 212
    • http://www.icculus.org/~taylor
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
That looks good, but you should also check to see if the frontlist, etc. variables are divisible by 4.  That could cause the same issue.
I'm not looking at the code, but those are just offsets of p, right? In which case chunk_size alignment will determine alignment of the rest.

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Ok i think im on the right track, but i need to be sure im understanding exactly how this works, this looks good?


https://i.imgur.com/MInETWl.png

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Ok some progress, and i think this is as far ill get today
i added this to line 1560 in modelread.cpp
Code: [Select]
ubyte *absp, *abspp, *p = pm->submodel[n].bsp_data;
absp = (ubyte*)vm_malloc(pm->submodel[n].bsp_data_size+(100*sizeof(ubyte*)));
memset(absp, NULL, pm->submodel[n].bsp_data_size+(100 * sizeof(ubyte*)));
abspp = absp;
int chunk_type, chunk_size, copied=0;
chunk_type = w(p);
chunk_size = w(p + 4);

if ((chunk_size % 4) != 0)
{
int newsize = chunk_size + 4 - (chunk_size % 4);
mprintf(("Warning: Unaligned memory access Defpoints: Chunk Type %d, chunk_size %d in modelread.cpp:1572\n", chunk_type, chunk_size));
mprintf(("Fixing Defpoints: Chunk Type %d, old chunk_size %d, new chunk_size %d\n", chunk_type, chunk_size, newsize));
memcpy(absp, p, chunk_size);
copied += chunk_size;
*(absp + 4) = newsize;
absp += newsize;
}
else
{
memcpy(absp, p, chunk_size);
absp += chunk_size;
copied += chunk_size;
}
memcpy(absp, p+chunk_size, pm->submodel[n].bsp_data_size-copied);
vm_free(pm->submodel[n].bsp_data);
pm->submodel[n].bsp_data = abspp;

I tried that to align the first defpoints that at least in WCS that seem to be the unaligned part, and sometimes it works in the techroom to load most models, but in others it wouldt load the model, but it dosent crash the game either, it just gets stuck there.
Ill keep looking on it this week or next weekend.

edit:updated.
« Last Edit: November 25, 2019, 09:08:01 am by ShivanSpS »

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
That looks good, but you should also check to see if the frontlist, etc. variables are divisible by 4.  That could cause the same issue.
I'm not looking at the code, but those are just offsets of p, right? In which case chunk_size alignment will determine alignment of the rest.

If I understand the code correctly, they are byte offsets of p.  The model_octant_find_faces_sub function is passed a pointer which is p plus a number, and p is of type ubyte.

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
The problem seems to be that, if i change defpoints chunk_size with the new one something crashs or loops (probably loops) silently somewhere else in the code for some models. Im not longer trying on the pi itselft, im doing it on x86 and visual studio.

« Last Edit: November 25, 2019, 09:00:46 am by ShivanSpS »

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Err, don't replace the BSP data with the aligned version.  Try just copying the bad chunk, aligning it, and passing it to the next function.  Then when the function returns, free the copy.  Don't touch the underlying bsp tree.

This implies the function will create and maintain recursive copies of sub-trees if the sub-trees are also unaligned, but as long as the copies aren't too large, this should be fine.  The tree is finite.

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Ok i came out with... this is not a fix, it is more of a hack, too make it work. There are unaligned calls all over the place even after doing this, but the CPU is dealing and fixing those by itself. This may not be true with EVERY ARM cpu out there or with every model.

This "hack" involves in modifing 2 functions.
point_in_octant in modeloctant.cpp:25
Code: [Select]
// returns 1 if a point is in an octant.
int point_in_octant( polymodel * pm, model_octant * oct, vec3d *vert )
{
vec3d *avert;
avert= (vec3d*)vm_malloc(sizeof(vec3d));
memcpy(avert, vert, sizeof(vec3d));

if ( avert->xyz.x < oct->min.xyz.x ) free(avert);return 0;
if ( avert->xyz.x > oct->max.xyz.x ) free(avert);return 0;

if ( avert->xyz.y < oct->min.xyz.y ) free(avert);return 0;
if ( avert->xyz.y > oct->max.xyz.y ) free(avert);return 0;

if ( avert->xyz.z < oct->min.xyz.z) free(avert);return 0;
if ( avert->xyz.z > oct->max.xyz.z) free(avert);return 0;
free(avert);
return 1;
}

fvi_ray_boundingbox in fvi.cpp:326
Code: [Select]
int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt )
{
int middle = ((1<<0) | (1<<1) | (1<<2));
int i;
int which_plane;
float maxt[3];
float candidate_plane[3];

vec3d *ap0, *amin, *amax;
ap0= (vec3d*)vm_malloc(sizeof(vec3d));
memcpy(ap0, p0, sizeof(vec3d));
amin= (vec3d*)vm_malloc(sizeof(vec3d));
memcpy(amin, min, sizeof(vec3d));
amax= (vec3d*)vm_malloc(sizeof(vec3d));
memcpy(amax, max, sizeof(vec3d));

for (i = 0; i < 3; i++) {
if (ap0->a1d[i] < amin->a1d[i]) {
candidate_plane[i] = amin->a1d[i];
middle &= ~(1<<i);
} else if (ap0->a1d[i] > amax->a1d[i]) {
candidate_plane[i] = amax->a1d[i];
middle &= ~(1<<i);
}
}

// ray origin inside bounding box?
// (are all three bits still set?)
if (middle == ((1<<0) | (1<<1) | (1<<2))) {
*hitpt = *ap0;
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 1;
}

// calculate T distances to candidate plane
for (i = 0; i < 3; i++) {
if ( (middle & (1<<i)) || (pdir->a1d[i] == 0.0f) ) {
maxt[i] = -1.0f;
} else {
maxt[i] = (candidateint fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt )
{
int middle = ((1<<0) | (1<<1) | (1<<2));
int i;
int which_plane;
float maxt[3];
float candidate_plane[3];

vec3d *ap0, *amin, *amax;
ap0= (vec3d*)vm_malloc(sizeof(vec3d));
memcpy(ap0, p0, sizeof(vec3d));
amin= (vec3d*)vm_malloc(sizeof(vec3d));
memcpy(amin, min, sizeof(vec3d));
amax= (vec3d*)vm_malloc(sizeof(vec3d));
memcpy(amax, max, sizeof(vec3d));

for (i = 0; i < 3; i++) {
if (ap0->a1d[i] < amin->a1d[i]) {
candidate_plane[i] = amin->a1d[i];
middle &= ~(1<<i);
} else if (ap0->a1d[i] > amax->a1d[i]) {
candidate_plane[i] = amax->a1d[i];
middle &= ~(1<<i);
}
}

// ray origin inside bounding box?
// (are all three bits still set?)
if (middle == ((1<<0) | (1<<1) | (1<<2))) {
*hitpt = *ap0;
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 1;
}

// calculate T distances to candidate plane
for (i = 0; i < 3; i++) {
if ( (middle & (1<<i)) || (pdir->a1d[i] == 0.0f) ) {
maxt[i] = -1.0f;
} else {
maxt[i] = (candidate_plane[i] - ap0->a1d[i]) / pdir->a1d[i];
}
}

// Get largest of the maxt's for final choice of intersection
which_plane = 0;
for (i = 1; i < 3; i++) {
if (maxt[which_plane] < maxt[i]) {
which_plane = i;
}
}

// check final candidate actually inside box
if (maxt[which_plane] < 0.0f) {
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 0;
}

for (i = 0; i < 3; i++) {
if (which_plane == i) {
hitpt->a1d[i] = candidate_plane[i];
} else {
hitpt->a1d[i] = (maxt[which_plane] * pdir->a1d[i]) + ap0->a1d[i];

if ( (hitpt->a1d[i] < amin->a1d[i]) || (hitpt->a1d[i] > amax->a1d[i]) ) {
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 0;
}
}
}
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 1;
}_plane[i] - ap0->a1d[i]) / pdir->a1d[i];
}
}

// Get largest of the maxt's for final choice of intersection
which_plane = 0;
for (i = 1; i < 3; i++) {
if (maxt[which_plane] < maxt[i]) {
which_plane = i;
}
}

// check final candidate actually inside box
if (maxt[which_plane] < 0.0f) {
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 0;
}

for (i = 0; i < 3; i++) {
if (which_plane == i) {
hitpt->a1d[i] = candidate_plane[i];
} else {
hitpt->a1d[i] = (maxt[which_plane] * pdir->a1d[i]) + ap0->a1d[i];

if ( (hitpt->a1d[i] < amin->a1d[i]) || (hitpt->a1d[i] > amax->a1d[i]) ) {
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 0;
}
}
}
vm_free(ap0);
vm_free(amin);
vm_free(amax);
return 1;
}

I keep testing this, i already finished the first 2 missions of Wing Commander Saga on the Raspberry PI with this. I also tested Dispora and it seems to work as well.

EDIT: Unfortunately, this only works with Debug build.
Code: [Select]
Thread 1 "fs2_open_3.7.2" received signal SIGBUS, Bus error.
0x0023f7f0 in model_collide_parse_bsp(bsp_collision_tree*, void*, int)
modelcollide.cpp:729

BTW, the 3.7.2 is not on git? i see the 3.7.0, but not the 3.7.2, ill like to fork and add changes on 3.7.2 for Raspberry PI users.
« Last Edit: November 25, 2019, 06:46:03 pm by ShivanSpS »

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
It would be better to implement a proper fix, rather than a hack.  The reason Release builds still aren't working is probably because of the remaining unaligned calls.

I've asked chief1983 to tag the appropriate revision for 3.7.2.

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
There are only 2 possible proper fixes, one is fixing the models themselves, on that PCS2 need to be updated, and im not sure if there are other model programs for pof edit/build. ModelView32 has to be discontinued, i dont think anyone has the source code to fix it, and im petty sure no one will be interested in doing it.

The other is taking the BSP data, and copy&align it when the model is loaded, but as you can see any of my attempts to mess around with the bsp_data ends in issues somewhere else.  Any other thing will look similar to what im doing there, copy the data from unaligned address to a new place and use it.
The problem is massive, these unaligned models ends up with having a lot of "vec3d" data in unaligned possitions. The CPU fixes +90% of them all, but if you disable that you can see it.
So before even attempt to try to fix the bsp data on fly i need to understand perfectly how BSP and the pof data structure itselft works, probably write a .net c# program fist (im more im me element here) that read the POFs and fixes them, and after that try to apply the same here. So it will take some time.

Also i want to try other things, software dxt1/3/5 descompresion using a lib, maybe i can implement that on ddsutils.cpp, and backport 3.8.0 fs1/fs2 Demo support to 3.7.2. Soo much stuff and so little time...
« Last Edit: November 25, 2019, 08:32:32 pm by ShivanSpS »

 

Offline mjn.mixael

  • Cutscene Master
  • 212
  • Anims: 420, Cutscenes: 10, Mainhalls: 7, Logos: 52
    • Steam
    • Twitter
    • Mix-Hai Productions
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Wtf. Is this thread serious? Amazing.
Cutscene Upgrade Project - Mainhall Remakes - MixaelANITools - Between the Ashes - MjnMixael's Render Boutique - Mix-Hai Productions
Youtube Channel - P3D Model Box - Photobucket Albums - Model Releases - Downloads
Between the Ashes is looking for committed testers, PM me for details.
Report MediaVP issues, now on the MediaVP Mantis! Read all about it Here!

 
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Broadly speaking, x86 CPUs only incur a significant performance penalty for unaligned accesses when they straddle a cacheline boundary.  That's because software using unaligned accesses is so common in the x86 world that the hardware has been optimised to accommodate it.  This wasn't quite so true when Freespace was first released, but at worst you would incur a one-cycle delay per unaligned access.

Conversely, ARM CPUs are mostly still designed to be efficient first and fast second, on the grounds that an efficiently designed CPU might run fast anyway if the software is written properly.  The circuitry to handle unaligned accesses is therefore considered an unnecessary complexity, given that well-written software should avoid it; unaligned executable code is outright forbidden (as ARM instructions are all 4 bytes long).  This goes so far that unaligned accesses are not merely unoptimised, but totally unsupported in hardware.  To work around this when an unaligned access happens anyway, it is possible to handle the exception in software by replacing the single access with multiple byte-wide accesses, but of course this is slower and requires an extra temporary register; a single 32-bit load becomes a 7-instruction sequence (load byte 0, load byte 1, combine bytes, load byte 2…) that is difficult to run in parallel.  And that's ignoring the major overhead of taking the exception in the first place.

I think it would be wise to at least process all the mods in Knossos to ensure they have proper alignment, as well as patching the tools to make them produce aligned output in the first place.  But to allow the apparently large corpus of misaligned mods to run on ARM, an alignment workaround does need to be added to the loader.

 

Offline chief1983

  • Still lacks a custom title
  • Moderator
  • 212
  • ⬇️⬆️⬅️⬅️🅰➡️⬇️
    • Minecraft
    • Skype
    • Steam
    • Twitter
    • Fate of the Galaxy
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
My god what have we done

As for the tag for 3.7.2, it doesn't exist because we weren't even on Git yet.  The quote from the release post concerns me though:  "It is an export of 3.7.2 branch r11329".  I don't see a 3.7.2 branch in our git even though most of the older branches were created.  Someone got too aggressive with branch pruning I think and deleted it.  Either way, I found the git commit was still on my fork and pushed a branch and tag for 3.7.2 up to the main repository.  Enjoy.
Fate of the Galaxy - Now Hiring!  Apply within | Diaspora | SCP Home | Collada Importer for PCS2
Karajorma's 'How to report bugs' | Mantis
#freespace | #scp-swc | #diaspora | #SCP | #hard-light on EsperNet

"You may not sell or otherwise commercially exploit the source or things you created based on the source." -- Excerpt from FSO license, for reference

Nuclear1:  Jesus Christ zack you're a little too hamyurger for HLP right now...
iamzack:  i dont have hamynerge i just want ptatoc hips D:
redsniper:  Platonic hips?!
iamzack:  lays

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
I got the remaining bus errors on release config traped in these bits of code on modelcollide.cpp

line 786
Code: [Select]
case OP_SORTNORM:
if ( version >= 2000 ) {
/*min = vp(p+56);
max = vp(p+68);

node_buffer[i].min = *min;
node_buffer[i].max = *max;*/
ubyte *ac;
ac = (ubyte*)vm_malloc(4);
memcpy(ac,p+56,4);
min = vp(ac);
node_buffer[i].min = *min;
memcpy(ac,p+68,4);
max = vp(ac);
node_buffer[i].max = *max;
vm_free(ac);
}
and
Code: [Select]
case OP_BOUNDBOX:
                      /*min = vp(p+8);
max = vp(p+20);

node_buffer[i].min = *min;
node_buffer[i].max = *max;*/
ubyte *ac;
ac = (ubyte*)vm_malloc(4);
memcpy(ac,p+8,4);
min = vp(ac);
node_buffer[i].min = *min;
memcpy(ac,p+20,4);
max = vp(ac);
node_buffer[i].max = *max;
vm_free(ac);
but now ships has no collision (mostly)and a get a segfault when killing another fighter with a missile. Not sure what i did wrong, that should work. Maybe its late and i cant see it.

Once again, is the same problem, a vec3d being dereferenced, it has been this same issue every time. At least the debug build works and it actually works very well so far.
« Last Edit: November 30, 2019, 09:44:53 am by ShivanSpS »

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Ok now that im making progress in parsing pofs im going to point out some things that i belive should be corrected in the wiki.

https://wiki.hard-light.net/index.php/POF_data_structure
Quote
The rest of the file is a bunch of chunks. Each chunk is:
char[4] chunk_id  // see below for available chunk types
int length        // length of the chunk.

While this is correct, it worth to mention that lenght and size (in case of BSP_data structure https://wiki.hard-light.net/index.php/BSP_data_structure) are not considered the same for chunks that are part of the pof data structure and chunks that are part from BSP_data.
In pof data structure the lenght does not considers the 8 bytes that belong to chunk type and chunk lenght, meaning if the pointer is at the begining of the chunk and you want to go to the next it is p+=lenght+8. This is important to point out because the BSP_data chunks are the exact oposite, there the chunk_size already has the 8 bytes in consideration.
Like this:



Now in "OBJ2" type chunk, in the wiki it says:
Quote
int submodel_number  // What submodel number this is.

#ifdef version2116orhigher
 // FreeSpace 2
 float radius        // radius of this subobject
 int submodel_parent // What submodel is this model's parent. Equal to -1 if none.
 vector offset       // Offset to from parent object <- Added 09/10/98
#else
 // FreeSpace 1
 int submodel_parent // What submodel is this model's parent. Equal to -1 if none.
 vector offset       // Offset to from parent object <- Added 09/10/98
 float radius        // radius of this subobject
#endif

vector geometric_center
vector bounding_box_min_point
vector bounding_box_max_point

string submodel_name
string properites

int movement_type
int movement_axis

int reserved         // must be 0
int bsp_data_size    // number of bytes now following
char[bsp_data_size] bsp_data  // contains actual polygons, etc.

This is wrong, there is a int before each string indicating the lenght of the string, makes sence because otherwise i would have no idea how to parse it. interesting to know that if the properites string is empty the lenght is 4 and are actually 4 NULLs in there and the names as well, if there is 9 chars, the leght is 12 with extra nulls (V Models)

« Last Edit: December 01, 2019, 04:24:43 pm by ShivanSpS »

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
By all means, update the wiki where it needs updating.  I have created an account for you using your HLP email.

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
Thanks Goober, i will update the wiki once im done with this.

Question: Does someone know what information is used in modelcollide.cpp, void model_collide_parse_bsp(bsp_collision_tree *tree, void *model_ptr, int version)? Because i managed to fix the pofs enoght to make them load on 3.7.2 Debug whiout any code modifications, but release still crashes on that function with a bus error, most likely on those vec3d i mentioned earlier.
« Last Edit: December 01, 2019, 09:11:18 pm by ShivanSpS »

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
You can browse the code on GitHub:
https://github.com/scp-fs2open/fs2open.github.com/blob/master/code/model/modelcollide.cpp#L733

EDIT: Actually you would have had to have the code to make modifications.  What do you mean, does anyone know what information is used?

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
You can browse the code on GitHub:
https://github.com/scp-fs2open/fs2open.github.com/blob/master/code/model/modelcollide.cpp#L733

EDIT: Actually you would have had to have the code to make modifications.  What do you mean, does anyone know what information is used?

I was asking because since i dont fully understand how the system, as a whole, works maybe i was missing something, from what i see in that function the likely suspects are the sortnorm chunk min and max boundbox, and the boundbox chunk min and max points. Both could be in highly unaligned positions due to the posibility of "defpoints" having an incorrect size and string aligment not being enforced on OBJ2 chunk. Meaning im still missing something.

Whats strange about all this is that i was spot on trying to replace the entire BSP_Data with one with aligned defpoints copied to a new memory location on modelread.cpp, defpoints looks like the only type of chunk with a non standart data type on bsp_data, the others are just a bunch of int and floats, impossible to be unaligned. Strange that it did not work(for all models) when i tried that.
« Last Edit: December 01, 2019, 07:53:49 pm by ShivanSpS »

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
¯\_(ツ)_/¯

Honestly, I think you are the person who understands this the most - certainly the person who has studied this to this level of detail in the past 10 years.

All I can say is to break down the problem into manageable tasks, take things one at a time, take baby steps, and try to squash the problems one by one.  Feel free to keep posting your progress in this thread though; if we have any insight to offer, we will certainly chime in.

 

Offline ShivanSpS

  • 210
Re: [3.7.2-ARM] Unaligned Memory Access at modeloctant.cpp:28 (title edited)
I just did a terrible discovery. As i said im working on a tool to align the .pofs, it is still not ready as it has some issues here and there, but is enoght to align the data so well that with 3.7.2 Debug i can already see 3 to 4 FPS gain because im using aligned data and the kernel does not have to intervine so much. But im afraid its not only the models, its the FSO code as well, probably were FSO stores this specific data. There is one issue while loading these models after i align them with the tool using unmodified 3.7.2 debug: It crashes with a bus error on a shield impact. fvi.cpp:326 Is those vec3ds data again. BUT, this is the bad part, if i remove "SLDC" chunk from the .pofs this does not happens, and SLDC is FSO only and thus the retail models dosent have that.
The chunk chain leading up to SLDC data is aligned, there maybe some strings inside other chunks that arent yet aligned but the chain offset itselft it is, and SLDC chunk is just a bunch of int and floats, is not unaligned, and it will never be. Its the code.

Release 3.7.2 still crashes on that void model_collide_parse_bsp(bsp_collision_tree *tree, void *model_ptr, int version)(and those *min,*max are the problem), and my best guess is that is the same issue, the debug build has to have something that kicks that data into aligment.

So i need to stop with the tool for a while and look into FSO code again.