Broadly speaking, x86 CPUs only incur a significant performance penalty for unaligned accesses when they straddle a cacheline boundary. That's because software using unaligned accesses is so common in the x86 world that the hardware has been optimised to accommodate it. This wasn't quite so true when Freespace was first released, but at worst you would incur a one-cycle delay per unaligned access.
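To make the cacheline condition concrete, here is a minimal sketch of the check. The 64-byte line size is an assumption (typical of current x86 parts, but not guaranteed), and `straddles_cache_line` is a name made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed line size; query the real value on the target if it matters.
constexpr std::size_t kAssumedCacheLine = 64;

// True when an access of `size` bytes starting at `addr` touches two lines.
inline bool straddles_cache_line(std::uintptr_t addr, std::size_t size) {
    return size != 0 && (addr % kAssumedCacheLine) + size > kAssumedCacheLine;
}
```

Under that assumption, a 4-byte load at offset 1 within a line is unaligned but stays inside one line, while the same load at offset 62 spills into the next.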
Conversely, ARM CPUs are mostly still designed to be efficient first and fast second, on the grounds that an efficiently designed CPU might run fast anyway if the software is written properly. The circuitry to handle unaligned accesses is therefore considered an unnecessary complexity, given that well-written software should avoid them; unaligned executable code is outright forbidden (classic ARM instructions are all 4 bytes long). This goes so far that, on many ARM cores and for some instruction types, unaligned accesses are not merely unoptimised but totally unsupported in hardware. When an unaligned access happens anyway, it is possible to work around it by handling the exception in software and replacing the single access with multiple byte-wide accesses, but of course this is slower and requires an extra temporary register: a single 32-bit load becomes a 7-instruction sequence (load byte 0, load byte 1, combine bytes, load byte 2…) that is difficult to run in parallel. And that's ignoring the major overhead of taking the exception in the first place.
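For reference, the byte-wise replacement looks roughly like this when written in C++ rather than left to an exception handler. This is only a sketch: the function names are mine, and the first version assumes the data is stored little-endian.

```cpp
#include <cstdint>
#include <cstring>

// Four byte loads plus three shift/OR combines; no alignment requirement
// on p. Assumes the stored value is little-endian.
inline std::uint32_t load_u32_le_bytes(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Host-endian alternative: memcpy has no alignment requirement, and the
// compiler is free to pick whatever load sequence the target allows.
inline std::uint32_t load_u32_memcpy(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

Either form avoids the exception entirely, since no misaligned 32-bit access is ever issued; the cost is paid only at the call sites that actually need it.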
I think it would be wise to at least process all the mods in Knossos to ensure they have proper alignment, as well as patching the tools to make them produce aligned output in the first place. But to allow the apparently large corpus of misaligned mods to run on ARM, an alignment workaround does need to be added to the loader.
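One possible shape for such a loader workaround, as a rough sketch only: check the alignment of each block of data as it is loaded, and copy it into an aligned scratch buffer if it fails the check. Nothing here is taken from the existing loader; `is_aligned`, `aligned_view` and the idea of a per-block scratch buffer are all hypothetical, and 4-byte alignment is assumed to be sufficient.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true if p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Returns a pointer to the block's bytes that is safe for 32-bit accesses.
// Aligned input is returned unchanged (no copy); misaligned input is copied
// into `scratch`, whose storage is aligned for std::uint32_t.
inline const unsigned char* aligned_view(const unsigned char* block,
                                         std::size_t size,
                                         std::vector<std::uint32_t>& scratch) {
    if (size == 0 || is_aligned(block, alignof(std::uint32_t))) {
        return block;
    }
    scratch.resize((size + 3) / 4);  // round up to whole 32-bit words
    std::memcpy(scratch.data(), block, size);
    return reinterpret_cast<const unsigned char*>(scratch.data());
}
```

Properly aligned mods pay only for the check, so the fast path stays fast; the copy cost falls solely on the misaligned ones.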