I currently don't have that much time, but I did some work on this anyway. Specifically, I started with some performance tests using the VS2012 profiler, in the asteroid mission where you first find the Iceni (I didn't find a specific Blue Planet mission that I should try instead; I'm open to suggestions).
While it doesn't have as many objects as some current mods, I thought it would be a good start.
Version: Release (no SSE, because reasons)
Inclusive samples in:
game_do_frame: 13358
obj_move_all: 2012
obj_sort_and_collide: 995
->obj_find_overlap_colliders: 786 (->obj_collide_pair: 452)
->obj_quicksort_colliders: 209
That means obj_sort_and_collide excluding obj_collide_pair (which is the narrow phase) had 543 samples, which is ~55% of obj_sort_and_collide, ~27% of obj_move_all, or ~4% of game_do_frame.
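For anyone who wants to redo the arithmetic with numbers from another mission, here is a minimal C++ sketch. The constant and function names are made up for illustration; only the sample counts come from the profile above.

```cpp
#include <cstdio>

// Inclusive sample counts copied from the profile above.
const int game_do_frame_samples = 13358;
const int obj_move_all_samples = 2012;
const int obj_sort_and_collide_samples = 995;
const int obj_collide_pair_samples = 452;

// Broad phase = obj_sort_and_collide minus the narrow phase (obj_collide_pair).
const int broad_phase_samples =
    obj_sort_and_collide_samples - obj_collide_pair_samples;

// Share of `part` in `whole`, in percent.
inline double percent(int part, int whole) {
    return 100.0 * part / whole;
}

// Prints the broad-phase share relative to each enclosing function.
inline void print_broad_phase_share() {
    std::printf("broad phase samples: %d\n", broad_phase_samples);
    std::printf("of obj_sort_and_collide: %.1f%%\n",
                percent(broad_phase_samples, obj_sort_and_collide_samples));
    std::printf("of obj_move_all: %.1f%%\n",
                percent(broad_phase_samples, obj_move_all_samples));
    std::printf("of game_do_frame: %.1f%%\n",
                percent(broad_phase_samples, game_do_frame_samples));
}
```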
Another thing I noticed is that obj_collide_pair has roughly 1/3 to 1/2 of its samples in operator[] of the hashmap (~40%, to be more precise).
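One possible way to keep the hashmap cost down, assuming the pair data lives in something like a std::unordered_map, is to hash each pair exactly once and then work through the returned reference. This is only a sketch: PairInfo, PairCache, pair_key and should_check are hypothetical names, not the engine's actual code, and the real call site may already do this.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical per-pair record; the real engine struct may look different.
struct PairInfo {
    int next_check_time = 0;  // earliest time the narrow phase should run again
};

typedef std::unordered_map<std::uint64_t, PairInfo> PairCache;

// Combine two object ids into one order-independent 64-bit key.
inline std::uint64_t pair_key(std::uint32_t a, std::uint32_t b) {
    if (a > b) std::swap(a, b);
    return (static_cast<std::uint64_t>(a) << 32) | b;
}

// Returns true if the narrow phase should run for this pair at time `now`.
// The map is hashed once; every later access goes through `info`.
inline bool should_check(PairCache& cache, std::uint32_t a, std::uint32_t b,
                         int now, int interval) {
    PairInfo& info = cache[pair_key(a, b)];  // the only hash lookup
    if (now < info.next_check_time) {
        return false;  // pair was checked recently, skip it
    }
    info.next_check_time = now + interval;
    return true;
}
```

Whether this helps depends on where the operator[] samples actually come from; if the time is spent in hashing or bucket traversal of a single lookup, a different key or container would be the thing to try instead.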
Well, that was just a very small mission; I should probably do this again with a larger one to see how it scales...
So my current plan for when I have time:
Check whether caching get_collider_endpoint results leads to a performance improvement with the current system (might even do this today).
Fix a very weird bug with the alternative system I tried to implement.
Optimize the alternative system.
Do some profiling with different missions (I need help here, tell me which one I should choose :> )
Do some profiling with SSE2 builds.
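To make the first point of the plan more concrete, here is a rough C++ sketch of what caching the endpoint results could look like: compute each object's endpoint once per frame, then sort on the stored value so the comparator never recomputes it. Obj, Endpoint, collider_min_endpoint and sort_colliders_cached are made-up stand-ins, and the assumption that an endpoint is just position minus radius on the chosen axis may not match what get_collider_endpoint really does.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical object: position plus bounding radius.
struct Obj {
    float pos[3];
    float radius;
};

// Cached sort key: the endpoint value and the index of the object it belongs to.
struct Endpoint {
    float min_value;
    int index;
};

// Stand-in for get_collider_endpoint (assumed: position minus radius on one axis).
inline float collider_min_endpoint(const Obj& o, int axis) {
    return o.pos[axis] - o.radius;
}

// Computes every endpoint exactly once, sorts on the cached values and
// returns the object indices in ascending endpoint order.
inline std::vector<int> sort_colliders_cached(const std::vector<Obj>& objs, int axis) {
    std::vector<Endpoint> keys;
    keys.reserve(objs.size());
    for (std::size_t i = 0; i < objs.size(); ++i) {
        Endpoint e = { collider_min_endpoint(objs[i], axis), static_cast<int>(i) };
        keys.push_back(e);
    }

    std::sort(keys.begin(), keys.end(),
              [](const Endpoint& a, const Endpoint& b) {
                  return a.min_value < b.min_value;
              });

    std::vector<int> order;
    order.reserve(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        order.push_back(keys[i].index);
    }
    return order;
}
```

The trade-off is one extra array per sort in exchange for O(n) endpoint computations instead of one or two per comparison; whether that wins in practice is exactly what the profiling would have to show.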
Btw, what do you actually use for profiling? Just curious.