I have to repeat that I'm pretty sure the broadphase uses a cube for sorting, not a sphere.
So, if I understand this correctly, what the engine does is:
1. Sort objects by one coordinate
2. Detect whether two objects are closer together along that coordinate than the sum of their radii
3. If so, check whether they are also closer together than the sum of their radii in the other two dimensions.
This would be a bounding cube, aligned to the coordinate system.
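
To make sure we are talking about the same thing, here is a minimal sketch of those three steps as I understand them. It is not the engine's actual code: the `Obj` record and the `sweep_cube_pairs` function are names I made up for illustration, and I am assuming each object is described by a centre and a single bounding radius.

```python
from typing import NamedTuple

class Obj(NamedTuple):
    # Hypothetical object record: centre position and bounding radius.
    x: float
    y: float
    z: float
    r: float

def sweep_cube_pairs(objs):
    """Return index pairs whose axis-aligned bounding cubes overlap."""
    # Step 1: sort object indices by the lower bound along x.
    order = sorted(range(len(objs)), key=lambda i: objs[i].x - objs[i].r)
    pairs = []
    for n, i in enumerate(order):
        a = objs[i]
        for j in order[n + 1:]:
            b = objs[j]
            # Step 2: if b starts at or beyond the end of a along x,
            # no later object in the sorted order can overlap a either.
            if b.x - b.r >= a.x + a.r:
                break
            # Step 3: the same radius-sum test in the other two dimensions.
            rsum = a.r + b.r
            if abs(a.y - b.y) < rsum and abs(a.z - b.z) < rsum:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)
```

Note that the pair at (0,0,0) and (1.5,1.5,1.5) with radius 1 each is reported by this test, even though the two spheres are about 2.6 apart and cannot touch.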
If so, we can get from the bounding cube to a bounding cylinder: for two objects that can overlap in the first dimension (x), test y and z together, and treat them as overlapping only if deltay² + deltaz² < (r1 + r2)². This is almost as cheap computationally, and could reduce conditional jumps if the current algorithm checks the y and z dimensions independently.
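
A small sketch of the difference, with the overlap condition written as dy² + dz² < (r1 + r2)². The function names `cube_overlap_yz` and `cylinder_overlap_yz` are hypothetical and only meant to show the two tests side by side:

```python
def cube_overlap_yz(dy, dz, r1, r2):
    # Bounding-cube style: two independent comparisons (two branches).
    rsum = r1 + r2
    return abs(dy) < rsum and abs(dz) < rsum

def cylinder_overlap_yz(dy, dz, r1, r2):
    # Bounding-cylinder style: one comparison on the squared distance
    # in the y/z plane, no square root needed.
    rsum = r1 + r2
    return dy * dy + dz * dz < rsum * rsum
```

For dy = dz = 1.5 and r1 = r2 = 1, the cube test passes (1.5 < 2 on both axes) but the cylinder test rejects the pair (4.5 is not less than 4), so the cylinder filters out the corner cases the cube lets through.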
On any architecture where the vector units are big enough to hold and compute all three coordinates at once, checking for the full sphere after checking for overlap in the first dimension would take practically no extra time.
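
For completeness, a sketch of the full sphere test. The name `sphere_overlap` is hypothetical, and plain Python obviously does not use vector units; the comments only mark where a SIMD implementation could handle all three coordinates in one instruction each.

```python
def sphere_overlap(p1, p2, r1, r2):
    """True if spheres centred at p1 and p2 (x, y, z tuples) overlap."""
    # On a vector unit these three subtractions would be one operation.
    d = (p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2])
    # Likewise the three multiplications, followed by a horizontal sum.
    dist_sq = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
    rsum = r1 + r2
    # Compare squared values to avoid the square root.
    return dist_sq < rsum * rsum
```

A pair offset by (1.5, 1.2, 1.2) with radius 1 each passes the cylinder test in y/z (2.88 < 4) but fails here (5.13 is not less than 4), so the full sphere check still removes some pairs the cylinder keeps.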