To really determine whether the difference comes down to AI or hardware, I think you'd need to set up something like the following:
Herc/Herc2 vs Herc/Herc2 (make sure they are identical on both sides, with matching loadouts), where Friendly has its AI class set to BP and Hostile has it set to Retail.
Captain vs Captain and so on, or as close an equivalent as possible.
(To be thorough, repeat the exercise with Friendly using the Retail AI and Hostile using the BP AI, just to rule out any Friendly/Hostile side bias.)
You can then vary the ships (always matching ship to ship and loadout to loadout) as many times as you want, to gauge how the two AIs actually compare once hardware configuration is no longer a factor.
If you also wanted to vary AI levels (so it's no longer just Captain vs Captain across the two AI behaviours), you could do that as well.
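If you're logging a batch of these fights, the test matrix above can be sketched in a few lines of Python. This is only a minimal sketch, not anything the game provides: `Matchup`, `build_matrix` and `bp_win_rate` are hypothetical names, and it assumes Herc and Herc2 count as separate ship entries and that each fight's result is recorded simply as a "Friendly" or "Hostile" win. Every ship/level pairing is run twice with the AI classes swapped, which is the side-swap step described above.

```python
from dataclasses import dataclass
from itertools import product

# The two AI classes being compared, as named in the post.
AI_CLASSES = ("BP", "Retail")


@dataclass(frozen=True)
class Matchup:
    ship: str         # same hull and loadout on both sides
    level: str        # same AI level on both sides, e.g. "Captain"
    friendly_ai: str
    hostile_ai: str


def build_matrix(ships, levels):
    """Hypothetical helper: list every mirror matchup, with each AI class
    taking one turn on the Friendly side and one on the Hostile side."""
    first, second = AI_CLASSES
    matrix = []
    for ship, level in product(ships, levels):
        # Original run: BP on Friendly, Retail on Hostile.
        matrix.append(Matchup(ship, level, first, second))
        # Swapped run: rules out any Friendly/Hostile side bias.
        matrix.append(Matchup(ship, level, second, first))
    return matrix


def bp_win_rate(results):
    """results: list of (Matchup, winning_side) pairs, where winning_side
    is "Friendly" or "Hostile". Returns the fraction of fights BP won."""
    if not results:
        return 0.0
    bp_wins = 0
    for m, winner in results:
        winning_ai = m.friendly_ai if winner == "Friendly" else m.hostile_ai
        if winning_ai == "BP":
            bp_wins += 1
    return bp_wins / len(results)


if __name__ == "__main__":
    for m in build_matrix(["Herc", "Herc2"], ["Captain"]):
        print(m)
```

Because each pairing appears once with BP on each side, a BP win rate that stays clearly above or below one half across the whole matrix can't be put down to which side it spawned on. Adding more AI levels is just a longer `levels` list.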