It's pretty straightforward in terms of implementation. The problem you are running into is small sample size and a bit of confirmation bias. If you were to run the same battle 20,000 times in a row, the results would average out. But as mentioned, over a short run it's quite possible to hit an unlucky streak.
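For anyone who wants to see the sample-size effect for themselves, here is a minimal Python sketch. It is not the game's actual combat code: `HIT_CHANCE` is a made-up 50% value and the helper names are hypothetical, chosen only to illustrate the point. It rolls independent hit/miss results and reports the observed hit rate and the longest run of misses for a small, medium, and large number of rolls.

```python
import random

# Hypothetical hit chance for illustration only; NOT taken from the game.
HIT_CHANCE = 0.5


def simulate_rolls(n, hit_chance=HIT_CHANCE, seed=None):
    """Return a list of n independent rolls; True means the attack hit."""
    rng = random.Random(seed)
    return [rng.random() < hit_chance for _ in range(n)]


def hit_rate(rolls):
    """Fraction of rolls that were hits."""
    return sum(rolls) / len(rolls)


def longest_miss_streak(rolls):
    """Length of the longest consecutive run of misses."""
    longest = current = 0
    for hit in rolls:
        current = 0 if hit else current + 1
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Compare a handful of battles against a very large number of them.
    for n in (10, 100, 20_000):
        rolls = simulate_rolls(n)
        print(f"{n:>6} rolls: hit rate {hit_rate(rolls):.3f}, "
              f"longest miss streak {longest_miss_streak(rolls)}")
```

Run it a few times: the 10-roll line should swing noticeably from run to run, while the 20,000-roll line should stay close to the configured chance, even though it will usually contain a fairly long miss streak somewhere inside it.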
Again, this has nothing to do with defense being favored in any way from a mechanics standpoint. That's just not the case.
-Kaz