So here's the post.

Normally in an Elo system the expected Elo change from a game is zero (approximately true for 1v1). In team games, however, !predicted win chances are not distinct enough (too close to 50%). Therefore you can systematically increase your team Elo by only playing games with a !predicted win chance > 50%, because in reality that chance is even higher.
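To see why a too-flat prediction is exploitable, here is a minimal sketch of the expected Elo change per game. The function names and the K-factor of 32 are illustrative assumptions, not the actual server implementation:

```python
def elo_win_prob(avg_a, avg_b):
    """Standard Elo win probability for the team with average Elo avg_a."""
    return 1.0 / (1.0 + 10 ** ((avg_b - avg_a) / 400.0))

def expected_elo_change(k, p_predicted, p_true):
    """Expected rating change per game with K-factor k:
    a win gains k*(1 - p_predicted), a loss costs k*p_predicted."""
    return p_true * k * (1 - p_predicted) + (1 - p_true) * k * (0 - p_predicted)

# If the system predicts 55% but the real win chance is 65%,
# the expectation is k*(p_true - p_predicted) = 32 * 0.10 = 3.2 Elo per game:
gain = expected_elo_change(32, 0.55, 0.65)
```

Only when the predicted probability equals the true one does the expectation vanish, which is the property broken for team games.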
How to evaluate prediction systems

In 2014 @KingRaptor published a file with 1195 team games (after 1v1, FFA, coop and uneven-team games had been deleted).
I searched for a better evaluation method for prediction systems, one where unbalanced games are considered and the best strategy is not always guessing 100%. Fortunately I found ingenious scoring rules that punish confident wrong predictions hard, and I have proven that the expected score is maximized by guessing the true probability:
[Spoiler]
- short, German version: https://de.wikipedia.org/wiki/Scoring_rule
- long, English version: https://en.wikipedia.org/wiki/Scoring_rule

You guess a probability p that team 1 (:= the winning team in @KingRaptor's data) wins a game, and you get a score depending on the outcome of the game. I made an affine transformation of the logarithmic scoring rule so that the score for guessing p=50% with any outcome is 0 and the score for guessing p=100% with outcome true is 1 (the score for guessing p=0% with outcome true is minus infinity!): If the outcome is true (team 1 wins), the score is 1+ln(p)/ln(2). (If it is false, the score is 1+ln(1-p)/ln(2), but this is not needed for @KingRaptor's data, because team 1 is defined as the winning team.) To account for unbalanced teams, a new system has to score better than the current one, which calculates the probability that the team with average Elo A beats the team with average Elo B as 1/(1+10^((B-A)/400)). The score to compare is then the average score per game over many games.
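The scoring and the current system's formula above can be sketched like this (function names are my own; the formulas are the ones stated in the text):

```python
import math

def elo_win_prob(avg_a, avg_b):
    """Current system: P(team with avg Elo avg_a beats team with avg Elo avg_b)."""
    return 1.0 / (1.0 + 10 ** ((avg_b - avg_a) / 400.0))

def trans_log_score(p, team1_won=True):
    """Affinely transformed logarithmic scoring rule:
    0 for guessing 50%, 1 for a correct 100% guess,
    minus infinity for a wrong 100% guess."""
    q = p if team1_won else 1.0 - p
    return 1.0 + math.log(q) / math.log(2.0)
```

Averaging `trans_log_score` over a large set of games gives the per-system numbers reported below; in @KingRaptor's data team 1 always won, so only the `team1_won=True` branch is needed.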
Let me know if I should prepare a file (with/without uneven teams, with/without my results) so you can test your own prediction systems (it includes not only the logarithmic but also the Brier scoring rule, transformed to a scale from 0 to 1).
Balance vs. prediction

Team balance is how players are distributed to the teams. Changing the prediction system doesn't change team balance directly, only indirectly via Elo gains and losses. The systems presented here don't change balance directly; my more sophisticated but untested systems do. A term to minimize the standard deviation can be added to any system like this and will only change balance directly, not predictions. Testing a prediction system only really makes sense on teams that were balanced with the corresponding balance system. Other systems will probably predict better on their own balance than a test on the current balance suggests (especially my more advanced ones, which change balance not only indirectly but directly).
Concrete results

!predict has reached an average transformed log score of 0.079 on 1195 games, "sqrt sm" 0.098 and "smes" 0.100 (where 0 is always guessing 50%, 1 is guessing everything 100% right and minus infinity is guessing a single thing 100% wrong). The 2-norm average of the differences in probability predictions between the current system and "smes" is 9%, between "sqrt sm" and "smes" only 2%, even though "smes" is of another type. For example, in B245589 20 on Cooper_Hill_TNM02-V1, !predict gave the winning team 54.6%, "sqrt sm" 64.2% and "smes" 71.6%.
"smes" system

"smes" means size-dependently modified elo sum system. It is based on Elo sums and is the best system tested on those 1195 games. n is the average size of the two opposing teams (for example 2.5 in a 2v3). The probability p is the win probability according to the normal Elo formula with Elo sums instead of Elo averages. It has been shown that calculating the "Elo sums" as n*(average Elo) works better for uneven teams than using the real Elo sums. The final predicted probability is then p^D/(p^D+(1-p)^D), where D is the distinctivity modifier that depends on n (D>1 increases prediction distinctivity, D<1 decreases it), with D(n=1)=1 to leave 1v1 unchanged. Here we use D=0.5+0.5^n.
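Putting the pieces of "smes" together, a minimal sketch (the function name is mine; the formulas follow the description above):

```python
def smes_prob(team1_elos, team2_elos):
    """'smes' prediction: 'elo sums' via n*(average elo), standard Elo
    formula on the sums, then distinctivity modification D = 0.5 + 0.5**n."""
    n = (len(team1_elos) + len(team2_elos)) / 2.0   # average team size
    avg1 = sum(team1_elos) / len(team1_elos)
    avg2 = sum(team2_elos) / len(team2_elos)
    sum1, sum2 = n * avg1, n * avg2                 # "elo sums" from averages
    p = 1.0 / (1.0 + 10 ** ((sum2 - sum1) / 400.0))
    d = 0.5 + 0.5 ** n                              # D(1) = 1 leaves 1v1 unchanged
    return p ** d / (p ** d + (1.0 - p) ** d)
```

Note that for n=1 the exponent D is exactly 1, so a 1v1 prediction reduces to the plain Elo formula.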
"sqrt sm" system

When uneven team games are added (1726 games in total), "sqrt sm" seems to do better, but "smes" and "sqrt sm" have similar results, both better than !predict. "sqrt sm" simply takes the !predicted probability and modifies it with D=sqrt(n).
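As a sketch, "sqrt sm" only needs the distinctivity modification on top of whatever probability !predict already outputs (function names are again my own):

```python
import math

def modify_distinctivity(p, d):
    """Sharpen (d > 1) or flatten (d < 1) a probability: p^d / (p^d + (1-p)^d)."""
    return p ** d / (p ** d + (1.0 - p) ** d)

def sqrt_sm_prob(p_predicted, n):
    """'sqrt sm': apply D = sqrt(n) to the !predicted probability,
    where n is the average team size."""
    return modify_distinctivity(p_predicted, math.sqrt(n))
```

Since sqrt(1)=1, this too leaves 1v1 predictions unchanged, while larger teams get pushed further from 50%.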