So here's [url=http://zero-k.info/Forum/Thread/20071?page=3#143529]the post[/url]. Normally in an elo system the expected elo change from a game is zero (approximately true for 1v1). In team games, however, the win chances given by !predict are not distinct enough (too close to 50%). You can therefore raise your team elo systematically by only playing games where !predict gives your team more than 50%, because the real win chance is even higher.
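
To put a number on that argument, here is a minimal sketch. It assumes the textbook elo update (rating change = K * (result - predicted chance)); the K value of 32 and the function name are placeholders of mine, not taken from the site's actual rating code.

```python
def expected_elo_change(predicted, actual, k=32.0):
    """Average elo change per game for a team.

    predicted: win chance the rating system assumes for the team
    actual:    the team's real win chance
    k:         hypothetical K-factor (assumption, not the site's value)
    """
    gain_on_win = k * (1.0 - predicted)   # result = 1
    loss_on_defeat = k * (0.0 - predicted)  # result = 0
    return actual * gain_on_win + (1.0 - actual) * loss_on_defeat


# Prediction matches reality: no systematic drift.
print(expected_elo_change(0.60, 0.60))
# Prediction too close to 50%: the favoured team gains elo on average.
print(expected_elo_change(0.55, 0.65))
```

Under this assumption the expected change simplifies to K * (actual - predicted), so it is positive whenever the favoured team's real chance is higher than the predicted one.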

[quote][url=http://zero-k.info/Forum/Thread/7402?page=1#89655]I want you (...) to come up with something more complex to prove that springie balance is fundamentally wrong[/url]
@Yogzototh [/quote][quote][url=http://zero-k.info/Forum/Thread/20071?page=4#143648]We should make this into a mini competition.. someone provide a standard data set of eg. 1000 3v3+ games in an easy to use Excel format and we all apply krazy math to achieve the best prediction.[/url]
@[GBC]1v0ry_k1ng [/quote]
[b]How to evaluate prediction systems[/b]
In [url=http://zero-k.info/Forum/Thread/7402?page=1]2014[/url] @KingRaptor published a file with 1195 team games (1v1, FFA, coop and uneven-team games had been removed).

I searched for a better way to evaluate prediction systems, one that takes unbalanced games into account and where always guessing 100% is not the best strategy. Fortunately I found scoring rules that heavily punish wrong high-probability predictions, and I have proven that the expected score is maximized by guessing the true probability:
[spoiler] - short, German version: [url=https://de.wikipedia.org/wiki/Scoring_rule]https://de.wikipedia.org/wiki/Scoring_rule[/url]
- long, English version: [url=https://en.wikipedia.org/wiki/Scoring_rule]https://en.wikipedia.org/wiki/Scoring_rule[/url]
You guess a probability p that team 1 (:= the winning team in @KingRaptor's data) wins a game, and you get a score depending on the outcome of the game. I applied an affine transformation to the logarithmic scoring rule so that guessing p=50% scores 0 for either outcome and guessing p=100% when team 1 wins scores 1 (guessing p=0% when team 1 wins scores minus infinity!). If team 1 wins, the score is [b]1+ln(p)/ln(2)[/b]. (If it loses, the score is 1+ln(1-p)/ln(2), but this is not needed for @KingRaptor's data, because team 1 is defined as the winning team.) To account for unbalanced teams, you have to score better than the current system, which calculates the probability that a team with average elo A beats a team with average elo B as 1/(1+10^((B-A)/400)). Systems are compared by their average score per game over many games.[/spoiler]Let me know if I should prepare a file (with/without uneven teams, with/without my results) for testing your own prediction systems (it includes not only logarithmic but also Brier scoring, transformed to a scale from 0 to 1).
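
For anyone who wants to score their own system, the formulas above can be sketched in a few lines of Python. The function names are my own; the Brier variant is only an assumption about what "transformed to a scale from 0 to 1" could look like (same anchor points as the log score: 0 for guessing 50%, 1 for a correct 100%).

```python
import math


def elo_win_probability(avg_elo_a, avg_elo_b):
    """Current system: chance that the team with average elo A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((avg_elo_b - avg_elo_a) / 400.0))


def trans_log_score(p, team1_won=True):
    """Transformed log score: 0 for p=50%, 1 for a correct p=100%."""
    q = p if team1_won else 1.0 - p  # probability given to the real outcome
    if q <= 0.0:
        return -math.inf  # 100% confident and wrong
    return 1.0 + math.log(q) / math.log(2.0)


def trans_brier_score(p, team1_won=True):
    """ASSUMED Brier transformation, not necessarily the one in my file."""
    q = p if team1_won else 1.0 - p
    return 1.0 - 4.0 * (1.0 - q) ** 2


def average_score(predictions, score=trans_log_score):
    """Mean score over predictions for the winning team (team 1)."""
    return sum(score(p) for p in predictions) / len(predictions)


# Three made-up predictions for the winning team, only to show the call.
print(average_score([0.55, 0.70, 0.40]))
```

A system is better than the current one if its average over the same games is higher than the average you get from elo_win_probability.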

[b]Balance vs. prediction[/b]
Team balance is how players are distributed among the teams. Changing the prediction system doesn't change team balance directly, only indirectly by changing elo gains and losses. The systems presented here don't change balance directly; my more sophisticated but untested systems do. A term that also minimizes standard deviation can be added to any system [url=http://zero-k.info/Forum/Thread/7402?page=1#89751]like this[/url]; it changes only balance directly, not predictions. Testing a prediction system only really makes sense on teams that were balanced with the corresponding balance system. Other systems will probably predict better on their own balance than a test on the current balance suggests (especially my more advanced ones, which change balance directly as well as indirectly).

[b]Concrete results[/b]
!predict reaches an average transformed log score of 0.079 on the 1195 games, "sqrt sm" 0.098 and "smes" 0.100 (where 0 is always guessing 50%, 1 is guessing everything right with 100%, and minus infinity is guessing even a single game wrong with 100%). The 2-normed average of the differences between the predicted probabilities of !predict and "smes" is 9%; between "sqrt sm" and "smes" it is only 2%, even though "smes" is a different type of system. For example, in @B245589 !predict gave the winning team 54.6%, "sqrt sm" 64.2% and "smes" 71.6%.

[b]"smes" system[/b]
"smes" stands for size-dependently modified elo sum system. It is based on elo sums and is the best system tested on those 1195 games. n is the average size of the two opposing teams (for example 2.5 in a 2v3). The probability p is the win probability according to the normal elo formula with elo sums instead of elo averages. Calculating the "elo sums" as n*(average elo) has turned out better for uneven teams than using real elo sums. The final predicted probability is then (p^D)/(p^D+(1-p)^D), where D is the distinctivity modifier that depends on n (D>1 increases prediction distinctivity, D<1 decreases it) with D(n=1)=1 (to leave 1v1 unchanged). Here we use D=0.5+0.5^n.
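
A minimal Python sketch of the "smes" steps as described above. The function names and the list-of-elos input format are illustrative choices, not code from the site.

```python
def elo_probability(rating_a, rating_b):
    """Normal elo formula: chance that side A beats side B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def modify_distinctivity(p, d):
    """(p^D)/(p^D+(1-p)^D): D>1 pushes p away from 50%, D<1 towards it."""
    return p ** d / (p ** d + (1.0 - p) ** d)


def smes_probability(elos_a, elos_b):
    """Predicted chance that team A wins, given each team's player elos."""
    n = (len(elos_a) + len(elos_b)) / 2.0  # average team size, 2.5 in a 2v3
    avg_a = sum(elos_a) / len(elos_a)
    avg_b = sum(elos_b) / len(elos_b)
    # "elo sums" taken as n * (average elo) rather than the real sums
    p = elo_probability(n * avg_a, n * avg_b)
    d = 0.5 + 0.5 ** n  # equals 1 for n=1, so 1v1 stays unchanged
    return modify_distinctivity(p, d)


# Made-up 3v3 where team A averages 100 elo more than team B.
print(smes_probability([1600, 1700, 1500], [1500, 1600, 1400]))
```

Note that the modification is the same as multiplying the elo difference by D before putting it into the normal elo formula, because (p^D)/(p^D+(1-p)^D) = 1/(1+((1-p)/p)^D).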

[b]"sqrt sm" system[/b]
When uneven team games are added (1726 games in total), "sqrt sm" seems to do better, but "smes" and "sqrt sm" give similar results, both better than !predict. "sqrt sm" simply takes the probability from !predict and modifies it with D=sqrt(n).
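
As a sketch, "sqrt sm" is a one-liner on top of whatever !predict returns. The function name is illustrative, and the team size of 10 in the example call is an assumption (the size of the game in @B245589 is not stated above).

```python
import math


def sqrt_sm_probability(predicted, n):
    """Modify a !predict probability with D = sqrt(n).

    predicted: win chance from !predict (based on elo averages)
    n:         average size of the two teams
    """
    d = math.sqrt(n)
    return predicted ** d / (predicted ** d + (1.0 - predicted) ** d)


# 54.6% from !predict with an ASSUMED n of 10.
print(sqrt_sm_probability(0.546, 10))
```

With that assumed n the result is roughly 64%, in line with the "sqrt sm" value quoted for that game, and for n=1 the prediction stays as it is.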