Align X-style judging with freestyle judging methodology

This discussion has an associated proposal. View Proposal Details here.

Comments about this discussion:

Started 11 May 05:44

When comparing X-style to Freestyle or Flatland it's pretty clear we're looking for something different in X-style. Compared to freestyle it's a much more technical competition and compared to flatland there's a lot more space and a different selection of skills being shown. I think those differences are what has made X-style such a popular competition and why there is a space for it.

With that said I think the way X-style is judged seems rather weird:

– it's clear from some of the discussions that "difficulty" is understood differently amongst different people here
– having competitors judge opens the door to bias (like if half of them like a certain style) and being a great unicyclist doesn't necessarily mean one is a great judge
– the chief judge can't pick the best judges, again cheapening the competition
– raw placement points have a number of known weaknesses, which is why they've been replaced in freestyle and flatland
– since the judging table is made up of competitors the chief judge can't stamp out bias and there's little guarantee that the judging tables in different competitions will come to similar conclusions

As much as the judging system is unique to X-style I don't think it makes the competition better. It seems to me it's a quirk riders simply accept for the competition, rather than something that makes them choose the competition over other competitions.

For what it's worth my personal strategy has always been to simply choose the tricks most unlike what the other riders choose and overwhelm the judges that way. At Unicon in Korea the top 5 riders all had completely different styles, and while I expect someone who has signed up to judge or been picked to judge to know how, say, a coast compares to a unispin trick, we can't expect that of riders.

So my proposal is pretty simple: let's discuss which 3-4 things make up "difficulty" and assign them each weights. The judges will then give points for each category from 1-10. We can then either use the raw points or use percentages like freestyle. The chief judge should choose the best judges available. The results are shared with riders so they can know which areas they got a lot of points in and where they're weak (say, if someone has low variation they'll work on that before their next competition).
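To make the idea concrete, here is a minimal sketch of how weighted category points could be combined. The category names and weights are made up for illustration (the thread has not agreed on any), and the percentage step assumes the freestyle-style approach of restating each rider's score as a share of one judge's total.

```python
# Hypothetical categories and integer weights (they sum to 10); the thread has
# not agreed on either, so treat these purely as placeholders.
WEIGHTS = {"trick_difficulty": 5, "variation": 3, "volume": 2}


def weighted_score(points):
    """Combine one judge's 1-10 category points into a weighted score on the same scale."""
    if set(points) != set(WEIGHTS):
        raise ValueError("points must cover exactly the weighted categories")
    for category, value in points.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{category} must be between 1 and 10")
    return sum(WEIGHTS[c] * points[c] for c in WEIGHTS) / sum(WEIGHTS.values())


def as_percentages(scores):
    """Restate one judge's scores as each rider's share of that judge's total points."""
    total = sum(scores.values())
    return {rider: 100 * score / total for rider, score in scores.items()}


# Two invented riders scored by one judge.
rider_a = weighted_score({"trick_difficulty": 8, "variation": 6, "volume": 4})
rider_b = weighted_score({"trick_difficulty": 6, "variation": 8, "volume": 8})
shares = as_percentages({"A": rider_a, "B": rider_b})
```

Keeping the per-category points alongside the combined score is what would let a rider see, for example, that low variation cost them the placing.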

X-style is a fun competition but it doesn't need to reinvent how a competition like it is judged. It's perfectly reasonable to disagree on which aspects of a routine should give more points, but we owe it to our competitors that they can take a look at the results and know where to improve. We also owe it to our competitors that the same routine in different competitions will be judged the same and we have a hard time with that as it stands.

Comment 11 May 06:57

Magnus you wrote: "raw placement points have a number of known weaknesses, which is why they've been replaced in freestyle and flatland" - but what are the weaknesses in X-Style? There were undisputed problems in freestyle, but they were mainly related to the fact that there are two categories, performance and technical, which are judged by separate judges - there is no such thing in X-Style. The problems known from freestyle, which are also of a mathematical nature, are therefore not transferable to X-Style.

In my experience from a lot of competitions, the judges in X-Style are very unanimous regarding the order of the riders. In freestyle I have often seen results with a much larger spread. Personally I think that the judging principle in X-Style works very well and produces reliable results.

Comment 11 May 07:37

I agree with Jan.

Comment 12 May 05:17

I also agree with Jan

Comment 12 May 10:51

On the topic of the limitations of placement points: placement points mean the judges can't rule that some people were closer in level than other people (especially when ties are removed). When judging you often see that people occupy different "tiers" and while it may be hard to decide the order of riders A, B and C one might know that they're each a tier above riders D and E. Now, if other judges disagree and might place rider D with riders A, B and C the end results may not reflect that everyone agrees A, B and C are almost equal. The fact that we can produce an order and figure out who to either send to the next round or to award does not mean we shouldn't try to make the judging better.

At Unicon in Korea I was ranked one position higher than Ryan in the prelim round and one position lower in the final. Since the runs in either round were fairly similar it leaves us to wonder why we ended up being ranked differently in the two rounds. If there were judges who ranked us very differently from each other (like 2 and 5, with 2 between), the change in starting group would have made a difference in the internal results – something that wouldn't happen with a 1-10 point system.

Anyways, even if placement points perfectly rank riders, they still don't give much insight into where to improve. In freestyle it's great to be able to see the results separately for technical, performance and dismounts – not only does it help to understand why the results were as they were, it also allows the rider to pinpoint areas for improvement. Sometimes the results are really close, sometimes there's one rider who really pulled ahead in a certain category, sometimes someone just didn't show as many tricks as other riders – knowing what made the difference allows the rider to set a direction for their continued improvement.

Another thing is that we specifically make it harder to add nuance to the rules as they are now. It's pretty clear that a big part of difficulty is variation and the volume of tricks, and we're adding some language to that effect. In practice judges can disregard that and judge difficulty as "difficulty of most impressive trick" or "difficulty of longest combo" or similar (I don't mean to suggest ill intent, but judges need to consider a load of things before ranking riders). Actually asking judges to produce points for the different aspects of the routine will ensure whatever nuance we agree should be reflected in the rankings actually gets considered.

Comment 14 May 12:23

I think difficulty and variation and the volume of tricks are different things - of course a high variation and a high number of tricks can indicate a high difficulty, but I can also achieve a high variation and a high number of tricks with a lot of different easy tricks - therefore X-Style is not about these aspects, but about the overall difficulty shown in the run. And the rules also say that X-Style is not about the "difficulty of most impressive trick" or "difficulty of longest combo" or similar. They say: "[...] the difficulty of the shown skills." and to make it quite unambiguous we want to change that to: "[...] the difficulty of the shown skills in each run. When comparing the difficulty of runs, judges should take into consideration the difficulty of each competitor's run as a whole."

The judge has only to rank the riders in order of the shown difficulty. You definitely can't compare this to freestyle, where there are a lot of different categories, which are all scored.

I agree that one judge alone can not rule that some people were closer in level than other people - but I don't think this is necessary in X-Style either, because there are no different categories which are judged by different judges and put together afterwards.

How close two riders are together can be seen indirectly by the sum of the placement points - the closer two riders are to each other, the more likely it is that different judges will award different placings and thus the sums of the placement points for similar riders will be closer together than for riders who clearly differ.
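A small sketch of that effect, assuming the total is a plain sum of each judge's placements with no tie handling or other adjustments (the actual rulebook calculation may differ). The riders and rankings are invented.

```python
def placement_sums(rankings):
    """Sum each rider's placement (1 = best) over all judges; lower totals rank higher."""
    totals = {}
    for ranking in rankings:
        for place, rider in enumerate(ranking, start=1):
            totals[rider] = totals.get(rider, 0) + place
    return totals


# Three hypothetical judges: they disagree on A versus B, but agree on C and D.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["A", "B", "C", "D"],
]
totals = placement_sums(rankings)
# A: 1 + 2 + 1 = 4, B: 2 + 1 + 2 = 5, C: 3 + 3 + 3 = 9, D: 4 + 4 + 4 = 12
```

The gap between A and B (1 point) is much smaller than the gap between B and C (4 points), even though no single judge stated how close any two riders were.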

And of course it is possible that the ranking order between the preliminaries and the finals changes - but a new run is shown (which, even if it is similar to the preliminaries, will definitely never be identical) and there are also other judges sitting in the jury, so that a new ranking order can be produced with every judging system.

Comment 15 May 10:23

Those are good points. I only think it makes sense to compare X-style judging to technical judging in freestyle and flatland judging without last trick, and not the scoring there as a whole. For both those scores it has been decided to divide the score up in categories. X-style has always been the place to show off high difficulty and I think it makes sense to continue with that. Still, the understanding of how to judge difficulty has always leaned on the way it's understood for freestyle (ie variation and mastery are disregarded).

We've now added language that it's the routine as a whole. We've also made a note of the fact that tricks done longer should count more. There's a discussion if different trick categories should count more (harder to do a cool dragseat combo than a cool standup combo). I think it would be good to be even more specific in what counts towards difficulty, especially considering the whole run should be taken into account. For instance most riders use the entire time and show tricks the entire time – it's hard(er) to score that against someone doing only a few crazy tricks. Some guideline would be great.

The point about closer riders being switched in order by more judges is true. We still don't allow any individual judge to offer the opinion that certain riders are closer together. There's no problem in certain judges feeling certain riders are closer together and normalizing it when the scores are calculated.

And I fully agree that different runs are shown in each round and so they can't be compared. Different judging tables should come to similar conclusions on the same rides though, and the current ranking system amplifies the differences in opinions that invariably exist. I find that unfortunate.

Comment 16 May 07:23

I agree that using placement points has limitations and it does not allow for judges to show the distribution of riders' abilities, only the rank. However, using ranking allows for much faster judging. As a performer and an audience member, I hate how long judging for freestyle takes. I prefer the simple style of judging used in X-Style.

Comment 17 May 12:06

I don't really see X-style judging or Freestyle judging as much different in terms of how quickly results can be announced. The judges write some notes on a piece of paper and tabulate scores at the end. Ultimately it's up to a computer to calculate the actual score.

The main difference is that freestyle judges must write down scores after each performance whereas X-style judges must hold all the routines in their head and can by definition only give definite scores after the end of all the routines. For what it's worth flatland has a similar system to what I'm proposing and they have no problem writing down scores and having little break between performances in the prelims.

Comment 19 May 19:14

There doesn't seem to be unanimous support for this. Magnus, please indicate if you are planning on creating a proposal. Otherwise this discussion should be closed.

Comment 20 May 06:17

Yes, you're right. I'll make a proposal

Comment 20 May 06:39

I've made a proposal. We could consider not normalizing scores although I think that would be a big difference. I've also added a note that the final placements are by points with the person with the most points winning. Currently it's not actually specified explicitly who wins the final (although it should be self explanatory).

Comment 21 May 13:34

If the main purpose of the proposal is that riders should be able to see from the result how large the differences between the individual riders are, then there is a much better way than totaling each judge's score and restating it as a percentage of that judge's total given points for each rider. Because with this method the percentages and also the differences depend very strongly on the size of the start group and are therefore almost meaningless...

Comment 21 May 14:05

The purpose is twofold: to allow judges to give points that are closer together and to allow people to see how far they're apart. It only makes sense to compare scores within a starting group anyways, but I can see there might be some problems that the percentages are going to be more unevenly distributed if someone gives points 2-5 than 7-10 (since 5 is 2.5x2 but 10 is only 1.5x7). I made the proposal this way to make the change from now smaller (relative preference in both cases) but I can see it could make more sense to make it a straight 0-10 scoring system.

Comment 21 May 15:26

I absolutely don't understand what you're trying to say with "but I can see there might be some problems that the percentages are going to be more unevenly distributed if someone gives points 2-5 than 7-10 (since 5 is 2.5x2 but 10 is only 1.5x7)".

But as long as you want to convert the points given by the judges into percentages like in freestyle, the overall percentage value and therefore the differences you'd like to make visible to the riders will remain extremely dependent on the size of the starting group. I very much doubt that a rider can really imagine how far his performance is from that of the other rider by using the difference of two percentages calculated in this way, simply because the mathematics used here are unsuitable for this.
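The dependence on group size can be illustrated with a short sketch. It assumes the percentage is simply each rider's share of one judge's total points, and the riders and points are invented.

```python
def percentage_shares(points):
    """Each rider's share (in percent) of the total points one judge handed out."""
    total = sum(points.values())
    return {rider: 100 * p / total for rider, p in points.items()}


# Riders A and B get the same raw points from the judge in both groups.
small_group = {"A": 8, "B": 6, "C": 4}
large_group = {"A": 8, "B": 6, "C": 4, "D": 4, "E": 4, "F": 4}

small = percentage_shares(small_group)
large = percentage_shares(large_group)

gap_small = small["A"] - small["B"]  # 2 of 18 points, about 11.1 percentage points
gap_large = large["A"] - large["B"]  # 2 of 30 points, about 6.7 percentage points
```

The same 2-point raw difference between A and B shrinks as a percentage gap once more riders share the total.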

The individual judges don't need to make clear how close two riders are to each other, because as I've already written, within a ranking system this becomes apparent from how "clearly" the different judges distribute the placings. The bigger the differences, the clearer the ranking distribution and thus the larger the distance in the total ranking points.

For both purposes, the proposed rule is either not really suitable or unnecessary.

Comment 21 May 17:14

Restating what didn't make sense. If there are 3 riders in a group and they get the scores 6-4-2, one of them will get 50% and the others 33% and 17%. If instead they got the scores 10-8-6 there would again be 2 raw points difference but the top rider would now get about 42% instead of 50%. If a judge tends to give higher points, the relative differences mean less.
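The arithmetic for the two score sets can be checked with a few lines, again assuming each percentage is the rider's share of that judge's total points, rounded to one decimal.

```python
def percentage_shares(scores):
    """Each score as a percentage of the judge's total, rounded to one decimal."""
    total = sum(scores)
    return [round(100 * s / total, 1) for s in scores]


low_scoring = percentage_shares([6, 4, 2])     # [50.0, 33.3, 16.7]
high_scoring = percentage_shares([10, 8, 6])   # [41.7, 33.3, 25.0]
```

Both judges separate neighbouring riders by 2 raw points, yet the percentage gap between first and last is 33.3 points in the first case and 16.7 in the second.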

As for normalizing scores they can be normalized in a bunch of ways and percentages are just the best way to show that the score is normalized. You obviously need to consider how many other riders are in the group but most people should know enough about percentages that it wouldn't be a problem. Regardless no scores are made available to riders as it is now.