One of the biggest complaints I see all the time on the Trove forums is about the random number generator (RNG).

I have a doctorate in statistics (with an emphasis on data mining and machine learning) and have been a teaching assistant and instructor for statistics classes. What I have learned is that most people know very little about statistics. People who have taken an introductory statistics course may have memorized formulas, but their intuition is usually not developed enough to use the information properly. This post is meant to educate players, developers, and anyone else who is interested about statistics and how it applies to RNG.

With the recent patch regarding dragon fragments and the many dragons, RNG is once again a hot topic. But this topic has always been hot because of primordial dragon eggs, decent gem drop rates, etc...

There are two main complaints about RNG:

1) The length of time it takes to complete a task (e.g. collecting a primordial dragon egg).

2) The instability of RNG (some players may have 6 primordial dragon eggs while others have none, even though they opened the same number of empowered gem boxes).

Both of these concepts relate to one probability distribution: the negative binomial distribution.

To understand the negative binomial distribution, a quick summary of the binomial distribution is needed.

Binomial distribution: "n" independent trials, each with a probability "p" of being a success. Everything in Trove is based on the binomial distribution when it comes to RNG. For example, if I mine 1000 glacial blocks, each block has some probability of dropping a fragment. The number of trials is n = 1000 and the probability of getting a fragment from one block is "p". The value of "p" is unknown and typically has to be estimated, but probability curves can be calculated for many different values of "p".
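To make the glacial block example concrete, here is a small sketch of the binomial probabilities for n = 1000. The drop rates in the loop are made-up values for illustration only; I don't know Trove's real ones.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 1000  # glacial blocks mined
# Hypothetical drop rates, NOT Trove's actual values.
for p in (1/100, 1/1000, 1/3000):
    none = binomial_pmf(0, n, p)
    print(f"p = {p:.5f}: P(0 fragments) = {none:.3f}, "
          f"P(at least 1) = {1 - none:.3f}")
```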

Now the important distribution: the negative binomial distribution. It uses two parameters: "p", which is the probability of success exactly as defined in the binomial distribution, and "r", which is the number of desired successes. If you want to get 5 ice dragon fragments, then "r" = 5. The negative binomial distribution gives the probability for the number of trials "n" it takes to reach "r" successes. If we knew the drop rate, we could draw the distribution curve and calculate the probability of getting 5 dragon fragments within "n" trials. Using these parameters, the mean and variance of the probability distribution can be calculated.
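As a rough sketch of "the probability of getting 5 fragments within n trials": reaching r successes by trial n is the same event as having at least r successes in n binomial trials, so it can be computed from the binomial formula. The function name and the drop rate p = 1/3000 are my own assumptions for illustration.

```python
from math import comb

def prob_within(n, r, p):
    """P(the r-th success happens on or before trial n)."""
    # Complement of "fewer than r successes in n binomial trials".
    fewer = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                for k in range(min(r, n + 1)))
    return 1 - fewer

r, p = 5, 1/3000  # p is a hypothetical drop rate
for n in (5000, 15000, 30000):
    print(f"within {n} trials: {prob_within(n, r, p):.3f}")
```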

What are the mean and variance? The mean relates to how long a task is expected to take until it is accomplished (answers question 1), and the variance relates to the stability of the task (answers question 2). A lower mean means fewer trials are needed to complete the task; a lower variance means the task is more stable (Trove players will complete the task in about the same number of trials).

Let me reiterate, because these concepts show that RNG itself is not bad. RNG is used in almost every game. What is bad is the implementation of RNG, and this can be adjusted according to those two questions.

Let's look at the mean and variance formulas for the negative binomial distribution. There are variations of these formulas depending on the specific parameterization of the negative binomial; the ones below count the total number of trials.

mean (lower means shorter) = r*((1-p) / p + 1)

variance = r * (1-p)/(p^2)
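If you don't trust formulas, a quick simulation is an easy sanity check. This is only a sketch: the values r = 5 and p = 0.2 are arbitrary, and the helper names are mine. It rolls trials one at a time until r successes show up, repeats that many times, and compares the simulated mean and variance to the two formulas.

```python
import random
from statistics import mean, pvariance

def negbin_mean(r, p):
    return r * ((1 - p) / p + 1)  # simplifies to r / p

def negbin_variance(r, p):
    return r * (1 - p) / p**2

def trials_until(r, p, rng):
    """Run trials one at a time until r successes; return the trial count."""
    trials = successes = 0
    while successes < r:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

rng = random.Random(0)  # fixed seed so the run is repeatable
r, p = 5, 0.2           # arbitrary illustration values
samples = [trials_until(r, p, rng) for _ in range(20000)]

print("formula  :", negbin_mean(r, p), negbin_variance(r, p))
print("simulated:", mean(samples), pvariance(samples))
```

The simulated numbers will not match the formulas exactly, but they should land close to them.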

These formulas are very intuitive. If we only need 1 dragon egg fragment and the probability of success is p = 0.5, the mean is equal to 2, which means it'll take 2 trials on average to get that 1 fragment.

If we only need 1 dragon egg fragment and the probability of success is p = 1/3, the mean is equal to 3, which means it'll take 3 trials on average to get that 1 fragment.

We see an inverse relationship: as p decreases (as in the two examples), the number of trials increases, which is very intuitive. I'm sure the Trove developers and all of us know this, but what is more important is the variance, because it relates to the stability of RNG.

If r = 1, p = 0.5, then the variance is 2

If r = 1, p = 1/3, then the variance is 6

If r = 1, p = 1/6 then the variance is 30
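The three variances above can be reproduced exactly with fractions instead of floating point. A minimal sketch (the extra p = 1/3000 row is a hypothetical low drop rate I added to show how fast the variance grows):

```python
from fractions import Fraction

def negbin_variance(r, p):
    return r * (1 - p) / p**2

# The last value is a hypothetical low drop rate, not a known Trove rate.
for p in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(1, 3000)):
    print(f"r = 1, p = {p}: variance = {negbin_variance(1, p)}")
```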

Using these formulas, it's clear that as the probability decreases, the instability of RNG grows roughly like 1/p^2: for small p, halving the probability roughly quadruples the variance. This is the problem with time-gated content such as diamond dragon eggs. The probability of getting a diamond dragon egg is ridiculously low and we only get a limited number of trials per week, so not only is this RNG unstable, we also can't make up for it by running more trials.

How is this information useful? This is the key concept of this entire thread. If you don't read or understand anything else, then read this paragraph. Someone suggested earlier in the week to increase the drop probability of the new dragon fragments by a factor of 10 while requiring r = 50 dragon fragments instead of r = 5. He suggested this because it would give us a sense of progression for about the same amount of time, but what was not realized is that one very important property comes with this change.

Here are the two scenarios:

1) p = p, r = 5

2) p = 10p, r = 50

*A side note: while the means come out to be the same, you can't just take any probability and multiply it by any constant. 10*0.5 = 5, which is not a valid probability.

1) The mean is 5*((1-p) / p + 1)

2) The mean is 50*((1-10p)/(10p) + 1)

For a probability of p = 1/3000, the two means are:

1) 15000

2) 15000

However, let's see what happens to the stability of RNG (still with p = 1/3000):

1) The variance is 5*(1-p)/p^2 = 44,985,000

2) The variance is 50*(1-10p)/(10p)^2 = 4,485,000

We see that the variance is decreased by about a factor of 10, making RNG much more stable than in the previous scenario even though it takes the same amount of time on average. It seems the biggest problem with the RNG implementation in Trove is not necessarily the length of time but the stability, and the stability can be fixed by these principles.
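For anyone who wants to check the two scenarios themselves, here is a short sketch using exact fractions. It assumes the same p = 1/3000 used above, which is an example value and not a confirmed drop rate.

```python
from fractions import Fraction

def negbin_mean(r, p):
    return r * ((1 - p) / p + 1)

def negbin_variance(r, p):
    return r * (1 - p) / p**2

p = Fraction(1, 3000)  # example drop rate from the post, not a confirmed one
scenarios = {
    "1) r = 5,  p":   (5, p),
    "2) r = 50, 10p": (50, 10 * p),
}
for name, (r, q) in scenarios.items():
    print(f"{name}: mean = {int(negbin_mean(r, q)):,}, "
          f"variance = {int(negbin_variance(r, q)):,}")

ratio = negbin_variance(5, p) / negbin_variance(50, 10 * p)
print(f"variance ratio = {float(ratio):.2f}")
```

The ratio is not exactly 10 because of the (1-p) versus (1-10p) terms, but for small p it is very close.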

------

Bonus section: variance and stability. In the section above I talked about variance and stability but didn't show the exact relationship between the two. Using concepts such as the central limit theorem and the margin of error, we can relate the variance to an approximate 95% interval for the number of trials. That is, about 95% of Trove players will get their "r" successes within an interval n = (m - w, m + w), where m is the mean and the half-width w is about 1.96 times the square root of the variance (both can be calculated for different values of p). The approximation is rough when r is small, because the distribution is skewed. When the variance is reduced by a factor of 10 while the mean stays the same, the half-width shrinks by a factor of sqrt(10), so the interval becomes n = (m - w/sqrt(10), m + w/sqrt(10)), or roughly n = (m - w/3.2, m + w/3.2). A significant amount of stability is gained.
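Here is a rough sketch of that interval for the two scenarios, using the normal approximation (mean plus or minus about 1.96 standard deviations). Treat it as a ballpark only: with r as small as 5 the real distribution is skewed, and p = 1/3000 is again just an example value. The function name is mine.

```python
from math import sqrt

def approx_95_interval(r, p):
    """Normal-approximation 95% interval for the trials needed to get r successes."""
    m = r / p                    # mean number of trials
    sd = sqrt(r * (1 - p)) / p   # square root of the variance
    half_width = 1.96 * sd
    return m - half_width, m + half_width

p = 1 / 3000  # example drop rate, not a confirmed one
for label, r, q in (("1) r = 5,  p", 5, p), ("2) r = 50, 10p", 50, 10 * p)):
    low, high = approx_95_interval(r, q)
    print(f"{label}: roughly {low:,.0f} to {high:,.0f} trials")
```

Both intervals are centered on the same mean; only the width changes, by about a factor of sqrt(10).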