Primer on Conditional Probability
When testing RSI, I used conditional probability as my statistical tool, thinking that keeping it to high-school-level statistics would make the analysis easier to follow. However, how many of us remember high school statistics?
Visual Statistics
This is a Venn Diagram. They're useful for visualizing statistical problems. The circle labelled "A" contains all of the times that event "A" occurs. That event could be anything you care to measure, but in our case it'll be the times we get a positive or negative return over some future time period.
The overlap of the circles A and B is called the joint probability. It contains all of the times that A and B occur together.
Calculating Probabilities
Just like you'd calculate the probability of getting heads on a fair coin by dividing the number of heads you've seen by the total number of flips, the joint probability is calculated by counting all of the times the two events occur together and dividing by the total number of observations.
The conditional probability adjusts this joint probability. Conditional probability asks: for all of the times that B happened, how often did both A and B occur together? To calculate this, we take the count of times A and B occurred together and divide by the count of times B occurred. Equivalently, we take the joint probability and divide by the probability of B. That's conditional probability.
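The two counts can be sketched in a few lines of Python. This is only an illustration: the function names and the eight made-up observations are mine, with A standing for "the future return was positive" and B for "some signal fired".

```python
def joint_probability(a, b):
    """Fraction of all observations where A and B both occurred."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if not a:
        raise ValueError("need at least one observation")
    both = sum(1 for x, y in zip(a, b) if x and y)
    return both / len(a)


def conditional_probability(a, b):
    """P(A | B): of the observations where B occurred, the fraction where A also occurred."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    count_b = sum(1 for y in b if y)
    if count_b == 0:
        raise ValueError("B never occurred, so P(A | B) is undefined")
    both = sum(1 for x, y in zip(a, b) if x and y)
    return both / count_b


# Hypothetical data, one entry per observation.
a = [True, False, True, True, False, True, False, False]  # A occurred?
b = [True, True, False, True, False, True, False, False]  # B occurred?

print(joint_probability(a, b))        # A and B together in 3 of 8 observations -> 0.375
print(conditional_probability(a, b))  # A in 3 of the 4 observations where B occurred -> 0.75
```

Note that both functions count the same overlap; only the denominator changes.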
The marginal probability is the probability that an event occurs, regardless of anything else. Think of the probability of getting heads on a coin flip, regardless of the wind conditions or the person flipping it. If the conditional probability of event A given that B occurs is the same as the marginal probability of A occurring, then these events are said to be statistically independent: they don't depend on each other.
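For reference, the three quantities can be written compactly. The notation is my own shorthand rather than anything from the charts: N is the total number of observations, and N(·) counts the observations in which an event occurred.

```latex
\begin{align*}
P(A \cap B)  &= \frac{N(A \cap B)}{N}                                  && \text{joint probability} \\
P(A \mid B)  &= \frac{N(A \cap B)}{N(B)} = \frac{P(A \cap B)}{P(B)}    && \text{conditional probability} \\
P(A)         &= \frac{N(A)}{N}                                         && \text{marginal probability} \\
P(A \mid B)  &= P(A) \iff P(A \cap B) = P(A)\,P(B)                     && \text{independence (for } P(B) > 0\text{)}
\end{align*}
```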
Statistical Edge and Random Noise
I like to call the difference between the conditional probability and the marginal probability "edge": how much more (or less) likely A becomes once we know B occurred. If the edge is close to zero, the signal is probably just noise. If it's big, it might be a real signal.
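As a rough sketch, the edge can be computed directly from two equal-length lists of booleans. The function name `edge` and the sample data are assumptions made for illustration, and I take the edge as conditional minus marginal so that a positive value means B makes A more likely.

```python
def edge(a, b):
    """Edge = P(A | B) - P(A), from two equal-length sequences of booleans."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    a_when_b = [x for x, y in zip(a, b) if y]
    if not a_when_b:
        raise ValueError("B never occurred, so the edge is undefined")
    conditional = sum(1 for x in a_when_b if x) / len(a_when_b)
    marginal = sum(1 for x in a if x) / len(a)
    return conditional - marginal


# Hypothetical data: A happens half the time overall, but 3 times out of 4 when B occurs.
a = [True, False, True, True, False, True, False, False]
b = [True, True, False, True, False, True, False, False]
print(edge(a, b))  # 0.75 - 0.5 = 0.25

# Independent events: knowing B tells us nothing about A, so the edge is zero.
print(edge([True, False, True, False], [True, True, False, False]))  # 0.5 - 0.5 = 0.0
```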
In the real world we often have small sample sizes. If you flip a fair coin 1000 times, you might conclude that the probability of getting heads is 53%! We know it should be 50%, but you haven't taken enough samples. You can experiment with this concept here.
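A quick simulation makes the point. This is a minimal sketch using Python's standard `random` module; `heads_fraction` is a name I made up, and the seed is arbitrary. The standard error of the observed fraction is sqrt(0.25 / n), which is roughly 1.6 percentage points at 1000 flips, so a reading a few points away from 50% is not surprising.

```python
import random


def heads_fraction(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the observed fraction of heads."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips


rng = random.Random(42)  # arbitrary seed, only so that reruns print the same numbers
for n in (10, 100, 1_000, 10_000, 100_000):
    observed = heads_fraction(n, rng)
    std_error = (0.25 / n) ** 0.5  # theoretical standard error for a fair coin
    print(f"{n:>7} flips: observed {observed:.4f}, standard error {std_error:.4f}")
```

The observed fraction wanders noticeably at small n and settles toward 0.5 as n grows.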
To compensate for small sample sizes, I like to compare my conditional probability results to a few sets of randomly generated samples of the same length. I compute the same "edge" for these series. Because they are random, we know their true edge is zero, so whatever edge they show is purely sampling noise. I then compare that to the edge for the candidate signal.
If the candidate signal's edge falls within the distribution of the random noise edge, then we can conclude that it's not worth adding to our toolbox. It doesn't definitively prove that it's 100% worthless, but it does show that its contribution to your decision-making process should be vanishingly small.
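One way this comparison might look in code is sketched below. It is not necessarily the exact procedure used in my tests: here the random baseline is a set of fake signals that fire at the same rate as the candidate, and "within the distribution" is taken to mean inside the central 95% of the noise edges. The function names, the 95% coverage, the number of trials and the synthetic data are all assumptions.

```python
import random


def edge(outcomes, signal):
    """P(outcome | signal fired) - P(outcome), from equal-length boolean sequences."""
    fired = [o for o, s in zip(outcomes, signal) if s]
    if not fired:
        raise ValueError("signal never fired, so the edge is undefined")
    return sum(fired) / len(fired) - sum(outcomes) / len(outcomes)


def noise_edges(outcomes, fire_rate, n_trials, rng):
    """Edges of purely random signals that fire with probability fire_rate."""
    edges = []
    for _ in range(n_trials):
        fake = [rng.random() < fire_rate for _ in outcomes]
        if any(fake):  # skip the (very unlikely) fake signal that never fires
            edges.append(edge(outcomes, fake))
    return edges


def within_noise(candidate_edge, edges, coverage=0.95):
    """True if candidate_edge lies inside the central `coverage` share of the noise edges."""
    ordered = sorted(edges)
    cut = int((1.0 - coverage) / 2.0 * (len(ordered) - 1))
    return ordered[cut] <= candidate_edge <= ordered[len(ordered) - 1 - cut]


rng = random.Random(7)  # arbitrary seed for repeatability
outcomes = [rng.random() < 0.5 for _ in range(2000)]   # synthetic "positive return" flags
candidate = [rng.random() < 0.3 for _ in range(2000)]  # synthetic signal, unrelated to outcomes
rate = sum(candidate) / len(candidate)
edges = noise_edges(outcomes, fire_rate=rate, n_trials=200, rng=rng)

candidate_edge = edge(outcomes, candidate)
print(f"candidate edge: {candidate_edge:+.4f}")
print(f"noise edges range from {min(edges):+.4f} to {max(edges):+.4f}")
print("within noise:", within_noise(candidate_edge, edges))
```

With real data, `outcomes` would be the observed returns and `candidate` the indicator being tested. A signal whose edge sits well outside the noise range is the only kind worth a closer look.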