[Suppose you examine the results of NFL games played from 1995-1997 and, based on those results, hypothesize that NFL home dogs of +7 or more are good bets.]
The most commonly used standard of statistical significance is five percent. As a close approximation, five-percent rarity occurs when the W-L record of the sample you have gathered is two standard errors away from 50 percent wins.
The square root of your sample size is the standard deviation of the difference between your total wins and total losses, which I will call excess wins.
The easy way to find the number of standard errors is to divide the excess wins by the standard deviation.
You have reached that five-percent point when your excess wins amount to two standard errors.
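Here is a minimal sketch of that rule of thumb in Python. The function name and structure are my own, not from the text; it simply treats each game as a 50/50 proposition, so the standard deviation of excess wins is the square root of the number of decisions, and flags roughly five-percent significance when the excess reaches two standard errors.

```python
import math

def excess_wins_significance(wins, losses):
    """Return (number of standard errors, significant at roughly the 5% level)."""
    decisions = wins + losses
    excess = wins - losses                # "excess wins"
    std_dev = math.sqrt(decisions)        # SD of excess wins if each game is a 50/50 proposition
    standard_errors = excess / std_dev    # how far the record sits from break-even
    return standard_errors, standard_errors >= 2
```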
For example, suppose you tested the hypothesis that NFL home dogs of +7 or more are good bets by examining the NFL home dogs played during the 1998-2000 seasons. Suppose you came up with a W-L record of 30-25.
That's a sample of 55 decisions. The square root of 55 is about 7.4. Thirty wins minus 25 losses is 5 excess wins. That's less than one standard error. For 55 decisions you need a W-L record of at least 35-20 to reach statistical significance at the five percent level.
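Running the hypothetical function sketched above on the numbers from this example confirms the arithmetic:

```python
# 30-25: 5 excess wins / sqrt(55) is about 0.67 standard errors -- not significant.
print(excess_wins_significance(30, 25))   # (≈0.67, False)

# 35-20: 15 excess wins / sqrt(55) is about 2.02 standard errors -- just clears the bar.
print(excess_wins_significance(35, 20))   # (≈2.02, True)
```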
Suppose the W-L record for NFL home dogs of +7 or more was 32-12 for games played during 1995-1997. If you add those games in, you get a record of 62-37 for the six-year period 1995-2000. Can you call that a sample of 99 decisions?
No, you cannot. The reason is that you used those 1995-1997 games to formulate and refine your hypothesis; you cannot also use them to test it. Only games played in years other than 1995-1997 can be used to test that hypothesis.