When I was younger I believed in a kind of physical determinism. I thought that if you knew exactly where everything was and exactly how fast it was going, you could perfectly predict everything. I think this belief persists among many folks in data science, especially those who are also software developers like myself. Anyone who hits the inference button on a statistical package is in danger of being seduced by the arbitrary precision of its output and confusing it with probable reality. Knowing the difference between significant and insignificant figures was ingrained in me when I studied mechanical engineering, and the gist of it applies here, too.
Interesting post. As you model financial markets, what approach or approaches do you take to avoid model over-specification?
Step one is a sense of persistent doubt that is inherent to my being. On top of that, using informative priors can help, since they intrinsically regularize your model.
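To make that regularizing effect concrete, here is a minimal sketch of my own (not from the post, with made-up numbers): a single regression slope estimated once with a flat prior and once with an informative Normal(0, 0.5) prior. With a conjugate normal model, the posterior mean shows how the prior pulls a noisy estimate back toward zero.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: estimate one regression slope from a small, noisy sample.
# The true slope is small; a flat prior takes the noisy estimate at face value,
# while an informative Normal(0, 0.5) prior shrinks it toward zero.
true_slope = 0.1
n = 20
x = rng.normal(size=n)
y = true_slope * x + rng.normal(scale=1.0, size=n)

# Least-squares (equivalently, flat-prior posterior mean) estimate of the slope.
ols_slope = (x @ y) / (x @ x)

# Conjugate normal posterior mean with a Normal(0, prior_sd**2) prior on the slope
# and a known noise sd of 1.0: shrinkage is governed by the relative precisions.
prior_sd, noise_sd = 0.5, 1.0
prior_precision = 1.0 / prior_sd**2
data_precision = (x @ x) / noise_sd**2
posterior_mean = (data_precision * ols_slope) / (data_precision + prior_precision)

print(f"flat-prior slope:        {ols_slope:+.3f}")
print(f"informative-prior slope: {posterior_mean:+.3f} (shrunk toward 0)")
```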
More helpfully, the same course/textbook mentioned above (Richard McElreath's Statistical Rethinking) suggests a causality-directed approach that avoids the "causal salad" of over-specified models. First you draw yourself a structural causal model that includes every variable and how they should relate to each other. This model has consequences, and it can be disproven. You can't include spurious variables unless you have a good sense for how they could actually causally interact with your outcome. You can test the consequences of your model to validate it, and when your model is wrong, that failure also tells you something useful about the scientific understanding encoded in your structural model.
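As a toy illustration of "the model has consequences and can be disproven" (my own sketch, assuming a made-up chain DAG, not an example from the book): suppose your structural model is the chain X -> Z -> Y. That DAG implies X and Y should be independent once you condition on Z, which you can check against data, for instance with a partial correlation. If that partial correlation is clearly nonzero, the structural model, and the scientific story behind it, is wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical structural causal model (a chain): X -> Z -> Y.
# This DAG implies a testable consequence: X and Y are independent given Z.
n = 5_000
x = rng.normal(size=n)
z = 2.0 * x + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)

def partial_corr(a, b, given):
    """Correlation between a and b after regressing `given` out of each."""
    G = np.column_stack([np.ones_like(given), given])
    resid_a = a - G @ np.linalg.lstsq(G, a, rcond=None)[0]
    resid_b = b - G @ np.linalg.lstsq(G, b, rcond=None)[0]
    return np.corrcoef(resid_a, resid_b)[0, 1]

print(f"corr(X, Y):     {np.corrcoef(x, y)[0, 1]:+.3f}  (strong, as the DAG predicts)")
print(f"corr(X, Y | Z): {partial_corr(x, y, z):+.3f}  (near zero if the DAG is right)")
```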
When a good model is wrong, you learn something. When a bad model is wrong, you don't learn anything. Starting with a structural model of causality helps us make good models that teach us something even if they are wrong.