Since learning that for every two people there are three definitions of ‘scientific theory’, I have discovered to my dismay that this problem is also rife in science journalism and, perhaps more worryingly, in the technical literature. Perhaps this confusion is ubiquitous in the global scientific community, with the result that there are in fact no agreed-upon usages of terms like ‘fact’, ‘theory’, ‘hypothesis’ and ‘law’. Either way, I’ll lay out the set of definitions that seem to make the most sense, both individually and together, as a cohesive system of scientific terminology.

Facts are arguably the least interesting of the categories under discussion. They exist in many forms, most notably as quantitative data points and qualitative assessments. They are the core truths about what is found in the world. Chimpanzees inherit 24 chromosomes from each parent whilst humans inherit 23. One of these 23, chromosome 2, has two centromeric sequences, whilst a typical chromosome has only one. Human chromosome 2 is remarkably close in number of base pairs to the combined total of chimpanzee chromosomes 2A and 2B (formerly 12 and 13). Homologues for most of the genes in chimpanzee chromosome 2A are found in one particular arm of human chromosome 2, called 2p, and nowhere else, whilst homologues for those in 2B are found in the other arm of human chromosome 2, called 2q, and nowhere else. These are facts of nature. Note that on their own they are exceedingly uninteresting. Any intrigue they may provoke is intrigue regarding their explanation: the how, rather than the what. Here we trespass onto the territory of ‘theory’; a good theory would unite the observations just outlined in a single explanatory framework.

Theories are the unifying concepts of science; they connect the dots. A common misunderstanding is that there exists a point of graduation at which a theory upgrades to become a fact or a law. Stephen Jay Gould described the role of theories in this misconception as “part of a hierarchy of confidence, running downhill from fact to theory to hypothesis to guess.” To criticise a scientific theory in such trite terms as “it’s a theory, not a fact” is to fail to grasp that theories and facts are not only different things, but different types of things. The statement itself is true (theories are indeed not facts), but it is a truism, i.e. it adds precisely nothing of any profundity to the discussion. I would go so far as to say that every repetition of “just a theory” directly increases the level of scientific illiteracy in the world. Comparing facts and theories as unequal rungs on a ladder of certainty is like comparing my location to how I came to be there. In applying this definition to the example of human chromosome 2, a strong theory would present a single core idea that could explain all of the facts detailed. In this case, the strongest theory is that of an ancestral fusion of two chromosomes corresponding to chimpanzee chromosomes 2A and 2B. This would explain the extra chromosome inherited from each parent by chimpanzees, the presence of two centromere-like sequences rather than one, the spatial arrangement of gene homologues found on the two chimpanzee chromosomes, and the closeness in the number of base pairs in human chromosome 2 to the combined total of chimpanzee chromosomes 2A and 2B. A useful trick for detecting a misunderstanding of what theories are is to listen for a reference to “the evidence for theory X”, rather than to “the facts explained by theory X”. Another requisite for an idea to be classified as a scientific theory is the power to generate testable hypotheses, i.e. predictions and retrodictions, which give theories another of their defining qualities: the potential for falsification.

In colloquial English, ‘hypothesis’ and ‘theory’ are often used interchangeably, with the result that both have now lost the specific sense of their meanings. A hypothesis is a specific, testable claim made prior to the collection or analysis of results, and is supported or rejected depending on how well it matches the data. By contrast, theories are judged on their utility in explaining the data. Hypotheses are useful tools to employ, as they provide researchers with specific (and, more importantly, testable) claims about the nature of the phenomenon under investigation. An important distinction to make here is that between ‘hypothesis’ and ‘guess’. Hypotheses are generated by theories, which themselves are complex explanatory structures. Thus, hypotheses are not blind, randomly generated ‘guesses’, but rather specific claims informed and shaped by a preexisting framework that has shown success in explaining a range of relevant observations. They are the feedback mechanism through which theory and experiment inform one another and effect development in the field. I don’t want to give the impression that all hypotheses are predictions, however. Whilst a biochemist might make predictions regarding the outcome of an enzymatic reaction, a palaeontologist might make retrodictions regarding the kind of specimens found fossilised in specific geological strata. So whilst predictions relate to events that haven’t yet happened, retrodictions address events that have already happened.

The final scientific category that I feel is in need of clarification is that of laws. We commonly hear laws described as forces actively bracketing various occurrences of nature, keeping them within specific boundaries of possibility. Any attempt to understand ‘laws of nature’ at any depth using this definition falls apart almost immediately, however. A more useful way of thinking about laws is as our attempts to construct descriptive models of nature by compartmentalising it into qualitatively distinct boxes. These boxes, labelled ‘thermodynamics’ and ‘Mendelian inheritance’, to take two examples, represent attempts to ascribe generalised descriptive frameworks to the observations contained within. It is easy to see, in the case of Mendelian inheritance, the distinction between theory and law: the law of Mendelian inheritance describes the inheritance patterns we observe in diploid organisms, whilst the cellular mechanisms underlying these patterns are explained by various facets of cell theory. Hopefully this has clarified that the belief held by many of those who sympathise with the ‘hierarchy of confidence’ idea, that laws arise when theories become sufficiently supported, is indeed a mistake.

Proof is a category of thought that has no place whatsoever in science (or any other evidential field of enquiry, for that matter), and is hence not a scientific category. The commonly held definition of proof is one of absolute confirmation, of unshakeable certainty. Both of these notions are alien to the scientific process, which generates cumulative banks of data that may be explained by one theory or another. As the wealth of data explained by the prevalent theory grows, so does the conviction with which we can suggest (tentatively, always tentatively) that the explanation proposed may reflect the reality of that component of nature. Keep this in mind next time you find references amongst the confused scribbles of some shoddy science journalism to ‘scientific/clinical proof’. Those sceptical of this exclusion of proof as a meaningful notion in science may instinctively be tempted to adduce scientific theories that explain so many of the observations that they’re commonly taken, for all intents and purposes, to be proven. An example of a theory that has supposedly graduated to a ‘proven’ status could be the idea that microbes are causative factors in some types of disease. Sure, the data explained by this theory are immeasurably more plentiful than those explained by several other theories that come to mind. But there is no magical height, no point of graduation, at which the stack of published data will suddenly constitute ‘proof’ of an idea, and attempts to assign such a height will forever be groundless.

Common speech, it seems, has done a pretty thorough job of eroding and mashing together many of these definitions into a useless, amorphous concept of nothing in particular. If the remaining semantic utility of these categories is not to be lost in this colloquial haze, we as scientists should be especially vigilant in our choice of words, ensuring that we recognise and acknowledge these important distinctions.