The Science of Science - Slow thoughts

There’s an emerging scientific discipline dedicated to the study of science itself. The book that introduced me to the field is called, simply, The Science of Science and is written by Dashun Wang and Albert-László Barabási. What follows is an attempt at capturing some fresh insights provoked by reading that book.

To what extent is impact correlated with superior intellectual gifts? It’s not so clear; persistence can be just as important.

Fewer than one percent of all scientists (in a sample of 15 million individuals and their productivity spanning 15 years) manage to keep a constant pace of publishing at least one paper every year. This small fraction contains the most high-impact researchers, putting out around 40 percent of all papers and 87 percent of all papers with more than one thousand citations.

If the publishing pace of a highly productive scientist lags, even for a single year, the average impact of his or her papers becomes significantly lower.

It’s quite tricky to compare productivity between disciplines. For example, computer science (which has only existed since 1962) has a publication tradition adapted to the field’s nature of rapid development, where many scientists choose conference proceedings rather than journals as their primary venue to communicate advances.

One anecdotal consequence: when U.S. News & World Report set out to rank the world’s best computer science departments in 2017, the list was so flawed that the Computing Research Association had to put out a special notice calling it “nonsense”. It wasn’t sloppiness that caused this faux pas, but simply an attempt to measure one field by the same metric (journal publications recorded by the Web of Science) that had worked well when measuring other fields.

Scientific productivity differs from field to field. One study of average output per faculty member over a five-year period found a range from over ten papers in chemistry to just over one in history.

Adjusting for the apples-vs-pears problem, however, an interesting phenomenon becomes evident: regardless of field, the productivity of individual scientists follows a lognormal distribution.

That means there’s a vast gap between the most productive and the rest of the pack, which is counter-intuitive since it’s very different from how performance is distributed in many other arenas. Take sports, for example. Tiger Woods on a good day beats the competition by only a few strokes, and Flo-Jo outruns her competition by mere fractions of a second.

The first to observe this phenomenon was William Shockley. That’s the same Shockley who was awarded the 1956 Nobel prize for his part in inventing the transistor and who played an important role in establishing what we now think of as Silicon Valley (also the same Shockley who was eventually isolated from friends, family and colleagues when they couldn’t stomach his advocacy for eugenics.)

Shockley’s explanation for the lognormal distribution was that productivity in science is the aggregate outcome of a funnel with eight hurdles, all of which a good scientist needs to excel at. Shockley’s eight hurdles were:

Identify a good problem
Make progress with it
Recognize a worthwhile result
Make a decision as to when to stop the research and start writing up the results
Write adequately
Profit adequately from criticism
Show determination to submit the paper for publication
Make changes if required by the journal or by the referees

The idea is that the hurdles compound: a scientist who shines at seven of these eight hurdles will still fall far behind in the productivity race just by being slightly challenged on the last one.
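To see why a chain of hurdles produces a lognormal shape, here is a minimal sketch. It assumes, purely for illustration, that each hurdle contributes an independent factor and that the factors multiply; the uniform range 0.5 to 1.5 and the function names are my own choices, not Shockley’s. Because the logarithm of a product is a sum of logarithms, the log of the score is a sum of independent terms, which is what pushes the scores toward a lognormal, right-skewed distribution.

```python
import math
import random
import statistics


def productivity(factors):
    """Multiply per-hurdle factors into a single productivity score."""
    return math.prod(factors)


def simulate_scores(n_scientists=10_000, n_hurdles=8, seed=0):
    """Draw one factor per hurdle for each scientist (illustrative assumption:
    factors are independent and uniform between 0.5 and 1.5)."""
    rng = random.Random(seed)
    return [
        productivity(rng.uniform(0.5, 1.5) for _ in range(n_hurdles))
        for _ in range(n_scientists)
    ]


# Strong on all eight hurdles: 1.5 ** 8, roughly 25.6 times the baseline.
all_strong = productivity([1.5] * 8)
# Strong on seven, weak on one: roughly 8.5 times the baseline.
one_weak = productivity([1.5] * 7 + [0.5])
print(all_strong, one_weak)

# A right-skewed distribution has a mean well above its median.
scores = simulate_scores()
print(statistics.mean(scores), statistics.median(scores))
```

In this toy setup a single weak hurdle cuts the score to a third, and the simulated population ends up with a mean well above its median, the signature of a long right tail.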

The h-index, which Jorge Hirsch came up with in 2005, has become the dominant way of measuring the impact of individual scientists. A high h-index has proven a somewhat reliable indicator of high achievement, but the reverse does not always hold: a low h-index does not rule out great achievement. Peter Higgs is a prime example of a scientist who achieved exceptional impact with just a few seminal papers.
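For reference, a scientist’s h-index is the largest number h such that h of their papers have each been cited at least h times. A minimal sketch of the computation follows; the citation counts are invented for illustration and do not describe any real scientist.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical careers (made-up numbers):
steady_publisher = [10] * 10          # ten papers, ten citations each
few_seminal_papers = [5000, 3000, 2]  # two landmark papers and one minor one

print(h_index(steady_publisher))    # 10
print(h_index(few_seminal_papers))  # 2
```

The second career has vastly more citations in total, yet a far lower h-index, which is exactly the blind spot described above.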

It’s also true that the h-index is tricky to normalize between fields with varying traditions as to co-authorship and citation distribution. A field like high-energy experimental physics, where large collaborations are the norm, will exhibit inflated h-indices. Molecular biologists, in turn, tend to get cited more often than physicists, and hence have even higher h-indices.

Words often attributed to Einstein should be heeded: “Many of the things you can count, don’t count. Many of the things you can’t count, do count.”

“For to everyone who has will more be given, and he will have an abundance. But from the one who has not, even what he has will be taken away.”

The above quote appears in the Gospel of Matthew and has lent its name to the so-called Matthew effect, which holds that there’s an inherently self-boosting dynamic at play when papers start attaining a critical mass of citations: the more visible they get, the more they surface on other scientists’ radars. The Matthew effect explains why success breeds success, but also means impact as measured by citations doesn’t necessarily capture the full picture.

Age matters: Regardless of what domain we look at or how we define achievement, one’s best work tends to occur around mid-career, or between the ages of 30 and 40. Einstein, who had his annus mirabilis at age 26, and Newton, who had his at age 23, were rare outliers. Age 19 seems to be the threshold below which no great achievement can be accomplished, whereas there’s no clear upper boundary. In fact, many careers have a second productivity peak just before retirement.

All in all, the peak age of great minds has increased over time. In the early days of the Nobel prize, two thirds of the recipients did their prize-winning work by age 40. Since then the mean peak age has risen by about six years in total. In combination with the fact that recognition is also gradually being delayed, we’re looking at a scenario where it’ll be more and more common that worthy Nobel laureates will have died before the award committee has time to distinguish them. Or otherwise put: “By the end of this century, the prizewinners’ average age for receiving the award is likely to exceed their projected life expectancy.”

We can’t look at peak age without making a distinction between conceptual and empirical innovators.

Werner Heisenberg is a brilliant example of the former category. At the tender age of 21 he almost flunked his PhD defense, stumbling badly in the subjects of astronomy and experimental physics. Nevertheless, he went on to make revolutionary contributions to the field of quantum physics in the following years, building almost exclusively on a priori logic. Charles Darwin, by contrast, built his scientific breakthroughs on a lifelong accumulation of empirical evidence.

Which of these was the most prominent scientist? Wrong question.

The random impact rule proves that the link between age and impact / creativity can theoretically be decoupled. The first two decades of your career are really not more creative than the following two; you just draw more aces early because you try harder.

Or otherwise put: the central factor is productivity, and younger researchers are statistically more eager to try over and over again, putting out one paper after another. The more papers a scientist publishes, the higher the impact of her highest-impact paper tends to be.
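A small simulation makes the lottery-ticket logic concrete. This is only a sketch under a stated assumption: every paper’s impact is drawn independently from the same distribution (a lognormal with arbitrary parameters, chosen for illustration), regardless of when in the career it is written. The function names are hypothetical.

```python
import random
import statistics


def best_paper_impact(n_papers, rng):
    """Impact of the best paper in a career of n_papers independent draws."""
    return max(rng.lognormvariate(0.0, 1.0) for _ in range(n_papers))


def mean_best_impact(n_papers, n_careers=2000, seed=0):
    """Average the best-paper impact over many simulated careers."""
    rng = random.Random(seed)
    return statistics.mean(
        [best_paper_impact(n_papers, rng) for _ in range(n_careers)]
    )


for n in (5, 20, 50):
    print(n, round(mean_best_impact(n), 2))
```

Nothing about age or talent changes between the simulated careers; the only thing that varies is the number of draws, and more draws mean a better expected best result.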

Human activity, any human activity, often comes in bursts. This pattern has been documented in a number of different activities, from email and phone communication to sexual contacts.

This burstiness helps explain what’s known as hot streaks. Hot streaks exist in sports, artistic endeavors and in science as well. You want numbers? Sure thing: 90 percent of scientists experience at least one hot streak, as do 91 percent of artists and 82 percent of film makers.

Happy story: John Fenn was 67 years old when he published a paper identifying a novel electrospray ion source. He thought it was a major breakthrough, but his employer Yale still pushed him out the door. He reluctantly relocated to Virginia Commonwealth University, which was happy to provide Fenn with the lab he needed to continue his investigations. There he went on a classic hot streak spanning five years, publishing seminal papers which would earn him the 2002 Nobel prize in chemistry.

The theory of hot streaks, coupled with the random impact rule, contradicts the general pattern that our best work is likely to happen between the ages of 30 and 40.

(I’ve previously written about how this pattern is different for entrepreneurs).

This is no news but let’s spell it out: the renaissance man is dead. The days are long gone when it was possible for any one individual to know everything, or even to grasp the full body of knowledge in one single field. This increasing “burden of knowledge” means scientists have to narrow their focus in order to reach the frontier of knowledge, and that individual scientists have a fantastic grasp of their isolated little piece of the puzzle, but struggle to find common ground that allows them to communicate their insights to fellow scientists. (Again something I’ve previously written about).

It also means that big collaborative projects are on the rise. Which comes at a price. Over the past 60 years, larger teams have garnered ever more citations than smaller ones, but their disruptiveness declines with each additional team member.

This can be explained by the fact that large teams typically source their ideas from recent high-impact work, which means their output is immediately relevant to contemporary scientists. Small teams, in contrast, tend to experience a much longer citation delay, but their work often persists further into the future.

You’ve probably heard about the six degrees of separation theory. An intra-scientific flavor of that concept is measured by the Erdős number, so called after the Hungarian mathematician Paul Erdős. He wasn’t just prolific, he was also a pioneer in collaborative science; almost all of the more than 1500 papers he published during his lifetime were co-authored with one or more fellow scientists.

If you were a co-author on one of those papers, you’d have an Erdős number of 1. If someone then got the honor of co-authoring with you, that someone would get an Erdős number of 2, and so on and so forth.
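In network terms, an Erdős number is the length of the shortest path to Erdős in the co-authorship graph, which a breadth-first search finds directly. A minimal sketch follows; the graph and the author names A to E are invented for illustration, and the function name is my own.

```python
from collections import deque


def erdos_numbers(coauthors, source="Erdős"):
    """Breadth-first search over a co-authorship graph.

    coauthors maps each author to the people they have written a paper with.
    Returns the shortest co-authorship distance from source for every
    author who is reachable at all.
    """
    distances = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for peer in coauthors.get(author, ()):
            if peer not in distances:
                distances[peer] = distances[author] + 1
                queue.append(peer)
    return distances


# Hypothetical toy graph (not real collaboration data).
toy_graph = {
    "Erdős": ["A", "B"],
    "A": ["Erdős", "C"],
    "B": ["Erdős"],
    "C": ["A", "D"],
    "D": ["C"],
    "E": [],  # never co-authored with anyone in this network
}

print(erdos_numbers(toy_graph))
```

Authors who cannot be reached, like E here, simply do not appear in the result; by convention their Erdős number is considered infinite.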

Bill Gates has an Erdős number of 4, even though he isn’t even a mathematician. From a network science point of view, that speaks to growing sets of “connected components”, which really just means that the scientific community is getting ever more tightly knit. The fact that scientists collaborate more and more gives rise to the phenomenon poetically referred to as The Invisible College (a term coined by 17th-century scientist Robert Boyle).

What’s interesting about the Invisible College is the peer effects it exerts.

Peer effects can be observed whenever a mediocre student starts to improve from the simple fact that they get to share a dorm room with high performers. We’ve all felt the influence, good and bad, of the people we surround ourselves with.

What’s unexpected is that the peer effect is also at work across a connected network, regardless of its geographical spread. For example: whenever a ‘superstar’ scientist unexpectedly dies, her former collaborators typically suffer a lasting 5 – 8 percent decline in productivity rate, regardless of physical proximity.

The same phenomenon could be observed when Jewish scientists were kicked out of Nazi Germany: “The loss of a coauthor of average quality reduced a German professor’s productivity by about 13 percent in physics and 16.5 percent in chemistry. […] To be clear, these coauthors were not necessarily colleagues in the same university but were often located in different institutions and cities across Germany. Once again, these results speak to the idea that, at least in science, the ‘invisible college’ is as important as the formal college in which we reside.”

William M. Muir was a professor of animal science at Purdue University. He’s famous for an experiment where he grouped highly productive hens into the same cages to see whether he could maximize their collective egg-laying capability.

It turned out that after six generations of super-layer breeding, the chickens in the cage had dwindled from nine to three. The rest had been murdered by their coop-mates. Meanwhile the control group, which consisted of average hens, prospered and ended up outperforming the super hens.

What’s true with animals is true for humans; creating the best team is not a simple matter of recruiting the best, a conundrum known as the too-much-talent effect.

When does “all-star” become “too-much-talent”? The answer, as always, is that it depends. The threshold is very pronounced in team sports like basketball, but less so in, for example, baseball, which requires less cohesion and coordination.

So how do we know if it makes more sense to expand our team with people who ‘speak the same language’ we do, or rather pick talent that brings new experiences? This is in fact one of the most studied questions in the literature of team science, attracting researchers from disciplines ranging from psychology to economics and sociology.

The answer doesn’t exactly address the question, such as we’ve posed it. Because it turns out that the ‘degree of talent’ in a team isn’t what counts. Instead, the answer has to do with diversity.

A study looking at 2.5 million papers published in the US between 1985 and 2008 shows that teams with four or five authors from different ethnicities had, on average, five to ten percent more citations than those written by authors of the same ethnicity. You get similar patterns when measuring institutional and national diversity, but across all different measures of diversity, ethnicity seems to offer the most significant lift in terms of impact.

High IQ predicts that an individual will perform well on any number of different tasks. This is one of the most replicated results in psychology.

It would be easy to assume then that teams with high average IQ are equally high performing, but that is not true.

Instead what’s more important for team performance are ‘soft’ factors like the average social sensitivity of team members, equality in the distribution of conversational turn-taking, and the proportion of females in the group.

In science and engineering, the last author cited on a paper—also called the corresponding author—often gets as much if not more credit than the first author. (For example, David Wineland was awarded the 2012 Nobel prize in physics for his contributions, while being listed last author of the key paper).

This is not true in disciplines such as sociology or psychology, where being the last author really is the least desirable position. It’s also not true for mathematics, where authors are listed alphabetically, as they are in economics, finance and in particle physics.

These differences matter. Disciplines where authors are listed in order of their contribution tend to suffer from author bloat. A recent survey of 2300 papers in biology, physics and social science found that 33 percent of all listed authors failed to meet the criteria of co-authorship.

Listing authors alphabetically indicates that everyone has made an equally important contribution, which tends to keep lists of authors short. One could think this practice would be conducive to fair play, but in some instances the opposite is true.

One of the most tenacious cases of gender discrimination in academia has traditionally been the “leaky pipeline”, which means female economists are twice as likely as their male counterparts to be denied tenure. The reason for this has puzzled scientists for decades, but it’s a mystery no more.

It boils down to prejudice, of course, and here’s how to prove it: It turns out that the disparity between male and female economists applying for tenure track disappears completely when comparing only candidates who solo-authored all their published papers.

As soon as CVs start to include team-authored papers, the two genders part ways.

Men get just as much credit for solo work as for collaborative papers. When women coauthor with other women it’s perceived as less valuable, and when they stoop to coauthor with men, they get especially punished.

This pattern is not observable in a discipline like sociology, for example, where the order of authorship conveys information about contribution, which makes bias far more visible.

Speaking of citations by the way, it’s become far more common for scientists to claim “joint first authorship”. This is especially prominent in high impact biomedical clinical journals. In 2012, 37 percent of all papers in Cell had co-first authorship. The equivalent number for Nature was 33 percent and for Science it was 25 percent. All of these numbers are up from *zero* percent in 1990.

The nominal five years quoted by most American PhD programs is false marketing. Instead, the average time to degree within engineering and life science is closer to eight years (less than a quarter of PhD students are done within five years). And that’s not accounting for the fact that 40 to 50 percent of all students who begin their doctoral education in the US never graduate.

Data suggests that we’ll see more scientific discovery in the next twenty years than in all of history. How can this exponential growth be kept up? Because unlike a colony of bacteria—which also grows exponentially but will always eventually run out of energy—science runs on ideas; a resource that grows the more it is used.

Failure is the blind spot of the science of science. That’s because most failed experiments go unpublished. There have been some interesting studies, however, arriving at quite counter-intuitive results.

For example, one study looked at junior scientists whose proposals to the NIH fell just above and just below the funding cutoff. Comparing these “near-misses” with “narrow-wins”, two things become apparent.

First of all, there was more than a ten percent chance that the rejects would disappear entirely from science. Yet most surprisingly, the data also indicated that the near-miss individuals who kept working as scientists systematically outperformed the narrow winners in the long run. Failing early, hence, seems to be an indicator of high impact in the long run.

The status of a certain paper (and by extension of its authors) is determined by three factors: how soon it starts getting attention from the rest of the community, how long it remains relevant and cited, and its relative fitness, as measured in total number of citations.

The first two of those factors fade away when we’re looking to measure the “ultimate impact” of a paper, meaning how important it’s going to be to science over its entire lifetime. And not only that; when zooming out like this it also becomes apparent that it doesn’t matter whether the paper was published in a high status journal or not.

Just as we shouldn’t judge a book by its cover, we also shouldn’t judge a paper by its journal.

There’s a whole field within the field which explores alternative ways of measuring scientific impact. The blanket term is “alt-metric”.

Alt-metric ranges from counting page views and shares on social media (which turns out to have a very weak correlation with scientific impact), to looking at patents in relation to scientific breakthroughs, which carries some interesting information. A patent with 14 citations is likely to be one hundred times more valuable than a patent with eight citations.

Lastly, a note on another one of the science of science’s blind spots. The discipline relies heavily on methodologies that have been developed within microeconomics over the last three decades. These methods have led to what’s known in that discipline as the Credibility Revolution, and they essentially boil down to two things: formulating assumptions that can be verified in double-blind studies, and working with so-called “natural experiments”. (Anyone who’s read the classic Thinking, Fast and Slow knows what we’re talking about.)

Greater certainty comes at a price however; answers whose causal relationship can be well identified tend to cover a narrower range of topics. Or otherwise put: there’s an inescapable tension between certainty and generalizability. This has led to a growing consensus within the field of economics that many of the big questions in society – like the impact of global warming – cannot be answered by super-reliable techniques.

What that means for the science of science, moving forward, is that we’ll need to embrace methodological pluralism.