Part 2:
In this case, it was the content more than any spreadsheet features themselves that indicated something was afoot:
The Anomaly: Strange Demographic Responses
As mentioned above, students in this study were asked to report their demographics. Here is a screenshot of the posted original materials, indicating exactly what they were asked and how:
We retrieved the data from the OSF (OSF | The Moral Virtue of Authenticity), where Gino (or someone using her credentials) posted it in 2015. The anomaly in this dataset involves how some students answered Question #6: “Year in School.”
The screenshot below shows a portion of the dataset. In the “yearSchool” column, you can see that students approached this “Year in School” question in a number of different ways. For example, a junior might have written “junior,” “2016,” “class of 2016,” or “3” (to signify that they are in their third year). All of these responses are reasonable.
A less reasonable response is “Harvard”, an incorrect answer to the question. It is difficult to imagine many students independently making this highly idiosyncratic mistake. Nevertheless, the data file indicates that 20 students did so. Moreover, and adding to the peculiarity, those students’ responses are all within 35 rows (450 through 484) of each other in the posted dataset:
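This kind of clustering can be checked mechanically. The following is a minimal Python sketch, not the original authors' method: it finds the rows that contain a given response, measures how many rows they span, and estimates by simulation how often that many randomly placed rows would land in so narrow a window. The function names, the toy data, and the `n_rows` parameter are all assumptions made for illustration; the total number of rows in the posted dataset is not stated here, so you would need to supply it yourself.

```python
import random


def locate_responses(responses, target):
    """Return the 1-based row numbers whose response equals target (case-insensitive)."""
    wanted = target.strip().lower()
    return [i for i, r in enumerate(responses, start=1) if r.strip().lower() == wanted]


def row_span(rows):
    """Number of rows from the first match to the last match, inclusive."""
    return rows[-1] - rows[0] + 1 if rows else 0


def chance_of_span(n_rows, n_hits, observed_span, trials=10_000, seed=0):
    """Estimate how often n_hits rows placed at random among n_rows
    would all fall within observed_span consecutive rows."""
    rng = random.Random(seed)
    as_tight = 0
    for _ in range(trials):
        rows = sorted(rng.sample(range(1, n_rows + 1), n_hits))
        if row_span(rows) <= observed_span:
            as_tight += 1
    return as_tight / trials


# Synthetic toy data for illustration only; this is NOT the real dataset.
toy = ["junior", "2016", "3", "Harvard", "class of 2016",
       "Harvard", "Harvard", "senior", "2"]
rows = locate_responses(toy, "Harvard")
print(rows, row_span(rows))  # → [4, 6, 7] 4
```

Applied to the posted file, `row_span` on rows 450 through 484 gives 35, matching the figure quoted above. Calling `chance_of_span` with the file's actual row count, 20 hits, and a span of 35 would give a rough sense of how unlikely that clustering is if the “Harvard” responses had been scattered at random. That assumes row order is unrelated to how students answered, which may not hold if the file was sorted on some other variable.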
So it seems somebody falsified the data by copying and pasting rows to get the results they wanted, and was super-lazy about it.
In the accompanying plot, every “normal” observation is represented as a blue dot, whereas the 20 “Harvard” observations are represented as red X’s:
Indeed, we should note that while we found evidence of data tampering in all four studies covered in this series, we do not believe (in the least) that we’ve identified all of the tampering that happened within these studies. Without access to the original (un-tampered) data files – files we believe Harvard had access to – we can only identify instances when the data tamperer slipped up, forgetting to re-sort here, making a copy-paste error there. There is no reason (at all) to expect that a data tamperer who makes a mistake when changing one thing in a database makes the same mistake when changing everything else in it.