Spreadsheet problems in Biology
“Autocorrection” of gene names
Some genes have been renamed because Microsoft Excel reads them as dates [1]. From Humble Pi:
The gene has the catchy name of MARCH5, and if you think that looks like a date, then you can already see where this is going. Over on your first chromosome, the gene SEP15 is busy making some other important protein. Type those gene names into Excel and they’ll transform into 5-Mar and 15-Sep, encoded as 3/5/2019 and 9/15/2019 (or whatever the current year is) in the Formula Bar of the US version. All mention of MARCH5 and SEP15 has been obliterated. …
In 2016 three intrepid researchers in Melbourne analyzed eighteen journals that had published genome research between 2005 and 2015 and found a total of 35,175 publicly available Excel files associated with 3,597 different research papers. They wrote a program to autodownload the Excel files, then scan them for lists of gene names, keeping an eye out for where they had been “autocorrected” by Excel into something else. After checking through the offending files manually to remove false positives, the researchers were left with 987 Excel spreadsheets (from 704 different pieces of research) that had gene-name errors introduced by Excel. In their sample, they found that 19.6 percent of gene research crunched in Excel contained errors.
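The kind of scan described in the quotation can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual program: the function name `find_mangled_genes`, the regular expression, and the assumption that converted cells look like "5-Mar" or "Mar-05" (US-locale month abbreviations) are all hypothetical. The last two lines check the arithmetic behind the quoted figure, on the assumption that it is the share of affected papers (704 of 3,597).

```python
import re

# Month abbreviations as Excel shows them in a US locale (assumption).
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches cells such as "5-Mar" or "15-Sep" (number-month) and "Mar-05" (month-number).
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$",
    re.IGNORECASE,
)

def find_mangled_genes(cells):
    """Return the cells that look like Excel-converted dates, not gene symbols."""
    return [c for c in cells if DATE_LIKE.match(c.strip())]

# A hypothetical gene-name column, two entries of which were "autocorrected".
column = ["TP53", "5-Mar", "BRCA1", "15-Sep", "MARCH5"]
print(find_mangled_genes(column))  # ['5-Mar', '15-Sep']

# Share of papers with at least one affected spreadsheet, per the quoted counts.
affected_papers, total_papers = 704, 3597
print(f"{affected_papers / total_papers:.1%}")  # 19.6%
```

A pattern match like this will also flag genuine dates, which is presumably why the researchers checked the offending files by hand to remove false positives.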