Monthly Archives: January 2012

Why so many acronyms?

Biomedical informaticians are forever complaining that the language of medicine is inconsistent, imprecise, difficult to understand and potentially a source of medical errors.  Two particular complaints are the variety of ways in which a standard entity may be described and inconsistencies in the way a term or acronym may be interpreted.  Examples of inconsistent description abound.  Ross Koppel, for instance, has found 40 ways of recording blood pressure and 30 ways of recording tobacco use in an EHR.  A reasonable question is why everyone doesn’t agree on one standard way of referring to blood pressure and tobacco use.

Acronym collisions are a second source of irritation for the informatics community.  Acronym collisions occur when the same acronym is used for different purposes.   For example, PSA is used as a gene symbol for 7 different genes in the human genome.  Similarly, PCR may refer to polymerase chain reaction in molecular biology, phosphocreatine in metabolism, pathological complete remission in oncology and premature contraction in cardiology.  Given the theoretically enormous space of 3 and 4 letter acronyms, isn’t it surprising that different fields have settled on the same abbreviations for such different entities?   The space of pronounceable acronyms is far smaller than the space of all possible 2, 3 or 4 letter combinations, and the birthday paradox applies.  While there may be only a small chance in comparing the acronyms used in two different disciplines that they will choose the same acronym for two different purposes, as the number of acronyms in use in both fields increases, the probability that all the acronyms in one field will avoid all of the acronyms used in the other field becomes small.
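The birthday-paradox argument above can be made concrete with a small calculation.  The sketch below is only an illustration under simplifying assumptions that are not in the original argument: each field draws its acronyms uniformly at random, without repeats, from the same fixed pool, and the function name and the pool and vocabulary sizes are hypothetical.

```python
from math import comb


def prob_no_collision(space_size, n_a, n_b):
    """Probability that two fields share no acronym at all.

    Assumes (for illustration only) that field A uses n_a distinct acronyms
    and field B uses n_b distinct acronyms, each set drawn uniformly at
    random from the same pool of space_size possible acronyms.
    """
    if n_a + n_b > space_size:
        # More acronyms in use than the pool holds: an overlap is certain.
        return 0.0
    # Field B must pick all n_b of its acronyms from the space_size - n_a
    # acronyms that field A is not using.
    return comb(space_size - n_a, n_b) / comb(space_size, n_b)


# All 3-letter combinations; the pronounceable subset is far smaller.
three_letter_space = 26 ** 3

for vocabulary_size in (10, 50, 200):
    p = prob_no_collision(three_letter_space, vocabulary_size, vocabulary_size)
    print(f"{vocabulary_size} acronyms per field: P(no collision) = {p:.3f}")
```

Even with the full space of 3-letter combinations, the chance of avoiding every collision drops quickly as each field's vocabulary grows, and a smaller pool of pronounceable acronyms only makes collisions more likely.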

Why not avoid the problem altogether and simply prohibit physicians from using acronyms in their medical notes?  To answer this question we need to consider the forces driving language use in biomedicine.  Writing notes is a significant time burden for both physicians and nurses.  In a recent study, physicians spent an average of 20 to 100 minutes per day writing (Hripcsak et al 2011).   Further, the time spent composing notes undoubtedly follows a Zipf-like distribution with a small number of clinicians writing a large fraction of all notes.  For the individuals writing most of the notes, the time spent on writing is substantial, and there is general agreement that clinical staff are under significant time pressure to complete a patient encounter as rapidly as possible.  The time spent documenting the visit counts just as much as time spent with the patient.  Shorthand and acronyms are valuable forms of data compression that reduce the amount of text that needs to be produced and, as a consequence, greatly reduce the time spent writing.   This is not unique to biomedicine; natural language is highly compressed.  Pronouns are used all the time to refer to an expression occurring elsewhere in a corpus.  Replacing an entire noun phrase with a short simple pronoun is a great compression technique, but because any given pronoun could potentially refer to a number of noun phrases, there is often uncertainty in resolving the phrase to which a pronoun refers.  Probabilistic rules for anaphora resolution have been developed in the field of computational linguistics, but the process remains error prone, even for human readers.  If the informatics community truly wanted to reduce ambiguity in clinical text, an obvious first step would be to ban the use of pronouns.  This would, of course, incite a general revolt from the clinical staff, but the uncertainty associated with all of the variant nomenclature and acronym collisions pales by comparison to the ambiguity associated with the word “it”.

The most effective compression algorithm depends on the frequency of terms in a particular text stream.  Medicine is not a single homogeneous community.  Rather, there are dozens or hundreds of different communities, each with their own domains of expertise and relevant vocabulary and, of course, their own characteristic set of term frequencies.  In oncology, terms referring to cancer may be quite frequent while they will be less so in other fields of medicine.  The most effective compression algorithm for a clinical oncologist would, therefore, be a strategy that substitutes short abbreviations for frequently used long terms: BC for breast cancer, DFS for disease-free survival, OS for overall survival or pCR for pathological complete remission.  In ophthalmology, RE and LE are not used for “right eye” and “left eye” because they conflict with common abbreviations.  Instead, ophthalmology has settled on OD and OS.  Nevertheless, determining the appropriate resolution of “OS” is straightforward in almost all situations.

Why are there so many ways to refer to a single entity?  The frequency of term use differs dramatically between disciplines.  A molecular biologist may use the polymerase chain reaction frequently and achieve significant text compression by abbreviating it PCR.  On the other hand, the molecular biologist might refer to breast cancer with only moderate frequency and therefore would avoid BC as potentially confusing but might still use BRCA.  Taking a population genetics perspective, acronyms can be viewed as alleles.  An allele (or acronym) will increase in frequency if it confers a selective advantage.  In the context of language, a selective advantage would be a large compression efficiency associated with relatively little ambiguity.  Each biomedical community has a distinct set of term frequencies so the optimal set of acronyms will differ between different communities (e.g. BRCA in molecular biology vs. BC in oncology).
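To illustrate how the payoff from an acronym depends on a community's term frequencies, here is a minimal sketch.  The term counts are invented purely for illustration (they are not measurements from any corpus), and `characters_saved` is a hypothetical helper, not an established metric.

```python
def characters_saved(term_counts, abbreviations):
    """Characters saved per term when each term is replaced by its acronym.

    term_counts maps a long-form term to how often it occurs in a
    community's notes; abbreviations maps the term to its acronym.
    """
    return {
        term: count * (len(term) - len(abbreviations[term]))
        for term, count in term_counts.items()
        if term in abbreviations
    }


abbreviations = {
    "breast cancer": "BC",
    "polymerase chain reaction": "PCR",
}

# Made-up occurrence counts for two communities (illustrative only).
oncology_counts = {"breast cancer": 500, "polymerase chain reaction": 20}
molecular_biology_counts = {"breast cancer": 30, "polymerase chain reaction": 400}

print("oncology:", characters_saved(oncology_counts, abbreviations))
print("molecular biology:", characters_saved(molecular_biology_counts, abbreviations))
```

With these made-up counts, BC saves the oncologist far more typing than PCR does, while the reverse holds for the molecular biologist, which is the sense in which each community has its own optimal set of acronyms.  The sketch ignores the cost of ambiguity, which the selective-advantage argument above also weighs.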

Taking the analogy to population genetics a step further, ambiguity occurs at disciplinary boundaries.  When a clinical oncologist uses pCR, they almost certainly are referring to pathological complete remission, but when a molecular oncology researcher uses PCR, it may be unclear whether they are referring to pathological complete remission or polymerase chain reaction.  The acronyms in use in one community are in effect competing with the acronyms in use in neighboring communities.  In population genetics, this is precisely the situation that results in selection for diversity and accelerated genetic drift.  A pathogen coat protein evolves to escape the host immune response.  The host immune response evolves to prevent growth of the pathogen, and both the pathogen coat protein and the host immune response genes end up evolving at a much faster rate than other genes.  As biomedical technology changes, the optimal set of acronyms for each community will drift to adapt to the changes in term frequency introduced by new techniques, and inevitably at the intersection of communities there will be acronym competition accelerating this drift.

Are acronyms a net source of error?  Certainly, it is possible that an acronym from one field might be misinterpreted by a reader from a different field, but compared to the pronoun resolution problem, this is a relatively trivial issue.   How often is OS in ophthalmology confused with OS in oncology?  Acronyms are highly context dependent, and the appropriate context in which to interpret an acronym is almost always obvious.   Some informaticians have proposed rules forbidding the use of acronyms.  First, this is likely to greatly annoy users because it will force them to generate verbose, time-consuming notes when their time would be better spent on other activities.  Second, it is likely to increase the overall error rate by increasing the length and complexity of the terms used in the text.

The use of acronyms may actually prevent errors.  Because acronyms are short and, within their domain, common, the chance that the author will misspell an acronym is small.  The term for which the acronym substitutes is typically much longer and as a consequence invites a greater possibility of a spelling or typographic error.  How many ways are there to misspell multiple sclerosis or polymerase chain reaction compared to MS or PCR?  On balance, the use of acronyms is likely to reduce the overall error rate in biomedical communication.  Further, because acronyms compress the text and reduce the time needed to generate a sentence, the number of sentences a busy clinician is able to produce increases.  Would you rather that your clinical staff wrote a fixed number of sentences in a verbose acronymless style or more sentences in a condensed style facilitated by the use of acronyms?  From the perspective of the total information content of the note, the latter is clearly preferable, even if it results in occasional term collisions between disciplines.
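The question of how many ways there are to misspell a term can be given a rough answer by counting the distinct strings that are one keystroke error away from it.  The sketch below assumes a deliberately simple typo model (one deleted, swapped, replaced or inserted lowercase letter); the model and the function name are illustrative assumptions, not a validated measure of clinical typing errors.

```python
import string


def single_edit_typos(term, alphabet=string.ascii_lowercase):
    """Distinct strings exactly one simple typing error away from term.

    Counts four kinds of single-keystroke errors: a dropped character,
    two adjacent characters swapped, one character replaced by a letter,
    and one extra letter inserted.  Case is ignored for simplicity.
    """
    term = term.lower()
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    typos = deletes | transposes | replaces | inserts
    typos.discard(term)  # replacing a letter with itself is not a typo
    return typos


for term in ("MS", "multiple sclerosis", "PCR", "polymerase chain reaction"):
    print(f"{term!r}: {len(single_edit_typos(term))} single-error variants")
```

Under this model the number of possible single-error misspellings grows roughly in proportion to the length of the term, so the long form offers many more opportunities for a typo than its acronym does.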

Hripcsak G, Vawdrey DK, Fred MR and Bostwick SB (2011) “Use of electronic clinical documentation: time spent and team interactions” JAMIA 18:112-117.


The Stability of Google

With algorithms drawing on billions of websites and petabytes of data, one might assume that Google’s search rankings are a relatively stable information resource.  An anecdote from today’s news suggests that caution is in order.  On January 6, 2012, a press release was issued that was picked up by a number of news organizations including UPI, CBS and Yahoo, among others.  The press release refers to a paper to appear in the Journal of Women’s Health (impact factor 1.5) that describes a small study (36 women) on the effect of red wine consumption on plasma estrogen and androgen levels.  The press release does not mention the numerous epidemiology studies showing no difference between red and white wine consumption; they both increase the risk of breast cancer even at modest levels of consumption (e.g. references 1-5 below).

On January 7th, one day after the press release, a Google web search for “wine breast cancer risk” returned 8.3 million hits.  9 of the top 10 and 16 of the top 20 highest-ranked hits refer to the study described in the press release.  The 7th ranked hit is a web page that acknowledges an increased risk of breast cancer with heavy alcohol consumption but suggests that moderate wine consumption may be OK.  The 13th and 17th ranked hits are earlier news stories reporting increased risk with alcohol consumption, and the 20th ranked hit is a Wikipedia page with a reasonable discussion of the increased risk of cancer associated with alcohol consumption.  Note that these are not the Google News ranks; these are the main Google web search results.

The PR firm employed by the research institute may be patting themselves on the back about their great success in promoting the visibility of their client, and the press release may have struck a nerve because many of us enjoy a glass of good wine and would really like to believe it will improve our health.   Unfortunately, many of the news articles have titles like “Red wine may reduce breast cancer risk” or “New Study Shows Red Wine May Reduce Cancer Risk In Women”.   A woman seeing one of these news articles might follow up with a Google search and conclude that a major study had been released causing a paradigm shift in the field and that consumption of moderate amounts of red wine would be protective against breast cancer.   The study in question is small, the article has not appeared in print, the journal in which it will appear has a modest impact factor, and the findings refer only to biochemical changes, not epidemiologically validated cancer risk.  There remains a substantial body of epidemiology showing that red wine consumption is associated with increased risk for breast cancer in human populations.  There is no paradigm shift.  It will be interesting to see how Google’s search results evolve over the coming weeks and months, but at present they are quite misleading.  Given the frequency with which physicians and nurses consult Google, the fickle nature of Google as an information source is worrisome.

Note: while this blog refers to Google searches, other search engines also rank links to news articles about this press release highly in their search results.   It seems that everyone is losing sight of the need to provide reliable information in the race to be first with their search results.

1)     Willett et al. NEJM (1987) 316(19):1174.
2)     Allen et al. J Natl Cancer Inst (2009) 101(5):296-305.
3)     Newcomb et al. Cancer Epidemiol Biomarkers Prev (2009) 18(3):1007-10.
4)     Li et al. European Journal of Cancer (2009) 45(5):843-50.
5)     Zhang et al. Am J Epidemiol (2007) 165(6):667-76.

David J. States MD PhD FACMI
Chief Scientific Officer
OncProTech LLC