Why so many acronyms?

Biomedical informaticians are forever complaining that the language of medicine is inconsistent, imprecise, difficult to understand and potentially a source of medical errors.  Two particular complaints are the variety of ways in which a standard entity may be described and inconsistencies in the way a term or acronym may be interpreted.  Examples of inconsistent description abound.  For example, Ross Koppel has found 40 ways of recording blood pressure and 30 ways of recording tobacco use in an EHR.  A reasonable question is why doesn’t everyone agree on one standard way of referring to blood pressure and tobacco use?

Acronym collisions are a second source of irritation for the informatics community. Acronym collisions occur when the same acronym is used for different purposes. For example, PSA is used as a gene symbol for 7 different genes in the human genome. Similarly, PCR may refer to polymerase chain reaction in molecular biology, phosphocreatine in metabolism, pathological complete remission in oncology and premature contraction in cardiology. Given the theoretically enormous space of three- and four-letter acronyms, isn’t it surprising that different fields have settled on the same abbreviations for such different entities? Not really. The space of pronounceable acronyms is far smaller than the space of all possible two-, three- or four-letter combinations, and the birthday paradox applies. The chance that any one acronym in one discipline matches any one acronym in another is small, but as the number of acronyms in use in both fields increases, the probability that all the acronyms in one field will avoid all of the acronyms used in the other field becomes small.
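The birthday-paradox argument can be made concrete with a rough sketch. It assumes, purely for illustration, that each field picks its acronyms uniformly at random from the 26^3 possible three-letter strings; real acronyms are far from uniform, and the pronounceable space is smaller, so the true collision rate is presumably higher. The function name and the field sizes are hypothetical.

```python
from math import comb


def prob_no_collision(space_size: int, field_a: int, field_b: int) -> float:
    """Probability that two fields share no acronym.

    Assumes each field holds a set of distinct acronyms drawn uniformly
    at random from the same space (an illustrative simplification).
    """
    if field_a + field_b > space_size:
        return 0.0  # pigeonhole: an overlap is unavoidable
    # Field B must choose all of its acronyms from those field A left unused.
    return comb(space_size - field_a, field_b) / comb(space_size, field_b)


THREE_LETTER_SPACE = 26 ** 3  # 17,576 possible three-letter strings

for n in (10, 50, 100, 200, 400):
    p = prob_no_collision(THREE_LETTER_SPACE, n, n)
    print(f"{n:>4} acronyms per field: P(at least one collision) = {1 - p:.3f}")
```

Under these assumptions, the chance of at least one shared acronym grows roughly with the product of the two vocabulary sizes, which is why even modest vocabularies collide.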

Why not avoid the problem altogether and simply prohibit physicians from using acronyms in their medical notes? To answer this question we need to consider the forces driving language use in biomedicine. Writing notes is a significant time burden for both physicians and nurses. In a recent study, physicians spent an average of 20 to 100 minutes per day writing (Hripcsak et al 2011). Further, the time spent composing notes undoubtedly follows a Zipf-like distribution with a small number of clinicians writing a large fraction of all notes. For the individuals writing most of the notes, the time spent on writing is substantial, and there is general agreement that clinical staff are under significant time pressure to complete a patient encounter as rapidly as possible. The time spent documenting the visit counts just as much as time spent with the patient. Shorthand and acronyms are valuable forms of data compression that reduce the amount of text that needs to be produced and, as a consequence, greatly reduce the time spent writing. This is not unique to biomedicine; natural language is highly compressed. Pronouns are used all the time to refer to an expression occurring elsewhere in a corpus. Replacing an entire noun phrase with a short simple pronoun is a great compression technique, but because any given pronoun could potentially refer to a number of noun phrases, there is often uncertainty in resolving the phrase to which a pronoun refers. Probabilistic rules for anaphora resolution have been developed in the field of computational linguistics, but the process remains error prone, even for human readers. If the informatics community truly wanted to reduce ambiguity in clinical text, an obvious first step would be to ban the use of pronouns. This would, of course, incite a general revolt from the clinical staff, but the uncertainty associated with all of the variant nomenclature and acronym collisions pales by comparison with the ambiguity associated with the word “it”.

The most effective compression algorithm depends on the frequency of terms in a particular text stream. Medicine is not a single homogeneous community. Rather, there are dozens or hundreds of different communities, each with its own domain of expertise and relevant vocabulary and, of course, its own characteristic set of term frequencies. In oncology, terms referring to cancer may be quite frequent while they will be less so in other fields of medicine. The most effective compression algorithm for a clinical oncologist would, therefore, be a strategy that substitutes short abbreviations for frequently used long terms: BC for breast cancer, DFS for disease-free survival, OS for overall survival or pCR for pathological complete remission. In ophthalmology, RE and LE are not used for “right eye” and “left eye” because they conflict with common abbreviations. Instead, ophthalmology has settled on OD and OS. Nevertheless, determining the appropriate resolution of “OS” is straightforward in almost all situations.
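As a rough illustration of why the best abbreviations depend on term frequency, the sketch below counts how many characters each abbreviation would save in a small set of notes. The notes, the abbreviation table and the function name are all made up for the example; this is a toy measure, not a real compression scheme.

```python
from collections import Counter


def abbreviation_savings(notes, abbreviations):
    """Characters saved per term if each long form were abbreviated.

    Savings = (occurrences of the long form) x (length difference).
    Matching is a naive case-insensitive substring count.
    """
    text = " ".join(notes).lower()
    savings = Counter()
    for long_form, short_form in abbreviations.items():
        count = text.count(long_form.lower())
        savings[long_form] = count * (len(long_form) - len(short_form))
    return savings


# Hypothetical oncology notes and abbreviation table (illustration only).
oncology_notes = [
    "Breast cancer, stage II. Disease free survival at 5 years discussed.",
    "Follow-up for breast cancer. Overall survival data reviewed.",
    "Breast cancer recurrence; overall survival estimate updated.",
]
abbreviations = {
    "breast cancer": "BC",
    "disease free survival": "DFS",
    "overall survival": "OS",
}

for term, saved in abbreviation_savings(oncology_notes, abbreviations).most_common():
    print(f"{term}: {saved} characters saved")
```

Run on notes from a different specialty, the same table would yield a different ranking, which is the point: each community's term frequencies favour a different set of abbreviations.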

Why are there so many ways to refer to a single entity?  The frequency of term use differs dramatically between disciplines.  A molecular biologist may use the polymerase chain reaction frequently and achieve significant text compression by abbreviating it PCR.  On the other hand, the molecular biologist might refer to breast cancer with only moderate frequency and therefore would avoid BC as potentially confusing but might still use BRCA.  Taking a population genetics perspective, acronyms can be viewed as alleles.  An allele (or acronym) will increase in frequency if it confers a selective advantage.  In the context of language, a selective advantage would be a large compression efficiency associated with relatively little ambiguity.  Each biomedical community has a distinct set of term frequencies so the optimal set of acronyms will differ between different communities (e.g. BRCA in molecular biology vs. BC in oncology).

Taking the analogy to population genetics a step further, ambiguity occurs at disciplinary boundaries. When a clinical oncologist uses pCR, they almost certainly are referring to pathological complete remission, but when a molecular oncology researcher uses PCR, it may be unclear whether they are referring to pathological complete remission or polymerase chain reaction. The acronyms in use in one community are in effect competing with the acronyms in use in neighboring communities. In population genetics, this is precisely the situation that results in selection for diversity and accelerated genetic drift. A pathogen coat protein evolves to escape the host immune response. The host immune response evolves to prevent growth of the pathogen, and both the pathogen coat protein and the host immune response genes end up evolving at a much faster rate than other genes. As biomedical technology changes, the optimal set of acronyms for each community will drift to adapt to the changes in term frequency introduced by new techniques, and inevitably at the intersection of communities there will be acronym competition accelerating this drift.

Are acronyms a net source of error? Certainly, it is possible that an acronym from one field might be misinterpreted by a reader from a different field, but compared to the pronoun resolution problem, this is a relatively trivial issue. How often is OS in ophthalmology confused with OS in oncology? Acronyms are highly context dependent, and the appropriate context in which to interpret an acronym is almost always obvious. Some informaticians have proposed rules forbidding the use of acronyms. First, this is likely to greatly annoy users because it will force them to generate verbose, time-consuming notes when their time would be better spent on other activities. Second, it is likely to increase the overall error rate by increasing the length and complexity of the terms used in the text.

The use of acronyms may actually prevent errors. Because acronyms are short and, within their domain, common, the chance that the author will misspell the acronym is small. The term for which the acronym substitutes is typically much longer and as a consequence invites a greater possibility of a spelling or typographic error. How many ways are there to misspell multiple sclerosis or polymerase chain reaction compared to MS or PCR? On balance, the use of acronyms is likely to reduce the overall error rate in biomedical communication. Further, because acronyms compress the text and reduce the time needed to generate a sentence, the number of sentences a busy clinician is able to produce increases. Would you rather that your clinical staff wrote a fixed number of sentences in a verbose acronymless style or more sentences in a condensed style facilitated by the use of acronyms? From the perspective of the total information content of the note, the latter is clearly preferable, even if it results in occasional term collisions between disciplines.
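The "how many ways to misspell" question can be given a rough number. The sketch below assumes a typo is a single keystroke slip (one character deleted, swapped with its neighbour, replaced, or inserted) over the lowercase alphabet, a common simplification in spell-checking; it counts opportunities for error, not how likely each one is. The function name is hypothetical.

```python
import string


def single_edit_variants(term: str) -> set[str]:
    """All distinct strings one keystroke slip away from `term`.

    Covers deletion, adjacent transposition, replacement and insertion,
    using lowercase ASCII letters only (an illustrative assumption).
    """
    letters = string.ascii_lowercase
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    replaces = {left + c + right[1:] for left, right in splits if right for c in letters}
    inserts = {left + c + right for left, right in splits for c in letters}
    # Drop the correctly spelled term itself, which some edits reproduce.
    return (deletes | transposes | replaces | inserts) - {term}


for short, long_form in [("ms", "multiple sclerosis"), ("pcr", "polymerase chain reaction")]:
    print(
        f"{short}: {len(single_edit_variants(short))} variants; "
        f"{long_form}: {len(single_edit_variants(long_form))} variants"
    )
```

Because the number of variants grows roughly linearly with the length of the term, the long forms offer many more chances for a slip than their abbreviations do.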

Hripcsak G, Vawdrey DK, Fred MR and Bostwick SB (2011) “Use of electronic clinical documentation: time spent and team interactions” JAMIA 18:112-117.
