**DOI:**10.1128/JCM.37.3.848-851.1999

## ABSTRACT

Since it is not possible to state with certainty that contamination has occurred during a nucleic acid amplification assay in the absence of a positive result for a negative control, methods of elementary probability theory are used to illustrate how to identify those runs in which the possibility of contamination should be considered. The use of binomial and Poisson distributions and an analysis of clusters are presented with illustrative examples to demonstrate their use.

In working with target amplification methodologies such as PCR, ligase chain reaction (LCR), transcription-mediated amplification, and nucleic acid sequence-based amplification, the clinical laboratory must strictly adhere to a number of procedures to minimize the risk of contamination (2). These include such safeguards as employing enzymatic inactivation with uracil- N -glycosylase when performing PCR, performing the amplification part of the assay in a different location from the specimen processing, and using positive displacement pipettes. Despite the use of such practices, the possibility of contamination of specimens with amplified products remains. Contamination, producing false-positive assay results, can have important clinical consequences.

Although it is not possible to determine with absolute certainty whether or not contamination has occurred during a given amplification run in the absence of a negative control that yields a positive result, it is possible, by using elementary probability theory, to identify those runs that arouse suspicion of contamination. In order to use these methods, data regarding the tested population should be known. It is important to note that the analyses used in the examples in this work apply only to qualitative assay systems and not to quantitative systems, with the exception of the use of the Poisson approximation of the copy number.

Several examples will be used to illustrate the use of elementary probability theory. In these examples, there are several terms that will be used. These include the following: (i) events, which will be represented by A, B, etc.; (ii) probabilities of events, represented by P(A), P(B), etc. (example: in the toss of a fair coin, if the event H is obtaining heads and the event T is obtaining tails, then P(H) = P(T) = 0.5; (iii) P(X = k) is the probability of obtaining k successes in a number of trials; (iv) the number of ways to choose an unordered subset of size k out of n elements is (_{k}^{n}) = n!/k!(n − k)!, where n! = the product of all positive integers less than or equal to n (thus, 1! = 1, 2! = 1 · 2, 3! = 1 · 2 · 3, and in general n! = 1 · 2 · 3… n ; 0! is defined as equaling 1; (_{k}^{n}) is also the number of combinations of n items taken k at a time); (v) e = 2.71828… .

Example 1: the binomial distribution.On the basis of historical data, 10% of all specimens submitted for LCR for detection of *Chlamydia trachomatis* are positive in a given laboratory. Thus, the probability that an individual specimen is positive for*C. trachomatis* is 0.1 in this laboratory.

An individual run consists of 19 patient specimens and five controls. A run is performed in which all control specimens are within range. Of the 19 patient specimens, 14 are negative and 5 are positive. Is this the result of contamination?

Analysis of example 1.In a case in which there are n independent repetitions of an experiment in which there can be only two outcomes, one with probability p and the other with probability (1 − p) , the resulting density is known as the binomial density, with parameters n and p .

Let *X* = the number of successes in n independent trials, with probability p of success on each trial. Then, *P*(*X = k*) = (_{k}^{n})*p ^{k}*(1 −

*p*)

^{n−k}for k = 0, 1, 2, …,

*n.*Let

*q*= 1 −

*p.*Then,

Interpretation of results for example 1.Since the probability of obtaining such a result due to chance alone is 3.5%, a number of possibilities have to be considered. (i) This could, in fact, be due to chance alone and should be expected to occur approximately one time in each 29 runs. If this assay is run daily, such a result would be expected to occur, on average, approximately once per month. (ii) Contamination may have occurred. If no explanation is apparent, this must be evaluated. Such an evaluation should include performing a run with known negative specimens as well as performing the assay with swab specimens from areas of the instrument and the laboratory to see if a positive result occurs, indicative of contamination. (iii) At least one of the assumptions used to analyze this event may be incorrect. A number of possibilities exist, including the two following possibilities. (a) The probability of a positive result in the population may have increased since it was initially determined. Even a rather modest change, such as a change in the probability of a positive test from 10 to 12%, can have a major impact on the probabilities calculated from equation 1. The resulting calculation with n = 19, p = 0.12, q = 0.88 would be*C. trachomatis* by LCR, are, in fact, made up of subpopulations that differ in the probability of a positive specimen (see example 2). Those specimens that come, for example, from the population of patients in the sexually transmitted disease (STD) clinic may have a probability of >0.1 of being positive, while those specimens from outpatient areas in which screening is performed on asymptomatic patients may have a probability of <0.1 of being positive. If, as occasionally happens, the number of specimens from the STD clinic in a given run is higher than expected, the population that has been sampled does not, in fact have a probability of 0.1 of having a positive result from the assay, but a probability of >0.1.

Example 2: the binomial distribution for specimens that come from two different populations.On the basis of historical data, 20% of specimens submitted for LCR for detection of *C. trachomatis* from the STD clinic are positive and 1% of specimens from other patient locations are positive. In a given run, 9 specimens from the STD clinic and 10 specimens from other sites are tested. Five of these specimens are positive. What is the probability that five or more specimens would be positive on the basis of chance alone?

Analysis of example 2.The site-specific probabilities will often not be known by laboratories, and patient demographics will not be supplied with the specimens. This is particularly likely to be the case in reference laboratories. In this example, by knowing the site-specific probabilities of a positive assay, it is possible to refine the model that was used in example 1. Here, if a total of n specimens are tested, with r specimens from population 1 and (n − r) from population 2, we can calculate the probability of obtaining a stated number, k , of positive results. In population 1, let the probability of a positive result from a specimen be p_{1}. Then, since p_{1} + q_{1} = 1, the probability of a negative result is 1 − p_{1} = q_{1}. Since r specimens are obtained from this population then, from equation 1 we obtain the following:*p*_{2} and the probability of a negative result is 1 − p_{2} = q_{2}. If (n − r) specimens are obtained from this population then the probability of obtaining k_{2} positive results is as follows:_{1} = 0.2, and p_{2} = 0.01. Let A be the event where five or more positive results occur among the 19 patient specimens in the run; then, P(A) = the probability of obtaining five or more positive results in the run. Since the probability of obtaining at least five positive results is 1 − (probability of obtaining zero or one or two or three or four positive results), P(A) = 1 − [P(0) + P(1) + P(2) + P(3) + P(4)].

The only ways of obtaining 0 or 1 or 2 or 3 or 4 positive results are summarized in Table 1. Table2 summarizes the probability of each of the events, calculated from equations 2 and 3, by using the same method of calculation as that used in example 1, and rounded to three decimal places.

Since the events—obtaining a given number of positive results from the STD clinic specimens and obtaining a given number of positive results from the specimens from other sites—are independent events, the probability that both events occur is the product of their respective probabilities. Thus,

Example 3: the use of the Poisson approximation to the binomial distribution.A laboratory performs PCR for *Mycobacterium tuberculosis* on samples of cerebrospinal fluid. Historically, positive results have constituted 3% of the samples tested. In reviewing the results of 100 samples subjected to PCR, what is the probability that there will be at most 2 positive samples?

Analysis of example 3.We have seen, as in example 1, that an exact calculation can be performed by using the binomial expansion. Since this case corresponds to a binomial density with n = 100 and p = 0.03 , we can calculate the probability of this directly, using equation 1, where A represents the event that there are zero, one, or two positive results and P(A) is the probability of obtaining zero, one, or two positive results: P(A) = q^{100} + 100q^{99} p + 9900q^{98}p^{2} = 0.42. For large numbers of samples, if the probability of a positive result is small, it may be more convenient to use the Poisson approximation to the binomial distribution, where λ = np.

Then, the probability of having k positive results in n samples is approximately*X* = 0) = 3^{0}e^{−3}/0! = e^{−3}. The probability of obtaining one positive result is found when k = 1: P(X = 1) = 3^{1}e^{−3}/1! = 3e^{−3}. The probability of obtaining two positive results is found when k = 2: P(X = 2) = 3^{2}e^{−3}/2! = 9e^{−3}/2. Therefore, the probability of obtaining zero, one, or two positive results is

Example 4: further use of the Poisson distribution.A positive control for a PCR assay was made by diluting a known positive sample and then using aliquots of this dilution to make individual positive controls. In testing the positive control by PCR amplification, it was positive (yielded the appropriate PCR product) 90% of the time but was negative (did not yield the PCR product) 10% of the time. Assuming that the amplification was performed properly, that the efficiency of the amplification assay is 100%, and that the efficiency does not vary from run to run, on average how many copies of the target nucleic acid sequence are present in each aliquot of the positive control?

Analysis of example 4.This is an example of a limiting dilution problem. The methods used to solve this have been employed in limiting dilution assays to estimate the number of bacteria or viruses as well as the number of specific cell types in cell culture. When there are several dilutions with which PCR is performed multiple times, it is possible to obtain an estimate of the number of copies of the nucleic acid in an individual aliquot by using a computer program that can be downloaded or used with a Java-capable browser from the World Wide Web (3). Assume that an individual aliquot of the positive control can have any integral number of copies of the target sequence: zero, one, two, three… .

The probability that zero copies of the target sequence are present, P(0), is 0.1, since in 10% of the controls the amplification did not yield the amplified PCR product.

In solving this problem, we can use the Poisson distribution:^{−λ}, meaning that ln (0.1) = −λ; therefore, λ = 2.3.

Interpretation of results for example 4.Since λ is the mean of a Poisson distribution, there are an average of 2.3 copies of the target nucleic acid sequence in an aliquot of the positive control.

Example 5: analysis of clusters.In a run of 19 specimens, there are two positive results. Both the positive results, of note, occur in consecutive specimens, forming a cluster. Is this the result of contamination? What if this occurs with three consecutive specimens instead of two?

Analysis of example 5.This is analogous to the problem, frequently presented in elementary probability texts, of removing n balls from an urn (without replacement) in which r balls are white and (n − r) balls are black. What is the probability that the r white balls are all removed consecutively in such a setting?

The general solution for this problem can be calculated by determining the number of ways that the r consecutive positive specimens can occur during a run of n total specimens and dividing this by the number of ways that r positive specimens can be selected from n total specimens.

The general solution to this problem is

By contrast, the probability that all of the three positive results will occur consecutively would be

Interpretation of results for example 5.A cluster in which both positive results occur consecutively could certainly be due to chance alone and would be anticipated to occur approximately one time in nine. A cluster in which all positive results occur consecutively is unlikely to be due to chance alone if there are three positive results among 19 samples, occurring less than 1 time in 50. Rather than immediately discarding the results of the assay as due to contamination, which is certainly a possible cause of the results, the underlying assumptions must be evaluated. Such an evaluation may demonstrate an alternative explanation.

Examples of alternative explanations include (i) a lack of independence of the three positive samples, such as would be the case for three specimens originating from the same patient or from a patient and two sexual contacts of the patient and (ii) the three positive specimens’ being from a population at very high risk of chlamydial infection while the remainder of the specimens are from a low-risk patient population.

Summary.The use of principles of elementary probability theory can help identify incidents of contamination in nucleic acid amplification assays. In practice, a simple table can be constructed based upon the methods used here and the particular probabilities in the given laboratory, which can be incorporated into the amplification procedure. Such a table would give those situations which, due to the low calculated probability of the situation occurring, require additional review.

The methods described to mathematically evaluate these incidents require knowledge of the binomial and Poisson distributions and simple combinatorics, topics that are typically included in the first portion of a standard college-level probability textbook (1) and should, therefore, be easily accessible to microbiologists and molecular biologists. In addition, knowledge of the underlying assumptions such as the population(s) from which the specimens are obtained and independence of the results of the assay for different specimens are essential to further evaluate those assay runs that arouse suspicion of contamination.

## FOOTNOTES

- Received 7 May 1998.
- Returned for modification 17 November 1998.
- Accepted 25 November 1998.

- Copyright © 1999 American Society for Microbiology