The original GIDEON paper (PDF)
A COMPUTER-DRIVEN BAYESIAN MATRIX FOR
THE DIAGNOSIS OF INFECTIOUS DISEASES
Stephen A. Berger (1) and Uri Blackman (2)
From the Infectious Diseases Division (1), Tel-Aviv Sourasky Medical
Center; and the Department of Computer Science (2), University of Tel-Aviv,
Tel-Aviv, Israel
Address for correspondence and reprints: Stephen A. Berger, M.D.,
Department of Microbiology, Ichilov Hospital, 6 Weitzman Street,
Tel-Aviv 64239, Israel
Abstract:
Interactive computer data bases representing rates and clinical
probabilities were constructed for 289 infectious diseases; 115
symptoms, signs & laboratory findings; and 205 countries. User
input is processed by a Bayesian matrix, and compatible diagnoses
are presented in order of probability, with interactive
'suggestions' for additional discriminative examinations.
In a multicenter pilot study of 495 blinded cases, correct
diagnoses were included in the program's differential diagnosis
list in 469 (94.7%); and the correct diagnosis was ranked first in
350 (70.7%). The accuracy of diagnosis was highest for parasitic
disease (p .04) and diseases acquired in Africa (p .04) and South
East Asia (p .03); no such differences were noted with respect to
body system and patient age group.
Introduction
One of the unique aspects of infectious disease is its wide
variety, both in time and place. The specialist practicing in India may
have little or no expertise in Peruvian disease. A colleague in New York
will be called upon to diagnose and treat conditions from Africa, Asia,
South America, Fiji and Papua; and must be familiar with the pathogens
which originate in Texas, Hawaii and Canada. Indeed, even the full-time
Infectious Diseases specialist may not be conversant in diseases such as
lagochilascariasis, louping ill and lobomycosis. War, famine, education,
immigration and business travel have contributed to the advent of
specialists in Geographic Medicine and Emporiatrics (Travel medicine).
The 'art' of diagnosis is largely the ability (albeit subconscious)
to rank probabilities based on the incidences of likely diseases and the
chance of encountering given clinical features within each disease. In
theory, Bayesian analysis could be employed to accurately diagnose
disease when given proper input. A multicenter study was undertaken to
test a comprehensive computer-driven software program incorporating
worldwide epidemiological and clinical parameters.
Patients and methods
Design of the computer program:
Interactive data bases representing rates and clinical
probabilities were constructed for 289 diseases; 115 symptoms, signs and
laboratory findings; and 205 countries. Reporting statistics published
by the World Health Organization and national health ministries were
used where available, and were supplemented by data for neighboring
countries and previous years where necessary. In cases where the
accuracy of disease reporting was suspect (eg, AIDS in Africa), more
realistic estimates were used.
The data base is limited to infectious diseases (Table 1), and does
not include slow viral illnesses and a number of self-defined and
obvious conditions such as otitis externa, surgical wound infection and
furunculosis. As the program is designed to diagnose clinically-
apparent disease, data regarding asymptomatic carriage or infestation
were adjusted accordingly. Figures regarding the incidence of signs and
symptoms within each specific disease were derived from standard
textbooks and reviews. Clinical and epidemiological data are updated on
a continual basis.
The program user is first requested to indicate the country of
disease origin, and is then presented with a list of 22 basic clinical
parameters, grouped according to body system. A '+' or '-' response to
each of the latter is indicated using any of a variety of computer
keystrokes. A '+' response automatically opens a computer window
requesting further details. Thus, if the user indicates that a rash is
present, he will be asked to further define the nature and distribution
of the skin lesions. An additional window is available for the entry of
laboratory test results (hematological, cerebrospinal, hepatic or renal)
if available.
User input is processed by a Bayesian matrix, and compatible
diagnoses are presented in order of probability in a bar graph and
numerical format. Ancillary clues for all listed diseases are accessed
by specified key strokes: incubation period, clinical hints, geographic
distribution, vector, vehicle, reservoir, etc. Drugs of choice and
dosages for adult or pediatric therapy are listed as well. The
diagnosis list is accompanied by an ancillary screen which indicates
rare (albeit compatible) clinical findings in each disease listed for
the patient in question. An additional interactive screen lists all
additional clinical findings which could improve diagnostic specificity.
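The ranking step described above can be illustrated with a minimal
naive-Bayes sketch in Python. The paper does not give the program's
formula, so this is an illustration under stated assumptions and not the
authors' implementation: findings are treated as conditionally
independent given the disease, country-specific incidence serves as the
prior, and a disease is dropped when the entered findings give it zero
probability. The function name rank_diagnoses, the disease labels and
all numbers are hypothetical.

```python
def rank_diagnoses(priors, likelihoods, findings):
    """Rank diseases by posterior probability, highest first.

    priors:      disease -> relative incidence in the chosen country
    likelihoods: disease -> {finding: P(finding present | disease)}
    findings:    finding -> True for a '+' response, False for '-'
    """
    scores = {}
    for disease, prior in priors.items():
        score = prior
        for finding, present in findings.items():
            # Assumption: a finding missing from the table has probability 0.
            p = likelihoods[disease].get(finding, 0.0)
            score *= p if present else (1.0 - p)
        scores[disease] = score

    total = sum(scores.values())
    if total == 0:
        return []  # no disease is compatible with the input

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Normalise so the posteriors of the compatible diseases sum to 1.
    return [(d, s / total) for d, s in ranked if s > 0]


# Hypothetical data for one country and two findings.
priors = {"disease_A": 0.6, "disease_B": 0.3, "disease_C": 0.1}
likelihoods = {
    "disease_A": {"fever": 0.9, "rash": 0.1},
    "disease_B": {"fever": 0.8, "rash": 0.7},
    "disease_C": {"fever": 0.0, "rash": 0.9},
}
findings = {"fever": True, "rash": True}

ranked = rank_diagnoses(priors, likelihoods, findings)
for disease, posterior in ranked:
    print(f"{disease}: {posterior:.3f}")
```

In this toy input disease_B ranks first (posterior about 0.76) even
though disease_A is more common, because a rash is unlikely in
disease_A; disease_C is excluded because it never produces fever.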
Separate computer modules allow the user to study specific diseases
and antiinfective agents without regard to a specific patient. He may,
for example, request a listing for all parasitic diseases acquired in
Togo from the bites of mosquitoes; or of all drugs which interact with
alcohol. In addition to the epidemiological and clinical parameters
outlined above, screens are available which outline the worldwide
distribution of each disease, as well as the current status of AIDS,
malaria, yellow fever and cholera. The therapeutic spectrum, toxicity,
dosage and other characteristics of antiinfective agents are also
available.
Multicenter study:
Questionnaires reflecting the computer input screen were
distributed to six senior full-time infectious disease specialists.
(The authors' own institution was excluded.) Participants were requested
to record all positive and negative clinical data for consecutive
patients with established diagnoses. Since the majority of cases were
anticipated to represent disease acquired in the study country (Israel),
a similar number of 'hypothetical' cases acquired abroad was also
elicited. Questionnaires were assigned code numbers and submitted in a
blinded fashion, with diagnoses recorded on a separate sheet. All
results were collated and entered into a data base (dBase III+) prior to
review of the clinical diagnoses. Statistical analysis employed the
chi-square test for unpaired proportions.
Results
Four hundred ninety-five of 513 cases submitted were suitable for
analysis (Table 2). The computer program accurately identified the
clinical diagnosis in 75.3% of actual cases and in 64.0% of hypothetical
cases (p .009). The clinical diagnosis was included in the computer
differential diagnosis list in 94.7%. The accuracy of diagnosis was
highest for parasitic disease (p .04) and diseases acquired in Africa (p
.04) and South East Asia (p .03); no such differences were noted with
respect to body system and patient age group.
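The comparison of actual and hypothetical cases can be rechecked from
the Table 2 counts (222 of 295 versus 128 of 200 ranked first) with a
chi-square test on a 2 x 2 table. The sketch below is in Python and is
not taken from the paper; the function name chi_square_2x2 is
hypothetical, and whether the authors applied Yates' continuity
correction is not stated, so the yates flag is an assumption.

```python
import math


def chi_square_2x2(success_a, total_a, success_b, total_b, yates=True):
    """Chi-square test (1 degree of freedom) for two unpaired proportions.

    Returns (chi2, p). Assumes every row and column total is non-zero.
    """
    a, b = success_a, total_a - success_a   # group A: correct, not correct
    c, d = success_b, total_b - success_b   # group B: correct, not correct
    n = total_a + total_b

    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction (assumed, not confirmed by the paper).
        diff = max(0.0, diff - n / 2)

    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts from Table 2: actual cases versus hypothetical cases.
chi2_corrected, p_corrected = chi_square_2x2(222, 295, 128, 200)
chi2_plain, p_plain = chi_square_2x2(222, 295, 128, 200, yates=False)

print(f"with correction:    chi2 = {chi2_corrected:.2f}, p = {p_corrected:.4f}")
print(f"without correction: chi2 = {chi2_plain:.2f}, p = {p_plain:.4f}")
```

With the continuity correction the comparison gives a p value of
roughly 0.009, in line with the figure reported above; without the
correction the p value is somewhat smaller.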
Discussion
The major problem in developing an infectious disease diagnosis
program is the difficulty of obtaining reliable and accurate incidence data.
The reporting rate for diseases varies widely between countries, and
among differing diseases within any given country. Furthermore, the
computer program assumes that the patient is a citizen or local resident
of the country in question. Incidence data for tourists and expatriates
may vary from those of the indigenous population. In some cases, the
country of acquisition may not match the country of residence.
Selection of discriminative clinical and laboratory parameters for
the data base is complicated by the fact that individual infections are
quite similar, producing fever, cough, rash, elevated white blood cell
count, etc. Similar abnormalities are also found in a variety of
noninfectious diseases.
An additional difficulty in any diagnostic program is the
reliability of user input. The accuracy of clinical input is only as
good as the accuracy of history taking, physical examination and
laboratory testing. In some instances, more than one disease may be
present, or clinical observations may be factitious or unrelated to the
present illness. In the current study, actual cases were correctly
diagnosed more often than hypothetical cases (ie, acquired overseas),
suggesting relative unfamiliarity of infectious diseases experts with
the clinical features of 'exotic' diseases.
During the period January 1989 - February 1992, Index Medicus
listed 2,063 papers under the subject heading, "Diagnosis, Computer
Assisted," and 7,139 under the heading, "Software"; however, no program
specifically designed for diagnosis in Infectious and Geographical
medicine has been reported in the English language literature to date.
In contrast to other specialties, a complete differential diagnosis list
may be as important as the precise ranking of diagnoses when dealing
with an exotic disease.
Existing computer-driven diagnostic programs have failed to
adequately simulate human intelligence or find widespread practical use
in the field (1-3). Our preliminary study suggests that the program under
study is comprehensive and accurate, and could prove useful in the
diagnosis of infectious and tropical disease. An expanded study among
infectious diseases physicians in the United States will be undertaken
in the near future.
References
1. Aizenstein HJ. Computer systems for medical diagnosis. JAMA 1992;
267:166-170.
2. Waxman HS, Worley WE. Computer-assisted adult medical diagnosis:
subject review and evaluation of a new microcomputer-based system.
Medicine (Baltimore) 1990; 69:125-136.
3. Szolovits P, Patil RS, Schwartz WB. Artificial intelligence in
medical diagnosis. Ann Intern Med 1988; 108:80-87.
Table 1. Diseases and pathogens included in the data base.
< list of 289 diseases >
Table 2: Evaluation of a computer-driven infectious disease diagnosis
program. Values are numbers of cases, with percentages in parentheses.
Nature of infection
                           bacterial   parasitic   viral       fungal      total       correct*
actual                     150         30          100         15          295         222 (75.3)
hypothetical               97          60          33          10          200         128 (64.0)
total                      247         90          133         25          495
correct*                   186 (75.3)  60 (66.7)   88 (66.2)   16 (64.0)   350 (70.7)
included in differential** 236 (95.5)  87 (96.7)   124 (93.2)  22 (88.0)   469 (94.7)
Country of acquisition
                           Israel      Africa      Southeast   Europe      Latin       North       Other
                                                   Asia                    America     America
total                      308         66          65          24          7           19          6
correct*                   205 (66.6)  54 (81.8)   54 (83.1)   16 (66.7)   5 (71.4)    12 (63.2)   4 (66.7)
included in differential** 295 (95.8)  62 (93.9)   61 (93.8)   23 (95.8)   6 (85.7)    18 (94.7)   4 (66.7)
* concordance between the correct clinical diagnosis and the disease
listed first in the computer-generated differential diagnosis list
** the correct clinical diagnosis is included in the
computer-generated diagnosis list