Original GIDEON paper
A COMPUTER-DRIVEN BAYESIAN MATRIX FOR THE DIAGNOSIS OF INFECTIOUS DISEASES
Stephen A. Berger¹ and Uri Blackman²
From the Infectious Diseases Division¹, Tel-Aviv Sourasky Medical Center; and the Department of Computer Science², University of Tel-Aviv, Tel-Aviv, Israel
Address for correspondence and reprints: Stephen A. Berger, M.D., Department of Microbiology, Ichilov Hospital, 6 Weitzman Street, Tel-Aviv 64239, Israel
Interactive computer data bases representing rates and clinical probabilities were constructed for 289 infectious diseases; 115 symptoms, signs and laboratory findings; and 205 countries. User input is processed by a Bayesian matrix, and compatible diagnoses are presented in order of probability, with interactive ‘suggestions’ for additional discriminative examinations.
In a multicenter pilot study of 495 blinded cases, the correct diagnosis was included in the program’s differential diagnosis list in 469 (94.7%) and was ranked first in 350 (70.7%). The accuracy of diagnosis was highest for parasitic diseases (p .04) and for diseases acquired in Africa (p .04) and Southeast Asia (p .03); no such differences were noted with respect to body system or patient age group.
One of the unique aspects of infectious disease is its wide variety, in both time and place. A specialist practicing in India may have little or no expertise in Peruvian disease. A colleague in New York will be called upon to diagnose and treat conditions from Africa, Asia, South America, Fiji and Papua, and must be familiar with pathogens that originate in Texas, Hawaii and Canada. Indeed, even the full-time infectious diseases specialist may not be conversant with diseases such as lagochilascariasis, louping ill and lobomycosis. War, famine, education, immigration and business travel have contributed to the advent of specialists in geographic medicine and emporiatrics (travel medicine).
The ‘art’ of diagnosis is largely the ability (albeit subconscious) to rank probabilities based on the incidence of likely diseases and the chance of encountering given clinical features within each disease. In theory, given proper input, Bayesian analysis could be employed to diagnose disease accurately. A multicenter study was undertaken to test a comprehensive, computer-driven software program incorporating worldwide epidemiological and clinical parameters.
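The ranking described above can be written out with Bayes’ theorem. The paper does not state how its matrix combines findings; the form below assumes, as a simplification, that findings are conditionally independent given the disease. Here P(D_i) is the incidence of disease i in the country of origin, and P(f_j | D_i) is the frequency of finding j in disease i (replaced by 1 − P(f_j | D_i) when the finding is reported absent).

```latex
P(D_i \mid f_1, \dots, f_m)
  = \frac{P(D_i) \prod_{j=1}^{m} P(f_j \mid D_i)}
         {\sum_{k} P(D_k) \prod_{j=1}^{m} P(f_j \mid D_k)}
```

The denominator is the same for every disease, so the ranking depends only on the numerator: incidence multiplied by the probability of the observed clinical picture.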
Patients and methods
Design of the computer program:
Interactive data bases representing rates and clinical probabilities were constructed for 289 diseases; 115 symptoms, signs and laboratory findings; and 205 countries. Reporting statistics published by the World Health Organization and national health ministries were used where available, and were supplemented by data for neighboring countries and previous years where necessary. In cases where the accuracy of disease reporting was suspect (eg, AIDS in Africa), more realistic estimates were used.
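The fallback strategy for missing reports (neighboring countries, previous years, and substitute estimates where reporting is suspect) can be sketched as a lookup with a fixed order of precedence. The data layout, the function name `estimate_incidence`, the five-year look-back and the precedence order are all assumptions for illustration; the paper does not specify them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical reported rates: (country, disease, year) -> cases per 100,000 per year.
Reports = Dict[Tuple[str, str, int], float]


def estimate_incidence(
    reports: Reports,
    country: str,
    disease: str,
    year: int,
    neighbors: Dict[str, List[str]],
    overrides: Optional[Dict[Tuple[str, str], float]] = None,
    max_lookback: int = 5,
) -> Optional[float]:
    """Return an incidence estimate for one disease in one country.

    Assumed precedence (not specified in the paper):
      1. a manual estimate, for settings where official reporting is suspect;
      2. the country's own report for `year`, then for earlier years;
      3. the same search in each neighboring country, in the order listed.
    Returns None when no estimate can be found.
    """
    if overrides and (country, disease) in overrides:
        return overrides[(country, disease)]
    for place in [country] + neighbors.get(country, []):
        # Search the current year first, then step back one year at a time.
        for y in range(year, year - max_lookback - 1, -1):
            rate = reports.get((place, disease, y))
            if rate is not None:
                return rate
    return None
```

With this ordering, a country's own report from an earlier year is preferred over a neighbor’s current report; reversing the two loops would express the opposite judgment.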
The data base is limited to infectious diseases (Table 1) and excludes slow viral illnesses as well as a number of obvious, self-defined conditions such as otitis externa, surgical wound infection and furunculosis. Because the program is designed to diagnose clinically apparent disease, data regarding asymptomatic carriage or infestation were adjusted accordingly. Figures for the incidence of signs and symptoms within each disease were derived from standard textbooks and reviews. Clinical and epidemiological data are updated continually.
The program user is first asked to indicate the country of disease origin and is then presented with a list of 22 basic clinical parameters, grouped according to body system. A ‘+’ or ‘-’ response to each parameter is entered using any of several keystrokes. A ‘+’ response automatically opens a window requesting further details; thus, if the user indicates that a rash is present, he will be asked to define the nature and distribution of the skin lesions. An additional window allows entry of laboratory test results (hematological, cerebrospinal fluid, hepatic or renal), when available.
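The input sequence above (country first, then ‘+’/‘-’ answers, with extra detail requested only for ‘+’ answers, plus optional laboratory values) maps naturally onto a small record type. This is a sketch; the class `CaseInput` and its field names are hypothetical, and the paper’s 22 parameters are not enumerated here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CaseInput:
    """Hypothetical container for one patient's input, mirroring the screens described."""
    country: str                                                 # country of disease origin
    findings: Dict[str, bool] = field(default_factory=dict)      # parameter -> '+' (True) or '-' (False)
    details: Dict[str, List[str]] = field(default_factory=dict)  # extra detail for '+' answers
    labs: Dict[str, float] = field(default_factory=dict)         # optional laboratory results

    def record(self, parameter: str, present: bool, detail: Optional[List[str]] = None) -> None:
        """Store a '+'/'-' answer; details are kept only for '+' answers."""
        self.findings[parameter] = present
        if present and detail:
            self.details[parameter] = list(detail)
        elif not present:
            self.details.pop(parameter, None)  # drop stale detail if the answer changes to '-'

    def positives(self) -> List[str]:
        """Parameters answered '+', in the order they were first entered."""
        return [p for p, v in self.findings.items() if v]
```

For example, `record("rash", True, ["petechial", "trunk"])` stores both the ‘+’ answer and the follow-up description, as the rash window does.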
User input is processed by a Bayesian matrix, and compatible diagnoses are presented in order of probability in both bar-graph and numerical format. Ancillary clues for each listed disease (incubation period, clinical hints, geographic distribution, vector, vehicle, reservoir, etc.) are accessed by specified keystrokes. Drugs of choice and dosages for adult and pediatric therapy are listed as well. The diagnosis list is accompanied by an ancillary screen indicating rare (albeit compatible) clinical findings in each disease listed for the patient in question. A further interactive screen lists all additional clinical findings that could improve diagnostic specificity.
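The paper does not describe the internals of its ‘Bayesian matrix’. The sketch below assumes conditionally independent findings (a naive Bayes model) and uses hypothetical names throughout. It covers the ranked list with a bar graph, the rare-findings screen, and suggestions for further discriminative findings. The suggestion step is a simple heuristic (largest spread in frequency among the leading diagnoses), not necessarily what the program does.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: disease -> finding -> P(finding present | disease)
Likelihoods = Dict[str, Dict[str, float]]


def rank_diagnoses(
    priors: Dict[str, float],
    likelihoods: Likelihoods,
    findings: Dict[str, bool],
    floor: float = 0.001,
) -> List[Tuple[str, float]]:
    """Naive-Bayes posterior for each disease, most probable first.

    priors: country-specific incidence of each disease (need not sum to 1).
    findings: answered parameters, True for '+' and False for '-'.
    Probabilities are clamped to [floor, 1 - floor] so that a single
    unlisted finding cannot eliminate a disease outright.
    """
    log_scores: Dict[str, float] = {}
    for disease, prior in priors.items():
        if prior <= 0:
            continue  # not reported from this country
        score = math.log(prior)
        for finding, present in findings.items():
            p = likelihoods.get(disease, {}).get(finding, floor)
            p = min(max(p, floor), 1.0 - floor)
            score += math.log(p if present else 1.0 - p)
        log_scores[disease] = score
    if not log_scores:
        return []
    # Work in log space and subtract the maximum before exp() to avoid underflow.
    top = max(log_scores.values())
    weights = {d: math.exp(s - top) for d, s in log_scores.items()}
    total = sum(weights.values())
    ranked = [(d, w / total) for d, w in weights.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def rare_findings(
    disease: str, likelihoods: Likelihoods, findings: Dict[str, bool], threshold: float = 0.05
) -> List[str]:
    """Positive findings recorded as uncommon (below `threshold`) in `disease`."""
    table = likelihoods.get(disease, {})
    return [f for f, present in findings.items() if present and table.get(f, 0.0) < threshold]


def suggest_findings(
    ranked: List[Tuple[str, float]],
    likelihoods: Likelihoods,
    findings: Dict[str, bool],
    top_k: int = 5,
    n: int = 3,
) -> List[str]:
    """Unanswered findings whose frequency varies most among the leading diagnoses."""
    leaders = [d for d, _ in ranked[:top_k]]
    candidates = {f for d in leaders for f in likelihoods.get(d, {})} - set(findings)

    def spread(f: str) -> float:
        ps = [likelihoods.get(d, {}).get(f, 0.0) for d in leaders]
        return max(ps) - min(ps)

    return sorted(candidates, key=lambda f: (-spread(f), f))[:n]


def bar_graph(ranked: List[Tuple[str, float]], width: int = 30) -> str:
    """Plain-text bar graph with percentages."""
    return "\n".join(
        f"{d:<24}{'#' * round(p * width):<{width}} {100 * p:5.1f}%" for d, p in ranked
    )
```

Clamping with `floor` is a design choice: without it, one finding never recorded for a disease would drive that disease’s probability to zero, which conflicts with the aim of keeping rare but compatible diagnoses on the list.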
Separate computer modules allow the user to study specific diseases and antiinfective agents without reference to a specific patient. He may, for example, request a list of all parasitic diseases acquired in Togo from mosquito bites, or of all drugs that interact with alcohol. In addition to the epidemiological and clinical parameters outlined above, screens are available that outline the worldwide distribution of each disease, as well as the current status of AIDS, malaria, yellow fever and cholera. The therapeutic spectrum, toxicity, dosage and other characteristics of each antiinfective agent are also available.
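The patient-independent query in the Togo example amounts to filtering disease records on several attributes at once. A minimal sketch follows; the record type `DiseaseRecord` and the function `query` are hypothetical, since the data base schema is not described.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class DiseaseRecord:
    """Hypothetical per-disease record; the real data base schema is not described."""
    name: str
    agent: str                  # e.g. "parasitic", "viral", "bacterial"
    vectors: FrozenSet[str]     # e.g. {"mosquito"}
    countries: FrozenSet[str]   # countries from which the disease is reported


def query(
    records: Iterable[DiseaseRecord],
    agent: Optional[str] = None,
    country: Optional[str] = None,
    vector: Optional[str] = None,
) -> List[str]:
    """Names of diseases matching every criterion given; None means 'any'."""
    return sorted(
        r.name
        for r in records
        if (agent is None or r.agent == agent)
        and (country is None or country in r.countries)
        and (vector is None or vector in r.vectors)
    )
```

The Togo request then corresponds to `query(records, agent="parasitic", country="Togo", vector="mosquito")`; a drug-interaction query would follow the same pattern over drug records.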
Questionnaires reflecting the computer input screen were distributed to six senior full-time infectious disease specialists (the authors’ own institution was excluded). Participants were asked to record all positive and negative clinical data for consecutive patients with established diagnoses. Since the majority of cases were expected to represent disease acquired in the study country (Israel), a similar number of ‘hypothetical’ cases acquired abroad was also elicited. Questionnaires were assigned code numbers and submitted in blinded fashion, with diagnoses recorded on a separate sheet. All results were collated and entered into a data base (dBase III+) before the clinical diagnoses were reviewed. Statistical analysis employed the chi-square test for unpaired proportions.
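The chi-square test for two unpaired proportions reduces to a closed-form statistic on a 2 × 2 table. The paper does not say whether a continuity (Yates) correction was used; this sketch assumes none, and the function name is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test comparing two independent proportions.

    Group 1: a successes, b failures.  Group 2: c successes, d failures.
    No continuity correction is applied.  Returns (chi-square, two-sided p).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, chi2 = z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For instance, with hypothetical counts of 20/30 versus 10/30 correct, `chi_square_2x2(20, 10, 10, 20)` gives a chi-square of about 6.67 and p just under 0.01.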
Four hundred ninety-five of 513 cases submitted were suitable for analysis (Table 2). The computer program ranked the clinical diagnosis first in 75.3% of actual cases and in 64.0% of hypothetical cases (p .009). The clinical diagnosis was included in the computer’s differential diagnosis list in 94.7% of cases. The accuracy of diagnosis was highest for parasitic diseases (p .04) and for diseases acquired in Africa (p .04) and Southeast Asia (p .03); no such differences were noted with respect to body system or patient age group.
The major problem in developing an infectious disease diagnosis program is the difficulty of obtaining reliable and accurate incidence data. Reporting rates vary widely between countries, and among different diseases within any given country. Furthermore, the computer program assumes that the patient is a citizen or local resident of the country in question; incidence data for tourists and expatriates may differ from those of the indigenous population. In some cases, the country of acquisition may not match the country of residence.
Selection of discriminative clinical and laboratory parameters for the data base is complicated by the fact that many infections present similarly, producing fever, cough, rash, an elevated white blood cell count, etc. Similar abnormalities are also found in a variety of noninfectious diseases.
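Why shared features add little can be seen in the odds form of Bayes’ theorem: each finding multiplies the odds of one disease over another by its likelihood ratio. The frequencies in the worked line are hypothetical, chosen only to illustrate a feature common to both diseases.

```latex
\frac{P(D_1 \mid f)}{P(D_2 \mid f)}
  = \frac{P(D_1)}{P(D_2)} \times \frac{P(f \mid D_1)}{P(f \mid D_2)},
\qquad
\text{e.g. } \frac{P(\text{fever} \mid D_1)}{P(\text{fever} \mid D_2)} = \frac{0.90}{0.85} \approx 1.06
```

A likelihood ratio close to 1 leaves the relative ranking of D_1 and D_2 almost unchanged, so such a finding contributes little to discrimination.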
An additional difficulty in any diagnostic program is the reliability of user input: clinical input is only as accurate as the history taking, physical examination and laboratory testing on which it is based. In some instances, more than one disease may be present, or clinical observations may be factitious or unrelated to the present illness. In the current study, actual cases were correctly diagnosed more often than hypothetical cases (ie, those acquired overseas), suggesting that infectious diseases experts are relatively unfamiliar with the clinical features of ‘exotic’ diseases.
During the period January 1989 – February 1992, Index Medicus listed 2,063 papers under the subject heading “Diagnosis, Computer Assisted” and 7,139 under the heading “Software”; however, no program specifically designed for diagnosis in infectious and geographic medicine has been reported in the English-language literature to date. In contrast to other specialties, a complete differential diagnosis list may be as important as the precise ranking of diagnoses when dealing with an exotic disease.
Existing computer-driven diagnostic programs have failed to simulate human intelligence adequately or to find widespread practical use (1-3). Our preliminary study suggests that the program is comprehensive and accurate, and could prove useful in the diagnosis of infectious and tropical disease. An expanded study among infectious diseases physicians in the United States will be undertaken in the near future.
References
1. Aizenstein HJ. Computer systems for medical diagnosis. JAMA 1992;267:166-170.
2. Waxman HS, Worley WE. Computer-assisted adult medical diagnosis: subject review and evaluation of a new microcomputer-based system. Medicine (Baltimore) 1990;69:125-136.
3. Szolovits P, Patil RS, Schwartz WB. Artificial intelligence in medical diagnosis. Ann Intern Med 1988;108:80-87.
Table 1. Diseases and pathogens included in the data base.
< list of 289 diseases >
Table 2. Evaluation of a computer-driven infectious disease diagnosis program: number of cases (percent)

Nature of infection

| | | | | | Total |
|---|---|---|---|---|---|
| Correct * | 186 (75.3) | 60 (66.7) | 88 (66.2) | 16 (64.0) | 350 (70.7) |
| Included in differential ** | 236 (95.5) | 87 (96.7) | 124 (93.2) | 22 (88.0) | 469 (94.7) |

Country of acquisition

| | Israel | Africa | Southeast Asia | Europe | Latin America | North America | Other |
|---|---|---|---|---|---|---|---|
| Correct * | 205 (66.6) | 54 (81.8) | 54 (83.1) | 16 (66.7) | 5 (71.4) | 12 (63.2) | 4 (66.7) |
| Included in differential ** | 295 (95.8) | 62 (93.9) | 61 (93.8) | 23 (95.8) | 6 (85.7) | 18 (94.7) | 4 (66.7) |
* concordance between the correct clinical diagnosis and the disease listed first in the computer-generated differential diagnosis list
** the correct clinical diagnosis is included in the computer-generated diagnosis list
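The two scoring rules in the footnotes (top-ranked match versus inclusion anywhere in the list) and the “count (percent)” cell format can be expressed directly. The function names and the sample cases in this sketch are hypothetical.

```python
from typing import List, Sequence, Tuple


def evaluate(cases: Sequence[Tuple[str, List[str]]]) -> Tuple[int, int]:
    """Count cases scored 'correct' (*) and 'included in differential' (**).

    Each case is (clinical diagnosis, computer differential ranked by probability).
    """
    correct = sum(1 for dx, ddx in cases if ddx and ddx[0] == dx)  # listed first
    included = sum(1 for dx, ddx in cases if dx in ddx)            # listed anywhere
    return correct, included


def cell(count: int, n: int) -> str:
    """Format a table cell as 'count (percent)' with one decimal place."""
    return f"{count} ({100 * count / n:.1f})"
```

As a check against the table, `cell(350, 495)` yields the overall ‘correct’ entry, 350 (70.7).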