In community-based epidemiological studies, job- and industry-specific 'modules' are often used to systematically obtain details about the subject's work tasks. The module assignment is often made by the interviewer, who may have insufficient occupational hygiene knowledge to assign the correct module. We evaluated, in the context of a case-control study of lymphoid neoplasms in Asia ('AsiaLymph'), the performance of an algorithm that provided automatic, real-time module assignment during a computer-assisted personal interview. AsiaLymph's occupational component began with a lifetime occupational history questionnaire with free-text responses and three solvent exposure screening questions. To assign each job to one of 23 study-specific modules, an algorithm automatically searched the free-text responses to the questions 'job title' and 'product made or services provided by employer' against a list of more than 5800 module-specific keywords in English, Traditional Chinese, and Simplified Chinese. Hierarchical decision rules were applied when the keyword matches triggered multiple modules. If no keyword match was identified, a generic solvent module was assigned if the subject responded 'yes' to any of the three solvent screening questions. If all three responses were 'no', a work location module was assigned, which, depending on the location response, either redirected the subject to the farming, teaching, health professional, solvent, or industry solvent module or ended the questions for that job. We conducted a reliability assessment that compared the algorithm-assigned modules to consensus module assignments made by two industrial hygienists for a subset of 1251 (of 11409) jobs selected by stratified random sampling with module-specific strata.
Discordant assignments between the algorithm and the expert consensus (483 jobs) were qualitatively reviewed by the hygienists to rate the potential information loss (none, low, medium, or high) from questions that would be missed by using the algorithm-assigned module. The most frequently assigned modules were the work location (33%), solvent (20%), farming and food industry (19%), and dry cleaning and textile industry (6.4%) modules. In the reliability subset, the algorithm assignment exactly matched the expert consensus-assigned module for 722 (57.7%) of the 1251 jobs. Overall, adjusted for the proportion of jobs in each stratum, we estimated that 86% of the algorithm-assigned modules would result in no information loss, 2% would have low information loss, and 12% would have medium to high information loss. Medium to high information loss occurred for <10% of the jobs assigned the generic solvent module and for 21%, 32%, and 31% of the jobs assigned the work location module with location responses of 'someplace else', 'factory', and 'don't know', respectively. For the other work location responses, ≤8% of jobs had medium to high information loss because those responses redirected to other modules. Medium to high information loss occurred more frequently when a job description matched multiple keywords pointing to different modules (29-69%, depending on the triggered assignment rule). These evaluations demonstrated that automatically assigned modules can reliably reproduce an expert's module assignment without the direct involvement of an industrial hygienist or interviewer. The feasibility of adapting this framework to other studies will be language- and exposure-specific.
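The assignment cascade described above (keyword search, then hierarchical rules, then the solvent screen, then the work location fallback) can be illustrated with a minimal Python sketch. It is not the AsiaLymph implementation: the keyword lists, the module names, the priority order standing in for the hierarchical decision rules, and the function names `match_modules` and `assign_module` are all hypothetical, and simple substring matching is assumed.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical keyword lists. The study used more than 5800 keywords across
# 23 modules in English, Traditional Chinese, and Simplified Chinese.
MODULE_KEYWORDS: Dict[str, List[str]] = {
    "farming": ["farm", "rice", "农"],
    "dry_cleaning_textile": ["dry clean", "textile", "garment"],
    "health_professional": ["nurse", "hospital"],
}

# Hypothetical priority order, an assumed stand-in for the study's
# hierarchical decision rules (earlier entries win when several match).
MODULE_PRIORITY: List[str] = [
    "dry_cleaning_textile",
    "health_professional",
    "farming",
]


def match_modules(job_title: str, employer_product: str) -> Set[str]:
    """Return every module with at least one keyword found in the free text."""
    # Naive substring search; a real system would need proper tokenization.
    text = f"{job_title} {employer_product}".lower()
    return {
        module
        for module, keywords in MODULE_KEYWORDS.items()
        if any(keyword.lower() in text for keyword in keywords)
    }


def assign_module(
    job_title: str,
    employer_product: str,
    solvent_screen: Sequence[bool],
) -> str:
    """Assign one module to a job using the cascade described in the text."""
    matched = match_modules(job_title, employer_product)
    if matched:
        # Several modules may be triggered; pick the highest-priority one.
        return min(matched, key=MODULE_PRIORITY.index)
    if any(solvent_screen):
        # No keyword match, but at least one solvent screening 'yes'.
        return "solvent"
    # No keyword match and all screening answers 'no'.
    return "work_location"


if __name__ == "__main__":
    print(assign_module("Rice farmer", "rice", [False, False, False]))
    print(assign_module("Clerk", "office", [False, True, False]))
    print(assign_module("Clerk", "office", [False, False, False]))
```

The sketch stops at the work location module; the subsequent redirection by location response is not modeled because the abstract does not give the full mapping.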