Semi-supervised Automated Clinical Coding Using International Classification of Diseases

Hlynur D. Hlynsson, Steindór Ellertsson, Jón F. Daðason, Emil L. Sigurdsson, Hrafn Loftsson

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Clinical Text Notes (CTNs) record physicians’ reasoning, written as unstructured free text, as they examine and interview patients. In recent years, several studies have provided evidence for the utility of machine learning in predicting doctors’ diagnoses from CTNs, a task known as ICD coding. Data annotation is time-consuming, particularly when a degree of specialization is needed, as is the case for medical data. This paper presents a method for augmenting a sparsely annotated dataset of Icelandic CTNs with machine-learned data imputation in a semi-supervised manner. We train a neural network on a small set of annotated CTNs and use it to extract clinical features from a set of unannotated CTNs. These clinical features consist of answers to about a thousand potential questions that a physician might answer during a consultation with a patient. The features are then used to train a classifier for the diagnosis of certain types of diseases. We report the results of an evaluation of this data augmentation method over three tiers of information that are available to a physician. Our data augmentation method shows a significant positive effect, which diminishes as more clinical features, from the examination of the patient and from diagnostics, are made available. Our method may be used for augmenting scarce datasets for systems that make decisions based on clinical features that do not include examinations or tests.
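
The three stages described in the abstract (train a feature extractor on a few annotated notes, impute features for unannotated notes, train a diagnosis classifier on the imputed features) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration: the notes, the feature names, the pairing of unannotated notes with ICD-10 codes, the nearest-neighbour extractor (a stand-in for the neural network), and the vote-counting classifier. It is not the authors' implementation.

```python
from collections import Counter, defaultdict


def tokens(note):
    """Lowercase whitespace tokenisation; a stand-in for a real text encoder."""
    return set(note.lower().split())


def train_feature_extractor(annotated):
    """Stage 1: 'train' on the small annotated set.

    Stand-in for the neural network: memorise (token set, features) pairs
    and answer with the features of the most similar note (Jaccard overlap).
    """
    memory = [(tokens(note), feats) for note, feats in annotated]

    def extract(note):
        query = tokens(note)

        def jaccard(item):
            union = query | item[0]
            return len(query & item[0]) / len(union) if union else 0.0

        return max(memory, key=jaccard)[1]

    return extract


def impute(extract, unannotated):
    """Stage 2: fill in clinical features for notes that lack annotations."""
    return [(extract(note), code) for note, code in unannotated]


def train_diagnosis_classifier(rows):
    """Stage 3: each feature votes for the codes it co-occurred with."""
    votes = defaultdict(Counter)
    for feats, code in rows:
        for feat in feats:
            votes[feat][code] += 1

    def predict(feats):
        total = Counter()
        for feat in feats:
            total.update(votes.get(feat, {}))
        return total.most_common(1)[0][0] if total else None

    return predict


# Hypothetical toy data: two notes with hand-annotated clinical features ...
annotated = [
    ("patient reports cough and sore throat", {"cough", "sore_throat"}),
    ("burning pain when urinating", {"dysuria"}),
]
# ... and three notes that carry only a diagnosis code (assumed ICD-10).
unannotated = [
    ("cough for three days", "J06.9"),
    ("sore throat and cough since monday", "J06.9"),
    ("pain when urinating since yesterday", "N39.0"),
]

extract = train_feature_extractor(annotated)
predict = train_diagnosis_classifier(impute(extract, unannotated))
print(predict(extract("frequent urinating with burning")))  # N39.0
```

The sketch only shows the data flow: the classifier never sees hand-annotated features for the larger set, only the imputed ones, which is the sense in which the augmentation is semi-supervised.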

Original language: English
Title of host publication: ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing
Editors: Mourad Abbas, Abed Alhakim Freihat
Publisher: Association for Computational Linguistics (ACL)
Pages: 95-106
Number of pages: 12
ISBN (Electronic): 9781959429364
Publication status: Published - 2022
Event: 5th International Conference on Natural Language and Speech Processing, ICNLSP 2022 - Virtual, Online
Duration: 16 Dec 2022 to 17 Dec 2022

Publication series

Name: ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing

Conference

Conference: 5th International Conference on Natural Language and Speech Processing, ICNLSP 2022
City: Virtual, Online
Period: 16/12/22 to 17/12/22

Bibliographical note

Funding Information:
This work was funded by the Icelandic Strategic Research and Development Programme for Language Technology 2021, grant no. 200106-5301, and supported with Cloud TPUs from Google’s TPU Research Cloud (TRC).

Publisher Copyright:
© ICNLSP 2022. All rights reserved.
