Abstract
Clinical Text Notes (CTNs) contain physicians’ reasoning process, written in an unstructured free-text format, as they examine and interview patients. In recent years, several studies have provided evidence for the utility of machine learning in predicting doctors’ diagnoses from CTNs, a task known as ICD coding. Data annotation is time-consuming, particularly when a degree of specialization is needed, as is the case for medical data. This paper presents a method of augmenting a sparsely annotated dataset of Icelandic CTNs with machine-learned data imputation in a semi-supervised manner. We train a neural network on a small set of annotated CTNs and use it to extract clinical features from a set of unannotated CTNs. These clinical features consist of answers to about a thousand potential questions that a physician might answer during a consultation with a patient. The features are then used to train a classifier for the diagnosis of certain types of diseases. We report the results of an evaluation of this data augmentation method over three tiers of information that are available to a physician. Our data augmentation method shows a significant positive effect, which diminishes as more clinical features, from the examination of the patient and from diagnostics, are made available. Our method may be used for augmenting scarce datasets for systems that make decisions based on clinical features that do not include examinations or tests.
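The pipeline described in the abstract (train a feature extractor on a few annotated notes, impute features for unannotated notes, then train a diagnosis classifier on the imputed features) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy notes, the feature names `cough` and `fever`, the diagnosis labels, the cue-word extractor standing in for the paper's neural network, and the 1-nearest-neighbour classifier. It also assumes that the unannotated notes carry a diagnosis label but no clinical-feature annotations. It is not the authors' implementation.

```python
from collections import defaultdict

def train_extractor(annotated):
    """Stand-in for the neural network: for each feature, keep the tokens
    that occur in positive notes but never in negative ones."""
    pos = defaultdict(set)  # feature -> tokens seen in notes where it is 1
    neg = defaultdict(set)  # feature -> tokens seen in notes where it is 0
    for text, feats in annotated:
        tokens = set(text.lower().split())
        for name, value in feats.items():
            target = pos if value else neg
            target[name] |= tokens
    return {name: pos[name] - neg[name] for name in pos}

def extract(extractor, text):
    """Impute a 0/1 answer for every known feature from free text."""
    tokens = set(text.lower().split())
    return {name: int(bool(tokens & cues)) for name, cues in extractor.items()}

def predict(classifier, feats):
    """1-nearest neighbour over feature dicts, using Hamming distance."""
    def distance(other):
        keys = set(feats) | set(other)
        return sum(feats.get(k, 0) != other.get(k, 0) for k in keys)
    return min(classifier, key=lambda row: distance(row[0]))[1]

# Hypothetical small annotated set: (note, clinical features).
annotated = [
    ("cough and fever", {"cough": 1, "fever": 1}),
    ("cough", {"cough": 1, "fever": 0}),
    ("fever", {"cough": 0, "fever": 1}),
    ("headache and nausea", {"cough": 0, "fever": 0}),
]

# Hypothetical notes without feature annotations: (note, diagnosis).
unannotated = [
    ("fever and cough for three days", "respiratory infection"),
    ("headache and nausea", "other"),
]

extractor = train_extractor(annotated)

# Imputation step: machine-extracted features replace manual annotation.
classifier = [(extract(extractor, text), dx) for text, dx in unannotated]

print(predict(classifier, extract(extractor, "cough and fever")))
```

The structural point is that the diagnosis classifier never sees manually annotated features for the larger set; it is trained only on what the extractor imputes, so the quality of the first model bounds the quality of the second.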
Original language | English |
---|---|
Title of host publication | ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing |
Editors | Mourad Abbas, Abed Alhakim Freihat |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 95-106 |
Number of pages | 12 |
ISBN (Electronic) | 9781959429364 |
Publication status | Published - 2022 |
Event | 5th International Conference on Natural Language and Speech Processing, ICNLSP 2022 - Virtual, Online; Duration: 16 Dec 2022 → 17 Dec 2022 |
Publication series
Name | ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing |
---|---|
Conference
Conference | 5th International Conference on Natural Language and Speech Processing, ICNLSP 2022 |
---|---|
City | Virtual, Online |
Period | 16/12/22 → 17/12/22 |
Bibliographical note
Funding Information: This work was funded by the Icelandic Strategic Research and Development Programme for Language Technology 2021, grant no. 200106-5301, and with Cloud TPUs from Google’s TPU Research Cloud (TRC).
Publisher Copyright:
© ICNLSP 2022. All rights reserved.