NLP analysis of free text notes to investigate COVID19. V1.0 [COVID-19]

  • Research type

    Research Study

  • Full title

    A database and analytics study of free text clinical notes and structured data to investigate phenotype associations with outcomes in patients with COVID-19.

  • IRAS ID

    284078

  • Contact name

    Afzal Chaudhry

  • Contact email

    anc35@cam.ac.uk

  • Sponsor organisation

    Cambridge University Hospitals NHS Foundation Trust

  • Clinicaltrials.gov Identifier

    N/A

  • Duration of Study in the UK

    1 year, 0 months, 1 day

  • Research summary

    The coronavirus SARS-CoV-2, identified following the first reported cases of pneumonia of unknown cause in Wuhan, China, in December 2019, has caused a global pandemic, with worldwide cases above 2.5 million and deaths over 150 000.

    The pathway for patients admitted to hospital ranges from supportive measures on the ward to deterioration requiring ITU admission. Patients are also at risk of developing complications such as acute kidney injury and blood clots. Identifying the risk factors for these and other outcomes, such as the requirement for ventilation, remains a challenge, and reviewing the clinical data for these patients is critical to understanding the relationship between patient characteristics and outcomes.

    Data are available in structured fields in the electronic health record (EHR); however, these are sometimes incomplete or inaccurate. An assessment of the free text clinical notes provides an opportunity to fill in the gaps and provide a much richer dataset for evaluation. We plan to use Natural Language Processing (NLP), a field of machine learning that allows computers to analyse human language, to review discharge summaries of patients admitted to hospital with COVID-19 and convert free text data into structured data for analysis.

    The NLP techniques developed by Dr Collier’s team include methods for coding free text to SNOMED CT and other biomedical ontologies. These methods, based on statistical machine learning from human-annotated texts, have been benchmarked on scientific texts and social media. In this project we intend to adapt these techniques for patient records.

    The NLP output will be combined with structured data from the EHR and undergo statistical analysis to identify the rates of complications in patients with COVID-19 and the risk factors associated with them. This may help to guide management decisions, allowing earlier intervention to prevent poor outcomes in these patients.

  • REC name

    London - Bloomsbury Research Ethics Committee

  • REC reference

    20/HRA/2347

  • Date of REC Opinion

    8 Jun 2020

  • REC opinion

    Favourable Opinion