
This dataset contains information about diagnoses recorded within 5 years prior to healthcare contacts for patients in the SEM cohort from the Melior journal system. The data represents diagnoses linked to healthcare contacts during 2017-2018.

Usage

melior_pre_sem_diagnoses()

Format

A data frame with 23,631,961 observations and 10 variables:

contact_id

Character. Unique identifier for each healthcare contact/encounter. Serves as a foreign key for linking to other datasets. Original field name: KontaktId

patient_id

Integer. Patient pseudonym identifier. Serves as a foreign key for linking to other patient-level data. Original field name: Alias

activity_type

Character. Type of healthcare activity or note. 355 unique values. Most common: "Akutkliniken Läk" (11.0%), "Epikris" (10.2%), "Epikris, tvärprofessionell" (9.1%). Original field name: AktivitetTyp

diagnosis_type

Character. Type of diagnosis. 8 unique values. Most common: "Huvuddiagnos" (60.5%), "Bidiagnos" (31.9%), "huvuddiagnos" (5.9%), "bidiagnos tillägg ICD10" (1.4%). Original field name: Diagnostyp

care_episode_start

POSIXct. Start date/time of the care episode for the diagnosis. Date range: 1971-01-01 to 2020-06-05. Distribution by year: 2017 (26.3%), 2016 (20.0%), 2015 (16.7%), 2018 (15.6%), 2014 (14.5%), 2013 (6.9%). 235 NA values (<0.01%). Original field name: VårdtillfälleFörDiagnos_StartDatum

care_episode_end

POSIXct. End date/time of the care episode for the diagnosis. Date range: 2007-01-27 to 2020-09-21. Distribution by year: 2017 (27.4%), 2016 (19.5%), 2018 (16.8%), 2015 (16.0%), 2014 (13.6%), 2013 (6.5%). 12,393,648 NA values (52.4%), primarily for outpatient episodes where end dates are not typically recorded. Original field name: VårdtillfälleFörDiagnos_SlutDatum

care_form

Character. Form of care. 2 unique values: "Öppenvård" (Outpatient, 54.6%), "Slutenvård" (Inpatient, 45.4%). Original field name: VårdtillfälleFörDiagnos_VardformText

diagnosis_code

Character. Patient diagnosis code (ICD-10 code). 15,453 unique values across the dataset. Original field name: PatientDiagnos_Kod

diagnosis_description

Character. Description of the diagnosis. 14,286 unique values across the dataset. 47,263 NA values (0.2%). Original field name: PatientDiagnos_Beskrivning

diagnosis_modified_date

POSIXct. Date/time when the diagnosis was recorded/modified. Date range: 2013-04-22 to 2018-12-31. Distribution by year: 2017 (26.6%), 2016 (20.0%), 2015 (16.6%), 2018 (16.1%), 2014 (14.4%), 2013 (6.2%). Original field name: PatientDiagnos_ModifieradDatum

Source

Melior

Details

This file was extracted from the Melior electronic health record system. The original filename indicates it contains information about diagnoses (Diagnoser) recorded within 5 years prior to healthcare contacts (5ÅrFöreVårdkontakt) during 2017-2018. The diagnostic codes follow the ICD-10 coding system.

This dataset provides a comprehensive view of patients' diagnostic history within 5 years before their inclusion in the SEM cohort, capturing both inpatient and outpatient diagnoses. The large number of records (23.6 million) reflects the substantial healthcare utilization of this patient population in the years preceding their SEM cohort contact.
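Because the codes follow ICD-10, records can be grouped by the three-character ICD-10 category. The following is a minimal sketch on a toy data frame that reuses the documented column names; the rows and the `icd10_category` column are hypothetical, and the sketch assumes each code starts with its three-character category (with or without a decimal separator afterwards).

```r
# Toy data with the documented column names; the values are illustrative only.
diagnoses <- data.frame(
  patient_id = c(1L, 1L, 2L, 3L),
  diagnosis_code = c("I109", "E119", "I109", "J189"),
  stringsAsFactors = FALSE
)

# An ICD-10 category is the first three characters of the code
# (one letter followed by two further characters).
diagnoses$icd10_category <- substr(diagnoses$diagnosis_code, 1, 3)

# Number of diagnosis records per category, most frequent first.
category_counts <- sort(table(diagnoses$icd10_category), decreasing = TRUE)
print(category_counts)
```

On the full dataset, the same `substr()` call reduces the 15,453 distinct codes to a much smaller set of categories, which is usually easier to tabulate.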

Note

  • Several fields from the original dataset have been omitted because they add file size without contributing significant analytical value:

    • AktivitetTermId: Numeric identifier that almost perfectly corresponded to activity_type

    • TermId: Numeric identifier that almost perfectly corresponded to diagnosis_type

  • The care_form field shows that 54.6% of diagnoses were from outpatient care and 45.4% from inpatient care

  • The care_episode_end field has a high proportion of missing values (52.4%), which is expected for outpatient episodes that typically don't have formal end dates

  • The diagnosis_type field contains inconsistencies in capitalization (e.g., "Huvuddiagnos" vs. "huvuddiagnos") which are standardized to lowercase in the processed data

  • There are some anomalous date values outside the expected range in care_episode_start (e.g., one record from 1971), which appear to be data entry errors

  • Most observations fall within 2013-2018, consistent with a 5-year lookback from contacts in 2017-2018, but a small number of records have dates outside this range

  • Date/time fields (care_episode_start, care_episode_end, diagnosis_modified_date) are stored as POSIXct

  • Original field names are preserved in the documentation for reference

  • Care episode durations in days can be calculated during analysis from care_episode_start and care_episode_end

  • Standard translations of care_form values are:

    • "Slutenvård" = "Inpatient"

    • "Öppenvård" = "Outpatient"
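Several of the notes above translate directly into derived columns. The sketch below uses a toy data frame with the documented column names. The rows, the derived column names (`care_form_en`, `episode_days`, `start_out_of_range`), and the 2012-01-01 cut-off are assumptions made for illustration, not part of the dataset.

```r
# Toy data with the documented column names; the values are illustrative only.
diagnoses <- data.frame(
  contact_id = c("A1", "A2", "A3"),
  care_form = c("Slutenvård", "Öppenvård", "Slutenvård"),
  care_episode_start = as.POSIXct(
    c("2017-03-01 08:00:00", "2016-11-15 10:30:00", "1971-01-01 00:00:00"),
    tz = "UTC"
  ),
  care_episode_end = as.POSIXct(
    c("2017-03-04 14:00:00", NA, "2017-05-02 09:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Translate care_form with a named lookup vector.
care_form_lookup <- c("Slutenvård" = "Inpatient", "Öppenvård" = "Outpatient")
diagnoses$care_form_en <- unname(care_form_lookup[diagnoses$care_form])

# Episode duration in days; NA wherever care_episode_end is missing.
diagnoses$episode_days <- as.numeric(difftime(
  diagnoses$care_episode_end,
  diagnoses$care_episode_start,
  units = "days"
))

# Flag start dates before an assumed cut-off (hypothetical threshold).
window_start <- as.POSIXct("2012-01-01", tz = "UTC")
diagnoses$start_out_of_range <- !is.na(diagnoses$care_episode_start) &
  diagnoses$care_episode_start < window_start

print(diagnoses[, c("contact_id", "care_form_en", "episode_days",
                    "start_out_of_range")])
```

Durations for flagged rows are best excluded or inspected manually, since an anomalous start date produces an implausibly long episode.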

Examples
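A minimal sketch of a typical summary, counting main diagnoses per patient. It runs on a toy data frame with the documented column names so that it is self-contained; in real use the toy frame would presumably be replaced by the result of `melior_pre_sem_diagnoses()`, assuming that call returns the data frame described under Format. The rows and the `n_main_diagnoses` column are hypothetical.

```r
# Toy data with the documented column names; the values are illustrative only.
diagnoses <- data.frame(
  patient_id = c(101L, 101L, 102L, 102L, 103L),
  diagnosis_type = c("Huvuddiagnos", "Bidiagnos", "huvuddiagnos",
                     "Huvuddiagnos", "Bidiagnos"),
  diagnosis_code = c("I109", "E119", "J189", "I109", "E119"),
  stringsAsFactors = FALSE
)

# Normalise capitalisation defensively in case mixed-case values are present.
diagnoses$diagnosis_type <- tolower(diagnoses$diagnosis_type)

# Keep only main diagnoses.
main <- diagnoses[diagnoses$diagnosis_type == "huvuddiagnos", ]

# Count main-diagnosis records per patient.
main_per_patient <- aggregate(diagnosis_code ~ patient_id, data = main,
                              FUN = length)
names(main_per_patient)[2] <- "n_main_diagnoses"
print(main_per_patient)
```

Patients with no main diagnosis (patient 103 in the toy data) do not appear in the result; merging back onto the full list of patient_id values would be needed to report them with a count of zero.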