This dataset contains information about diagnoses recorded during healthcare contacts for patients in the SEM cohort from the Melior journal system. The data represents diagnoses linked to healthcare contacts during 2017-2018.

Usage

melior_sem_diagnoses()

Format

A data frame with 1,466,052 observations and 9 variables:

contact_id

Character. Unique identifier for each healthcare contact/encounter, serves as a foreign key to link with other datasets. Original field name: KontaktId

patient_id

Integer. Patient pseudonym identifier, serves as a foreign key to link with other patient-level data. Original field name: Alias

activity_type

Character. Type of healthcare activity or note. 157 unique values. Most common: "Epikris, tvärprofessionell" (36.7%), "Akutkliniken Läk" (30.8%), "Inskrivning Läk" (8.8%). Original field name: AktivitetTyp

diagnosis_type

Character. Type of diagnosis. 7 unique values with main categories: "Huvuddiagnos"/"huvuddiagnos" (66.4%, primary diagnosis), "Bidiagnos" (31.6%, secondary diagnosis), "bidiagnos tillägg ICD10" (2.1%, secondary diagnosis ICD10 supplement), "Diagnos" (<0.1%), "tillägg ICD-10" (<0.1%), "Preliminär diagnos" (<0.1%). Original field name: Diagnostyp

care_episode_start

POSIXct. Start date/time of the care episode for the diagnosis. Date range: 2013-02-11 to 2020-04-22. Distribution by year: 2017 (50.2%), 2018 (49.5%), 2016 (0.2%). Original field name: VårdtillfälleFörDiagnos_StartDatum

care_episode_end

POSIXct. End date/time of the care episode for the diagnosis. Date range: 2013-06-02 to 2020-10-22. Contains 281 NA values (<0.1%). Distribution by year: 2018 (49.9%), 2017 (49.0%), 2019 (1.1%). Original field name: VårdtillfälleFörDiagnos_SlutDatum

diagnosis_code

Character. Patient diagnosis code (ICD-10 code). 10,786 unique values across the dataset. Original field name: PatientDiagnos_Kod

diagnosis_description

Character. Description of the diagnosis. 10,484 unique values across the dataset. Contains 1,914 NA values (0.1%). Original field name: PatientDiagnos_Beskrivning

diagnosis_modified_date

POSIXct. Date/time when the diagnosis was recorded/modified. Date range: 2013-05-31 to 2020-12-03. Distribution by year: 2018 (49.9%), 2017 (48.5%), 2019 (1.4%). Original field name: PatientDiagnos_ModifieradDatum

Source

Melior

Details

This file was extracted from the Melior electronic health record system. The original filename indicates it contains information about diagnoses (Diagnoser) recorded during healthcare contacts (VidVårdkontakt) during 2017-2018. The diagnostic codes follow the ICD-10 coding system.

Although the dataset nominally covers 2017-2018 (99.7% of records have a care_episode_start in these years), a small number of records fall outside this range: some from 2013-2016 (0.3%) and others from 2019-2020 (<0.1%). The share of later dates is larger for care_episode_end and diagnosis_modified_date, where 2019 accounts for 1.1% and 1.4% of records respectively.
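The year shares quoted above can be recomputed along these lines. This is a minimal sketch on a toy data frame, not the real data: only the column name care_episode_start comes from this page, and the four timestamps are invented for illustration.

```r
# Toy stand-in for the real data; the timestamps are made up.
diagnoses <- data.frame(
  care_episode_start = as.POSIXct(
    c("2016-12-30 10:00:00", "2017-03-01 08:30:00",
      "2017-11-15 14:00:00", "2018-06-20 09:15:00"),
    tz = "UTC"
  )
)

# Extract the calendar year and tabulate the percentage of records per year.
start_year <- format(diagnoses$care_episode_start, "%Y")
year_share <- round(100 * prop.table(table(start_year)), 1)
print(year_share)

# Fraction of records whose care episode started in the 2017-2018 window.
in_window <- start_year %in% c("2017", "2018")
window_share <- mean(in_window)
```

Replacing care_episode_start with care_episode_end or diagnosis_modified_date gives the corresponding distribution for those fields.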

Note

  • Several fields from the original dataset have been omitted for efficiency:

    • AktivitetTermId: Numeric identifier that didn't provide additional clinical information beyond what is captured in activity_type

    • VårdtillfälleFörDiagnos_VardformText (care_form): This field contained a single value ("Slutenvård" = Inpatient) across all records (100%), providing no discriminative information. All records in this dataset are from inpatient care.

  • A few anomalous care_episode_end dates with years 2028-2029 were corrected in the processing script by subtracting 10 years, on the assumption that they are data entry errors

  • The diagnosis_type field contains inconsistencies in capitalization (e.g., "Huvuddiagnos" vs. "huvuddiagnos") which are standardized to lowercase in the processed data

  • The diagnosis_description field has 1,914 missing values (0.1% of records)

  • POSIXct fields (care_episode_start, care_episode_end, diagnosis_modified_date) are stored as date-time values

  • Original field names are preserved in the documentation for reference
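The two cleaning steps described in the notes (lowercasing diagnosis_type and shifting 2028-2029 end dates back by 10 years) might look roughly like the following. This is a sketch under assumptions, not the package's actual processing script: the toy rows are invented, and the use of POSIXlt year arithmetic is one possible approach.

```r
# Toy input resembling the raw data; values are invented for illustration.
raw <- data.frame(
  diagnosis_type = c("Huvuddiagnos", "huvuddiagnos", "Bidiagnos"),
  care_episode_end = as.POSIXct(
    c("2018-05-02 12:00:00", "2028-03-14 09:00:00", NA),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Step 1: standardize capitalization of the diagnosis type.
raw$diagnosis_type <- tolower(raw$diagnosis_type)

# Step 2: shift implausible end years (2028-2029) back by 10 years.
# POSIXlt stores the year as an offset from 1900; NA dates stay untouched
# because NA %in% c(...) evaluates to FALSE.
end_lt <- as.POSIXlt(raw$care_episode_end, tz = "UTC")
bad <- (end_lt$year + 1900) %in% c(2028, 2029)
end_lt$year[bad] <- end_lt$year[bad] - 10
raw$care_episode_end <- as.POSIXct(end_lt)
```

One caveat of this approach: a date of 29 February 2028 has no counterpart in 2018, so it would roll over to 1 March after conversion.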

Examples
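A hedged sketch of typical use: keep only primary diagnoses and link them to another table through contact_id. To stay self-contained it uses a toy data frame with the documented column names instead of calling melior_sem_diagnoses(). The contacts table, its ward column, and all values (including the code formatting) are hypothetical.

```r
# Toy stand-in with the documented column names; all values are made up.
diagnoses <- data.frame(
  contact_id     = c("K1", "K1", "K2"),
  patient_id     = c(101L, 101L, 102L),
  diagnosis_type = c("huvuddiagnos", "bidiagnos", "huvuddiagnos"),
  diagnosis_code = c("I21.9", "E11.9", "J18.9"),
  stringsAsFactors = FALSE
)

# Hypothetical contact-level table sharing the contact_id key.
contacts <- data.frame(
  contact_id = c("K1", "K2"),
  ward       = c("A", "B"),
  stringsAsFactors = FALSE
)

# Keep primary diagnoses only (assumes the lowercase standardization).
primary <- diagnoses[diagnoses$diagnosis_type == "huvuddiagnos", ]

# Left join: every primary diagnosis is kept, with contact details attached.
linked <- merge(primary, contacts, by = "contact_id", all.x = TRUE)
print(linked)
```

The same pattern applies to patient-level data, using patient_id as the key.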