
Format

A data frame with 1,064,804 observations and 10 variables:

contact_id

Character. Unique identifier for each healthcare contact/encounter; serves as a foreign key for linking to other datasets. 1,064,804 unique values. Original field name: KontaktId

patient_id

Integer. Patient pseudonym identifier; serves as a foreign key for linking to other patient-level data. Original field name: Alias

activity_type

Character. Type of healthcare activity or note. 257 unique values. Most common: "Epikris, tvärprofessionell" (18.7%), "Akutkliniken Läk" (13.1%), "Inskrivning Läk" (6.4%). Original field name: AktivitetTyp

diagnosis_type

Character. Type of diagnosis. 7 unique values with main categories: "Huvuddiagnos"/"huvuddiagnos" (70.0%, primary diagnosis), "Bidiagnos" (28.1%, secondary diagnosis), "bidiagnos tillägg ICD10" (1.6%, secondary diagnosis ICD10 supplement), "Diagnos" (0.2%), "tillägg ICD-10" (<0.1%), "Preliminär diagnos" (<0.1%). Original field name: Diagnostyp

care_episode_start

POSIXct. Start date/time of the care episode for the diagnosis. Date range: 2007-01-27 to 2019-06-08. Distribution by year: 2018 (50.5%), 2017 (47.7%), 2019 (1.3%). 69 NA values (<0.1%). Original field name: VårdtillfälleFörDiagnos_StartDatum

care_episode_end

POSIXct. End date/time of the care episode for the diagnosis. Date range: 2007-01-27 to 2020-07-10. 517,571 NA values (48.6%), primarily for outpatient episodes, where this is expected. Original field name: VårdtillfälleFörDiagnos_SlutDatum

care_form

Character. Form of care ("Slutenvård" = Inpatient, "Öppenvård" = Outpatient). Distribution: Inpatient (50.4%), Outpatient (49.6%). Original field name: VårdtillfälleFörDiagnos_VardformText

diagnosis_code

Character. Patient diagnosis code (ICD-10 code). 9,199 unique values across the dataset. Original field name: PatientDiagnos_Kod

diagnosis_description

Character. Description of the diagnosis. 9,036 unique values across the dataset. Contains 1,070 NA values (0.1%). Original field name: PatientDiagnos_Beskrivning

diagnosis_modified_date

POSIXct. Date/time when the diagnosis was recorded/modified. Date range: 2017-01-01 to 2019-06-17. Distribution by year: 2018 (51.1%), 2017 (47.0%), 2019 (1.8%).
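The column descriptions above suggest a few routine derivations. The following is a minimal sketch in base R, assuming a toy data frame `dx` (hypothetical, with invented values) that uses the documented column names. The derived column names `diagnosis_type_std`, `icd10_category`, and `los_days` are illustrative and not part of the dataset.

```r
# Toy rows that mimic the documented columns; the values are invented for
# illustration and are not taken from the real dataset.
dx <- data.frame(
  contact_id = c("c1", "c2", "c3"),
  diagnosis_type = c("Huvuddiagnos", "huvuddiagnos", "Bidiagnos"),
  care_episode_start = as.POSIXct(
    c("2017-03-01 08:00:00", "2018-05-10 09:30:00", "2018-11-20 14:00:00"),
    tz = "UTC"
  ),
  care_episode_end = as.POSIXct(
    c("2017-03-04 08:00:00", NA, "2018-11-21 14:00:00"),
    tz = "UTC"
  ),
  diagnosis_code = c("I21.9", "J18.9", "E11.9"),
  stringsAsFactors = FALSE
)

# Collapse capitalization variants such as "Huvuddiagnos"/"huvuddiagnos".
dx$diagnosis_type_std <- tolower(dx$diagnosis_type)

# Three-character ICD-10 category (e.g. "I21" from "I21.9").
dx$icd10_category <- substr(dx$diagnosis_code, 1, 3)

# Length of stay in days; remains NA where care_episode_end is missing.
dx$los_days <- as.numeric(
  difftime(dx$care_episode_end, dx$care_episode_start, units = "days")
)
```

Because roughly half of the `care_episode_end` values are NA, any summary of a length-of-stay column like this one would need `na.rm = TRUE` or an explicit filter.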

Melior

melior_post_sem_diagnoses_30d()

This dataset contains information about diagnoses recorded within 30 days after healthcare contacts for patients in the SEM cohort from the Melior journal system. The data represents diagnoses linked to healthcare contacts during 2017-2018.

This file was extracted from the Melior electronic health record system. The original filename indicates it contains information about diagnoses (Diagnoser) recorded within 30 days after healthcare contacts (30DagarEfterVårdkontakt) during 2017-2018. The diagnostic codes follow the ICD-10 coding system.

Although the dataset primarily focuses on the 2017-2018 period (with 98.1% of diagnoses from these years), it contains a small number of records with dates outside this range, including some from 2007-2016 (0.6%) and others from 2019 (1.3%).
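Since a small share of records fall outside 2017-2018, an analysis may want to restrict the data to that period. The sketch below is one possible approach in base R, assuming a toy data frame `dx` (hypothetical, invented values) and assuming that `care_episode_start` is the date used to define the period; the dataset documentation does not prescribe which date column to use.

```r
# Toy rows with invented start dates, including two outside 2017-2018.
dx <- data.frame(
  contact_id = c("c1", "c2", "c3", "c4"),
  care_episode_start = as.POSIXct(
    c("2007-01-27 00:00:00", "2017-06-15 10:00:00",
      "2018-12-31 23:00:00", "2019-02-01 08:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Extract the calendar year of each episode start.
start_year <- as.integer(format(dx$care_episode_start, "%Y"))

# Keep rows whose start year is 2017 or 2018; rows with a missing
# start date are dropped as well.
in_period <- !is.na(start_year) & start_year %in% 2017:2018
dx_study <- dx[in_period, ]
```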
  • Several fields from the original dataset have been omitted for efficiency. These fields add file size without contributing significant analytical value:

    • AktivitetTermId: Numeric identifier that almost perfectly corresponded to activity_type (with one exception: 'Bedömning/åtgärd' mapped to multiple IDs)

    • TermId: Numeric identifier that almost perfectly corresponded to diagnosis_type (with one exception: 'Huvuddiagnos' mapped to two different IDs)

  • Dates in the dataset that extend into 2020 represent a small fraction of the data (30 records) and appear to be erroneous future dates

  • The diagnosis_type field contains some inconsistencies in capitalization (e.g., "Huvuddiagnos" vs. "huvuddiagnos") which are standardized

  • The diagnosis_description field has 1,070 missing values (0.1% of records). This is not an issue, because the relevant information is available in diagnosis_code.

  • POSIXct fields are stored in datetime format

  • Original field names are preserved in the documentation for reference
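The note on 2020 dates can be handled by flagging suspect end dates rather than deleting rows. The following is a minimal sketch in base R, assuming a toy data frame `dx` (hypothetical, invented values). The cutoff of 2020-01-01 and the column names `end_date_suspect` and `care_episode_end_clean` are assumptions made for illustration.

```r
# Toy rows with invented end dates: one plausible, one in 2020, one missing.
dx <- data.frame(
  contact_id = c("c1", "c2", "c3"),
  care_episode_end = as.POSIXct(
    c("2018-04-02 12:00:00", "2020-07-10 00:00:00", NA),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Assumed cutoff: treat any end date on or after 2020-01-01 as suspect.
cutoff <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")

# Flag suspect end dates; missing end dates are not flagged.
dx$end_date_suspect <- !is.na(dx$care_episode_end) &
  dx$care_episode_end >= cutoff

# Keep the original column and blank out suspect values in a copy.
dx$care_episode_end_clean <- dx$care_episode_end
dx$care_episode_end_clean[dx$end_date_suspect] <- NA
```

Keeping the flag alongside a cleaned copy leaves the original values available for later review.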
