Post-SEM Diagnoses Dataset (30 Days After 2017-2018 Contacts)
melior_post_sem_diagnoses_30d.Rd
Format
A data frame with 1,064,804 observations and 10 variables:
- contact_id
Character. Unique identifier for each healthcare contact/encounter; serves as a foreign key to link with other datasets. 1,064,804 unique values. Original field name: KontaktId
- patient_id
Integer. Patient pseudonym identifier; serves as a foreign key to link with other patient-level data. Original field name: Alias
- activity_type
Character. Type of healthcare activity or note. 257 unique values. Most common: "Epikris, tvärprofessionell" (18.7%), "Akutkliniken Läk" (13.1%), "Inskrivning Läk" (6.4%). Original field name: AktivitetTyp
- diagnosis_type
Character. Type of diagnosis. 7 unique values with main categories: "Huvuddiagnos"/"huvuddiagnos" (70.0%, primary diagnosis), "Bidiagnos" (28.1%, secondary diagnosis), "bidiagnos tillägg ICD10" (1.6%, secondary diagnosis ICD10 supplement), "Diagnos" (0.2%), "tillägg ICD-10" (<0.1%), "Preliminär diagnos" (<0.1%). Original field name: Diagnostyp
- care_episode_start
POSIXct. Start date/time of the care episode for the diagnosis. Date range: 2007-01-27 to 2019-06-08. Distribution by year: 2018 (50.5%), 2017 (47.7%), 2019 (1.3%). 69 NA values (<0.1%). Original field name: VårdtillfälleFörDiagnos_StartDatum
- care_episode_end
POSIXct. End date/time of the care episode for the diagnosis. Date range: 2007-01-27 to 2020-07-10. 517,571 NA values (48.6%), primarily for outpatient episodes, where a missing end date is expected. Original field name: VårdtillfälleFörDiagnos_SlutDatum
- care_form
Character. Form of care ("Slutenvård" = Inpatient, "Öppenvård" = Outpatient). Distribution: Inpatient (50.4%), Outpatient (49.6%). Original field name: VårdtillfälleFörDiagnos_VardformText
- diagnosis_code
Character. Patient diagnosis code (ICD-10 code). 9,199 unique values across the dataset. Original field name: PatientDiagnos_Kod
- diagnosis_description
Character. Description of the diagnosis. 9,036 unique values across the dataset. Contains 1,070 NA values (0.1%). Original field name: PatientDiagnos_Beskrivning
- diagnosis_modified_date
POSIXct. Date/time when the diagnosis was recorded/modified. Date range: 2017-01-01 to 2019-06-17. Distribution by year: 2018 (51.1%), 2017 (47.0%), 2019 (1.8%).
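The care_episode_end and care_form descriptions above imply that missing end dates should cluster in outpatient rows, and that length of stay can only be computed for inpatient rows with a recorded end date. The following is a minimal base-R sketch of both checks. The rows are invented toy data that only mirror the documented column names. The sketch assumes that melior_post_sem_diagnoses_30d() returns the data frame described above and that the POSIXct columns are in UTC. Neither assumption is confirmed by this page.

```r
# Hypothetical toy rows mirroring the documented columns; with the real data you
# would start from the data frame returned by melior_post_sem_diagnoses_30d() (assumed).
INPATIENT  <- "Slutenv\u00e5rd"      # "Slutenvård", escaped to keep the script ASCII
OUTPATIENT <- "\u00d6ppenv\u00e5rd"  # "Öppenvård"

diag <- data.frame(
  contact_id = c("c1", "c2", "c3", "c4"),
  care_form  = c(INPATIENT, INPATIENT, OUTPATIENT, OUTPATIENT),
  care_episode_start = as.POSIXct(c("2017-03-01 08:00:00", "2018-05-10 14:30:00",
                                    "2018-06-02 09:00:00", "2017-11-20 10:15:00"),
                                  tz = "UTC"),  # time zone is an assumption
  care_episode_end   = as.POSIXct(c("2017-03-05 12:00:00", "2018-05-11 10:00:00",
                                    NA, NA), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Share of missing end dates within each form of care
na_inpatient  <- mean(is.na(diag$care_episode_end[diag$care_form == INPATIENT]))
na_outpatient <- mean(is.na(diag$care_episode_end[diag$care_form == OUTPATIENT]))

# Length of stay in days, only for inpatient rows that have an end date
has_end   <- diag$care_form == INPATIENT & !is.na(diag$care_episode_end)
inpatient <- diag[has_end, ]
inpatient$los_days <- as.numeric(difftime(inpatient$care_episode_end,
                                          inpatient$care_episode_start,
                                          units = "days"))
```

Passing `units = "days"` to `difftime` fixes the unit explicitly. Without it, R picks the unit automatically, so the numbers would not be comparable across rows.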
Source
Melior
Usage
melior_post_sem_diagnoses_30d()
This dataset contains diagnoses recorded within 30 days after
healthcare contacts for patients in the SEM cohort, extracted from the Melior
electronic health record (journal) system. The diagnoses are linked to healthcare
contacts during 2017-2018.
The original filename reflects this scope: it refers to diagnoses (Diagnoser)
recorded within 30 days after healthcare contacts (30DagarEfterVårdkontakt) during 2017-2018.
The diagnosis codes follow the ICD-10 coding system.
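ICD-10 codes start with a letter followed by two digits, which together form the three-character category (for example, I21 is acute myocardial infarction). Grouping diagnosis_code by that category is therefore a common way to reduce the 9,199 distinct codes. The following is a minimal sketch using invented toy codes. This page does not say whether codes are stored with a dot ("I21.9") or without one ("I219"), so the sketch strips dots and whitespace first. Any value that does not match the letter-plus-two-digits pattern becomes NA.

```r
# Invented toy codes; real values would come from diagnosis_code
codes <- c("I21.9", "I219", "J189", "e119", NA)

# Normalise: remove dots and whitespace, upper-case the leading letter
clean <- toupper(gsub("[.[:space:]]", "", codes))

# Keep only values shaped like ICD-10 (letter + two digits); NA input gives FALSE
valid <- grepl("^[A-Z][0-9]{2}", clean)

# Three-character ICD-10 category, NA when the code does not fit the pattern
category <- ifelse(valid, substr(clean, 1, 3), NA_character_)

# Number of diagnoses per category (NA categories are dropped by table())
category_counts <- table(category)
```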
Although the dataset focuses on the 2017-2018 period (98.1% of diagnoses have a
care_episode_start in these years), it also contains a small number of records dated
outside this range: 0.6% from 2007-2016 and 1.3% from 2019.
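Analyses that should cover only the 2017-2018 contacts can filter on the year of care_episode_start. The following is a minimal base-R sketch on invented toy rows. It assumes the POSIXct column carries the intended time zone (UTC here). If the stored time zone differed from local Swedish time, timestamps near New Year could shift into the neighbouring year.

```r
# Invented toy rows; only the documented column names are taken from this page
diag <- data.frame(
  contact_id = c("a", "b", "c", "d", "e"),
  care_episode_start = as.POSIXct(c("2007-01-27 00:00:00", "2017-06-01 12:00:00",
                                    "2018-12-31 23:00:00", "2019-02-14 08:00:00",
                                    NA), tz = "UTC"),  # time zone is an assumption
  stringsAsFactors = FALSE
)

# Calendar year in the column's own time zone; NA dates stay NA
start_year <- as.integer(format(diag$care_episode_start, "%Y"))

# Distribution by year, keeping NA as its own category
year_share <- prop.table(table(start_year, useNA = "ifany"))

# Core study period; %in% returns FALSE for NA, so undated rows are excluded
core <- diag[start_year %in% c(2017L, 2018L), ]
```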
The following fields from the original dataset have been omitted for efficiency:
- AktivitetTermId: Numeric identifier that corresponded almost one-to-one to activity_type (the one exception: 'Bedömning/åtgärd' mapped to multiple IDs).
- TermId: Numeric identifier that corresponded almost one-to-one to diagnosis_type (the one exception: 'Huvuddiagnos' mapped to two different IDs).
These fields add file size without contributing significant analytical value.
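- Dates extending into 2020 occur only in care_episode_end and affect a small fraction of the data (30 records). They appear to be erroneous future dates, since they are later than the most recent diagnosis_modified_date (2019-06-17).
- The diagnosis_type field contains inconsistent capitalization (e.g., "Huvuddiagnos" vs. "huvuddiagnos"). The two spellings are combined in the 70.0% share reported above but count as separate values among the 7 unique values, so they should be treated as equivalent (e.g., by lower-casing) when grouping.
- The diagnosis_description field has 1,070 missing values (0.1% of records). This is not an issue, because diagnosis_code still carries the relevant information.
- The date/time fields (care_episode_start, care_episode_end, diagnosis_modified_date) are stored as POSIXct.
- Original field names are preserved in the documentation for reference.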
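The two data-quality notes above on capitalization and 2020 end dates can be handled without deleting any rows. The following is a minimal base-R sketch on invented toy rows. It adds two hypothetical columns. The first, diagnosis_type_std, is a lower-cased grouping key. The second, end_date_suspect, flags end dates in 2020 or later rather than overwriting them. Both column names are illustrative and are not part of the dataset.

```r
# Invented toy rows; only the documented column names are taken from this page
diag <- data.frame(
  diagnosis_type   = c("Huvuddiagnos", "huvuddiagnos", "Bidiagnos", "Diagnos"),
  care_episode_end = as.POSIXct(c("2018-03-02 10:00:00", "2020-07-10 00:00:00",
                                  NA, "2017-09-15 16:45:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical case-insensitive grouping key: both spellings collapse to one value
diag$diagnosis_type_std <- tolower(diag$diagnosis_type)

# Hypothetical flag for end dates in 2020 or later; missing end dates stay FALSE
end_year <- as.integer(format(diag$care_episode_end, "%Y"))
diag$end_date_suspect <- !is.na(end_year) & end_year >= 2020L

# Counts per standardized diagnosis type
type_counts <- table(diag$diagnosis_type_std)
```

Flagging keeps the original values available, so you can decide later whether to exclude the suspect rows or set their end dates to NA.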