Post-SEM Patient Diagnoses Dataset (After Discharge through Dec 31, 2019)
melior_post_sem_diagnoses_all.Rd
This dataset contains diagnoses recorded after discharge for patients in the SEM cohort, extracted from the Melior journal system. It covers diagnoses dated up to December 31, 2019.
Format
A data frame with 11,681,602 observations and 8 variables:
- contact_id
Character. Unique identifier for each healthcare contact/encounter, serves as a foreign key to link with other datasets. Original field name: KontaktId
- patient_id
Integer. Patient pseudonym identifier, serves as a foreign key to link with other patient-level data. Original field name: Alias
- activity_type
Character. Type of healthcare activity or note. 287 unique values. Most common: "Epikris, tvärprofessionell" (19.7%), "Akutkliniken Läk" (10.3%), "Mott Ögon Läk" (5.2%). Original field name: AktivitetTyp
- diagnosis_type
Character. Type of diagnosis. 8 unique values with main categories: "Huvuddiagnos"/"huvuddiagnos" (65.5%, primary diagnosis), "Bidiagnos" (32.7%, secondary diagnosis), "bidiagnos tillägg ICD10" (1.5%, secondary diagnosis ICD10 supplement). Original field name: Diagnostyp
- diagnosis_code
Character. Patient diagnosis code (ICD-10 code). 12,902 unique values across the dataset. Original field name: PatientDiagnos_Kod
- diagnosis_description
Character. Description of the diagnosis. 12,421 unique values across the dataset. Contains 10,386 NA values (0.1%). Original field name: PatientDiagnos_Beskrivning
- diagnosis_modified_date
POSIXct. Date/time when the diagnosis was recorded/modified. Date range: 2017-01-01 to 2019-12-31. Distribution by year: 2019 (42.1%), 2018 (40.7%), 2017 (17.2%). Original field name: PatientDiagnos_ModifieradDatum
- care_form
Character. Form of care ("Slutenvård" = Inpatient, "Öppenvård" = Outpatient). Distribution: Outpatient (54.8%), Inpatient (45.2%). Original field name: VårdtillfälleFörDiagnos_VardformText
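Because diagnosis_type appears with both capitalised and lower-case spellings ("Huvuddiagnos"/"huvuddiagnos"), it may help to fold case before filtering on it. The following is a minimal sketch in base R, not the package's own method: the toy data frame, its values, and the helper column diagnosis_type_norm are invented for illustration and only mimic a few of the columns described above.

```r
# Toy stand-in for the dataset; all values here are invented for illustration.
diagnoses <- data.frame(
  contact_id     = c("K1", "K1", "K2", "K3"),
  patient_id     = c(101L, 101L, 102L, 103L),
  diagnosis_type = c("Huvuddiagnos", "Bidiagnos", "huvuddiagnos", "Bidiagnos"),
  diagnosis_code = c("I10", "E11", "J18", "I10"),
  stringsAsFactors = FALSE
)

# Fold case so that "Huvuddiagnos" and "huvuddiagnos" count as one category.
# diagnosis_type_norm is a hypothetical helper column, not part of the dataset.
diagnoses$diagnosis_type_norm <- tolower(diagnoses$diagnosis_type)

# Keep only primary diagnoses, regardless of the original capitalisation.
primary <- diagnoses[diagnoses$diagnosis_type_norm == "huvuddiagnos", ]

# Count primary diagnoses per patient.
primary_per_patient <- table(primary$patient_id)
```

The same pattern should carry over to the full data frame, assuming it is loaded under a name of your choosing and that the column names match those listed above.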
Details
This file was extracted from the Melior electronic health record system. The original filename indicates that it contains patient diagnoses (PatientDiagnoser) recorded after discharge (Efter_Vardkontakt), with discharge dates (UtskrivningDatum) up to December 31, 2019 (Till_20191231). The diagnosis codes follow the ICD-10 coding system.
Note
Unlike some other diagnosis datasets, this dataset does not include care episode start and end dates.
The care_form field contains two values: "Öppenvård" (Outpatient, 54.8%) and "Slutenvård" (Inpatient, 45.2%).
The diagnosis_description field has 10,386 missing values (0.1% of the dataset). Because diagnosis_type has no missing values, this is unlikely to cause problems.
Several fields from the original dataset have been omitted for efficiency:
- AktivitetTermId: Numeric identifier that in all but one case corresponded to activity_type
- TermId: Numeric identifier that in all but one case corresponded to diagnosis_type
This dataset represents diagnoses recorded after the initial SEM cohort contact/discharge and follows them through the end of 2019.
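The note on missing descriptions can be checked directly. The sketch below, in base R, counts NA values in diagnosis_description and tests whether the affected rows still carry a diagnosis_code. That second check is an assumption about what makes the missing descriptions harmless; it is not stated in the source. The toy data frame and its descriptions are invented for illustration.

```r
# Toy stand-in for the dataset; codes and descriptions are invented examples.
diagnoses <- data.frame(
  diagnosis_code        = c("I10", "E11", "J18"),
  diagnosis_description = c("Example description A", NA, "Example description B"),
  diagnosis_type        = c("Huvuddiagnos", "Bidiagnos", "Huvuddiagnos"),
  stringsAsFactors = FALSE
)

# Flag rows where the description is missing.
missing_desc <- is.na(diagnoses$diagnosis_description)

# Number and share of rows with a missing description.
n_missing     <- sum(missing_desc)
share_missing <- mean(missing_desc)

# Assumed sanity check: rows without a description still have a code,
# so the diagnosis can be identified from diagnosis_code alone.
codes_present <- all(!is.na(diagnoses$diagnosis_code[missing_desc]))
```

On the full dataset, n_missing would be expected to match the 10,386 reported above if the columns are loaded unchanged.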