SEM Emergency Department Diagnoses Dataset (2017-2018)
melior_sem_ed_diagnoses.Rd
This dataset contains diagnoses recorded at the emergency department during healthcare contacts for patients in the SEM cohort, extracted from the Melior journal system. The data cover diagnoses linked to emergency department visits during 2017-2018.
Format
A data frame with 551,657 observations and 6 variables:
- contact_id
Character. Unique identifier for each healthcare contact/encounter, serves as a foreign key to link with other datasets. Original field name: KontaktId
- patient_id
Integer. Patient pseudonym identifier, serves as a foreign key to link with other patient-level data. Original field name: Alias
- diagnosis_type
Character. Type of diagnosis. Values: "Huvuddiagnos"/"huvuddiagnos" (92.3%, primary diagnosis; both capitalizations occur in the data), "Bidiagnos" (6.3%, secondary diagnosis), "bidiagnos tillägg ICD10" (1.4%, secondary diagnosis ICD10 supplement). Original field name: Diagnostyp
- diagnosis_code
Character. Patient diagnosis code (ICD-10 code). 7,730 unique values across the dataset. Original field name: PatientDiagnos_Kod
- diagnosis_description
Character. Description of the diagnosis. 7,598 unique values across the dataset. Contains 285 NA values (0.1%). Original field name: PatientDiagnos_Beskrivning
- diagnosis_modified_date
POSIXct. Date/time when the diagnosis was recorded/modified. Date range: 2013-06-12 to 2020-11-27, with 99.1% of entries from 2017-2018. Original field name: PatientDiagnos_ModifieradDatum
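Because diagnosis_type occurs with two capitalizations of the primary diagnosis label, it may help to normalize the case before counting or filtering. The following is a minimal sketch on toy rows, not part of the package; the columns diagnosis_type_norm and is_primary are hypothetical names introduced only for illustration.

```r
# Toy rows standing in for the real data; the type labels are the ones documented above.
ed <- data.frame(
  contact_id = c("K1", "K1", "K2", "K3"),
  diagnosis_type = c("Huvuddiagnos", "Bidiagnos", "huvuddiagnos",
                     "bidiagnos tillägg ICD10"),
  stringsAsFactors = FALSE
)

# Lower-case the label so both spellings of the primary diagnosis collapse into one.
# diagnosis_type_norm and is_primary are illustrative column names (assumptions).
ed$diagnosis_type_norm <- tolower(ed$diagnosis_type)
ed$is_primary <- ed$diagnosis_type_norm == "huvuddiagnos"

# Tabulate the normalized labels.
table(ed$diagnosis_type_norm)
```

Keeping the original diagnosis_type column alongside the normalized one leaves the raw values available for auditing.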
Details
This file was extracted from the Melior electronic health record system. The original filename indicates it contains information about preliminary assessment diagnoses (PreliminärBedömningDiagnos) recorded at the emergency department (PåAkuten) during healthcare contacts (VidVårdkontakt) during 2017-2018. The diagnostic codes follow the ICD-10 coding system.
These diagnoses represent the initial assessment made at the emergency department, which may differ from the final diagnoses recorded after a complete evaluation. The dataset includes 551,657 diagnosis entries across emergency department contacts.
Note
Two fields from the original dataset have been omitted for efficiency and clarity:
- AktivitetTyp (activity_type): This field contained two values ("Akutkliniken Läk", 82%, and "Akutmottagning Läk", 18%) that reflect only organizational differences between hospitals in how emergency department visits are recorded and carry no clinical significance.
- VårdtillfälleFörDiagnos_VardformText (care_form): This field contained a single value ("Slutenvård" = Inpatient) across all records, providing no discriminative information.
Despite the dataset's 2017-2018 focus, 0.9% of records have modification dates outside this range, some dating back to 2013 and others as late as 2020.
The dataset does not include the care episode start and end dates that are present in some other diagnosis datasets; these can be retrieved by linking to the care event through contact_id.
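The last two notes can be illustrated together: linking to a contact-level table through contact_id to recover episode dates, and flagging modification dates outside 2017-2018. This is a sketch under stated assumptions only. The contacts table, its columns contact_start and contact_end, the UTC time zone, and the flag modified_in_window are all hypothetical and are not taken from the SEM data.

```r
# Toy diagnosis rows; real data would come from the dataset documented here.
ed <- data.frame(
  contact_id = c("K1", "K2", "K3"),
  diagnosis_modified_date = as.POSIXct(
    c("2017-03-01 08:30:00", "2018-11-20 22:05:00", "2020-11-27 14:15:00"),
    tz = "UTC"  # time zone is an assumption
  ),
  stringsAsFactors = FALSE
)

# Hypothetical contact-level table holding the episode dates (assumed names).
contacts <- data.frame(
  contact_id = c("K1", "K2", "K3"),
  contact_start = as.POSIXct(
    c("2017-03-01 07:10:00", "2018-11-20 19:40:00", "2018-12-30 23:00:00"),
    tz = "UTC"
  ),
  contact_end = as.POSIXct(
    c("2017-03-01 12:00:00", "2018-11-21 03:15:00", "2018-12-31 06:30:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Left join on contact_id so every diagnosis row is kept.
ed_linked <- merge(ed, contacts, by = "contact_id", all.x = TRUE)

# Flag, rather than drop, rows modified outside the 2017-2018 window.
window_start <- as.POSIXct("2017-01-01", tz = "UTC")
window_end <- as.POSIXct("2019-01-01", tz = "UTC")
ed_linked$modified_in_window <-
  ed_linked$diagnosis_modified_date >= window_start &
  ed_linked$diagnosis_modified_date < window_end

ed_linked
```

Flagging instead of filtering is a deliberate choice here: a late modification date does not by itself imply that the underlying visit fell outside the study period.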
Examples
# Load the raw data
library(readr)
library(here)
#> here() starts at /Users/an1583jo/Documents/Forsk/SEM
ed_diagnoses_file <- "MELIOR_PreliminärBedömningDiagnosPåAkutenVidVårdkontakt_2017_2018.csv"
data <- read_delim(here("data", "raw", ed_diagnoses_file),
                   delim = "|",
                   locale = locale(encoding = "ISO-8859-1"))
#> Rows: 551657 Columns: 8
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "|"
#> chr (6): KontaktId, AktivitetTyp, Diagnostyp, VårdtillfälleFörDiagnos_Vardf...
#> dbl (1): Alias
#> dttm (1): PatientDiagnos_ModifieradDatum
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
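The raw file has 8 columns, while the documented data frame has 6 renamed variables. One way to get from one to the other, sketched here on a toy data frame in base R, is to select and rename using the original field names listed under Format. This is an illustration and not necessarily the processing actually used to build the dataset; the objects raw and name_map, and all values in the toy rows, are hypothetical.

```r
# Toy stand-in for the raw file, using the original field names documented above.
# All values are made up for illustration.
raw <- data.frame(
  KontaktId = c("K1", "K2"),
  Alias = c(101, 102),
  AktivitetTyp = c("Akutkliniken Läk", "Akutmottagning Läk"),
  Diagnostyp = c("Huvuddiagnos", "Bidiagnos"),
  "VårdtillfälleFörDiagnos_VardformText" = c("Slutenvård", "Slutenvård"),
  PatientDiagnos_Kod = c("X00", "X01"),
  PatientDiagnos_Beskrivning = c("(toy description)", NA),
  PatientDiagnos_ModifieradDatum = as.POSIXct(
    c("2017-03-01 08:30:00", "2018-11-20 22:05:00"), tz = "UTC"
  ),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# New name = original field name, as listed in the Format section.
name_map <- c(
  contact_id = "KontaktId",
  patient_id = "Alias",
  diagnosis_type = "Diagnostyp",
  diagnosis_code = "PatientDiagnos_Kod",
  diagnosis_description = "PatientDiagnos_Beskrivning",
  diagnosis_modified_date = "PatientDiagnos_ModifieradDatum"
)

# Selecting by the mapping drops the two omitted fields implicitly.
ed <- raw[, unname(name_map), drop = FALSE]
names(ed) <- names(name_map)

# The raw file parses Alias as double; the documented type is integer.
ed$patient_id <- as.integer(ed$patient_id)

str(ed)
```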