SEM Laboratory Test Results Dataset (Within 24 Hours of Arrival)
melior_sem_lab_tests_24h.Rd
Laboratory test results obtained within 24 hours of patient arrival for patients in the SEM cohort, extracted from the Melior electronic health record (journal) system.
Format
A data frame with 10,961,795 observations and 12 variables:
- contact_id
Character. Unique identifier for each healthcare contact/encounter, serves as a foreign key to link with other datasets. Original field name: KontaktId
- patient_id
Integer. Patient pseudonym identifier, serves as a foreign key to link with other patient-level data. Original field name: Alias
- term_name
Character. Term name for the type of laboratory analysis. Constant: every record has the value "Kemlab analys". Original field name: Term_Namn
- lab_test_name
Character. Abbreviated name of the laboratory test. 732 unique values across the dataset. Most common tests: P-Kalium (5.2%), P-Kreatini (4.8%), B-Hemoglob (4.8%), P-Natrium (4.8%), P-Glukos (4.2%). Original field name: Labanalys_Namn
- lab_test_description
Character. Full description of the laboratory test. 822 unique values across the dataset. Most common descriptions: P-Kalium (5.2%), P-Kreatinin (enz) (4.8%), B-Hemoglobin (Hb) (4.8%), P-Natrium (4.8%), P-Glukos (4.2%). Original field name: Labanalys_Beskrivning
- test_result_value
Character. Value of the laboratory test result. 14,022 unique values across the dataset. Stored as character since it may contain non-numeric values such as "positive" or "<3". Original field name: Analyssvar_Varde
- test_result_unit
Character. Unit of measurement for the test result. 80 unique values across the dataset. 1,189,333 NA values (10.8% of records), primarily for qualitative tests. Most common units: mmol/L (38.7%), % (8.8%), kPa (7.8%), µmol/L (6.3%), µkat/L (5.5%). Original field name: Analyssvar_Enhet
- test_result_reference_min
Numeric. Lower bound of the normal reference range for the test. 3,451,693 NA values (31.5% of records). Range: -3 to 1000, median 4. Original field name: Analyssvar_ReferensvardeMin
- test_result_reference_max
Numeric. Upper bound of the normal reference range for the test. 2,172,705 NA values (19.8% of records). Range: 0 to 3000, median 6.3. Original field name: Analyssvar_ReferensvardeMax
- test_sampling_date
POSIXct. Date and time when the sample was taken. Date range: 2017-01-01 00:20:00 to 2018-12-31 22:10:00. Distribution by year: 2017 (50.2%), 2018 (49.8%). Original field name: Analyssvar_ProvtagningDatum
- sample_type
Character. Type of sample used for the test (e.g., "Blood", "Urine", "Plasma"). Derived from lab_test_name, including handling of "Avd" prefix cases.
- pna_test
Logical. Indicates whether the test was performed at the point of care (PNA, from the Swedish "patientnära analys") rather than at the central laboratory. Identified by an "Avd" prefix in lab_test_name.
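The derivation of sample_type and pna_test from lab_test_name can be illustrated with a minimal sketch. It is not the code used to build the dataset: the function name classify_lab_test is hypothetical, the prefix table is copied from the prefixes listed in this documentation, and the exact form of the "Avd" prefix (here assumed to sit at the start of the name, optionally followed by a space, hyphen, or underscore) is an assumption.

```python
import re

# Prefix-to-sample-type table, copied from the prefixes listed in this documentation.
SAMPLE_PREFIXES = {
    "P": "Plasma",
    "B": "Blood",
    "S": "Serum",
    "U": "Urine",
    "Csv": "Cerebrospinal fluid",
    "aB": "Arterial blood",
    "vB": "Venous blood",
    "kB": "Capillary blood",
    "Ledv": "Joint fluid",
    "F": "Feces",
    "tU": "Timed urine collection",
}

# ASSUMPTION: "Avd" appears at the very start of the name, optionally followed
# by spaces, hyphens, or underscores. The real data may use a different form.
AVD_PATTERN = re.compile(r"^Avd[\s\-_]*")


def classify_lab_test(lab_test_name):
    """Return (sample_type, pna_test) for one lab_test_name.

    sample_type is None when the name has no recognised prefix.
    """
    pna_test = AVD_PATTERN.match(lab_test_name) is not None
    # Drop the "Avd" marker (if any) before looking at the sample prefix.
    remainder = AVD_PATTERN.sub("", lab_test_name, count=1)
    # The sample prefix is whatever precedes the first hyphen.
    prefix, sep, _ = remainder.partition("-")
    sample_type = SAMPLE_PREFIXES.get(prefix) if sep else None
    return sample_type, pna_test
```

Under these assumptions, classify_lab_test("P-Kalium") returns ("Plasma", False) and classify_lab_test("Avd P-Glukos") returns ("Plasma", True). The lookup is case-sensitive on purpose, so that "aB-" (arterial blood) and "B-" (blood) stay distinct.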
Details
This file was extracted from the Melior electronic health record system. The original filename indicates that it contains laboratory analyses (Labanalyser) performed within 24 hours of patient arrival (Inom24TimmarFrånAnkomst). It holds 10,961,795 laboratory test results for the SEM cohort patients, covering a wide range of clinical laboratory tests.
The sampling timestamps run from 2017-01-01 00:20 to 2018-12-31 22:10, covering the calendar years 2017 and 2018 almost in full. Records are split nearly evenly between the two years (50.2% in 2017, 49.8% in 2018), which is consistent with stable data collection across the study period.
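The per-year split quoted above is a simple share of records by sampling year. The sketch below shows one way to reproduce such a check on a list of timestamps; the function name year_shares is hypothetical, and the input is assumed to be Python datetime values converted from test_sampling_date.

```python
from collections import Counter
from datetime import datetime


def year_shares(timestamps):
    """Return {year: percentage of records}, rounded to one decimal place."""
    counts = Counter(ts.year for ts in timestamps)
    total = sum(counts.values())
    return {year: round(100 * n / total, 1) for year, n in sorted(counts.items())}


# Tiny illustrative input, not real data.
example = [
    datetime(2017, 1, 1, 0, 20),
    datetime(2017, 6, 1, 12, 0),
    datetime(2018, 1, 1, 8, 0),
    datetime(2018, 12, 31, 22, 10),
]
shares = year_shares(example)
```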
Note
- test_result_value is stored as character since it may contain non-numeric values such as "positive" or "<3".
- test_result_unit may contain NA values for qualitative tests that don't have units (10.8% of records).
- test_result_reference_min and test_result_reference_max may contain NA values when reference ranges are not applicable or not established (31.5% and 19.8% of records, respectively).
- Several original columns have been removed due to unclear meaning or constant values:
  - lab_test_type (Labanalys_AnalysTyp): Almost entirely "A", with only 3 "B" values for test "CYLHYAL"
  - lab_test_order_type (Labanalys_BestallningTyp): Constant value of 7 for all records
  - test_result_attribute (Analyssvar_Attribut): Values 0 (73.1%), 3 (26.9%), with trace amounts of 1 and 2
  - test_result_reference_type (Analyssvar_Referensvarde): Values 3 (92.1%), 0 (7.9%), with trace amounts of 1 and 2
- Two derived columns have been added to facilitate analysis:
  - sample_type: Identifies the biological sample type or test category based on the lab_test_name patterns
  - pna_test: Flags whether the test was performed at point-of-care (identified by "Avd" prefix)
- lab_test_description provides a more complete version of lab_test_name, often with the full name of the test rather than the abbreviated form.
- Common sample type prefixes include:
  - P- (Plasma), B- (Blood), S- (Serum), U- (Urine), Csv- (Cerebrospinal fluid)
  - aB- (Arterial blood), vB- (Venous blood), kB- (Capillary blood)
  - Ledv- (Joint fluid), F- (Feces), tU- (Timed urine collection)
  - These can also appear with an "Avd" prefix for point-of-care tests
- The most common laboratory tests reflect typical practice in acute care settings: electrolytes (Potassium, Sodium), renal function (Creatinine), blood counts (Hemoglobin), and metabolic parameters (Glucose).
- For proper analysis, test_result_value should be processed according to its specific test type and converted to numeric values when applicable.
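Converting test_result_value to numbers could look like the following minimal sketch. It is one possible approach, not a prescribed method. The function names parse_result and flag_against_reference are hypothetical. The sketch assumes that censored results are written as a comparison operator followed by a number (as in "<3"), that a comma may be used as the decimal mark, that reference bounds are inclusive, and that a missing bound means "unbounded on that side". None of these conventions is confirmed by this documentation, so they should be checked against the data for each test type.

```python
import re

# Optional comparison operator, then a number with "." or "," as decimal mark.
# ASSUMPTION: this covers the numeric formats present in test_result_value.
_NUMERIC = re.compile(r"^\s*(<=|>=|<|>)?\s*(-?\d+(?:[.,]\d+)?)\s*$")


def parse_result(value):
    """Return (number, comparator) for one test_result_value.

    number is None for missing or qualitative results such as "positive".
    comparator is "=" for plain numbers and "<", ">", "<=", ">=" for
    censored results such as "<3".
    """
    if value is None:
        return None, None
    match = _NUMERIC.match(value)
    if match is None:
        return None, None
    number = float(match.group(2).replace(",", "."))
    return number, match.group(1) or "="


def flag_against_reference(number, ref_min, ref_max):
    """Return "low", "high", "normal", or None if no judgement is possible.

    ASSUMPTION: bounds are inclusive and a missing bound (None, i.e. NA)
    is treated as unbounded on that side.
    """
    if number is None or (ref_min is None and ref_max is None):
        return None
    if ref_min is not None and number < ref_min:
        return "low"
    if ref_max is not None and number > ref_max:
        return "high"
    return "normal"
```

For example, parse_result("<3") returns (3.0, "<") and parse_result("positive") returns (None, None). Keeping the comparator alongside the number lets an analysis decide per test type how to treat censored values, rather than silently reading "<3" as exactly 3.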