Socialstyrelsen Death Registry Dataset (2017-2019)
socialstyrelsen_death_registry.Rd
This dataset contains information about deaths from Socialstyrelsen's death registry (dödsregistret), including the primary and contributing causes of death, death circumstances, and demographic information about the deceased. The data spans from 2017 to 2019.
Format
A data frame with 25,273 observations and 24 variables:
- patient_id
Integer. Patient pseudonym identifier, serves as a foreign key to link with other patient-level data. Original field name: LopNr
- death_year
Integer. Year of death (2017-2019). Original field name: AR
- death_date
Character. Date of death in format YYYYMMDD. Original field name: DODSDAT
- gender
Integer. Gender code: 1=male, 2=female. Original field name: KON
- age
Integer. Age at death in years. Original field name: alder
- municipality_code
Character. Code of the municipality where the person was registered at the time of death, typically four digits, with the first two representing the county code. Original field name: LK
- icd_version
Integer. Version of the ICD (International Classification of Diseases) used for coding. All records in this dataset use ICD-10 (value=10). Original field name: ICD
- underlying_cause
Character. The disease or injury that initiated the chain of events leading directly to death (ICD-10 code). Original field name: ULORSAK
- injury_code
Character. Main injury or poisoning type code from ICD-10 chapter 19, used when the underlying cause is an external cause. Contains 94.8% NA values. Original field name: KAP19
- contributing_cause_1
Character. First contributing cause of death (ICD-10 code). Original field name: MORSAK1
- contributing_cause_2
Character. Second contributing cause of death (ICD-10 code). 19.7% NA values. Original field name: MORSAK2
- contributing_cause_3
Character. Third contributing cause of death (ICD-10 code). 38.8% NA values. Original field name: MORSAK3
- contributing_cause_4
Character. Fourth contributing cause of death (ICD-10 code). 57.1% NA values. Original field name: MORSAK4
- contributing_cause_5
Character. Fifth contributing cause of death (ICD-10 code). 72.3% NA values. Original field name: MORSAK5
- age_group
Integer. Age group at death in five-year intervals. Original field name: DALDKL5
- autopsy_type
Integer. Type of autopsy if performed: 1=Clinical autopsy, 3=Forensic autopsy. Contains 91.9% NA values. Original field name: DBGRUND1
- hospital_examination
Integer. Indicator of an examination at a hospital before death; the only non-missing value is 5. Contains 19.6% NA values. Original field name: DBGRUND5
- alcohol_related
Logical. TRUE if an alcohol-related diagnosis is mentioned on the death certificate (as underlying or contributing cause). Contains 98% NA values. Original field name: ALKOHOL
- diabetes_related
Logical. TRUE if diabetes is mentioned as an underlying or contributing cause of death. Contains 89% NA values. Original field name: DIABETES
- nationality
Character. Nationality of the deceased. Original field name: NATION
- birth_country
Character. Country of birth of the deceased. Original field name: FODLAND
- death_place
Integer. Place of death: 1=Hospital, 2=Special housing, 3=Private home, 4=Other/unknown. Contains 5.2% NA values. Original field name: DODSPL
- death_municipality
Character. Four-digit municipality code where death occurred. Contains 10.1% NA values. Original field name: DOD_KOMMUN
- birth_date_truncated
Character. Birth date in format YYYY-MM. Original field name: FODDATN
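Several columns above are stored as coded strings or integers. The following is a minimal sketch of how they might be decoded, assuming a toy data frame named deaths whose rows are invented and only mimic the documented formats. The derived column names (death_date_parsed, birth_month, county_code, gender_label, death_place_label) are hypothetical and not part of the dataset.

```r
# Toy rows that mimic the documented column formats (all values are invented).
deaths <- data.frame(
  death_date = c("20170315", "20181102"),
  birth_date_truncated = c("1935-06", "1950-12"),
  municipality_code = c("0180", "1480"),
  gender = c(1L, 2L),
  death_place = c(1L, NA),
  stringsAsFactors = FALSE
)

# death_date is a YYYYMMDD string; convert it to a Date.
deaths$death_date_parsed <- as.Date(deaths$death_date, format = "%Y%m%d")

# birth_date_truncated has no day component.
# ASSUMPTION: use the first day of the month as a placeholder.
deaths$birth_month <- as.Date(paste0(deaths$birth_date_truncated, "-01"))

# The first two digits of the municipality code give the county code.
deaths$county_code <- substr(deaths$municipality_code, 1, 2)

# Attach labels to the integer codes, using the mappings documented above.
deaths$gender_label <- factor(deaths$gender, levels = 1:2,
                              labels = c("male", "female"))
deaths$death_place_label <- factor(
  deaths$death_place, levels = 1:4,
  labels = c("Hospital", "Special housing", "Private home", "Other/unknown")
)
```

Keeping municipality_code as character matters here, because converting it to integer would drop leading zeros and break the county prefix.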
Details
This file contains data from Socialstyrelsen's death registry (dödsregistret) for deaths occurring between 2017 and 2019. The dataset includes information about the causes of death (both underlying and contributing), demographic information, and circumstances of death.
The causes of death are coded according to ICD-10 (International Classification of Diseases, 10th revision). The dataset includes the underlying cause of death (the disease or injury that initiated the events leading to death) and up to 5 contributing causes.
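Because the contributing causes are spread over five columns, counting how often a code appears is usually easier in long format. The sketch below uses base R only and a toy data frame named deaths with invented ICD-10 codes; the names long, position, and icd10 are hypothetical.

```r
# Toy rows with invented ICD-10 codes in the five contributing-cause columns.
deaths <- data.frame(
  patient_id = c(1L, 2L),
  contributing_cause_1 = c("I500", "J189"),
  contributing_cause_2 = c("E119", NA_character_),
  contributing_cause_3 = c(NA_character_, NA_character_),
  contributing_cause_4 = c(NA_character_, NA_character_),
  contributing_cause_5 = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

cause_cols <- paste0("contributing_cause_", 1:5)

# unlist() stacks the columns one after another, so patient_id is repeated
# once per column and position is repeated once per row.
long <- data.frame(
  patient_id = rep(deaths$patient_id, times = length(cause_cols)),
  position = rep(seq_along(cause_cols), each = nrow(deaths)),
  icd10 = unlist(deaths[cause_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the empty slots and sort by patient, then by position.
long <- long[!is.na(long$icd10), ]
long <- long[order(long$patient_id, long$position), ]
```

The result has one row per recorded cause, so table(long$icd10) gives code frequencies across all five positions.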
Additional information about examination before death, autopsy, and alcohol- and diabetes-related deaths is also included, though some of these fields have a substantial share of missing values.
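The alcohol_related and diabetes_related columns hold TRUE or NA, so an analysis has to decide how to treat NA. The sketch below assumes that NA means the diagnosis was not mentioned; this interpretation is an assumption, not something this documentation confirms. The toy data frame deaths and the columns alcohol_flag and diabetes_flag are hypothetical.

```r
# Toy flag columns with the TRUE/NA pattern described above (values invented).
deaths <- data.frame(
  alcohol_related = c(TRUE, NA, NA, NA),
  diabetes_related = c(NA, TRUE, NA, TRUE)
)

# ASSUMPTION: NA means "not mentioned on the death certificate".
# `x %in% TRUE` returns FALSE for NA, which yields a complete logical vector.
deaths$alcohol_flag <- deaths$alcohol_related %in% TRUE
deaths$diabetes_flag <- deaths$diabetes_related %in% TRUE

# Share of toy records flagged as diabetes-related.
diabetes_share <- mean(deaths$diabetes_flag)
```

If NA might instead mean "unknown", keep the original columns and pass na.rm = TRUE explicitly where needed.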
Note
The dataset covers deaths from 2017 (30.3%), 2018 (38.9%), and 2019 (30.8%)
The gender distribution is almost equal: 50.2% male and 49.8% female
Age at death ranges from 18 to 106 years, with a median of 82 years
Several variables have high percentages of missing values, including injury_code (94.8%), autopsy_type (91.9%), alcohol_related (98%), and diabetes_related (89%)
The contributing_cause fields have progressively higher percentages of missing values, from 0% in contributing_cause_1 to 72.3% in contributing_cause_5; later contributing-cause fields, which are up to 99.9% missing, are excluded from this dataset
Most of the deceased had Swedish nationality (96.5%, based on the nationality field)
Death locations were primarily hospitals (46.4%), special housing (28.6%), and private homes (17.4%)
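Summary figures such as the missing-value percentages and the year distribution in these notes can be recomputed with a few lines of base R. The sketch below runs on a toy data frame named deaths with invented values, so its output does not reproduce the percentages quoted above; na_pct and year_pct are hypothetical names.

```r
# Toy rows (invented) using three of the documented columns.
deaths <- data.frame(
  death_year = c(2017L, 2018L, 2018L, 2019L),
  injury_code = c(NA, NA, "S720", NA),
  death_place = c(1L, 2L, NA, 3L),
  stringsAsFactors = FALSE
)

# Percentage of NA values per column.
na_pct <- round(100 * colMeans(is.na(deaths)), 1)

# Percentage of records per death year.
year_pct <- round(100 * prop.table(table(deaths$death_year)), 1)
```

Applied to the full dataset, the same two calls offer a quick check that a loaded copy matches the figures listed in these notes.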