This dataset contains records from Socialstyrelsen's death registry (Dödsorsaksregistret), including the underlying and contributing causes of death, circumstances of death, and demographic information about the deceased. The data cover deaths from 2017 to 2019.

Usage

socialstyrelsen_death_registry()

Format

A data frame with 25,273 observations and 23 variables:

patient_id

Integer. Pseudonymized patient identifier; serves as a foreign key for linking with other patient-level data. Original field name: LopNr

death_year

Integer. Year of death (2017-2019). Original field name: AR

death_date

Character. Date of death in format YYYYMMDD. Original field name: DODSDAT

gender

Integer. Gender code: 1=male, 2=female. Original field name: KON

age

Integer. Age at death in years. Original field name: alder

municipality_code

Character. Code of the municipality where the person was registered at the time of death, typically four digits, of which the first two are the county code. Original field name: LK

icd_version

Integer. Version of ICD (International Classification of Diseases) used for coding. All records in this dataset use ICD-10 (value=10). Original field name: ICD

underlying_cause

Character. The disease or injury that initiated the chain of events leading directly to death (ICD-10 code). Original field name: ULORSAK

injury_code

Character. Main injury or poisoning type code from ICD-10 chapter 19, used when the underlying cause is an external cause. Contains 94.8% NA values. Original field name: KAP19

contributing_cause_1

Character. First contributing cause of death (ICD-10 code). Original field name: MORSAK1

contributing_cause_2

Character. Second contributing cause of death (ICD-10 code). Contains 19.7% NA values. Original field name: MORSAK2

contributing_cause_3

Character. Third contributing cause of death (ICD-10 code). Contains 38.8% NA values. Original field name: MORSAK3

contributing_cause_4

Character. Fourth contributing cause of death (ICD-10 code). Contains 57.1% NA values. Original field name: MORSAK4

contributing_cause_5

Character. Fifth contributing cause of death (ICD-10 code). Contains 72.3% NA values. Original field name: MORSAK5

age_group

Integer. Age group at death in five-year intervals. Original field name: DALDKL5

autopsy_type

Integer. Type of autopsy if performed: 1=Clinical autopsy, 3=Forensic autopsy. Contains 91.9% NA values. Original field name: DBGRUND1

hospital_examination

Integer. Indicates an examination at hospital before death; coded as 5 when present. Contains 19.6% NA values. Original field name: DBGRUND5

alcohol_related

Logical. TRUE if an alcohol-related diagnosis is mentioned on the death certificate (as underlying or contributing cause). Contains 98% NA values. Original field name: ALKOHOL

diabetes_related

Logical. TRUE if diabetes is mentioned as an underlying or contributing cause of death. Contains 89% NA values. Original field name: DIABETES

nationality

Character. Nationality of the deceased. Original field name: NATION

birth_country

Character. Country of birth of the deceased. Original field name: FODLAND

death_place

Integer. Place of death: 1=Hospital, 2=Special housing, 3=Private home, 4=Other/unknown. Contains 5.2% NA values. Original field name: DODSPL

death_municipality

Character. Four-digit municipality code where death occurred. Contains 10.1% NA values. Original field name: DOD_KOMMUN

birth_date_truncated

Character. Birth date in format YYYY-MM. Original field name: FODDATN
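
Several of the fields above are stored as text and usually need converting before analysis. The following is a minimal sketch in base R, assuming a data frame with the documented column names; the rows are made-up toy values, not registry data, and the derived column names (death_date_parsed, county_code, birth_month_start) are illustrative only.

```r
# Toy rows using the documented column names; the values are made up.
deaths <- data.frame(
  patient_id = c(1L, 2L, 3L),
  death_date = c("20170315", "20181102", "20190620"),
  municipality_code = c("0180", "1480", "1280"),
  birth_date_truncated = c("1935-04", "1948-11", "1929-01"),
  stringsAsFactors = FALSE
)

# death_date is YYYYMMDD text, so the format must be given explicitly.
# Values that do not form a valid date come back as NA.
deaths$death_date_parsed <- as.Date(deaths$death_date, format = "%Y%m%d")

# The first two digits of a four-digit municipality code are the county code.
# Keep it as character so leading zeros are preserved.
deaths$county_code <- substr(deaths$municipality_code, 1, 2)

# birth_date_truncated has no day component. Assumption: use the first day
# of the month as a placeholder so the value can be stored as a Date.
deaths$birth_month_start <- as.Date(paste0(deaths$birth_date_truncated, "-01"))
```

Placing births on the first of the month is a convenience, not something the registry states, so ages derived from it can be off by up to a month; the age column is likely the safer choice where exact age matters.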

Source

Socialstyrelsen (The National Board of Health and Welfare in Sweden), Dödsorsaksregistret

Details

This file contains data from Socialstyrelsen's death registry (Dödsorsaksregistret) for deaths occurring between 2017 and 2019. It covers the causes of death (both underlying and contributing), demographic information, and circumstances of death.

The causes of death are coded according to ICD-10 (International Classification of Diseases, 10th revision). The dataset includes the underlying cause of death (the disease or injury that initiated the chain of events leading to death) and up to five contributing causes.

Additional fields cover examination before death, autopsy, and alcohol- and diabetes-related deaths, though some of these fields have a large share of missing values.
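
Because the contributing causes are spread over several columns, questions such as "how many causes were recorded" or "is a given code mentioned anywhere" need a scan across columns. The sketch below shows one way to do this in base R. The toy rows are made up, mentions_prefix is a hypothetical helper rather than part of the package, and the sketch assumes ICD-10 codes are stored without a dot (for example "E119"), which should be checked against the real data.

```r
# Toy rows using the documented column names; the values are made up.
deaths <- data.frame(
  patient_id = c(1L, 2L, 3L),
  underlying_cause = c("I219", "C349", "X70"),
  contributing_cause_1 = c("I219", "C349", "T71"),
  contributing_cause_2 = c("E119", NA, "X70"),
  contributing_cause_3 = c(NA, NA, NA_character_),
  stringsAsFactors = FALSE
)

# Pick up however many contributing_cause_* columns are present.
cause_cols <- grep("^contributing_cause_[0-9]+$", names(deaths), value = TRUE)

# Number of contributing causes recorded for each death (non-NA entries).
deaths$n_contributing <- rowSums(!is.na(deaths[cause_cols]))

# Hypothetical helper: TRUE when any of the given columns holds a code that
# starts with `prefix`. NA entries count as "not mentioned".
mentions_prefix <- function(data, cols, prefix) {
  hits <- vapply(
    data[cols],
    function(x) !is.na(x) & startsWith(x, prefix),
    logical(nrow(data))
  )
  # matrix() keeps the shape right even when data has a single row.
  rowSums(matrix(hits, nrow = nrow(data))) > 0
}

# Example: any mention of E11 (type 2 diabetes mellitus in ICD-10).
deaths$mentions_e11 <- mentions_prefix(
  deaths, c("underlying_cause", cause_cols), "E11"
)
```

This prefix check is only an illustration; it is not claimed to reproduce how the registry derives the alcohol_related or diabetes_related flags.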

Note

  • The dataset covers deaths from 2017 (30.3%), 2018 (38.9%), and 2019 (30.8%)

  • The gender distribution is almost equal: 50.2% male and 49.8% female

  • Age at death ranges from 18 to 106 years, with a median of 82 years

  • Several variables have high percentages of missing values, including injury_code (94.8%), autopsy_type (91.9%), alcohol_related (98%), and diabetes_related (89%)

  • The contributing_cause fields have progressively higher percentages of missing values, rising from 0% in contributing_cause_1 to 72.3% in contributing_cause_5; later cause fields, with up to 99.9% missing values, are excluded from this dataset

  • Most of the deceased had Swedish nationality (96.5%, based on the nationality field)

  • Death locations were primarily hospitals (46.4%), special housing (28.6%), and private homes (17.4%)
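
Summary figures of the kind listed above (share of missing values per variable, distribution by year and by place of death) can be computed along the following lines. This is a sketch in base R on made-up toy rows, so its percentages will not match the ones in this note; only the column names and the death_place codes are taken from the Format section.

```r
# Toy rows using the documented column names; the values are made up.
deaths <- data.frame(
  death_year = c(2017L, 2018L, 2018L, 2019L),
  autopsy_type = c(NA, 1L, NA, 3L),
  death_place = c(1L, 2L, NA, 1L)
)

# Percentage of NA values in each column.
na_pct <- round(100 * colMeans(is.na(deaths)), 1)

# Percentage of deaths per year.
year_pct <- round(100 * prop.table(table(deaths$death_year)), 1)

# Label the death_place codes as documented, then tabulate with NA kept
# as its own category so the shares are out of all records.
place <- factor(
  deaths$death_place,
  levels = 1:4,
  labels = c("Hospital", "Special housing", "Private home", "Other/unknown")
)
place_pct <- round(100 * prop.table(table(place, useNA = "ifany")), 1)
```

Keeping NA as a category in the place-of-death table is a choice; dropping useNA would instead report shares among records with a known place of death.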

Examples