
National Registry Data Overview

This vignette provides a comprehensive overview of the data from Swedish national registries used in the SEM cohort study. These datasets from Socialstyrelsen (The National Board of Health and Welfare) and Statistics Sweden (SCB) complement the clinical data from the Melior electronic health record system by providing information about healthcare utilization, sociodemographic factors, and outcomes.

Data Source Categories

The national registry data is organized into three main categories based on source and content:

  1. National Patient Register Data (Socialstyrelsen): Healthcare encounters, diagnoses, and procedures
  2. Specialized Registries (Socialstyrelsen): Medications, causes of death
  3. Socioeconomic & Demographic Data (Statistics Sweden): Socioeconomic indicators, education, income, geographic and living conditions

Each dataset follows standardized variable naming conventions and processing procedures to ensure consistency across sources.

National Patient Register Data (Socialstyrelsen)

These datasets capture healthcare encounters within the Swedish healthcare system.

| Dataset | Description | Key Variables | Timeframe |
| --- | --- | --- | --- |
| Inpatient Care Register | Hospital admissions with diagnoses, procedures, and lengths of stay | admission_date, discharge_date, main_diagnosis, procedure_code, length_of_stay | 2012-2019 |
| Outpatient Care Register | Specialized outpatient visits including emergency department contacts | contact_date, main_diagnosis, procedure_code, emergency_department | 2012-2019 |

Specialized Registries (Socialstyrelsen)

These datasets provide detailed information on specific health-related factors.

| Dataset | Description | Key Variables | Timeframe |
| --- | --- | --- | --- |
| Prescription Drug Register | Dispensed medications from pharmacies | dispensation_date, atc_code, quantity, strength_numeric | 2010-2018 |
| Death Registry | Causes of death and related circumstances | death_date, underlying_cause, contributing_cause_1-5, death_place | 2017-2019 |

Socioeconomic & Demographic Data (Statistics Sweden)

These datasets provide information about socioeconomic and demographic factors.

| Dataset | Description | Key Variables | Timeframe |
| --- | --- | --- | --- |
| LISA Database | Comprehensive socioeconomic information | education_level, employment_status, income measures, benefits, household composition | 2012-2018 annual data |
| DeSO Residence Data | Geographic area of residence using DeSO classification | deso_code, year | 2012-2018 annual data |
| Country of Birth Data | Country of birth classification for patients and parents | birth_country_group, mother/father_birth_country_group | Single timepoint |

LISA Socioeconomic Data

The LISA (Longitudinal Integration Database for Health Insurance and Labour Market Studies) dataset contains comprehensive socioeconomic information for each patient. This dataset is particularly valuable for understanding social determinants of health.

Key Variable Categories:

  1. Demographics
    • Gender
    • Birth year
    • Civil status
    • Citizenship
  2. Household Composition
    • Number of children in different age groups
    • Family structure indicators
  3. Education
    • Highest education level
    • Field of education
    • Graduation year
  4. Employment
    • Employment status
    • Occupational position
    • Detailed occupation codes
    • Socioeconomic classification
  5. Income and Benefits
    • Disposable income
    • Capital income
    • Sickness benefits
    • Unemployment benefits
    • Parental leave payments
    • Disability pension
    • Social assistance
  6. Health-Related Variables
    • Sickness days
    • Rehabilitation benefits
    • Sickness compensation days

DeSO Geographic Data

The DeSO (Demographic Statistical Areas) dataset contains information about patients’ residential geographic areas. DeSO is a geographic subdivision introduced by Statistics Sweden in 2018 to enable statistical analysis at a detailed geographic level.

Key Features:

  • Annual residence information from 2012-2018
  • Geographic areas defined using DeSO codes
  • Each DeSO area typically has 700-2,700 inhabitants
  • DeSO codes consist of a 4-digit municipality code followed by an alphanumeric identifier
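
The code format described above can be checked and split with a short helper. This is a minimal sketch in Python, assuming only what the list states (four digits, then a non-empty alphanumeric identifier); the function name `split_deso_code` and the example value are hypothetical and not taken from the study data.

```python
import re

# Assumed layout, based only on the description above: four digits
# (municipality code) followed by a non-empty alphanumeric identifier.
DESO_PATTERN = re.compile(r"(\d{4})([A-Za-z0-9]+)")


def split_deso_code(deso_code):
    """Return (municipality_code, area_identifier), or None if the code does not match."""
    match = DESO_PATTERN.fullmatch(deso_code.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical example values, not taken from the study data.
print(split_deso_code("0114A0010"))  # ('0114', 'A0010')
print(split_deso_code("unknown"))    # None
```

Returning None for values that do not match makes it easy to count residences that could not be mapped to a DeSO area before aggregating by municipality.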

Coding Systems

The national registry datasets use several standardized coding systems:

  1. ICD-10: International Classification of Diseases, 10th revision for diagnoses
  2. KVÅ: Swedish Classification of Health Interventions for procedures
  3. ATC: Anatomical Therapeutic Chemical Classification System for medications
  4. DeSO: Demographic Statistical Areas for geographic classification
  5. SUN2000: Swedish Educational Nomenclature for education levels
  6. SSYK: Swedish Standard Classification of Occupations

Data Integration

All datasets can be linked through the common patient_id field, which is a pseudonymized identifier consistent across all data sources. This enables comprehensive analysis of:

  • Patterns of healthcare utilization
  • Medication usage
  • Socioeconomic factors and their relationship to health outcomes
  • Geographic variations in health
  • Mortality and causes of death
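
Linking through patient_id can be sketched as follows. This is an illustrative Python example using plain lists of dictionaries: the rows are invented, the helper names `group_by_patient` and `link_sources` are hypothetical, and only the column names come from the dataset descriptions above.

```python
from collections import defaultdict

# Hypothetical rows; only the column names come from the dataset descriptions.
inpatient = [
    {"patient_id": "P1", "admission_date": "2015-03-02", "main_diagnosis": "I21"},
    {"patient_id": "P1", "admission_date": "2016-07-19", "main_diagnosis": "I50"},
    {"patient_id": "P2", "admission_date": "2014-11-05", "main_diagnosis": "J18"},
]
prescriptions = [
    {"patient_id": "P1", "dispensation_date": "2015-03-10", "atc_code": "C07AB02"},
    {"patient_id": "P3", "dispensation_date": "2013-01-22", "atc_code": "N02BE01"},
]


def group_by_patient(rows):
    """Index a list of rows by their patient_id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["patient_id"]].append(row)
    return grouped


def link_sources(**sources):
    """Combine several row lists into {patient_id: {source_name: [rows]}}."""
    grouped = {name: group_by_patient(rows) for name, rows in sources.items()}
    patient_ids = set()
    for by_patient in grouped.values():
        patient_ids.update(by_patient)
    return {
        pid: {name: list(by_patient.get(pid, [])) for name, by_patient in grouped.items()}
        for pid in sorted(patient_ids)
    }


linked = link_sources(inpatient=inpatient, prescriptions=prescriptions)
for pid, records in linked.items():
    print(pid, {name: len(rows) for name, rows in records.items()})
```

Patients who appear in only one source keep an empty list for the others, which keeps the absence of records visible instead of silently dropping the patient.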

Temporal Coverage

  • Inpatient data: 2012-2019
  • Outpatient data: 2012-2019
  • Prescription data: 2010-2018
  • Death registry: 2017-2019
  • Socioeconomic data (LISA): 2012-2018, annual snapshots
  • Geographic data (DeSO): 2012-2018, annual snapshots
  • Birth country data: Time-invariant (single snapshot)
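
Because the periods differ, it can help to compute the years shared by the datasets used in a given analysis. The sketch below encodes the coverage listed above in a Python dictionary; the dictionary keys and the function name `common_window` are illustrative choices, not names used in the data.

```python
# Coverage years as listed above; the short dataset keys are illustrative.
COVERAGE = {
    "inpatient": (2012, 2019),
    "outpatient": (2012, 2019),
    "prescriptions": (2010, 2018),
    "death": (2017, 2019),
    "lisa": (2012, 2018),
    "deso": (2012, 2018),
}


def common_window(*datasets):
    """Return (first_year, last_year) covered by all named datasets, or None."""
    start = max(COVERAGE[name][0] for name in datasets)
    end = min(COVERAGE[name][1] for name in datasets)
    return (start, end) if start <= end else None


print(common_window("inpatient", "prescriptions", "lisa"))  # (2012, 2018)
print(common_window("prescriptions", "death"))              # (2017, 2018)
```

The time-invariant birth country data is left out because it does not constrain the window.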

Data Characteristics and Quality Considerations

National Patient Register Data

  • Completeness: The National Patient Register has very high completeness for inpatient care (>99%) but somewhat lower completeness for outpatient care
  • Diagnostic accuracy: Varies by condition, with higher accuracy for severe conditions and lower for milder conditions
  • Procedure recording: More complete for surgical than for medical procedures

Prescription Drug Register

  • Includes only dispensed prescriptions, not over-the-counter medications or medications given during hospital stays
  • Very high completeness for outpatient dispensed medications (>99%)
  • Strength information is provided both as text (strength_text) and as separate numeric (strength_numeric) and unit (strength_unit) fields

Death Registry

  • High completeness for fact of death and date
  • Varying accuracy for specific causes, with higher accuracy for major causes like cancer and cardiovascular disease
  • Autopsy information available for a subset of deaths
  • Includes information about the place and circumstances of death

SCB Socioeconomic and Demographic Data

  • Missing data patterns:
    • Birth country data may be missing for patients not born in Sweden or whose parents were not born in Sweden
    • LISA data may have missing values for certain variables depending on employment status, age, and other factors
    • DeSO data may be missing if a patient’s residence could not be mapped to a DeSO area
  • Categorical coding:
    • Education Level (Sun2000niva) uses the Swedish educational nomenclature
    • Occupation (SSYK) follows the Swedish Standard Classification of Occupations
    • Country Groups use standardized groupings where code “00” typically represents Sweden
  • Monetary values:
    • All monetary values (income, benefits) are in Swedish Krona (SEK)
    • Values are annual sums for the specified year

Usage Recommendations

For cohort studies using this data, we recommend:

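  1. Temporal alignment: The datasets cover different periods (for example, prescriptions 2010-2018 but deaths 2017-2019), so restrict each analysis to years covered by every dataset it uses
  2. Coding consistency: Account for coding practices that may change over time, particularly for procedures
  3. Missing data handling: Check for missing values before analysis, since several datasets have variables with substantial missingness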
  4. Complex healthcare pathways: Use the patient register data (inpatient/outpatient) together with the specialized registries to construct comprehensive healthcare trajectories
  5. Socioeconomic context: Incorporate the SCB data to understand social determinants of health

For most analyses, the recommended approach is to:

  1. Start with the SEM cohort data that defines your study population
  2. Link to appropriate healthcare utilization data (inpatient/outpatient/prescriptions)
  3. Incorporate socioeconomic (LISA) and geographic (DeSO) factors
  4. Add birth country information for migration and demographic context
  5. Include mortality outcomes when relevant
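
The steps above could be sketched as follows, starting from the cohort and attaching the annual snapshot that matches each patient's year of interest. This is a hedged Python illustration, not the study's actual pipeline: the `index_year` field, the function `build_analysis_rows`, and all rows and values are hypothetical, and the annual LISA and DeSO data are assumed to be keyed by (patient_id, year).

```python
# Hypothetical cohort; index_year is an assumed field marking the year of interest.
cohort = [
    {"patient_id": "P1", "index_year": 2015},
    {"patient_id": "P2", "index_year": 2019},
]

# Annual snapshots keyed by (patient_id, year); values are invented.
lisa = {
    ("P1", 2015): {"education_level": "3", "employment_status": "employed"},
    ("P1", 2016): {"education_level": "3", "employment_status": "unemployed"},
}
deso = {("P1", 2015): "0114A0010"}
deaths = {"P2": "2019-06-30"}


def build_analysis_rows(cohort, lisa, deso, deaths):
    """Attach year-matched LISA and DeSO values and any death date to each cohort member."""
    rows = []
    for person in cohort:
        key = (person["patient_id"], person["index_year"])
        snapshot = lisa.get(key, {})  # empty when the year has no LISA record
        rows.append({
            "patient_id": person["patient_id"],
            "index_year": person["index_year"],
            "education_level": snapshot.get("education_level"),
            "employment_status": snapshot.get("employment_status"),
            "deso_code": deso.get(key),
            "death_date": deaths.get(person["patient_id"]),
        })
    return rows


for row in build_analysis_rows(cohort, lisa, deso, deaths):
    print(row)
```

In this example the second patient has an index year outside the 2012-2018 LISA and DeSO coverage, so the socioeconomic and geographic fields are None, which shows why temporal alignment needs to be checked before linking.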

Each dataset is accompanied by detailed documentation explaining the variables, data quality issues, and appropriate analytical approaches.