National Registry Data Overview
This vignette provides a comprehensive overview of the data from Swedish national registries used in the SEM cohort study. These datasets from Socialstyrelsen (The National Board of Health and Welfare) and Statistics Sweden (SCB) complement the clinical data from the Melior electronic health record system by providing information about healthcare utilization, sociodemographic factors, and outcomes.
Data Source Categories
The national registry data is organized into three main categories based on source and content:
- National Patient Register Data (Socialstyrelsen): Healthcare encounters, diagnoses, and procedures
- Specialized Registries (Socialstyrelsen): Medications, causes of death
- Socioeconomic & Demographic Data (Statistics Sweden): Socioeconomic indicators, education, income, geographic and living conditions
Each dataset follows standardized variable naming conventions and processing procedures to ensure consistency across sources.
National Patient Register Data (Socialstyrelsen)
These datasets capture healthcare encounters within the Swedish healthcare system.
| Dataset | Description | Key Variables | Timeframe |
|---|---|---|---|
| Inpatient Care Register | Hospital admissions with diagnoses, procedures, and lengths of stay | admission_date, discharge_date, main_diagnosis, procedure_code, length_of_stay | 2012-2019 |
| Outpatient Care Register | Specialized outpatient visits including emergency department contacts | contact_date, main_diagnosis, procedure_code, emergency_department | 2012-2019 |
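The inpatient table lists both the admission and discharge dates and a length_of_stay variable, so the two can be cross-checked. The following is a minimal sketch in Python, assuming length of stay is defined as the number of days between the two dates (the register's actual convention, for example for same-day stays, may differ). The rows and the helper `length_of_stay_days` are hypothetical and only illustrate the check.

```python
from datetime import date


def length_of_stay_days(admission_date: date, discharge_date: date) -> int:
    """Days between admission and discharge; a same-day stay gives 0."""
    if discharge_date < admission_date:
        raise ValueError("discharge_date precedes admission_date")
    return (discharge_date - admission_date).days


# Hypothetical inpatient rows; all values are invented for illustration.
stays = [
    {"patient_id": "p001", "admission_date": date(2015, 3, 1),
     "discharge_date": date(2015, 3, 4), "length_of_stay": 3},
    {"patient_id": "p002", "admission_date": date(2016, 7, 10),
     "discharge_date": date(2016, 7, 10), "length_of_stay": 1},
]

# Collect patients whose recorded value disagrees with the date difference.
mismatches = [
    row["patient_id"]
    for row in stays
    if length_of_stay_days(row["admission_date"], row["discharge_date"])
    != row["length_of_stay"]
]
print(mismatches)
```

A mismatch does not necessarily indicate an error; it may simply reveal that the recorded variable counts the admission day as a full day.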
Specialized Registries (Socialstyrelsen)
These datasets provide detailed information on specific health-related factors.
| Dataset | Description | Key Variables | Timeframe |
|---|---|---|---|
| Prescription Drug Register | Dispensed medications from pharmacies | dispensation_date, atc_code, quantity, strength_numeric | 2010-2018 |
| Death Registry | Causes of death and related circumstances | death_date, underlying_cause, contributing_cause_1-5, death_place | 2017-2019 |
Socioeconomic & Demographic Data (Statistics Sweden)
These datasets provide information about socioeconomic and demographic factors.
| Dataset | Description | Key Variables | Timeframe |
|---|---|---|---|
| LISA Database | Comprehensive socioeconomic information | education_level, employment_status, income measures, benefits, household composition | 2012-2018 annual data |
| DeSO Residence Data | Geographic area of residence using DeSO classification | deso_code, year | 2012-2018 annual data |
| Country of Birth Data | Country of birth classification for patients and parents | birth_country_group, mother/father_birth_country_group | Single timepoint |
LISA Socioeconomic Data
The LISA (Longitudinal Integration Database for Health Insurance and Labour Market Studies) dataset contains comprehensive socioeconomic information for each patient. This dataset is particularly valuable for understanding social determinants of health.
Key Variable Categories:
- Demographics
  - Gender
  - Birth year
  - Civil status
  - Citizenship
- Household Composition
  - Number of children in different age groups
  - Family structure indicators
- Education
  - Highest education level
  - Field of education
  - Graduation year
- Employment
  - Employment status
  - Occupational position
  - Detailed occupation codes
  - Socioeconomic classification
- Income and Benefits
  - Disposable income
  - Capital income
  - Sickness benefits
  - Unemployment benefits
  - Parental leave payments
  - Disability pension
  - Social assistance
- Health-Related Variables
  - Sickness days
  - Rehabilitation benefits
  - Sickness compensation days
DeSO Geographic Data
The DeSO (Demographic Statistical Areas) dataset contains information about patients’ residential geographic areas. DeSO is a geographic subdivision introduced by Statistics Sweden in 2018 to enable statistical analysis at a detailed geographic level.
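Because a deso_code embeds coarser geography, it can be split into parts for aggregation. The sketch below assumes the nine-character layout as commonly described for DeSO (two-digit county, two-digit municipality, one letter A, B, or C for the area type, and a four-digit serial); this layout is an assumption that should be verified against the delivered data, and `parse_deso` and the example code are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed layout: county (2 digits) + municipality (2 digits)
# + area-type letter (A, B or C) + serial number (4 digits).
DESO_PATTERN = re.compile(r"(\d{2})(\d{2})([ABC])(\d{4})")


class DesoParts(NamedTuple):
    county: str        # two-digit county code
    municipality: str  # four-digit municipality code (county + municipality digits)
    area_type: str     # letter A, B or C
    serial: str        # four-digit serial within the municipality


def parse_deso(deso_code: str) -> DesoParts:
    """Split a DeSO-shaped code into its assumed components."""
    match = DESO_PATTERN.fullmatch(deso_code.strip().upper())
    if match is None:
        raise ValueError(f"not a DeSO-shaped code: {deso_code!r}")
    county, muni, area_type, serial = match.groups()
    return DesoParts(county, county + muni, area_type, serial)


parts = parse_deso("0114A0010")  # illustrative code, not taken from the data
print(parts.municipality, parts.area_type)
```

Rejecting codes that do not match the pattern, rather than slicing blindly, makes unmapped or malformed residence codes visible early.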
Coding Systems
The national registry datasets use several standardized coding systems:
- ICD-10: International Classification of Diseases, 10th revision for diagnoses
- KVÅ: Swedish Classification of Health Interventions for procedures
- ATC: Anatomical Therapeutic Chemical Classification System for medications
- DeSO: Demographic Statistical Areas for geographic classification
- SUN2000: Swedish Educational Nomenclature for education levels
- SSYK: Swedish Standard Classification of Occupations
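Of the coding systems listed above, ATC is hierarchical: a full seven-character code nests five levels, from anatomical main group down to chemical substance. The following Python sketch shows how an atc_code value could be truncated to each level for grouping; the function name `atc_levels` and the dictionary keys are illustrative choices, not part of the dataset.

```python
import re

# A full ATC code has the shape letter, 2 digits, 2 letters, 2 digits.
ATC_PATTERN = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")


def atc_levels(atc_code: str) -> dict:
    """Return the five ATC levels as prefixes of a full 7-character code."""
    code = atc_code.strip().upper()
    if ATC_PATTERN.fullmatch(code) is None:
        raise ValueError(f"not a full 7-character ATC code: {atc_code!r}")
    return {
        "anatomical_group": code[:1],          # level 1
        "therapeutic_subgroup": code[:3],      # level 2
        "pharmacological_subgroup": code[:4],  # level 3
        "chemical_subgroup": code[:5],         # level 4
        "chemical_substance": code[:7],        # level 5
    }


levels = atc_levels("A10BA02")  # A10BA02 is the ATC code for metformin
print(levels["therapeutic_subgroup"])
```

Grouping on a shorter prefix (for example level 2) is a common way to count dispensations per drug class instead of per substance.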
Data Integration
All datasets can be linked through the common patient_id field, which is a pseudonymized identifier consistent across all data sources. This enables comprehensive analysis of:
- Patterns of healthcare utilization
- Medication usage
- Socioeconomic factors and their relationship to health outcomes
- Geographic variations in health
- Mortality and causes of death
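Linking on patient_id can be sketched as follows. This is a minimal Python illustration using plain dictionaries, not the project's actual loading code; the rows are invented, and the helper `group_by_patient` is hypothetical. The cohort is kept as the base so that patients with no register records still appear, with counts of zero.

```python
from collections import defaultdict

# Hypothetical rows; only patient_id and the listed key variables are
# taken from the dataset descriptions, all values are invented.
cohort = [{"patient_id": "p001"}, {"patient_id": "p002"}, {"patient_id": "p003"}]
inpatient = [
    {"patient_id": "p001", "main_diagnosis": "I21"},
    {"patient_id": "p001", "main_diagnosis": "I50"},
    {"patient_id": "p003", "main_diagnosis": "J18"},
]
prescriptions = [
    {"patient_id": "p001", "atc_code": "C07AB02"},
    {"patient_id": "p002", "atc_code": "A10BA02"},
]


def group_by_patient(rows):
    """Index register rows by patient_id (one patient can have many rows)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["patient_id"]].append(row)
    return grouped


admissions_by_patient = group_by_patient(inpatient)
dispensations_by_patient = group_by_patient(prescriptions)

# Left-join style summary: one output row per cohort member.
linked = [
    {
        "patient_id": person["patient_id"],
        "n_admissions": len(admissions_by_patient.get(person["patient_id"], [])),
        "n_dispensations": len(dispensations_by_patient.get(person["patient_id"], [])),
    }
    for person in cohort
]
print(linked)
```

Summarizing each register to one row per patient before joining avoids the row multiplication that occurs when several one-to-many tables are merged directly.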
Temporal Coverage
- Inpatient data: 2012-2019
- Outpatient data: 2012-2019
- Prescription data: 2010-2018
- Death registry: 2017-2019
- Socioeconomic data (LISA): 2012-2018, annual snapshots
- Geographic data (DeSO): 2012-2018, annual snapshots
- Birth country data: Time-invariant (single snapshot)
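Since the coverage periods above differ, an analysis that combines several sources is limited to the years they share. The Python sketch below computes that shared window from the ranges listed above; the dictionary keys and the function `common_window` are illustrative names, and birth country data is omitted because it is time-invariant.

```python
# Coverage years (first, last) as listed in the Temporal Coverage section.
COVERAGE = {
    "inpatient": (2012, 2019),
    "outpatient": (2012, 2019),
    "prescriptions": (2010, 2018),
    "deaths": (2017, 2019),
    "lisa": (2012, 2018),
    "deso": (2012, 2018),
}


def common_window(datasets):
    """Return the (first, last) year covered by every named dataset, or None."""
    start = max(COVERAGE[name][0] for name in datasets)
    end = min(COVERAGE[name][1] for name in datasets)
    return (start, end) if start <= end else None


print(common_window(["inpatient", "prescriptions", "lisa"]))  # (2012, 2018)
print(common_window(list(COVERAGE)))                          # (2017, 2018)
```

The second call shows that including the death registry narrows the fully covered window to 2017-2018, which matters for any analysis that uses mortality as an outcome.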
Data Characteristics and Quality Considerations
National Patient Register Data
- Completeness: The National Patient Register has very high completeness for inpatient care (>99%) but somewhat lower for outpatient care
- Diagnostic accuracy: Varies by condition, with higher accuracy for severe conditions and lower for milder conditions
- Procedure recording: More complete for surgical than for medical procedures
Prescription Drug Register
- Includes only dispensed prescriptions, not over-the-counter medications or medications given during hospital stays
- Very high completeness for outpatient dispensed medications (>99%)
- Strength information is provided in both text format (strength_text) and as separate numeric value (strength_numeric) and unit (strength_unit) fields
Death Registry
- High completeness for fact of death and date
- Varying accuracy for specific causes, with higher accuracy for major causes like cancer and cardiovascular disease
- Autopsy information available for a subset of deaths
- Information about place of death and circumstances
SCB Socioeconomic and Demographic Data
- Missing data patterns:
  - Birth country data may be missing for patients not born in Sweden or whose parents were not born in Sweden
  - LISA data may have missing values for certain variables depending on employment status, age, and other factors
  - DeSO data may be missing if a patient’s residence could not be mapped to a DeSO area
- Categorical coding:
  - Education Level (Sun2000niva) uses the Swedish educational nomenclature
  - Occupation (SSYK) follows the Swedish Standard Classification of Occupations
  - Country Groups use standardized groupings where code “00” typically represents Sweden
- Monetary values:
  - All monetary values (income, benefits) are in Swedish Krona (SEK)
  - Values are annual sums for the specified year
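Given the missing data patterns described above, it can help to quantify missingness per variable before analysis. The following is a minimal Python sketch, assuming missing values are represented either as absent keys or as None; the rows, the variable name disposable_income, and the helper `missing_share` are hypothetical.

```python
def missing_share(rows, variables):
    """Fraction of rows in which each variable is absent or None."""
    total = len(rows)
    if total == 0:
        return {}
    return {
        var: sum(1 for row in rows if row.get(var) is None) / total
        for var in variables
    }


# Hypothetical LISA-like rows; values are invented for illustration.
lisa_rows = [
    {"patient_id": "p001", "education_level": "3", "disposable_income": 250000},
    {"patient_id": "p002", "education_level": None, "disposable_income": 180000},
    {"patient_id": "p003", "education_level": "5"},
    {"patient_id": "p004", "education_level": None, "disposable_income": None},
]

shares = missing_share(lisa_rows, ["education_level", "disposable_income"])
print(shares)
```

Computing the shares separately by year or by age group would show whether missingness is structural (for example, no income record for children) rather than random.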
Usage Recommendations
For cohort studies using this data, we recommend:
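- Temporal alignment: Coverage differs between datasets (for example, prescription data spans 2010-2018 while the death registry spans only 2017-2019), so restrict each analysis to the years that all required sources share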
- Coding consistency: Note that coding practices may change over time, particularly for procedures
- Missing data handling: Several datasets have variables with substantial missing values, so check missingness per variable before deciding how to handle it
- Complex healthcare pathways: Use the patient register data (inpatient/outpatient) together with the specialized registries to construct comprehensive healthcare trajectories
- Socioeconomic context: Incorporate the SCB data to understand social determinants of health
For most analyses, the recommended approach is to:
- Start with the SEM cohort data that defines your study population
- Link to appropriate healthcare utilization data (inpatient/outpatient/prescriptions)
- Incorporate socioeconomic (LISA) and geographic (DeSO) factors
- Add birth country information for migration and demographic context
- Include mortality outcomes when relevant
Each dataset is accompanied by detailed documentation explaining the variables, data quality issues, and appropriate analytical approaches.