This dataset contains information about patients' and their parents' countries of birth, categorized into groups according to Statistics Sweden's (SCB) classification system.

Usage

scb_country_of_birth()

Format

A data frame with 316,980 observations and 11 variables:

patient_id

Integer. Patient identifier; serves as a foreign key for linking with other patient-level data. Original field name: Alias in the key file, mapped from lopnr

latest_personal_id

Logical. Flag indicating if this is the latest personal ID number for the individual (TRUE for 99.99% of records). Original field name: SenPNr

reused_personal_id

Logical. Flag indicating if this personal ID number has been reused (TRUE for only 0.1% of records). Original field name: AterPNr

coordination_number

Logical. Flag indicating if this is a coordination number rather than a standard personal ID number. All records are FALSE. Original field name: SamOrdnNr

incorrect_personal_id

Logical. Flag indicating if there are known errors with the ID. All records are FALSE. Original field name: FelPersonnr

birth_country_group_detailed

Character. Detailed country group classification of the patient's birth country. 11 unique values. Most common: "00" (75.7%). Original field name: FodGrEg4

mother_birth_country_group_detailed

Character. Detailed country group classification of the patient's mother's birth country. 11 unique values. Most common: "00" (62.0%), "11" (26.6%). Original field name: FodGrMor4

father_birth_country_group_detailed

Character. Detailed country group classification of the patient's father's birth country. 11 unique values. Most common: "00" (59.6%), "11" (29.8%). Original field name: FodGrFar4

birth_country_group

Character. Aggregated country group classification of the patient's birth country. 11 unique values. Most common: "00" (75.7%). Original field name: FodGrEg

mother_birth_country_group

Character. Aggregated country group classification of the patient's mother's birth country. 11 unique values. Most common: "00" (62.0%), "11" (26.6%). Original field name: FodGrMor

father_birth_country_group

Character. Aggregated country group classification of the patient's father's birth country. 11 unique values. Most common: "00" (59.6%), "11" (29.8%). Original field name: FodGrFar

Source

SCB (Statistics Sweden)

Details

This file contains the country-of-birth groups for patients in the SEM cohort and for their parents. The country groups are represented by two-digit codes stored as character strings, where:

  • "00" likely represents Sweden (75.7% of patients, 62.0% of mothers, 59.6% of fathers)

  • "11" is common for parents (26.6% of mothers, 29.8% of fathers) but not for patients, and might represent "unknown" or another specific category

  • Other codes ("01" to "09") likely represent different regions or country groups

The dataset includes both detailed classifications (suffix "4" in the original field names) and aggregated classifications (no suffix in the original field names). Determining the exact meaning of either classification requires SCB documentation.
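The code shares quoted above can be recomputed from any of the country group columns. The following is a minimal sketch on a toy data frame with invented values; `toy` and the helper `code_shares()` are hypothetical names used only for illustration and are not part of the package.

```r
# Toy stand-in for the data frame returned by scb_country_of_birth();
# the values are invented for illustration only.
toy <- data.frame(
  patient_id = 1:8,
  birth_country_group = c("00", "00", "00", "00", "00", "00", "03", "05"),
  mother_birth_country_group = c("00", "00", "00", "00", "11", "11", "03", "05"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of each code in a column, as percentages,
# sorted from most to least common. Missing values are counted if present.
code_shares <- function(x) {
  shares <- prop.table(table(x, useNA = "ifany")) * 100
  round(sort(shares, decreasing = TRUE), 1)
}

patient_shares <- code_shares(toy$birth_country_group)
mother_shares <- code_shares(toy$mother_birth_country_group)

print(patient_shares)
print(mother_shares)
```

Keeping the codes as character strings preserves the leading zero in values such as "00", which would be lost if the column were converted to integer.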

The patient identifiers in the SCB files (lopnr) are mapped to the standard patient_id used in the SEM cohort study using a key file (SCB_Ekelund_LEV_Nyckel.csv), ensuring consistent patient identification across all datasets.
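The identifier mapping described above could look roughly like the sketch below. It uses toy data frames with invented values rather than the real files, and it assumes the key file has one row per individual with columns named `lopnr` and `Alias`; the actual layout of SCB_Ekelund_LEV_Nyckel.csv may differ.

```r
# Toy SCB extract keyed by lopnr (values invented for illustration).
scb_raw <- data.frame(
  lopnr = c(101L, 102L, 103L),
  FodGrEg = c("00", "03", "00"),
  stringsAsFactors = FALSE
)

# Toy key file. The column names lopnr and Alias are assumptions based on
# the field descriptions in the Format section.
key <- data.frame(
  lopnr = c(101L, 102L, 103L),
  Alias = c(9001L, 9002L, 9003L)
)

# Guard against duplicate keys, which would silently multiply rows in the join.
stopifnot(anyDuplicated(key$lopnr) == 0)

# Left join so that SCB rows without a key entry are kept and show up as NA.
mapped <- merge(scb_raw, key, by = "lopnr", all.x = TRUE)

# Rename to the cohort-wide identifier and drop the SCB serial number.
names(mapped)[names(mapped) == "Alias"] <- "patient_id"
mapped <- mapped[, c("patient_id", "FodGrEg")]

print(mapped)
```

A left join is used here so that unmatched SCB rows remain visible as missing `patient_id` values instead of disappearing.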

Note

  • Without specific documentation from SCB, the exact meaning of the country group codes is uncertain, though patterns in the data suggest code "00" is likely Sweden

  • Almost all records (99.99%) have latest_personal_id = TRUE, indicating that nearly all IDs are current

  • A small percentage (0.1%) have reused_personal_id = TRUE, indicating potential ID reuse situations

  • coordination_number and incorrect_personal_id are FALSE for all records in the dataset but are retained for completeness

  • The difference between the detailed (*4) and aggregated classifications is not explicitly documented but may reflect different levels of geographic aggregation; in this dataset both report 11 unique values
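One way to examine how the detailed and aggregated classifications relate is to cross-tabulate them. The sketch below uses a toy data frame with invented values; on the real data, the same checks would show whether each detailed code maps to a single aggregated code or whether the two columns are simply identical.

```r
# Toy stand-in with invented values; in this example the two columns agree.
toy <- data.frame(
  birth_country_group_detailed = c("00", "00", "03", "05", "11"),
  birth_country_group = c("00", "00", "03", "05", "11"),
  stringsAsFactors = FALSE
)

# Cross-tabulate detailed codes (rows) against aggregated codes (columns).
xtab <- table(
  detailed = toy$birth_country_group_detailed,
  aggregated = toy$birth_country_group
)

# TRUE if every detailed code maps to exactly one aggregated code.
maps_cleanly <- all(rowSums(xtab > 0) == 1)

# TRUE if the two columns are identical row by row.
columns_identical <- identical(
  toy$birth_country_group_detailed,
  toy$birth_country_group
)

print(xtab)
```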