SCB Country of Birth Dataset
scb_country_of_birth.Rd
This dataset contains information about patients' and their parents' countries of birth, categorized into groups according to Statistics Sweden's (SCB) classification system.
Format
A data frame with 316,980 observations and 11 variables:
- patient_id
Integer. Patient identifier; serves as a foreign key for linking to other patient-level data. Original field name: Alias (in the key file), mapped from lopnr
- latest_personal_id
Logical. Flag indicating whether this is the latest personal ID number for the individual (TRUE for 99.99% of records). Original field name: SenPNr
- reused_personal_id
Logical. Flag indicating whether this personal ID number has been reused (TRUE for only 0.1% of records). Original field name: AterPNr
- coordination_number
Logical. Flag indicating whether this is a coordination number rather than a standard personal ID number. All records are FALSE. Original field name: SamOrdnNr
- incorrect_personal_id
Logical. Flag indicating whether the ID is known to be incorrect. All records are FALSE. Original field name: FelPersonnr
- birth_country_group_detailed
Character. Detailed country group classification of the patient's birth country. 11 unique values. Most common: "00" (75.7%). Original field name: FodGrEg4
- mother_birth_country_group_detailed
Character. Detailed country group classification of the patient's mother's birth country. 11 unique values. Most common: "00" (62.0%), "11" (26.6%). Original field name: FodGrMor4
- father_birth_country_group_detailed
Character. Detailed country group classification of the patient's father's birth country. 11 unique values. Most common: "00" (59.6%), "11" (29.8%). Original field name: FodGrFar4
- birth_country_group
Character. Aggregated country group classification of the patient's birth country. 11 unique values. Most common: "00" (75.7%). Original field name: FodGrEg
- mother_birth_country_group
Character. Aggregated country group classification of the patient's mother's birth country. 11 unique values. Most common: "00" (62.0%), "11" (26.6%). Original field name: FodGrMor
- father_birth_country_group
Character. Aggregated country group classification of the patient's father's birth country. 11 unique values. Most common: "00" (59.6%), "11" (29.8%). Original field name: FodGrFar
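The percentages in the variable descriptions are relative frequencies of each code. A minimal base-R sketch of how such a summary could be recomputed is below; `code_shares()` is a hypothetical helper, and the `toy` data frame holds made-up values, not records from this dataset.

```r
# Hypothetical helper: percentage share of each code in a character column,
# sorted from most to least common and rounded to one decimal.
code_shares <- function(x) {
  shares <- prop.table(table(x, useNA = "ifany")) * 100
  round(sort(shares, decreasing = TRUE), 1)
}

# Toy data frame with made-up values (NOT the real dataset).
toy <- data.frame(
  patient_id = 1:8,
  birth_country_group = c("00", "00", "00", "00", "00", "00", "11", "03"),
  stringsAsFactors = FALSE
)

code_shares(toy$birth_country_group)
```

`useNA = "ifany"` keeps missing codes visible in the summary instead of silently dropping them from the denominator.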
Details
This dataset contains the country-of-birth groups of patients in the SEM cohort and of their parents. The groups are represented by two-digit codes stored as character strings, where:
- "00" most likely represents Sweden (75.7% of patients, 62.0% of mothers, 59.6% of fathers)
- "11" is common among parents (26.6% of mothers, 29.8% of fathers) but not among patients; it might represent "unknown" or another specific category
- The remaining codes ("01" to "09") likely represent other regions or country groups
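Because only the interpretation of "00" is supported by the data, any labels attached to the codes should stay tentative. A sketch, assuming "00" means Sweden and leaving every other code explicitly marked as undocumented; `label_country_group()` is a hypothetical, illustrative helper.

```r
# Hypothetical helper: attach tentative labels to SCB country group codes.
# ASSUMPTION: "00" = Sweden is inferred from data patterns only; all other
# codes stay unlabelled until checked against SCB documentation.
label_country_group <- function(code) {
  ifelse(is.na(code), NA_character_,
         ifelse(code == "00", "Sweden (assumed)",
                paste0("group ", code, " (undocumented)")))
}

label_country_group(c("00", "11", "03", NA))
```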
The dataset includes both detailed classifications (suffix "4" in the original field names) and aggregated classifications (without suffix in the original field names). The exact meaning of these classification systems would require reference to SCB documentation.
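Without SCB documentation, one way to probe how the two classifications relate is to cross-tabulate them: if the aggregated variable is a coarsening of the detailed one, each detailed code should map to exactly one aggregated code. A sketch with a hypothetical helper `check_nesting()` and made-up toy values:

```r
# Hypothetical helper: cross-tabulate detailed (*4) against aggregated codes
# and report whether every detailed code maps to a single aggregated code.
check_nesting <- function(detailed, aggregated) {
  xt <- table(detailed = detailed, aggregated = aggregated)
  # Number of distinct aggregated codes observed for each detailed code
  n_targets <- rowSums(xt > 0)
  list(crosstab = xt, is_nested = all(n_targets == 1))
}

# Toy data frame with made-up values (NOT the real dataset).
toy <- data.frame(
  birth_country_group_detailed = c("00", "00", "03", "11"),
  birth_country_group          = c("00", "00", "03", "11"),
  stringsAsFactors = FALSE
)
res <- check_nesting(toy$birth_country_group_detailed, toy$birth_country_group)
res$is_nested
```

The crosstab itself also shows which detailed codes, if any, are pooled into the same aggregated code.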
The patient identifiers in the SCB files (lopnr) are mapped to the standard patient_id used in the SEM cohort study using a key file (SCB_Ekelund_LEV_Nyckel.csv), ensuring consistent patient identification across all datasets.
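The lopnr-to-patient_id mapping amounts to a left join on the key file. A base-R sketch, assuming the key file has columns named `lopnr` and `Alias` (as the patient_id description suggests) and has already been read into a data frame; `map_to_patient_id()` is hypothetical and the toy values are made up.

```r
# Hypothetical helper: replace SCB's lopnr with the cohort-wide patient_id.
# ASSUMPTION: the key file has columns `lopnr` and `Alias`; check the actual
# header of SCB_Ekelund_LEV_Nyckel.csv before relying on this.
map_to_patient_id <- function(scb, key) {
  key <- key[, c("lopnr", "Alias")]
  names(key)[names(key) == "Alias"] <- "patient_id"
  out <- merge(scb, key, by = "lopnr", all.x = TRUE, sort = FALSE)
  if (anyNA(out$patient_id)) {
    warning("some lopnr values have no match in the key file")
  }
  out[, setdiff(names(out), "lopnr"), drop = FALSE]
}

# In practice: key <- read.csv(...)  # delimiter and column names are assumptions
key <- data.frame(lopnr = c(101L, 102L), Alias = c(1L, 2L))
scb <- data.frame(lopnr = c(102L, 101L), FodGrEg4 = c("00", "11"),
                  stringsAsFactors = FALSE)
map_to_patient_id(scb, key)
```

Using `all.x = TRUE` keeps SCB rows without a key match (with `patient_id` set to NA) so that they trigger the warning rather than disappearing unnoticed.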
Note
- Without documentation from SCB, the exact meaning of the country group codes is uncertain, although patterns in the data suggest that code "00" represents Sweden.
- Almost all records (99.99%) have latest_personal_id = TRUE, indicating that nearly all IDs are current.
- A small share of records (0.1%) have reused_personal_id = TRUE, indicating that the personal ID number may have been reused.
- coordination_number and incorrect_personal_id are FALSE for all records but are retained for completeness.
- The difference between the detailed (*4) and aggregated classifications is not explicitly documented; it may reflect different levels of geographic aggregation.
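If an analysis should exclude records with potentially problematic identifiers, the four ID flags can be combined into a single filter. The sketch below (hypothetical helper `filter_clean_ids()`, made-up toy rows) treats a missing flag as a reason to drop the row. Whether to filter at all is an analysis decision that this documentation does not prescribe.

```r
# Hypothetical helper: keep rows whose ID flags look unproblematic.
# Rows with any NA flag are dropped, because `keep %in% TRUE` is FALSE for NA.
filter_clean_ids <- function(d) {
  keep <- d$latest_personal_id &
    !d$reused_personal_id &
    !d$coordination_number &
    !d$incorrect_personal_id
  d[keep %in% TRUE, , drop = FALSE]
}

# Toy data frame with made-up values (NOT the real dataset).
toy <- data.frame(
  patient_id            = 1:3,
  latest_personal_id    = c(TRUE, TRUE, FALSE),
  reused_personal_id    = c(FALSE, TRUE, FALSE),
  coordination_number   = FALSE,
  incorrect_personal_id = FALSE
)
filter_clean_ids(toy)
```

Because coordination_number and incorrect_personal_id are FALSE for every record, only the first two conditions can remove rows from this dataset.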