Automatically determine primary key — determine

Uses confirm_distinct() iteratively to determine the primary keys of a data frame.

determine_distinct(df, ..., listviewer = TRUE)

Arguments

df: a data frame
...: columns or a tidyselect specification. Defaults to everything().
listviewer: logical. Defaults to TRUE, which displays the output using the listviewer package.

Value

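A named list with one element per level at which keys were found (e.g. LEVEL 1, LEVEL 3), holding the column combinations that uniquely identify the rows.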

Details

The goal of this function is to automatically determine which columns uniquely identify the rows of a data frame. The output is a printed description of the column combinations that form unique identifiers at each level. At level 1, the function tests whether individual columns are primary keys. At level 2, it tests all n choose 2 pairs of columns (n being the number of columns) to see whether they form primary keys, and so on. The final level tests all columns at once.
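The number of combinations grows quickly with the level. The base R snippet below just counts them with choose() and lists the level-2 pairs with combn(); it does not call determine_distinct() itself. Taking n = 6 (the column count of sample_data1 in the examples) is an illustrative choice. In practice fewer tests run, because columns that are unique on their own are dropped after level 1: in the example, once the three VAL columns are removed, only choose(3, 2) = 3 pairs and one triple remain.

```r
# Level k tests every combination of k columns, i.e. choose(n, k) of them.
# n = 6 is used purely as an example (six columns, as in sample_data1).
n <- 6
per_level <- choose(n, seq_len(n))
names(per_level) <- paste("LEVEL", seq_len(n))
per_level        # 6, 15, 20, 15, 6, 1
sum(per_level)   # 63, i.e. 2^n - 1 combinations in total

# The pairs tested at level 2 for three ID columns, one pair per matrix column
combn(c("ID_COL1", "ID_COL2", "ID_COL3"), 2)
```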

Columns that are completely unique on their own are recorded at level 1 and then dropped from the data frame, which simplifies the search for multi-column primary keys.
If the data set contains duplicated rows, they are removed before proceeding.
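The procedure described above can be sketched in base R. This is a hypothetical reimplementation for illustration only, not the package's code: it does not use confirm_distinct(), and the helper names is_unique_key() and find_keys_sketch() are made up. It assumes that every combination at every level is tested (no pruning of supersets of keys already found) and that only levels with at least one key are reported. The result shape mirrors the example output: LEVEL 1 as a list of column names, higher levels as comma-separated strings.

```r
# Hypothetical helper: TRUE when no two rows share the same values in `cols`
is_unique_key <- function(df, cols) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

# Hypothetical sketch of the level-by-level search (not the package's code)
find_keys_sketch <- function(df) {
  df <- unique(df)  # remove duplicated rows before anything else
  out <- list()

  # Level 1: single columns that identify the rows on their own
  is_key <- vapply(names(df), function(col) is_unique_key(df, col), logical(1))
  single <- names(df)[is_key]
  if (length(single) > 0) {
    out[["LEVEL 1"]] <- as.list(single)
    # drop them so they do not trivially appear in every larger combination
    df <- df[, setdiff(names(df), single), drop = FALSE]
  }

  # Levels 2..m: every combination of k of the m remaining columns
  m <- ncol(df)
  for (k in seq_len(m)[-1]) {
    combos <- combn(names(df), k, simplify = FALSE)
    keys <- Filter(function(cols) is_unique_key(df, cols), combos)
    if (length(keys) > 0) {
      out[[paste("LEVEL", k)]] <- vapply(keys, paste, character(1), collapse = ", ")
    }
  }
  out
}
```

Because duplicated rows are removed first, the combination of all remaining columns at the final level is unique unless the dropped level-1 columns were the only thing distinguishing some rows.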

Examples


sample_data1 %>%
  head()
#> # A tibble: 6 × 6
#>   ID_COL1 ID_COL2 ID_COL3     VAL1   VAL2   VAL3
#>     <dbl>   <dbl>   <dbl>    <dbl>  <dbl>  <dbl>
#> 1    2413    1034    1014 -0.0639  -1.16  -0.302
#> 2    2413    1034    1322  0.363    1.62   0.165
#> 3    2413    1034    2999 -0.00466  1.23   0.819
#> 4    2413    1034    3544  1.83    -2.58  -0.525
#> 5    2413    1034    9901  0.837   -0.442 -0.341
#> 6    2413    1122    1014 -0.894   -1.11   0.768


## On level 1, each column is tested as a unique identifier. The VAL columns contain no
## duplicates and hence qualify, even though they would not normally be considered IDs.
## On level 3, combinations of 3 columns are tested; the result shows that ID_COL1,
## ID_COL2 and ID_COL3 together form a unique key.
## Level 2 does not appear, implying that no combination of 2 ID_COLs forms a unique key.

sample_data1 %>%
  determine_distinct(listviewer = FALSE)
#> $`LEVEL 3`
#> [1] "ID_COL1, ID_COL2, ID_COL3"
#> 
#> $`LEVEL 1`
#> $`LEVEL 1`[[1]]
#> [1] "VAL1"
#> 
#> $`LEVEL 1`[[2]]
#> [1] "VAL2"
#> 
#> $`LEVEL 1`[[3]]
#> [1] "VAL3"
#> 
#>
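To see why the example yields LEVEL 1 and LEVEL 3 but no LEVEL 2, the same checks can be run by hand in base R. The data frame below is a hypothetical toy shaped like sample_data1 (the second ID_COL1 value, 2500, and the random VAL1 values are made up; this is not the real data set), and identifies_rows() is a hypothetical helper, not part of the package.

```r
# Hypothetical toy data shaped like sample_data1 (not the real data set):
# every ID_COL3 value appears under each ID_COL1/ID_COL2 pair (20 rows).
set.seed(42)
toy <- expand.grid(
  ID_COL3 = c(1014, 1322, 2999, 3544, 9901),
  ID_COL2 = c(1034, 1122),
  ID_COL1 = c(2413, 2500)
)[, c("ID_COL1", "ID_COL2", "ID_COL3")]
toy$VAL1 <- rnorm(nrow(toy))  # continuous random values, all distinct

# Hypothetical helper: TRUE when the given columns identify each row
identifies_rows <- function(df, cols) anyDuplicated(df[, cols, drop = FALSE]) == 0

identifies_rows(toy, "VAL1")                              # TRUE:  level-1 key
identifies_rows(toy, c("ID_COL1", "ID_COL2"))             # FALSE: pair repeats
identifies_rows(toy, c("ID_COL2", "ID_COL3"))             # FALSE: pair repeats
identifies_rows(toy, c("ID_COL1", "ID_COL2", "ID_COL3"))  # TRUE:  level-3 key
```

A continuous measurement column is unique almost by accident, which is why the VAL columns show up at level 1 even though they are not meaningful identifiers.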