Data imported from excel and csv in business situations can have messy characteristics and data formats. This package provides functions to tidy your data frame using the power of tidyselect.

create sample data

tibble::tibble(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", "    val ", "-")
) -> sample_table

sample_table
## # A tibble: 3 × 3
##   date     numeric_val char_val  
##   <chr>          <dbl> <chr>     
## 1 20190101           1 ""        
## 2 20190305          NA "    val "
## 3 20201012           5 "-"

set nas

Data occasionally has different ways to represent NA values. set_na checks as default c("-", "", " ", "null") but any values can be supplied to automatically be set to NA. This is helpful when you want to check the NA profile of a data frame using validata::diagnose

sample_table %>% 
  make_na()
## # A tibble: 3 × 3
##   date     numeric_val char_val  
##   <chr>          <dbl> <chr>     
## 1 20190101           1  NA       
## 2 20190305          NA "    val "
## 3 20201012           5  NA

remove whitespace

remove whitespace from the ends of character variables that may be otherwise undetectable by inspection.

sample_table %>% 
  remove_whitespace()
## # A tibble: 3 × 3
##   date     numeric_val char_val
##   <chr>          <dbl> <chr>   
## 1 20190101           1 ""      
## 2 20190305          NA "val"   
## 3 20201012           5 "-"

set dates

automatically convert character columns that should be dates.

sample_table %>% 
  set_date()
## # A tibble: 3 × 3
##   date       numeric_val char_val  
##   <date>           <dbl> <chr>     
## 1 2019-01-01           1 ""        
## 2 2019-03-05          NA "    val "
## 3 2020-10-12           5 "-"

relocate all

relocates an unorganized dataframe using heuristics such as putting character and date columns first, and organizing by alphabetical order.

sample_table %>% 
  relocate_all()
## # A tibble: 3 × 3
##   date     char_val   numeric_val
##   <chr>    <chr>            <dbl>
## 1 20190101 ""                   1
## 2 20190305 "    val "          NA
## 3 20201012 "-"                  5

clean frame

Wrapper function to apply all cleaning operations to a data frame using sensible defaults.

sample_table %>% 
  clean_frame()
## # A tibble: 3 × 3
##       DATE NUMERIC_VAL CHAR_VAL  
##      <int>       <int> <chr>     
## 1 20190101           1  NA       
## 2 20190305          NA "    val "
## 3 20201012           5  NA

fill nas

use tidyselect to fill NAs with a single value

sample_table %>% 
  fill_na()
## # A tibble: 3 × 3
##   date     numeric_val char_val  
##   <chr>          <dbl> <chr>     
## 1 20190101           1 ""        
## 2 20190305           0 "    val "
## 3 20201012           5 "-"