Package 'dbGaPCheckup' reference manual

Title:	dbGaP Checkup
Description:	Contains functions that check for formatting of the Subject Phenotype data set and data dictionary as specified by the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP) <https://www.ncbi.nlm.nih.gov/gap/docs/submissionguide/>.
Authors:	Lacey W. Heinsberg [aut, cre], Daniel E. Weeks [aut], University of Pittsburgh [cph]
Maintainer:	Lacey W. Heinsberg <[email protected]>
License:	GPL-2
Version:	1.1.0
Built:	2025-02-20 05:16:46 UTC
Source:	https://github.com/lwheinsberg/dbgapcheckup

Add Missing Fields

Description

This function adds additional fields required by this package including variable type (TYPE), minimum value (MIN), and maximum value (MAX).

Usage

add_missing_fields(DD.dict, DS.data)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.

Details

Even though MIN, MAX, and TYPE are not required by dbGaP, our package uses these fields in a series of other check and awareness functions (e.g., render_report, values_check). MIN and MAX are added as empty columns because dbGaP instructions state that MIN and MAX should be the "logical" minimum and maximum for the data, not necessarily the observed values, which are study- and variable-specific. TYPE is inferred from the data set and the data dictionary VALUES columns. Note, however, that if the VALUES columns are not set up correctly, this function cannot properly infer TYPE from the data set and data dictionary.
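The TYPE inference described above can be pictured with a small base-R sketch. `guess_type()` is a hypothetical helper written for illustration only; it is not the package's implementation, and the real rules (for example, how missing value codes listed in VALUES are combined with integer data) may differ.

```r
# Hypothetical helper, not part of dbGaPCheckup: a simplified guess at a
# dbGaP-style TYPE for one data column. A column is treated as an encoded
# value whenever its dictionary row supplies any non-blank VALUES entry.
guess_type <- function(x, values = character(0)) {
  values <- values[!is.na(values) & nzchar(values)]
  if (length(values) > 0) {
    return("encoded value")
  }
  x <- x[!is.na(x)]
  num <- suppressWarnings(as.numeric(as.character(x)))
  if (length(x) == 0 || anyNA(num)) {
    return("string")  # empty or non-numeric content
  }
  if (all(num == round(num))) "integer" else "decimal"
}

guess_type(c(18, 42, 65))                          # "integer"
guess_type(c(21.4, 30.9))                          # "decimal"
guess_type(c("no", "yes"))                         # "string"
guess_type(c(0, 1), values = c("0=no", "1=yes"))   # "encoded value"
```

Because the sketch keys on VALUES, a mis-specified VALUES column sends it to the wrong TYPE, which is the same caveat stated for add_missing_fields.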

Value

A data frame containing the updated data dictionary with missing fields added in, or NULL if any required pre-checks fail.

Examples

# Example
data(ExampleD)
DD.dict.updated <- add_missing_fields(DD.dict.D, DS.data.D)

Check Report

Description

This function generates a user-readable report of the checks run by the complete_check function.

Usage

check_report(DD.dict, DS.data, non.NA.missing.codes = NA, compact = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).
`compact`	When TRUE, the function prints a compact report, listing information from only the non-passed checks.
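The effect of the compact option can be pictured with a base-R sketch. The results table below is made up for illustration and carries only two of the five columns listed under Value; `compact_view()` is a hypothetical helper, not the package's own code.

```r
# Hypothetical results table with a subset of the columns listed under Value.
results <- data.frame(
  Name   = c("field_check", "dimension_check", "name_check"),
  Status = c("Passed", "Failed", "Passed"),
  stringsAsFactors = FALSE
)

# Keep only the rows for checks that did not pass.
compact_view <- function(res) res[res$Status != "Passed", , drop = FALSE]

compact_view(results)
```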

Value

Tibble, returned invisibly, containing the following information for each check: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (More detailed information about the potential errors identified).

Examples

# Example 1: Check incorrectly passes on the first attempt
data(ExampleB)
report <- check_report(DD.dict.B, DS.data.B)
# Addition of missing value codes calls attention to error
# at missing_value_check
report <- check_report(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-4444, -9999))

# Example 2: Several checks fail or are not attempted
data(ExampleC)
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes=c(-4444, -9999))
# Note you can also run report using compact=FALSE
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes=c(-4444, -9999), compact = FALSE)

Complete Check

Description

This function runs a full workflow check including field_check, pkg_field_check, dimension_check, name_check, id_check, row_check, NA_check, type_check, values_check, integer_check, decimal_check, misc_format_check, description_check, minmax_check, and missing_value_check.
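The general pattern of running checks in sequence and collecting one row per check can be sketched in base R. The two checks here are simplified, hypothetical stand-ins (not the package's field_check or dimension_check), and the sketch omits the Warning status, the Message and Information columns, and any handling of failed pre-checks that complete_check may perform.

```r
# Hypothetical two-step workflow; the checks are simplified stand-ins,
# not the package's field_check() or dimension_check().
checks <- list(
  field_check = function(dd, ds) {
    all(c("VARNAME", "VARDESC", "UNITS", "VALUES") %in% names(dd))
  },
  dimension_check = function(dd, ds) nrow(dd) == ncol(ds)
)

# Run every check in order and bind one row per check.
run_checks <- function(dd, ds) {
  rows <- lapply(names(checks), function(nm) {
    ok <- isTRUE(checks[[nm]](dd, ds))
    data.frame(Time = Sys.time(), Name = nm,
               Status = if (ok) "Passed" else "Failed",
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy inputs: a two-variable dictionary and a matching data set.
dd <- data.frame(VARNAME = c("SUBJECT_ID", "AGE"),
                 VARDESC = c("Subject ID", "Age at visit"),
                 UNITS = c(NA, "years"), VALUES = NA,
                 stringsAsFactors = FALSE)
ds <- data.frame(SUBJECT_ID = 1:3, AGE = c(34, 51, 47))
run_checks(dd, ds)
```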

Usage

complete_check(
  DD_dict,
  DS_data,
  non.NA.missing.codes = NA,
  reorder.dict = FALSE,
  name.correct = FALSE
)

Arguments

`DD_dict`	Data dictionary.
`DS_data`	Data set.
`non.NA.missing.codes`	A user-defined vector of encoded, numerical (i.e., non-NA) missing value codes (e.g., -9999).
`reorder.dict`	When TRUE, and only if the variable names in the data set and data dictionary match perfectly but are in a different order, the function reorders the rows of the dictionary to match the columns of the data set. Use with caution: we recommend first running the function with the default (FALSE) to understand potential errors.
`name.correct`	When TRUE, if name mismatches are identified, the function renames the variables in the data set to match the data dictionary. Use with caution: we recommend first running the function with the default (FALSE) to distinguish order/dimension mismatches from name mismatches.

Value

Tibble containing the following information for each check: (1) Time (time stamp); (2) Name (name of the function); (3) Status (Passed/Failed/Warning); (4) Message (A copy of the message the function printed out); (5) Information (More detailed information about the potential errors identified).

Examples

# Example 1
# Note: in this example, the missing value codes are not defined,
# so the last check ('missing_value_check') does not know to
# check for encoded values
data(ExampleB)
complete_check(DD.dict.B, DS.data.B)
# Rerun check after defining missing value codes
complete_check(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-9999, -4444))

# Example 2
data(ExampleA)
complete_check(DD.dict.A, DS.data.A, non.NA.missing.codes=c(-9999, -4444))

# Example 3
data(ExampleD)
results <- complete_check(DD.dict.D, DS.data.D, non.NA.missing.codes=c(-9999, -4444))  
# View output in greater detail
results$Message[2] # Recommend using add_missing_fields
results$Information$pkg_field_check.Info # We see that MIN, MAX, and TYPE are all missing
# Use the add_missing_fields function to add the missing fields
DD.dict.updated <- add_missing_fields(DD.dict.D, DS.data.D)
# Be sure to pass in the updated dictionary (DD.dict.updated)
complete_check(DD.dict.updated, DS.data.D)

Create Awareness Report

Description

This function generates an awareness report in HTML format, and optionally opens it in the web browser.

Usage

create_awareness_report(
  DD.dict,
  DS.data,
  non.NA.missing.codes = NA,
  threshold = 95,
  output.path = tempdir(),
  open.html = TRUE,
  fn.stem = "AwarenessReport"
)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).
`threshold`	Threshold for missingness of concern (as a percent).
`output.path`	Path to the folder in which to create the HTML report document.
`open.html`	If TRUE, open the HTML report document in the web browser.
`fn.stem`	File name stem.
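The threshold and missing value code arguments can be illustrated with a base-R sketch that counts both NA and user-supplied codes as missing and flags variables at or above a percent threshold. `pct_missing()` is hypothetical, and using ">=" at the threshold is an assumption; the manual does not say how create_awareness_report compares against it.

```r
# Hypothetical helper: percent missing per variable, counting NA and any
# numeric missing value codes (e.g., -9999) as missing.
pct_missing <- function(ds, non.NA.missing.codes = NA) {
  codes <- non.NA.missing.codes[!is.na(non.NA.missing.codes)]
  sapply(ds, function(x) {
    miss <- is.na(x) | x %in% codes
    100 * mean(miss)
  })
}

ds <- data.frame(SUBJECT_ID = 1:4,
                 BMI = c(22.1, -9999, NA, -9999),
                 SEX = c(1, 2, 2, 1))
pm <- pct_missing(ds, non.NA.missing.codes = c(-9999, -4444))
pm                     # SUBJECT_ID 0, BMI 75, SEX 0
names(pm)[pm >= 70]    # "BMI" (70 is a demo value; the default threshold is 95)
```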

Value

Full path to the HTML report document.

Examples


data(ExampleB)
create_awareness_report(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-9999),
   output.path= tempdir(), open.html = FALSE)

Create Report

Description

This function calls eval_function to generate a textual and graphical report of the selected variables in HTML format, and optionally opens it in the web browser.

Usage

create_report(
  DD.dict,
  DS.data,
  sex.split = FALSE,
  sex.name = NULL,
  start = 1,
  end = 1,
  non.NA.missing.codes = NA,
  output.path = tempdir(),
  open.html = TRUE,
  fn.stem = "Report"
)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`sex.split`	When TRUE, split reports by the field named as defined by the sex.name variable.
`sex.name`	Character string specifying the name of the sex field.
`start`	Starting index of the first selected trait.
`end`	Ending index of the last selected trait.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).
`output.path`	Path to the folder in which to create the HTML report document.
`open.html`	If TRUE, open the HTML report document in the web browser.
`fn.stem`	File name stem.

Value

Full path to the HTML report document.

Examples


data(ExampleB)
create_report(DD.dict.B, DS.data.B, sex.split=TRUE, sex.name= "SEX",
   start = 3, end = 7, non.NA.missing.codes=c(-9999,-4444),
   output.path= tempdir(), open.html = FALSE)

Data Utility Function

Description

This function calls eval_function to generate a textual and graphical report of the selected variables.

Usage

dat_function(
  DS.dataset,
  DD.dictionary,
  sex.split = FALSE,
  sex.name = NULL,
  DS.dataset.na
)

Arguments

`DS.dataset`	Data set.
`DD.dictionary`	Data dictionary.
`sex.split`	When TRUE, split reports by the field named by the sex.name string.
`sex.name`	Character string giving the name of the sex field.
`DS.dataset.na`	Data set with missing values set to NA.

Value

Invisible NULL, called for its side effects.

Data Selected Utility Function

Description

This function calls eval_function to generate a textual and graphical report of the selected variables.

Usage

dat_function_selected(
  dataset,
  dictionary,
  sex.split = FALSE,
  sex.name = NULL,
  start = 1,
  end = 1,
  dataset.na,
  h.level = 2
)

Arguments

`dataset`	Data set.
`dictionary`	Data dictionary.
`sex.split`	When TRUE, split reports by the field named 'Sex'.
`sex.name`	Character string giving the name of the sex field.
`start`	Starting index of the first selected trait.
`end`	Ending index of the last selected trait.
`dataset.na`	Data set with missing values set to NA.
`h.level`	Header level for pandoc function.

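Decimal Check

Description

This function checks for variables that are listed as TYPE decimal in the data dictionary but do not appear to be decimals in the data set.

Usage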

decimal_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as a list of variables that may be incorrectly labeled as TYPE decimal.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Names of variables that are listed as TYPE decimal, but do not appear to be decimals).
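The rule in the Information column (listed as TYPE decimal, but values do not look like decimals) can be sketched in base R. `flag_non_decimals()` is a hypothetical helper, not the package's code. A truly continuous variable whose recorded values happen to be whole numbers would also be flagged, which is why such variables "may be" mislabeled rather than certainly wrong.

```r
# Hypothetical helper: names of variables listed as TYPE "decimal" whose
# non-missing values are all whole numbers.
flag_non_decimals <- function(dd, ds) {
  dec_vars <- dd$VARNAME[!is.na(dd$TYPE) & dd$TYPE == "decimal"]
  dec_vars <- intersect(dec_vars, names(ds))
  Filter(function(v) {
    x <- suppressWarnings(as.numeric(ds[[v]]))
    x <- x[!is.na(x)]
    length(x) > 0 && all(x == round(x))
  }, dec_vars)
}

dd <- data.frame(VARNAME = c("SUBJECT_ID", "WEIGHT", "AGE"),
                 TYPE = c("integer", "decimal", "decimal"),
                 stringsAsFactors = FALSE)
ds <- data.frame(SUBJECT_ID = 1:3, WEIGHT = c(70.5, 82.3, 64), AGE = c(34, 51, 47))
flag_non_decimals(dd, ds)   # "AGE"
```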

Examples

# Example 1: Fail check
data(ExampleF)
decimal_check(DD.dict.F, DS.data.F)
print(decimal_check(DD.dict.F, DS.data.F, verbose=FALSE))

# Example 2: Required pre-check fails
data(ExampleE)
decimal_check(DD.dict.E, DS.data.E)
print(decimal_check(DD.dict.E, DS.data.E, verbose=FALSE))

# Example 3: Pass check
data(ExampleA)
decimal_check(DD.dict.A, DS.data.A)
print(decimal_check(DD.dict.A, DS.data.A, verbose=FALSE))

Description Check

Description

This function checks that there is a unique description for every variable in the data dictionary (VARDESC column).
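The two failure modes (a missing VARDESC and a VARDESC shared by more than one variable) can be sketched in base R. `desc_problems()` is a hypothetical helper for illustration; whether the package trims whitespace or compares case-insensitively is not stated, and this sketch only trims.

```r
# Hypothetical helper: variables whose VARDESC is missing/blank, and
# variables whose VARDESC is shared with another variable.
desc_problems <- function(dd) {
  desc <- trimws(as.character(dd$VARDESC))
  missing <- is.na(desc) | desc == ""
  dup <- !missing & (duplicated(desc) | duplicated(desc, fromLast = TRUE))
  list(missing = dd$VARNAME[missing], duplicated = dd$VARNAME[dup])
}

dd <- data.frame(VARNAME = c("SUBJECT_ID", "SBP", "DBP", "AGE"),
                 VARDESC = c("Subject ID", "Blood pressure", "Blood pressure", NA),
                 stringsAsFactors = FALSE)
desc_problems(dd)   # missing: "AGE"; duplicated: "SBP", "DBP"
```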

Usage

description_check(DD.dict, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`verbose`	When TRUE, the function prints the Message out, as well as a list of the variables that are missing a `VARDESC` or have a duplicated `VARDESC`.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Names of the variables with missing or duplicated descriptions).

Examples

# Example 1: Fail check 
data(ExampleG)
description_check(DD.dict.G)
print(description_check(DD.dict.G, verbose=FALSE))

# Example 2: Pass check
data(ExampleA)
description_check(DD.dict.A)
print(description_check(DD.dict.A, verbose=FALSE))

Data Dictionary Search

Description

This awareness function searches the data dictionary for a specific term. It is intended as an investigative aid to supplement the other checks in this package.

Usage

dictionary_search(
  DD.dict,
  search.term = c("blood pressure"),
  search.column = c("VARDESC")
)

Arguments

`DD.dict`	Data dictionary.
`search.term`	Search term.
`search.column`	Column of the data dictionary to search.

Value

Tibble containing the dictionary rows in which the search term was detected in the specified column, or an error message if the search column could not be found.
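A column-restricted search of this kind can be sketched in base R. `search_dict()` is hypothetical: it signals an R error when the column is absent (the package may instead print a message), it matches case-insensitively, and it treats the search term as a regular expression; all three are choices made for this sketch.

```r
# Hypothetical helper: rows of the dictionary whose search.column
# contains search.term; stops if the column does not exist.
search_dict <- function(dd, search.term, search.column = "VARDESC") {
  if (!search.column %in% names(dd)) {
    stop("Column '", search.column, "' not found in the data dictionary.")
  }
  hits <- grepl(search.term, dd[[search.column]], ignore.case = TRUE)
  dd[hits, , drop = FALSE]
}

dd <- data.frame(VARNAME = c("TRICEPS", "SBP"),
                 VARDESC = c("Triceps skinfold thickness", "Systolic blood pressure"),
                 stringsAsFactors = FALSE)
search_dict(dd, "skinfold")
```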

Examples

# Successful search
data(ExampleB)
dictionary_search(DD.dict.B, search.term=c("skinfold"), search.column=c("VARDESC"))
# Attempted search in wrong column
dictionary_search(DD.dict.B, search.term=c("skinfold"), search.column=c("VARIABLE_DESCRIPTION"))

Dimension Check

Description

This function checks that the number of variables matches between the data set and the data dictionary.
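The comparison behind a dimension check can be sketched in base R: each column of the data set should correspond to one row of the dictionary, and setdiff() lists names found on only one side. `compare_dims()` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper: compare data set columns with dictionary rows.
compare_dims <- function(dd, ds) {
  list(
    n_data       = ncol(ds),
    n_dict       = nrow(dd),
    match        = ncol(ds) == nrow(dd),
    only_in_data = setdiff(names(ds), dd$VARNAME),
    only_in_dict = setdiff(dd$VARNAME, names(ds))
  )
}

dd <- data.frame(VARNAME = c("SUBJECT_ID", "AGE", "SEX"), stringsAsFactors = FALSE)
ds <- data.frame(SUBJECT_ID = 1:2, AGE = c(40, 52))
compare_dims(dd, ds)   # 2 vs 3 variables; "SEX" appears only in the dictionary
```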

Usage

dimension_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as the number of variables in the data set and data dictionary.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (number of variables in the data and dictionary and names of mismatched variables if applicable).

Examples

# Example 1: Fail check
data(ExampleG)
dimension_check(DD.dict.G, DS.data.G)
print(dimension_check(DD.dict=DD.dict.G, DS.data=DS.data.G,verbose=FALSE))

# Example 2: Pass check
data(ExampleA)
dimension_check(DD.dict.A, DS.data.A)
print(dimension_check(DD.dict.A, DS.data.A,verbose=FALSE))

DS.data.A

Description

Data set embedded in ExampleA.

Usage

data(ExampleA)

DS.data.B

Description

Data set embedded in ExampleB.

Usage

data(ExampleB)

DS.data.C

Description

Data set embedded in ExampleC.

Usage

data(ExampleC)

DS.data.D

Description

Data set embedded in ExampleD.

Usage

data(ExampleD)

DS.data.E

Description

Data set embedded in ExampleE.

Usage

data(ExampleE)

DS.data.F

Description

Data set embedded in ExampleF.

Usage

data(ExampleF)

DS.data.G

Description

Data set embedded in ExampleG.

Usage

data(ExampleG)

DS.data.H

Description

Data set embedded in ExampleH.

Usage

data(ExampleH)

DS.data.I

Description

Data set embedded in ExampleI.

Usage

data(ExampleI)

DS.data.J

Description

Data set embedded in ExampleJ.

Usage

data(ExampleJ)

DS.data.K

Description

Data set embedded in ExampleK.

Usage

data(ExampleK)

DS.data.L

Description

Data set embedded in ExampleL.

Usage

data(ExampleL)

DS.data.M

Description

Data set embedded in ExampleM.

Usage

data(ExampleM)

DS.data.N

Description

Data set embedded in ExampleN.

Usage

data(ExampleN)

DS.data.O

Description

Data set embedded in ExampleO.

Usage

data(ExampleO)

DS.data.P

Description

Data set embedded in ExampleP.

Usage

data(ExampleP)

DS.data.Q

Description

Data set embedded in ExampleQ.

Usage

data(ExampleQ)

DS.data.R

Description

Data set embedded in ExampleR.

Usage

data(ExampleR)

DS.data.S

Description

Data set embedded in ExampleS.

Usage

data(ExampleS)

Duplicate Values Function

Description

This function checks for duplicate VALUES column names in the data dictionary.

Usage

dup_values(DD.dict)

Arguments

`DD.dict`	Data dictionary.

Value

Logical, TRUE if only one VALUES column is detected.
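The rule stated under Value can be expressed in one line of base R. `single_values_column()` is a hypothetical helper. As the ExampleR source shows, additional encoded values in a dbGaP dictionary spill into unnamed columns after VALUES, so those columns should not count as extra VALUES columns.

```r
# Hypothetical helper: TRUE only when exactly one column is named VALUES.
single_values_column <- function(dd) sum(names(dd) == "VALUES") == 1

dd_ok    <- setNames(data.frame("AGE", "0=no"), c("VARNAME", "VALUES"))
dd_dup   <- setNames(data.frame("AGE", "0=no", "1=yes"), c("VARNAME", "VALUES", "VALUES"))
dd_extra <- setNames(data.frame("AGE", "0=no", "1=yes"), c("VARNAME", "VALUES", ""))

single_values_column(dd_ok)      # TRUE
single_values_column(dd_dup)     # FALSE
single_values_column(dd_extra)   # TRUE (unnamed continuation column)
```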

Evaluation Utility Function

Description

This function generates a textual and graphical report of the selected variables.

Usage

eval_function(
  dataset,
  dictionary,
  sex.split = FALSE,
  sex.name = NULL,
  dataset.na,
  h.level = 2
)

Arguments

`dataset`	Data set.
`dictionary`	Data dictionary.
`sex.split`	When TRUE, split reports by the field named 'Sex'.
`sex.name`	Character string giving the name of the sex field.
`dataset.na`	Data set with missing values set to NA.
`h.level`	Header level for pandoc function.

Value

Invisible NULL, called for its side effects.

ExampleA

Description

Example data set and data dictionary with no errors.

Usage

data(ExampleA)

Format

R data file that contains two objects:

DD.dict.A: Data dictionary
DS.data.A: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example1.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.A <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.A <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.A, DS.data.A, file = "ExampleA.rda")

ExampleB

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleB)

Format

R data file that contains two objects:

DD.dict.B: Data dictionary
DS.data.B: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example1b.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.B <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example1b.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.B <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.B, DS.data.B, file = "ExampleB.rda")

ExampleC

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleC)

Format

R data file that contains two objects:

DD.dict.C: Data dictionary
DS.data.C: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2d.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.C <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example1b.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.C <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.C, DS.data.C, file = "ExampleC.rda")

ExampleD

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleD)

Format

R data file that contains two objects:

DD.dict.D: Data dictionary
DS.data.D: Data set

Source

path <- system.file("extdata", "3b_SSM_DD_Example2f.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.D <- readxl::read_xlsx(path)
DS.path <- system.file("extdata", "DS_Example.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.D <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.D, DS.data.D, file = "ExampleD.rda")

ExampleE

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleE)

Format

R data file that contains two objects:

DD.dict.E: Data dictionary
DS.data.E: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2b.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.E <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example2.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.E <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.E, DS.data.E, file = "ExampleE.rda")

ExampleF

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleF)

Format

R data file that contains two objects:

DD.dict.F: Data dictionary
DS.data.F: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example4.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.F <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example3d.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.F <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.F, DS.data.F, file = "ExampleF.rda")

ExampleG

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleG)

Format

R data file that contains two objects:

DD.dict.G: Data dictionary
DS.data.G: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.G <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.G <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.G, DS.data.G, file = "ExampleG.rda")

ExampleH

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleH)

Format

R data file that contains two objects:

DD.dict.H: Data dictionary
DS.data.H: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example1.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.H <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example3c.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.H <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.H, DS.data.H, file = "ExampleH.rda")

ExampleI

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleI)

Format

R data file that contains two objects:

DD.dict.I: Data dictionary
DS.data.I: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2c.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.I <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example2c.txt",package = "dbGaPCheckup", mustWork=TRUE)
DS.data.I <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.I, DS.data.I, file = "ExampleI.rda")

ExampleJ

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleJ)

Format

R data file that contains two objects:

DD.dict.J: Data dictionary
DS.data.J: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2d.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.J <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example2.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.J <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.J, DS.data.J, file = "ExampleJ.rda")

ExampleK

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleK)

Format

R data file that contains two objects:

DD.dict.K: Data dictionary
DS.data.K: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2d.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.K <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example2b.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.K <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.K, DS.data.K, file = "ExampleK.rda")

ExampleL

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleL)

Format

R data file that contains two objects:

DD.dict.L: Data dictionary
DS.data.L: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2b.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.L <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example2c.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.L <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.L, DS.data.L, file = "ExampleL.rda")

ExampleM

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleM)

Format

R data file that contains two objects:

DD.dict.M: Data dictionary
DS.data.M: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2b.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.M <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.M <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.M, DS.data.M, file = "ExampleM.rda")

ExampleN

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleN)

Format

R data file that contains two objects:

DD.dict.N: Data dictionary
DS.data.N: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example2e.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.N <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.N <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.N, DS.data.N, file = "ExampleN.rda")

ExampleO

Description

Example data set with intentional errors.

Usage

data(ExampleO)

Format

R data file that contains a single object:

DS.data.O: Data set

Source

DS.path <- system.file("extdata", "DS_Example3.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.O <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DS.data.O, file = "ExampleO.rda")

ExampleP

Description

Example data set with intentional errors.

Usage

data(ExampleP)

Format

R data file that contains a single object:

DS.data.P: Data set

Source

DS.path <- system.file("extdata", "DS_Example3b.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.P <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DS.data.P, file = "ExampleP.rda")

ExampleQ

Description

Example data set and data dictionary with no errors.

Usage

data(ExampleQ)

Format

R data file that contains two objects:

DD.dict.Q: Data dictionary
DS.data.Q: Data set

Source

DD.path <- system.file("extdata", "3b_SSM_DD_Example5.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.Q <- readxl::read_xlsx(DD.path)
DS.path <- system.file("extdata", "DS_Example5.txt", package = "dbGaPCheckup", mustWork=TRUE)
DS.data.Q <- read.table(DS.path, header=TRUE, sep="\t", quote="", as.is = TRUE)
save(DD.dict.Q, DS.data.Q, file = "ExampleQ.rda")

ExampleR

Description

Example data set and data dictionary with no errors.

Usage

data(ExampleR)

Format

R data file that contains two objects:

DD.dict.R: Data dictionary
DS.data.R: Data set

Source

library(tidyverse)
data(ExampleA)
DD.dict.R <- DD.dict.A
DS.data.R <- DS.data.A
# Change SUBJECT_ID to a string
DS.data.R$SUBJECT_ID <- paste0("A",DS.data.R$SUBJECT_ID)
DD.dict.R$TYPE[DD.dict.R$VARNAME=="SUBJECT_ID"] <- "string"
# Change HX_DEPRESSION to a string
DS.data.R <- DS.data.R %>% mutate(HX_DEPRESSION = recode(HX_DEPRESSION, '0' = 'no','1'='yes','-9999' = '-9999'))
DD.dict.R$TYPE[DD.dict.R$VARNAME=="HX_DEPRESSION"] <- "string"
DD.dict.R$VALUES[DD.dict.R$VARNAME=="HX_DEPRESSION"] <- "-9999=missing value"
# Set the extra VALUES column names to blank
DD.dict.R$`...18`[DD.dict.R$VARNAME=="HX_DEPRESSION"] <- NA
DD.dict.R$`...19`[DD.dict.R$VARNAME=="HX_DEPRESSION"] <- NA
nval <- which(names(DD.dict.R) == "VALUES")
names(DD.dict.R)[(nval + 1):ncol(DD.dict.R)] <- ""
save(DD.dict.R, DS.data.R, file="ExampleR.rda")

ExampleS

Description

Example data set and data dictionary with intentional errors.

Usage

data(ExampleS)

Format

R data file that contains two objects:

DD.dict.S: Data dictionary
DS.data.S: Data set

Source

DS.path <- system.file("extdata", "DS_Example6.txt", package = "dbGaPCheckup", mustWork=TRUE)  
DS.data.S <- read.table(DS.path, header=TRUE, sep="\t", quote="")
DD.path <- system.file("extdata", "DD_Example5b.xlsx", package = "dbGaPCheckup", mustWork=TRUE)
DD.dict.S1 <- readxl::read_xlsx(DD.path)
DD.dict.S <- reorder_dictionary(DD.dict.S1, DS.data.S)
save(DD.dict.S, DS.data.S, file = "ExampleS.rda")

Field Check

Description

This function checks for dbGaP required fields variable name (VARNAME), variable description (VARDESC), units (UNITS), and variable value and meaning (VALUES).
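The named TRUE/FALSE vector described under Value can be sketched in base R. `field_presence()` is a hypothetical helper that only tests whether each required field appears among the dictionary's column names; it is not the package's code.

```r
# Hypothetical helper: named TRUE/FALSE vector recording whether each
# dbGaP-required field is present in the dictionary's column names.
field_presence <- function(dd) {
  required <- c("VARNAME", "VARDESC", "UNITS", "VALUES")
  setNames(required %in% names(dd), required)
}

dd <- data.frame(VARNAME = "AGE", VARDESC = "Age at visit", VALUES = NA)
field_presence(dd)   # UNITS is FALSE, the other three are TRUE
```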

Usage

field_check(DD.dict, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`verbose`	When TRUE, the function prints the Message out, as well as a list of the fields not found in the data dictionary.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Named vector of TRUE/FALSE values alerting the user whether checks passed (TRUE) or failed (FALSE) for VARNAME, VARDESC, UNITS, and VALUES).

Examples

data(ExampleA)
field_check(DD.dict.A)
print(field_check(DD.dict.A, verbose=FALSE))

ID Check

Description

This function checks that the first column of the data set is the primary ID for each participant labeled as SUBJECT_ID, that values contain no illegal characters or padded zeros, and that each participant has an ID.

Usage

id_check(DS.data, verbose = TRUE)

Arguments

`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as more detailed diagnostic information.

Details

Subject IDs should be integer or string values. Integer IDs should not be zero-padded, and IDs should not contain spaces. Specifically, only the following characters can be included in an ID: English letters, Arabic numerals, period (.), hyphen (-), underscore (_), at symbol (@), and pound sign (#). All IDs must be filled in (i.e., no missing IDs are allowed).
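
The rules above can be sketched in base R as follows. This is an illustration only: check_subject_ids() is a hypothetical helper, not part of dbGaPCheckup, and id_check() may apply the rules differently (for example, in how it detects zero padding).

```r
# Hypothetical helper illustrating the SUBJECT_ID rules; not the package's id_check().
check_subject_ids <- function(ids) {
  ids <- as.character(ids)
  # Missing: NA or blank IDs
  missing <- is.na(ids) | trimws(ids) == ""
  # Illegal: anything other than English letters, digits, and . - _ @ #
  # (so spaces are illegal as well)
  illegal <- !missing & !grepl("^[A-Za-z0-9._@#-]+$", ids)
  # Zero padding: an all-digit ID with a leading zero, e.g. "007"
  padded <- !missing & grepl("^0[0-9]+$", ids)
  data.frame(id = ids, missing = missing, illegal = illegal, padded = padded)
}

res <- check_subject_ids(c("101", "007", "A 12", NA, "ID-5@x"))
# Show only the IDs that break at least one rule
res[res$missing | res$illegal | res$padded, ]
```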

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Detailed information about the four ID checks that were performed).

Examples

# Example 1: Fail check, 'SUBJECT_ID' not present
data(ExampleO)
id_check(DS.data.O)
print(id_check(DS.data.O, verbose=FALSE))

# Example 2: Fail check, 'SUBJECT_ID' includes illegal spaces
data(ExampleP)
id_check(DS.data.P)
results <- id_check(DS.data.P)
results$Information[[1]]$details
print(id_check(DS.data.P, verbose=FALSE))

# Example 3: Pass check
data(ExampleA)
id_check(DS.data.A)
print(id_check(DS.data.A, verbose=FALSE))

Relocate SUBJECT_ID to First Column of Data Set

Description

This utility function reorders the data set so that SUBJECT_ID comes first.

Usage

id_first_data(DS.data)

Arguments

DS.data

Data set.

Details

SUBJECT_ID is required to be the first column of the data set and first variable listed in the data dictionary.

Value

Updated data set with SUBJECT_ID as first column.

Examples

data(ExampleQ)
head(DS.data.Q)
DS.data.updated <- id_first_data(DS.data.Q)
head(DS.data.updated)

Relocate SUBJECT_ID to First Column of Data Dictionary

Description

This utility function reorders the data dictionary so that SUBJECT_ID comes first.

Usage

id_first_dict(DD.dict)

Arguments

DD.dict

Data dictionary.

Details

SUBJECT_ID is required to be the first column of the data set and first variable listed in the data dictionary.

Value

Updated data dictionary with SUBJECT_ID as first variable.

Examples

data(ExampleQ)
head(DD.dict.Q)
DD.dict.updated <- id_first_dict(DD.dict.Q)
head(DD.dict.updated)

Integer Check Base Function

Description

This function checks for integer values.

Usage

int_check(data)

Arguments

data

Number or vector of numbers.

Value

Logical, TRUE if all non-missing entries in the input vector are integers.
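
As a rough illustration of this behaviour, a base-R version might look like the following. is_all_integer() is a hypothetical stand-in, not the package's int_check(); in particular, treating numeric strings such as "3" as integers is an assumption.

```r
# Hypothetical stand-in for int_check(); behaviour on edge cases may differ.
is_all_integer <- function(x) {
  x <- x[!is.na(x)]                        # ignore missing entries
  num <- suppressWarnings(as.numeric(x))   # non-numeric strings become NA
  all(!is.na(num) & num == round(num))     # every remaining entry is a whole number
}

is_all_integer(c(1, 2, NA))   # TRUE
is_all_integer(c(1.5, 2))     # FALSE
is_all_integer("a")           # FALSE
```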

Integer Check

Description

This function searches for variables that appear to be incorrectly listed as TYPE integer.

Usage

integer_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as a list of variables that may be incorrectly labeled as TYPE integer.

Value

Examples

# Example 1: Fail check
data(ExampleH)
integer_check(DD.dict.H, DS.data.H)
print(integer_check(DD.dict.H, DS.data.H, verbose=FALSE))

# Example 2: Pass check
data(ExampleA)
integer_check(DD.dict.A, DS.data.A)
print(integer_check(DD.dict.A, DS.data.A, verbose=FALSE))

data(ExampleR)
integer_check(DD.dict.R, DS.data.R)
print(integer_check(DD.dict.R, DS.data.R, verbose=FALSE))

Label the data

Description

This function adds the non-missing information from the data dictionary to the data set as attributes.

Usage

label_data(DD.dict, DS.data, non.NA.missing.codes = NA)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).

Value

A tibble containing the labelled data set, with the data dictionary information embedded as attributes and variables labelled using haven SPSS conventions.

Examples

data(ExampleB)
DS_labelled_data <- label_data(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-9999))
labelled::var_label(DS_labelled_data$SEX)
labelled::val_labels(DS_labelled_data$SEX)
attributes(DS_labelled_data$SEX)
labelled::na_values(DS_labelled_data$HX_DEPRESSION)

Minimum and Maximum Values Check

Description

This function flags variables that have values exceeding the MIN or MAX listed in the data dictionary.

Usage

minmax_check(DD.dict, DS.data, verbose = TRUE, non.NA.missing.codes = NA)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as a list of variables that violate the listed `MIN` or `MAX`.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (A sorted list of unique values that are either less than the MIN value or greater than the MAX value).
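
The role of non.NA.missing.codes can be illustrated with a small base-R sketch: missing value codes are dropped before values are compared with MIN and MAX, and a blank MIN or MAX imposes no bound. out_of_range() is hypothetical and does not necessarily reflect how minmax_check() is implemented.

```r
# Hypothetical helper; not the package's minmax_check() implementation.
out_of_range <- function(x, min = NA, max = NA, missing_codes = NA) {
  # Drop NA and user-defined missing value codes before comparing
  x <- x[!is.na(x) & !(x %in% missing_codes)]
  # A blank (NA) MIN or MAX imposes no bound
  flagged <- (!is.na(min) & x < min) | (!is.na(max) & x > max)
  sort(unique(x[flagged]))
}

age <- c(25, 40, 130, -9999, NA, -4444)
out_of_range(age, min = 18, max = 120)                                   # -9999 -4444 130
out_of_range(age, min = 18, max = 120, missing_codes = c(-9999, -4444))  # 130
```

Without the missing value codes, -9999 and -4444 are flagged, which mirrors the first call in Example 1 below.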

Examples

# Example 1
# Fail check (the missing value codes -9999 and -4444
# are incorrectly flagged as outside the MIN/MAX range)
data(ExampleA)
minmax_check(DD.dict.A, DS.data.A)
# View out of range values:
details <- minmax_check(DD.dict.A, DS.data.A)$Information
details[[1]]$OutOfRangeValues
# Attempt 2, specifying -9999 and -4444 as missing value
# codes so check works correctly
minmax_check(DD.dict.A, DS.data.A, non.NA.missing.codes=c(-9999, -4444))

# Example 2
data(ExampleI)
minmax_check(DD.dict.I, DS.data.I, non.NA.missing.codes=c(-9999, -4444))
# View out of range values:
details <- minmax_check(DD.dict.I, DS.data.I, non.NA.missing.codes=c(-9999, -4444))$Information
details[[1]]$OutOfRangeValues

Miscellaneous Format Check

Description

This function checks miscellaneous dbGaP formatting requirements to ensure (1) no empty variable names; (2) no duplicate variable names; (3) variable names do not contain "dbgap"; (4) there are no duplicate column names in the dictionary; and (5) columns falling after the VALUES column are unnamed.

Usage

misc_format_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as more detailed information about which formatting checks failed.

Details

Note that this check may return a WARNING for Check #5 depending on how the data dictionary is read into R. Depending on the method used, R will automatically fill in the column names after VALUES with "...col_number" (e.g., "...18"). This is allowed by the package, but it is NOT allowed by dbGaP, so use caution if you write out a data dictionary after making adjustments directly in R.
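
Checks (1) to (3) and (5) can be sketched in base R on a toy dictionary. The objects below are hypothetical, matching "dbgap" case-insensitively is an assumption, and misc_format_check() may differ in detail.

```r
# Toy dictionary (hypothetical) with an empty name, a duplicate, and "dbgap" in a name
dict <- data.frame(
  VARNAME = c("SUBJECT_ID", "AGE", "AGE", "", "dbgap_flag"),
  VALUES  = c(NA, NA, NA, NA, "0=no"),
  extra   = c(NA, NA, NA, NA, "1=yes"),
  stringsAsFactors = FALSE
)
# The kind of placeholder name readxl::read_xlsx() fills in after VALUES
names(dict)[3] <- "...3"

vn <- dict$VARNAME
empty_names <- which(is.na(vn) | trimws(vn) == "")          # check (1)
dup_names   <- unique(vn[duplicated(vn) & vn != ""])        # check (2)
dbgap_names <- vn[grepl("dbgap", vn, ignore.case = TRUE)]   # check (3)

# Check (5): every column after VALUES should have a blank name
nval <- which(names(dict) == "VALUES")
named_after_values <- setdiff(names(dict)[-seq_len(nval)], "")
named_after_values   # "...3"

# The same fix used in the ExampleR source: blank out those names
names(dict)[(nval + 1):ncol(dict)] <- ""
```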

Value

Tibble, returned invisibly, containing: (1) Time (time stamp); (2) Name (name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Names of variables that fail one of these checks).

Examples

# Example 1: Fail check 
data(ExampleJ)
misc_format_check(DD.dict.J, DS.data.J)
print(misc_format_check(DD.dict.J, DS.data.J, verbose=FALSE))

# Example 2: Pass check
data(ExampleA)
misc_format_check(DD.dict.A, DS.data.A)
print(misc_format_check(DD.dict.A, DS.data.A, verbose=FALSE))

Missing Value Check

Description

This function flags variables that have non-encoded missing value codes.

Usage

missing_value_check(
  DD.dict,
  DS.data,
  verbose = TRUE,
  non.NA.missing.codes = NA
)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as a list of variables that have non-encoded missing values.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (A list of variables where a missing value code is not properly encoded).

Examples

data(ExampleB)
missing_value_check(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999,-4444))

data(ExampleS)
missing_value_check(DD.dict.S, DS.data.S, non.NA.missing.codes = c(-9999,-4444))

Missingness Summary

Description

This awareness function summarizes the amount of missingness in the data set.

Usage

missingness_summary(DS.data, non.NA.missing.codes = NA, threshold = 95)

Arguments

`DS.data`	Data set.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).
`threshold`	Threshold for missingness of concern (as a percent).

Value

Tibble containing: (1) Message containing information on the number of variables with a % missingness greater than the threshold; (2) Missingness by variable summary; and (3) Summary of missingness for variables with a missingness level greater than the threshold.
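
The per-variable percentage can be sketched in base R as below, counting a cell as missing if it is NA or equals one of the user-supplied codes. percent_missing() is hypothetical; missingness_summary() may compute or report the figures differently.

```r
# Hypothetical helper; not the package's missingness_summary() implementation.
percent_missing <- function(df, missing_codes = NA) {
  # A cell counts as missing if it is NA or equals a user-supplied missing code
  sapply(df, function(col) 100 * mean(is.na(col) | col %in% missing_codes))
}

ds <- data.frame(SUBJECT_ID = 1:4,
                 AGE = c(30, -9999, NA, 45),
                 BMI = c(NA, NA, NA, -4444))
pm <- percent_missing(ds, missing_codes = c(-9999, -4444))
pm                   # SUBJECT_ID 0, AGE 50, BMI 100
names(pm)[pm > 95]   # "BMI" exceeds the default threshold of 95
```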

Examples

# Correct usage
data(ExampleA)
missingness_summary(DS.data.A, non.NA.missing.codes=c(-4444, -9999))

Min Max Required Pre-checks

Description

This function runs a workflow of the minimum number of checks required for a user to run minmax_check; the checks include pkg_field_check, dimension_check, and name_check.

Usage

mm_precheck(dict, data)

Arguments

`dict`	Data dictionary.
`data`	Data set.

Value

Tibble containing the following information for each check: (1) Time (time stamp); (2) Name (name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (More detailed information about the potential errors identified).

Examples

data(ExampleB)
mm_precheck(DD.dict.B, DS.data.B)

Missing Values Required Pre-checks

Description

This function runs a workflow of the minimum number of checks required for a user to run missing_value_check; the checks include field_check and pkg_field_check.

Usage

mv_precheck(dict, data)

Arguments

`dict`	Data dictionary.
`data`	Data set.

Value

Examples

data(ExampleB)
mv_precheck(DD.dict.B, DS.data.B)

Missing Value (NA) Check

Description

Checks for NA values in the data set; if NA values are present, also checks whether NA is properly encoded as a VALUE=MEANING entry in the data dictionary.

Usage

NA_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as the number of NA values observed in the data set.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (the number of NA values in the data set and information on if NA is a properly encoded value).

Examples

# Example 1: Fail check
data(ExampleK)
NA_check(DD.dict.K, DS.data.K)
print(NA_check(DD.dict.K, DS.data.K, verbose=FALSE))

# Example 2: Pass check
data(ExampleA)
NA_check(DD.dict.A, DS.data.A)
print(NA_check(DD.dict.A, DS.data.A, verbose=FALSE))

# Example 3: Pass check (though missing_value_check detects a more specific error)
data(ExampleS)
NA_check(DD.dict.S, DS.data.S)

Min Max Required Pre-checks

Description

This function runs a workflow of the minimum number of checks required for a user to run minmax_check; the checks include pkg_field_check, dimension_check, and name_check.

Usage

NA_precheck(dict, data)

Arguments

`dict`	Data dictionary.
`data`	Data set.

Value

Examples

data(ExampleB)
NA_precheck(DD.dict.B, DS.data.B)

Name Check

Description

This function checks if the variable names match between the data dictionary and the data.
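
A base-R sketch of the comparison follows. It assumes, as the reorder_data and reorder_dictionary examples suggest, that the variables are also expected to appear in the same order. compare_names() is hypothetical and not part of the package; reorder_data() and reorder_dictionary() are the package's own utilities for fixing the order.

```r
# Hypothetical helper; not the package's name_check() implementation.
compare_names <- function(varnames, data_names) {
  list(
    dict_only  = setdiff(varnames, data_names),   # in the dictionary, not the data
    data_only  = setdiff(data_names, varnames),   # in the data, not the dictionary
    same_order = identical(varnames, data_names)  # same names in the same order
  )
}

varnames <- c("SUBJECT_ID", "AGE", "SEX")         # e.g., the dictionary's VARNAME column
ds <- data.frame(SUBJECT_ID = 1:2, SEX = c(1, 2), AGE = c(30, 40))

cmp <- compare_names(varnames, names(ds))
cmp$same_order                                   # FALSE: same names, different order
ds2 <- ds[, varnames]                            # reorder columns to follow the dictionary
compare_names(varnames, names(ds2))$same_order   # TRUE
```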

Usage

name_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as a list of the non-matching variable names.

Value

Examples

# Example 1: Fail check (name mismatch)
data(ExampleM)
name_check(DD.dict.M, DS.data.M)
DS.data_updated <- name_correct(DD.dict.M, DS.data.M)
name_check(DD.dict.M, DS.data_updated)

# Example 2: Pass check
data(ExampleA)
name_check(DD.dict.A, DS.data.A)
print(name_check(DD.dict.A, DS.data.A, verbose=FALSE))

Name Correction Utility Function

Description

This utility function updates the data set so variable names match those listed in the data dictionary.

Usage

name_correct(DD.dict, DS.data)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.

Details

Use with caution; we recommend running name_check first.

Value

Updated data set with variables renamed to match the data dictionary.

Examples

data(ExampleM)
name_check(DD.dict.M, DS.data.M)
DS.data_updated <- name_correct(DD.dict.M, DS.data.M)
name_check(DD.dict.M, DS.data_updated)

Name Pre-checks

Description

This function runs a workflow of the minimum number of checks required for a user to run minmax_check; the checks include pkg_field_check, dimension_check, and name_check.

Usage

name_precheck(dict, data)

Arguments

`dict`	Data dictionary.
`data`	Data set.

Value

Examples

data(ExampleB)
name_precheck(DD.dict.B, DS.data.B)

Package Required Field Check

Description

This function checks for additional fields required by this package including variable type (TYPE), minimum value (MIN), and maximum value (MAX).

Usage

pkg_field_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as a list of the fields not found in the data dictionary.

Details

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Named vector of TRUE/FALSE values alerting user if checks passed (TRUE) or failed (FALSE) for TYPE, MIN, and MAX).

Examples

# Example 1: Fail check
data(ExampleD)
pkg_field_check(DD.dict.D, DS.data.D)
# Use the add_missing_fields function to add the missing fields
DD.dict.updated <- add_missing_fields(DD.dict.D, DS.data.D)
# Be sure to pass in the updated version of the dictionary (DD.dict.updated)
pkg_field_check(DD.dict.updated, DS.data.D) 

# Example 2: Pass check
data(ExampleA)
pkg_field_check(DD.dict.A, DS.data.A)
print(pkg_field_check(DD.dict.A, DS.data.A, verbose=FALSE))

Reorder Data Set Utility Function

Description

This utility function reorders the data set to match the data dictionary.

Usage

reorder_data(DD.dict, DS.data)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.

Value

Updated data set with variables reordered to match the data dictionary.

Examples

data(ExampleN)
name_check(DD.dict.N, DS.data.N)
DS.data_updated <- reorder_data(DD.dict.N, DS.data.N)
name_check(DD.dict.N, DS.data_updated)

Reorder Data Dictionary Utility Function

Description

This utility function reorders the data dictionary to match the data set.

Usage

reorder_dictionary(DD.dict, DS.data)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.

Value

Updated data dictionary with variables reordered to match the data set.

Examples

data(ExampleN)
name_check(DD.dict.N, DS.data.N)
DD.dict_updated <- reorder_dictionary(DD.dict.N, DS.data.N)
name_check(DD.dict_updated, DS.data.N)

Row Check

Description

This function checks for empty or duplicate rows in the data set and data dictionary.
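
The two row-level checks can be sketched in base R; the same helper applies to either the data set or the data dictionary. problem_rows() is hypothetical and is not the package's row_check(), which also reports participant IDs.

```r
# Hypothetical helper; not the package's row_check() implementation.
problem_rows <- function(df) {
  # A row is empty if every cell is NA or blank
  blank <- apply(df, 1, function(r) all(is.na(r) | trimws(r) == ""))
  list(
    empty      = as.integer(which(blank)),
    duplicated = as.integer(which(duplicated(df)))  # exact repeats of an earlier row
  )
}

ds <- data.frame(SUBJECT_ID = c("1", "2", NA, "2"), AGE = c(30, 40, NA, 40))
problem_rows(ds)   # empty: row 3; duplicated: row 4
```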

Usage

row_check(DD.dict, DS.data, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`verbose`	When TRUE, the function prints the Message out, as well as the row numbers of any problematic rows.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (A list of problematic row and participant ID numbers).

Examples

# Example 1: Fail check
data(ExampleK)
row_check(DD.dict.K, DS.data.K)
print(row_check(DD.dict.K, DS.data.K, verbose=FALSE))

# Example 2: Pass check
data(ExampleC)
row_check(DD.dict.C, DS.data.C)
print(row_check(DD.dict.C, DS.data.C, verbose=FALSE))

Truncated Field Check

Description

This function checks for the dbGaP required fields variable name (VARNAME) and variable description (VARDESC) as a pre-check embedded in name_check.

Usage

short_field_check(DD.dict, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`verbose`	When TRUE, the function prints the Message out, as well as a list of the fields not found in the data dictionary.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Named vector of TRUE/FALSE values alerting user if checks passed (TRUE) or failed (FALSE) for VARNAME, VARDESC, UNITS, and VALUE).

Examples

data(ExampleA)
short_field_check(DD.dict.A)

Truncated Pre-check

Description

This function runs a workflow of the minimum number of checks required for a user to run dbGaPCheckup_required_field_check; the checks include dbGaP_required_field_check, dimension_check, and name_check.

Usage

short_precheck(dict, data)

Arguments

`dict`	Data dictionary.
`data`	Data set.

Value

Examples

data(ExampleB)
short_precheck(DD.dict.B, DS.data.B)

Very Truncated Pre-check

Description

Usage

super_short_precheck(dict, data)

Arguments

`dict`	Data dictionary.
`data`	Data set.

Value

Examples

# Example 1: Pass check
data(ExampleB)
super_short_precheck(DD.dict.B, DS.data.B)

Type Check

Description

If a TYPE field exists, this function checks for any TYPE entries that aren't allowable per dbGaP instructions.

Usage

type_check(DD.dict, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`verbose`	When TRUE, the function prints the Message out, as well as more detailed diagnostic information.

Details

Allowable entries in TYPE column include: integer; decimal; encoded value; or string. For mixed values, list all types present using commas to separate (e.g., integer, encoded value).
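
A minimal base-R sketch of this rule, splitting mixed types on commas. bad_type() is hypothetical, exact case-sensitive matching is an assumption, and type_check() may handle edge cases (such as blank TYPE entries) differently.

```r
# Hypothetical helper; not the package's type_check() implementation.
allowed_types <- c("integer", "decimal", "encoded value", "string")

bad_type <- function(type) {
  # Split comma-separated mixed types and trim surrounding spaces
  parts <- trimws(strsplit(type, ",", fixed = TRUE)[[1]])
  any(!parts %in% allowed_types)
}

types <- c("integer", "integer, encoded value", "numeric", "decimal, string")
vapply(types, bad_type, logical(1), USE.NAMES = FALSE)   # FALSE FALSE TRUE FALSE
```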

Value

Examples

data(ExampleB)
type_check(DD.dict.B)
print(type_check(DD.dict.B, verbose=FALSE))

Value-Meaning Table

Description

This function generates a value-meaning table by parsing the VALUES fields.

Usage

value_meaning_table(DD.dict)

Arguments

DD.dict

Data dictionary.

Value

A data frame with the columns VARNAME, TYPE, VALUE, MEANING.
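
The parsing step can be sketched in base R for a single dictionary row. parse_values() is hypothetical: it splits each VALUES cell at its first "=" (an assumption) and omits the TYPE column that the real table includes.

```r
# Hypothetical helper; not the package's value_meaning_table() implementation.
parse_values <- function(varname, entries) {
  entries <- entries[!is.na(entries) & entries != ""]   # drop empty VALUES cells
  if (length(entries) == 0) return(NULL)
  eq <- as.integer(regexpr("=", entries, fixed = TRUE)) # position of the first "="
  data.frame(VARNAME = varname,
             VALUE   = substr(entries, 1, eq - 1),
             MEANING = substr(entries, eq + 1, nchar(entries)))
}

# VALUES cells for one row: the VALUES column plus the unnamed columns after it
parse_values("SEX", c("1=male", "2=female", "-9999=missing value", NA))
```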

Examples

data(ExampleB)
head(value_meaning_table(DD.dict.B))

Values Missing Table Awareness Function

Description

This function checks for consistent usage of encoded values and missing value codes between the data dictionary and the data itself.

Usage

value_missing_table(DD.dict, DS.data, non.NA.missing.codes = NA)

Arguments

`DD.dict`	Data dictionary.
`DS.data`	Data set.
`non.NA.missing.codes`	A user-defined vector of numerical missing value codes (e.g., -9999).

Details

For each variable, we have three sets of possible values: the set D of all the unique values observed in the data, the set V of all the values explicitly encoded in the VALUES columns of the data dictionary, and the set M of the missing value codes defined by the user via the non.NA.missing.codes argument. This function examines various intersections of these three sets, providing awareness checks that alert the user to possible issues of concern. Ideally, all values defined in set V should be observed in the data (i.e., in set D), but it is not necessarily an error if one is not. This function checks for:

(A) In Set M and Not in Set D: If the user defines a missing value code that is not present in the data.

(B) In Set V and Not in Set D: If a VALUES entry defines an encoded code value, but that code value is not present in the data.

(D) In Set M and Set D, but Not in Set V: If a defined global missing value code is present in the data for a given variable, but that variable does not have a corresponding VALUES entry.

(E) In Set V, Not in Set M, and Not in Set D: If a VALUES entry is not defined as a missing value code AND is not detected in the data.
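
The four comparisons map directly onto base-R set operations. The sketch below uses hypothetical values for one variable and compares them as character strings for simplicity (an assumption); value_missing_table() may represent the sets differently.

```r
# Hypothetical sets for one variable
D <- c("0", "1", "-9999")        # unique values observed in the data
V <- c("0", "1", "2", "-4444")   # values encoded in the VALUES columns
M <- c("-9999", "-4444")         # user-defined non.NA.missing.codes

check_A <- setdiff(M, D)                 # (A) in M, not in D        -> "-4444"
check_B <- setdiff(V, D)                 # (B) in V, not in D        -> "2" "-4444"
check_D <- setdiff(intersect(M, D), V)   # (D) in M and D, not in V  -> "-9999"
check_E <- setdiff(setdiff(V, M), D)     # (E) in V, not M, not D    -> "2"
```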

Value

A list, returned invisibly, with two components:

"report"	Tibble containing: (1) Name (Name of the function) and (2) Information (Details of all potentially flagged variables).
"tb"	Tibble with detailed information used to construct the Information component.

Examples

data(ExampleB)
value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999))
print(value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999)))
results <- value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999))
results$report$Information$details

Values Check

Description

This function checks for potential errors in the VALUES columns by ensuring (1) required format of VALUE=MEANING (e.g., 0=Yes or 1=No); (2) no leading/trailing spaces near the equals sign; (3) all variables of TYPE encoded have VALUES entries; and (4) all variables with VALUES entries are listed as TYPE encoded.
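
The four checks can be approximated with base-R pattern matching on toy inputs. Everything below is hypothetical and for illustration only; values_check() may use different patterns and reports its results per VALUE=MEANING instance.

```r
# Hypothetical sketch of the four VALUES checks; not the package's values_check().
entries <- c("0=No", "1 = Yes", "2", "=missing")
format_ok   <- grepl("^[^=]+=.+$", entries)   # (1) VALUE=MEANING shape
space_issue <- grepl("\\s=|=\\s", entries)    # (2) spaces next to "="

# (3)/(4): TYPE and the presence of VALUES entries should agree
type       <- c("encoded value", "integer", "encoded value")
has_values <- c(TRUE, TRUE, FALSE)
is_encoded <- grepl("encoded value", type, fixed = TRUE)
which(is_encoded & !has_values)   # 3: encoded but no VALUES entries
which(!is_encoded & has_values)   # 2: VALUES entries but not TYPE encoded
```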

Usage

values_check(DD.dict, verbose = TRUE)

Arguments

`DD.dict`	Data dictionary.
`verbose`	When TRUE, the function prints the Message out, as well as a list of variables that fail one of the values checks.

Value

Tibble, returned invisibly, containing: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (Details of which checks passed/failed for which value=meaning instances).

Examples

# Example 1: Fail check
data(ExampleE)
values_check(DD.dict.E)
print(values_check(DD.dict.E, verbose=FALSE))

# Example 2: Pass check
data(ExampleA)
values_check(DD.dict.A)
print(values_check(DD.dict.A, verbose=FALSE))

Values Pre-Check

Description

This function runs a workflow of the minimum number of checks required for a user to run values_check; the checks include field_check and type_check.

Usage

values_precheck(dict)

Arguments

dict

Data dictionary.

Value

Examples

data(ExampleB)
values_precheck(DD.dict.B)
