Fit Ordered Beta Regression Model

Description

This function allows you to estimate an ordered beta regression model via a formula syntax.

Usage

ordbetareg(
  formula = NULL,
  data = NULL,
  true_bounds = NULL,
  phi_reg = "none",
  use_brm_multiple = FALSE,
  coef_prior_mean = 0,
  coef_prior_SD = 5,
  intercept_prior_mean = NULL,
  intercept_prior_SD = NULL,
  phi_prior = 0.1,
  dirichlet_prior = c(1, 1, 1),
  phi_coef_prior_mean = 0,
  phi_coef_prior_SD = 5,
  phi_intercept_prior_mean = NULL,
  phi_intercept_prior_SD = NULL,
  extra_prior = NULL,
  manual_prior = NULL,
  init = "0",
  return_stancode = FALSE,
  ...
)

Arguments

formula Either an R formula in the form response/DV ~ var1 + var2 etc. or formula object as created/called by the brms brms::bf function.
data An R data frame or tibble containing the variables in the formula
true_bounds If the true bounds of the outcome/response don’t exist in the data, pass a length 2 numeric vector of the minimum and maximum bounds to properly normalize the outcome/response
phi_reg Whether you are including a linear model predicting the dispersion parameter, phi, and/or for the response. If you are including models for both, pass option ‘both’. If you only have an intercept for the outcome (i.e. a 1 in place of covariates), pass ‘only’. If you only have intercepts for phi (such as a varying intercepts/random effects) model, pass the value "intercepts". To set priors on these intercepts, use the extra-prior option with the brms::set_prior function (class="sd"). If no model of any kind for phi, the default, pass ‘none’.
use_brm_multiple (T/F) Whether the model should use brms::brm_multiple for multiple imputation over multiple dataframes passed as a list to the data argument
coef_prior_mean The mean of the Normal distribution prior on the regression coefficients (for predicting the mean of the response). Default is 0.
coef_prior_SD The SD of the Normal distribution prior on the regression coefficients (for predicting the mean of the response). Default is 5, which makes the prior weakly informative on the logit scale.
intercept_prior_mean The mean of the Normal distribution prior for the intercept. By default is NULL, which means the intercept receives the same prior as coef_prior_mean. To zero out the intercept, set this parameter to 0 and coef_prior_SD to a very small number (0.01 or smaller). NOTE: the default intercept in brms is centered (mean-subtracted) by default. To use a traditional intercept, either add 0 + Intercept to the formula or specify center=FALSE in the bf formula function for brms. See brms::brmsformula() for more info.
intercept_prior_SD The SD of the Normal distribution prior for the intercept. By default is NULL, which means the intercept receives the same prior SD as coef_prior_SD.
phi_prior The mean parameter of the exponential prior on phi, which determines the dispersion of the beta distribution. The default is .1, which equals a mean of 10 and is thus weakly informative on the interval (0.4, 30). If the response has very low variance (i.e. tightly) clusters around a specific value, then decreasing this prior (and increasing the expected value) may be helpful. Checking the value of phi in the output of the model command will reveal if a value of 0.1 (mean of 10) is too small.
dirichlet_prior A vector of three integers corresponding to the prior parameters for the dirchlet distribution (alpha parameter) governing the location of the cutpoints between the components of the response (continuous vs. degenerate). The default is 1 which puts equal probability on degenerate versus continuous responses. Likely only needs to be changed in a repeated sampling situation to stabilize the cutpoint locations across samples.
phi_coef_prior_mean The mean of the Normal distribution prior on the regression coefficients for predicting phi, the dispersion parameter. Only useful if a linear model is being fit to phi. Default is 0.
phi_coef_prior_SD The SD of the Normal distribution prior on the regression coefficients for predicting phi, the dispersion parameter. Only useful if a linear model is being fit to phi. Default is 5, which makes the prior weakly informative on the exponential scale.
phi_intercept_prior_mean The mean of the Normal distribution prior for the phi (dispersion) regression intercept. By default is NULL, which means the intercept receives the same prior as phi_coef_prior_mean. To zero out the intercept, set this parameter to 0 and phi_coef_prior_SD to a very small number (0.01 or smaller).
phi_intercept_prior_SD The SD of the Normal distribution prior for the phi (dispersion) regression intercept. By default is NULL, which means the intercept receives the same prior SD as phi_coef_prior_SD.
extra_prior An additional prior, such as a prior for a specific regression coefficient, added to the outcome regression by passing one of the brms functions brms::set_prior or brms::prior_string with appropriate values.
manual_prior If you want to set your own custom priors with brms, use this option to pass any valid brms priors such as those created with brms::set_prior or brms::prior_string. Note that this option replaces any other priors set. Useful especially when doing something unorthodox like modeling cutpoints.
init This parameter is used to determine starting values for the Stan sampler to begin Markov Chain Monte Carlo sampling. It is set by default at 0 because the non-linear nature of beta regression means that it is possible to begin with extreme values depending on the scale of the covariates. Setting this to 0 helps the sampler find starting values. It does, on the other hand, limit the ability to detect convergence issues with Rhat statistics. If that is a concern, such as with an experimental feature of brms, set this to “random” to get more robust starting values (just be sure to scale the covariates so they are not too large in absolute size).
return_stancode If TRUE, will pass back the only the Stan code for the model as a character vector rather than fitting the model.
All other arguments passed on to the brm function

Details

This function is a wrapper around the brms::brm function, which is a powerful Bayesian regression modeling engine using Stan. To fully explore the options available, including dynamic and hierarchical modeling, please see the documentation for the brm function above. As the ordered beta regression model is currently not available in brms natively, this modeling function allows a brms model to be fit with the ordered beta regression distribution.

For more information about the model, see the paper here: https://osf.io/preprints/socarxiv/2sx6y/.

This function allows you to set priors on the dispersion parameter, the cutpoints, and the regression coefficients (see below for options). However, to add specific priors on individual covariates, you would need to use the brms::set_prior function by specifying an individual covariate (see function documentation) and passing the result of the function call to the extra_prior argument.

This function will also automatically normalize the outcome so that it lies in the [0,1] interval, as required by beta regression. For furthur information, see the documentation for the normalize function.

Priors can be set on a variety of coefficients in the model, see the description of parameters coef_prior_mean and intercept_prior_mean, in addition to setting a custom prior with the extra_prior option. When setting priors on intercepts, it is important to note that by default, all intercepts in brms are centered (the means are subtracted from the data). As a result, a prior set on the default intercept will have a different interpretation than a traditional intercept (i.e. the value of the outcome when the covariates are all zero). To change this setting, use the brms::bf() function as a wrapper around the formula with the option center=FALSE to set priors on a traditional non-centered intercept.

Note that while brms also supports adding 0 + Intercept to the formula to address this issue, ordbetareg does not support this syntax. Instead, use center=FALSE as an option to brms::bf().

Value

A brms object fitted with the ordered beta regression distribution.

Examples

library(ordbetareg)

# load survey data that comes with the package

library(dplyr)
data("pew")

# prepare data

model_data <- select(pew,therm,
             education="F_EDUCCAT2_FINAL",
             region="F_CREGION_FINAL",
             income="F_INCOME_FINAL")

# It takes a while to fit the models. Run the code
# below if you want to load a saved fitted model from the
# package, otherwise use the model-fitting code

data("ord_fit_mean")

  
  # fit the actual model

  if(.Platform$OS.type!="windows") {

    ord_fit_mean <- ordbetareg(formula=therm ~ education + income +
      (1|region),
      data=model_data,
      cores=2,chains=2)

  }
Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
using C compiler: ‘Apple clang version 16.0.0 (clang-1600.0.26.3)’
using SDK: ‘’
clang -arch arm64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/arm64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/Core:19:
/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
  679 | #include <cmath>
      |          ^~~~~~~
1 error generated.
make: *** [foo.o] Error 1
# access values of the coefficients

summary(ord_fit_mean)
 Family: ord_beta_reg 
  Links: mu = identity; phi = identity; cutzero = identity; cutone = identity 
Formula: therm ~ education + income + (1 | region) 
   Data: data (Number of observations: 2525) 
  Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 2000

Multilevel Hyperparameters:
~region (Number of levels: 4) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.12      0.11     0.01     0.42 1.05       64       30

Regression Coefficients:
                                      Estimate Est.Error l-95% CI u-95% CI Rhat
Intercept                                 0.31      0.18    -0.06     0.62 1.05
educationHighschoolgraduate              -0.20      0.13    -0.46     0.06 1.02
educationSomecollegenodegree             -0.07      0.13    -0.33     0.17 1.03
educationAssociate’sdegree               -0.05      0.14    -0.31     0.22 1.01
educationCollegegraduateDsomepostgrad     0.23      0.13    -0.04     0.48 1.02
educationPostgraduate                     0.52      0.13     0.26     0.78 1.02
educationDon’tknowDRefused               -0.20      0.41    -1.00     0.58 1.01
income10tounder$20000                    -0.08      0.13    -0.34     0.15 1.02
income20tounder$30000                     0.10      0.13    -0.16     0.32 1.01
income30tounder$40000                    -0.07      0.12    -0.32     0.15 1.00
income40tounder$50000                    -0.13      0.13    -0.37     0.11 1.01
income50tounder$75000                    -0.15      0.11    -0.38     0.06 1.01
income75tounder$100000                   -0.25      0.12    -0.49    -0.05 1.02
income100tounder$150000OR                -0.28      0.11    -0.51    -0.05 1.01
income$150000ormore                      -0.25      0.12    -0.50    -0.03 1.01
incomeVOLDontknowDRefused                -0.47      0.15    -0.77    -0.20 1.02
                                      Bulk_ESS Tail_ESS
Intercept                                   41       16
educationHighschoolgraduate                679      959
educationSomecollegenodegree               678      920
educationAssociate’sdegree                 454      939
educationCollegegraduateDsomepostgrad      627      860
educationPostgraduate                      592      904
educationDon’tknowDRefused                 977     1155
income10tounder$20000                      172      607
income20tounder$30000                      166      943
income30tounder$40000                      398      568
income40tounder$50000                      196      789
income50tounder$75000                      337      791
income75tounder$100000                     144      208
income100tounder$150000OR                  492      567
income$150000ormore                        151      421
incomeVOLDontknowDRefused                  130      626

Further Distributional Parameters:
        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi         2.96      0.08     2.81     3.12 1.01     1143      653
cutzero    -2.74      0.10    -2.93    -2.54 1.02      133      481
cutone      1.68      0.02     1.64     1.72 1.02      140      960

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).