Create data to run IRT model

Description

To run an IRT model using idealstan, you must first process your data using the id_make function.

Usage

id_make(
  score_data = NULL,
  outcome_disc = "outcome_disc",
  outcome_cont = "outcome_cont",
  person_id = "person_id",
  item_id = "item_id",
  time_id = "time_id",
  group_id = "group_id",
  model_id = "model_id",
  ordered_id = "ordered_id",
  ignore_id = "ignore_id",
  simul_data = NULL,
  person_cov = NULL,
  item_cov = NULL,
  item_cov_miss = NULL,
  remove_cov_int = FALSE,
  unbounded = FALSE,
  exclude_level = NA,
  simulation = FALSE
)

Arguments

score_data A data frame in long form, i.e., one row for each measured score or vote, or a rollcall data object from package pscl.
outcome_disc Column name of the outcome with discrete values in score_data, default is “outcome_disc”
outcome_cont Column name of the outcome with continuous values in score_data, default is “outcome_cont”
person_id Column name of the person/legislator ID index in score_data, default is ‘person_id’. Should be integer, character or factor.
item_id Column name of the item/bill ID index in score_data, default is ‘item_id’. Should be integer, character or factor.
time_id Column name of the time values in score_data: optional, default is ‘time_id’. Should be a date or date-time class, but can be an integer (i.e., years in whole numbers).
group_id Optional column name of the person/legislator group IDs (i.e., parties) in score_data, default is ‘group_id’. Should be integer, character or factor.
model_id Column name of the model/response types in the data. Default is “model_id”. Only necessary for a model with multiple response types (i.e., binary + continuous outcomes). Must be a column of integers matching the model types in id_estimate, showing which row of the data matches which outcome.
ordered_id Column name of the variable showing the count of categories for ordinal/categorical items (must be at least 3 categories)
ignore_id Optional column for identifying observations that should not be modeled (i.e., not just treated as missing, rather removed during estimation). Should be a binary vector (0 for remove and 1 for include). Useful for time-varying models where persons may not be present during particular periods and missing data is ignorable.
simul_data Optionally, data that has been generated by the id_sim_gen function.
person_cov A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the person/legislator ideal points
item_cov A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the item/bill discrimination parameters for the regular model
item_cov_miss A one-sided formula that specifies the covariates in the dataset that will be used to hierarchically model the item/bill discrimination parameters for the missing data model.
remove_cov_int Whether to remove constituent terms from hierarchical covariates that interact covariates with IDs like person_id or item_id. Set to TRUE if including these constituent terms would cause multi-collinearity with other terms in the model (such as running a group-level model with a group-level interaction or a person-level model with a person-level interaction).
unbounded Whether or not the outcome/response is unbounded (i.e., continuous or Poisson). If it is, missing value is recoded as the maximum of the outcome + 1.
exclude_level A vector of any values that should be treated as NA in the response matrix. Unlike missing values, these values will be dropped from the data before estimation rather than modeled explicitly.
simulation If TRUE, simulated values are saved in the idealdata object for later plotting with the id_plot_sims function

Details

This function accepts a long data frame in which each row is one item-person (bill-legislator) observation with an associated continuous or discrete outcome/response. Either name the columns as id_make expects (for example, person_id for person IDs and item_id for item IDs) or pass the names of your own columns to the corresponding id_make arguments (see examples). The only required columns are the item/bill ID, the person/legislator ID, and an outcome column: outcome_disc for discrete variables or outcome_cont for continuous variables. If both outcome columns are included, rows with a value in outcome_cont can hold any value in outcome_disc, and vice versa.
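As a minimal sketch of the long format described above, the following base R code stacks a small wide vote matrix into one row per person-item pair using the default column names. The data, the object names, and the make_idealdata helper are hypothetical; the helper only wraps the id_make call shown under Usage and needs idealstan installed when it is called.

```r
# Hypothetical wide vote matrix: legislators in rows, bills in columns.
wide_votes <- data.frame(
  legislator = c("A", "B", "C"),
  bill_1 = c(1, 0, 1),
  bill_2 = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# Stack the bill columns so that each row is one legislator-bill observation.
bill_cols <- setdiff(names(wide_votes), "legislator")
long_votes <- data.frame(
  person_id = rep(wide_votes$legislator, times = length(bill_cols)),
  item_id = rep(bill_cols, each = nrow(wide_votes)),
  outcome_disc = unlist(wide_votes[bill_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the columns already carry the default names,
# so only score_data is passed. idealstan is required only at call time.
make_idealdata <- function(df) {
  idealstan::id_make(score_data = df)
}
```

If your columns have other names, pass them instead, for example person_id = "legislator", rather than renaming the data.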

If items of multiple types are included, a column model_id must be included with the model type (see id_estimate function documentation for list of model IDs) for the response distribution, such as 1 for binary non-inflated, etc. If an ordinal outcome is included, an additional column ordered_id must be included that has the total count of categories for that ordinal variable (i.e., 3 for 3 categories).
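The sketch below shows one way the model_id and ordered_id columns might be built. The code 1 for binary non-inflated items comes from the text above; the code used for the continuous item is an assumption and should be checked against the id_estimate documentation. All data and object names are hypothetical.

```r
# Model code 1 is binary non-inflated (from the text above).
BINARY_ID <- 1L
# ASSUMPTION: placeholder code for a continuous response type; look up the
# real value in the id_estimate documentation before using it.
CONTINUOUS_ID <- 9L

# Mixed binary + continuous items: model_id marks the type of each row.
mixed <- data.frame(
  person_id = rep(c("A", "B"), times = 2),
  item_id = rep(c("vote_1", "rating_1"), each = 2),
  outcome_disc = c(1, 0, 0, 0),       # filler zeros on the continuous rows
  outcome_cont = c(0, 0, 0.4, -1.2),  # filler zeros on the discrete rows
  model_id = rep(c(BINARY_ID, CONTINUOUS_ID), each = 2),
  stringsAsFactors = FALSE
)

# Ordinal items: ordered_id holds the total number of categories per item.
ordinal <- data.frame(
  person_id = rep(c("A", "B", "C"), times = 2),
  item_id = rep(c("q1", "q2"), each = 3),
  outcome_disc = c(0, 1, 2, 0, 2, 3),
  stringsAsFactors = FALSE
)
# Declare the category counts rather than counting observed values,
# because a category may go unobserved (category 1 of q2 here).
n_categories <- c(q1 = 3L, q2 = 4L)
ordinal$ordered_id <- unname(n_categories[ordinal$item_id])
```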

For discrete data, it is recommended to include a numeric variable that starts at 0, such as values of 0 and 1 for binary data and 0,1,2 for ordinal/categorical data. For continuous (unbounded) data, it is recommended to standardize the outcome to improve model convergence and fit.
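A short base R sketch of both recommendations, using hypothetical vectors: ordered text responses are recoded to integers starting at 0, and a continuous outcome is standardized to mean 0 and standard deviation 1.

```r
# Hypothetical ordinal responses recorded as text.
answers <- c("agree", "disagree", "neutral", "neutral", "agree", "disagree")

# Fix the category order explicitly, then shift the 1-based factor codes
# so that the lowest category is 0.
category_order <- c("disagree", "neutral", "agree")
outcome_disc <- as.integer(factor(answers, levels = category_order)) - 1L

# Hypothetical continuous outcome, standardized with scale().
spending <- c(10, 20, 30, 40, 50, 60)
outcome_cont <- as.numeric(scale(spending))
```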

Missing data should be passed as NA values in either outcome_disc or outcome_cont and will be processed internally.
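If the raw data mark missing responses with sentinel codes, they can be converted to NA before calling id_make. In this sketch the codes 9 and 99 are an assumption about a hypothetical coding scheme, not something idealstan defines.

```r
# Hypothetical votes in which 9 and 99 stand for absent / not recorded.
votes <- data.frame(
  person_id = c("A", "B", "C", "D"),
  item_id = "bill_1",
  outcome_disc = c(1, 0, 9, 99),
  stringsAsFactors = FALSE
)

# ASSUMPTION: these sentinel values belong to this example's coding scheme.
absent_codes <- c(9, 99)
votes$outcome_disc[votes$outcome_disc %in% absent_codes] <- NA
```

Values that should be dropped before estimation rather than modeled as missing are handled by the exclude_level argument instead.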

Value

An idealdata object that can then be used in the id_estimate function to fit a model.

Time-Varying Models

To run a time-varying model, you need to include the name of a column with dates (or integers) that is passed to the time_id option.
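For instance, dates stored as text can be converted to the Date class before the column name is passed to time_id. The data, the vote_day column, and the make_time_varying helper are hypothetical; the helper only wraps the id_make call and needs idealstan installed when it is called.

```r
# Hypothetical long data with vote dates stored as character strings.
votes <- data.frame(
  person_id = c("A", "B", "A", "B"),
  item_id = c("bill_1", "bill_1", "bill_2", "bill_2"),
  outcome_disc = c(1, 0, 1, 1),
  vote_day = c("2015-01-20", "2015-01-20", "2015-03-05", "2015-03-05"),
  stringsAsFactors = FALSE
)

# Convert the text dates to the Date class.
votes$vote_day <- as.Date(votes$vote_day, format = "%Y-%m-%d")

# Hypothetical helper: passes the name of the date column to time_id.
make_time_varying <- function(df) {
  idealstan::id_make(score_data = df, time_id = "vote_day")
}
```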

Continuous Outcomes

If the outcome is continuous, you need to pass a data frame with one column named "outcome_cont" or pass the name of the column with the continuous data to the outcome_cont argument.

Hierarchical Covariates

Covariates can be fit on the person-level ideal point parameters as well as item discrimination parameters for either the inflated (missing) or non-inflated (observed) models. These covariates must be columns in the data passed to the id_make function. The covariate relationships are specified as one-sided formulas, e.g., ~cov1 + cov2 + cov1*cov2. To interact covariates with the person-level ideal points, use ~cov1 + person_id + cov1*person_id; for group-level ideal points, use ~cov1 + group_id + cov1*group_id, where group_id or person_id is the name of the column that you passed to id_make for that option (i.e., the name of the column in the original data). If the model also estimates these intercepts (for example, you interact a covariate with person_id and the model estimates ideal points at the person level), set remove_cov_int to TRUE to avoid multicollinearity with the ideal point intercepts.
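The formulas described above can be written and inspected in plain R before they are passed to id_make. In this sketch the covariate name age and the make_with_covariates helper are hypothetical; the helper only wraps the id_make call and needs idealstan installed when it is called.

```r
# One-sided formulas: nothing appears on the left of the tilde.
person_formula <- ~ age + person_id + age * person_id
group_formula <- ~ age + group_id + age * group_id

# Inspect the terms R derives from the formula; the interaction
# shows up as "age:person_id" alongside its constituent terms.
person_terms <- attr(terms(person_formula), "term.labels")

# Hypothetical helper: the covariate is interacted with person_id, so
# remove_cov_int = TRUE is set as the text above recommends.
make_with_covariates <- function(df) {
  idealstan::id_make(
    score_data = df,
    person_cov = person_formula,
    remove_cov_int = TRUE
  )
}
```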

Examples

library(idealstan)

# You can either use a pscl rollcall object or a vote/score matrix 
# where persons/legislators are in the rows
# and items/bills are in the columns

library(dplyr)

# First, using a rollcall object with the 114th Senate's rollcall votes:

data('senate114')

to_idealstan <- id_make(score_data = senate114,
                        outcome_disc = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date')