id_make – idealstan

Create data to run IRT model

Description

To run an IRT model using idealstan, you must first process your data using the id_make function.

Usage

id_make(
  score_data = NULL,
  outcome_disc = "outcome_disc",
  outcome_cont = "outcome_cont",
  person_id = "person_id",
  item_id = "item_id",
  time_id = "time_id",
  group_id = "group_id",
  model_id = "model_id",
  ordered_id = "ordered_id",
  ignore_id = "ignore_id",
  simul_data = NULL,
  person_cov = NULL,
  item_cov = NULL,
  item_cov_miss = NULL,
  remove_cov_int = FALSE,
  unbounded = FALSE,
  exclude_level = NA,
  simulation = FALSE
)

Arguments

`score_data`	A data frame in long form, i.e., one row for each measured score or vote, or a `rollcall` data object from package `pscl`.
`outcome_disc`	Column name of the outcome with discrete values in `score_data`, default is `"outcome_disc"`
`outcome_cont`	Column name of the outcome with continuous values in `score_data`, default is `"outcome_cont"`
`person_id`	Column name of the person/legislator ID index in `score_data`, default is `"person_id"`. Should be integer, character or factor.
`item_id`	Column name of the item/bill ID index in `score_data`, default is `"item_id"`. Should be integer, character or factor.
`time_id`	Column name of the time values in `score_data`: optional, default is `"time_id"`. Should be a date or date-time class, but can be an integer (i.e., years in whole numbers).
`group_id`	Optional column name of the person/legislator group IDs (e.g., parties) in `score_data`, default is `"group_id"`. Should be integer, character or factor.
`model_id`	Column name of the model/response types in the data. Default is `"model_id"`. Only necessary for a model with multiple response types (e.g., binary + continuous outcomes). Must be a column of integers matching the model types in `id_estimate`, showing which outcome type each row of the data belongs to.
`ordered_id`	Column name of the variable showing the count of categories for ordinal/categorical items (must be at least 3 categories)
`ignore_id`	Optional column for identifying observations that should not be modeled (i.e., not just treated as missing, rather removed during estimation). Should be a binary vector (0 for remove and 1 for include). Useful for time-varying models where persons may not be present during particular periods and missing data is ignorable.
`simul_data`	Optionally, data that has been generated by the `id_sim_gen` function.
`person_cov`	A one-sided formula that specifies the covariates in `score_data` that will be used to hierarchically model the person/legislator ideal points
`item_cov`	A one-sided formula that specifies the covariates in `score_data` that will be used to hierarchically model the item/bill discrimination parameters for the regular model
`item_cov_miss`	A one-sided formula that specifies the covariates in the dataset that will be used to hierarchically model the item/bill discrimination parameters for the missing data model.
`remove_cov_int`	Whether to remove constituent terms from hierarchical covariates that interact covariates with IDs like `person_id` or `item_id`. Set to `TRUE` if including these constituent terms would cause multi-collinearity with other terms in the model (such as running a group-level model with a group-level interaction or a person-level model with a person-level interaction).
`unbounded`	Whether or not the outcome/response is unbounded (i.e., continuous or Poisson). If it is, missing values are recoded as the maximum of the outcome + 1.
`exclude_level`	A vector of any values that should be treated as `NA` in the response matrix. Unlike missing values, these values will be dropped from the data before estimation rather than modeled explicitly.
`simulation`	If `TRUE`, simulated values are saved in the `idealdata` object for later plotting with the `id_plot_sims` function
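
As an illustration of the `ignore_id` argument, the sketch below builds the 0/1 indicator for a toy panel in which one legislator only enters in the second year. The data, the helper column `first_year`, and all ID values are hypothetical. The `id_make` call is guarded so that the block also runs where idealstan is not installed.

```r
# Hypothetical panel: 2 legislators, 4 bills spread over 2 years.
items <- data.frame(
  item_id = paste0("bill_", 1:4),
  time_id = c(2015L, 2015L, 2016L, 2016L)
)
persons <- data.frame(
  person_id  = c("leg_A", "leg_B"),
  first_year = c(2015L, 2016L)   # hypothetical: leg_B enters in 2016
)

# No shared column names, so merge() returns every person-bill combination.
panel <- merge(persons, items)

# Toy votes (1 = yes, 0 = no), set to NA before a legislator has entered.
panel$outcome_disc <- as.numeric(panel$item_id %in% c("bill_1", "bill_4"))
panel$outcome_disc[panel$time_id < panel$first_year] <- NA

# 1 = include, 0 = remove from estimation (coding as described for ignore_id).
panel$ignore_id <- as.integer(panel$time_id >= panel$first_year)

if (requireNamespace("idealstan", quietly = TRUE)) {
  panel_data <- idealstan::id_make(
    score_data = panel,
    time_id    = "time_id",
    ignore_id  = "ignore_id"
  )
}
```

Rows with `ignore_id` equal to 0 are meant to be dropped rather than modeled as missing, which differs from simply leaving `NA` in the outcome column.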

Details

This function accepts a long data frame where one row equals one item-person (bill-legislator) observation with associated continuous or discrete outcomes/responses. Either name the columns as id_make expects by default (for example, person_id for person IDs and item_id for item IDs), or pass the name of each column to the corresponding id_make argument (see Examples). The only required columns are the item/bill ID and the person/legislator ID along with an outcome column: outcome_disc for discrete variables and outcome_cont for continuous variables. If both outcome columns are included, rows with a value in outcome_cont can hold any value in outcome_disc, and vice versa.

If items of multiple types are included, a column model_id must be included with the model type (see id_estimate function documentation for list of model IDs) for the response distribution, such as 1 for binary non-inflated, etc. If an ordinal outcome is included, an additional column ordered_id must be included that has the total count of categories for that ordinal variable (i.e., 3 for 3 categories).
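
A minimal sketch of a data frame with mixed item types follows, combining binary and continuous items. All data values and IDs are hypothetical. The code 1 for binary non-inflated items comes from the text above; the code 9 used for the continuous item is an assumption and should be checked against the id_estimate documentation. The `id_make` call is guarded so the block runs without idealstan installed.

```r
# Hypothetical toy data: two binary items and one continuous item for 3 persons.
binary_rows <- data.frame(
  person_id    = rep(c("p1", "p2", "p3"), times = 2),
  item_id      = rep(c("vote_1", "vote_2"), each = 3),
  outcome_disc = c(1, 0, 1, 0, 0, 1),
  outcome_cont = 0,    # placeholder; any value is allowed on discrete rows
  model_id     = 1L    # 1 = binary non-inflated (from the text above)
)

continuous_rows <- data.frame(
  person_id    = c("p1", "p2", "p3"),
  item_id      = "score_1",
  outcome_disc = 0,    # placeholder; any value is allowed on continuous rows
  outcome_cont = c(-0.4, 1.2, 0.3),
  model_id     = 9L    # ASSUMPTION: code for a continuous model; verify in id_estimate docs
)

# Stack both item types into one long data frame.
mixed <- rbind(binary_rows, continuous_rows)

if (requireNamespace("idealstan", quietly = TRUE)) {
  mixed_data <- idealstan::id_make(
    score_data   = mixed,
    outcome_disc = "outcome_disc",
    outcome_cont = "outcome_cont",
    model_id     = "model_id"
  )
}
```

An ordinal item would additionally need the ordered_id column holding its number of categories, which this sketch leaves out.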

For discrete data, it is recommended to include a numeric variable that starts at 0, such as values of 0 and 1 for binary data and 0,1,2 for ordinal/categorical data. For continuous (unbounded) data, it is recommended to standardize the outcome to improve model convergence and fit.

Missing data should be passed as NA values in either outcome_disc or outcome_cont and will be processed internally.
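
The long format described above can be sketched from a small wide vote matrix using base R. The matrix and its labels are hypothetical toy data. Votes are coded from 0, absences are left as NA, and the columns use the default names so that no column arguments need to be passed. The `id_make` call is guarded so the block runs without idealstan installed.

```r
# Hypothetical toy data: 3 legislators voting on 4 bills
# (1 = yes, 0 = no, NA = missing).
votes_wide <- matrix(
  c(1,  0, 1, NA,
    0,  0, 1,  1,
    1, NA, 0,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("leg_A", "leg_B", "leg_C"),
                  c("bill_1", "bill_2", "bill_3", "bill_4"))
)

# Reshape to long form: one row per legislator-bill pair.
# as.vector() reads the matrix column by column, which matches the
# rep() patterns used for the two ID columns.
votes_long <- data.frame(
  person_id    = rep(rownames(votes_wide), times = ncol(votes_wide)),
  item_id      = rep(colnames(votes_wide), each = nrow(votes_wide)),
  outcome_disc = as.vector(votes_wide),
  stringsAsFactors = FALSE
)

if (requireNamespace("idealstan", quietly = TRUE)) {
  toy_data <- idealstan::id_make(score_data = votes_long)
}
```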

Value

An idealdata object that can then be used in the id_estimate function to fit a model.

Time-Varying Models

To run a time-varying model, you need to include the name of a column with dates (or integers) that is passed to the time_id option.
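
As a sketch, dates stored as character strings can be converted to the Date class before the column name is passed to time_id. The data frame and the column names `vote_date` and `vote_year` are hypothetical. The `id_make` call is guarded so the block runs without idealstan installed.

```r
# Hypothetical toy data with dates stored as character strings.
votes <- data.frame(
  person_id    = c("leg_A", "leg_B", "leg_A", "leg_B"),
  item_id      = c("bill_1", "bill_1", "bill_2", "bill_2"),
  outcome_disc = c(1, 0, 0, 0),
  vote_date    = c("2015-01-08", "2015-01-08", "2015-03-12", "2015-03-12"),
  stringsAsFactors = FALSE
)

# Convert to Date class so the time column is not treated as plain text.
votes$vote_date <- as.Date(votes$vote_date, format = "%Y-%m-%d")

# Alternative: an integer time index such as the year in whole numbers.
votes$vote_year <- as.integer(format(votes$vote_date, "%Y"))

if (requireNamespace("idealstan", quietly = TRUE)) {
  time_data <- idealstan::id_make(score_data = votes, time_id = "vote_date")
}
```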

Continuous Outcomes

If the outcome is continuous, you need to pass a data frame with one column named "outcome_cont" or pass the name of the column with the continuous data to the outcome_cont argument.
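
A minimal sketch for a continuous outcome follows, assuming hypothetical toy data with a column named `rating`. The outcome is standardized first, as recommended in Details, and the name of the standardized column is passed to the outcome_cont argument. The `id_make` call is guarded so the block runs without idealstan installed.

```r
# Hypothetical toy data: 3 persons rated on 2 continuous items.
scores <- data.frame(
  person_id = rep(c("p1", "p2", "p3"), times = 2),
  item_id   = rep(c("q1", "q2"), each = 3),
  rating    = c(12.5, 30.1, 18.4, 22.0, 27.3, 15.8)
)

# Standardize to mean 0 and standard deviation 1; scale() returns a
# one-column matrix, so convert it back to a plain numeric vector.
scores$rating_std <- as.numeric(scale(scores$rating))

if (requireNamespace("idealstan", quietly = TRUE)) {
  cont_data <- idealstan::id_make(
    score_data   = scores,
    outcome_cont = "rating_std"
  )
}
```

Standardizing across all items at once is only one possible choice; standardizing within each item may suit data where items are measured on different scales.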

Hierarchical Covariates

Covariates can be fit on the person-level ideal point parameters as well as item discrimination parameters for either the inflated (missing) or non-inflated (observed) models. These covariates must be columns that were included with the data fed to the id_make function. The covariate relationships are specified as one-sided formulas, i.e. ~cov1 + cov2 + cov1*cov2. To interact covariates with the person-level ideal points you can use ~cov1 + person_id + cov1*person_id, and for group-level ideal points you can use ~cov1 + group_id + cov1*group_id, where group_id or person_id is the same name as the name of the column for these options that you passed to id_make (i.e., the names of the columns in the original data). If you are also going to model these intercepts (i.e., you are interacting the covariate with person_id and the model is estimating ideal points at the person level), then set remove_cov_int to TRUE to avoid multicollinearity with the ideal point intercepts.
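
The formula syntax can be sketched with a hypothetical person-level covariate named `seniority` interacted with the group column. The toy data and all ID values are made up. Inspecting the formula with base R's `terms()` shows which terms it expands to before it is handed to the person_cov argument. The `id_make` call is guarded so the block runs without idealstan installed.

```r
# One-sided formula: covariate, group ID, and their interaction.
person_formula <- ~ seniority + group_id + seniority*group_id

# Terms R derives from the formula (main effects, then the interaction).
term_labels <- attr(terms(person_formula), "term.labels")
print(term_labels)

# Hypothetical toy data: 4 legislators in 2 parties voting on 3 bills.
persons <- data.frame(
  person_id = c("leg_A", "leg_B", "leg_C", "leg_D"),
  group_id  = c("party_X", "party_X", "party_Y", "party_Y"),
  seniority = c(2, 10, 4, 7)
)
# No shared column names, so merge() returns every person-bill combination.
votes <- merge(persons, data.frame(item_id = c("bill_1", "bill_2", "bill_3")))
votes$outcome_disc <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1)

if (requireNamespace("idealstan", quietly = TRUE)) {
  cov_data <- idealstan::id_make(
    score_data = votes,
    person_cov = person_formula
    # remove_cov_int = TRUE may be needed if ideal points are also
    # estimated at the group level (see the text above).
  )
}
```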

Examples

library(idealstan)

# You can use either a long data frame, with one row per
# person/legislator and item/bill observation,
# or a pscl rollcall object

library(dplyr)

# First, using a rollcall object with the 114th Senate's rollcall votes:

data('senate114')

to_idealstan <- id_make(score_data = senate114,
                        outcome_disc = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date')