library(idealstan)
# You can either use a pscl rollcall object or a vote/score matrix
# where persons/legislators are in the rows
# and items/bills are in the columns
library(dplyr)
# First, using a rollcall object with the 114th Senate's rollcall votes:
data('senate114')
to_idealstan <- id_make(score_data = senate114,
                        outcome_disc = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date')
Create data to run IRT model
Description
To run an IRT model using idealstan, you must first process your data using the id_make function.
Usage
id_make(
score_data = NULL,
outcome_disc = "outcome_disc",
outcome_cont = "outcome_cont",
person_id = "person_id",
item_id = "item_id",
time_id = "time_id",
group_id = "group_id",
model_id = "model_id",
ordered_id = "ordered_id",
ignore_id = "ignore_id",
simul_data = NULL,
person_cov = NULL,
item_cov = NULL,
item_cov_miss = NULL,
remove_cov_int = FALSE,
unbounded = FALSE,
exclude_level = NA,
simulation = FALSE
)
Arguments
score_data
A data frame in long form, i.e., one row for each measured score or vote, or a rollcall data object from package pscl.
outcome_disc
Column name of the outcome with discrete values in score_data; default is "outcome_disc".
outcome_cont
Column name of the outcome with continuous values in score_data; default is "outcome_cont".
person_id
Column name of the person/legislator ID index in score_data; default is "person_id". Should be integer, character or factor.
item_id
Column name of the item/bill ID index in score_data; default is "item_id". Should be integer, character or factor.
time_id
Column name of the time values in score_data. Optional; default is "time_id". Should be a date or date-time class, but can be an integer (e.g., years in whole numbers).
group_id
Optional column name of the person/legislator group IDs (e.g., parties) in score_data; default is "group_id". Should be integer, character or factor.
model_id
Column name of the model/response types in the data; default is "model_id". Only necessary for a model with multiple response types (e.g., binary + continuous outcomes). Must be a column of integers matching the model types in id_estimate, showing which row of the data matches which outcome.
ordered_id
Column name of the variable showing the count of categories for ordinal/categorical items (must be at least 3 categories).
ignore_id
Optional column for identifying observations that should not be modeled (i.e., not just treated as missing, but removed during estimation). Should be a binary vector (0 for remove and 1 for include). Useful for time-varying models where persons may not be present during particular periods and missing data is ignorable.
simul_data
Optionally, data that has been generated by the id_sim_gen function.
person_cov
A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the person/legislator ideal points.
item_cov
A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the item/bill discrimination parameters for the regular model.
item_cov_miss
A one-sided formula that specifies the covariates in the dataset that will be used to hierarchically model the item/bill discrimination parameters for the missing data model.
remove_cov_int
Whether to remove constituent terms from hierarchical covariates that interact covariates with IDs like person_id or item_id. Set to TRUE if including these constituent terms would cause multicollinearity with other terms in the model (such as running a group-level model with a group-level interaction or a person-level model with a person-level interaction).
unbounded
Whether or not the outcome/response is unbounded (i.e., continuous or Poisson). If it is, missing values are recoded as the maximum of the outcome + 1.
exclude_level
A vector of any values that should be treated as NA in the response matrix. Unlike missing values, these values will be dropped from the data before estimation rather than modeled explicitly.
simulation
If TRUE, simulated values are saved in the idealdata object for later plotting with the id_plot_sims function.
Details
This function accepts a long data frame where one row equals one item-person (bill-legislator) observation with associated continuous or discrete outcomes/responses. You either need to include columns with the specific names required by the id_make function, such as person_id for person IDs and item_id for item IDs, or pass the names of the columns containing the IDs to the corresponding id_make arguments (see examples). The only required columns are the item/bill ID and the person/legislator ID along with an outcome column: outcome_disc for discrete variables and outcome_cont for continuous variables. If both columns are included, outcome_disc can hold any value in rows that have a value for outcome_cont, and vice versa.
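As a minimal sketch of the long layout described above, the toy data frame below holds one row per person-item observation. The column names legislator, bill and vote and all values are invented for illustration; the id_make call is skipped if idealstan is not installed.

```r
# Toy long-format data: one row per person-item observation.
# All names and values here are made up for illustration.
votes <- data.frame(
  legislator = rep(c("A", "B", "C"), each = 2),
  bill       = rep(c("bill1", "bill2"), times = 3),
  vote       = c(1, 0, 1, 1, 0, NA),  # NA marks a missing vote
  stringsAsFactors = FALSE
)

# Option 1: rename the columns to the default names id_make looks for.
votes_default <- setNames(votes, c("person_id", "item_id", "outcome_disc"))

# Option 2: keep the original names and pass them as arguments.
if (requireNamespace("idealstan", quietly = TRUE)) {
  toy_data <- idealstan::id_make(
    score_data   = votes,
    outcome_disc = "vote",
    person_id    = "legislator",
    item_id      = "bill"
  )
}
```

Either option should give id_make the three required columns; which one is more convenient depends on whether the original column names are needed later.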
If items of multiple types are included, a column model_id must be included with the model type (see id_estimate function documentation for list of model IDs) for the response distribution, such as 1 for binary non-inflated, etc. If an ordinal outcome is included, an additional column ordered_id must be included that has the total count of categories for that ordinal variable (e.g., 3 for 3 categories).
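The sketch below shows one way the model_id and ordered_id columns might be built for data that mixes a binary item with a 3-category ordinal item. The item names are invented, and the integer used for the ordinal model (ordinal_code) is an assumption that should be checked against the id_estimate documentation; only the code 1 for binary non-inflated comes from the text above.

```r
# Hypothetical mixed data: one binary item and one 3-category ordinal item.
scores <- data.frame(
  person_id    = rep(c("A", "B", "C", "D"), times = 2),
  item_id      = rep(c("vote1", "survey1"), each = 4),
  outcome_disc = c(0, 1, 1, 0,   0, 1, 2, 2),
  stringsAsFactors = FALSE
)

# ASSUMPTION: placeholder code for an ordinal model; look up the real
# value in the id_estimate documentation before using it.
ordinal_code <- 3L

# 1 = binary non-inflated (as stated in the text); ordinal rows get ordinal_code.
scores$model_id <- ifelse(scores$item_id == "vote1", 1L, ordinal_code)

# Count the distinct observed categories per item. Only the values on the
# ordinal rows matter here. Note this undercounts if a category never
# appears in the data, so set the count by hand in that case.
scores$ordered_id <- ave(
  scores$outcome_disc, scores$item_id,
  FUN = function(x) length(unique(x[!is.na(x)]))
)
```

What the package expects in ordered_id for non-ordinal rows is not stated in this page, so the value computed for the binary item should be treated as a placeholder.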
For discrete data, it is recommended to include a numeric variable that starts at 0, such as values of 0 and 1 for binary data and 0,1,2 for ordinal/categorical data. For continuous (unbounded) data, it is recommended to standardize the outcome to improve model convergence and fit.
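The recoding recommended above can be sketched in base R as follows. The column names and values are invented for illustration.

```r
# Hypothetical raw responses before recoding.
raw <- data.frame(
  vote  = c("Nay", "Yea", "Yea", "Nay"),  # binary, stored as text
  agree = c(1, 2, 3, 2),                  # ordinal, coded 1-3
  score = c(10, 20, 30, 40),              # continuous
  stringsAsFactors = FALSE
)

# Binary: map the labels to 0/1 with an explicit level order.
raw$vote_disc <- as.integer(factor(raw$vote, levels = c("Nay", "Yea"))) - 1L

# Ordinal: shift so the lowest category is 0.
raw$agree_disc <- raw$agree - min(raw$agree, na.rm = TRUE)

# Continuous: standardize to mean 0 and standard deviation 1.
raw$score_std <- as.numeric(scale(raw$score))
```

Setting the factor levels explicitly avoids relying on alphabetical order to decide which label becomes 0.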
Missing data should be passed as NA values in either outcome_disc or outcome_cont and will be processed internally.
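If the raw data marks missing responses with a sentinel value rather than NA, it can be recoded before calling id_make. The sentinel value 9 below is a made-up example, not a convention of the package.

```r
# Hypothetical data where 9 was used to mark an absent legislator.
votes <- data.frame(
  person_id    = rep(c("A", "B", "C"), each = 2),
  item_id      = rep(c("bill1", "bill2"), times = 3),
  outcome_disc = c(1, 0, 9, 1, 9, 0),
  stringsAsFactors = FALSE
)

absent_code <- 9  # assumed sentinel; adjust to your own coding scheme

# Replace the sentinel with NA so the missing responses are passed as NA.
votes$outcome_disc[votes$outcome_disc %in% absent_code] <- NA
```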
Value
An idealdata object that can then be used in the id_estimate function to fit a model.
Time-Varying Models
To run a time-varying model, you need to include the name of a column with dates (or integers) that is passed to the time_id option.
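A minimal sketch of preparing a date column for time_id, assuming the dates arrive as text. The column name vote_date and the values are invented; the id_make call is skipped if idealstan is not installed.

```r
# Hypothetical votes with dates stored as character strings.
votes <- data.frame(
  person_id    = rep(c("A", "B"), times = 3),
  item_id      = rep(c("bill1", "bill2", "bill3"), each = 2),
  outcome_disc = c(1, 0, 1, 1, 0, 1),
  vote_date    = rep(c("2015-01-08", "2015-03-12", "2015-06-02"), each = 2),
  stringsAsFactors = FALSE
)

# Convert to Date class so the time points are ordered correctly.
votes$vote_date <- as.Date(votes$vote_date)

if (requireNamespace("idealstan", quietly = TRUE)) {
  time_data <- idealstan::id_make(
    score_data = votes,
    time_id    = "vote_date"  # other columns already use the default names
  )
}
```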
Continuous Outcomes
If the outcome is continuous, you need to pass a data frame with one column named "outcome_cont" or pass the name of the column with the continuous data to the outcome_cont argument.
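Following the Usage signature, where continuous outcomes go to the outcome_cont argument, a call might look like the sketch below. The column name rating and the values are invented; the id_make call is skipped if idealstan is not installed.

```r
# Hypothetical continuous ratings in long form.
ratings <- data.frame(
  person_id = rep(c("A", "B", "C"), each = 2),
  item_id   = rep(c("item1", "item2"), times = 3),
  rating    = c(-0.4, 1.2, 0.3, -1.1, 0.8, NA),  # NA marks a missing rating
  stringsAsFactors = FALSE
)

if (requireNamespace("idealstan", quietly = TRUE)) {
  cont_data <- idealstan::id_make(
    score_data   = ratings,
    outcome_cont = "rating"  # name of the column with the continuous outcome
  )
}
```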
Hierarchical Covariates
Covariates can be fit on the person-level ideal point parameters as well as on the item discrimination parameters for either the inflated (missing) or non-inflated (observed) models. These covariates must be columns included in the data passed to the id_make function. The covariate relationships are specified as one-sided formulas, e.g., ~cov1 + cov2 + cov1*cov2. To interact covariates with the person-level ideal points you can use ~cov1 + person_id + cov1*person_id, and for group-level ideal points you can use ~cov1 + group_id + cov1*group_id, where group_id or person_id is the name of the column that you passed to id_make for that option (i.e., the name of the column in the original data). If the model is also estimating these intercepts (i.e., you are interacting the covariate with person_id and the model is estimating ideal points at the person level), set remove_cov_int to TRUE to avoid multicollinearity with the ideal point intercepts.
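The formulas above are ordinary R one-sided formulas, so their expanded terms can be inspected with base R before they are handed to id_make. The sketch below uses the placeholder names cov1 and cov2 from the text and an invented toy data frame; the id_make call is skipped if idealstan is not installed, and whether it runs cleanly on data this small is untested.

```r
# One-sided formulas: no response on the left of `~`.
f_main   <- ~ cov1 + cov2 + cov1*cov2
f_person <- ~ cov1 + person_id + cov1*person_id

# Inspect how R expands each formula into main effects and interactions.
main_terms   <- attr(terms(f_main), "term.labels")
person_terms <- attr(terms(f_person), "term.labels")
print(main_terms)    # "cov1" "cov2" "cov1:cov2"
print(person_terms)  # "cov1" "person_id" "cov1:person_id"

# Hypothetical data with a covariate column named cov1.
toy <- data.frame(
  person_id    = rep(c("A", "B", "C"), each = 4),
  item_id      = rep(paste0("bill", 1:4), times = 3),
  outcome_disc = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1),
  cov1         = rep(c(0.5, -1.0, 0.5, 1.5), times = 3),
  stringsAsFactors = FALSE
)

if (requireNamespace("idealstan", quietly = TRUE)) {
  cov_data <- idealstan::id_make(
    score_data     = toy,
    person_cov     = f_person,
    remove_cov_int = TRUE  # person-level ideal points are also estimated
  )
}
```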