1. Usefulness of the DETECT program for assessing the internal structure of dimensionality in simulated data and results of the Korean nursing licensing examination.
Dong Gi SEO ; Younyoung CHOI ; Sun HUH
Journal of Educational Evaluation for Health Professions 2017;14(1):32-
PURPOSE: The dimensionality of an examination provides empirical evidence of the internal test structure underlying the responses to a set of items. In turn, the internal structure is an important piece of evidence of the validity of an examination. Thus, the aim of this study was to investigate the performance of the DETECT program and to use it to examine the internal structure of the Korean nursing licensing examination. METHODS: Non-parametric methods of dimensionality testing, such as the DETECT program, have been proposed as ways of overcoming the limitations of traditional parametric methods. The DETECT program was evaluated using simulated data under several conditions and then applied to the Korean nursing licensing examination. RESULTS: The DETECT program performed well in determining the number of underlying dimensions in the simulated data under several different conditions. Furthermore, the DETECT program correctly revealed the internal structure of the Korean nursing licensing examination, in that it detected the proper number of dimensions and appropriately clustered the items within each dimension. CONCLUSION: The DETECT program performed well in detecting the number of dimensions and in assigning items to each dimension. This result implies that the DETECT method can be useful for examining the internal structure of assessments, such as licensing examinations, that encompass a relatively large number of domains and content areas.
Keywords: Korea; Licensure*; Methods; Nursing*
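The DETECT index underlying this entry's analysis can be illustrated compactly. The sketch below (Python/NumPy) is a simplified illustration, not the DETECT program itself: it estimates pairwise item covariances conditional on the rest score and signs them by cluster membership; all names are illustrative.

```python
import numpy as np
from itertools import combinations

def detect_index(X, cluster):
    """Simplified DETECT-style index for a persons-x-items 0/1 matrix X and a
    candidate partition `cluster` (one cluster label per item). Covariances are
    estimated conditional on the rest score, then signed +1 for within-cluster
    pairs and -1 for between-cluster pairs."""
    n_items = X.shape[1]
    total = 0.0
    for i, j in combinations(range(n_items), 2):
        rest = X.sum(axis=1) - X[:, i] - X[:, j]      # score on remaining items
        ccov, n_used = 0.0, 0
        for s in np.unique(rest):
            grp = rest == s
            if grp.sum() > 1:                         # need >= 2 examinees per group
                ccov += grp.sum() * np.cov(X[grp, i], X[grp, j])[0, 1]
                n_used += grp.sum()
        if n_used:
            ccov /= n_used
        sign = 1.0 if cluster[i] == cluster[j] else -1.0
        total += sign * ccov
    # scaled by 100 and averaged over item pairs, as in the DETECT literature
    return 100.0 * total / (n_items * (n_items - 1) / 2)
```

DETECT searches over candidate partitions for the one maximizing this index; values near zero suggest essential unidimensionality.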
2. Overview and current management of computerized adaptive testing in licensing/certification examinations.
Journal of Educational Evaluation for Health Professions 2017;14(1):17-
Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This article reviews CAT and its current management with the goal of supporting its introduction into medical and health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and the items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specifications for the final CAT in terms of the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: the item bank, the starting item, the item selection rule, the scoring procedure, and the termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and of deploying the computer technology required to build an item bank. In conclusion, to ensure more accurate estimation of examinees' abilities, CAT may be a good option for national licensing examinations, and measurement theory can support its implementation in high-stakes examinations.
Keywords: Animals; Cats; Certification; Emergency Medical Technicians; Humans; Licensure; Psychometrics; United States
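As a concrete illustration of the 5 algorithm components listed in this entry, a minimal sketch of one CAT administration under the 2PL model follows (Python/NumPy). The starting value of 0, maximum-information selection, EAP scoring, and SE-based stopping rule are generic textbook choices, not the algorithm of any particular licensing examination; `answer` is a hypothetical callback returning the examinee's 0/1 response.

```python
import numpy as np

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori (EAP) ability estimate with a standard normal prior."""
    posterior = np.exp(-grid**2 / 2)                 # unnormalized N(0, 1) prior
    for x, ai, bi in zip(responses, a, b):
        p = p2pl(grid, ai, bi)
        posterior *= p**x * (1 - p)**(1 - x)         # multiply in each item's likelihood
    posterior /= posterior.sum()
    theta = (grid * posterior).sum()
    se = np.sqrt(((grid - theta)**2 * posterior).sum())
    return theta, se

def run_cat(bank_a, bank_b, answer, max_items=30, se_stop=0.30):
    """One CAT administration: item bank (bank_a, bank_b), starting item at
    theta = 0, maximum Fisher information selection, EAP scoring, and
    termination on the SE criterion or the maximum test length."""
    used, responses = [], []
    theta, se = 0.0, np.inf
    while len(used) < max_items and se > se_stop:
        p = p2pl(theta, bank_a, bank_b)
        info = bank_a**2 * p * (1 - p)               # 2PL Fisher information
        info[used] = -np.inf                         # never readminister an item
        item = int(np.argmax(info))
        used.append(item)
        responses.append(answer(item))
        theta, se = eap(responses, bank_a[used], bank_b[used])
    return theta, se, used
```

Each of the 5 components maps onto one line of the loop, which is what makes the specification step (step 4 above) a matter of choosing among alternatives for each component.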
3. Post-hoc simulation study of computerized adaptive testing for the Korean Medical Licensing Examination
Journal of Educational Evaluation for Health Professions 2018;15(1):14-
PURPOSE: Computerized adaptive testing (CAT) has been adopted in licensing examinations because, as many studies have shown, it improves the efficiency and accuracy of testing. This simulation study investigated CAT scoring and item selection methods for the Korean Medical Licensing Examination (KMLE). METHODS: This study used a post-hoc (real-data) simulation design. The item bank included all items from the January 2017 KMLE. All CAT algorithms were implemented using the ‘catR’ package in R. RESULTS: In terms of accuracy, the Rasch and 2-parameter logistic (2PL) models performed better than the 3PL model. The ‘modal a posteriori’ and ‘expected a posteriori’ methods provided more accurate estimates than maximum likelihood estimation or weighted likelihood estimation. Furthermore, maximum posterior weighted information and minimum expected posterior variance performed better than other item selection methods. In terms of efficiency, the Rasch model is recommended to reduce the test length. CONCLUSION: Before implementing live CAT, a simulation study should be performed under varied test conditions, and the specific scoring and item selection methods should be determined in advance based on its results.
Keywords: Animals; Cats; Korea; Licensure; Logistic Models; Research Design
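The post-hoc design used in this entry can be sketched as follows: each examinee's recorded responses are replayed through a CAT item selection rule, and the adaptive ability estimate is then compared with the full-test estimate. The sketch below (Python/NumPy) uses the Rasch model with maximum-information selection and a crude, clipped Newton-Raphson ML estimator; it illustrates the design only and is not the ‘catR’ implementation used in the study.

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def ml_theta(x, b, iters=20):
    """Newton-Raphson ML ability estimate, clipped to [-4, 4] to avoid
    divergence for all-correct or all-incorrect response patterns."""
    theta = 0.0
    for _ in range(iters):
        p = rasch_p(theta, b)
        grad = np.sum(x - p)                # score function
        hess = -np.sum(p * (1.0 - p))       # second derivative of log-likelihood
        theta = float(np.clip(theta - grad / hess, -4.0, 4.0))
    return theta

def posthoc_cat(x_person, b, test_length=40):
    """Replay one examinee's observed 0/1 responses x_person through a
    maximum-information Rasch CAT over a bank with difficulties b."""
    used, theta = [], 0.0
    for _ in range(test_length):
        info = rasch_p(theta, b) * (1.0 - rasch_p(theta, b))
        info[used] = -np.inf                # never reuse an item
        item = int(np.argmax(info))
        used.append(item)
        theta = ml_theta(x_person[used], b[used])
    return theta, used
```

Comparing the returned theta with the estimate from all items yields accuracy results of the kind reported above; swapping in EAP scoring or other selection rules reproduces the study's other conditions.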
4. The accuracy and consistency of mastery for each content domain using the Rasch and deterministic inputs, noisy “and” gate diagnostic classification models: a simulation study and a real-world analysis using data from the Korean Medical Licensing Examination
Journal of Educational Evaluation for Health Professions 2021;18(1):15-
Purpose:
Diagnostic classification models (DCMs) were developed to identify mastery or non-mastery of the attributes required for solving test items, but their application has been limited to very low-level attributes, and the accuracy and consistency of high-level attributes estimated using DCMs have rarely been reported in comparison with classical test theory (CTT) and item response theory models. This paper compared the accuracy of high-level attribute mastery between the deterministic inputs, noisy “and” gate (DINA) model and the Rasch model, along with sub-scores based on CTT.
Methods:
First, a simulation study explored the effects of attribute length (number of items per attribute) and the correlations among attributes with respect to the accuracy of mastery. Second, a real-data study examined model and item fit and investigated the consistency of mastery for each attribute among the 3 models using the 2017 Korean Medical Licensing Examination with 360 items.
Results:
Accuracy of mastery increased with a higher number of items measuring each attribute across all conditions. The DINA model was more accurate than the CTT and Rasch models for attributes with high correlations (>0.5) and few items. In the real-data analysis, the DINA and Rasch models generally showed better item fits and appropriate model fit. The consistency of mastery between the Rasch and DINA models ranged from 0.541 to 0.633 and the correlations of person attribute scores between the Rasch and DINA models ranged from 0.579 to 0.786.
Conclusion:
Although all 3 models provide a mastery decision for each examinee, the individual mastery profile from the DINA model provides more accurate decisions for highly correlated attributes than the CTT and Rasch models. Unlike those models, the DINA model can also be applied directly to tests with complex structures, and it provides diagnostic information that differs from that of the CTT and Rasch models.
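The DINA model referenced in this entry has a particularly simple item response function: an examinee answers item j correctly with probability 1 - s_j if they master every attribute the Q-matrix requires for that item, and with probability g_j otherwise. A minimal sketch follows (Python/NumPy) with a flat prior over the 2^K attribute patterns; all names are illustrative, and a full estimation routine would also fit the guess/slip parameters rather than take them as given.

```python
import numpy as np
from itertools import product

def dina_prob(alpha, Q, guess, slip):
    """Correct-response probabilities for one attribute pattern.
    alpha: (K,) 0/1 mastery vector; Q: (J, K) 0/1 Q-matrix;
    guess, slip: (J,) item parameters."""
    eta = np.all(alpha >= Q, axis=1)        # all required attributes mastered?
    return np.where(eta, 1.0 - slip, guess)

def dina_posterior(x, Q, guess, slip):
    """Posterior over all 2^K attribute patterns for one 0/1 response
    vector x, assuming a flat prior over patterns."""
    K = Q.shape[1]
    patterns = np.array(list(product([0, 1], repeat=K)))
    likelihood = np.array([
        np.prod(dina_prob(a, Q, guess, slip)**x *
                (1.0 - dina_prob(a, Q, guess, slip))**(1 - x))
        for a in patterns
    ])
    return patterns, likelihood / likelihood.sum()
```

The posterior over patterns is what yields the mastery classifications whose accuracy and consistency are compared across models above.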
5. Validation of the Short Form of Korean-Everyday Cognition (K-ECog)
Minji SONG ; Dong Gi SEO ; Seong Yoon KIM ; Yeonwook KANG
Journal of Korean Medical Science 2023;38(44):e370-
Background:
Evaluating activities of daily living (ADL) is an important factor in diagnosing dementia. The Everyday Cognition (ECog) scale was developed to measure ADL changes that correlate with specific neuropsychological impairments. A short form of the ECog (ECog-12) was also developed, consisting of 12 items, two from each of the six cognitive domains of the ECog. The Korean full version of the ECog (K-ECog) has recently been standardized, but the need for a shortened version has been raised in clinical practice. The purpose of this study was to develop a Korean version of the ECog-12 (K-ECog-12) and to verify its reliability and validity by comparing them with those of the full version of the K-ECog.
Methods:
The participants were 267 cognitively normal older adults (CN), 183 patients with mild cognitive impairment (MCI), and 89 patients with dementia. The Korean-Mini Mental State Examination (K-MMSE), Korean-Montreal Cognitive Assessment (K-MoCA), and Short form of Geriatric Depression Scale (SGDS) were administered to all participants. The K-ECog and Korean-Instrumental Activities of Daily Living (K-IADL) were rated by the informants of patients.
Results:
The K-ECog-12 was newly constructed by replacing one visuospatial function item in the original ECog-12 with another item, based on an item response theory analysis of the Korean data. The internal consistencies (Cronbach’s α) of the K-ECog-12 and K-ECog were 0.95 and 0.99, respectively. The test-retest reliabilities (Pearson’s r) were 0.67 for the K-ECog-12 and 0.73 for the K-ECog. The K-ECog-12 was significantly correlated with the K-ECog, as well as with the K-IADL, K-MMSE, and K-MoCA. The K-ECog-12 scores differed significantly between the CN, MCI, and dementia groups, as did the K-ECog scores. Receiver operating characteristic curve analyses showed that the K-ECog-12, like the K-ECog, could differentiate patients with MCI and dementia from CN participants.
Conclusion:
The K-ECog-12 is as reliable and valid as the K-ECog in assessing ADL. Therefore, the K-ECog-12 can be used as an alternative to the K-ECog in clinical and community settings in Korea.
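For reference, the internal-consistency coefficient reported in this entry (Cronbach's alpha) can be computed directly from an items-in-columns score matrix. A minimal sketch, assuming complete data:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-x-items score matrix with no missing data."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)
```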
6. Estimation of item parameters and examinees’ mastery probability in each domain of the Korean Medical Licensing Examination using a deterministic inputs, noisy “and” gate (DINA) model
Journal of Educational Evaluation for Health Professions 2020;17(1):35-
Purpose:
The deterministic inputs, noisy “and” gate (DINA) model is a promising statistical method for providing useful diagnostic information about students’ level of achievement, as educators often want to receive diagnostic information on how examinees did on each content strand, which is referred to as a diagnostic profile. The purpose of this paper was to classify examinees of the Korean Medical Licensing Examination (KMLE) in different content domains using the DINA model.
Methods:
This paper analyzed data from the KMLE, with 360 items and 3,259 examinees. An application study was conducted to estimate examinees’ parameters and item characteristics. The guessing and slipping parameters of each item were estimated, and statistical analysis was conducted using the DINA model.
Results:
The output table presents example items whose estimated guessing and slipping parameters can be used to check item quality. The probability of mastery of each content domain was also estimated, yielding a mastery profile for each examinee. The classification accuracy and consistency for the 8 content domains ranged from 0.849 to 0.972 and from 0.839 to 0.994, respectively. Thus, the classification reliability of the cognitive diagnosis model was very high across the 8 content domains of the KMLE.
Conclusion:
This mastery profile can provide useful diagnostic information for each examinee in terms of each content domain of the KMLE. Individual mastery profiles allow educators and examinees to understand which domain(s) should be improved in order to master all domains in the KMLE. In addition, all items showed reasonable results in terms of item parameters.
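Given a posterior over attribute patterns such as the one sketched after entry 4, the mastery profile described in this entry is just the marginal probability that each attribute (here, content domain) equals 1, with a threshold turning it into a mastery/non-mastery decision. A minimal sketch continuing that earlier DINA example (hypothetical names):

```python
import numpy as np

def mastery_profile(patterns, posterior, threshold=0.5):
    """patterns: (2^K, K) 0/1 attribute patterns; posterior: (2^K,) weights
    summing to 1. Returns the marginal mastery probability per attribute
    and the corresponding 0/1 mastery decision."""
    prob = patterns.T @ posterior           # P(alpha_k = 1 | responses)
    return prob, (prob >= threshold).astype(int)
```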
7. Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination
Dong Gi SEO ; Myeong Gi KIM ; Na Hui KIM ; Hye Sook SHIN ; Hyun Jung KIM
Journal of Educational Evaluation for Health Professions 2018;15(1):26-
PURPOSE: This study aimed to identify the best way of developing equivalent item sets and to propose a stable and effective management plan for periodical licensing examinations. METHODS: Five pre-equated item sets were developed based on the predicted correct answer rate of each item using linear programming. These pre-equated item sets were compared to item sets developed with a random item selection method based on the actual correct answer rate (ACAR) and on difficulty from item response theory (IRT). The results with and without common items were also compared in the same way. The ACAR and IRT difficulty were used to determine whether there was a significant difference between the pre-equating conditions. RESULTS: There was a statistically significant difference in IRT difficulty among the results from the different pre-equated conditions. When the predicted correct answer rate was divided into 2 or 3 difficulty categories, the ACAR and IRT difficulty parameters of the 5 item sets were equally constructed. Comparing the item set conditions with and without common items showed that including common items did not make a significant contribution to the equating of the 5 item sets. CONCLUSION: This study suggests that the linear programming method is applicable for constructing equated item sets that reflect each content area. The best method to construct equated item sets is to divide the predicted correct answer rate into 2 or 3 difficulty categories, regardless of whether common items are included. If pre-equated item sets must be constructed from actual data, several methods should be compared through simulation studies to determine which is optimal before administering a real test.
Keywords: Licensure; Methods; Programming, Linear
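A toy version of the assembly problem described in this entry: assign items to forms so that each form has the required length and the forms' total predicted correct answer rates are nearly equal. The sketch below (Python with scipy.optimize.milp) splits a small pool into 2 forms; the study's 5-form, content-constrained problem adds more variables and constraints of the same shape. All numbers are illustrative.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
p = rng.uniform(0.3, 0.9, size=20)       # predicted correct answer rate per item
n, form_len, tol = len(p), 10, 0.05      # pool size, items per form, tolerance

# x_i = 1 places item i on form A, 0 on form B. Two constraints:
#   sum(x) == form_len, and |p @ x - p @ (1 - x)| <= tol, rewritten as
#   (p.sum() - tol) / 2 <= p @ x <= (p.sum() + tol) / 2.
A = np.vstack([np.ones(n), p])
lower = [form_len, (p.sum() - tol) / 2]
upper = [form_len, (p.sum() + tol) / 2]

res = milp(c=np.zeros(n),                # pure feasibility: any equated split works
           constraints=LinearConstraint(A, lower, upper),
           integrality=np.ones(n),       # binary variables via integrality + bounds
           bounds=Bounds(0, 1))
if res.success:
    form_a = np.flatnonzero(np.round(res.x) == 1)
    print("form A items:", form_a)
    print("difference in predicted rate totals:", abs(2 * p @ res.x - p.sum()))
```

Adding a row per content area to A (with per-form quotas as bounds) yields item sets that also reflect each content area, which is the property the study's method targets.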