1. API Driven On-Demand Participant ID Pseudonymization in Heterogeneous Multi-Study Research
Shorabuddin SYED ; Mahanazuddin SYED ; Hafsa Bareen SYEDA ; Maryam GARZA ; William BENNETT ; Jonathan BONA ; Salma BEGUM ; Ahmad BAGHAL ; Meredith ZOZUS ; Fred PRIOR
Healthcare Informatics Research 2021;27(1):39-47
Objectives:
To facilitate clinical and translational research, imaging and non-imaging clinical data from multiple disparate systems must be aggregated for analysis. Study participant records from various sources are linked to one another and, when possible, to patient records so that research questions can be addressed while patient privacy is preserved. This paper presents a novel tool that pseudonymizes participant identifiers (PIDs) through a researcher-driven automated process that uses an application programming interface (API) and the Perl Open-Source Digital Imaging and Communications in Medicine Archive (POSDA) to further de-identify PIDs. The tool, on-demand cohort and API participant identifier pseudonymization (O-CAPP), selects its pseudonymization method based on the type of incoming research data.
Methods:
For images, PIDs are pseudonymized through API calls that receive the PIDs present in Digital Imaging and Communications in Medicine (DICOM) headers and return the pseudonymized identifiers. For non-imaging clinical research data, PIDs provided by study principal investigators (PIs) are pseudonymized by a nightly automated process. The pseudonymized PIDs (P-PIDs), along with other protected health information, are further de-identified using POSDA.
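The two intake paths described above can be illustrated with a minimal Python sketch. The abstract does not state how O-CAPP derives a P-PID, so everything here is an assumption made for illustration: the keyed-hash (HMAC-SHA256) scheme, the `Pseudonymizer` class, the `P-` prefix, and the two handler functions are hypothetical and are not O-CAPP's actual implementation. The sketch only shows how an on-demand request and a nightly batch could share one mapping so that the same PID always resolves to the same P-PID.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Hypothetical PID -> P-PID mapper (illustrative, not O-CAPP's method)."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._mapping: dict[str, str] = {}  # PID -> P-PID, kept for later validation

    def pseudonymize(self, pid: str) -> str:
        # Reuse an existing mapping so a PID always yields the same P-PID.
        if pid not in self._mapping:
            digest = hmac.new(self._key, pid.encode("utf-8"), hashlib.sha256).hexdigest()
            self._mapping[pid] = "P-" + digest[:12].upper()
        return self._mapping[pid]


def handle_api_request(pseudonymizer: Pseudonymizer, dicom_pid: str) -> str:
    """On-demand path: one PID read from a DICOM header, one P-PID returned."""
    return pseudonymizer.pseudonymize(dicom_pid)


def run_nightly_batch(pseudonymizer: Pseudonymizer, pids: list[str]) -> dict[str, str]:
    """Batch path: PIDs supplied by a study PI are mapped in one pass."""
    return {pid: pseudonymizer.pseudonymize(pid) for pid in pids}


pseudonymizer = Pseudonymizer(secret_key=b"example-key-not-for-production")
batch = run_nightly_batch(pseudonymizer, ["MRN001", "MRN002"])
on_demand = handle_api_request(pseudonymizer, "MRN001")
```

In this sketch `on_demand` equals `batch["MRN001"]`, which is the property that lets imaging and non-imaging records for one participant be linked after pseudonymization.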
Results:
A sample of 250 PIDs pseudonymized by O-CAPP was selected and successfully validated. Of those, 125 PIDs pseudonymized by the nightly automated process were validated by multiple clinical trial investigators (CTIs). For the other 125, CTIs validated radiologic image pseudonymization by API request based on the provided PID and P-PID mappings.
Conclusions:
We developed a novel on-demand pseudonymization process that will aid researchers in obtaining a comprehensive and holistic view of study participant data without compromising patient privacy.
2. Toolkit to Compute Time-Based Elixhauser Comorbidity Indices and Extension to Common Data Models
Shorabuddin SYED ; Ahmad BAGHAL ; Fred PRIOR ; Meredith ZOZUS ; Shaymaa AL-SHUKRI ; Hafsa Bareen SYEDA ; Maryam GARZA ; Salma BEGUM ; Kim GATES ; Mahanazuddin SYED ; Kevin W. SEXTON
Healthcare Informatics Research 2020;26(3):193-200
Objectives:
The time-dependent study of comorbidities provides insight into disease progression and trajectory. We hypothesize that understanding longitudinal disease characteristics can lead to more timely intervention and improve clinical outcomes. As a first step, we developed an efficient and easy-to-install toolkit, the Time-based Elixhauser Comorbidity Index (TECI), which pre-calculates time-based Elixhauser comorbidities and can be extended to common data models (CDMs).
Methods:
A Structured Query Language (SQL)-based toolkit, TECI, was built to pre-calculate time-specific Elixhauser comorbidity indices using data from a clinical data repository (CDR). It was then extended to the Informatics for Integrating Biology and the Bedside (I2B2) and Observational Medical Outcomes Partnership (OMOP) CDMs.
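As a rough illustration of what "time-specific" comorbidity pre-calculation can mean, the following Python sketch counts comorbidity categories per patient per year. It is not the TECI toolkit, which is SQL-based, and it rests on labeled assumptions: the three-entry `CATEGORY_BY_PREFIX` table is a simplified, incomplete stand-in for a validated ICD-10 to Elixhauser mapping, the result is an unweighted category count rather than a weighted index, and a comorbidity is assumed to persist from the first year in which it is recorded.

```python
from collections import defaultdict
from datetime import date

# Illustrative subset only; a validated Elixhauser mapping covers many more
# categories and code ranges than these three ICD-10 prefixes.
CATEGORY_BY_PREFIX = {
    "I50": "congestive_heart_failure",
    "I10": "hypertension_uncomplicated",
    "J44": "chronic_pulmonary_disease",
}


def comorbidity_counts_by_year(diagnoses, years):
    """Return {patient_id: {year: number of categories recorded by that year}}.

    diagnoses: iterable of (patient_id, diagnosis_date, icd10_code) tuples.
    Assumption: once recorded, a comorbidity counts in every later year.
    """
    first_year = defaultdict(dict)  # patient_id -> category -> first year seen
    for patient_id, dx_date, code in diagnoses:
        category = CATEGORY_BY_PREFIX.get(code.replace(".", "")[:3])
        if category is None:
            continue  # code is outside the illustrative mapping
        seen = first_year[patient_id].get(category)
        if seen is None or dx_date.year < seen:
            first_year[patient_id][category] = dx_date.year
    return {
        patient_id: {
            year: sum(1 for first in categories.values() if first <= year)
            for year in years
        }
        for patient_id, categories in first_year.items()
    }


diagnoses = [
    ("A", date(2013, 5, 1), "I10"),
    ("A", date(2016, 2, 3), "I50.9"),
    ("A", date(2017, 1, 1), "Z00.0"),  # not in the mapping, ignored
    ("B", date(2015, 1, 1), "J44.1"),
]
counts = comorbidity_counts_by_year(diagnoses, range(2013, 2018))
changed = sorted(p for p, by_year in counts.items() if len(set(by_year.values())) > 1)
```

Here `changed` lists the patients whose count differs between years, a simplified analogue of identifying patients whose comorbidity scores change over time.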
Results:
At the University of Arkansas for Medical Sciences (UAMS), the TECI toolkit was successfully installed to compute the indices from CDR data, and the scores were integrated into the I2B2 and OMOP CDMs. Comorbidity scores calculated by TECI were validated against scores available in the 2015 quarter 1–3 Nationwide Readmissions Database (NRD) and against scores calculated with a previously validated algorithm on the 2015 quarter 4 NRD. Furthermore, TECI identified 18,846 UAMS patients whose comorbidity scores changed over time (2013 to 2019). Comorbidities for a random sample of patients were independently reviewed, and the results were found to be accurate in all cases.
Conclusions:
TECI facilitates the study of comorbidities within a time-dependent context, allowing better understanding of disease associations and trajectories, which has the potential to improve clinical outcomes.