Construction and application of an integrated scientific research big data platform based on the data lakehouse architecture
10.3760/cma.j.cn111325-20240808-00647
- VernacularTitle:基于湖仓一体架构的科研大数据平台建设与应用
- Author:Linlin WANG 1; Xianying HE; Fangfang CUI; Rui YAN; Jie ZHAO
Author Information
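1. National Engineering Laboratory for Internet Medical Systems and Applications, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052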
- Publication Type:Journal Article
- Keywords:Big data; Data lakehouse; Platform; Clinical research; Data governance
- From:Chinese Journal of Hospital Administration 2025;41(4):317-322
- Country:China
- Language:Chinese
- Abstract:
To integrate clinical, imaging, and omics data scattered across different systems and to effectively support clinical research based on real-world data, a hospital combined Hadoop big data processing technology with distributed parallel database technology to build a data storage and computing system on a data lakehouse architecture. By integrating data from 15 medical information systems, governing the data on the basis of a patient master index, and designing and developing an application platform that covers 8 major functions and combines general scientific research with specialized disease applications, the hospital built an integrated scientific research big data platform. The platform holds 3.3 billion records from 20.26 million patients and 98.57 million visits, and 3 specialized disease databases have been built on it. From January to August 2024, it supported data extraction and analysis for 35 research projects, reducing data retrieval time from the 5-45 workdays required by traditional code-based retrieval to several hours or even minutes, significantly improving the efficiency of clinical research.