<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<id>https://medinform.jmir.org/issue/feed</id>
	<title>JMIR Medical Informatics</title>
			<updated>2024-12-31T10:00:00-05:00</updated>
	
		<author>
		<name>JMIR Publications</name>
				<email>editor@jmir.org</email>
			</author>
		<link rel="alternate" href="https://medinform.jmir.org" />
	<link rel="self" type="application/atom+xml" href="https://medinform.jmir.org/feed/atom" />

	<generator uri="http://pkp.sfu.ca/ojs/" version="2.2.0.0">Open Journal Systems</generator>

				        <rights> This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. </rights>
    	<subtitle>Clinical informatics</subtitle>



	<entry>
		<id> https://medinform.jmir.org/2026/1/e88472 </id>
		<title>The Regional Implementation of an Electronic Health Record–Integrated Ambient Scribe in Primary and Secondary Care in England: Real-Time Qualitative Evaluation</title>
		<updated>2026-05-29T14:45:17-04:00</updated>

					<author>
				<name>Kathrin Cresswell</name>
			</author>
					<author>
				<name>Catharine Rose</name>
			</author>
					<author>
				<name>Jessica Howdle</name>
			</author>
					<author>
				<name>Lucas Martinus Seuren</name>
			</author>
					<author>
				<name>Robin Williams</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e88472" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e88472">Background: There is significant potential for ambient scribe technology to enhance health care productivity, with a growing range of applications being developed and implemented internationally. Strong organizational drivers to improve efficiency, coupled with the technology’s potential to help address clinician burnout, are accelerating interest and adoption. However, limited attention has been paid to the integration of such systems within electronic health records and unintended consequences, as stakeholders navigate their implementation and integration into clinical practice. Objective: This study therefore aimed to explore the processes involved in implementing and adopting ambient scribe technology across diverse health care settings. Methods: We conducted a real-time, longitudinal qualitative evaluation of a pilot implementation of an integrated ambient scribe system across National Health Service primary care and secondary hospital settings within a care system in the Midlands region of England. Data collection involved in-depth one-to-one interviews conducted in several phases: an initial scoping study to identify key interests and stakeholders for inclusion, followed by an implementation study with participants involved in the pilot. We also conducted 16 hours of nonparticipant observations of consultations. The implementation study gathered data at 2 time points, before implementation and 3 to 4 weeks after, to capture experiences, changes, and emerging impacts over time. Results: We collected data from 45 individuals. Use cases varied across settings and shaped how ambient scribe systems were deployed. Differences between general practice and secondary care in documentation purpose, format, and workflow created challenges for developing and validating templates, though the technology showed flexibility across contexts. 
Automated note-taking often improved patient interaction but required clinicians to adjust how they spoke to ensure the technology captured their reasoning. Outputs were sometimes generic, reinforcing defensive documentation and reducing personal tone or contextual recall. Integration pathways carried distinct trade-offs: stand-alone systems (which were used by many stakeholders in our study) were easier to adopt but offered limited long-term benefits, while integrated systems required greater effort and standardization, yet promised improved efficiency and safety. Conclusions: The ambient scribe market remains immature and volatile, creating strategic uncertainty for health systems. Careful procurement approaches are needed to balance the risks and benefits of integrated versus stand-alone systems, while aligning user demand with organizational needs for integration.</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/69fc5ee1e338a0833bb304187da0ebe9" />
		
		<published>2026-05-29T14:45:17-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e99873 </id>
		<title>Correction: Automated ICD-10–Anchored Classification of Primary Care Text Data: Development and Evaluation of a Custom Multilabel Classifier</title>
		<updated>2026-05-29T10:45:16-04:00</updated>

					<author>
				<name>Christina Haag</name>
			</author>
					<author>
				<name>Thomas Grischott</name>
			</author>
					<author>
				<name>Jakob M Burgstaller</name>
			</author>
					<author>
				<name>Stefan Markun</name>
			</author>
					<author>
				<name>Oliver Senn</name>
			</author>
					<author>
				<name>Viktor von Wyl</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e99873" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e99873">Electronic medical records are a vast and valuable source of information, useful for tasks such as estimating disease prevalence. However, in routine primary care, much of this information is in free-text format rather than in a structured form and, therefore, not readily amenable to analysis. Manual coding of this textual data is both time-consuming and resource-intensive, making it impractical for large datasets. Although powerful open-source language models offer new opportunities for automated coding, their use on short heterogeneous primary care notes, particularly in German-language settings, remains insufficiently studied. By providing hands-on guidance for applied health researchers, this study aims to demonstrate the effective and accurate automatic classification of free-text notes using a language model fine-tuned for automated International Statistical Classification of Diseases, Tenth Revision (ICD-10) coding. Building on the extensive Family Medicine Research Using Electronic Medical Records (FIRE) routine database from the Institute of Primary Care at the University Hospital Zurich and the University of Zurich, we trained a large language model–based multilabel classifier on a dataset of 38,728 free-text notes, which had been manually categorized into 47 classes using specific ICD-10 codes and code ranges or nondiagnostic/ad hoc labels (eg, “unclear diagnosis,” “status post”). We stratified the labeled data into training (70%), validation (15%), and posttraining test (15%) sets, ensuring similar label distributions across these sets. Using the Transformers Python library, we trained the model over 10 epochs and evaluated it on the posttraining test dataset. 
Across 48 classes, the FIRE classifier achieved strong performance on the held-out posttraining set, with F1-scores of 0.85 (micro, overall across all predictions), 0.86 (macro, mean of per-class scores treating classes equally), and 0.84 (weighted, per-class scores weighted by class frequency). This study demonstrates steps for training open-source large language models and highlights the potential to streamline and scale the extraction of diagnostic information for practical applications. Our model can be robustly deployed, for example, for prescreening and labeling of free-text information, thus potentially reducing the burden of repetitive and error-prone manual handling.</summary>
		
        
        
		<published>2026-05-29T10:45:16-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e86965 </id>
		<title>Advancing Alzheimer Disease Prediction With Large Language Model–Based Linguistic Feature Analysis: Development and Validation Study</title>
		<updated>2026-05-28T18:00:05-04:00</updated>

					<author>
				<name>Ming-Hsia Hsu</name>
			</author>
					<author>
				<name>San-Yih Hwang</name>
			</author>
					<author>
				<name>Yi-Hang Tsai</name>
			</author>
					<author>
				<name>Yun-Chi Chang</name>
			</author>
					<author>
				<name>Chih-Kuang Liang</name>
			</author>
					<author>
				<name>Chiung-Yun Chang</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e86965" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e86965">Background: Alzheimer disease (AD) is a progressive neurodegenerative disorder with rapidly growing global prevalence. Early detection is critical for timely intervention; yet, conventional diagnostic methods remain costly and invasive. Speech-based assessment has emerged as a noninvasive alternative, as AD characteristically impairs linguistic abilities including fluency, coherence, and informational content. Recent advances in large language models (LLMs) offer new opportunities to extract structured linguistic features from transcribed speech for automated AD classification. However, existing LLM-based approaches often lack transparency and clinical interpretability, limiting their adoption in clinical workflows. Objective: This study aims to investigate the influence of linguistic features extracted from transcribed speech, as analyzed by LLMs, on the accuracy and interpretability of AD prediction. Methods: We propose a framework that leverages LLMs to analyze linguistic features extracted from transcribed speech for AD classification. Our approach focuses on 4 key aspects, including readability, fluency, richness of detail, and keyword relevance. To enhance classification accuracy, the framework integrates transcript embeddings with feature explanation embeddings, forming a comprehensive linguistic representation. We conducted extensive ablation studies to evaluate the contributions of individual features and benchmarked our framework against existing LLM-driven methodologies through pairwise explainability evaluations. Output stability was assessed across 3 independent pipeline runs. A fully local configuration (Llama 3 8B + nomic-embed-text) was tested to evaluate privacy-preserving deployment feasibility. 
Explainability was assessed via LLM-based pairwise comparison (Gemini-3.1-flash-lite) against the method of Bang et al across 54 correctly classified cases and by blinded evaluation from 2 neurologists. Results: The proposed framework achieved a mean precision of 91.52%, a sensitivity of 91.08%, a specificity of 96.29%, and an F1-score of 91.05% across 3 independent runs on the ADReSSo 2021 dataset, outperforming existing LLM-based approaches. A fully local configuration (Llama 3 8B + nomic-embed-text, requiring no cloud application programming interface access) achieved an F1-score of 81.58%, demonstrating framework transferability to privacy-preserving deployment environments. Keyword relevance was the most influential feature (F1-score drop of 13.22 pp when removed). Explainability evaluations showed our method was preferred in 49 out of 54 cases via Gemini-3.1-flash-lite, with human experts preferring our method in 89 of 108 blinded assessments. Conclusions: These findings highlight that a structured linguistic feature analysis using LLMs provides a robust and interpretable framework for preliminary AD detection. Our approach offers a scalable and accessible solution that bridges artificial intelligence–driven text analysis with clinical applications, supporting early detection of cognitive decline through noninvasive assessment methods.</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/c736708ca49c999c1b3bd3049670f2a9" />
		
		<published>2026-05-28T18:00:05-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e76980 </id>
		<title>Blockchain Smart Contracts for Automating Clinical Trials: Systematic Review and Proposed System Architecture</title>
		<updated>2026-05-28T15:30:16-04:00</updated>

					<author>
				<name>Zara Sheikh</name>
			</author>
					<author>
				<name>Gargi Samarth</name>
			</author>
					<author>
				<name>Usman Jaffer</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e76980" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e76980">Background: Blockchain technologies have revolutionized the financial sector through their ability to generate immutable, cryptographically secure records. Clinical trials and health care data possess several synergies with those of the financial sector, specifically pertaining to the importance of tamper-resistant recording of processes. The evolution of blockchain to autonomously execute tasks contingent upon predefined contractual terms via smart contracts (SCs) allows a dynamic chain of interlinked events to unfold independently and in sequence, with time-stamped records. In recent years, mistrust in clinical trial data has grown significantly. Recording the entire clinical trial lifecycle from application, registration, recruitment to conduct, finance management, statistical analysis, and reporting in an immutable, cryptographically secure ledger with SC execution of trial processes could limit the potential for human intervention and tampering. This would produce a time-stamped record of all events within the trial lifecycle. Leveraging the capabilities of SCs could alleviate recruitment challenges and address ongoing concerns regarding data transparency, ownership, and integrity that currently undermine clinical trial processes. Objective: This study aimed to review the existing literature on SC applications in clinical trials and propose a system architecture for using SCs to automate key processes throughout the clinical trial lifecycle. Methods: A systematic search was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, identifying peer-reviewed studies and open-source repositories pertaining to the implementation of SCs in clinical trials. Data were extracted specific to the stage of the trial lifecycle described, SC architecture, and technical specifications for real-world implementation. 
Data were synthesized to propose an architecture for automating clinical trial processes within the lifecycle using SCs. Results: A total of 144 records were screened; 10 studies met the inclusion criteria. Most implementations used private Ethereum-based networks (7/10, 70%). Reported applications included automated patient matching (5/10, 50%), consent management with dynamic permissioning (6/10, 60%), protocol enforcement and time-stamped audit logs (9/10, 90%), adverse event reporting (3/10, 30%), and financial or workflow automation (3/10, 30%). SC-based recruitment systems demonstrated rapid matching performance (eg, 6000 simulated patients matched in 2.13 s). However, all included systems were prototypes or simulations, and none were tested in real-world regulatory settings. Scalability, interoperability limitations, regulatory ambiguity (eg, General Data Protection Regulation right-to-erasure conflicts), and high infrastructural complexity were common gaps noted across studies. Conclusions: Current evidence suggests that SCs can enhance transparency, traceability, and automation throughout the clinical trial lifecycle. However, the literature remains dominated by simulation-based prototypes, primarily Ethereum-dependent architectures, and lacks analyses of cost-effectiveness, governance, and integration with institutional workflows. Future research should evaluate hybrid architectures, develop interoperability standards, and assess regulatory and ethical implications in real deployments.</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/ae45a6aabe5e50e550524935acff2787" />
		
		<published>2026-05-28T15:30:16-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e80527 </id>
		<title>Usability and Usefulness of Machine Learning–Based Clinical Decision Support Software in Primary Care: Survey of Users in a Prospective Observational Study</title>
		<updated>2026-05-27T17:00:04-04:00</updated>

					<author>
				<name>Willem Ernst Herter</name>
			</author>
					<author>
				<name>Janine Khuc</name>
			</author>
					<author>
				<name>Tobias N Bonten</name>
			</author>
					<author>
				<name>Robert A Verheij</name>
			</author>
					<author>
				<name>Mattijs E Numans</name>
			</author>
					<author>
				<name>Niels H Chavannes</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e80527" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e80527">Background: The successful implementation of decision support systems promises to enhance high-quality care. However, the successful implementation of a clinical decision support system (CDSS) depends on user acceptance and adoption. A machine learning (ML)–based CDSS to assist primary care professionals treating urinary tract infections (UTIs) was implemented, and usability and usefulness were assessed through a questionnaire. Objective: This study aimed to assess the system’s usability by examining users’ experiences with the software. A secondary goal was to assess users’ attitudes toward evidence-based practice and innovation in health care. Methods: In collaboration with the Netherlands Institute for Health Services Research (NIVEL) and Leiden University Medical Center (LUMC), Pacmed Ltd developed the CDSS. The cohort was mostly recruited at the care group level; practices within participating care groups were required to participate. Health insurers partly funded the research. Practitioners participated in the implementation study for 4 months. A survey based on the Unified Theory of Acceptance and Use of Technology (UTAUT) was sent to 263 general practitioners and assistants shortly after the implementation period. Furthermore, usage data were analyzed. Results: Of the 34 participating practices that used the software, 30 (88%) submitted at least one survey response, with a mean of 2.23 responses per practice (SD 1.43). The CDSS was used throughout the pilot period, and 31 practices continued using the tool, with 9% dropping out during the first 8 weeks. Sixty-seven percent of respondents trusted the tool’s output, and 73% found it understandable how the algorithm came to predictions. Sixty-five percent of respondents indicated that the information provided was useful in addition to the available guidelines, and 52% agreed that it supported their decision-making. 
However, many respondents were uncertain whether the tool improved patient care (46%) or patient outcomes (66%). Forty-eight percent of respondents found the software easy to integrate into their clinical workflow. Conclusions: The CDSS was perceived as trustworthy and easy to use. However, users were unable to determine whether the CDSS improved patient outcomes. In addition, the CDSS development could have benefited from including assistants as well as general practitioners more in the design phase of the software. Because assistants play an important role in UTI care, designing the software to better fit existing workflows may reduce the perceived time investment associated with using the tool. Finally, respondents reported strong motivation to contribute to further research in this field and indicated willingness to embrace change in health care delivery, which may also reflect selection bias in our sample. Trial Registration: ClinicalTrials.gov NCT04408976; https://clinicaltrials.gov/study/NCT04408976 International Registered Report Identifier (IRRID): RR2-10.2196/27795</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/a16d37a7b2e4bc936fba7ca68e519683" />
		
		<published>2026-05-27T17:00:04-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e84396 </id>
		<title>Multimodal Prediction of Renal Tumor Malignancy From Radiology Reports and Structured Electronic Health Records: Retrospective Cohort Study</title>
		<updated>2026-05-27T16:00:04-04:00</updated>

					<author>
				<name>Zhengkang Fan</name>
			</author>
					<author>
				<name>Renjie Liang</name>
			</author>
					<author>
				<name>Chengkun Sun</name>
			</author>
					<author>
				<name>Jinqian Pan</name>
			</author>
					<author>
				<name>Russell Terry</name>
			</author>
					<author>
				<name>Jie Xu</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e84396" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e84396">&lt;strong&gt;Background:&lt;/strong&gt; Accurate preoperative prediction of renal tumor malignancy is critical for guiding decisions and reducing overtreatment, as a substantial proportion of renal masses prove benign. Although radiology assessments and structured electronic health record (EHR) data are routinely used, many tumor-specific descriptors remain embedded in free-text radiology reports and are underused due to extraction challenges. &lt;strong&gt;Objective:&lt;/strong&gt; This study aimed to develop and evaluate a multimodal pipeline that integrates structured EHR variables with natural language processing features from computed tomography (CT) radiology reports, including large language model (LLM)–extracted abnormality characteristics and transformer-based report embeddings, to improve malignancy prediction. &lt;strong&gt;Methods:&lt;/strong&gt; We conducted a retrospective cohort study using University of Florida Health Integrated Data Repository Observational Medical Outcomes Partnership–mapped EHR data from December 2011 to August 2024. Adults with renal tumors were included if they had longitudinal diagnostic documentation consistent with a renal mass and at least 1 preoperative renal CT report; final benign or malignant status served as the outcome. Structured features included demographics, comorbidities, medications, vital signs, and laboratory measurements. From the recent preindex CT report, an on-premises LLM isolated kidney-specific findings and extracted abnormality characteristics. Four locally deployed LLMs were evaluated against manual annotations of 500 reports. Kidney-specific text was encoded using pretrained biomedical transformer models, including radiology Bidirectional Encoder Representations from Transformers (BERT) variants. We evaluated unimodal baselines and multimodal early, middle, and late fusion strategies. 
Model development used 5-fold cross-validation within the 80% training partition; each fold-specific model was evaluated on the same independent 20% held-out test set, with performance reported as mean and SD across the 5 held-out test evaluations. The primary metric was area under the receiver operating characteristic curve (AUC). &lt;strong&gt;Results:&lt;/strong&gt; The final cohort included 967 patients (n=712, 73.6% malignant). In extraction evaluation, Qwen2.5-32B achieved 88.3% overall accuracy with a 100% extraction success rate and was selected for downstream feature generation. Among unimodal models, the structured clinical variable model achieved an AUC of 0.758 (SD 0.012), kidney-specific text with radiology BERT achieved an AUC of 0.746 (SD 0.058), and abnormality characteristics alone achieved an AUC of 0.716 (SD 0.015). Multimodal fusion models achieved higher descriptive performance than unimodal models. Early fusion achieved the highest AUC (mean 0.813, SD 0.008), and &lt;i&gt;F&lt;/i&gt;&lt;sub&gt;1&lt;/sub&gt;-score (mean 0.809, SD 0.030), while late fusion achieved an AUC of 0.805 (SD 0.016). Ablation and interpretability analyses suggested complementary predictive information from structured clinical variables and kidney-specific text embeddings. &lt;strong&gt;Conclusions:&lt;/strong&gt; Integrating unstructured radiology report text with structured EHR variables achieved higher mean predictive performance than unimodal approaches in descriptive comparisons. Multimodal fusion, particularly early fusion incorporating radiology BERT–derived kidney-specific text embeddings, achieved the strongest discrimination, suggesting potential value of natural language processing–enabled multimodal EHR pipelines for informing preoperative risk stratification. </summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/821d70671243994afedd2661828a33e0" />
		
		<published>2026-05-27T16:00:04-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e73119 </id>
		<title>Association Between Metabolic Clusters and Microbial Age in High-Risk Populations for Diabetes and Their Potential Impact on Cardiovascular Disease Risk: Cross-Sectional Observational Study</title>
		<updated>2026-05-26T17:15:14-04:00</updated>

					<author>
				<name>Lu Xinlin</name>
			</author>
					<author>
				<name>Hongli Gu</name>
			</author>
					<author>
				<name>Ren Li</name>
			</author>
					<author>
				<name>Xianjun Mao</name>
			</author>
					<author>
				<name>Can Chen</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e73119" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e73119">Background: Metabolic multimorbidity is prevalent in high-risk populations for diabetes and is linked to cardiovascular disease (CVD) and gut microbiota composition. The relationship between metabolic clusters (MCs), microbial age (MA), and metabolic markers remains poorly understood. Objective: This study aimed to investigate the characteristics of MCs and MA in high-risk diabetic populations, focusing on their associations with gut microbiota, metabolic dysregulation, and CVD risk. Methods: Using data from the NIH Integrative Human Microbiome Project, we performed metabolomic and microbiomic analyses. K-means clustering identified MCs, and redundancy analysis examined the relationship between metabolic variables and microbiota. A random forest (RF) model predicted MA and CVD risk, while the linear discriminant analysis effect size identified microbial species associated with MCs and MA. Co-occurrence network analysis explored microbial interactions. Results: We included 103 high-risk individuals (56/103, 54.4% female, mean age 50.6, SD 54.6 years). In total, 3 MCs were identified: MC1 (high glucose or blood urea nitrogen), MC2 (relatively healthy), and MC3 (lipid dysregulation). Age explained 3% of gut microbiota variation (R²=0.03; P=.006). The RF model predicting microbial age showed a strong correlation with chronological age in training data (ρ=0.97, root mean square error=3.33; P&lt;.001) and moderate correlation in test data (ρ=0.35; P&lt;.001). High microbial age was associated with elevated lipid markers (low-density lipoprotein and triglycerides; P&lt;.001) and higher cardiovascular risk. The RF model for CVD risk prediction achieved excellent discrimination (area under the curve=0.95 for the low-risk and 0.95 for the high-risk groups). 
Conclusions: This study highlights the relationship between MCs, MA, and gut microbiota, providing insights for early intervention and personalized treatment strategies for diabetes and related metabolic disorders.</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/9313aa4aaa4e4f591352c2aade0e52e3" />
		
		<published>2026-05-26T17:15:14-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e79378 </id>
		<title>Interoperable Integration of a National Rare Disease Registry Into a Rare Eye Disease Data Warehouse: Implementation Study</title>
		<updated>2026-05-26T17:00:18-04:00</updated>

					<author>
				<name>Camille Beluffi Marin</name>
			</author>
					<author>
				<name>Marilyne Oswald</name>
			</author>
					<author>
				<name>Laura Ratenet</name>
			</author>
					<author>
				<name>Matthieu Stoll</name>
			</author>
					<author>
				<name>Kirsley Chennen</name>
			</author>
					<author>
				<name>Hélène Dollfus</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e79378" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e79378">Background: In France, clinical data on rare diseases are primarily collected through BaMaRa (Base Maladies Rares), a software platform used by national expert centers to populate the BNDMR (Banque Nationale de Données Maladies Rares), the French national rare disease data warehouse. BaMaRa ensures standardized and structured data collection across all rare disease networks, with a focus on care coordination and epidemiological reporting. In 2024, FREDD (French Rare Eye Disease Database), a health data warehouse dedicated to rare eye diseases, was developed within the framework of the third French National Rare Disease Plan by the SENSGENE sector. Despite overlapping datasets, there is no native interoperability between BaMaRa and FREDD, requiring the development of a dedicated, traceable pipeline to transform BaMaRa exports into data suitable for inclusion in FREDD. This transformation involves complex business rules to address structural, semantic, and specific differences between the two systems. Objective: This study aims to describe the design and implementation of a robust data transformation pipeline that enables the automated conversion of BaMaRa clinical records into a structured dataset aligned with the FREDD data model. The primary goal is to ensure that the data remain semantically consistent and reusable for the secondary use of health data. Methods: We developed a Python-based application called FREDDEX that integrates several configuration files and encodes the domain-specific business rules required to align BaMaRa data with the FREDD schema. These rules include patient filtering, mapping of variable names and values, management of multisource redundancy, and prevention of overwriting. The system was designed to be modular, auditable, and usable by clinical data managers with minimal technical expertise. 
Results: FREDDEX was tested and validated on a BaMaRa export of 1000 real patients from Strasbourg University Hospital. The tool successfully filtered and created 641 patient profiles in FREDD, with a 99% success rate for attempted imports and full concordance (100%) for directly mapped and inferred variables. Genetic data reconstruction was confirmed on a random sample of 30 patients with genetic information, showing 100% accuracy, and multidiagnostic blocks were correctly handled in all manually reviewed cases. Beyond validation, FREDDEX processed up to 5000 patient records, enabling the rapid onboarding of new clinical sites and significantly reducing manual curation time, while runtime and memory usage demonstrated near-linear scaling. Importantly, the tool established a facilitated reproducible framework adaptable to other rare disease contexts and interoperable with national and European platforms, such as European Reference Network-EYE. Conclusions: This work demonstrates that transforming structured national rare disease registry data into a research-oriented health data warehouse is feasible when clinical business rules are explicitly formalized within an auditable extract-transform-load framework. Beyond the FREDD use case, this approach illustrates how interoperability between care-based and research infrastructures can be operationalized in rare diseases while preserving semantic integrity and regulatory compliance.</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/643b76ed8a900bf71c69d93c9a771699" />
		
		<published>2026-05-26T17:00:18-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e85335 </id>
		<title>Enhancing Early Prediction of Gestational Diabetes Mellitus Through Data Augmentation and Feature Guidance: Model Development and Validation Study</title>
		<updated>2026-05-25T16:00:27-04:00</updated>

					<author>
				<name>Xiekun Chen</name>
			</author>
					<author>
				<name>Zhifa Jiang</name>
			</author>
					<author>
				<name>Dong Su</name>
			</author>
					<author>
				<name>Xiaoping Chen</name>
			</author>
					<author>
				<name>Aiping Chen</name>
			</author>
					<author>
				<name>Zhen Zhang</name>
			</author>
					<author>
				<name>Huabin Wang</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e85335" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e85335">Background: Early prediction of gestational diabetes mellitus (GDM) is critical for improving maternal health outcomes. However, predictive models are often challenged by limited early-pregnancy samples, severe class imbalance in datasets, and complex interrelationships among clinical features. Objective: This study aimed to develop and evaluate a unified dual-dimensional enhancement framework integrating data augmentation and feature engineering. By addressing data imbalance and leveraging medical prior knowledge, this framework significantly improves early GDM prediction performance. Methods: We proposed a framework combining Generative Adversarial Network (GAN)–based data augmentation with large language model–inspired feature engineering. GAN sampling was used to generate clinically plausible synthetic minority class samples to mitigate data imbalance. The large language model was guided to organize features into domains (eg, basic demographics, metabolic syndrome, and core liver biomarkers) and generate higher-order composite features, integrating medical prior knowledge. Machine learning models were subsequently developed, and interpretability analyses were performed using Shapley additive explanations to identify key predictors. Results: This study used a final analytical cohort of 8214 pregnant women, divided into dataset A comprising 966 out of 5251 (18.4%) participants with GDM, and dataset B comprising 598 out of 2963 (20.2%) participants with GDM. The random forest model enhanced by Tabular Variational Autoencoder–based feature augmentation demonstrated the best performance. On the test dataset, it achieved a recall of 0.7559, an accuracy of 0.8444, and an area under the receiver operating characteristic curve (AUROC) of 0.8873. 
Statistical evaluation confirmed that the Tabular Variational Autoencoder method significantly outperformed the baseline (Cohen d=2.894; P&lt;.001) and the Conditional Tabular Generative Adversarial Network method (Cohen d=1.637; P=.02) in recall enhancement. Shapley additive explanations analysis identified the following 5 features as the most influential predictors: fasting blood glucose, the composite feature (fasting blood glucose+triglycerides)×prepregnancy BMI, activated partial thromboplastin time, leukocyte count, and neutrophil count. Conclusions: The proposed dual-dimensional enhancement framework effectively alleviates data limitations and captures complex feature interactions in early GDM prediction. This strategy not only improves model performance, particularly in recall, but also provides interpretable biological evidence to support rapid clinical screening, stratified management, and early intervention in pregnancy.</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/9988b37f853cf5ca75fde2f54f96186b" />
		
		<published>2026-05-25T16:00:27-04:00</published>
	</entry>
	<entry>
		<id> https://medinform.jmir.org/2026/1/e68935 </id>
		<title>A Practical Approach to Assessing the Completeness of Electronic Health Records for Medical Research: Data Quality Study</title>
		<updated>2026-05-21T16:45:15-04:00</updated>

					<author>
				<name>Minsik Lim</name>
			</author>
					<author>
				<name>Doyeon An</name>
			</author>
					<author>
				<name>Nayeong Son</name>
			</author>
					<author>
				<name>Woongsang Sunwoo</name>
			</author>
					<author>
				<name>Suehyun Lee</name>
			</author>
				<link rel="alternate" href="https://medinform.jmir.org/2026/1/e68935" />
					<summary type="html" xml:base="https://medinform.jmir.org/2026/1/e68935">Background: Data quality is the degree to which data are fit for their intended purpose and is described using quality dimensions. The increased use of medical data in clinical research and medical artificial intelligence development has rendered data quality assessment essential. Despite existing data quality definitions, frameworks, and tools, data quality assessment in real-world settings faces multiple challenges. This stems from a lack of understanding of how to assess real-world data quality and interpret the results. Therefore, practical approaches to data quality assessment are needed that are appropriate for diverse data environments, intended uses, quality dimensions, and requirements. Objective: This study proposes a practical approach for assessing the completeness of electronic health records (EHRs) for medical research. This approach integrates structural completeness, rule-based assessment, and descriptive analyses of completeness and data diversity to clarify how data quality can be measured and meaningfully interpreted in practice. Methods: The completeness of a large-scale EHR dataset from Gachon University Gil Medical Center was evaluated covering January 2005 to December 2023. Completeness was assessed using a three-part approach comprising (1) structural completeness assessment, (2) rule-based assessment, and (3) descriptive analyses of completeness and data diversity. Assessments were conducted using clinical data quality assessment tools. This practical approach was used to assess EHR completeness for medical research from 1,798,153 patient records. Results: In the structural assessment, 12.8% (5/39) of the data tables were unavailable, indicating limited capturing of clinician free-text data. 
The rule-based assessment identified substantial missingness in vocabulary fields (38/124, 30.6%) and missing or special characteristic values in relation to observations (3,643,581/15,313,287, 23.8%), measurements (25,583,622/642,623,715, 4%), care sites (28/1715, 1.6%), and deaths (117/34,330, 0.3%). Descriptive analyses demonstrated a balanced gender distribution (886,489/1,798,153, 49.3% male and 911,664/1,798,153, 50.7% female) and a predominantly Korean racial distribution (1,739,628/1,798,153, 96.7%). Collectively, these findings illustrate the value of a multiperspective completeness assessment for medical research. Conclusions: This study demonstrates how data quality dimensions can be measured in practice through a real-world completeness assessment. This practical approach enables evaluation of EHR completeness and provides insights into data quality. Its findings have implications for researchers conducting data quality assessments and applying quality dimensions in medical research.</summary>
		
        
                	<content type="image/png" src="https://jmir-production.s3.us-east-2.amazonaws.com/thumbs/822478644d2f13b6ca298aeb579afaff" />
		
		<published>2026-05-21T16:45:15-04:00</published>
	</entry>
</feed>