By Yuna Nakayasu and Tinglong Dai, Johns Hopkins University
In the heart of winter, a quiet but isolated village in Hokkaido, Japan, faces a stark reality faced by many rural communities worldwide: the difficulty of accessing emergency medical services when every second counts. For elderly residents like Mr. Sato, experiencing symptoms of a stroke, the distance is more than geographic – it could imperil life-saving treatment.
This scenario is far from unique; research indicates that poor access to emergency medicine, especially due to longer distances to hospitals, is associated with higher mortality rates. Compounding the geographic challenge is an acute shortage of medical professionals and essential diagnostic services in rural areas. Moreover, significant disparities in access exist for vulnerable populations, defined not only by their rural location, but also by their insurance status and income.
The combined effect of these challenges is a dangerous gap in emergency care that artificial intelligence solutions can help fill. Now imagine a local health worker using a portable CT scanner in a mobile diagnostic unit to quickly perform a brain scan on Mr. Sato. The health worker then sends the CT scan, along with medical notes, to an AI system powered by GPT 4 Vision, which indicates a high chance of ischemic stroke and, after speaking with a neurologist on the phone to confirm the assessment, arranges transportation to a larger (or tertiary) hospital nearby for a clot-busting drug – all within minutes. This setup – combining wearable scanning technology and AI – enables real-time, life-saving medical decisions in rural areas.
Faster, more contextual assessments with AI with Vision
While the benefits of AI in medicine have been touted for years, our ongoing research at Johns Hopkins University shows that the revolutionary potential of generative AI with Vision capabilities – not just any type of AI – will be key to the future of emergency medicine in caring for hard-to-reach patients like Mr. Sato.
In its nearly 70-year evolution, AI has progressed through paradigms such as expert systems that encode human knowledge, machine learning models that detect patterns in structured data, and computer vision techniques that recognize objects in images and video. The latest wave of AI has introduced foundational models and generative AI, with a pivotal breakthrough in models with multimodal capabilities – models that can generate novel content by combining speech with vision, audio and other modalities. Unlike previous approaches limited to text or images, this generative AI achieves multimodal understanding and reasoning, exemplified by models such as GPT-4 with its Vision capabilities.
- First, unlike most machine learning models that can only handle structured data, such as numbers and categories, generative AI with Vision can analyze complex, unstructured data, such as medical images and natural language. This allows it to understand real-world inputs, such as CT scans in their raw form, rather than requiring time-consuming pre-processing. By seamlessly integrating vision and text, these AI systems can make faster, more contextual assessments in emergency situations that are aligned with the clinical reality, where decision-making relies on interpretation of a spectrum of data sources and modalities.
- Second, unlike traditional computer vision models that merely detect and classify objects, generative AI with Vision capabilities can make rich inferences to provide actual medical diagnosis and treatment recommendations. In our ongoing experiments using GPT-4’s multimodal capabilities on CT scans and other medical images, we found encouraging levels of accuracy that can match human experts in many cases. With targeted tuning on larger medical datasets, performance could be further improved for safe real-world deployment.
- Third, if deployed appropriately, the operational costs of deploying generative AI systems for rural emergency care are remarkably low compared to alternatives such as building new tertiary care centers or providing comprehensive tele-stroke services. A single AI model building upon GPT-4 Vision could effectively serve multiple rural regions simultaneously at a fraction of the cost.
Implementing AI
Rural clinics already have the off-the-shelf hardware, such as smartphones or tablets, to run affordable AI software locally. Alternatively, they could use AI via the cloud without major infrastructure investments. The cost is primarily one-time model training and small service fees – likely within the annual budget of a single urban stroke center for an entire rural region. As computing costs continue to fall rapidly, access to cutting-edge emergency AI will become even more democratized over time, replacing extraordinarily rare human expertise with convenient, low-cost AI tools.
To be certain, deploying generative AI with Vision capabilities carries notable challenges and risks. The recent WHO report on AI ethics and governance for large multimodal models highlights the complex challenges introduced by these technologies in healthcare. These range from regulatory and systemic risks, given the potential for “’hallucination,” to system-wide biases, privacy concerns and environmental impacts. Additionally, it underscores the need for stringent oversight and ethical frameworks to address societal risks, such as the dominance of large tech companies and the potential undermining of human expertise. However, despite these considerable obstacles, the dire reality of limited access to emergency care, especially in low-resource settings, including rural areas and low- and middle-income countries, demands new solutions to bridge this critical gap.
Taken together, we can now imagine a world where the heartbeat of a small, remote town is no longer overshadowed by the fear of medical emergencies that go unchecked because of sheer distance. Generative AI that blends Vision provides a reassuring presence, bringing the expertise of specialists to help make critical decisions alongside our trusted local healthcare professionals. No longer will a stroke patient’s fate depend on geographical luck.
This push to integrate generative AI into emergency medicine is not just a technological leap; it is a promise to address one of the most persistent disparities in our healthcare system. It is about giving our rural communities the dignity, care and hope they deserve, and ensuring that no call for help goes unanswered, no matter how far it must travel. Through Vision-enabled AI assistants, we are not only saving lives, we are restoring faith in a system that promises to leave no one behind.
About the authors
Yuna Nakayasu, MD, is a physician and an MPH/MBA candidate at Johns Hopkins University’s Bloomberg School of Public Health and Carey Business School.
Tinglong Dai, PhD, is the Bernard T. Ferrari professor at the Johns Hopkins Carey Business School; co-chair of the Johns Hopkins Workgroup on AI and Healthcare, which is part of the Hopkins Business of Health Initiative; and vice president of marketing, communication and outreach at INFORMS.