Generative AI / LLMs

Writer Releases Domain-Specific LLMs for Healthcare and Finance

Writer has released two new domain-specific AI models, Palmyra-Med 70B and Palmyra-Fin 70B, expanding the capabilities of NVIDIA NIM. These models bring unparalleled accuracy to medical and financial generative AI applications—outperforming comparable models like GPT-4, Med-PaLM 2, and Claude 3.5 Sonnet. 

While general-purpose large language models (LLMs) have captured recent headlines, it’s the targeted power of specialized models—with their improved accuracy and domain knowledge—that’ll reshape complex, regulated industries like finance and healthcare. Palmyra-Med 70B and Palmyra-Fin 70B are specialized models, making them uniquely adept at powering AI workflows in two industries that are known for their strict regulation and compliance standards. 

Palmyra-Med 70B and Palmyra-Fin 70B are joining a roster of top-ranking LLMs built by Writer. These include the general-purpose model Palmyra-X, Palmyra-Vision for image analysis, and many more. Making Palmyra-Med 70B and Palmyra-Fin 70B available as NVIDIA NIM microservices improves the composability of the models with preconfigured containers that can be deployed to NVIDIA-accelerated architecture across cloud, data center, and local platforms. 

Beyond facilitating fast deployment, both Palmyra-Med 70B and Palmyra-Fin 70B have seen improved performance from NVIDIA AI software. Optimizations using NVIDIA TensorRT-LLM have reduced inference latency (TTFT) of the models by 23% and 30%, respectively, and increased the rate of token return (TPS) by ~60% for both. The result is a more responsive prompting experience that rapidly produces answers to queries.

Figure 1. Impact of NVIDIA TensorRT-LLM optimization on TTFT (Ieft) and TPS (right) for both Palmyra-Med 70B and Palmyra-Fin 70B

Improving patient outcomes with record-breaking medical accuracy

Palmyra-Med 70B is the latest version of our healthcare model and the most accurate model available on the market. In our testing, Palmyra-Med 70B averaged 85.9% across all medical benchmarks, beating the runner-up, Med-PaLM 2 by close to 2 percentage points. Med-PaLM 2 only achieved those results when five examples were provided compared to Palmyra’s zero-shot performance. 

Table 1 shows a comprehensive medical Massive Multitask Language Understanding (MMLU) benchmark comparison between popular models. Benchmarks include MMLU Clinical Knowledge, Professional Medicine, PubMedQA and many more. See the full list and results

Palmyra-MedMed-PaLM 2 (5-shot)GPT-4Gemini 1.0GPT-3.5 Turbo
MMLU Clinical Knowledge90.988.38676.774.7
MMLU Medical Genetics94909175.874
MMLU Anatomy83.777.88066.772.8
MMLU College Medicine84.480.976.969.264.7
PubMedQA79.679.275.270.772.7
Average*85.984.182.870.866
Table 1. Comprehensive medical MMLU benchmark comparison of popular models
*Average performance is measured from all nine tests

The result is an accurate, reliable model that can help improve patient outcomes and research through its ability to tackle complex medical tasks in a range of disciplines, including:

  • Clinical knowledge and anatomy: Achieving scores of 90.9% in MMLU Clinical Knowledge and 83.7% in MMLU Anatomy, Palmyra-Med 70B demonstrates a robust understanding of clinical procedures and human anatomy. This makes it exceptionally useful for supporting diagnostic accuracy and treatment planning in medical settings.
  • Genetics and college medicine: Scoring 94.0% in Medical Genetics and 84.4% in College Medicine, the model excels in interpreting genetic data and applying complex medical knowledge, crucial for genetic counseling and medical education.
  • Biomedical research: With 80% performance in PubMedQA, Palmyra-Med 70B proves its capability to effectively extract and analyze information from biomedical literature, aiding in research and evidence-based medical practices.

Writer works with some of the world’s leading healthcare companies to help them improve patient outcomes with powerful generative AI applications.  Palmyra-Med 70B is highly proficient in a range of medical use cases including clinical decision support, offering evidence-based diagnosis suggestions and successful treatment strategies. It also aids in the development and understanding of clinical trial protocols, drug interaction summaries, medical document generation, and much more. 

Palmyra-Med 70B empowers developers across the medical industry to build new AI apps that are infused with deep medical knowledge and expertise. 

A powerful LLM for finance

Adopting generative AI in the financial sector comes with its own unique obstacles: lengthy financial statements, complex terminology, and nuanced market analysis. By combining a well-curated set of financial training data with custom fine-tuning instruction data, the team trained a highly accurate financial LLM that can power a range of use cases.

  • Financial trend analysis and forecasts: Examining market dynamics and developing forecasts for financial performance. 
  • Investment analysis: Producing detailed evaluations of firms, industries, or economic markers.
  • Risk evaluation: Assessing the potential hazards linked to different financial tools or approaches.
  • Asset allocation strategy: Recommending investment mixes tailored to individual risk preferences and financial objectives.

To test the expertise of Palmyra-Fin, it was tasked with passing the CFA Level III exam. The model scored 73% on the multiple-choice section of a CFA Level III sample test, making it the first model that can pass the exam. To put this achievement in perspective, passing the CFA Level III is one of the highest distinctions in the investment management profession. The average passing score over the last 11 years has been 60%, and typically less than half of all test takers receive a passing score. 

Palmyra-Fin’s performance is a stark improvement over other general-purpose models like GPT-4, which have previously reported a 33% performance on the exam.

The team also ran Palmyra-Fin through the long-fin-eval benchmark test, where it outperformed popular models like Claude 3.5 Sonnet, GPT-4o, and Mixtral 8x7B showcasing the model’s ability to analyze complex financial topics. 

A bar chart comparing the performance of different large language models on a financial benchmark evaluation. Palmyra-Fin 70B performs the best, followed by Claude 3.5 Sonnet, Qwen2 70B Instruct, GPT-40, and Mixtral 8x7B. The benchmark was designed to simulate real-world use cases.
Figure 2. Palmyra-Fin 70B outperformed other models in the long-fin-eval benchmark test

Getting started with Palmyra LLMs

Looking to the future, domain-specific LLMs will be at the forefront of AI innovation, transforming how industries build specialized AI applications. Writer is pioneering this movement by creating models like Palmyra-Med 70B and Palmyra-Fin 70B—models with deep, sector-specific expertise, exceptionally well-suited for enterprise use cases. These targeted models promise not only greater accuracy and efficiency, but also improved data management and regulatory compliance. 

If you’re building an AI application in the medical or financial field, try Palmyra-Med 70B and Palmyra-Fin 70B, accessible through the NVIDIA API catalog. For commercial use cases, you can get in touch with the Writer team at sales@writer.com.

Discuss (0)

Tags