Fine-Tuning Impact

LoRA (r=16) fine-tuning pipeline on 54 synthetic SOAP note pairs | Notebook: notebooks/med-gemma-4b-soap-lora.ipynb

Base MedGemma 4B

Input Transcript

45-year-old male, new onset T2DM, A1c 9.1%. BMI 34. Fasting glucose 210. Reports polydipsia, polyuria x3 weeks. Family history of diabetes (mother, sister). No prior medications.

Output (unstructured)

The patient is a 45 year old male who presents with new onset diabetes. His A1c is 9.1% which is elevated. BMI is 34 indicating obesity. He has symptoms of polydipsia and polyuria for 3 weeks.

I would recommend starting metformin and lifestyle modifications. Follow up in 3 months to recheck A1c. Consider referral to endocrinology if not improving.

He should also be screened for complications including eye exam and foot exam.
No SOAP structure | No ICD-10 code | Free-text narrative

MedScribe (Fine-tuned)

Same Input Transcript

45-year-old male, new onset T2DM, A1c 9.1%. BMI 34. Fasting glucose 210. Reports polydipsia, polyuria x3 weeks. Family history of diabetes (mother, sister). No prior medications.

Output (structured SOAP)

SUBJECTIVE: 45-year-old male presenting with new-onset polyuria and polydipsia x3 weeks. FHx: T2DM in mother and sister. No prior medications. Reports increased thirst and frequent urination.

OBJECTIVE: A1c 9.1% (elevated). Fasting glucose 210 mg/dL. BMI 34 (obese). Vitals otherwise stable.

ASSESSMENT: Type 2 diabetes mellitus, newly diagnosed with hyperglycemia. ICD-10: E11.65 - Type 2 diabetes mellitus with hyperglycemia.

PLAN:
1. Start metformin 500mg BID, titrate to 1000mg BID over 2 weeks.
2. Diabetic education: diet, exercise, glucose monitoring.
3. Ophthalmology referral for baseline diabetic eye exam.
4. Comprehensive metabolic panel, lipid panel, urine microalbumin.
5. Recheck A1c in 3 months. Target <7%.
6. Return precautions: DKA symptoms (nausea, vomiting, abdominal pain).
4/4 SOAP sections | ICD-10: E11.65 | EHR-ready format
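
The section count and ICD-10 code shown for each output could be derived with a check along the following lines. This is a minimal sketch, not the notebook's actual scoring code: the function names, the `HEADER:` line convention, and the simplified ICD-10 pattern are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Assumption: each SOAP section starts a line as "HEADER:" in upper case.
SOAP_SECTIONS = ("SUBJECTIVE", "OBJECTIVE", "ASSESSMENT", "PLAN")

# Simplified ICD-10-CM shape (assumption): letter, digit, alphanumeric,
# then an optional "." followed by 1-4 alphanumerics.
ICD10_PATTERN = re.compile(r"[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?")


def soap_sections_present(note: str) -> List[str]:
    """Return the SOAP headers that begin a line of the note."""
    return [
        name
        for name in SOAP_SECTIONS
        if re.search(rf"^{name}:", note, flags=re.MULTILINE)
    ]


def extract_icd10(note: str) -> Optional[str]:
    """Return the code that follows an 'ICD-10:' label, or None if absent."""
    match = re.search(r"ICD-10:\s*(\S+)", note)
    if match is None:
        return None
    candidate = match.group(1).rstrip(".,;")
    return candidate if ICD10_PATTERN.fullmatch(candidate) else None


# Abbreviated versions of the two outputs shown above.
structured_note = (
    "SUBJECTIVE: 45-year-old male presenting with new-onset polyuria.\n"
    "OBJECTIVE: A1c 9.1% (elevated). Fasting glucose 210 mg/dL.\n"
    "ASSESSMENT: Type 2 diabetes mellitus. ICD-10: E11.65 - Type 2 diabetes "
    "mellitus with hyperglycemia.\n"
    "PLAN:\n1. Start metformin 500mg BID.\n"
)
free_text_note = (
    "The patient is a 45 year old male who presents with new onset diabetes.\n"
    "I would recommend starting metformin and lifestyle modifications.\n"
)

print(len(soap_sections_present(structured_note)), extract_icd10(structured_note))
print(len(soap_sections_present(free_text_note)), extract_icd10(free_text_note))
```

Anchoring on an explicit `ICD-10:` label keeps the check from mistaking strings such as lab abbreviations for codes, at the cost of missing codes written without the label.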

Evaluation Metrics (10 held-out test cases)

Metric | Base MedGemma 4B | Fine-tuned | Delta (relative)
SOAP completeness (4/4 sections) | 6/10 | 10/10 | +67%
ICD-10 exact code match | 4/10 | 9/10 | +125%
Structured output consistency | 3/10 | 10/10 | +233%
Drug name extraction | 5/10 | 9/10 | +80%
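
The Delta column is consistent with a relative change in the number of passing cases, not a difference in percentage points. The sketch below reproduces it from the pass counts in the table; the helper names are hypothetical.

```python
def relative_delta_pct(base_pass: int, tuned_pass: int) -> int:
    """Relative change in pass count, rounded to the nearest whole percent."""
    if base_pass <= 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return round((tuned_pass - base_pass) / base_pass * 100)


def absolute_delta_points(base_pass: int, tuned_pass: int, total: int) -> int:
    """Change in pass rate, in percentage points."""
    return round((tuned_pass - base_pass) / total * 100)


# (base passes, fine-tuned passes) out of 10 held-out cases, from the table.
RESULTS = {
    "SOAP completeness (4/4 sections)": (6, 10),
    "ICD-10 exact code match": (4, 9),
    "Structured output consistency": (3, 10),
    "Drug name extraction": (5, 9),
}

for metric, (base, tuned) in RESULTS.items():
    rel = relative_delta_pct(base, tuned)
    pts = absolute_delta_points(base, tuned, total=10)
    print(f"{metric}: {rel:+d}% relative, {pts:+d} points")
```

With only 10 test cases, each case moves a pass rate by 10 percentage points, so the relative figures are sensitive to a single changed result.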

Fine-Tuning Configuration

Base Model: MedGemma 4B IT
Method: LoRA (r=16, alpha=32)
Target Modules: q_proj, k_proj, v_proj, o_proj
Trainable Params: ~0.5%
Training Data: 54 SOAP pairs
Epochs: 3
Learning Rate: 2e-4
Precision: BF16 + NF4
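
For orientation, the settings above might map onto Hugging Face `peft` and `transformers` configuration objects roughly as follows. This is a sketch under assumptions, not the notebook's actual code: it reads "BF16 + NF4" as 4-bit NF4 weight quantization with bfloat16 compute, it assumes a `CAUSAL_LM` task type, and `HPARAMS` and `build_configs` are hypothetical names. Settings the page does not list (dropout, batch size, scheduler) are left at library defaults.

```python
# Hyperparameters taken from the configuration listed above.
HPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "num_train_epochs": 3,
    "learning_rate": 2e-4,
}


def build_configs(hp=HPARAMS):
    """Build quantization and LoRA config objects from the hyperparameters.

    Imports are deferred so the hyperparameter table can be inspected on a
    machine without torch, peft, or transformers installed.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    # Assumption: NF4 4-bit base weights with bfloat16 compute.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora_config = LoraConfig(
        r=hp["r"],
        lora_alpha=hp["lora_alpha"],
        target_modules=hp["target_modules"],
        task_type="CAUSAL_LM",  # assumption: not stated on the page
    )
    return quant_config, lora_config


# With standard LoRA scaling, adapter updates are multiplied by lora_alpha / r.
print(HPARAMS["lora_alpha"] / HPARAMS["r"])
```

With r=16 and alpha=32, the adapter update is scaled by a factor of 2 under the standard LoRA scaling rule.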