A realistic, safe, and scalable end-to-end roadmap for building an offline AI medical decision-support system on IoT/edge devices.
⚠ CAUTION This system is decision support only — it must never be marketed, labeled, or used as a diagnostic or treatment device without full regulatory clearance (FDA 510(k)/De Novo, CE MDR Class IIa+).
⚠ TIP Recommended starting point: NVIDIA Jetson Orin Nano — best price/performance ratio for running quantized LLMs + ML classifiers simultaneously.
1.2 High-Level Architecture Diagram
[Diagram 1 — see original .md file for interactive Mermaid diagram]
1.3 Data Flow (Request Lifecycle)
[Diagram 2 — see original .md file for interactive Mermaid diagram]
1.4 Software Stack
| Layer | Technology | Purpose |
| --- | --- | --- |
| OS | Ubuntu 22.04 LTS (ARM64) / JetPack 6 | Stable base with long-term support |
| Runtime | Python 3.10+, ONNX Runtime, llama.cpp | Model inference |
| LLM Server | llama.cpp server / Ollama | Quantized LLM serving |
| Vector DB | FAISS / Hnswlib | Local embedding retrieval |
| Relational DB | SQLite / DuckDB | Structured medical knowledge |
| UI | Flask/FastAPI + HTMX or Qt/PyQt5 | Lightweight local web or native UI |
| TTS/STT | Whisper.cpp (STT), Piper (TTS) | Voice I/O |
| Logging | SQLite audit log + syslog | Compliance & traceability |
2. Model Strategy
2.1 Hybrid Architecture (Recommended)
⚠ IMPORTANT No single model handles everything well. Use a hybrid approach: purpose-built ML classifiers for structured prediction + a small LLM for reasoning, explanation, and natural language interaction.
[Diagram 3 — see original .md file for interactive Mermaid diagram]
| | |
| --- | --- |
| **Output** | Top-5 probable conditions with calibrated probabilities |
| **Size** | < 10 MB (fits easily on any edge device) |
| **Inference** | < 5 ms on Raspberry Pi |
| **Alternative** | Scikit-learn Random Forest for simpler deployments |
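The "calibrated probabilities" output above can be sketched as temperature-scaled softmax over the classifier's raw scores. Everything here is illustrative: the `top5_conditions` helper, the example scores, and the temperature value (which should be fit on a held-out validation set, not hard-coded).

```python
import math

def top5_conditions(raw_scores: dict[str, float],
                    temperature: float = 1.5) -> list[tuple[str, float]]:
    """Convert raw classifier logits into a ranked top-5 list of
    (condition, probability). Temperature > 1 softens overconfident
    scores; in practice the value is fit on held-out validation data."""
    exps = {c: math.exp(s / temperature) for c, s in raw_scores.items()}
    total = sum(exps.values())
    probs = {c: e / total for c, e in exps.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:5]

# Hypothetical raw scores from the symptom classifier
scores = {"influenza": 3.1, "common cold": 2.4, "covid-19": 2.2,
          "pneumonia": 0.9, "bronchitis": 0.5, "sinusitis": 0.1}
for condition, p in top5_conditions(scores):
    print(f"{condition}: {p:.1%}")
```

Proper calibration (Platt scaling, isotonic regression) on validation data should replace the fixed temperature before any clinical evaluation.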
2.3 LLM Selection
| Model | Parameters | Quantized Size | Min RAM | Tokens/sec (Orin Nano) | Use Case |
| --- | --- | --- | --- | --- | --- |
| **Phi-3 Mini** | 3.8B | ~2.2 GB (Q4_K_M) | 4 GB | ~15–20 | Best quality/size ratio |
| **TinyLlama 1.1B** | 1.1B | ~700 MB (Q4) | 2 GB | ~30–40 | Fastest, RPi-compatible |
| **Mistral 7B** | 7B | ~4.5 GB (Q4_K_M) | 8 GB | ~8–12 | Highest quality (Orin only) |
| **Gemma 2 2B** | 2B | ~1.5 GB (Q4) | 3 GB | ~20–25 | Good multilingual |
| **Qwen2.5 3B** | 3B | ~2 GB (Q4) | 4 GB | ~15–18 | Strong reasoning |
⚠ TIP Recommended: Start with Phi-3 Mini (Q4_K_M) on Jetson Orin Nano — best balance of medical reasoning quality and inference speed. Fall back to TinyLlama for RPi-only deployments.
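The table above can drive automatic model selection at install time. A minimal sketch, assuming the quality ordering shown (an illustrative ranking, not a benchmark result) and roughly 1 GB of RAM headroom reserved for the OS and other services:

```python
# (model, min_ram_gb) taken from the table above, listed best-quality first.
# The ordering is an illustrative assumption, not a measured ranking.
MODELS = [
    ("Mistral 7B", 8),
    ("Phi-3 Mini", 4),
    ("Qwen2.5 3B", 4),
    ("Gemma 2 2B", 3),
    ("TinyLlama 1.1B", 2),
]

def pick_model(device_ram_gb: int, headroom_gb: int = 1) -> str:
    """Choose the highest-quality model whose minimum RAM requirement,
    plus headroom for the OS and services, fits on the device."""
    for name, min_ram in MODELS:  # already sorted best-quality first
        if min_ram + headroom_gb <= device_ram_gb:
            return name
    raise ValueError("Device RAM too small for any supported model")

print(pick_model(8))  # 8 GB Jetson Orin Nano -> Phi-3 Mini
print(pick_model(4))  # 4 GB Raspberry Pi 4 -> Gemma 2 2B
```

With the default 1 GB headroom, an 8 GB Orin Nano lands on Phi-3 Mini rather than Mistral 7B, matching the tip above.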
2.4 Supporting Models
| Task | Model | Size | Notes |
| --- | --- | --- | --- |
| Speech-to-Text | Whisper-small / Whisper-tiny | 150 MB / 75 MB | Via whisper.cpp; runs on CPU |
| Text-to-Speech | Piper TTS | ~20–50 MB per voice | ONNX-based, very fast |
| Text Embeddings | all-MiniLM-L6-v2 | ~80 MB | Vector search for RAG retrieval |
3. Dataset Requirements
3.1 Public Medical Datasets
Symptom–Disease Mapping
| Dataset | Type | Records | Source | License |
| --- | --- | --- | --- | --- |
| **Columbia Disease–Symptom KB** | Tabular | ~150 diseases, 400+ symptoms | Columbia Univ. | Research |
| **Symptom–Disease Dataset (Kaggle)** | Tabular | ~5K records, 130+ diseases | Kaggle community | CC0 / Open |
| **DDXPlus** | Tabular + Text | ~1.3M synthetic patients, 49 diseases | Mila / McGill | CC-BY |
| **MedQuAD** | Q&A text | ~47K Q&A pairs | NIH / NLM | Public domain |
| **PubMedQA** | Q&A text | ~1K expert, 211K+ artificial | PubMed | MIT |
Vitals & Clinical
| Dataset | Type | Records | Source |
| --- | --- | --- | --- |
| **MIMIC-IV** | EHR (structured) | ~430K admissions | PhysioNet (credentialed) |
| **eICU** | ICU vitals | ~200K stays | PhysioNet (credentialed) |
| **Heart Disease UCI** | Vitals + outcomes | ~920 records | UCI ML Repository |
| **Diabetes 130-Hospitals** | Clinical | ~100K records | UCI ML Repository |
Medical Knowledge for RAG
| Source | Type | Use |
| --- | --- | --- |
| **WHO ICD-11** | Disease classification | Standardized disease coding |
| **SNOMED CT** | Clinical terminology | Symptom/condition ontology |
| **UpToDate / BMJ Best Practice** | Clinical guidelines | RAG knowledge base (licensing required) |
| **OpenMedData / WikiDoc** | Articles | Open medical reference |
| **BNF / WHO Essential Medicines** | Drug reference | Medication information |
3.2 Data Preparation Pipeline
[Diagram 4 — see original .md file for interactive Mermaid diagram]
3.3 Data Cleaning Guidelines
- **De-identification** — Strip all PHI (names, dates, MRNs) per the HIPAA Safe Harbor method
- **Standardize terminology** — Map free-text symptoms to SNOMED CT or ICD-11 codes
- **Deduplication** — Remove duplicate patient records within and across datasets
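A minimal sketch of regex-based de-identification covering a few Safe Harbor identifier classes (dates, phone numbers, MRN-style IDs). A production pipeline would add NER-based name removal and the remaining identifier classes; the patterns and example note below are illustrative only.

```python
import re

# Each pattern maps one identifier class to a replacement token.
# This is a sketch: real Safe Harbor compliance covers 18 identifier
# classes and requires NER for names, addresses, etc.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE), "[MRN]"),
]

def deidentify(text: str) -> str:
    """Replace matched identifiers with class tokens, in order."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

note = "Seen 03/14/2024, MRN: 483920, callback 555-867-5309."
print(deidentify(note))
# -> "Seen [DATE], [MRN], callback [PHONE]."
```

Replacing identifiers with class tokens (rather than deleting them) preserves sentence structure for downstream terminology mapping.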
3.4 Bias Mitigation
⚠ WARNING Medical datasets are historically biased by demographics. Failing to address this creates unsafe predictions for underrepresented populations.
| Strategy | Implementation |
| --- | --- |
| **Demographic audit** | Measure performance across age, sex, ethnicity subgroups |
| **Stratified sampling** | Ensure proportional representation in train/test splits |
| **Oversampling** | SMOTE or ADASYN for underrepresented disease groups |
| **Fairness constraints** | Equalized odds or demographic parity during training |
| **Documentation** | Datasheets for Datasets (Gebru et al.) for every dataset used |
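The demographic-audit strategy above can be sketched as a per-subgroup accuracy check over held-out predictions. The record shape, field names, and the 10-point gap threshold are all hypothetical:

```python
from collections import defaultdict

def subgroup_accuracy(records: list[dict]) -> dict[str, float]:
    """Compute per-subgroup accuracy from prediction records shaped
    {'group': ..., 'pred': ..., 'label': ...} (illustrative fields)."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["group"]] += 1
        correct[r["group"]] += int(r["pred"] == r["label"])
    return {g: correct[g] / total[g] for g in total}

# Toy held-out predictions grouped by an age band
records = [
    {"group": "18-40", "pred": "flu", "label": "flu"},
    {"group": "18-40", "pred": "flu", "label": "cold"},
    {"group": "65+", "pred": "flu", "label": "flu"},
    {"group": "65+", "pred": "flu", "label": "flu"},
]
acc = subgroup_accuracy(records)
# Flag deployment if any subgroup trails the best by more than 10 points
worst_gap = max(acc.values()) - min(acc.values())
print(acc, f"gap={worst_gap:.2f}")
```

The same loop extends to sensitivity/specificity per subgroup, which matter more than raw accuracy for rare conditions.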
4. Training Approach
4.1 Decision Framework
[Diagram 5 — see original .md file for interactive Mermaid diagram]
| Approach | When to Use | Pros | Cons |
| --- | --- | --- | --- |
| **RAG** | Default starting point | No retraining needed; updatable knowledge; traceable citations | Requires good embeddings + retrieval pipeline |
| **Fine-tuning** | Need domain-specific reasoning patterns | Better domain understanding; smaller model can punch above weight | Expensive; risk of hallucination; hard to update |
| **Hybrid RAG + Light Fine-tune** | Production systems | Best of both worlds | More complex pipeline |
⚠ TIP Start with RAG. Fine-tune only if RAG retrieval quality is insufficient after optimization. Fine-tuning on medical data carries significant hallucination risk if not done carefully.
4.4 Fine-Tuning (If Needed)
```text
# LoRA / QLoRA Fine-Tuning Pipeline
1. Base model: Phi-3 Mini or Gemma 2 2B
2. Dataset: MedQuAD + curated clinical Q&A (≥10K examples)
3. Method: QLoRA (4-bit quantized LoRA)
   - LoRA rank: 16–64
   - Learning rate: 2e-4
   - Epochs: 3–5
   - Use PEFT + bitsandbytes
4. Hardware: single GPU (RTX 3090/4090) or cloud A100 for training
5. Export: merge LoRA weights → GGUF quantization → deploy via llama.cpp
```
4.5 Model Compression
| Technique | Savings | Quality Impact | Tools |
| --- | --- | --- | --- |
| **Post-Training Quantization (PTQ)** | 4× size reduction (FP16 → INT4) | Minimal (< 2% accuracy drop) | llama.cpp, GPTQ, AWQ |
| **Quantization-Aware Training (QAT)** | 4× with less quality loss | Very low | TensorRT, AIMET |
| **Pruning (unstructured)** | 50–90% sparsity | Moderate (needs fine-tuning) | Neural Magic, SparseML |
| **Knowledge Distillation** | Train smaller student model | Variable | Hugging Face, custom |
| **ONNX Optimization** | 1.5–3× inference speedup | None | ONNX Runtime, graph optimizations |
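For quick capacity planning, a quantized model file is roughly parameters × bits-per-weight / 8 bytes. The ~5% overhead factor below is a rough allowance for quantization scales and metadata, not a measured figure:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.05) -> float:
    """Back-of-envelope model file size: parameters x bits / 8, with a
    small overhead allowance for scales and metadata (rough heuristic)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8 * overhead
    return bytes_total / 1e9

# Phi-3 Mini (3.8B) at ~4.5 effective bits/weight (Q4_K_M mixes
# 4-bit and higher-precision tensors, so the average exceeds 4 bits)
print(f"{quantized_size_gb(3.8, 4.5):.1f} GB")  # -> 2.2 GB
```

This reproduces the ~2.2 GB Q4_K_M figure from the LLM selection table in 2.3; runtime RAM is higher once the KV cache and activations are added.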
5. Offline Knowledge System
5.1 Architecture
[Diagram 6 — see original .md file for interactive Mermaid diagram]
5.2 Knowledge Base Content
| Category | Content | Format | Size Estimate |
| --- | --- | --- | --- |
| Disease profiles | ~500–1000 conditions with symptoms, risk factors, epidemiology | | |

Total knowledge base: ~400 MB — easily fits on any edge device.
5.3 RAG Implementation
```python
# Pseudo-implementation of the local RAG pipeline
from sentence_transformers import SentenceTransformer
import faiss

# 1. Offline indexing (done once during setup)
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = load_medical_chunks_from_sqlite("knowledge.db")  # app-specific loader
# Normalize embeddings so inner product equals cosine similarity
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(384)  # all-MiniLM-L6-v2 produces 384-dim vectors
index.add(vectors)
faiss.write_index(index, "medical_index.faiss")

# 2. Runtime retrieval
def retrieve(query: str, top_k: int = 5) -> list[str]:
    q_vec = embedder.encode([query], normalize_embeddings=True)
    scores, indices = index.search(q_vec, top_k)
    return [chunks[i] for i in indices[0]]

# 3. LLM prompting with retrieved context
def generate_response(query: str) -> str:
    context = retrieve(query)
    prompt = f"""You are a medical decision-support assistant.
Based ONLY on the following medical references, answer the query.
Always state your confidence level and cite the source.
NEVER provide a diagnosis — only suggest possible conditions.

References:
{chr(10).join(context)}

Query: {query}
Response:"""
    return llm.generate(prompt)
```
5.4 Explainability Layer
| Component | Implementation | Purpose |
| --- | --- | --- |
| **Feature Attribution** | SHAP values for ML classifier | "Fever and cough contributed most to this prediction" |
| **Source Citation** | RAG chunk IDs → original guideline | "Based on WHO Malaria Treatment Guidelines (2023)" |
| **Confidence Score** | Calibrated probability from classifier + LLM self-assessment | "78% confidence (moderate)" |
| **Reasoning Chain** | LLM chain-of-thought prompting | Step-by-step reasoning visible to clinician |
| **Differential Summary** | Top-3 conditions with distinguishing features | "Consider X, Y, Z — differentiated by..." |
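The confidence-score row ("78% confidence (moderate)") implies a mapping from calibrated probability to a verbal band. A sketch with illustrative thresholds: the 30% floor matches the fail-safe rule in the 7.3 checklist, but the other cut-points are assumptions that must be set with clinical input.

```python
def confidence_band(p: float) -> str:
    """Map a calibrated probability to a verbal band. Thresholds are
    illustrative placeholders, not clinically validated values."""
    if p < 0.30:
        return "insufficient"  # triggers the fail-safe referral message
    if p < 0.60:
        return "low"
    if p < 0.85:
        return "moderate"
    return "high"

def format_confidence(p: float) -> str:
    """Render the confidence line shown in the UI."""
    return f"{p:.0%} confidence ({confidence_band(p)})"

print(format_confidence(0.78))  # -> "78% confidence (moderate)"
```

Showing the band alongside the raw percentage keeps the display glanceable while preserving the underlying number for the audit log.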
6. Edge Deployment
6.1 Deployment Pipeline
[Diagram 7 — see original .md file for interactive Mermaid diagram]
6.2 Step-by-Step Deployment
Step 1: Prepare Models
```bash
# Quantize LLM to GGUF (4-bit). Note: newer llama.cpp releases rename
# these tools to convert_hf_to_gguf.py and llama-quantize.
python llama.cpp/convert.py phi-3-mini/ --outfile phi3-mini-f16.gguf
./llama.cpp/quantize phi3-mini-f16.gguf phi3-mini-q4_k_m.gguf Q4_K_M
```

```python
# Export the ML classifier to ONNX
import xgboost
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType

model = xgboost.Booster(model_file="disease_classifier.json")
# initial_types declares the input tensor: dynamic batch size x feature
# count (replace 400 with the width of your symptom feature vector)
onnx_model = onnxmltools.convert_xgboost(
    model, initial_types=[("input", FloatTensorType([None, 400]))]
)
onnxmltools.utils.save_model(onnx_model, "disease_classifier.onnx")
```
Step 2: Set Up Device
```bash
# Jetson Orin Nano setup
sudo apt update && sudo apt install -y python3-pip cmake
pip3 install onnxruntime-gpu faiss-cpu flask piper-tts

# Build llama.cpp with CUDA
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && mkdir build && cd build
cmake .. -DGGML_CUDA=ON && cmake --build . -j$(nproc)
```
- ✅ Does not replace clinical judgment — HCP acts as learned intermediary
- ✅ "Decision support only" — never provides diagnosis or treatment orders
⚠ WARNING If any criterion is not met, the system becomes a Software as a Medical Device (SaMD) and requires FDA premarket review.
7.3 Mandatory Safety Features
```text
┌─────────────────────────────────────────────────────────┐
│ SAFETY IMPLEMENTATION CHECKLIST                         │
├─────────────────────────────────────────────────────────┤
│ ☑ Every output includes confidence score (0–100%)       │
│ ☑ Every output includes standard disclaimer             │
│ ☑ Red-flag conditions trigger URGENT referral notice    │
│ ☑ System never uses words "diagnose" or "prescribe"     │
│ ☑ All interactions logged with timestamps               │
│ ☑ Audit trail is tamper-evident (hash-chained)          │
│ ☑ Model version and knowledge base version logged       │
│ ☑ Fail-safe: if confidence < 30%, output "Insufficient  │
│   information — please consult a healthcare provider"   │
│ ☑ Emergency symptoms → immediate "SEEK EMERGENCY CARE"  │
└─────────────────────────────────────────────────────────┘
```
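The hash-chained audit-trail item can be sketched with SQLite and SHA-256: each row commits to the previous row's hash, so any edit or deletion is detectable when the chain is replayed. The schema and field names are illustrative.

```python
import hashlib
import json
import sqlite3
import time

db = sqlite3.connect(":memory:")  # use an on-disk file in production
db.execute("CREATE TABLE audit (id INTEGER PRIMARY KEY, ts REAL, "
           "entry TEXT, prev_hash TEXT, hash TEXT)")

def log_event(entry: dict) -> None:
    """Append an event, chaining its hash to the previous row."""
    row = db.execute("SELECT hash FROM audit ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else "GENESIS"
    ts = time.time()
    payload = json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256(f"{ts}|{payload}|{prev_hash}".encode()).hexdigest()
    db.execute("INSERT INTO audit (ts, entry, prev_hash, hash) VALUES (?,?,?,?)",
               (ts, payload, prev_hash, digest))

def verify_chain() -> bool:
    """Replay the log; any altered or removed row breaks verification."""
    prev = "GENESIS"
    for ts, payload, prev_hash, digest in db.execute(
            "SELECT ts, entry, prev_hash, hash FROM audit ORDER BY id"):
        expected = hashlib.sha256(f"{ts}|{payload}|{prev_hash}".encode()).hexdigest()
        if prev_hash != prev or digest != expected:
            return False
        prev = digest
    return True

log_event({"query": "fever, cough", "model": "phi3-mini-q4", "kb": "2024-06"})
log_event({"query": "chest pain", "model": "phi3-mini-q4", "kb": "2024-06"})
print(verify_chain())  # -> True
```

Periodically exporting the latest hash to write-once media (or printing it) anchors the chain against wholesale database replacement.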
7.4 Standard Disclaimer Template
```text
╔══════════════════════════════════════════════════════════╗
║  ⚠ IMPORTANT MEDICAL DISCLAIMER                          ║
║                                                          ║
║  This tool provides DECISION SUPPORT ONLY.               ║
║  It does NOT provide medical diagnoses or treatment.     ║
║                                                          ║
║  • Results are probabilistic suggestions, not diagnoses  ║
║  • Always consult a qualified healthcare professional    ║
║  • In case of emergency, seek immediate medical care     ║
║  • This system has not been evaluated by FDA/CE as a     ║
║    medical device                                        ║
║                                                          ║
║  Confidence: [XX]% | Model v[X.X] | KB v[YYYY-MM]        ║
╚══════════════════════════════════════════════════════════╝
```
7.5 Data Privacy (HIPAA-Aligned Principles)
| Principle | Implementation |
| --- | --- |
| **Data minimization** | Collect only clinically necessary inputs; no PII stored |
| **Local-only processing** | All data stays on device — no cloud, no telemetry |
| **Encryption at rest** | LUKS full-disk encryption on edge device |
| **Access control** | PIN/biometric auth for healthcare worker access |
| **Audit logging** | Every query/response logged with timestamp, user ID |
| **Data retention** | Configurable auto-purge (default: 30 days) |
| **Physical security** | Tamper-evident enclosure, Kensington lock |
8. UI/UX Design
8.1 Design Principles
Clinical simplicity — No visual clutter; every element serves a purpose
Glanceable results — Risk level visible in < 2 seconds
Accessible — Large fonts (≥16px), high contrast (WCAG AA), touch-friendly (48px targets)
Language-agnostic — Icon-heavy design, i18n-ready text
8.2 Screen Flow
[Diagram 8 — see original .md file for interactive Mermaid diagram]
⚠ NOTE This blueprint is a living document. Revisit each section as you progress through development phases. Start with Phase 1 (MVP) and validate before adding complexity.