# Offline AI-Powered Healthcare Assistant — Complete Blueprint

A realistic, safe, and scalable end-to-end roadmap for building an offline AI medical decision-support system on IoT/edge devices.

⚠ CAUTION
This system is decision support only — it must never be marketed, labeled, or used as a diagnostic or treatment device without full regulatory clearance (FDA 510(k)/De Novo, CE MDR Class IIa+).


## Table of Contents

- [System Architecture](#1-system-architecture)
- [Model Strategy](#2-model-strategy)
- [Dataset Requirements](#3-dataset-requirements)
- [Training Approach](#4-training-approach)
- [Offline Knowledge System](#5-offline-knowledge-system)
- [Edge Deployment](#6-edge-deployment)
- [Safety & Compliance](#7-safety--compliance)
- [UI/UX Design](#8-uiux-design)
- [Validation & Testing](#9-validation--testing)
- [Scalability Roadmap](#10-scalability-roadmap)

## 1. System Architecture

### 1.1 Hardware Tiers

| Tier | Device | CPU | RAM | AI Accelerator | Cost | Best For |
|---|---|---|---|---|---|---|
| **Entry** | Raspberry Pi 5 (8 GB) | Cortex-A76 × 4 | 8 GB | Coral USB TPU (optional) | ~$80–120 | Rural clinics, basic symptom checker |
| **Mid** | NVIDIA Jetson Orin Nano | Cortex-A78AE × 6 | 8 GB | 1024 CUDA + 32 Tensor cores | ~$250 | LLM inference + multi-model |
| **High** | NVIDIA Jetson AGX Orin | Cortex-A78AE × 12 | 32–64 GB | 2048 CUDA + 64 Tensor cores | ~$900+ | Full hybrid stack, multi-modal |

⚠ TIP
Recommended starting point: NVIDIA Jetson Orin Nano — best price/performance ratio for running quantized LLMs + ML classifiers simultaneously.

### 1.2 High-Level Architecture Diagram

[Diagram 1 — see original .md file for interactive Mermaid diagram]

### 1.3 Data Flow (Request Lifecycle)

[Diagram 2 — see original .md file for interactive Mermaid diagram]

### 1.4 Software Stack

| Layer | Technology | Purpose |
|---|---|---|
| OS | Ubuntu 22.04 LTS (ARM64) / JetPack 6 | Stable base with long-term support |
| Runtime | Python 3.10+, ONNX Runtime, llama.cpp | Model inference |
| LLM Server | llama.cpp server / Ollama | Quantized LLM serving |
| Vector DB | FAISS / Hnswlib | Local embedding retrieval |
| Relational DB | SQLite / DuckDB | Structured medical knowledge |
| UI | Flask/FastAPI + HTMX or Qt/PyQt5 | Lightweight local web or native UI |
| TTS/STT | Whisper.cpp (STT), Piper (TTS) | Voice I/O |
| Logging | SQLite audit log + syslog | Compliance & traceability |

## 2. Model Strategy

### 2.1 Hybrid Architecture (Recommended)

⚠ IMPORTANT
No single model handles everything well. Use a hybrid approach: purpose-built ML classifiers for structured prediction + a small LLM for reasoning, explanation, and natural language interaction.

[Diagram 3 — see original .md file for interactive Mermaid diagram]

### 2.2 ML Classifier (Disease Prediction)

| Aspect | Recommendation |
|---|---|
| **Algorithm** | XGBoost or LightGBM (tabular data champions) |
| **Input** | Encoded symptom vector, vitals (temp, BP, HR, SpO₂), demographics |
| **Output** | Top-5 probable conditions with calibrated probabilities |
| **Size** | < 10 MB (fits easily on any edge device) |
| **Inference** | < 5 ms on Raspberry Pi |
| **Alternative** | Scikit-learn Random Forest for simpler deployments |

### 2.3 LLM Selection

| Model | Parameters | Quantized Size | Min RAM | Tokens/sec (Orin Nano) | Use Case |
|---|---|---|---|---|---|
| **Phi-3 Mini** | 3.8B | ~2.2 GB (Q4_K_M) | 4 GB | ~15–20 | Best quality/size ratio |
| **TinyLlama 1.1B** | 1.1B | ~700 MB (Q4) | 2 GB | ~30–40 | Fastest, RPi-compatible |
| **Mistral 7B** | 7B | ~4.5 GB (Q4_K_M) | 8 GB | ~8–12 | Highest quality (Orin only) |
| **Gemma 2 2B** | 2B | ~1.5 GB (Q4) | 3 GB | ~20–25 | Good multilingual |
| **Qwen2.5 3B** | 3B | ~2 GB (Q4) | 4 GB | ~15–18 | Strong reasoning |

⚠ TIP
Recommended: Start with Phi-3 Mini (Q4_K_M) on Jetson Orin Nano — best balance of medical reasoning quality and inference speed. Fall back to TinyLlama for RPi-only deployments.

### 2.4 Supporting Models

| Task | Model | Size | Notes |
|---|---|---|---|
| Speech-to-Text | Whisper-small / Whisper-tiny | 150 MB / 75 MB | Via whisper.cpp, runs on CPU |
| Text-to-Speech | Piper TTS | ~20–50 MB per voice | ONNX-based, very fast |
| Text Embeddings | all-MiniLM-L6-v2 | ~80 MB | Vector search for RAG retrieval |

## 3. Dataset Requirements

### 3.1 Public Medical Datasets

#### Symptom–Disease Mapping

| Dataset | Type | Records | Source | License |
|---|---|---|---|---|
| **Columbia Disease–Symptom KB** | Tabular | ~150 diseases, 400+ symptoms | Columbia Univ. | Research |
| **Symptom–Disease Dataset (Kaggle)** | Tabular | ~5K records, 130+ diseases | Kaggle community | CC0 / Open |
| **DDXPlus** | Tabular + Text | ~1.3M synthetic patients, 49 diseases | Mila / McGill | CC-BY |
| **MedQuAD** | Q&A text | ~47K Q&A pairs | NIH / NLM | Public domain |
| **PubMedQA** | Q&A text | ~1K expert, 211K+ artificial | PubMed | MIT |

#### Vitals & Clinical

| Dataset | Type | Records | Source |
|---|---|---|---|
| **MIMIC-IV** | EHR (structured) | ~430K admissions | PhysioNet (credentialed) |
| **eICU** | ICU vitals | ~200K stays | PhysioNet (credentialed) |
| **Heart Disease UCI** | Vitals + outcomes | ~920 records | UCI ML Repository |
| **Diabetes 130-Hospitals** | Clinical | ~100K records | UCI ML Repository |

#### Medical Knowledge for RAG

| Source | Type | Use |
|---|---|---|
| **WHO ICD-11** | Disease classification | Standardized disease coding |
| **SNOMED CT** | Clinical terminology | Symptom/condition ontology |
| **UpToDate / BMJ Best Practice** | Clinical guidelines | RAG knowledge base (licensing required) |
| **OpenMedData / WikiDoc** | Articles | Open medical reference |
| **BNF / WHO Essential Medicines** | Drug reference | Medication information |

### 3.2 Data Preparation Pipeline

[Diagram 4 — see original .md file for interactive Mermaid diagram]

### 3.3 Data Cleaning Guidelines

- De-identification — Strip all PHI (names, dates, MRNs) per HIPAA Safe Harbor method
- Standardize terminology — Map free-text symptoms to SNOMED-CT or ICD-11 codes
- Handle missing values — Use clinically appropriate imputation (never mean-fill vitals blindly)
- Outlier detection — Flag physiologically impossible values (e.g., HR > 300, SpO₂ > 100%)
- Deduplication — Remove duplicate patient records within and across datasets
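The outlier-detection rule above can be sketched as a plain range check. The ranges and field names here are illustrative placeholders, not clinical reference values:

```python
# Illustrative physiological plausibility ranges (NOT clinical reference values).
PLAUSIBLE_RANGES = {
    "heart_rate": (20, 300),    # bpm
    "temp_c": (30.0, 45.0),     # °C
    "spo2": (0, 100),           # %
    "systolic_bp": (50, 300),   # mmHg
}

def flag_implausible(vitals: dict) -> list[str]:
    """Return names of vitals outside physiologically possible ranges."""
    flags = []
    for name, value in vitals.items():
        lo, hi = PLAUSIBLE_RANGES.get(name, (float("-inf"), float("inf")))
        if not (lo <= value <= hi):
            flags.append(name)
    return flags
```

Flagged records should be reviewed or dropped rather than silently imputed; for example, `flag_implausible({"heart_rate": 310, "spo2": 101})` flags both fields.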
### 3.4 Bias Mitigation

⚠ WARNING
Medical datasets are historically biased by demographics. Failing to address this creates unsafe predictions for underrepresented populations.

| Strategy | Implementation |
|---|---|
| **Demographic audit** | Measure performance across age, sex, ethnicity subgroups |
| **Stratified sampling** | Ensure proportional representation in train/test splits |
| **Oversampling** | SMOTE or ADASYN for underrepresented disease groups |
| **Fairness constraints** | Equalized odds or demographic parity during training |
| **Documentation** | "Datasheets for Datasets" (Gebru et al.) for every dataset used |

## 4. Training Approach

### 4.1 Decision Framework

[Diagram 5 — see original .md file for interactive Mermaid diagram]

### 4.2 ML Classifier Training

```text
# Pseudocode: Disease Prediction Classifier

1. Load datasets (DDXPlus, symptom–disease mappings)
2. Feature engineering:
   - One-hot encode symptoms (binary vector)
   - Normalize vitals (z-score within clinical ranges)
   - Encode demographics (age bins, sex)
3. Train XGBoost with:
   - objective: multi:softprob (multi-class probabilistic)
   - n_estimators: 500–1000
   - max_depth: 6–8
   - Calibrate with Platt scaling or isotonic regression
4. Evaluate: ROC-AUC (macro), sensitivity per disease, calibration plots
5. Export: ONNX format for edge deployment
```
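The calibration step can be sketched with scikit-learn, here using the Random Forest alternative from §2.2 and synthetic data in place of DDXPlus features:

```python
# Sketch: calibrated multi-class classifier on synthetic symptom vectors.
# Real training would use DDXPlus-style encoded features, not random data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 40)).astype(float)  # one-hot symptom vectors
y = rng.integers(0, 5, size=600)                      # 5 synthetic condition labels

base = RandomForestClassifier(n_estimators=100, random_state=0)
clf = CalibratedClassifierCV(base, method="isotonic", cv=3)  # calibration step
clf.fit(X, y)

probs = clf.predict_proba(X[:1])[0]   # calibrated class probabilities, sums to 1
top5 = np.argsort(probs)[::-1][:5]    # top-5 condition indices, most probable first
```

The same pattern applies to an XGBoost base model; isotonic regression wants a reasonably large calibration set, with Platt (sigmoid) scaling as the small-data fallback.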
    

### 4.3 LLM Strategy: RAG vs. Fine-Tuning

| Approach | When to Use | Pros | Cons |
|---|---|---|---|
| **RAG (Recommended)** | You have curated medical guidelines | No retraining needed; updatable knowledge; traceable citations | Requires good embeddings + retrieval pipeline |
| **Fine-tuning** | Need domain-specific reasoning patterns | Better domain understanding; smaller model can punch above weight | Expensive; risk of hallucination; hard to update |
| **Hybrid RAG + Light Fine-tune** | Production systems | Best of both worlds | More complex pipeline |

⚠ TIP
Start with RAG. Fine-tune only if RAG retrieval quality is insufficient after optimization. Fine-tuning on medical data carries significant hallucination risk if not done carefully.

### 4.4 Fine-Tuning (If Needed)

```text
# LoRA / QLoRA Fine-Tuning Pipeline

1. Base model: Phi-3 Mini or Gemma 2 2B
2. Dataset: MedQuAD + curated clinical Q&A (≥10K examples)
3. Method: QLoRA (4-bit quantized LoRA)
   - LoRA rank: 16–64
   - Learning rate: 2e-4
   - Epochs: 3–5
   - Use PEFT + bitsandbytes
4. Hardware: Single GPU (RTX 3090/4090) or cloud A100 for training
5. Export: Merge LoRA weights → GGUF quantization → deploy via llama.cpp
```

### 4.5 Model Compression

| Technique | Savings | Quality Impact | Tools |
|---|---|---|---|
| **Post-Training Quantization (PTQ)** | 4× size reduction (FP16 → INT4) | Minimal (< 2% accuracy drop) | llama.cpp, GPTQ, AWQ |
| **Quantization-Aware Training (QAT)** | 4× with less quality loss | Very low | TensorRT, AIMET |
| **Pruning (unstructured)** | 50–90% sparsity | Moderate (needs fine-tuning) | Neural Magic, SparseML |
| **Knowledge Distillation** | Train smaller student model | Variable | Hugging Face, custom |
| **ONNX Optimization** | 1.5–3× inference speedup | None | ONNX Runtime, graph optimizations |

## 5. Offline Knowledge System

### 5.1 Architecture

[Diagram 6 — see original .md file for interactive Mermaid diagram]

### 5.2 Knowledge Base Content

| Category | Content | Format | Size Estimate |
|---|---|---|---|
| Disease profiles | ~500–1000 conditions with symptoms, risk factors, epidemiology | SQLite rows + text chunks | ~50 MB |
| Clinical guidelines | WHO, national treatment protocols | Chunked text (512 tokens) | ~200 MB |
| Drug reference | Essential medicines, interactions, contraindications | SQLite table | ~30 MB |
| First-aid protocols | Emergency procedures, triage | Structured JSON | ~5 MB |
| ICD-11 / SNOMED mapping | Standardized terminology | SQLite FTS5 | ~100 MB |

Total knowledge base: ~400 MB — easily fits on any edge device.
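Terminology lookup over the SQLite FTS5 mapping table might look like this; the schema, codes, and rows are illustrative stand-ins for real ICD-11 / SNOMED CT exports:

```python
# Sketch: full-text terminology lookup. Codes/rows are illustrative stand-ins
# for real ICD-11 / SNOMED CT exports loaded at build time.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE terms USING fts5(code, label)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?)",
    [("CA40", "Pneumonia"), ("CA42", "Acute bronchitis"), ("1F40", "Malaria")],
)

def lookup(term: str) -> list[tuple[str, str]]:
    """Return (code, label) rows matching the free-text term, best match first."""
    return conn.execute(
        "SELECT code, label FROM terms WHERE terms MATCH ? ORDER BY rank",
        (term,),
    ).fetchall()
```

FTS5 handles tokenization and ranking locally, which is why it appears in the table above for the terminology layer.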

### 5.3 RAG Implementation

```python
# Sketch of the local RAG pipeline. load_medical_chunks_from_sqlite and llm
# are application-specific placeholders, not library functions.

# 1. Offline Indexing (done once during setup)
from sentence_transformers import SentenceTransformer
import faiss

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = load_medical_chunks_from_sqlite("knowledge.db")
# Normalize embeddings so inner product equals cosine similarity
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(384)  # all-MiniLM-L6-v2 produces 384-dim vectors
index.add(vectors)
faiss.write_index(index, "medical_index.faiss")

# 2. Runtime Retrieval
def retrieve(query: str, top_k: int = 5) -> list[str]:
    q_vec = embedder.encode([query], normalize_embeddings=True)
    scores, indices = index.search(q_vec, top_k)
    return [chunks[i] for i in indices[0]]

# 3. LLM Prompting with Retrieved Context
def generate_response(query: str) -> str:
    context = retrieve(query)
    prompt = f"""You are a medical decision-support assistant.
Based ONLY on the following medical references, answer the query.
Always state your confidence level and cite the source.
NEVER provide a diagnosis — only suggest possible conditions.

References:
{chr(10).join(context)}

Query: {query}

Response:"""
    return llm.generate(prompt)
```

### 5.4 Explainability Layer

| Component | Implementation | Purpose |
|---|---|---|
| **Feature Attribution** | SHAP values for ML classifier | "Fever and cough contributed most to this prediction" |
| **Source Citation** | RAG chunk IDs → original guideline | "Based on WHO Malaria Treatment Guidelines (2023)" |
| **Confidence Score** | Calibrated probability from classifier + LLM self-assessment | "78% confidence (moderate)" |
| **Reasoning Chain** | LLM chain-of-thought prompting | Step-by-step reasoning visible to clinician |
| **Differential Summary** | Top-3 conditions with distinguishing features | "Consider X, Y, Z — differentiated by..." |

## 6. Edge Deployment

### 6.1 Deployment Pipeline

[Diagram 7 — see original .md file for interactive Mermaid diagram]

### 6.2 Step-by-Step Deployment

#### Step 1: Prepare Models

```bash
# Quantize LLM to GGUF (4-bit)
python llama.cpp/convert.py phi-3-mini/ --outfile phi3-mini-f16.gguf
./llama.cpp/quantize phi3-mini-f16.gguf phi3-mini-q4_k_m.gguf Q4_K_M

# Export ML classifier to ONNX
python -c "
import xgboost, onnxmltools
model = xgboost.Booster(model_file='disease_classifier.json')
onnx_model = onnxmltools.convert_xgboost(model)
onnxmltools.utils.save_model(onnx_model, 'disease_classifier.onnx')
"
```

#### Step 2: Set Up Device

```bash
# Jetson Orin Nano setup
sudo apt update && sudo apt install -y python3-pip cmake
pip3 install onnxruntime-gpu faiss-cpu flask piper-tts

# Build llama.cpp with CUDA
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && mkdir build && cd build
cmake .. -DGGML_CUDA=ON && cmake --build . -j$(nproc)
```
    

#### Step 3: Package Application

```yaml
# Docker compose for all services
# docker-compose.yml
services:
  llm:
    image: healthcare-llm:latest
    runtime: nvidia
    volumes:
      - ./models:/models
    command: ./llama-server -m /models/phi3-mini-q4_k_m.gguf -c 2048

  app:
    image: healthcare-app:latest
    ports:
      - "8080:8080"
    volumes:
      - ./knowledge:/knowledge
    depends_on:
      - llm
```

### 6.3 Inference Frameworks

| Framework | Best For | GPU Support | Quantization | Notes |
|---|---|---|---|---|
| **llama.cpp** | LLM inference | CUDA, Metal | GGUF (Q2–Q8) | Best for quantized LLMs |
| **ONNX Runtime** | ML classifiers, embeddings | CUDA, TensorRT EP | INT8, FP16 | Universal model format |
| **TensorRT** | Max GPU performance | NVIDIA only | INT8, FP16 | Best perf on Jetson |
| **TFLite** | Coral TPU / RPi | EdgeTPU, CPU | INT8 | Google ecosystem |
| **Apache TVM** | Custom hardware | Various | INT8, FP16 | Compiler-based optimization |

### 6.4 Memory & CPU Optimization

| Strategy | Technique | Impact |
|---|---|---|
| **Model sharing** | mmap model files, share across processes | 30–50% RAM savings |
| **Batch size = 1** | Optimize for single-user latency | Lowest latency |
| **Context window** | Limit LLM context to 1024–2048 tokens | 50% less VRAM |
| **Swap management** | ZRAM compressed swap (4 GB) | Prevents OOM |
| **Process priority** | `nice -n -10` for inference, `ionice` for DB | Consistent latency |
| **Model lazy-loading** | Load embedding model only when RAG is triggered | Save idle RAM |
| **Thermal management** | Active cooling + thermal throttle monitoring | Sustained performance |

## 7. Safety & Compliance

⚠ CAUTION
This section is critical. An improperly classified or marketed system can lead to regulatory action, patient harm, and legal liability.

### 7.1 Regulatory Classification

| Jurisdiction | Classification | Pathway | Notes |
|---|---|---|---|
| **USA (FDA)** | Class II (if CDS exempt) or Class I | CDS exemption under 21st Century Cures Act §3060(a) | Must meet all 4 CDS criteria |
| **EU (MDR)** | Class IIa (Rule 11) | CE marking, notified body | MDCG 2019-11 guidance |
| **India (CDSCO)** | Class B (SaMD) | CDSCO SaMD guidance | Evolving framework |

### 7.2 FDA Clinical Decision Support (CDS) Exemption Criteria

To qualify as non-device CDS (exempt from FDA regulation), ALL FOUR must be met:

| # | Criterion | How This System Complies |
|---|---|---|
| 1 | Not intended to acquire, process, or analyze medical images or signals | ✅ No imaging/signal processing — text + vitals input only |
| 2 | Intended for HCPs or patients with disclosed logic | ✅ Explainability layer shows reasoning |
| 3 | Intended for HCPs/patients to independently review the basis of recommendations | ✅ Citations, confidence scores, differential reasoning provided |
| 4 | Does not replace clinical judgment — HCP acts as learned intermediary | ✅ "Decision support only" — never provides diagnosis or treatment orders |

⚠ WARNING
If any criterion is not met, the system becomes a Software as a Medical Device (SaMD) and requires FDA premarket review.

### 7.3 Mandatory Safety Features

```text
┌─────────────────────────────────────────────────────────┐
│  SAFETY IMPLEMENTATION CHECKLIST                        │
├─────────────────────────────────────────────────────────┤
│  ☑ Every output includes confidence score (0–100%)      │
│  ☑ Every output includes standard disclaimer            │
│  ☑ Red-flag conditions trigger URGENT referral notice   │
│  ☑ System never uses words "diagnose" or "prescribe"    │
│  ☑ All interactions logged with timestamps              │
│  ☑ Audit trail is tamper-evident (hash-chained)         │
│  ☑ Model version and knowledge base version logged      │
│  ☑ Fail-safe: if confidence < 30%, output "Insufficient │
│    information — please consult a healthcare provider"  │
│  ☑ Emergency symptoms → immediate "SEEK EMERGENCY CARE" │
└─────────────────────────────────────────────────────────┘
```
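The fail-safe and red-flag rules in the checklist can be expressed as a small gating function; the red-flag list and thresholds here are illustrative, and a deployed system must source them from validated triage protocols:

```python
# Illustrative red-flag symptoms and threshold; NOT a validated triage list.
RED_FLAGS = {"chest pain", "slurred speech", "severe bleeding", "unresponsive"}
MIN_CONFIDENCE = 0.30

def gate_output(symptoms: set[str], confidence: float, message: str) -> str:
    """Apply mandatory safety rules before any model output reaches the screen."""
    if symptoms & RED_FLAGS:            # emergency override always wins
        return "SEEK EMERGENCY CARE IMMEDIATELY"
    if confidence < MIN_CONFIDENCE:     # fail-safe on low confidence
        return ("Insufficient information — please consult a "
                "healthcare provider")
    return f"{message} (confidence: {confidence:.0%}; decision support only)"
```

Keeping these rules in one pure function makes them unit-testable, which matters for the technical tests in §9.4.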
    

### 7.4 Standard Disclaimer Template

```text
╔══════════════════════════════════════════════════════════╗
║  ⚠ IMPORTANT MEDICAL DISCLAIMER                          ║
║                                                          ║
║  This tool provides DECISION SUPPORT ONLY.               ║
║  It does NOT provide medical diagnoses or treatment.     ║
║                                                          ║
║  • Results are probabilistic suggestions, not diagnoses  ║
║  • Always consult a qualified healthcare professional    ║
║  • In case of emergency, seek immediate medical care     ║
║  • This system has not been evaluated by FDA/CE as a     ║
║    medical device                                        ║
║                                                          ║
║  Confidence: [XX]%  |  Model v[X.X]  |  KB v[YYYY-MM]    ║
╚══════════════════════════════════════════════════════════╝
```

### 7.5 Data Privacy (HIPAA-Aligned Principles)

| Principle | Implementation |
|---|---|
| **Data minimization** | Collect only clinically necessary inputs; no PII stored |
| **Local-only processing** | All data stays on device — no cloud, no telemetry |
| **Encryption at rest** | LUKS full-disk encryption on edge device |
| **Access control** | PIN/biometric auth for healthcare worker access |
| **Audit logging** | Every query/response logged with timestamp, user ID |
| **Data retention** | Configurable auto-purge (default: 30 days) |
| **Physical security** | Tamper-evident enclosure, Kensington lock |
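The tamper-evident, hash-chained audit trail required in §7.3 can be sketched with the standard library: each entry's hash commits to the previous entry, so any retroactive edit breaks verification.

```python
import hashlib
import json

class AuditLog:
    """Append-only log; each entry's hash covers the previous hash,
    so editing any past record invalidates the chain."""

    def __init__(self):
        self.entries = []
        self.prev_hash = "0" * 64  # genesis value

    def append(self, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True)
        h = hashlib.sha256((self.prev_hash + payload).encode()).hexdigest()
        self.entries.append({"record": record, "hash": h})
        self.prev_hash = h
        return h

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if hashlib.sha256((prev + payload).encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

In practice the entries would live in the SQLite audit table from §1.4, with model and knowledge-base versions included in each record.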

## 8. UI/UX Design

### 8.1 Design Principles

- Clinical simplicity — No visual clutter; every element serves a purpose
- Glanceable results — Risk level visible in < 2 seconds
- Accessible — Large fonts (≥16px), high contrast (WCAG AA), touch-friendly (48px targets)
- Language-agnostic — Icon-heavy design, i18n-ready text

### 8.2 Screen Flow

[Diagram 8 — see original .md file for interactive Mermaid diagram]

### 8.3 Results Dashboard Layout

```text
┌──────────────────────────────────────────────────────────┐
│  HEALTH ASSESSMENT RESULTS          [Print] [New]        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Overall Risk Level:  🟡 MODERATE                        │
│  Confidence: 74%                                         │
│                                                          │
│  ┌─────────────────────────────────────────────────────┐ │
│  │  POSSIBLE CONDITIONS                                │ │
│  │                                                     │ │
│  │  1. 🟠 Community-Acquired Pneumonia     62%         │ │
│  │     Key factors: fever, productive cough, crackles  │ │
│  │                                                     │ │
│  │  2. 🟡 Acute Bronchitis                 24%         │ │
│  │     Key factors: cough, low-grade fever             │ │
│  │                                                     │ │
│  │  3. 🟢 Upper Respiratory Infection       9%         │ │
│  │     Key factors: cough, rhinorrhea                  │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                          │
│  SUGGESTED NEXT STEPS:                                   │
│  • Chest X-ray recommended                               │
│  • Monitor SpO₂ — if <92%, escalate urgently             │
│  • Consider sputum culture if available                  │
│                                                          │
│  WHY THIS RESULT:                                        │
│  Fever (38.5°C) + productive cough (5 days) + bilateral  │
│  crackles on auscultation → pneumonia is most likely.    │
│  Source: WHO Pneumonia Guidelines (2023), Ch. 4          │
│                                                          │
├──────────────────────────────────────────────────────────┤
│  ⚠ DECISION SUPPORT ONLY — Not a medical diagnosis       │
│  Consult a qualified healthcare professional             │
│  Model v1.2 | KB 2025-03 | Confidence: 74%               │
└──────────────────────────────────────────────────────────┘
```

### 8.4 Input Modes

| Mode | Interface | Best For |
|---|---|---|
| **Touch Form** | Checkboxes + sliders on 7–10" touchscreen | Primary input in clinics |
| **Body Map** | Tappable body diagram to indicate pain/symptoms | Intuitive symptom location |
| **Voice** | "I have a headache and fever for 3 days" → parsed by Whisper + LLM | Low-literacy users, hands-free |
| **Sensor Auto-fill** | BLE pulse oximeter, BP cuff, thermometer | Automated vitals entry |

## 9. Validation & Testing

### 9.1 Validation Framework

[Diagram 9 — see original .md file for interactive Mermaid diagram]

### 9.2 Accuracy Metrics

| Metric | Target | Measurement |
|---|---|---|
| **Top-1 Accuracy** | ≥ 70% | Correct condition in first prediction |
| **Top-3 Accuracy** | ≥ 85% | Correct condition in top 3 |
| **Sensitivity (per-disease)** | ≥ 80% | True positive rate for each condition |
| **Specificity** | ≥ 90% | True negative rate |
| **Calibration (ECE)** | ≤ 0.10 | Expected calibration error |
| **Red-flag sensitivity** | ≥ 95% | Emergency conditions never missed |
| **Inference latency** | < 10 sec | End-to-end response time |
| **Subgroup fairness** | Δ ≤ 5% | Accuracy gap across demographics |
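The ECE target above can be checked with the standard binned estimator; a pure-Python sketch using ten equal-width confidence bins:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted mean |avg confidence - accuracy| over occupied bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(bool(o) for _, o in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

Run it on the held-out test set's top-1 confidences against a boolean "prediction was correct" vector; values above 0.10 indicate the calibration step in §4.2 needs rework.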

### 9.3 Clinical Validation Approach

| Phase | Activity | Duration | Outcome |
|---|---|---|---|
| **Phase 1: Retrospective** | Test on labeled clinical datasets (DDXPlus, MIMIC) | 2–4 weeks | Baseline accuracy metrics |
| **Phase 2: Expert Review** | 3–5 physicians evaluate 200+ system outputs for clinical appropriateness | 4–6 weeks | Inter-rater agreement (Cohen's κ) |
| **Phase 3: Prospective Pilot** | Deploy in 2–3 clinics alongside standard care (shadow mode) | 3–6 months | Real-world concordance with physician diagnosis |
| **Phase 4: Outcome Tracking** | Monitor patient outcomes where system was consulted | 6–12 months | Safety signal detection |
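The inter-rater agreement in Phase 2 is typically reported as Cohen's κ; a minimal two-rater implementation:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(rater_a) | set(rater_b)
    ) / (n * n)
    if expected == 1.0:  # degenerate case: both raters used a single shared label
        return 1.0
    return (observed - expected) / (1 - expected)
```

With more than two physicians, Fleiss' κ or Krippendorff's α is the usual generalization.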

### 9.4 Testing Checklist

```text
TECHNICAL TESTS:
☐ Model unit tests (known input → expected output)
☐ RAG retrieval accuracy (relevant chunks retrieved)
☐ Edge case handling (empty input, contradictory symptoms)
☐ Stress test (100 sequential queries, measure latency drift)
☐ Memory leak test (24-hour continuous operation)
☐ Power failure recovery (graceful restart)
☐ Encryption verification

CLINICAL TESTS:
☐ Emergency condition detection (chest pain, stroke symptoms)
☐ Rare disease handling (appropriate "low confidence" response)
☐ Pediatric vs. adult differentiation
☐ Pregnancy-aware recommendations
☐ Drug interaction warnings (if medication module included)
☐ Contradictory symptom handling
```

## 10. Scalability Roadmap

### 10.1 Phased Rollout

[Diagram 10 — see original .md file for interactive Mermaid diagram]

### 10.2 Feature Roadmap

| Phase | Feature | Description |
|---|---|---|
| **P1 MVP** | Symptom checker | Text input → top-5 conditions + confidence |
| **P1 MVP** | Vitals assessment | Manual entry of temp, BP, HR, SpO₂ |
| **P2** | Voice I/O | Speak symptoms, hear results (multilingual) |
| **P2** | BLE sensors | Auto-capture from pulse oximeter, thermometer, BP cuff |
| **P2** | Maternal health module | Pregnancy risk assessment, ANC protocols |
| **P3** | Multi-language | Hindi, Swahili, Spanish, French, Arabic (via multilingual LLM) |
| **P3** | Mesh OTA updates | Device-to-device model/KB updates without internet |
| **P3** | Chronic disease tracking | Longitudinal patient records (encrypted local) |
| **P4** | Federated learning | Aggregate learning across devices (privacy-preserving) |
| **P4** | Specialty modules | Dermatology (image), ophthalmology, mental health |
| **P4** | Fleet management | Remote monitoring, batch updates, analytics dashboard |

### 10.3 Sensor Integration Plan

[Diagram 11 — see original .md file for interactive Mermaid diagram]

### 10.4 Continuous Model Improvement

| Method | Internet Required | Description |
|---|---|---|
| **OTA via Mesh** | ❌ No | Transfer updated models via Wi-Fi Direct / BLE mesh between devices |
| **USB updates** | ❌ No | Field workers carry USB with model/KB updates |
| **Federated Learning** | ⚠️ Periodic | Devices train locally, share only gradients (not data) when connectivity available |
| **Feedback loop** | ❌ No | Clinicians mark predictions as correct/incorrect → stored locally for future retraining |
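The offline feedback loop can be as simple as appending clinician verdicts to a local SQLite table that later retraining jobs export; the schema here is illustrative:

```python
# Illustrative local feedback store; schema and function names are sketches,
# not an existing API.
import sqlite3
import time

def record_feedback(db_path, prediction_id, clinician_verdict, note=""):
    """Store a clinician's correct/incorrect verdict locally for retraining."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS feedback (
        prediction_id TEXT,
        verdict TEXT CHECK(verdict IN ('correct', 'incorrect')),
        note TEXT,
        recorded_at REAL)""")
    conn.execute("INSERT INTO feedback VALUES (?, ?, ?, ?)",
                 (prediction_id, clinician_verdict, note, time.time()))
    conn.commit()
    conn.close()
```

Because verdicts reference a `prediction_id`, they can be joined against the audit log to recover the exact model and KB versions that produced each prediction.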

## Appendix A: Bill of Materials (Entry Kit)

| Item | Est. Cost (USD) |
|---|---|
| NVIDIA Jetson Orin Nano Developer Kit | $250 |
| 7" Touchscreen Display | $50 |
| 256 GB NVMe SSD | $30 |
| Active Cooling Fan + Heatsink | $15 |
| BLE Pulse Oximeter (medical grade) | $40 |
| Protective Enclosure (3D printed) | $20 |
| UPS / Battery Backup (4 hrs) | $60 |
| **Total** | **~$465** |

## Appendix B: Key Open-Source Tools

| Tool | Purpose | License |
|---|---|---|
| [llama.cpp](https://github.com/ggerganov/llama.cpp) | LLM inference engine | MIT |
| [ONNX Runtime](https://onnxruntime.ai) | ML model inference | MIT |
| [FAISS](https://github.com/facebookresearch/faiss) | Vector similarity search | MIT |
| [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) | Speech-to-text | MIT |
| [Piper](https://github.com/rhasspy/piper) | Text-to-speech | MIT |
| [Sentence-Transformers](https://www.sbert.net) | Text embeddings | Apache 2.0 |
| [XGBoost](https://xgboost.readthedocs.io) | Gradient boosting classifier | Apache 2.0 |
| [SHAP](https://shap.readthedocs.io) | Model explainability | MIT |
| [Flask](https://flask.palletsprojects.com) | Web UI framework | BSD |

⚠ NOTE
This blueprint is a living document. Revisit each section as you progress through development phases. Start with Phase 1 (MVP) and validate before adding complexity.