
HIPAA and LLMs: The Engineering Guide to Compliant AI in Healthcare
I sat in a meeting where a Compliance Officer told an Engineering Director: "We can't use an LLM because we can't control what it outputs."
This is a fundamental misunderstanding of how HIPAA applies to software.
HIPAA doesn't demand that your software be deterministic. It doesn't ban probabilistic systems. It demands that you protect Protected Health Information (PHI) throughout its lifecycle—in storage, in transit, and in processing.
If you can architect your AI application so that PHI never reaches the LLM in identifiable form, you've solved the compliance problem without abandoning the technology.
The Two Failure Modes
Healthcare teams building AI applications typically fall into one of two traps:
The Wild West: "We signed a BAA with OpenAI, so we can send them everything."
A Business Associate Agreement is necessary for using a cloud LLM with PHI, but it's not a Get Out of Jail Free card. It means OpenAI is a business associate, which means they're now in your compliance scope. If their systems are breached, you have a reportable incident. If they log your prompts (even temporarily, for abuse detection), those logs contain PHI. If that logging account is compromised, you have a data breach.
A BAA doesn't eliminate risk. It redistributes it.
The Paralysis: "We can't use AI until we build our own on-prem GPU cluster."
This is a $500,000+ infrastructure investment with a 6-month timeline before you can run your first inference. It's technically correct but economically absurd for most healthcare organizations. And it doesn't solve the fundamental problem—even a self-hosted model can mishandle PHI if the application architecture is wrong.
The Middle Path: Data Minimization
There is an architecture that lets you use commercial LLM APIs for healthcare AI without sending identifiable PHI to the provider. It's called data minimization, and it's built on a principle that HIPAA already mandates: the Minimum Necessary Rule.
The Minimum Necessary Rule Applied to AI
HIPAA's Minimum Necessary standard says: don't access, use, or disclose more PHI than needed for the task at hand.
Most RAG-powered healthcare applications violate this by default:
User query: "Does the patient have a history of diabetes?"
Naive RAG retrieval: Fetches the entire EHR JSON blob
```json
{
  "name": "John Smith",
  "ssn": "123-45-6789",
  "address": "742 Evergreen Terrace",
  "mental_health_notes": "History of depression...",
  "substance_abuse_record": "...",
  "diagnosis": [{"code": "E11", "description": "Type 2 diabetes"}],
  ...200 more fields...
}
```
The question was about diabetes. The application fetched—and sent to the LLM—the patient's name, SSN, address, mental health history, and substance abuse records. That's five categories of PHI exposed for a yes/no question.
The fix: Create a clinical summary projection. Only index medical facts. Separate demographics from clinical data. And use PromptGuard's PII detector to catch any PHI that leaks through despite your projections.
```python
# Minimum Necessary: only retrieve what's needed
def get_clinical_context(patient_id: str, query: str):
    # Retrieve ONLY clinical facts relevant to the query
    clinical_summary = db.clinical_facts.search(
        filter={"patient_id": patient_id},
        query=query,
        fields=["diagnosis_code", "diagnosis_description", "date"]
        # Explicitly: no name, no SSN, no address, no notes
    )
    return clinical_summary
```
De-identification Before LLM Processing
For applications that need to process narrative text (clinical notes, discharge summaries), de-identify the text before it reaches the LLM:
```
Original:       "Patient John Smith (DOB: 03/15/1985, MRN: 12345) presents
                 with Type 2 diabetes. Referred by Dr. Sarah Johnson at
                 Springfield General Hospital."

De-identified:  "Patient [NAME_1] (DOB: [DOB_REDACTED], MRN: [ID_REDACTED])
                 presents with Type 2 diabetes. Referred by [NAME_2] at
                 [ORGANIZATION_1]."
```
PromptGuard's PII detector handles 39+ entity types, including:
- Names → [NAME_REDACTED] (via email/context patterns)
- Dates of birth → [DOB_REDACTED]
- SSN → [SSN_REDACTED]
- Phone numbers → [PHONE_REDACTED]
- Medicare IDs → [MEDICARE_REDACTED]
- NHS numbers → [NHS_REDACTED]
The LLM receives the de-identified text. It can still reason about the clinical content ("Type 2 diabetes") without knowing whose diabetes it is. The response is just as useful, but the PHI never leaves your infrastructure.
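PromptGuard's detector covers the full 39+ entity types; as a minimal illustration of the mechanism only, here is a regex-based de-identifier for three patterns that keeps a token map for later internal re-identification. This is a sketch, not a substitute for a production PII detector.

```python
import re

def deidentify(text: str):
    """Replace a few PHI patterns with indexed tokens; return the cleaned
    text plus a token map for internal re-identification.
    Pattern set is illustrative, nowhere near the full 39+ entity types."""
    patterns = [
        ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
        ("MRN", re.compile(r"\bMRN:\s*\d+\b")),
        ("DOB", re.compile(r"\bDOB:\s*\d{2}/\d{2}/\d{4}\b")),
    ]
    token_map = {}
    for label, pattern in patterns:
        def sub(match, label=label):
            token = f"[{label}_{len(token_map) + 1}]"
            token_map[token] = match.group(0)  # kept inside your infrastructure
            return token
        text = pattern.sub(sub, text)
    return text, token_map

clean, mapping = deidentify("Patient (DOB: 03/15/1985, MRN: 12345), SSN 123-45-6789.")
# `clean` carries only placeholder tokens; `mapping` never leaves your VPC
```

The token map stays server-side; only `clean` is ever sent to the LLM provider.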
The Architecture
```
Patient Record → [De-identification Layer] → LLM API → Response
                          ↑                               ↓
                   PromptGuard PII               [Re-identification]
                 Detector (39+ types)        (optional, internal only)
```
- De-identification (outbound): PII is stripped before the prompt reaches the LLM provider. PromptGuard handles this automatically when configured with the proxy pattern.
- LLM processing: The LLM sees de-identified text. It generates a response using clinical content only.
- Re-identification (optional, internal): If the response needs to reference specific patients by name (e.g., in a clinical summary), the application maps redaction tokens back to original values internally. The LLM provider never sees the original PHI.
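The internal re-identification step can be as simple as reversing the token substitution using the map built at de-identification time. A minimal sketch, assuming tokens in the `[NAME_1]`-style placeholder format shown earlier:

```python
def reidentify(text: str, token_map: dict[str, str]) -> str:
    """Swap redaction tokens back to their original values. This runs only
    inside your infrastructure; the LLM provider never sees the result."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text

llm_response = "[NAME_1] should continue metformin for Type 2 diabetes."
restored = reidentify(llm_response, {"[NAME_1]": "John Smith"})
# restored: "John Smith should continue metformin for Type 2 diabetes."
```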
The Audit Trail Problem
HIPAA auditors don't just ask "is the data protected?" They ask "who accessed what, when, and why?"
If your answer is "it's in the application logs," you're failing the audit. AI interactions are conversational. A clinician might ask 5 questions to get 1 answer. The audit trail needs to capture the full decision context.
The Semantic Audit Log
We recommend capturing three dimensions for every AI interaction:
Intent: What was the user trying to do?
```json
{
  "intent": "clinical_query",
  "category": "diagnosis_history",
  "specificity": "single_condition"
}
```
Context: What data was accessed?
```json
{
  "documents_retrieved": 3,
  "document_ids": ["DOC-451", "DOC-452", "DOC-789"],
  "phi_categories_accessed": ["diagnosis", "medication"],
  "phi_categories_excluded": ["demographics", "mental_health"]
}
```
Outcome: What was returned to the user?
```json
{
  "response_type": "clinical_summary",
  "phi_in_response": false,
  "confidence": 0.92,
  "event_id": "evt_abc123"
}
```
PromptGuard's security event logging captures the decision metadata (confidence, threat type, event ID) automatically. For healthcare applications, augment this with the clinical-specific fields above.
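The three dimensions can be assembled into a single append-only record per interaction. A sketch of one way to do it (the storage backend, and any field names beyond the JSON examples above, are assumptions):

```python
import json
import uuid
from datetime import datetime, timezone

def build_audit_record(user_id: str, intent: dict, context: dict, outcome: dict) -> dict:
    """Combine intent, context, and outcome into one audit entry,
    stamped with a unique ID and a UTC timestamp."""
    return {
        "audit_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "intent": intent,
        "context": context,
        "outcome": outcome,
    }

record = build_audit_record(
    user_id="clinician-42",
    intent={"intent": "clinical_query", "category": "diagnosis_history"},
    context={"documents_retrieved": 3, "phi_categories_accessed": ["diagnosis"]},
    outcome={"response_type": "clinical_summary", "phi_in_response": False},
)
serialized = json.dumps(record)  # ready for append-only log storage
```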
Zero Retention Mode
For the most sensitive deployments, enable PromptGuard's zero retention mode. When active, the security engine processes and evaluates prompts but does not store the prompt content in event records. You get the audit trail (who, when, what decision, what confidence) without storing the actual PHI in the security logs.
This is configured per-project in the dashboard. The zero_retention flag on the project controls whether content_preview is included in security events.
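To illustrate the behavior (field names here are hypothetical, not PromptGuard's exact event schema), the flag simply gates whether prompt content ever enters the event record:

```python
def build_security_event(decision: str, confidence: float,
                         content: str, zero_retention: bool) -> dict:
    """Sketch of how a zero-retention flag can gate prompt content in
    security events. Field names are illustrative only."""
    event = {
        "decision": decision,
        "confidence": confidence,
    }
    if not zero_retention:
        # Only projects WITHOUT zero retention store a content preview
        event["content_preview"] = content[:100]
    return event

event = build_security_event("allow", 0.92, "Patient presents with...", zero_retention=True)
# with zero retention on, the event carries the decision but no prompt content
```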
Self-Hosting for Data Sovereignty
For healthcare organizations that cannot send any data to external services—even with BAAs—PromptGuard is fully self-hostable.
Deploy the entire stack (API, dashboard, PostgreSQL, Redis) inside your VPC with docker-compose up. Point your application's LLM base URL at your internal PromptGuard instance. All PII detection and security scanning happens inside your network. The only external call is from PromptGuard to the LLM provider, and by that point, the PHI has already been de-identified.
For organizations that also want to self-host the LLM, you can run a local inference server (vLLM, TGI, Ollama) and configure PromptGuard to route to it. PHI never leaves your data center.
The Compliance Checklist
- Data minimization: RAG queries retrieve only the fields needed for each question
- De-identification: PHI is redacted before leaving your VPC (PromptGuard PII detector)
- BAA in place: With your LLM provider (if using cloud APIs)
- Audit logging: Every AI interaction is logged with intent, context, and outcome
- Zero retention: Enabled for projects that process PHI
- Access controls: AI access to patient records follows the same role-based controls as human access
- Output scanning: Bot responses are scanned for PHI before reaching the end user
- Breach notification plan: Documented process for when (not if) something goes wrong
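The output-scanning item deserves a concrete shape. A minimal sketch of an outbound check (illustrative patterns only; a real deployment would use the full PII detector rather than a handful of regexes):

```python
import re

# Illustrative PHI patterns for scanning bot responses before they
# reach the end user; not an exhaustive detector.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # SSN-shaped numbers
    re.compile(r"\bMRN:?\s*\d{5,}\b"),      # medical record numbers
    re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),   # dates that may be DOBs
]

def response_contains_phi(text: str) -> bool:
    """Return True if the response matches any known PHI pattern."""
    return any(p.search(text) for p in PHI_PATTERNS)

if response_contains_phi("Follow up re: MRN: 12345 on 03/15/2025"):
    # Block or redact before the response reaches the end user
    pass
```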
Conclusion
HIPAA compliance isn't about buying a tool. It's about architecture.
If you treat the LLM as an untrusted public API—one that should never see identifiable PHI—you'll build a secure system by default. De-identify on the way out. Scan on the way back. Log everything. Trust nothing.
You don't need to wait for a HIPAA-certified LLM. You need an architecture that doesn't need one.