Skip to content

Python: Information-flow control based prompt injection defense#5024

Open
shrutitople wants to merge 14 commits intomicrosoft:mainfrom
shrutitople:ifc-pia-defense
Open

Python: Information-flow control based prompt injection defense#5024
shrutitople wants to merge 14 commits intomicrosoft:mainfrom
shrutitople:ifc-pia-defense

Conversation

@shrutitople
Copy link
Copy Markdown

Motivation and Context

LLM agents are vulnerable to prompt injection attacks — malicious instructions in external content (tool results, API responses) that cause data exfiltration or unauthorized actions.

This PR introduces FIDES, a deterministic defense based on information flow control (IFC). Instead of detecting injections, it tracks content provenance via labels and enforces policies — untrusted content can't influence trusted operations, private data can't leak to public channels.

Description

Security Primitives — _security.py

  • Labels: IntegrityLabel (trusted/untrusted) × ConfidentialityLabel (public/private/user_identity)
  • Lattice combination: most-restrictive-wins via combine_labels()
  • Variable indirection: ContentVariableStore replaces untrusted content with opaque VariableReferenceContent placeholders — the LLM never sees raw untrusted data

Middleware — _security_middleware.py

  • LabelTrackingFunctionMiddleware — 3-tier automatic label propagation:
    1. Per-item embedded labels (additional_properties.security_label)
    2. Tool-level source_integrity declaration
    3. Join of input argument labels (fallback)
  • PolicyEnforcementFunctionMiddleware — blocks or requests approval when context confidentiality exceeds a tool's max_allowed_confidentiality
  • SecureAgentConfig — one-line setup wiring middleware, tools, and instructions
  • All results use list[Content] (aligned with upstream FunctionTool.invoke())

Security Tools — _security_tools.py

  • quarantined_llm — isolated LLM call (no tools) for safe summarization of untrusted content
  • inspect_variable — controlled access to hidden variables with label awareness

Framework Integration — _tools.py, DevUI

  • FunctionApprovalRequest content type for human-in-the-loop policy enforcement
  • DevUI maps approval requests to interactive approve/reject UI

Tests — test_security.py

  • 115 unit tests covering label propagation, variable indirection, policy enforcement, quarantine, 3-tier labeling, and edge cases

Samples — python/samples/getting_started/security/

Sample Demonstrates
email_security_example.py Integrity-based defense against injection in email content
repo_confidentiality_example.py Confidentiality-based data exfiltration prevention
github_mcp_labels_example.py Integration with GitHub MCP server labels

Documentation

  • FIDES_DEVELOPER_GUIDE.md, QUICK_START_FIDES.md, FIDES_IMPLEMENTATION_SUMMARY.md

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible (115 new tests)
  • Is this a breaking change? No — all changes are additive; security middleware is opt-in via SecureAgentConfig

Copilot AI review requested due to automatic review settings April 1, 2026 10:00
@markwallace-microsoft markwallace-microsoft added documentation Improvements or additions to documentation python labels Apr 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces FIDES, an information-flow control (IFC) security layer for the agent framework to deterministically mitigate prompt injection and data exfiltration via integrity/confidentiality labels, variable indirection, and policy enforcement.

Changes:

  • Add core security primitives (labels, variable store, lineage) plus security middleware for label propagation and policy enforcement.
  • Add security tools (quarantined_llm, inspect_variable) and DevUI support for displaying/handling policy-violation approval requests.
  • Add new security samples and extensive documentation/ADRs describing FIDES usage and design.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
python/samples/getting_started/security/repo_confidentiality_example.py Sample demonstrating confidentiality-based exfiltration prevention.
python/samples/getting_started/security/github_mcp_labels_example.py Sample demonstrating parsing GitHub MCP label metadata and enforcing policies.
python/samples/getting_started/security/email_security_example.py Sample demonstrating integrity-based prompt injection defense + quarantine processing.
python/samples/getting_started/security/init.py Marks security samples as a package.
python/packages/devui/agent_framework_devui/_mapper.py Adds policy-violation details to approval request events sent to the UI.
python/packages/devui/agent_framework_devui/_executor.py Propagates policy-violation metadata through approval responses.
python/packages/core/agent_framework/_tools.py Adds policy-approval plumbing and placeholder replacement around tool approval flows.
python/packages/core/agent_framework/_security_tools.py Implements quarantine/inspection tools and tool-use instructions for hidden content.
python/packages/core/agent_framework/_security_middleware.py Implements label tracking, variable hiding, and policy enforcement middleware.
python/packages/core/agent_framework/_security.py Adds label types, label combination, variable store, and lineage/message labeling primitives.
python/packages/core/agent_framework/init.py Exposes security APIs and adds ai_function alias.
docs/decisions/0011-prompt-injection-defense.md ADR describing the FIDES design and rationale.
QUICK_START_FIDES.md Quick-start guide for configuring and using FIDES.
FIDES_IMPLEMENTATION_SUMMARY.md High-level implementation summary of FIDES components and deliverables.
FIDES_DEVELOPER_GUIDE.md Full developer guide for FIDES concepts, APIs, best practices, and examples.

Comment on lines +128 to +138
return {
"repo": repo,
"visibility": visibility,
"content": content,
"additional_properties": {
"security_label": {
"integrity": "untrusted",
"confidentiality": "private" if visibility == "private" else "public",
}
},
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sample embeds additional_properties.security_label inside a returned dict, but FunctionTool.invoke() serializes dict results into Content.from_text(...) (see FunctionTool.parse_result), which means the label will not end up in Content.additional_properties where the security middleware looks. As a result, confidentiality will likely stay at the tool’s default and the exfiltration policy won’t behave as documented. Fix by returning Content items with additional_properties={\"security_label\": ...} at the Content level (or a list[Content]), or by providing a result_parser that converts the dict into Content while preserving additional_properties.

Copilot uses AI. Check for mistakes.
Comment on lines +168 to +197
async def fetch_emails(
count: int = Field(default=5, description="Number of emails to fetch"),
) -> list[dict[str, Any]]:
"""Fetch emails from inbox (simulated).

Each email has its own security label based on whether it's from a trusted
internal source or an untrusted external source. The security middleware
will automatically hide untrusted emails using variable indirection.
"""
emails = SAMPLE_EMAILS[:count]

# Return emails with per-item security labels in additional_properties
# Middleware will automatically hide untrusted items
result = []
for email in emails:
result.append({
"id": email["id"],
"from": email["from"],
"subject": email["subject"],
"body": email["body"], # Full content - middleware hides if untrusted
# Per-item label in additional_properties (consistent with FunctionResultContent)
"additional_properties": {
"security_label": {
"integrity": "trusted" if email["trusted"] else "untrusted",
"confidentiality": "private",
}
},
})

return result
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the repo confidentiality sample, this returns list[dict] with embedded additional_properties.security_label, but FunctionTool.parse_result() will serialize this whole list to a single text Content (losing per-item metadata) unless the list already contains Content items. That prevents tier-1 per-item label propagation and undermines the demo’s claim that untrusted bodies are auto-hidden. Fix by returning list[Content] where each email is a Content with additional_properties={\"security_label\": ...} (or use a custom result_parser to preserve per-item labels).

Copilot uses AI. Check for mistakes.
Comment on lines +575 to +579
return (
isinstance(item, Content)
and item.type == "text"
and bool(item.additional_properties.get("_variable_reference"))
)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item.additional_properties can be None for some Content instances, which would raise an AttributeError on .get(...). Make this robust by treating missing additional_properties as {} (e.g., props = item.additional_properties or {}) before accessing _variable_reference.

Suggested change
return (
isinstance(item, Content)
and item.type == "text"
and bool(item.additional_properties.get("_variable_reference"))
)
if not (isinstance(item, Content) and item.type == "text"):
return False
props = item.additional_properties or {}
return bool(props.get("_variable_reference"))

Copilot uses AI. Check for mistakes.
Comment on lines +1181 to +1185
# 3. call_id in _pending_policy_approvals (we sent approval request for this call_id)
is_approved = (
policy_approved
or call_id in self._approved_violations
or call_id in self._pending_policy_approvals
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Treating call_id in _pending_policy_approvals as approved is unsafe: “pending approval” should never allow execution. This can let a call proceed without explicit user approval if the tool is reprocessed while still pending, and it weakens the intended human-in-the-loop control. Fix by removing _pending_policy_approvals from is_approved; use it only to prevent duplicate approval requests, and only treat the call as approved when policy_approval_granted is true (or after persisting an approval in _approved_violations).

Suggested change
# 3. call_id in _pending_policy_approvals (we sent approval request for this call_id)
is_approved = (
policy_approved
or call_id in self._approved_violations
or call_id in self._pending_policy_approvals
is_approved = (
policy_approved
or call_id in self._approved_violations

Copilot uses AI. Check for mistakes.
Comment on lines +750 to +754
if policy_violation_data:
additional_props = {"policy_violation": True, **policy_violation_data}
elif approved:
additional_props = {"policy_violation": True}

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets additional_properties={\"policy_violation\": True} for any approved tool call even when the approval was not related to a policy violation (e.g., normal approval_mode=\"always_require\"). That will cause _auto_invoke_function() / policy middleware to treat ordinary approvals as policy-violation approvals and potentially bypass checks. Fix by only setting policy_violation when policy_violation_data is present; otherwise leave additional_properties=None (or use a different flag name for generic approvals).

Suggested change
if policy_violation_data:
additional_props = {"policy_violation": True, **policy_violation_data}
elif approved:
additional_props = {"policy_violation": True}
if isinstance(policy_violation_data, dict):
additional_props = {"policy_violation": True, **policy_violation_data}

Copilot uses AI. Check for mistakes.
Comment on lines +109 to +110
# Path to the GitHub MCP server binary
GITHUB_MCP_SERVER_PATH = "/home/aashish/projects/github-mcp/github-mcp-server-dev/github-mcp-server"
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hardcoded absolute path makes the sample non-portable and will fail for most users/environments. Prefer sourcing this from an environment variable (with a helpful error if unset) and/or provide a CLI argument/default lookup strategy so the sample can run on different machines and CI.

Suggested change
# Path to the GitHub MCP server binary
GITHUB_MCP_SERVER_PATH = "/home/aashish/projects/github-mcp/github-mcp-server-dev/github-mcp-server"
# Path to the GitHub MCP server binary, configured via environment variable.
GITHUB_MCP_SERVER_PATH = os.getenv("GITHUB_MCP_SERVER_PATH")
if not GITHUB_MCP_SERVER_PATH:
raise RuntimeError(
"GITHUB_MCP_SERVER_PATH environment variable is not set. "
"Set it to the full path of the GitHub MCP server binary, e.g. in your .env file."
)

Copilot uses AI. Check for mistakes.

- Fully backwards compatible - opt-in system
- Agents without security middleware function normally
- Unlabeled content defaults to TRUSTED (safe default)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ADR states “Unlabeled content defaults to TRUSTED”, but the implemented defaults appear to be UNTRUSTED for safety (e.g., LabelTrackingFunctionMiddleware(default_integrity=IntegrityLabel.UNTRUSTED) and tier-3 fallback defaulting to UNTRUSTED). Please update the ADR to match the actual behavior (or adjust the implementation if the ADR is correct).

Suggested change
- Unlabeled content defaults to TRUSTED (safe default)
- Unlabeled content defaults to UNTRUSTED (safer default, matching implementation)

Copilot uses AI. Check for mistakes.
@github-actions github-actions bot changed the title Information-flow control based prompt injection defense Python: Information-flow control based prompt injection defense Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants