Python: Information-flow control based prompt injection defense #5024
shrutitople wants to merge 14 commits into microsoft:main
Conversation
Pull request overview
Introduces FIDES, an information-flow control (IFC) security layer for the agent framework to deterministically mitigate prompt injection and data exfiltration via integrity/confidentiality labels, variable indirection, and policy enforcement.
Changes:
- Add core security primitives (labels, variable store, lineage) plus security middleware for label propagation and policy enforcement.
- Add security tools (`quarantined_llm`, `inspect_variable`) and DevUI support for displaying/handling policy-violation approval requests.
- Add new security samples and extensive documentation/ADRs describing FIDES usage and design.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| python/samples/getting_started/security/repo_confidentiality_example.py | Sample demonstrating confidentiality-based exfiltration prevention. |
| python/samples/getting_started/security/github_mcp_labels_example.py | Sample demonstrating parsing GitHub MCP label metadata and enforcing policies. |
| python/samples/getting_started/security/email_security_example.py | Sample demonstrating integrity-based prompt injection defense + quarantine processing. |
| python/samples/getting_started/security/__init__.py | Marks security samples as a package. |
| python/packages/devui/agent_framework_devui/_mapper.py | Adds policy-violation details to approval request events sent to the UI. |
| python/packages/devui/agent_framework_devui/_executor.py | Propagates policy-violation metadata through approval responses. |
| python/packages/core/agent_framework/_tools.py | Adds policy-approval plumbing and placeholder replacement around tool approval flows. |
| python/packages/core/agent_framework/_security_tools.py | Implements quarantine/inspection tools and tool-use instructions for hidden content. |
| python/packages/core/agent_framework/_security_middleware.py | Implements label tracking, variable hiding, and policy enforcement middleware. |
| python/packages/core/agent_framework/_security.py | Adds label types, label combination, variable store, and lineage/message labeling primitives. |
| python/packages/core/agent_framework/__init__.py | Exposes security APIs and adds ai_function alias. |
| docs/decisions/0011-prompt-injection-defense.md | ADR describing the FIDES design and rationale. |
| QUICK_START_FIDES.md | Quick-start guide for configuring and using FIDES. |
| FIDES_IMPLEMENTATION_SUMMARY.md | High-level implementation summary of FIDES components and deliverables. |
| FIDES_DEVELOPER_GUIDE.md | Full developer guide for FIDES concepts, APIs, best practices, and examples. |
```python
return {
    "repo": repo,
    "visibility": visibility,
    "content": content,
    "additional_properties": {
        "security_label": {
            "integrity": "untrusted",
            "confidentiality": "private" if visibility == "private" else "public",
        }
    },
}
```
This sample embeds `additional_properties.security_label` inside a returned dict, but `FunctionTool.invoke()` serializes dict results into `Content.from_text(...)` (see `FunctionTool.parse_result`), which means the label will not end up in `Content.additional_properties` where the security middleware looks. As a result, confidentiality will likely stay at the tool's default and the exfiltration policy won't behave as documented. Fix by returning `Content` items with `additional_properties={"security_label": ...}` at the `Content` level (or a `list[Content]`), or by providing a `result_parser` that converts the dict into `Content` while preserving `additional_properties`.
```python
async def fetch_emails(
    count: int = Field(default=5, description="Number of emails to fetch"),
) -> list[dict[str, Any]]:
    """Fetch emails from inbox (simulated).

    Each email has its own security label based on whether it's from a trusted
    internal source or an untrusted external source. The security middleware
    will automatically hide untrusted emails using variable indirection.
    """
    emails = SAMPLE_EMAILS[:count]

    # Return emails with per-item security labels in additional_properties
    # Middleware will automatically hide untrusted items
    result = []
    for email in emails:
        result.append({
            "id": email["id"],
            "from": email["from"],
            "subject": email["subject"],
            "body": email["body"],  # Full content - middleware hides if untrusted
            # Per-item label in additional_properties (consistent with FunctionResultContent)
            "additional_properties": {
                "security_label": {
                    "integrity": "trusted" if email["trusted"] else "untrusted",
                    "confidentiality": "private",
                }
            },
        })

    return result
```
Like the repo confidentiality sample, this returns `list[dict]` with embedded `additional_properties.security_label`, but `FunctionTool.parse_result()` will serialize this whole list to a single text `Content` (losing per-item metadata) unless the list already contains `Content` items. That prevents tier-1 per-item label propagation and undermines the demo's claim that untrusted bodies are auto-hidden. Fix by returning `list[Content]` where each email is a `Content` with `additional_properties={"security_label": ...}` (or use a custom `result_parser` to preserve per-item labels).
```python
return (
    isinstance(item, Content)
    and item.type == "text"
    and bool(item.additional_properties.get("_variable_reference"))
)
```
`item.additional_properties` can be `None` for some `Content` instances, which would raise an `AttributeError` on `.get(...)`. Make this robust by treating missing `additional_properties` as `{}` (e.g., `props = item.additional_properties or {}`) before accessing `_variable_reference`.
Suggested change:
```diff
-return (
-    isinstance(item, Content)
-    and item.type == "text"
-    and bool(item.additional_properties.get("_variable_reference"))
-)
+if not (isinstance(item, Content) and item.type == "text"):
+    return False
+props = item.additional_properties or {}
+return bool(props.get("_variable_reference"))
```
```python
# 3. call_id in _pending_policy_approvals (we sent approval request for this call_id)
is_approved = (
    policy_approved
    or call_id in self._approved_violations
    or call_id in self._pending_policy_approvals
```
Treating `call_id in _pending_policy_approvals` as approved is unsafe: "pending approval" should never allow execution. This can let a call proceed without explicit user approval if the tool is reprocessed while still pending, and it weakens the intended human-in-the-loop control. Fix by removing `_pending_policy_approvals` from `is_approved`; use it only to prevent duplicate approval requests, and only treat the call as approved when `policy_approval_granted` is true (or after persisting an approval in `_approved_violations`).
Suggested change:
```diff
-# 3. call_id in _pending_policy_approvals (we sent approval request for this call_id)
-is_approved = (
-    policy_approved
-    or call_id in self._approved_violations
-    or call_id in self._pending_policy_approvals
+is_approved = (
+    policy_approved
+    or call_id in self._approved_violations
```
```python
if policy_violation_data:
    additional_props = {"policy_violation": True, **policy_violation_data}
elif approved:
    additional_props = {"policy_violation": True}
```
This sets `additional_properties={"policy_violation": True}` for any approved tool call even when the approval was not related to a policy violation (e.g., normal `approval_mode="always_require"`). That will cause `_auto_invoke_function()` / policy middleware to treat ordinary approvals as policy-violation approvals and potentially bypass checks. Fix by only setting `policy_violation` when `policy_violation_data` is present; otherwise leave `additional_properties=None` (or use a different flag name for generic approvals).
Suggested change:
```diff
-if policy_violation_data:
-    additional_props = {"policy_violation": True, **policy_violation_data}
-elif approved:
-    additional_props = {"policy_violation": True}
+if isinstance(policy_violation_data, dict):
+    additional_props = {"policy_violation": True, **policy_violation_data}
```
```python
# Path to the GitHub MCP server binary
GITHUB_MCP_SERVER_PATH = "/home/aashish/projects/github-mcp/github-mcp-server-dev/github-mcp-server"
```
This hardcoded absolute path makes the sample non-portable and will fail for most users/environments. Prefer sourcing this from an environment variable (with a helpful error if unset) and/or provide a CLI argument/default lookup strategy so the sample can run on different machines and CI.
Suggested change:
```diff
-# Path to the GitHub MCP server binary
-GITHUB_MCP_SERVER_PATH = "/home/aashish/projects/github-mcp/github-mcp-server-dev/github-mcp-server"
+# Path to the GitHub MCP server binary, configured via environment variable.
+GITHUB_MCP_SERVER_PATH = os.getenv("GITHUB_MCP_SERVER_PATH")
+if not GITHUB_MCP_SERVER_PATH:
+    raise RuntimeError(
+        "GITHUB_MCP_SERVER_PATH environment variable is not set. "
+        "Set it to the full path of the GitHub MCP server binary, e.g. in your .env file."
+    )
```
- Fully backwards compatible - opt-in system
- Agents without security middleware function normally
- Unlabeled content defaults to TRUSTED (safe default)
This ADR states "Unlabeled content defaults to TRUSTED", but the implemented defaults appear to be UNTRUSTED for safety (e.g., `LabelTrackingFunctionMiddleware(default_integrity=IntegrityLabel.UNTRUSTED)` and tier-3 fallback defaulting to UNTRUSTED). Please update the ADR to match the actual behavior (or adjust the implementation if the ADR is correct).
Suggested change:
```diff
-- Unlabeled content defaults to TRUSTED (safe default)
+- Unlabeled content defaults to UNTRUSTED (safer default, matching implementation)
```
Motivation and Context
LLM agents are vulnerable to prompt injection attacks — malicious instructions in external content (tool results, API responses) that cause data exfiltration or unauthorized actions.
This PR introduces FIDES, a deterministic defense based on information flow control (IFC). Instead of detecting injections, it tracks content provenance via labels and enforces policies — untrusted content can't influence trusted operations, private data can't leak to public channels.
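The label-tracking idea above can be sketched with the standard IFC lattice rule. This is a hedged sketch, not the PR's actual `combine_labels()` implementation: the enum names come from the PR description, but the ordering and combination semantics here are assumptions.

```python
from enum import IntEnum


class Integrity(IntEnum):
    UNTRUSTED = 0  # lower value = less trustworthy
    TRUSTED = 1


class Confidentiality(IntEnum):
    PUBLIC = 0         # lower value = less sensitive
    PRIVATE = 1
    USER_IDENTITY = 2


def combine_labels(a: tuple, b: tuple) -> tuple:
    # Integrity takes the minimum: one untrusted input taints the result.
    # Confidentiality takes the maximum: the most sensitive input dominates.
    return (min(a[0], b[0]), max(a[1], b[1]))


combined = combine_labels(
    (Integrity.TRUSTED, Confidentiality.PUBLIC),
    (Integrity.UNTRUSTED, Confidentiality.PRIVATE),
)
print(combined == (Integrity.UNTRUSTED, Confidentiality.PRIVATE))  # True
```

Under this rule, any tool output derived from untrusted content stays untrusted, and any context that has seen private data stays private, which is what lets enforcement be deterministic rather than detection-based.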
Description
Security Primitives — `_security.py`
- `IntegrityLabel` (trusted/untrusted) × `ConfidentialityLabel` (public/private/user_identity)
- `combine_labels()` for combining labels across content
- `ContentVariableStore` replaces untrusted content with opaque `VariableReferenceContent` placeholders — the LLM never sees raw untrusted data

Middleware — `_security_middleware.py`
- `LabelTrackingFunctionMiddleware` — 3-tier automatic label propagation: per-item labels (`additional_properties.security_label`), tool-level `source_integrity` declaration, and a default fallback
- `PolicyEnforcementFunctionMiddleware` — blocks or requests approval when context confidentiality exceeds a tool's `max_allowed_confidentiality`
- `SecureAgentConfig` — one-line setup wiring middleware, tools, and instructions
- Tool results normalized to `list[Content]` (aligned with upstream `FunctionTool.invoke()`)

Security Tools — `_security_tools.py`
- `quarantined_llm` — isolated LLM call (no tools) for safe summarization of untrusted content
- `inspect_variable` — controlled access to hidden variables with label awareness

Framework Integration — `_tools.py`, DevUI
- `FunctionApprovalRequest` content type for human-in-the-loop policy enforcement

Tests — `test_security.py`

Samples — `python/samples/getting_started/security/`
- `email_security_example.py`
- `repo_confidentiality_example.py`
- `github_mcp_labels_example.py`

Documentation
- `FIDES_DEVELOPER_GUIDE.md`, `QUICK_START_FIDES.md`, `FIDES_IMPLEMENTATION_SUMMARY.md`

Contribution Checklist
- `SecureAgentConfig`