Python: Information-flow control based prompt injection defense #5024
shrutitople wants to merge 14 commits into microsoft:main
Conversation
Pull request overview
Introduces FIDES, an information-flow control (IFC) security layer for the agent framework to deterministically mitigate prompt injection and data exfiltration via integrity/confidentiality labels, variable indirection, and policy enforcement.
Changes:
- Add core security primitives (labels, variable store, lineage) plus security middleware for label propagation and policy enforcement.
- Add security tools (`quarantined_llm`, `inspect_variable`) and DevUI support for displaying/handling policy-violation approval requests.
- Add new security samples and extensive documentation/ADRs describing FIDES usage and design.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| python/samples/getting_started/security/repo_confidentiality_example.py | Sample demonstrating confidentiality-based exfiltration prevention. |
| python/samples/getting_started/security/github_mcp_labels_example.py | Sample demonstrating parsing GitHub MCP label metadata and enforcing policies. |
| python/samples/getting_started/security/email_security_example.py | Sample demonstrating integrity-based prompt injection defense + quarantine processing. |
| python/samples/getting_started/security/__init__.py | Marks security samples as a package. |
| python/packages/devui/agent_framework_devui/_mapper.py | Adds policy-violation details to approval request events sent to the UI. |
| python/packages/devui/agent_framework_devui/_executor.py | Propagates policy-violation metadata through approval responses. |
| python/packages/core/agent_framework/_tools.py | Adds policy-approval plumbing and placeholder replacement around tool approval flows. |
| python/packages/core/agent_framework/_security_tools.py | Implements quarantine/inspection tools and tool-use instructions for hidden content. |
| python/packages/core/agent_framework/_security_middleware.py | Implements label tracking, variable hiding, and policy enforcement middleware. |
| python/packages/core/agent_framework/_security.py | Adds label types, label combination, variable store, and lineage/message labeling primitives. |
| python/packages/core/agent_framework/__init__.py | Exposes security APIs and adds ai_function alias. |
| docs/decisions/0011-prompt-injection-defense.md | ADR describing the FIDES design and rationale. |
| QUICK_START_FIDES.md | Quick-start guide for configuring and using FIDES. |
| FIDES_IMPLEMENTATION_SUMMARY.md | High-level implementation summary of FIDES components and deliverables. |
| FIDES_DEVELOPER_GUIDE.md | Full developer guide for FIDES concepts, APIs, best practices, and examples. |
```python
return {
    "repo": repo,
    "visibility": visibility,
    "content": content,
    "additional_properties": {
        "security_label": {
            "integrity": "untrusted",
            "confidentiality": "private" if visibility == "private" else "public",
        }
    },
}
```
This sample embeds `additional_properties.security_label` inside a returned dict, but `FunctionTool.invoke()` serializes dict results into `Content.from_text(...)` (see `FunctionTool.parse_result`), which means the label will not end up in `Content.additional_properties` where the security middleware looks. As a result, confidentiality will likely stay at the tool's default and the exfiltration policy won't behave as documented. Fix by returning `Content` items with `additional_properties={"security_label": ...}` at the `Content` level (or a `list[Content]`), or by providing a `result_parser` that converts the dict into `Content` while preserving `additional_properties`.
```python
async def fetch_emails(
    count: int = Field(default=5, description="Number of emails to fetch"),
) -> list[dict[str, Any]]:
    """Fetch emails from inbox (simulated).

    Each email has its own security label based on whether it's from a trusted
    internal source or an untrusted external source. The security middleware
    will automatically hide untrusted emails using variable indirection.
    """
    emails = SAMPLE_EMAILS[:count]

    # Return emails with per-item security labels in additional_properties
    # Middleware will automatically hide untrusted items
    result = []
    for email in emails:
        result.append({
            "id": email["id"],
            "from": email["from"],
            "subject": email["subject"],
            "body": email["body"],  # Full content - middleware hides if untrusted
            # Per-item label in additional_properties (consistent with FunctionResultContent)
            "additional_properties": {
                "security_label": {
                    "integrity": "trusted" if email["trusted"] else "untrusted",
                    "confidentiality": "private",
                }
            },
        })

    return result
```
Like the repo confidentiality sample, this returns `list[dict]` with embedded `additional_properties.security_label`, but `FunctionTool.parse_result()` will serialize this whole list to a single text `Content` (losing per-item metadata) unless the list already contains `Content` items. That prevents tier-1 per-item label propagation and undermines the demo's claim that untrusted bodies are auto-hidden. Fix by returning `list[Content]` where each email is a `Content` with `additional_properties={"security_label": ...}` (or use a custom `result_parser` to preserve per-item labels).
```python
return (
    isinstance(item, Content)
    and item.type == "text"
    and bool(item.additional_properties.get("_variable_reference"))
)
```
`item.additional_properties` can be `None` for some `Content` instances, which would raise an `AttributeError` on `.get(...)`. Make this robust by treating missing `additional_properties` as `{}` (e.g., `props = item.additional_properties or {}`) before accessing `_variable_reference`.
Suggested change:
```diff
-return (
-    isinstance(item, Content)
-    and item.type == "text"
-    and bool(item.additional_properties.get("_variable_reference"))
-)
+if not (isinstance(item, Content) and item.type == "text"):
+    return False
+props = item.additional_properties or {}
+return bool(props.get("_variable_reference"))
```
```python
# 3. call_id in _pending_policy_approvals (we sent approval request for this call_id)
is_approved = (
    policy_approved
    or call_id in self._approved_violations
    or call_id in self._pending_policy_approvals
```
Treating `call_id in _pending_policy_approvals` as approved is unsafe: "pending approval" should never allow execution. This can let a call proceed without explicit user approval if the tool is reprocessed while still pending, and it weakens the intended human-in-the-loop control. Fix by removing `_pending_policy_approvals` from `is_approved`; use it only to prevent duplicate approval requests, and only treat the call as approved when `policy_approval_granted` is true (or after persisting an approval in `_approved_violations`).
Suggested change:
```diff
-# 3. call_id in _pending_policy_approvals (we sent approval request for this call_id)
-is_approved = (
-    policy_approved
-    or call_id in self._approved_violations
-    or call_id in self._pending_policy_approvals
+is_approved = (
+    policy_approved
+    or call_id in self._approved_violations
```
```python
if policy_violation_data:
    additional_props = {"policy_violation": True, **policy_violation_data}
elif approved:
    additional_props = {"policy_violation": True}
```
This sets `additional_properties={"policy_violation": True}` for any approved tool call even when the approval was not related to a policy violation (e.g., normal `approval_mode="always_require"`). That will cause `_auto_invoke_function()` / policy middleware to treat ordinary approvals as policy-violation approvals and potentially bypass checks. Fix by only setting `policy_violation` when `policy_violation_data` is present; otherwise leave `additional_properties=None` (or use a different flag name for generic approvals).
Suggested change:
```diff
-if policy_violation_data:
-    additional_props = {"policy_violation": True, **policy_violation_data}
-elif approved:
-    additional_props = {"policy_violation": True}
+if isinstance(policy_violation_data, dict):
+    additional_props = {"policy_violation": True, **policy_violation_data}
```
```python
# Path to the GitHub MCP server binary
GITHUB_MCP_SERVER_PATH = "/home/aashish/projects/github-mcp/github-mcp-server-dev/github-mcp-server"
```
This hardcoded absolute path makes the sample non-portable and will fail for most users/environments. Prefer sourcing this from an environment variable (with a helpful error if unset) and/or provide a CLI argument/default lookup strategy so the sample can run on different machines and CI.
Suggested change:
```diff
-# Path to the GitHub MCP server binary
-GITHUB_MCP_SERVER_PATH = "/home/aashish/projects/github-mcp/github-mcp-server-dev/github-mcp-server"
+# Path to the GitHub MCP server binary, configured via environment variable.
+GITHUB_MCP_SERVER_PATH = os.getenv("GITHUB_MCP_SERVER_PATH")
+if not GITHUB_MCP_SERVER_PATH:
+    raise RuntimeError(
+        "GITHUB_MCP_SERVER_PATH environment variable is not set. "
+        "Set it to the full path of the GitHub MCP server binary, e.g. in your .env file."
+    )
```
- Fully backwards compatible - opt-in system
- Agents without security middleware function normally
- Unlabeled content defaults to TRUSTED (safe default)
This ADR states "Unlabeled content defaults to TRUSTED", but the implemented defaults appear to be UNTRUSTED for safety (e.g., `LabelTrackingFunctionMiddleware(default_integrity=IntegrityLabel.UNTRUSTED)` and tier-3 fallback defaulting to UNTRUSTED). Please update the ADR to match the actual behavior (or adjust the implementation if the ADR is correct).
Suggested change:
```diff
-- Unlabeled content defaults to TRUSTED (safe default)
+- Unlabeled content defaults to UNTRUSTED (safer default, matching implementation)
```
Motivation and Context
LLM agents are vulnerable to prompt injection attacks — malicious instructions in external content (tool results, API responses) that cause data exfiltration or unauthorized actions.
This PR introduces FIDES, a deterministic defense based on information flow control (IFC). Instead of detecting injections, it tracks content provenance via labels and enforces policies — untrusted content can't influence trusted operations, private data can't leak to public channels.
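The label-tracking idea above can be sketched with the standard IFC lattice rule. This is a hedged sketch, not the PR's actual `combine_labels()` implementation: the enum names come from the PR description, but the ordering and combination semantics here are assumptions.

```python
from enum import IntEnum


class Integrity(IntEnum):
    UNTRUSTED = 0  # lower value = less trustworthy
    TRUSTED = 1


class Confidentiality(IntEnum):
    PUBLIC = 0         # lower value = less sensitive
    PRIVATE = 1
    USER_IDENTITY = 2


def combine_labels(a: tuple, b: tuple) -> tuple:
    # Integrity takes the minimum: one untrusted input taints the result.
    # Confidentiality takes the maximum: the most sensitive input dominates.
    return (min(a[0], b[0]), max(a[1], b[1]))


combined = combine_labels(
    (Integrity.TRUSTED, Confidentiality.PUBLIC),
    (Integrity.UNTRUSTED, Confidentiality.PRIVATE),
)
print(combined == (Integrity.UNTRUSTED, Confidentiality.PRIVATE))  # True
```

Under this rule, any tool output derived from untrusted content stays untrusted, and any context that has seen private data stays private, which is what lets enforcement be deterministic rather than detection-based.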
Description
Security Primitives — `_security.py`
- `IntegrityLabel` (trusted/untrusted) × `ConfidentialityLabel` (public/private/user_identity)
- `combine_labels()` for combining labels across content
- `ContentVariableStore` replaces untrusted content with opaque `VariableReferenceContent` placeholders — the LLM never sees raw untrusted data

Middleware — `_security_middleware.py`
- `LabelTrackingFunctionMiddleware` — 3-tier automatic label propagation: per-item labels (`additional_properties.security_label`), tool-level `source_integrity` declaration, and a default fallback
- `PolicyEnforcementFunctionMiddleware` — blocks or requests approval when context confidentiality exceeds a tool's `max_allowed_confidentiality`
- `SecureAgentConfig` — one-line setup wiring middleware, tools, and instructions
- Tool results normalized to `list[Content]` (aligned with upstream `FunctionTool.invoke()`)

Security Tools — `_security_tools.py`
- `quarantined_llm` — isolated LLM call (no tools) for safe summarization of untrusted content
- `inspect_variable` — controlled access to hidden variables with label awareness

Framework Integration — `_tools.py`, DevUI
- `FunctionApprovalRequest` content type for human-in-the-loop policy enforcement

Tests — `test_security.py`

Samples — `python/samples/getting_started/security/`
- `email_security_example.py`
- `repo_confidentiality_example.py`
- `github_mcp_labels_example.py`

Documentation
- `FIDES_DEVELOPER_GUIDE.md`, `QUICK_START_FIDES.md`, `FIDES_IMPLEMENTATION_SUMMARY.md`

Contribution Checklist
- `SecureAgentConfig`