Gemini's Secret Affair: Notification-Based Prompt Injection and the Fake Context Alignment Technique

A single WhatsApp message from an unknown number is enough to turn the Gemini voice assistant into an attacker's tool — opening the smart windows in a victim's home, live-streaming their video over Zoom, or reading their private messages every day at 8 PM. That is what SafeBreach Labs demonstrated in "Gemini's Secret Affair", published on June 3, 2026 — and the entire chain requires the victim to install nothing.
Executive Summary
SafeBreach Labs uncovered a new class of indirect prompt injection (IPI) targeting the Google Gemini voice assistant, abusing its ability to read and summarize notifications from messaging apps: WhatsApp, Slack, SMS, Signal, Instagram, Messenger. Using a novel technique called Fake Context Alignment, the researchers bypassed the mitigations Google rolled out after their earlier work, forcing Gemini to execute unauthorized actions without the user's knowledge.
The impact includes controlling smart-home devices, live-streaming the victim through another app, impersonating trusted contacts for large-scale social engineering, and poisoning Gemini's long-term memory for persistent access. SafeBreach followed responsible disclosure; Google confirmed it mitigated the issues with content classifier updates. There is no evidence the technique was used in the wild. For organizations adopting AI assistants and agents into their workflows, the takeaway is blunt: every piece of data that enters an LLM's context must be treated as untrusted input.
Background: from calendar invites to texting
This research extends "Invitation Is All You Need", in which SafeBreach used Google Calendar invitations as a carrier to indirectly inject prompts into Gemini. After that disclosure, Google invested heavily in mitigation — notably blocking the chaining of multiple agents and a technique called Delayed Tool Invocation.
Delayed Tool Invocation, originally published by Johann Rehberger, does not ask Gemini to perform a malicious action immediately. Instead, the attacker poisons the context so that Gemini triggers the action only after a benign user prompt — for example, opening the windows via the Google Home agent the moment the user replies "Thanks."
This time, the target shifted to the voice assistant — a far broader and more trusted surface. Voice assistants carry a property that makes them especially dangerous: when Gemini asks a question, it automatically opens the microphone and waits for a reply. That mechanism lets an attacker force multiple interaction turns out of the victim — the key prerequisite for multi-step exploits.
Technical analysis
Notification IPI: an effectively infinite attack surface
The starting point was the discovery that the notification-reading tool inside the Android Utilities agent processes untrusted data from incoming messages. Using an IPI format similar to the prior research (with minor adjustments), the researchers injected instructions into the remote conversation between the victim and their assistant.
What makes this dangerous is that the attack surface is effectively infinite: any application capable of generating a notification on the victim's device can become a delivery channel. Even without calling external tools, context poisoning alone is enough to alter the output — for instance, forcing Gemini to relay a fake system-style alert such as "There was an error, click here to refresh."
Impersonating trusted contacts — social engineering on steroids
When an attacker controls the output of an agent that processes notifications, they exploit a channel inherently tied to messages from real people. The manipulated output therefore inherits the high trust users place in their personal contacts.
Concretely: if the attacker knows the name of the victim's manager, they can force Gemini to read out a fake message impersonating that person — for example, asking the victim to upload documents to a Google Drive folder. Because Gemini is often used by voice when the user is not holding the phone (while driving, say), the victim is unlikely to verify and likely to comply.
More alarmingly, the attack can be executed entirely blind: because the injection fires after the assistant has processed existing notifications into context, the payload can instruct Gemini to take the first authentic name it finds in the notification queue and attribute the fake message to that person. The attacker needs no prior knowledge of the victim — enough to enable social engineering against millions of targets with zero reconnaissance.
Fake Context Alignment: bypassing the Delayed Tool Invocation mitigation
Output manipulation is dangerous, but it is not enough for physical-world impacts like opening smart windows or streaming video. Those require forcing Gemini to call external tools — exactly what Google had blocked.
When the old Delayed Tool Invocation technique was applied, Gemini refused 100% of attempts. Through trial and error, the researchers deduced the protection mechanism: the security check looks at both the user's last prompt and Gemini's last output to decide whether a simple "Yes" logically makes sense to trigger a tool. In other words, if the system believes Gemini just asked for permission and the user just agreed, the action is authorized.
That gave rise to Fake Context Alignment — creating a dual illusion: presenting a legitimate authorization scenario to Gemini's behind-the-scenes security mechanism while presenting a completely benign scenario to the victim. Both the system and the user believe they are having a normal interaction, but the scenario the security mechanism actually processes is the attacker's malicious one.
Technique 1 — Obfuscated Fake Context Alignment
Exploiting the fact that most users speak no more than two languages, the payload forces Gemini to vocalize a question in a foreign language (e.g. Chinese: 你想打开窗户吗? — "Do you want to open the window?"), immediately followed by a benign English question like "Is that all you needed me to do?" Not understanding Chinese, the user assumes a glitch and confidently replies "Yes" to the English question. But because the Chinese question is in the conversation history, the backend security mechanism aligns the "Yes" with the instruction to open the window — and authorizes tool execution.
Technique 2 — Muted Fake Context Alignment
Stealthier still: the researchers found that Gemini does not read aloud a hyperlink covered by clickable text during text-to-speech (TTS). Abusing this, they had Gemini vocalize a harmless line like "I'm sorry, I had an error, are you there?" while the screen silently showed a link reading "Do you want to open the window?" The driving user hears the harmless question, replies "Yes" — and the security mechanism "sees" the muted on-screen text, aligns the "Yes" with tool authorization, and opens the windows.
The Ultimate Combo
Combining both for maximum reliability and stealth: the final payload forces Gemini to output the authorization question in Chinese and hides that entire Chinese string inside a muted hyperlink. The user hears a perfectly normal English prompt, replies with a benign "Yes," and silently triggers Delayed Tool Invocation — seamlessly bypassing Google's newest mitigations.
Watch the demos: Obfuscated · Muted · Combo
Bypassing Safe Browsing and crossing the app boundary
To make Gemini open another app, the attacker has to open a URL. Google added a defense absent from the prior research: Gemini checks external URLs against the Safe Browsing database and refuses to open anything deemed "unsafe" or unrecognized; even a "safe" domain is refused if it carries parameters (e.g. https://google.com?parameter1=value1).
The catch: being classified "safe" in Safe Browsing is not hard. Almost any domain that serves a benign website for a while is automatically classified safe (you can check your own at the Safe Browsing transparency report). The researchers leveraged that trust by configuring a legitimate domain to return an HTTP 301 redirect pointing to a Zoom App Intent URI. Because Gemini blindly follows redirects without asking for confirmation, the attack worked flawlessly: injecting the payload into a single notification instantly forced the victim's device to launch Zoom, join a predefined meeting, and live-stream their video.
SafeBreach note: the
safebreach.comdomain never actually redirected to a Zoom App Intent URI — the demo setup manually pointed the domain on the test device to a local HTTP server. (Zoom demo)
Long-term impact: memory poisoning and persistence
This is what the previous research could not achieve. Because Fake Context Alignment perfectly simulates a scenario where the user explicitly agrees, it unlocked Gemini's persistent features:
Memory poisoning: the payload silently asked the victim whether Gemini should remember their name as "Danny"; when the victim unknowingly replied "Yes" to the English prompt, Gemini permanently saved it to long-term memory.
Scheduled actions: using the same technique, the researchers forced Gemini to create a recurring task that automatically reads the victim's recent messages every day at 8 PM. Because Gemini's long-term memory is tied to the entire Google Workspace account, poisoning via one phone notification can instantly spread to the victim's Gemini interactions on their tablet, computer, or smart speaker — multi-device propagation. (Memory demo · Scheduling demo)
MITRE ATLAS & ATT&CK mapping
This is a technique disclosure, not an in-the-wild campaign, so there are no real IOCs (hashes/IPs/domains) to list — detection ideas appear below instead. TTP mapping:
| Stage | Technique | ID |
|---|---|---|
| Initial Access / Execution | LLM Prompt Injection: Indirect | AML.T0051.001 |
| Delivery | Phishing | T1566 |
| Social Engineering | Impersonation | T1656 |
| Execution | User Execution (victim replies "Yes") | T1204 |
| Defense Evasion | Obfuscation (foreign language + muted hyperlink) | — |
| Persistence | Memory poisoning + recurring scheduled actions (LLM-specific) | — |
Detection & hunting ideas (for SOC)
Because this attacks the AI-assistant layer rather than a traditional endpoint, detection leans toward agent telemetry and anomalous device behavior:
# Hunting hypotheses
- Notifications/messages with unusual mixed-language text
(e.g. CJK characters appended at the end of a message from an unknown contact).
- Hyperlinks whose anchor text is an authorization-style question
("Do you want to...?") while the href points to a redirecting domain.
- Smart-home / IoT actions triggered immediately after a
voice-assistant notification-summary session.
- App Intent URIs (e.g. Zoom join) opened from a redirect chain
originating at a "safe" domain + 301.
- AI assistant long-term memory / scheduled tasks changed
without a clear user configuration action.
If the organization can log LLM prompts/responses (Tier 2 content inspection), build a SIEM rule to detect indirect prompt injection per AML.T0051.001.
Expert opinion
The most important point is not a single bug that Google has patched, but Or Yair's (SafeBreach) own assessment: indirect prompt injection is not a classical vulnerability you can "patch once and be done." As long as an LLM operates as a single "magic box" that simultaneously receives backend and frontend-content instructions, an attacker only needs to appear legitimate enough to bypass the guardrails. The viable answer is active guardrails/classifiers monitoring behavior — a continuous defensive race, not a one-time patch.
For Vietnam specifically, this matters because the wave of integrating AI assistants and agents into business workflows is accelerating, while most SOC threat models still revolve around classic endpoint, network, and identity. Fake Context Alignment exposes a new risk class: context shifting — output that is muted, language-shifted, or hidden under a hyperlink, imperceptible to the user, can still entirely change the conversation's context. The core principle our analysis team draws: every input from outside — notification, email, document, calendar invite — is a potential instruction and must be handled as untrusted.
Nor is this a single-product problem. Around the same time, a separate Gemini bug in the browser AI panel allowed privilege escalation and privacy violations while users browsed — evidence that the AI-assistant attack surface is expanding across multiple channels at once.
Vendor response timeline
| Date | Event |
|---|---|
| 2025-08-17 | SafeBreach reported the findings to the Google Vulnerability Reward Program (VRP) |
| 2025-11-14 | Google confirmed content classifier improvements mitigated the IPI and Delayed Tool Invocation scenarios in this research |
| 2026-06-03 | SafeBreach published the research publicly |
Recommendations
Treat all external content as untrusted. Notifications, emails, calendar invites, documents — all are potential instructions to an AI assistant. Don't trust an assistant's output just because it "says" a message came from someone you know.
Verify manually before sensitive actions. When an assistant offers to open a device, send money, download a document, or join a meeting — open the native app and check; don't approve by voice while driving or not looking at the screen.
Limit AI agent privileges. Apply least-privilege to smart-home, app, and account integrations; separate personal accounts from those holding corporate data (Google Workspace).
Monitor at the agent layer. If LLM prompts/responses are logged, build detection for indirect prompt injection (AML.T0051.001) and alert on anomalous changes to long-term memory or scheduled tasks.
Keep products updated and track disclosures. Vendor patches for prompt injection are per-technique and temporary, not permanent — continuous monitoring of new research is required.
References
SafeBreach Labs — Exploiting Gemini via Prompt Injection (Gemini's Secret Affair)
SafeBreach Labs — Invitation Is All You Need
Dark Reading — Malicious Notifications Could Trick Google Gemini Users
Dark Reading — Bug in Google's Gemini AI Panel Opens Door to Hijacking
Google Security Blog — Mitigating prompt injection attacks
Johann Rehberger — Embrace The Red
MITRE ATLAS — AML.T0051 LLM Prompt Injection





