Google's Gemini AI caught scanning Google Drive PDF files without permission
This concern was first raised when Gmail launched, 20 years ago now; at the time people reeled at the idea that "Google reads your emails to give you ads," but at the same time the 1 GB inbox and fresh UI were a compelling argument.
I think they learned from it, and Google Drive and co. were less "scary", or at least less overt about scanning the stuff you keep in them, partly because they wanted to win that sweet corporate money.
Apple now encrypts most data in transit and at rest, and they document which data is protected. I am up in the air on whether a future Apple will want to use my data for training public models. Apple's design of pre-trained core LLMs, with local training of pluggable fine-tuning layers, would seem to be fine privacy-wise, but I don't really know.
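For what it's worth, that "frozen core model plus pluggable fine-tuning layers" design can be sketched roughly as below, assuming LoRA-style low-rank adapters. This is a generic illustration of the technique, not Apple's actual implementation:

```python
# Minimal LoRA-style adapter sketch: the pre-trained core stays frozen,
# and only a small pluggable layer is trained, so training can happen
# locally on private data. Illustrative only, not Apple's implementation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # core model weights never change
        # The only trainable parameters: two small low-rank matrices.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the learned low-rank correction.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the small adapter would be trained on-device
```

The privacy-relevant property is that the shared core is read-only: whatever is learned from your data lives in the small adapter, which can stay on the device.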
I tend to trust the privacy of Google Drive less because I have authorized access to Drive from Colab Pro and a few third parties. That said, if this article is true, I trust it even less.
Your analogy with early Gmail is good. I got access to Gmail three years before it became public (Peter Norvig gave me an early private invite), and at the time I liked the very relevant ads next to my Gmail. I also gave Google AI Plus (or whatever they called their $20/month service) full access to all my Google properties because I wanted to experiment with the usefulness of LLMs integrated into a Workspace-type environment.
So I have, of my own volition, surrendered privacy in Google properties.
This should be mandatory, enforced, and come with strict fines for companies that do not comply.
It's literally just running an algorithm over your data and spitting out the results for you. Fundamentally it's no different from spellcheck, or automatically creating a table of contents from header styles.
As long as the results stay private to you (which in this case they are), I don't see what the concern is. The fact that the algorithm is LLM-based has zero relevance to privacy or security.
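To make the comparison concrete, here's the table-of-contents case as a minimal sketch (Markdown headings stand in for header styles, an assumption for illustration). A document goes in, a derived result comes out, and the privacy question is only about where each of those ends up:

```python
# A trivial "algorithm over your data": derive a table of contents from
# Markdown headings. Whether the transformation is a regex or an LLM, the
# privacy question is the same: who sees the input and the output.
import re

def table_of_contents(markdown: str) -> str:
    entries = []
    for match in re.finditer(r"^(#{1,6})\s+(.+)$", markdown, re.MULTILINE):
        depth = len(match.group(1))
        entries.append("  " * (depth - 1) + "- " + match.group(2).strip())
    return "\n".join(entries)

print(table_of_contents("# Intro\n## Background\n# Methods\n"))
# - Intro
#   - Background
# - Methods
```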
- Manage your activity on Gemini: https://myactivity.google.com/product/gemini
- This page has most answers related to Google Workspace and opting out of different Google apps: https://support.google.com/docs/answer/13447104#:~:text=Turn...
It is not surprising that Gemini will summarize a document if you ask it to. "Scanning" is doing heavy lifting here; the headline implies Google is training Gemini on private documents, when the real issue is that Gemini was run with a private document as input to produce a summary after the user thought they had explicitly switched that feature off.

That said, it is a meaningful bug in Google's infrastructure that the setting is not being respected, and it's the kind of thing that should make a person check their exit strategy if they are completely against using the new generation of AI in general.
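For anyone skimming, the distinction being blurred looks roughly like this; `generate` and `train_step` are hypothetical stand-ins, not any real vendor's API:

```python
# Hypothetical sketch of the two paths the headline conflates. Neither
# method name corresponds to a real API; they only mark the distinction.

def summarize(model, document: str) -> str:
    # Inference: the document is transient input to a frozen model.
    # The weights are unchanged afterward; nothing is "learned" from it.
    return model.generate(prompt=f"Summarize this document:\n{document}")

def train(model, corpus: list[str]) -> None:
    # Training: documents update the weights themselves. Only this path
    # bakes your file's contents into the model.
    for document in corpus:
        model.train_step(document)
```

The reported bug is in the first path (inference ran despite an opt-out), which is bad on its own, but it is not the second path.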
No, but it is surprising that Gemini will summarize every single PDF you have on your Drive when you only asked it to summarize a single PDF once.
The glue-on-pizza incident illustrated that they're just YOLO-ing this.
Your hacked SMS messages from AT&T are probably next, and everyone will be just as surprised when keystrokes from your phones get hit, or when a collection agent for model training (privacy-enhanced for your pleasure, surely) is added as an OS update to commercial platforms.
Make an example of the product managers and engineers behind this, or see it done worse and at a larger scale next time.
They’re trying to suggest that exposing an LLM to a document in any way is equivalent to including that document in the LLM’s training set. That’s the hook in the article and the original Tweet, but the Tweet thread eventually acknowledges the differences and pivots to being angry about the existence of the AI feature at all.
There isn't anything of substance to this story other than a Twitter user writing a rage-bait thread about an AI popup while trying to spin it as something much more sinister.