Cthulhu_
9 months ago
This just reiterates that you don't own your data hosted on cloud providers; this time there's a clear sign, but I can guarantee that Google's systems were reading and aggregating data inside your private docs ages ago.

This concern was first raised when Gmail launched, 20 years ago now; people reeled at the idea of "Google reads your emails to give you ads", but at the same time the 1 GB inbox and fresh UI were a compelling argument.

I think they learned from it: Google Drive and co. were less "scary", or at least less overt, about scanning the stuff you keep in them, partly because they wanted that sweet corporate money.

shadowgovt
9 months ago
Of course Google reads and aggregates data inside your private docs. How would it provide search over your documents otherwise?
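To make the "search" point concrete: even a toy inverted index has to read and aggregate every document's contents before it can answer a single query. A minimal sketch (my own illustration, not Google's actual indexer):

    from collections import defaultdict

    # Two stand-in "private docs"; any search index must read all of them.
    docs = {"doc1": "quarterly revenue report", "doc2": "revenue projections"}

    index = defaultdict(set)           # word -> set of doc ids
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)

    print(index["revenue"])            # {'doc1', 'doc2'}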
mark_l_watson
9 months ago
Re: data on cloud providers: I trust ProtonDrive not to use my data because it is encrypted in transit and at rest.

Apple now encrypts most data in transit and at rest as well, and they document which data is protected. I am up in the air on whether a future Apple will want to use my data for training public models. Apple's design of pre-trained core LLMs, with local training of pluggable fine-tuning layers, would seem to be fine privacy-wise, but I don't really know.
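Roughly, I picture those pluggable layers as something like a LoRA-style adapter: the pre-trained core is frozen and only a small add-on is trained locally, so user data never touches the shared weights. A minimal PyTorch sketch (my own guess at the shape of it, not Apple's actual design):

    import torch.nn as nn

    class Adapter(nn.Module):
        """A pluggable fine-tuning layer wrapped around a frozen base layer."""
        def __init__(self, base: nn.Linear, rank: int = 8):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # pre-trained core stays frozen
            self.down = nn.Linear(base.in_features, rank, bias=False)
            self.up = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.up.weight)       # adapter starts as a no-op

        def forward(self, x):
            # frozen base output plus a small, locally trained correction
            return self.base(x) + self.up(self.down(x))

    # Only the adapter weights are trainable; the shared core never changes.
    layer = Adapter(nn.Linear(512, 512))
    trainable = [p for p in layer.parameters() if p.requires_grad]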

I tend to trust the privacy of Google Drive less because I have authorized access to Drive from Colab Pro and a few third parties. That said, if this article is true, I trust it even less.

Your analogy with early Gmail is good. I got access to Gmail three years before it became public (Peter Norvig gave me an early private invite), and at the time I liked the very relevant ads next to my Gmail. I also gave Google AI Plus (or whatever they called their $20/month service) full access to all my Google properties because I wanted to experiment with the usefulness of LLMs integrated into a Workspace-type environment.

So, of my own volition, I have surrendered privacy on Google properties.

vouaobrasil
9 months ago
All AI should be opt-in, and that includes both training and scanning. You should have to check a box that says "I would like to use AI features", and the accompanying text should make crystal clear what that means.

This should be mandatory, enforced, and come with strict fines for companies that do not comply.
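Concretely, the default should be off for everything, and features should refuse to run without an explicit flag flip. A sketch of what I mean (the flag names and the summarizer are made up, not any real Google setting):

    from dataclasses import dataclass

    @dataclass
    class AIConsent:
        training_opt_in: bool = False   # default off: data never used for training
        scanning_opt_in: bool = False   # default off: features never read your docs

    def summarize(doc: str, consent: AIConsent) -> str:
        if not consent.scanning_opt_in:
            raise PermissionError("user has not opted in to AI scanning")
        return doc[:200] + "..."        # stand-in for a real model call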

crazygringo
9 months ago
Training I can understand, but why scanning?

It's literally just running an algorithm over your data and spitting out the results for you. Fundamentally it's no different from spellcheck, or automatically creating a table of contents from header styles.

As long as the results stay private to you (which in this case they are), I don't see what the concern is. The fact that the algorithm is LLM-based has no bearing on privacy or security.

DebtDeflation
9 months ago
AI is becoming the new social media, in that users are NOT the customer; they are the product. Instead of generating data for a social media company to sell ads against, you are generating data to train their AI; in exchange, you get to use their service for free.
nitin_flanker
9 months ago
Apart from the obviously misleading way this article is written, here are all the links shared in the tweet thread that the article mentions:

- Manage your activity on Gemini: https://myactivity.google.com/product/gemini

- This page has most answers related to Google Workspace and opting out of different Google apps: https://support.google.com/docs/answer/13447104#:~:text=Turn...

shadowgovt
9 months ago
The headline is a little unclear on the issue here.

It is not surprising that Gemini will summarize a document if you ask it to. "Scanning" is doing heavy lifting here; the headline implies Google is training Gemini on private documents, when the real issue is that Gemini was run with a private document as input to produce a summary when the user thought they had explicitly switched that off.

That having been said, it's a meaningful bug in Google's infrastructure that the setting is not being respected, and the kind of thing that should make a person check their exit strategy if they are completely against using the new generation of AI in general.

dmvdoug
9 months ago
> It is not surprising that Gemini will summarize a document if you ask it to.

No, but it is surprising that Gemini will summarize every single PDF you have on your Drive if you ask it to summarize a single PDF one time.

thenoblesunfish
9 months ago
The title is misleading, isn't it? I was expecting this to be scanning for training or testing or something, but this is summarization of articles the user is looking at, so "caught" is disingenuous. You don't "catch" people doing things they tell you they are doing, while they're doing it.
mtnGoat
9 months ago
He had the permissions turned off, so regardless of what it did with the document, it did it without permission! The title is correct!
Havoc
9 months ago
It's only a matter of time before someone extracts something valuable out of Google's models. Bank passwords or crypto keys or something.

The glue-on-pizza incident illustrated that they're just YOLO-ing this.

motohagiography
9 months ago
This is similar to the scramble for health data during COVID, where a number of groups tried (and some succeeded) to use the crisis to squeeze the toothpaste out of the tube in a similar way, as there are low costs to being reprimanded and high value in grabbing the data. Bureaucratic smash-and-grabs, essentially. Disappointing, but predictable to anyone who has worked in privacy; most people just make a show of acting surprised and then move on, because their careers depend on their ability to sustain a gallopingly absurd best-intentions narrative.

Your hacked SMS messages from AT&T are probably next, and everyone will be just as surprised when keystrokes from your phones get hit, or when a collection agent for model training (privacy-enhanced for your pleasure, surely) is added as an OS update to commercial platforms.

Make an example of the product managers and engineers behind this, or see it done worse and at a larger scale next time.

Aurornis
9 months ago
The original Tweet and this article are mixing terms in a deliberately misleading way.

They're trying to suggest that exposing an LLM to a document in any way is equivalent to including that document in the LLM's training set. That's the hook in the article and the original Tweet, but the Tweet thread eventually acknowledges the difference and pivots to anger at the AI feature existing at all.

There isn't anything of substance to this story beyond a Twitter user writing a rage-bait thread about an AI popup while trying to spin it as something much more sinister.

okdood64
9 months ago
I'm shocked, especially this being HN, at how many people are being successfully misled about what is actually going on here. Do people still read articles before posting?