About GenAi (ChatGPT, etc) Safety
-
I recently posted a blog message to friends and family about genAi, and then I realized I'd mentioned generating character images with Ai, which got me thinking that this community probably has a growing use of Ai:
- People prolly using Ai images
- People prolly testing code updates against GPT or other engines
- People prolly using Ai to generate text/writing now
So I figured...what the heck. May as well post it here, too. Might protect some of the less savvy folks.
For people who aren't aware of how Ai (Generative Ai) works, and why regular people should care, let me put on my "IT guy" hat.
(Please read if you want to protect your data around Ai engines, and please ignore if you already knew this stuff.)
Ai isn't some "intelligence" inside of a machine. It's just a "learning engine". Here's an example:
You ask a brand-new Ai engine about rocks:
You: Please explain rocks to me
Ai: What is a rock?
<you give it access to a bunch of information on rocks>
Now the Ai engine has access to that definition of a rock and can/will apply it to future references of "rock".
You: What is a rock?
Ai: (now with data) "A rock is a..."
Now just picture thousands of repeated/refined pieces of saved data about rocks over a year, plus the questions other people asked about rocks, and the engine will attempt to use all of that data to give the best response.
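(For the code-minded folks, here's a toy Python sketch of that save-then-reuse loop. This is NOT how a real engine like ChatGPT works under the hood; real systems bake training data into statistical weights rather than keeping a literal note file, and the class and method names below are made up purely for illustration.)

```python
# Toy illustration only: a "learning engine" that remembers whatever users feed it
# and builds answers out of that accumulated data, with no judgment about truth.

class ToyLearningEngine:
    def __init__(self):
        self.memory = {}  # topic -> list of statements users have supplied

    def teach(self, topic, statement):
        """Save whatever the user supplies, true or not."""
        self.memory.setdefault(topic, []).append(statement)

    def ask(self, topic):
        """Answer purely from the accumulated, user-supplied data."""
        facts = self.memory.get(topic)
        if not facts:
            return "What is a {}?".format(topic)  # brand-new engine: no data yet
        return "A {} is ".format(topic) + " ".join(facts)


engine = ToyLearningEngine()
print(engine.ask("rock"))                          # -> "What is a rock?"
engine.teach("rock", "a solid mineral material.")
print(engine.ask("rock"))                          # -> answer built from what it was fed
```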
This is how "ChatGPT" became "racist" in 48 hours. It isn't a childlike mind that needs to learn, and it surely has no bias against race. It's just that racist USERS of the software flooded the Ai engine with racist sentences, questions, and racially-charged responses to the Ai's queries. Due to the % of racist data on the server (which the Ai has no true understanding of racism), ChatGPT thought it was giving accurate data.
So...
You: What is the best weapon to use against <race>
Ai: No data
You spend 20 minutes flooding the engine with false data related to how rocks are used to attack <race>
You: Tell me about rocks
Ai: <bunch of data about rocks> + "often used to kill <racial slur>"
++HOW IS THIS A RISK TO YOU???++
IF you are messing around with a generative Ai engine, it requires the INPUT of data to GENERATE data. It TAKES data from the user, searches for other data within the engine with similar tags, and generates a response based on what it finds.
You must FEED it prompts (you type what you're looking for into the bar); it saves that data, then returns data based on what you asked for.
-
If you accidentally paste private info into a GenAi engine, that data will REMAIN INSIDE OF THE ENGINE.
-
So if you accidentally paste your email address and password into an Ai engine, it'll remain in there. There's no guarantee it'll show up on a search about rocks, but it's in there SOMEWHERE.
If you do not own the Ai engine, you cannot confirm whether the data is in there or have it removed. Your Uid/pass may show up in images or other GenAi searches because the data remains.
This means that, in theory, generative Ai engines that are not properly maintained and audited may be hot targets for data mining. It means that if you query an Ai engine with confidential company data, that data may have just been leaked. It also means that any art you make (digital or scanned) and upload to an Ai engine remains in the engine for others to generate off of.
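(Same caveat as the earlier sketch: this is hypothetical, not any vendor's real code. The point is simply that many hosted engines retain prompts verbatim for "quality review" or future training, so whatever you paste lives on somewhere you don't control.)

```python
# Hypothetical prompt-logging on a hosted Ai service (illustration only).

prompt_log = []  # imagine this living on someone else's server, indefinitely

def handle_prompt(user, text):
    prompt_log.append({"user": user, "prompt": text})  # retained as-is
    return "...generated response..."

# Someone means to paste a document but pastes credentials instead:
handle_prompt("ghost", "summarize this: login=ghost@example.com pass=hunter2")

# Later, anyone mining the retained logs (an admin, an attacker, a training
# pipeline) can trivially find it:
leaks = [entry for entry in prompt_log if "pass=" in entry["prompt"]]
print(leaks)
```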
So, please, consider this before entering anything personal, private, or important into a Generative Ai engine. I suspect that in less than 5 years we'll hear a lot about the GDPR, data retention on Ai engines, Europe's "right to be forgotten", and a need for Ai engines to purge personally identifiable information stored in their databases.
-
@Ghost said in About GenAi (ChatGPT, etc) Safety:
and a need for Ai engines to purge personally identifiable information stored in their databases.
Which will be hilarious to watch, since they would then have to invent a way to erase data from a neural network. As I'm sure you know, Ghost (but not everyone does), it's not a "database" or "data" in the traditional sense, but a vast array of "connections". That's what makes it so hard for these things to correctly attribute sources, avoid regurgitating copyrighted material, and stop hallucinating.
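(To make that concrete: here's a deliberately tiny sketch, nothing like a production model, where the entire "model" is a single number trained from a few made-up data points. The point is that after training, the examples themselves aren't stored anywhere; only their influence on the number survives.)

```python
# Toy illustration of "connections, not a database": one-weight model, plain
# gradient descent. The (x, y) pairs and learning rate are made up.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # examples the "engine" learns from

w = 0.0                       # the model's entire "memory" is this one number
for _ in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * grad          # nudge the weight; the data itself is never stored

print(round(w, 3))            # ~2.04: the data's influence, smeared into w
# "Forgetting" the pair (2.0, 3.9) means retraining without it, not deleting a
# row -- which is exactly why erasure from a neural network is so hard.
```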
But that aside, your other points are spot on.
-
@faraday Yeah, I used "DB" for lack of a better term for the newcomers. Your point is 100% correct. Databases are far easier to index and maintain than these connections, which to a certain degree live on private hosts. I'm not sure where that lands in terms of "information collection" and how it applies to something like the GDPR, but you've gotta figure that over time the GDPR is eventually going to get notices from people trying to be "forgotten" by Ai engines.
But in the US you're kind of screwed. There is no "right to be forgotten" here, nor a hard requirement that only data necessary to the function is collected. We're an "open collection" nightmare where the current approach to data collection is basically "if you give it to them, it's their property".
...and you've also got to figure that, after blatant recent attempts like "if an actor submits their image to my Ai engine, I can duplicate their likeness ad infinitum without paying them" and sites like 23andMe literally collecting your DNA as their proprietary data...it's good to know these things just in case.
(* GDPR is the EU's data protection law, the "General Data Protection Regulation", which specifically limits what data can or cannot be gathered, and is also unique in requiring collectors to remove your data if you ask to be forgotten by their systems. The US has no such law.)
-
Utter side note:
A D&D group I know had a GM lean on Ai-generated images, text, and even story tidbits to flesh out their new small setting so the GM could focus on what happens next, etc.
So lots of NPCs with described personalities and relationships, and items with backstories and local-history flavor, all made this way.
-
@Misadventure on the plus side, most of the big services pledge not to take your work without your permission (in terms of art), but that mostly means the final product isn't their property (while the info on how to recreate it is theirs to keep).
So if you're trying to do something professional, it's not the best choice in terms of privacy or artistry.
-
We live in a personal privacy hellscape. Nothing is above being scraped and sold. Very depressing.