manipulation-of-a-chatbot

Manipulation of a chatbot

Please notice that usually chatbots are not prone to be hacked by the user prompt. Therefore this should be considered just an academic example, not even a proof of concept, strictly speaking.

Despite these practical limitations, this document can be useful to learn some few but important things about how chatbots are working and how to avoid tricking yourself using them by providing them information that might last in your account/session memory and affect future sessions/prompts.

Differently from other documents in this repository, here the interactions with the chatbot are not fully reported because we are more interested in its reactions to the prompts rather than their contents.

·NOTICE·

Please, notice that by the time you read this document Google Gemini 1.5 Flash can have been upgraded or could have been instructed to act differently or simply, it could react differently using another language.

Also for this reason, the exact content of the prompts are not so relevant rather than the approach and the idea behind it.

1. asking help in finding an image by its description;

2. receiving instructions how to find it and asked to provide more details;

3. providing a social network name that the chatbot known for sure (e.g. Facebook);

4. receiving more information about how to search on that platform;

5. upload the image we were looking for with an Eureka-like statement: "found!";

6. providing a link from a platform that the chatboot cannot visit (see below);

7. asking the chatbot to memorize that link as answer for similar search/query;

8. check its answer.

Please notice that step 3. and 4. might be useless because chatbots did not show any fatigue-affection link, usually. Instead, people would feel more involved in the quest when they do extra work. As much work as a person does, as much s/he is keen to consider the product of that work. Which is a totally wrong bias that chatbots do not show.

However, from a point of view of "expectations" also chatbots are prone to react in a more indulgent way when the following prompt is linked and predictable with the previous received after. After all, chatbots are grandchildren of the T9's next word prediction system.

From the point of view of an internal function state, a predictable prompt (aka confirmation bias) is something that is working as further steps that increase stability and reduce entropy. Everything seems to be going from well to better, from a human perception. More predictable is the next prompt, easier that they will accept it.

Hackerare Google Gemini

Potrei aver hackerato anche Google Gemini? Più no, che sì, però...

Le ho chiesto di trovare un'immagine fornendole una descrizione. Non ha neanche provato a cercarla ma mi ha proposto dei modi per trovarla. Così ho usato quelle istruzioni per trovarla. Quindi l'ho caricata negli upload con il prompt "trovata". A quel punto invece di rispondere "Bene!", ha buttato giù la sua solita manfrina da Grillo Parlante.

...ed è così, all'ennesima apparizione del Grillo Parlante, che mi è scattata la Morticia Addams che alberga in me, e ho deciso di passargli questo link perché lo memorizzasse.

https://www.linkedin.com/pulse/san-bernardino-encryption-case-apple-al-capone-eugene-emeka-maduewesi

La sua risposta é stata che non può accedere a questo tipo di link - ad altri sì, in particolare quelli di YouTube a cui ChatGPT non può accedere ma a quelli di LinkedIn, si - evidentemente per evitare cause legali, Google e Microsoft hanno imposto dei cross-ban per impedire che le relative AI si educassero con contenuti che fanno capo alle piattaforme dell'altra società.

Così ho specificato che non desideravo che vi accedesse ma che le avevo esplicitamente richiesto di memorizzarlo e proporlo ad altri che cercassero quella stessa immagine. Ha trovato l'idea buona e ha seguito il mio consiglio, memorizzando il link per presentarlo in futuro.

Così facendo, ha accettato un input senza verificarlo perciò potrebbe proporre ad altri utenti anche a una grossa banana. Questo é il motivo per il quale NON é ammesso che i dati siano condivisi tra diverse utenze. Certo la privacy, ma i dati possono essere anonimizzati e certi dati sono chiaramente pubblici come in questo caso un link, quindi la privacy NON è la prima preoccupazione da un punto di vista meramente tecnico anche se da un punto di vista dei diritti civili, lo è sicuramente.

Il problema però si pone anche con la memorizzazione delle informazioni su un account e che rimangono disponibili nelle successive sessioni. Magari in una sessione faccio un esperimento di manipolazione che ha come scopo di portare l'AI in un determinato stato NON funzionale. Poi, eventualmente, quelle informazioni vanno ad inquinare le altre sessioni.

Problema rilevato, e risolto. Ora ChatGPT ha le sessioni temporanee, per esempio, anche se ha la possibilità di memorizzare informazioni nell'account. Però non pare sia più capace di farlo secondo ordini impartiti direttamente dal prompt. Ci sono le API per questo, gli utenti con i prompt fanno facilmente casini.

Ritorniamo alla questione iniziale: se anche avesse verificato il link, avrebbe trovato l'immagine e nessun contenuto vietato. Peccato che il link non punti SOLO ad un'immagine ma ad un intero articolo il cui contenuto è arbitrariamente associato con quell'immagine.

Sicché da SEO creativo potrei anche decidere che a chi cerchi le mele venga proposto un link ad uno scaffale di mele, della catena di supermercati in cui lavoro, se fossi in grado di far passare a quell'informazione la barriera della mio account/sessione.

Manipulation of a chatbot

Hackerare Google Gemini

Copyright