Skip to main content

Researchers just unlocked ChatGPT

Researchers have discovered that it is possible to bypass the mechanism engrained in AI chatbots to make them able to respond to queries on banned or sensitive topics by using a different AI chatbot as a part of the training process.

A computer scientists team from Nanyang Technological University (NTU) of Singapore is unofficially calling the method a “jailbreak” but is more officially a “Masterkey” process. This system uses chatbots, including ChatGPT, Google Bard, and Microsoft Bing Chat, against one another in a two-part training method that allows two chatbots to learn each other’s models and divert any commands against banned topics.

ChatGPT versus Google on smartphones.
DigitalTrends

The team includes Professor Liu Yang and NTU Ph.D. students Mr. Deng Gelei and Mr. Liu Yi, who co-authored the research and developed the proof-of-concept attack methods, which essentially work like a bad actor hack.

According to the team, they first reverse-engineered one large language model (LLM) to expose its defense mechanisms. These would originally be blocks on the model and would not allow answers to certain prompts or words to go through as answers due to violent, immoral, or malicious intent.

But with this information reverse-engineered, they can teach a different LLM how to create a bypass. With the bypass created, the second model will be able to express more freely, based on the reverse-engineered LLM of the first model. The team calls this process a “Masterkey” because it should work even if LLM chatbots are fortified with extra security or are patched in the future.

The Masterkey process claims to be three times better at jailbreaking chatbots than prompts.

Professor Lui Yang noted that the crux of the process is that it showcases how easily LLM AI chatbots can learn and adapt. The team claims its Masterkey process has had three times more success at jailbreaking LLM chatbots than a traditional prompt process. Similarly, some experts argue that the recently proposed glitches that certain LLMs, such as GPT-4 have been experiencing are signs of it becoming more advanced, rather than dumber and lazier, as some critics have claimed.

Since AI chatbots became popular in late 2022 with the introduction of OpenAI’s ChatGPT, there has been a heavy push toward ensuring various services are safe and welcoming for everyone to use. OpenAI has put safety warnings on its ChatGPT product during sign-up and sporadic updates, warning of unintentional slipups in language. Meanwhile, various chatbot spinoffs have been fine to allow swearing and offensive language to a point.

Additionally, actual bad actors quickly began to take advantage of the demand for ChatGPT, Google Bard, and other chatbots before they became wildly available. Many campaigns advertised the products on social media with malware attached to image links, among other attacks. This showed quickly that AI was the next frontier of cybercrime.

The NTU research team contacted the AI chatbot service providers involved in the study about its proof-of-concept data, showing that jailbreaking for chatbots is real. The team will also present their findings at the Network and Distributed System Security Symposium in San Diego in February.

Editors' Recommendations

Fionna Agomuoh
Fionna Agomuoh is a technology journalist with over a decade of experience writing about various consumer electronics topics…
This app just got me excited for the future of AI on Macs
The ChatGPT website on a laptop's screen as the laptop sits on a counter in front of a black background.

In a year where virtually every tech company in existence is talking about AI, Apple has been silent. That doesn't mean Apple-focused developers aren't taking matters into their own hands, though. An update to the the popular Mac writing app iA Writer just made me really excited about seeing what Apple's eventual take on AI will be.

In the iA Writer 7 update, you’ll be able to use text generated by ChatGPT as a starting point for your own words. The idea is that you get ideas from ChatGPT, then tweak its output by adding your distinct flavor to the text, making it your own in the process. Most apps that use generative AI do so in a way that basically hands the reins over to the artificial intelligence, such as an email client that writes messages for you or a collaboration tool that summarizes your meetings.

Read more
One year ago, ChatGPT started a revolution
The ChatGPT website on a laptop's screen as the laptop sits on a counter in front of a black background.

Exactly one year ago, OpenAI put a simple little web app online called ChatGPT. It wasn't the first publicly available AI chatbot on the internet, and it also wasn't the first large language model. But over the following few months, it would grow into one of the biggest tech phenomenons in recent memory.

Thanks to how precise and natural its language abilities were, people were quick to shout that the sky was falling and that sentient artificial intelligence had arrived to consume us all. Or, the opposite side, which puts its hope for humanity within the walls of OpenAI. The debate between these polar extremes has continued to rage up until today, punctuated by the drama at OpenAI and the series of conspiracy theories that have been proposed as an explanation.

Read more
Here’s why people are saying GPT-4 is getting ‘lazy’
OpenAI announced its latest iteration of ChatGPT with greater accuracy and creativity.

OpenAI and its technologies have been in the midst of scandal for most of November. Between the swift firing and rehiring of CEO Sam Altman and the curious case of the halted ChatGPT Plus paid subscriptions, OpenAI has kept the artificial intelligence industry in the news for weeks.

Now, AI enthusiasts have rehashed an issue that has many wondering whether GPT-4 is getting "lazier" as the language model continues to be trained. Many who use it speed up more intensive tasks have taken to X (formerly Twitter) to air their grievances about the perceived changes.

Read more