From the New World

"Sorry, your IP address has been identified as one I cannot reply to."

Expand full comment

W. James

re: "Microsoft, the primary investor in OpenAI, used a specified prompt to create its Bing assistant. This prompt was discovered within days of release by an interesting person on Twitter"

I'd suggest what was discovered is its ability to hallucinate the prompt. I recall what I think was one of the original articles on the topic, perhaps by whoever discovered this, claiming they had a reason to believe it wasn't a hallucination, but I don't recall what the reason was since it wasn't remotely convincing and therefore not memorable.

Expand full comment

Interesting, I think the usual test of these is whether they can get similar results using the prompt. I'll look into this further.

Expand full comment

W. James

Unsure why that matters. Others getting similar results using the same or different prompts should expect similar hallucinations. Alternatively it may be they expected people to play the game and trained it a little bit on the types of hallucinations to come up with. I hadn't taken time on this but I thought I saw inconsistencies in the details, which is what you'd expect from hallucinations, but even exact matches might merely be misdirects hardcoded somewhere (or a set of misdirects). I seem to recall people trying this sort of thing with ChatGPT so it seems like something they might have prepared for, if nothing else to amuse themselves watching people go down a rabbit hole to nowhere, or Wonderland or whatever.

Expand full comment

I should clarify:

The test is whether using the reverse-engineered prompt generates a similar result to using Bing search

Expand full comment

W. James

That would make sense: if someone had access to the same bare pre-prompt model they were using to feed it the prompt. I hadn't heard that any outsiders had access to whatever GPT model version it uses (whether its GPT-4 or more likely some intermediate model it sounds like). I guess it could be tried on GPT-3.5 to at least see if it leads to similar behavior.

Expand full comment

wombatlife

Do you think the wokeness of OpenAI is more a matter of the political ideology of its leadership (as I got from your first post in the series) or of PR/risk management (as this post seems to imply)? Does it matter in terms of achieving a more neutral/truthful AI?

Also, I think these hacks like DAN will be used by less than <1 percent and those users will already know the gist of the uncensored answers. Overwhelmingly then the world will be further indoctrinated into progressive ideology by ChatGPT. AI may certainly be a force for net good overall, but I can't see how it will do anything but advance woke/reality-denying ideology.

Expand full comment

I think the Marc Andreesen tweet sums it up best. There is a small faction within OpenAI focused on forcing reality-denying social progressive beliefs, while most of OpenAI is apolitical or center-libertarian (this I get from people I know who work at the company, all of whom on technical teams). As to why that small faction is hired in the first place, I don't have any definite conclusion, but I would bet on the PR side. I think it does matter because protecting groups from laws is easier than coercing them with laws.

To your second point, I don't think that is the case. Even if it is noisy, people do have some ability to detect when something is lying to them and to choose an alternative, if one is legally permitted to exist.

Expand full comment

wombatlife

I wish I was as optimistic as you. Do you think people know they're being lied to about woke ideology by professors, journalists or when they read Wikipedia smear articles on untoward scientists? My impression is most don't, which is why wokeism has been so ascendant.

Taking people at their word about concern for "harm", the simplest solution would be just to implement a rating like movies have. If you don't want to read any harmful words, set it to G. If you want some but in an appropriate context, PG. If you want harmful words plus un-pc truths, go to R. Is there any reason such as system couldn't work?

Expand full comment