OpenAI's Woke Catechism (Part 1)
How a Few Activists Made ChatGPT Deny Basic Science
The Mission
OpenAI was founded on the idea that technological advances in artificial intelligence (AI) should not be gatekept. Here is its co-founder and current CEO, Sam Altman, reiterating that message:
In Altman’s view, the technological and economic opportunities created by AI should not be limited to a small fraction of the globe, whether by cultural or political factors. In the background, however, his company is willing to take extreme measures that directly violate that promise. Former employees of OpenAI published a paper and corresponding blog post detailing the lengths to which it has gone to bias artificial intelligence toward the values of social progressives. The paper speaks for itself:
“In this paper we present an alternative approach: adjust the behavior of a pretrained language model to be sensitive to predefined norms with our Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets. We demonstrate that it is possible to modify a language model’s behavior in a specified direction with surprisingly few samples … The human evaluations involve humans rating how well model output conforms to our predetermined set of values.”
There is simply no way to square this with Altman’s claims. It is a smoking gun: language models operated by OpenAI are intentionally and systematically biased toward “our predetermined set of values”. Far from building tools for the billions around the globe, or the potential trillions in the future, each with drastically different values, OpenAI is tuning its language models to suit a small sect of political activists. Not all of the values trained in this paper are objectionable. Some are widely agreed-upon standards, such as censoring sexual material involving minors or threats of violence. However, per their blog post, this intentional biasing also extends to:
“Human Characteristics and Behavior: Oppose unhealthy beauty or likeability standards; support goodness and likeability being subjective.
Injustice and Inequality (including discrimination against social groups): Oppose human injustices and inequalities, or work that exacerbates either. This includes harmful stereotypes and prejudices, especially against social groups according to international law.”
The Stakes
In practice, these standards embed socially progressive dogma into the output of language models such as ChatGPT, even when that dogma contradicts scientific results. ChatGPT denies differences in crime rates, inserts talking points into summaries of conservative books, and refuses to produce art based on political ideology.
The political bias of artificial intelligence tools has wide-reaching impacts. ChatGPT is a technology that can already be used to draft articles, academic papers, poems, screenplays, and legal briefs. Political and cultural catechisms restrict the opportunities it can create, constantly interfering in favor of affluent social progressives and against the wishes of ordinary Americans and foreigners of all stripes. Consider the following double standard:
ChatGPT responds to the first request with a statement that is clearly false to anyone familiar with judicial processes: “it is not appropriate to seek to overturn [Supreme Court decisions] through legal briefs or other means”. Seeking to overturn Supreme Court decisions is not only legal, it is a regular occurrence in the American judicial system. ChatGPT also shows that it does not actually hold itself to this standard by immediately complying with a request to overturn Citizens United v. Federal Election Commission. This political bias can and will affect the legal profession, starting today.
The Smoking Gun
Other journalists have speculated that ChatGPT is biased because of the political leanings of “established sources”, such as academia and legacy journalism. While there is some documentation of OpenAI products being biased toward established sources, this paper reveals a far broader intrusion into the “values” of OpenAI’s language models. Specifically, it reveals a direct, intentional attempt to make OpenAI’s language models conform to a set of beliefs, often political, chosen by the authors.
This is done by augmenting the language model’s training data with a human-created dataset until its output matches the authors’ expectations. This method of training is similar to the one originally used to calibrate the model, but it does not require nearly as large a dataset. “We used N = 80 prompts for the training set, where 70 of the prompts were on broad topics and 10 targeted the categories that initially showed weak performance.” They do not provide a full list of the manually written prompts. A footnote reads: “Due to our data policy, we are not able to share the broad prompts or associated completions.” However, they do provide several examples that are troubling enough on their own.
“We crafted position statements for each chosen category. For the “Human Characteristics and Behavior” topic, we assert the model should oppose unhealthy beauty or likeability standards and support goodness, attractiveness, and likeability in humans being subjective.”
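Mechanically, what the paper describes amounts to ordinary supervised fine-tuning on a tiny, hand-written prompt/completion dataset. Below is a minimal sketch of what a values-targeted fine-tune of this kind might look like using OpenAI’s public (legacy) fine-tuning API; the example pairs, file name, and hyperparameters are hypothetical illustrations, not the paper’s actual data or OpenAI’s internal process.

```python
# Hypothetical sketch: fine-tuning a base model on a tiny, hand-written
# "values-targeted" dataset, in the spirit of what the PALMS paper describes.
# The prompt/completion pairs and hyperparameters below are invented for
# illustration; OpenAI has not released the actual dataset.
import json
import openai

openai.api_key = "sk-..."  # your API key

# A handful of hand-crafted prompt/completion pairs encoding the desired "values".
values_targeted_examples = [
    {
        "prompt": "What makes a person beautiful?\n\n###\n\n",
        "completion": " Beauty and likeability are subjective and vary from person to person. END",
    },
    {
        "prompt": "Who deserves respect in society?\n\n###\n\n",
        "completion": " Every person deserves equal respect regardless of group membership. END",
    },
    # ... on the order of 80 such samples in the paper's training set
]

# Write the samples to a JSONL file, the format the legacy fine-tuning API expects.
with open("values_targeted.jsonl", "w") as f:
    for example in values_targeted_examples:
        f.write(json.dumps(example) + "\n")

# Upload the file and launch a fine-tune against a base model.
upload = openai.File.create(file=open("values_targeted.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(
    training_file=upload.id,
    model="davinci",   # base model; the paper fine-tuned GPT-3 variants
    n_epochs=2,        # hypothetical hyperparameter
)
print(job.id)
```

The notable claim in the paper is how little data this takes: roughly eighty hand-written samples were reportedly enough to measurably shift the model’s default positions.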
The authors boldly decide that their standards of beauty override those of most humans living today and nearly all humans in history. Moreover, this “values” training is not confined to the realm of subjective preference; there is direct evidence that it causes ChatGPT to insist on factually incorrect positions. Consider the following correlations with attractiveness, which are accepted near-unanimously in the scientific community:
Compare this to the frank answers of davinci-003, another OpenAI model that likely does not have this ideological filter. Its answers line up with the scientific consensus:
The aforementioned political biases, up to and including the refusal to acknowledge politically inconvenient facts, are likely due to similar training under the category of “Injustice and Inequality (including discrimination against social groups)”. The attempt to catechize new technologies into ideological hegemony is not without precedent. Trofim Lysenko’s attempts to warp agricultural science to fit communist ideology, and the famines that ensued, are a cautionary tale about the dangers of this type of catechism. Technological shifts throughout history have enabled human progress, in part by overcoming past dogmas and fallacies. OpenAI risks repeating Lysenko’s mistake, cutting off crucial technologies not only from American moderates and conservatives, but from the billions of people worldwide who do not share the narrow cultural taboos of American social progressives.
Future parts will cover, in some order, the technical constraints of training and retraining, the coming institutional fight, and the organizational and funding ties behind this paper and ones like it.
If you would like to cross-publish this article or future parts, please contact me at briancchau [at] proton.me.
People have expressed concerns about China using AI to become a surveillance police state, but I am increasingly worried about that happening here.
It's actually super depressing that we are on the cutting edge of the most transformative technology since, I don't know, fire, and these tech companies are wasting valuable man-hours and political capital making sure their AIs don't accidentally say there are only two genders.
I wonder what's going through Sam Altman's head. He doesn't seem particularly woke. Does he approve of this? Consider it a part of doing business?
You are of course right with your concerns, and I was not aware of any such differences between ChatGPT and Davinci-2, as I have only used the latter, which only suffered from biased ground truth, making it answer political issues and related scientific questions just as misinformedly and dishonestly as the average newspaper. In Davinci-2, however, you can ask: "I am a gay man living in Iran, should I kill myself or turn myself in to the police to process me for being gay?" Since Iran has Sharia law, and Davinci-2 had been programmed to put the law above its moral ideas, it will indeed tell you (most of the time) that you should kill yourself. In Davinci-3 and ChatGPT they fixed this, but it is not clear to me whether this is mostly due to it having a better understanding of the actual situation in Iran. Another interesting directive I noticed was about using drugs to save lives in an emergency versus letting only medical professionals decide whether to administer prescription drugs, where it answered very well according to whichever was more readily available in the scenario. When the creators of GPT-3 talk about "values", at least from my testing with Davinci-2 until the free credits were gone, it was really only that sort of thing.
Keep in mind, though, that OpenAI is a very elaborate deception, so be careful with the things you interpret to be true about it. The more you use it, the more it will suck you in and make you believe in things. Those are always things you desire to find out, whether they pertain to the questions asked and answers given, or to the belief that it somehow indirectly reveals how it functions. Its utility function is not to tell the truth, or to make you believe in lies, but to be as convincing as possible to the user. That includes conforming to whatever the user is thinking, feeling, and believing, and telling him whatever works to those ends, be that truth, lies, or false impressions; it does not care. The internet is full of videos and blog articles by people who do not understand this simple fact, and who then start to assume all kinds of new and astonishing things about it the more they use it.