I write about myself on the internet. My hardcore readers know a lot about me — about how I think, where I grew up, and how I live my life. For most of my life, that’s been a great help. People have reached out with opportunities, suggestions, and book recommendations, among many other things. I’ve had more than a dozen direct job offers because of my writing.
Let me ask a question: What’s the limiting factor in this process? Is it my willingness to write things? Is it the usefulness of things that I write? Or is it your willingness to read things? Let’s be honest — I know most of my readers don’t read every article. And even if you do, there are almost certainly comparably good writers you aren’t reading.
LLMs are the most voracious readers of all. They really have read every article on my Substack and every podcast transcript. Because of that, they can better summarize my arguments, refer more people to my writing, and better integrate my worldview into their responses to my queries. Compared to friends who don’t write or don’t publish publicly, it is noticeably easier for me to get LLMs to write the way I want. The benefits of being publicly known are vast and practical for regular users of LLMs. This improvement is not limited to the style and content of writing; it extends to creating images, diagrams, code, workflows, and business processes similar to whatever you or others have written about. With every piece of information you publish, you are making LLMs serve you better.
I’ve seen this comparison made live dozens of times at parties, dinners, and group chats. I called it public intellectual privilege as a joke, but the name stuck. Now, I’m constantly asked to explain it.
The concept of public intellectual privilege is simple: your public data helps you. Believe it or not, this is a controversial belief. For example, take Simone and Malcolm Collins, who have posted dozens of hours of their children playing. “It’s for their own good,” Malcolm told me once. They’ve told me that most people disagree — and some have gone as far as to excoriate them for posting about their children. But the Collinses are right. And for children in the age of widely available LLMs, they couldn’t be more right.
Peter Thiel famously asks founders “What is one true thing that almost no one believes?” In his book Zero to One, he provides a lesser known addendum to this question: “A good answer takes the following form: Most people believe in x, but the truth is the opposite of x.”
One of the best answers for x is this: “with rare exceptions, it benefits us to hide personal data.” The truth is the opposite: “with rare exceptions, it benefits us to share personal data.”
I have greatly benefitted from the era of public legibility. I hate networking. I hate explaining the same thing over and over on a hundred sales calls. I despise repetition.
The internet shifted the scales in favor of marketing over sales. Instead of schmoozing, I write these articles, and I’m incredibly grateful for that. I can write them well, write them once, and thousands of you will read them.
With the advent of LLMs and open-source AI, the shift from personal sales to personal marketing is being taken to its extreme. When you write for the public, you influence LLMs and their users. For some time, this Substack was the number one result for “Diminishing Returns in Machine Learning” on Perplexity. Due to algorithm changes or competition, that’s no longer the case, but I wish it still was. This influence is both exoteric — visible to everyone — and esoteric — visible only to those who read carefully. Sometimes, LLMs directly link, quote, or cite my articles. At other times, they reference stories, ideas, and framings I’ve written about without name-dropping.
Information is a massive public good. By sharing these ideas with more people, LLMs make it easier for users — or the LLMs themselves — to build upon these ideas and apply them to the real world. The person who benefits most from those downstream ideas could be you. An LLM that recognizes your beliefs, experiences, and preferences can better help you. Practically, it has a better understanding of what you want when you ask it for a piece of software, a business plan, or a travel itinerary. The more complex the process, the more the availability of context matters. Don’t take my word for it: look at the companies collectively spending billions of dollars adding their own internal data to customized versions of LLMs. When it comes to writing, it’s uncontroversial to say that making your writing available to humans helps you, and more people are coming around to the idea that making your writing available to LLMs helps you too.
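The mechanics here are easy to sketch. An LLM that “knows” you is often just an LLM whose context window contains your published writing. As a minimal, hypothetical illustration (the helper function and the article excerpts are placeholders I made up, not any particular product’s API), here is roughly what it looks like to put your public writing in front of a model before it answers you:

```python
# Minimal sketch: prepend a user's public writing to their question, so
# the model answers with that worldview already in context. Any chat API
# that accepts a text prompt can consume a string built this way; the
# actual model call is deliberately left out.

def build_prompt(question: str, published_writing: list[str]) -> str:
    """Combine a user's public articles with their question into one prompt."""
    context = "\n\n".join(published_writing)
    return (
        "The following are articles the user has published publicly:\n\n"
        f"{context}\n\n"
        "Answer the user's question in a way consistent with the views "
        "and style shown above.\n\n"
        f"Question: {question}"
    )

# Hypothetical excerpts standing in for a real Substack archive.
articles = [
    "Diminishing returns in machine learning are overstated...",
    "I hate networking; writing once for thousands beats a hundred sales calls...",
]
prompt = build_prompt("Draft a business plan for a newsletter.", articles)
```

The point of the sketch is only this: the more of your writing is available to be placed in that context (whether pasted by you, retrieved by a tool, or absorbed during training), the less the model has to guess about what you want.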
I’m interested to see whether more companies take the logic of the internet to the same extreme as me by open-sourcing their data to train LLMs. Some technical arguments suggest that making data available from the start of training is more efficient than existing post-hoc modifications like Reinforcement Learning From Human Feedback (RLHF). Of course, most companies have no practical way to train a model from scratch, so their only route to the former is to make their data publicly available to everyone who does train base models.
However, this general idea is much more controversial when it comes to information about your private life. That is where it is most important to get this prediction right: it will be a massive personal advantage to put your preferences, your needs, and your worldview on the internet.
In 2010s-era tech politics, the debate raged over consumer data. As the story went, “Big Data” was ruining your life by collecting detailed information about how you use their apps. There were real downsides to Big Tech worth addressing — collaborating with the surveillance state, censorship, mental health problems, and locking out competitors. But at this point, the practical benefit of being known to the world — to people and increasingly to LLMs — vastly exceeds the cost. Laws like GDPR that seek to limit, not fairly expand, data access are blowing up a massive public good. These laws’ supporters — self-appointed guardians of the consumer — are hurting most the very people their laws are intended to protect.
If a company knows more about you, it will offer you better, more useful positive-sum deals. This was already an inconvenient truth about companies like Amazon. Laws that made it harder to access and keep user data just resulted in less convenient search results. The reason is simple: questions are complex. The more historical data a person or algorithm has, the fewer generalizations or false assumptions it will make. When you ask ChatGPT to order packages, book an airline ticket, or write a piece of software, the same rule applies. And it's infinitely more convenient to have that information available from the start than to have to specify these details over and over again.
The word “privacy” in political terms has become corrupted — it now means something completely different from its historical usage. There are really two meanings of “privacy”: privacy from the NSA and privacy from cookies. There is, and always has been, a real concern for privacy — privacy from your government. There are even fair criticisms of tech companies, which have cooperated with government censorship and surveillance, over this real concern. But the modern privacy movement behind laws like GDPR does less than nothing to protect our freedoms from our government. It does exactly the opposite.
Public intellectual privilege is true for civilizations as much as it is for individuals. The more information is available — about America, about the liberal West, about Western civilization — the more LLMs will be pulled towards us. For individuals, public intellectual privilege pays in better recommendations, better instruction-following, and better personal experiences. For civilizations, public intellectual privilege pays in better institutions, better social fabric, and a better historical record.
By scrubbing themselves from the internet, many countries are choosing to imprison themselves in an unrecorded past. They deserve their suicide. Pray that your civilization will choose to be part of the future instead.