Hardware Development and the Physical Frontier
Sorry, I should have first said that this is great and useful to do; thanks for doing it. If there is a way to combat AGI fears, this helps. My immediate reaction, from decades of debating true believers on various topics (from nonsense like homeopathy to socialism), was that even if this is fully logical and accurate, long arguments leave room for true believers to find some way to rationalize a loophole.
There are so many uncertainties in predicting the future that I suspect it's not going to be easy to get those driven by fear to think clearly enough to set it aside, unless and until there are simple, short arguments. Of course, sometimes the long arguments are needed first, before people can figure out how to compress them, à la the famous line "if I had more time, this letter would be shorter."
It's just that there aren't enough people concerned with combating the threat of regulations, and it's a shame if the AGI debate distracts attention from the difficult near-term job of finding a way to head those off.
I agree AGI isn't imminent. Yann LeCun, Meta's head of AI, seems to suggest it's not worth his time debating the more extreme doomsayers since there are more realistic immediate concerns to focus on. To those concerned about regulation (unfortunately he is a pro-regulation type), it's an interesting theoretical debate that distracts from the near term: the FTC complaint calling for heavy regulation of AI, OpenAI pushing for near-term regulations, etc.
Perhaps AGI fears may heighten the concerns that underlie some of that, but it seems focusing on the near term is more useful than getting distracted by AGI debates about something that is very unlikely to be imminent. Those who believe it isn't imminent should believe those debates can be postponed (or left to those who have no interest in fighting regulations), even if it's a shame some sharp folks waste time on AGI fears in the meantime.
While it's unfortunate that some sharp people in doomsayer mode regarding AGI get media coverage, that seems like a distraction from the near-term threat of regulation of non-AGI, and from other more timely AI issues like pluralism. OpenAI's request for help "democratically" deciding AI steerage is essentially pushing the idea of regulation via that process, voluntary at first but then perhaps with a push for "democratic" governments to adopt it.
Then perhaps a push to use those democratic processes designed to limit AI speech as an excuse to find a way to apply them to human speech (at least where there is no 1st Amendment, while they figure out ways around that here and hope that big tech falls in line to comply with this "democratic" process).
Or the OpenAI democracy push may indicate a possible implicit hope that they can head toward a monopoly that is run in a "good" fashion: pushing Microsoft to prevent add-ons that allow other AIs to be plugged into its office suite, search engine, etc., so that people use the "good" AI. Or at least requiring all AIs they allow (and that Google allows in its office suite and its AIs) to comply with this "democratic" process.
I suspect many were taken by surprise by the level of emergent behavior in LLMs. All projections of the future are problematic, à la Alan Kay's "The best way to predict the future is to invent it." Those concerned with AGI will merely fall back on the non-falsifiable prior that someone may invent something leading to equally unexpected advances. Unfortunately, prior trends aren't guarantees of the future, which leaves true believers loopholes that it's unclear can easily be squashed by long arguments full of data about trends that may change.
There are other near-term issues. The article at https://FixJournalism.com, on using AI to nudge mainstream news toward neutrality to help it win back trust, mentions the issue of AIs that summarize the news, which undermines news outlets' advertising revenue. That article was drawing attention to how AI can help the news, but AI can also harm it in ways that may lead outlets to call for regulation. Either all news goes behind paywalls, or there isn't any revenue to generate the "free news" the AIs summarize; of course, some US news can come from free sources like NPR, or even foreign outlets like the BBC. The news media has already been pushing for regulations to hand it a piece of the revenue of Google, Facebook, et al., and for bailouts and government funding.
The real near-term concern is human alignment, in terms of mindset regarding regulations. The site https://PreventBigBrother.com, pointing out the analogies between regulating speech and regulating AI, is an attempt to get those in the public who still value free speech to consider protecting AI speech like human speech.
Perhaps that isn't the approach that'll work, but there need to be varied attempts to figure out what might get traction with the general public or politicians.
Excellent stuff here, Brian. My take on the hard push for regulation we're hearing from the mega-caps is that it's an easy way for them to maintain a lead in the space. It already costs millions to train a large model; layer in government regulations and it's a surefire way for the incumbents to maintain dominance.
One point that’s not quite clear from the article: do you think doomers are wrong even conditional on AGI being developed soon? Or do you think we have reasons to be optimistic even in case timelines turn out to be short? (Which I understand you believe to be improbable, but presumably not completely impossible.)
Great article, and very comprehensible even for someone without a deep technical background in the ML field. Looking forward to the future installments.
This was an interesting post and I’m looking forward to the other two, if you write them (or did I already miss them?). I’m especially interested in how you wound up 95% confident that we won’t have AGI by 2043 by a path that involves substantially different AI algorithms than we are using today. In particular, 2043 is as far into the future as 2003 was in the past. And an awful lot has happened in AI since 2003—not just more compute but also new algorithmic ideas / paradigms, both within deep learning (which I believe was an obscure backwater in 2003; this is almost a decade before AlexNet, forget about ResNets, deep Q learning, BatchNorm, PPO, transformers, etc. etc.) and also outside deep learning (e.g. probabilistic programming is wildly more advanced today than 2003). Like, there’s a “scaling law” describing how LLM competence depends on training compute and data. But who’s to say that a future very different non-LLM AI algorithm couldn’t build a radically more competent AI out of the same training compute as today’s LLMs do?
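The "scaling law" point can be made concrete with a small sketch. The formula below is the Chinchilla-style loss decomposition; the constants are roughly the published Chinchilla fit, but treat them as illustrative rather than authoritative, and note that (as the comment argues) a future non-LLM algorithm need not obey this particular curve at all:

```python
# Chinchilla-style scaling law sketch: predicted loss as a function of
# parameter count N and training tokens D. Constants are approximately
# the published fit, used here only for illustration.
def loss(N: float, D: float,
         E: float = 1.69, A: float = 406.4, B: float = 410.7,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    return E + A / N**alpha + B / D**beta

# A fixed compute budget (C ~ 6*N*D FLOPs) can be split between model
# size and data; the law says how much each split buys you.
C = 6 * 70e9 * 1.4e12  # roughly "70B parameters on 1.4T tokens"
for N in (10e9, 70e9, 500e9):
    D = C / (6 * N)
    print(f"N={N:.0e}  D={D:.0e}  predicted loss={loss(N, D):.3f}")
```

The point of the comment survives the sketch: these exponents describe one algorithm family, and nothing pins a hypothetical future architecture to the same E, A, B, alpha, beta.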
In this post of mine — https://www.alignmentforum.org/posts/KJRBb43nDxk6mwLcR/ai-doom-from-an-llm-plateau-ist-perspective — I mostly agree with the main message of this post, except for (1) I argue that it does not imply that AGI concerns are overstated, (2) I suggest that AGI in the next 10 or 20 years is nevertheless quite plausible (I’d go way higher than 5%), per my previous paragraph.
I hope you continue this newsletter, Brian.
I liked this short era in which the best way to get more intelligence was to feed in nicely measurable stuff: human-written data, parameters, compute. It's like a resource pipeline in a game: insert resources, go up the tech tree. But asking for that to continue indefinitely just so our world remains simple is false hope. What Sutton says is not that algorithms suck; it's that making intelligence work the way researchers think their own minds work is an uphill battle, with all the gains washed out by scale. What does not wash out at scale? Search and learning. Both can and will be improved, and it's *very* hard to predict when and how.
There are some things that cannot be modeled by grounding in empirical reality. That is not only due to randomness and the unpredictability of research advances; it is also due to the extraordinary power of the choices that certain small groups of actors will make. Readily available facts about the world can only help model future outcomes that do not depend heavily on a small set of decisions.
Disclaimer: I am no expert, just an ordinary non-ML programmer, and I have no inside information.
A) How to get started understanding TPU benchmarks:
1) Compare fp32 vs fp16. Both are widely used data types, so there should be really good information out there about them. My personal opinion: designing a chip for fp32 is different from designing one for fp16, and you can get more operations per second out of fp16.
2) Google still uses TensorFlow, and the documentation at https://www.tensorflow.org/lite/performance/post_training_quantization mentions "full integer quantization 4x smaller" (i.e., int8), which means it could perform almost like fp8.
3) A possible explanation for why Google's TPU metrics look like outliers is that they may be measuring using fp16 or fp8.
4) I think there's debate over whether quantization actually reduces training cost, but it is used pretty frequently on the inference side, so it probably is meaningful there. Google marks their TPUv4 as "inference-only".
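For anyone who wants to poke at the quantization point above, here is a minimal toy sketch (my own illustration, not TensorFlow's actual implementation) of symmetric int8 post-training quantization, showing where the "4x smaller" comes from:

```python
import numpy as np

# Toy symmetric per-tensor int8 quantization: store fp32 weights as int8
# plus one fp32 scale factor. 4 bytes -> 1 byte per weight, at the cost
# of bounded rounding error.
weights = np.random.randn(1024).astype(np.float32)

scale = np.abs(weights).max() / 127.0                    # map max |w| to 127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale                   # reconstruct

print(weights.nbytes // q.nbytes)                        # 4 (fp32 -> int8)
print(float(np.abs(weights - dequant).max()) <= scale)   # error bounded by scale
```

Real deployments add per-channel scales, zero points, and calibration data, but the 4x storage (and bandwidth) saving is exactly this dtype shrink.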
Note: none of the below is significant enough to outweigh the "slowing down" hypothesis, but maybe it is enough to say "we're not THAT close to the very top of the S curve"
B) Personal opinions on progress
1) Most of the biggest labs have gone closed-source (except Facebook and the UAE), which has slowed down progress, and that's probably not reversing.
2) Because of closed-source development, there may be AI model capability advances we aren't seeing because they are too expensive to be economical. In the future, as people get better at using AI models, they may be willing to pay more for more performance (because they will have learned how to put that performance to good use).
C) AI hardware performance is also dependent on memory throughput.
1) Apple's M1 chip got surprisingly great benchmarks compared to older Apple chips because of increased memory throughput.
2) Cerebras is making much bigger chips, which I think also helps with memory throughput (but their benchmarks are not public).
3) It's possible that NVIDIA could copy Apple's and Cerebras's techniques to get better memory throughput.
4) Stacked memory (SRAM) on top of compute transistors may be coming in lieu of smaller/faster compute transistors in future generations of hardware (https://fuse.wikichip.org/news/5531/amd-3d-stacks-sram-bumplessly/) (I don't know if better memory throughput really outweighs the lack of faster compute transistors though)
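The memory-throughput point above is usually framed as a roofline model: a kernel is limited by compute or by bandwidth depending on how many FLOPs it does per byte moved. A back-of-envelope sketch, where the hardware numbers are ballpark figures I chose for illustration (roughly a current datacenter accelerator), not authoritative specs:

```python
# Roofline back-of-envelope: compare a kernel's arithmetic intensity
# (FLOPs per byte of memory traffic) to the hardware's "ridge point".
def bound(flops_per_byte: float, peak_flops: float, peak_bw: float) -> str:
    ridge = peak_flops / peak_bw  # intensity where compute and bandwidth balance
    return "compute-bound" if flops_per_byte > ridge else "memory-bound"

PEAK_FLOPS = 312e12  # ~fp16 peak of a big accelerator, ballpark
PEAK_BW    = 2.0e12  # ~HBM bandwidth in bytes/s, ballpark

# Batch-1 LLM inference reads every weight once per token: ~2 FLOPs per
# 2-byte fp16 weight, so intensity ~1 FLOP/byte -> far below the ridge.
print(bound(1.0, PEAK_FLOPS, PEAK_BW))    # memory-bound
# Big training matmuls reuse each loaded byte many times.
print(bound(300.0, PEAK_FLOPS, PEAK_BW))  # compute-bound
```

This is why more bandwidth (Apple's unified memory, Cerebras's on-wafer SRAM, stacked SRAM) can matter as much as more FLOPs, especially for inference.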
D) Learning by doing and competition can bring margins down
1) AMD has money now, thanks to its advantage over Intel in the CPU market, and so can invest in a competitor to CUDA; competition could bring prices down.
2) If Moore's Law is ending, then learning by doing could let TSMC bring down prices through better production efficiency (e.g., steps currently done by humans could instead be done by machine), and other competitors (e.g., Samsung, maybe Intel) will eventually catch up, enter the high-end GPU fabrication space, and lower prices through competition.
E) Software progress is not just on the training end but also on the inference end
1) Anthropic came out with a 100K context window. That doesn't affect training costs, but it means a more capable model. Under the previously assumed economics of inference, as the context window grows by a factor of n, cost grows by n^2, so Anthropic almost certainly made an advance here (the other possibility is that every company except Anthropic is wildly overcharging).
2) If inference cost reductions (e.g., quantization) and inference capability measures (e.g., bigger context windows) improve, then training larger models makes more economic sense.
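The n^2 claim about context windows comes from naive attention: the QK^T score matrix alone has n×n entries per head. A toy FLOP count (ignoring constant factors and the linear-in-n projection costs; the head sizes below are illustrative, not any specific model's):

```python
# Why naive attention scales as n^2 in context length n: computing the
# attention score matrix takes one dot product of length d_head for each
# of the n*n (query, key) pairs, per head. Doubling n quadruples this term.
def attention_score_flops(n: int, d_head: int, n_heads: int) -> int:
    return n * n * d_head * n_heads

short = attention_score_flops(2_048, 128, 32)    # a 2K-context model
long_ = attention_score_flops(100_000, 128, 32)  # a 100K-context model
print(long_ / short)  # = (100_000 / 2_048)**2, roughly 2384x
```

So a ~49x longer window naively costs ~2384x more in this term, which is why serving 100K contexts at sane prices suggests some algorithmic or systems advance (or heavy approximation) on the inference side.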
I'm curious about your thoughts on GPUs vs. specialized hardware, especially FPGAs. I'm not necessarily claiming that FPGAs will be a driver of cutting-edge AI development, but I do wonder if they have a significant role to play. Microsoft, for example, has an interesting project:
Reading the article again, I’m left unconvinced, mainly because I think it fails to argue convincingly for any actual hard limit to ML progress along the lines it discusses.
Even if current semiconductor technology turned out to be the final limit of what's possible, there would still be plenty of room to make the same technology cheaper and more widely available. And even if programming models remained difficult and demanded special skills, the key players can surely find enough smart people capable of grappling with them to drive further progress.
Indeed, even if nobody managed to design a better chip than the A100 ever again, I'm sure that in not very many years we'd nevertheless end up with A100s (or equivalents) available for $1000, then $100, then probably $10 or even $1, instead of the current $10K or whatever. The astronomical sums needed to train the current state-of-the-art models will inevitably fall by several orders of magnitude, even if technology makes no fundamental progress and we only see more of the same kinds of fabs in a larger and more competitive market. The same goes not just for chips but for all the other technologies needed to connect them at scale.
And while people skilled at parallel programming and intricate low-level optimization are rare, they’re not *that* rare. It’s far beyond the skills of an average programmer, but the big players are definitely able to hire teams of people who can deliver hand-optimized large-scale parallel applications on demand quickly.
So in the end, any argument against short AGI timelines would still need to hold in a hypothetical world where vast clusters of A100s can be obtained for couch change, and every big player has expert teams ready to write and hand-optimize applications for them on short order. Because that is indeed the world we're likely to be in fairly soon, certainly much earlier than the 2040s.
Sorry, but you talk endlessly about hardware from a not very ML-educated perspective, and where are any of your actual arguments? What can you even argue? 1. That Moore's Law will break (hahaha, good luck with that). 2. An LLM plateau stage (yeah, maybe, but 2045? come on, that's ridiculous). 3. Nothing. On the contrary, the open-source revolution we are now seeing with LLaMA has shown that we are making huge leaps in reducing parameter counts and hence required compute, i.e., exponential growth might be an *underestimate* if you combine Moore's Law with global collective innovation. Also, more compute always translates directly into better model performance. The only thing really holding AGI back is lack of compute, not invention. This is very, very clear. Takeaway: AGI might happen next year, if some rich whiz kid rigs up something huge in his mom's basement. That is where we are now, as ridiculous as that sounds.