‘Friendly AI’ is a field of research that aims to ensure that if and when a general-purpose artificial intelligence (especially one with greater-than-human intelligence) is developed, it won’t be harmful to us.
I think there are some problems with this. I write this post in part to attract people who want to tell me why I am wrong, and why Friendly AI research would actually be useful.
First, if you want an introduction to what this is all about, read Superintelligence by Nick Bostrom. He explains the concept of smarter-than-human intelligence (which could come about by means other than AI), and the many problems it presents. No perfect solution to these problems is given, and they are said to be hard. From Bostrom’s book (ch. 9), here is a list of some proposed methods for controlling a Superintelligence:
- Boxing methods: The system is confined in such a way that it can affect the external world only through some restricted, pre-approved channel. Encompasses physical and informational containment methods.
- Incentive methods: The system is placed within an environment that provides appropriate incentives. This could involve social integration into a world of similarly powerful entities. Another variation is the use of (cryptographic) reward tokens. “Anthropic capture” is also a very important possibility but one that involves esoteric considerations.
- Stunting: Constraints are imposed on the cognitive capabilities of the system or its ability to affect key internal processes.
- Tripwires: Diagnostic tests are performed on the system (possibly without its knowledge) and a mechanism shuts down the system if dangerous activity is detected.
- Direct specification: The system is endowed with some directly specified motivation system, which might be consequentialist or involve following a set of rules.
- Indirect normativity: Indirect normativity could involve rule-based or consequentialist principles, but is distinguished by its reliance on an indirect approach to specifying the rules that are to be followed or the values that are to be pursued.
- Augmentation: One starts with a system that already has substantially human or benevolent motivations, and enhances its cognitive capacities to make it superintelligent.
The problems with trying to make AI safe are many.
First, let us think about how the first Superintelligence will come into existence. Let us further assume that it will be an Artificial Intelligence.
I see this happening either in an institutional setting or in a small team of tinkerers. If the former, Friendly AI research can be useful. If the latter, Friendly AI research won’t be of much use, because there is no guarantee that every single developer will be applying FAI research in their own project.
Then, even if an institutional team that follows FAI recommendations is the first to achieve this, a second team not following the guidelines may also develop a Superintelligence. We are assuming that the first team’s Superintelligence will be controlled somehow, so it wouldn’t be able to control the teams that came after it. Even if it were programmed to do that, some time would pass between the appearance of the Superintelligence and its deployment of effective surveillance all around the world.
In any event, I’m not expecting the Superintelligence to appear because it has first been proven safe and only then implemented. Technological progress rarely, if ever, happens like that. The first Industrial Revolution itself got started not with much input from science, but through tinkering. I’m not saying that trial and error can achieve anything whatsoever: that’s not the case. I’m making the weaker claim that ‘fundamental grounding’, as in simulating the full physics of a problem, or proving that something works in maths or computer science, is not how things usually happen. Even basic science involves plenty of trial and error, and following promising results.
What will happen is that someone will be toying with some very advanced machine learning code, or trying to implement Schmidhuber’s Gödel Machine, or some approximation of AIXI in a realistic and useful way, and then, somehow, the system will unexpectedly end up Superintelligent.
This leads us to my core concern: I don’t see the possibility of formally proving or designing a Superintelligent system so that it is friendly, everywhere and always. Surely we can keep it contained, but that is only the second-best solution. Ideally, it would be free to perform its function while remaining friendly, without being excessively constrained.
Shane Legg, a researcher at DeepMind, argued in his PhD thesis that there exist no simple but powerful AI algorithms, and that the algorithms powerful enough to be of interest cannot be discovered by mathematical analysis. From that, it’s likely that the first Superintelligence will appear serendipitously, and not because of careful design. In his words:
> We have shown that there does not exist an elegant constructive theory of prediction for computable sequences, even if we assume unbounded computational resources, unbounded data and learning time, and place moderate bounds on the Kolmogorov complexity of the sequences to be predicted. Very powerful computable predictors are therefore necessarily complex. We have further shown that the source of this problem is the existence of computable sequences which are extremely expensive to compute. While we have proven that very powerful prediction algorithms which can learn to predict these sequences exist, we have also proven that, unfortunately, mathematical analysis cannot be used to discover these algorithms due to Gödel incompleteness.
Perhaps a similar impossibility proof could be provided for provably Friendly AI. Legg himself thought he had a sketch of such a proof ten years ago, but in the end it didn’t hold up.
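To get an intuition for the “computable sequences which are extremely expensive to compute” that Legg mentions, here is a toy sketch of my own (it is not Legg’s construction, and the recurrence used is an arbitrary assumption chosen only for illustration): a deterministic, perfectly computable binary sequence whose n-th bit takes on the order of 2**n steps to work out. The sequence is predictable in principle, yet each additional bit doubles the work required.

```python
def expensive_bit(n: int) -> int:
    """Return the n-th bit of a computable sequence that takes
    roughly 2**n iterations of a simple recurrence to compute."""
    x = n
    for _ in range(2 ** n):
        # Each step is cheap, but the number of steps needed for
        # bit n grows exponentially with n.
        x = (x * 1103515245 + 12345) % (2 ** 31)
    return x & 1

# Computable and fully determined, yet predicting further into the
# sequence demands exponentially more computation.
prefix = [expensive_bit(n) for n in range(15)]
print(prefix)
```

The point of the sketch is only that “computable” does not mean “cheap to predict”: a predictor that reliably learns such sequences must, intuitively, be prepared to do comparable amounts of work, which is why very powerful predictors are necessarily complex.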
To recap, the argument is:
- It won’t be possible to design an AI that’s provably Friendly (my core claim).
- Even if it were possible, there exists the possibility of someone designing an AI that doesn’t follow that design.
- Given the tradition of openness in AI research, it’s likely that after the first Superintelligence is developed, others will follow.
- There won’t be time for the first Superintelligence to stop other researchers from developing their own.
- From the above, a Superintelligence that is not proven to be Friendly will exist, regardless of Friendly AI research.
- Hence, Friendly AI research is possibly futile.
We should be careful not to think that my conclusion must be wrong just because of what it implies about a Superintelligence destroying us. If, with high probability, the Superintelligence will be unfriendly (e.g. a paperclip maximiser), then it follows that the human race will disappear as soon as that Superintelligence is created, and there’s little we can do to stop it. This is an unfortunate implication, but its grimness does not affect what I’m saying here. Except, perhaps, in the sense that even if there is a really low probability of my being wrong, it still pays off to do Friendly AI research ‘just in case’.
Now I relax and wait for someone to correct me. Perhaps there’s something I’ve missed, or something that should lead us to expect that the problem is actually solvable.
[Update: I’m more optimistic now about the possibility of Friendly AI than I was when I wrote this post]