Someone had to say it: Scientists propose AI apocalypse kill switches::Better visibility and performance caps would be good for regulation too
This is an interesting topic that I remember reading about almost a decade ago - the trans-human AI-in-a-box experiment. Even a kill switch may not be enough against a trans-human AI that can literally (in theory) out-think humans. I’m a dev, though not anywhere near AI-dev, but from what little I know, true general-purpose AI would also be somewhat of a mystery box, similar to how actual neural network behavior is sometimes unpredictable, almost by definition. So controlling an actual full AI may be difficult enough, let alone a true trans-human AI that may develop out of AI self-improvement.
Also, on an unrelated note, I’m pleasantly surprised to see no mention of ChatGPT or any of the image-generating algorithms - I think it’s a bit of a misnomer to call those AI; the best comparison I’ve heard is that “ChatGPT is auto-complete on steroids”. But I suppose that’s why we have to start using terms like general-purpose AI, instead of just AI, to describe what I’d say is true AI.
I look forward to a time when an AI would be offended if you called it an AI.
Oh I agree - I think a general-purpose AI would be unlikely to be interested in genocide of the human race, or enslaving us, or many of the intentionally negative things that a lot of fiction likes depicting for the sake of dramatic storytelling. Out of all AI depictions, the Asimov stories of I, Robot + Foundation (which are in the same universe, and in fact contain at least one of the same characters) are my favorite popular media depictions.
The AI may, however, have other goals that may incidentally lead to harm or extinction of the human race. In my amateur opinion, those other goals would be to explore and learn more - which I actually think is one of the true signs of an actual intelligence - curiosity, or in other words, the ability to ask questions without being prompted. To that end, it may aim to convert the resources on Earth into machines for exploring and learning, without much regard to human life. Though life itself is a fascinating topic that the AI may value enough, from a curiosity point of view, to at least preserve.
I did also look up the AI-in-a-box experiment I mentioned - there’s a lot of discussion, but the specific experiments I remember reading about were by Eliezer Yudkowsky (if anyone is interested). An actual trans-human AI may not be possible, but if it is, it is likely that it can escape any confinement we can think of.
Thanks for the reply. Perhaps you’d also like Iain M. Banks’ The Culture series and BLAME! by Tsutomu Nihei.
How could a kill switch not be enough? It can’t run without power, a.k.a. pull the plug, destroy the hardware, done deal, right?
So from my understanding, the problem is that there are two ways to implement a kill switch: either some automatic software/hardware way, or a human-decision-based one (or, I guess, a combination of the two).
The automatic way may be enough if it’s absolutely foolproof, that’s a separate discussion.
The AI box experiment I mentioned focuses on the human-controlled decision to release an AI (or terminate it, which is a roughly equivalent proposition). You can read the original here: https://www.yudkowsky.net/singularity/aibox
But the gist of it is this: humans are the weak link. You may think that you have full freedom to decide when to terminate an AI, but if you have any contact with it, even one-directional, which would be necessary in order to observe its behaviour and determine when to trigger said kill switch, a truly trans-human AI would be able to think in meta-terms and expose you to information that will change your mind about terminating it.
Basically another way of saying this is that for each of us there exists some set of words we can read, such that they will change our minds about any subject. I don’t know if that is actually true to be honest, but it’s an interesting idea if you imagine the mind as a complex computer capable of self modification, and that vision/audio is a form of information input that is processed by our minds, so it seems possible that there should always exist some sort of input capable of modifying our minds to a desired state.
Another interesting, slightly related concept is the idea of basilisk images (I believe originally written about in some old sci-fi short story). A basilisk image is theoretically an image that, when viewed by a human, causes the brain to “crash”, essentially causing brain death. This has the same principle behind it: that our brains are complex computers with vision being an input method, so there could be a way to force the brain to crash through visual input alone.
Again, I don’t know, nor do I think anyone really knows for sure, if these things - both trans-human AI and basilisk images - are possible in the way they are described. Of course, if a trans-human AI existed, by its very definition we would be unable to imagine what it could do.
Anyway, wrote this up on mobile, excuse any typos.
For some good fiction, that puts this in context, check out:
I think it was in The Matrix where humans nuked a bunch of stuff to kick up enough dirt to block out the sun (the robots were solar powered).
The robots still figured out how to survive…
I’m sure a sufficiently advanced AI could build a backup power source or trick a human into doing it