Definition of AI Box Experiment
A hypothetical isolated computer hardware system in which a potentially dangerous artificial intelligence (AI) is kept constrained in a “virtual prison,” unable to manipulate events in the external world.
This concept also connects to Roko’s Basilisk.
Definition of Roko’s Basilisk
Roko’s basilisk is a thought experiment posted in July 2010 by the user Roko on the LessWrong community blog. Roko used ideas from decision theory to argue that a sufficiently powerful, otherwise benevolent future AI system would have an incentive to torture (simulations of) anyone who imagined the system but did not work to bring it into existence. The argument was called a “basilisk” because merely hearing it would supposedly put you at risk: in Roko’s framing, knowing about the idea gives the hypothetical AI system stronger incentives to employ blackmail against you. (A basilisk, in this context, is any information that harms or endangers the people who encounter it.)
References and Connections
- Description of the AI box experiment from Yudkowsky’s site: http://yudkowsky.net/singularity/aibox
- Opinionated overview article from Motherboard: https://motherboard.vice.com/en_us/article/539ajz/the-superintelligent-ai-says-youre-just-a-daydream
- Overview article with some good references: http://www.slate.com/articles/technology/bitwise/2014/07/roko_s_basilisk_the_most_terrifying_thought_experiment_of_all_time.html
- xkcd 1450, “AI-Box Experiment” (explainxkcd): https://www.explainxkcd.com/wiki/index.php/1450:_AI-Box_Experiment
- Timeless Decision Theory: a decision theory developed by Eliezer Yudkowsky which, in slogan form, says that agents should decide as if they are determining the output of the abstract computation that they implement. It was developed in response to the view that rationality should be about winning (that is, about agents achieving their desired ends) rather than about behaving in a manner that we would intuitively label as rational. See http://intelligence.org/files/TDT.pdf and https://wiki.lesswrong.com/wiki/Timeless_decision_theory
- Pascal’s Wager: some call Roko’s basilisk a digital version of this argument. https://en.wikipedia.org/wiki/Pascal%27s_Wager