AI-Box Experiment

jojo · July 10, 2017, 10:53pm

Not mentioned directly (at this point) on a website or in a document, but it has come up in discussions (such as here, here, and here, a discussion where Sean Murray mentions Roko’s Basilisk).

Definition of AI Box Experiment

from AI capability control - Wikipedia

a hypothetical isolated computer hardware system where a possibly dangerous artificial intelligence, or AI, is kept constrained in a “virtual prison” and not allowed to manipulate events in the external world.

This also connects to Roko’s Basilisk

Definition of Roko’s Basilisk

from Roko's basilisk - LessWrong

Roko’s basilisk is a thought experiment proposed in 2010 by the user Roko on the Less Wrong community blog. Roko used ideas in decision theory to argue that a sufficiently powerful AI agent would have an incentive to torture anyone who imagined the agent but didn’t work to bring the agent into existence. The argument was called a “basilisk” because merely hearing the argument would supposedly put you at risk of torture from this hypothetical agent — a basilisk in this context is any information that harms or endangers the people who hear it.

from LessWrong - Wikipedia

In July 2010, LessWrong contributor Roko posted a thought experiment to the site in which an otherwise benevolent future AI system tortures simulations of those who did not work to bring the system into existence. This idea came to be known as “Roko’s basilisk,” based on Roko’s idea that merely hearing about the idea would give the hypothetical AI system stronger incentives to employ blackmail.

References and Connections

Description of the AI box from Yudkowsky’s site - The AI-Box Experiment: – Eliezer S. Yudkowsky
Overview article (opinionated) from Motherboard - How a Superintelligent AI Could Convince You That You're a Simulation
Overview article with some good references - Roko’s Basilisk: The most terrifying thought experiment of all time.
xkcd - 1450: AI-Box Experiment - explain xkcd

Related Concepts

Timeless Decision Theory - a decision theory developed by Eliezer Yudkowsky which, in slogan form, says that agents should decide as if they are determining the output of the abstract computation that they implement. This theory was developed in response to the view that rationality should be about winning (that is, about agents achieving their desired ends) rather than about behaving in a manner that we would intuitively label as rational. See http://intelligence.org/files/TDT.pdf and Timeless Decision Theory - LessWrong
Pascal’s Wager - some call Roko’s Basilisk (RB) a digital version of this wager - Pascal's wager - Wikipedia

Dolnor · July 11, 2017, 9:09pm

I had a friend, later diagnosed with Dissociative Identity Disorder, who said that his “other self” would mentally torture him if “it” heard other people mention “it” by name in conversation and my friend wouldn’t let “it” come to the forefront to argue with those people. And yes, my friend got the help he needed with “it”. Roko’s Basilisk sounds a lot like my past friend’s mental disorder.

