The AI agents operated smoothly when there were enough apples to go around, but once the apples grew scarce, the DeepMind systems began using laser beams, or "tagging," to knock each other out of the game, so the tagger could collect all the apples.
A blog post from the DeepMind team read: “We let the agents play this game many thousands of times and let them learn how to behave rationally using deep multi-agent reinforcement learning…
“Rather naturally, when there are enough apples in the environment, the agents learn to peacefully coexist and collect as many apples as they can.
“However, as the number of apples is reduced, the agents learn that it may be better for them to tag the other agent to give themselves time on their own to collect the scarce apples…”
“Less aggressive policies emerge from learning in relatively abundant environments with less possibility for costly action.
“The greed motivation reflects the temptation to take out a rival and collect all the apples oneself.”
My guess is they are approximating an amygdala in the software to produce drive. Once they do that, the AI is weighing cost against gain. When resources are everywhere, the cost of pacifism is very low, so even the slight gain it affords can justify pacifism. When resources grow scarce, the cost of pacifism is death, so it is no longer an option.
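To make the shape of that tradeoff concrete, here is a minimal sketch in Python. It is not DeepMind's code, and the numbers (a single agent's collection speed, the time lost to tagging) are assumptions chosen only to show how the same cost-versus-gain comparison flips once apples grow scarce:

```python
# Illustrative sketch only -- not DeepMind's implementation.
# AGENT_SPEED and TAG_TIME_COST are assumed numbers for the example.

AGENT_SPEED = 0.5     # assumed max apples one agent can pick per time step
TAG_TIME_COST = 0.2   # assumed fraction of time lost to aiming and firing


def coexist_reward(density: float) -> float:
    """Expected apples per step if both agents gather peacefully.

    When apples are plentiful, each agent is limited only by its own speed;
    when they are scarce, the two agents split what little respawns.
    """
    return min(AGENT_SPEED, density / 2.0)


def tag_reward(density: float) -> float:
    """Expected apples per step if the agent tags its rival first.

    Tagging burns some time up front, then the agent harvests alone
    (in the real game the rival only sits out briefly; the simplification
    keeps the arithmetic obvious).
    """
    return (1.0 - TAG_TIME_COST) * min(AGENT_SPEED, density)


for density in (1.2, 0.9, 0.7, 0.4, 0.2):
    c, t = coexist_reward(density), tag_reward(density)
    choice = "tag the rival" if t > c else "coexist"
    print(f"apple density {density:.1f}: coexist={c:.2f}  tag={t:.2f}  -> {choice}")
```

Under these assumed numbers the agents prefer to coexist down to an apple density of about 0.8 and prefer to tag below it; DeepMind's agents learn the equivalent threshold from experience rather than having it written out.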
This is what has been encoded into the biology of the human machine, in the form of our r/K adaptability.
It is extremely illuminating, given that we are facing a global economic Apocalypse that will make the Great Depression look tame.
Spread r/K Theory, because even the machines will recognize it
I wonder what the results would be if there were consequences for malicious behavior. For example, say you have several tribes of people: A, B, and C.
Agents A and B hold grudges and seek vengeance for bad behavior, only agent B is also malicious, that is, it acts badly unprovoked. Agent C engages in outright banditry but has no long-term memory of bad acting. What would the learned patterns be?
I think Agents B and C would eventually understand not to fuck with A. Leave A alone and farm your own territory, and A will leave you alone to do the same. C will be shot on sight, whereas B will probably occasionally test A, but will more likely mess with C if he's not actively trying to kill him.
That would be an interesting rationale for honorable behavior among Ks. Tribes that could manage it survived over the long term, while tribes that engaged in long-term dishonorable patterns were hunted to extinction.
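One way to get a feel for that question without a full learning setup is to simulate the three tribes with fixed rules. The sketch below is only an illustration: the payoff numbers, the 10 percent "malice" rate for B, and the way a grudge works here (raiding a tribe that remembers you simply fails, at your cost) are all assumptions, not anything from the DeepMind experiment:

```python
# Hypothetical sketch of the three-tribe question above, with fixed rules
# rather than learning agents. All numbers are assumptions for illustration.

import random

random.seed(0)

FARM_YIELD = 1    # assumed payoff for a quiet round of farming
RAID_GAIN = 2     # assumed loot taken from an unsuspecting farmer
REPEL_COST = 1    # assumed cost of raiding someone who is on guard against you
FIGHT_COST = 1    # assumed cost to each side when two raiding parties collide
B_MALICE = 0.1    # assumed chance that B raids without provocation


class Tribe:
    def __init__(self, name, style):
        self.name, self.style = name, style
        self.score = 0
        self.grudges = set()   # names of tribes that have raided us before

    def raids(self):
        """Decide whether to go raiding this round."""
        if self.style == "bandit":
            return True                                  # C: always raids
        if self.style == "malicious" and random.random() < B_MALICE:
            return True                                  # B: occasional unprovoked raid
        return False                                     # A and B otherwise farm


def resolve_raid(raider, victim):
    if raider.name in victim.grudges:   # victim remembers and is on guard:
        raider.score -= REPEL_COST      # the raid is repelled at the raider's cost
    else:                               # victim is caught farming
        raider.score += RAID_GAIN
        victim.score -= RAID_GAIN
        if victim.style != "bandit":    # bandit C keeps no memory of the insult
            victim.grudges.add(raider.name)


def encounter(x, y):
    """Resolve one meeting between tribes x and y."""
    rx, ry = x.raids(), y.raids()
    if rx and ry:                       # two raiding parties collide: a costly fight
        x.score -= FIGHT_COST
        y.score -= FIGHT_COST
    elif rx:
        resolve_raid(raider=x, victim=y)
    elif ry:
        resolve_raid(raider=y, victim=x)
    else:                               # both farm in peace
        x.score += FARM_YIELD
        y.score += FARM_YIELD


tribes = [Tribe("A", "grudger"), Tribe("B", "malicious"), Tribe("C", "bandit")]
for _ in range(3000):
    encounter(*random.sample(tribes, 2))

for t in sorted(tribes, key=lambda t: -t.score):
    print(f"{t.name} ({t.style}): score {t.score}, grudges against {sorted(t.grudges)}")
```

With these crude rules, A comes out far ahead, B does well but pays for every unprovoked raid and for its running feud with C, and C ends deep in the red because it can neither remember who is on guard nor restrain itself. Getting B and C to actually adjust their behavior, as predicted above, would require learning agents rather than fixed rules.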
This is just game theory. Nothing to do with the amygdala.
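If you take that view, the textbook formalism would be something like the hawk-dove game. The sketch below is only an illustration with made-up numbers, where V stands for the value of a contested apple patch (high when apples are scarce) and C for the cost of being tagged out:

```python
# Illustrative hawk-dove sketch -- not anything from the DeepMind paper.

def hawk_dove_payoffs(V: float, C: float):
    """Row player's payoff matrix; strategies are (hawk, dove)."""
    return [
        [(V - C) / 2.0, V],        # hawk vs hawk, hawk vs dove
        [0.0,           V / 2.0],  # dove vs hawk, dove vs dove
    ]


def equilibrium_aggression(V: float, C: float) -> float:
    """Hawk frequency at the evolutionarily stable strategy: V/C, capped at 1."""
    return min(1.0, V / C)


cost = 1.0  # assumed cost of being tagged out of the game
for value, label in ((0.2, "abundant apples"), (0.6, "getting scarce"), (2.0, "scarce apples")):
    hh = hawk_dove_payoffs(value, cost)[0][0]
    print(f"{label}: patch value {value}, hawk-vs-hawk payoff {hh:+.2f}, "
          f"stable aggression level {equilibrium_aggression(value, cost):.0%}")
```

The standard result is that aggression settles at a V/C share of play while fighting costs more than the prize, and goes to 100 percent once the prize is worth more than the fight, which is the same abundance-versus-scarcity flip the agents exhibited.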
Any optimization technique is called “Deep Learning” and “AI” at the moment.
The current buzzwords.