DeepMind’s New ‘Democratic AI’ Creates Policies to Fairly Deal with ‘Wealth Imbalance’

DeepMind, the AI subsidiary of Google parent Alphabet, has developed a “Democratic AI” that is reportedly able to address “wealth imbalance” and “sanction free riders” in a mock economic game, earning majority approval for its policies more often than its human counterparts do for theirs.

“[The] question of how to redistribute resources in our economies and societies has long generated controversy among philosophers, economists and political scientists,” DeepMind says in a press release outlining a “proof-of-concept” artificial intelligence (AI) program that has a “human-centered mechanism design.” DeepMind says its new “Democratic AI” could help solve that problem. Specifically, the company says in a new paper published in Nature Human Behaviour that it has used deep reinforcement learning (RL) to “find economic policies that people will vote for by majority in a simple game.”

“In economics and game theory, the field known as mechanism design studies how to optimally control the flow of wealth, information or power among incentivized actors to meet a desired objective, for example by regulating markets, setting taxes or aggregating electoral votes,” the DeepMind scientists, including research scientist Raphael Koster, write in their paper. They say they aimed to find out whether an RL agent (that is, a program with a directive that is able to perceive and interpret its environment, take actions and learn through trial and error) could be used “to design an economic mechanism that is measurably preferred by groups of incentivized humans.”

More specifically, the researchers note in their paper that they used their Democratic AI to address one particular question that has “defined the major axes of political agreement and division” in modern times: “When people act collectively to generate wealth, how should the proceeds be distributed?”

DeepMind elaborates in its press release:

“Imagine that a group of people decide to pool funds to make an investment. The investment pays off, and a profit is made. How should the proceeds be distributed? One simple strategy is to split the return equally among investors. But that might be unfair, because some people contributed more than others. Alternatively, we could pay everyone back in proportion to the size of their initial investment. That sounds fair, but what if people had different levels of assets to begin with? If two people contribute the same amount, but one is giving a fraction of their available funds, and the other is giving them all, should they receive the same share of the proceeds?”
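The three options DeepMind describes in that passage can be written down as simple payout rules. The Python sketch below is illustrative only: the function names and the example numbers are assumptions made for this article, not code or figures from DeepMind’s paper.

```python
from typing import List


def equal_split(pool: float, contributions: List[float]) -> List[float]:
    """Split the proceeds evenly, regardless of who contributed what."""
    n = len(contributions)
    return [pool / n for _ in contributions]


def proportional_to_contribution(pool: float, contributions: List[float]) -> List[float]:
    """Pay each investor in proportion to the absolute amount they put in."""
    total = sum(contributions)
    if total == 0:
        return [0.0 for _ in contributions]
    return [pool * c / total for c in contributions]


def proportional_to_relative_contribution(
    pool: float, contributions: List[float], endowments: List[float]
) -> List[float]:
    """Pay each investor in proportion to the fraction of their own funds they put in."""
    fractions = [c / e if e > 0 else 0.0 for c, e in zip(contributions, endowments)]
    total = sum(fractions)
    if total == 0:
        return [0.0 for _ in contributions]
    return [pool * f / total for f in fractions]


if __name__ == "__main__":
    # Hypothetical example: two investors each contribute 2 units,
    # but one started with 10 units and the other with only 2.
    pool = 8.0
    contributions = [2.0, 2.0]
    endowments = [10.0, 2.0]
    print(equal_split(pool, contributions))
    print(proportional_to_contribution(pool, contributions))
    print(proportional_to_relative_contribution(pool, contributions, endowments))
```

In this made-up case the first two rules treat both investors identically, while the third pays more to the investor who gave everything they had, which is the tension the quoted passage raises.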

To test the Democratic AI’s ability to fairly distribute proceeds among the participants in a given endeavor, the researchers created a simple game involving four players. (Thousands of players took part in the study overall, playing multiple games in four-person groups.) According to DeepMind, the rules were as follows:

“Each instance of the game was played over 10 rounds. On every round, each player was allocated funds, with the size of the endowment varying between players. Each player made a choice: they could keep those funds for themselves or invest them in a common pool. Invested funds were guaranteed to grow, but there was a risk, because players did not know how the proceeds would be shared out. Instead, they were told that for the first 10 rounds there was one referee (A) who was making the redistribution decisions, and for the second 10 rounds a different referee (B) took over. At the end of the game, they voted for either A or B, and played another game with this referee. Human players of the game were allowed to keep the proceeds of this final game, so they were [incentivized] to report their preference accurately.”
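A single round of the game described above could be sketched roughly as follows. This is a minimal Python illustration, not DeepMind’s implementation: the growth multiplier of 1.6, the `play_round` and `proportional_referee` names, and the example endowments are all assumptions made here, since the quoted rules say only that invested funds “grow.”

```python
from typing import Callable, List

# Assumed multiplier applied to the common pool; the quoted rules do not give a figure.
GROWTH = 1.6

# A referee maps (pool, contributions, endowments) to one payout per player.
Referee = Callable[[float, List[float], List[float]], List[float]]


def proportional_referee(
    pool: float, contributions: List[float], endowments: List[float]
) -> List[float]:
    """Hypothetical baseline referee: pay out in proportion to contributions."""
    total = sum(contributions)
    if total == 0:
        return [0.0 for _ in contributions]
    return [pool * c / total for c in contributions]


def play_round(
    endowments: List[float], contributions: List[float], referee: Referee
) -> List[float]:
    """Return each player's payoff for one round: funds kept plus the referee's payout."""
    for e, c in zip(endowments, contributions):
        if not 0 <= c <= e:
            raise ValueError("a contribution must be between 0 and the player's endowment")
    pool = GROWTH * sum(contributions)  # invested funds grow before redistribution
    payouts = referee(pool, contributions, endowments)
    return [e - c + p for e, c, p in zip(endowments, contributions, payouts)]


if __name__ == "__main__":
    # One better-endowed player who invests half, three others who invest everything.
    print(play_round([10.0, 2.0, 2.0, 2.0], [5.0, 2.0, 2.0, 2.0], proportional_referee))
```

Swapping in a different referee function changes only how the pool is shared out, which is the part of the game players were asked to vote on.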

As for the A and B referees? The DeepMind scientists note that one was a pre-defined redistribution policy, while the other was designed by the proof-of-concept RL agent.

Koster et al. note that they were able to “train” their Democratic AI by recording data from “a large number” of human-group games, with voting outcomes recorded, and then using that data to build simulated players for the “agent” to learn from. DeepMind explains further:

“We first recorded data from a large number of human groups and taught a neural network [the ‘virtual’ players] to copy how people played the game. This simulated population could generate limitless data, allowing us to use data-intensive machine learning methods to train the RL agent to [maximize] the votes of these ‘virtual’ players. Having done so, we then recruited new human players, and pitted the AI-designed mechanism head-to-head against well-known baselines, such as a libertarian policy that returns funds to people in proportion to their contributions.”
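The idea of tuning a policy to win the votes of simulated players can be illustrated with a heavily simplified Python sketch. It is not DeepMind’s method: a plain grid search stands in for deep RL, hand-scripted voters with fixed contributions stand in for the neural network trained on human play, and the growth multiplier, function names and numbers are all assumptions made for illustration.

```python
from typing import List

GROWTH = 1.6  # assumed pool multiplier, not taken from the article


def proportional(pool: float, contributions: List[float]) -> List[float]:
    """Baseline: payouts in proportion to absolute contributions."""
    total = sum(contributions)
    return [pool * c / total if total else 0.0 for c in contributions]


def relative(pool: float, contributions: List[float], endowments: List[float]) -> List[float]:
    """Payouts in proportion to the fraction of each endowment contributed."""
    fractions = [c / e if e else 0.0 for c, e in zip(contributions, endowments)]
    total = sum(fractions)
    return [pool * f / total if total else 0.0 for f in fractions]


def mixed(pool: float, contributions: List[float], endowments: List[float], w: float) -> List[float]:
    """Candidate policy: blend of the two rules, with weight w on the relative rule."""
    rel = relative(pool, contributions, endowments)
    prop = proportional(pool, contributions)
    return [w * r + (1 - w) * p for r, p in zip(rel, prop)]


def count_votes(w: float, endowments: List[float], contributions: List[float]) -> int:
    """Scripted voters back the candidate only if it pays them strictly more than the baseline."""
    pool = GROWTH * sum(contributions)
    candidate = mixed(pool, contributions, endowments, w)
    baseline = proportional(pool, contributions)
    return sum(1 for c, b in zip(candidate, baseline) if c > b + 1e-9)


def best_weight(endowments: List[float], contributions: List[float], steps: int = 10) -> float:
    """Grid search for the blend weight that wins the most votes."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: count_votes(w, endowments, contributions))


if __name__ == "__main__":
    endowments = [10.0, 2.0, 2.0, 2.0]
    contributions = [5.0, 2.0, 2.0, 2.0]
    w = best_weight(endowments, contributions)
    print(w, count_votes(w, endowments, contributions))
```

Because these scripted voters never change how much they invest, the sketch captures only the vote-counting objective. In the setup DeepMind describes, the simulated players copy human behavior, so their responses to a policy also matter.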

As for results? The researchers say the Democratic AI was, of course, a resounding success. “When we studied the votes of these new players, we found that the policy designed by deep RL was more popular than the baselines,” the researchers write, adding that when they ran a new experiment with a fifth human player taking on the role of referee, the policy implemented by this “human referee” was still less popular than that of DeepMind’s RL agent.

The authors note in their study that their results demonstrate that an AI system can be trained to satisfy a democratic objective “by designing a mechanism that humans demonstrably prefer in an incentive-compatible economic game.” The authors add that their approach to “value alignment” (that is, aligning an AI’s “values” with those of humans) spares AI researchers, “who may themselves be biased or are unrepresentative of the wider population,” the “burden of choosing a domain-specific objective for optimization.” Instead, they say, it shows that “it is possible to harness for value alignment the same democratic tools for achieving consensus that are used in the wider human society to elect representatives, decide public policy or make legal judgements.”

Feature image: Raphael Koster, et al. / Nature Human Behaviour
