FEBRUARY 20, 2023
Andy Lee
This is an unofficial post that tries to explain the Seldonian algorithm in an easy-to-understand way, based on the Science paper *Preventing undesirable behavior of intelligent machines*.
Note that bold and underlined text means "this is a new term I'm introducing," whereas *italics* is used for emphasis.
With the popularity of ChatGPT, more and more people are impressed by the power of AI. At the same time, there are concerns that humanity's survival could be threatened if AI becomes too powerful, especially if we fail to place constraints on its code of conduct. Although today's intelligent machines are far from being a Terminator, what quietly worries researchers is that they have already begun to cause harm: exhibiting racist, sexist, and otherwise unfair behaviors that could reinforce existing social inequalities [1, 2, 3, 4, 5, 6, 7], and producing dangerous behaviors [1], including some that have caused harm and death [1, 2, 3]. With the rapid improvement of technology, applications of AI are almost everywhere now, ranging from face detection to autonomous driving. Therefore, we must make sure that such systems are fair and do not exhibit undesired or dangerous behavior, at least with high-confidence guarantees.
Let’s see how a standard ML approach works when it tries to solve a problem:
Now you have an intuitive understanding of ML. Let's rephrase the above three steps in a slightly more mathematical way:
In a word, a standard ML approach tries to find a solution $\theta^*$ that minimizes the loss.
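The standard recipe can be sketched in a few lines of code. This is a minimal, illustrative example (not from the paper): a 1-D linear model $y = \theta x$, a mean-squared-error loss, and gradient descent searching for the $\theta^*$ that minimizes that loss. The function names and toy dataset are my own inventions for illustration.

```python
def loss(theta, data):
    """Mean squared error of the linear model y = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def grad(theta, data):
    """Gradient of the loss with respect to theta."""
    return sum(2 * (theta * x - y) * x for x, y in data) / len(data)

def fit(data, lr=0.01, steps=1000):
    """Gradient descent: approximates theta* = argmin_theta loss(theta)."""
    theta = 0.0
    for _ in range(steps):
        theta -= lr * grad(theta, data)
    return theta

# Toy data generated by y = 2x; the minimizer should recover theta close to 2.
data = [(x, 2.0 * x) for x in range(1, 6)]
theta_star = fit(data)
```

Note what the objective says nothing about: `fit` is rewarded only for driving the loss down, with no term expressing fairness or safety. That gap is exactly what the Seldonian framework addresses.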
Seems like everything is OK, right? But this is exactly where the problem lies! Let's think about what can go wrong when we single-mindedly minimize the loss, or even try to drive it to zero.