FEBRUARY 20, 2023
Andy Lee
This is an unofficial post that tries to explain the Seldonian algorithm in an easy-to-understand way, based on the Science paper *Preventing undesirable behavior of intelligent machines*.
Note that bold and underlined text means "this is a new term I'm introducing," whereas *italics* is used for emphasis.
With the popularity of ChatGPT, more and more people are impressed by the power of AI. At the same time, there are concerns that humanity's survival could be threatened if AI becomes too powerful, especially if we fail to place constraints on its code of conduct. Although today's intelligent machines are far from being a Terminator, what quietly worries researchers is that they have already begun to cause harm: exhibiting racist, sexist, and otherwise unfair behaviors that could reinforce existing social inequalities [1, 2, 3, 4, 5, 6, 7], and producing dangerous behaviors [1], including some that have caused harm and death [1, 2, 3]. With the rapid improvement of technology, applications of AI are almost everywhere now, ranging from face detection to autonomous driving. Therefore, we must make sure that such systems are fair and do not exhibit undesired or dangerous behavior, at least with high-confidence guarantees.
Let’s see how a standard ML approach works when it tries to solve a problem:
Now you have an intuitive understanding of ML. Let's rephrase the above three steps in a slightly more mathematical way:
In a word, a standard ML approach tries to find a solution $\theta^*$ that minimizes the loss.
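The standard recipe can be sketched in a few lines of code. This is a minimal, illustrative example (not from the paper): a 1-D linear model $y = \theta x$, a mean-squared-error loss, and gradient descent searching for the $\theta^*$ that minimizes that loss. The function names and toy dataset are my own inventions for illustration.

```python
def loss(theta, data):
    """Mean squared error of the linear model y = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def grad(theta, data):
    """Gradient of the loss with respect to theta."""
    return sum(2 * (theta * x - y) * x for x, y in data) / len(data)

def fit(data, lr=0.01, steps=1000):
    """Gradient descent: approximates theta* = argmin_theta loss(theta)."""
    theta = 0.0
    for _ in range(steps):
        theta -= lr * grad(theta, data)
    return theta

# Toy data generated by y = 2x; the minimizer should recover theta close to 2.
data = [(x, 2.0 * x) for x in range(1, 6)]
theta_star = fit(data)
```

Note what the objective says nothing about: `fit` is rewarded only for driving the loss down, with no term expressing fairness or safety. That gap is exactly what the Seldonian framework addresses.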
Seems like everything is OK, right? But this is exactly where the problem lies! Let's think about what can go wrong when we single-mindedly minimize the loss, or even try to drive it to zero.