Part 1. What It Is and Why We Do It

In this posting, I’ll introduce an interesting method that we often use in our research, called the Error Regression Scheme (ERS) [1-2]. In short, the ERS is a kind of online optimization technique, but it differs from other techniques in several ways. For instance, the weights are not updated during the ERS. Instead, the neurons’ values (activations) are updated to minimize the error at the output. The ERS is a kind of prediction error minimization mechanism, and there are several (philosophical) ideas behind it. I’ll talk about them in later postings. Let’s begin with what it is and why we use it.

What is the Error Regression Scheme (ERS)?

The simplest way to understand the ERS is to think of it as a kind of optimization technique. During the (conventional) training process, the model’s learnable parameters (such as weights and biases) are updated to minimize the training error. For example, a weight $w$ can be updated using gradient descent:
$$w^{(k+1)} \leftarrow w^{(k)} - \eta \frac{\partial E}{\partial w}$$
$E$ is the training error, often defined as the discrepancy between the model’s output and the desired output (the target or teaching signal), $\eta$ is the learning rate, and $k$ is the index of the training epoch.
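To make the update rule concrete, here is a minimal sketch of gradient descent on a single weight. The linear model, the toy data, and the names `training_error`, `error_gradient`, and `train` are all assumptions made up for illustration; they are not taken from our experiments.

```python
# Toy data (assumed for illustration): targets generated by y = 2 * x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]

def training_error(w):
    """Mean squared error E between the model output w * x and the target y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def error_gradient(w):
    """Analytic dE/dw for the mean squared error above."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(w=0.0, eta=0.05, epochs=100):
    """Repeat w^(k+1) <- w^(k) - eta * dE/dw for a fixed number of epochs."""
    for _ in range(epochs):
        w = w - eta * error_gradient(w)
    return w

w_trained = train()
print(w_trained, training_error(w_trained))
```

Here only the weight changes from epoch to epoch. That is exactly the part that stays frozen during the ERS.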

The ERS is also an optimization algorithm: it is used to minimize the error at the output. But there is a major difference between the training process and the ERS. During training, the model’s parameters, such as weights and biases (and initial states), are optimized in the direction that minimizes the error at the output layer. During the ERS, weights and biases are not updated. Instead, the neurons’ values are updated (optimized) in the direction that minimizes the error while the weights and biases stay fixed. Also, the ERS is used after training, that is, after we have obtained the weights and biases.
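The same idea can be sketched with a single neuron. The code below is only an illustration under my own assumptions: a one-neuron model with a tanh activation, made-up "trained" values `W` and `B`, and a hypothetical function `error_regression` that applies $h \leftarrow h - \alpha\,\partial E/\partial h$ to the neuron’s value $h$. It is not the implementation used in our experiments.

```python
import math

# Weight and bias assumed to come from training; they are never modified below.
W, B = 1.5, 0.1

def predict(h):
    """Output of a single neuron whose internal value is h."""
    return W * math.tanh(h) + B

def error_regression(h, target, alpha=0.5, iters=200):
    """Update the neuron's value h by gradient descent on E = 0.5 * (y - target)**2."""
    for _ in range(iters):
        y = predict(h)
        dE_dh = (y - target) * W * (1.0 - math.tanh(h) ** 2)
        h = h - alpha * dE_dh  # h <- h - alpha * dE/dh; W and B are untouched
    return h

h_before = 0.0
h_after = error_regression(h_before, target=0.9)
print(abs(predict(h_before) - 0.9), abs(predict(h_after) - 0.9))
```

Compared with the weight update, the gradient is taken with respect to the neuron’s value rather than the weight, so the output error shrinks without touching what was learned.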

Why Do We Do It?

Adapting to Changing Environment

As mentioned earlier, the ERS optimizes the model (its neurons’ values, not its weights) to minimize the error in an online manner. Let’s say we trained an NN model for a robot that perceives and predicts visual images. When the robot is operated with the trained NN model, there can be differences between what the robot predicts and what it perceives, for example because what the robot observes differs from what it learned during training. The robot can then minimize the prediction error by means of the ERS. During the ERS, the robot’s thinking (i.e. the neurons’ values) is updated so that it makes correct predictions. In this sense, the ERS gives a robot the ability to adapt to a changing environment.

Let’s say we don’t use the ERS. In that case, the only thing (or force) that changes (updates) the neurons’ values is observation. In other words, what the model observes drives the changes in the neurons’ values. This sounds OK - what we observe changes what we think. In our studies, this way of updating the neurons’ values is often called “Sensory Entrainment”, meaning that sensory information entrains the model dynamics (this will be covered in later postings).

Let’s take another approach - the ERS. Let’s say we constantly make predictions about the world. Because our world cannot be predicted perfectly, there will always be an error between what we expect (prediction) and what we observe (observation). This error is the prediction error. We constantly update our thinking to minimize the prediction error, and this is what the ERS does. In this case, the neural dynamics are driven by the error minimization mechanism rather than by direct observation.
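The two ways of driving the neurons’ values can be contrasted in a toy example. Everything in it is an assumption chosen to keep the sketch short: a one-dimensional linear model with made-up fixed parameters `A`, `B`, `C`, a five-step window of observations, and an ERS variant that runs the model in closed loop and adjusts only the state at the start of the window. It is not a fair benchmark of the two schemes, just a picture of the mechanics.

```python
# Fixed (already trained) parameters of a toy linear predictor. Hypothetical values.
A, B, C = 0.5, 0.4, 1.0  # state decay, input gain, readout weight

def squared_error(preds, obs):
    return 0.5 * sum((p - o) ** 2 for p, o in zip(preds, obs))

def sensory_entrainment(h0, obs):
    """Open loop: each observation is fed in and drives the state."""
    h, preds = h0, []
    for x in obs:
        preds.append(C * h)  # prediction made before seeing x
        h = A * h + B * x    # the observation entrains the state
    return preds

def closed_loop(h0, steps):
    """Closed loop: the model feeds its own prediction back as input."""
    h, preds = h0, []
    for _ in range(steps):
        preds.append(C * h)
        h = A * h + B * (C * h)
    return preds

def error_regression(h0, obs, alpha=0.1, iters=100):
    """Gradient descent on the window's initial state h0; A, B, C stay fixed."""
    r = A + B * C  # closed-loop gain: the prediction at step t is C * r**t * h0
    for _ in range(iters):
        preds = closed_loop(h0, len(obs))
        grad = sum((p - o) * C * r ** t for t, (p, o) in enumerate(zip(preds, obs)))
        h0 -= alpha * grad
    return h0

# Observations the model can explain, but only from a state (2.0) that differs
# from the model's initial guess (0.5).
observations = [2.0 * 0.9 ** t for t in range(5)]
h_guess = 0.5

err_entrain = squared_error(sensory_entrainment(h_guess, observations), observations)
h_ers = error_regression(h_guess, observations)
err_ers = squared_error(closed_loop(h_ers, len(observations)), observations)
print(err_entrain, h_ers, err_ers)
```

With sensory entrainment the state is pulled toward the observations step by step, so the error fades gradually. With the ERS sketch the state itself is corrected until the whole window is explained.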

Can we say which scheme (sensory entrainment or error regression) is better? Should the neurons’ values be updated by direct observation or by an error minimization mechanism? It depends on the situation, I guess. But a study by Murata et al. [3] showed that the ERS elicited a faster reaction time than sensory entrainment. That’s kinda expected, since the ERS updates the neurons’ values directly to minimize the error. One drawback of the ERS is the computational cost - it can be very expensive with high-dimensional data. So I’ll leave the question of which is better unanswered here, and I’ll get back to this topic (including the pros and cons of the ERS) later.

Recognizing Intention behind Observed Action

Let’s go a bit further. The difficulty of minimizing prediction error depends on the task and the situation. Let’s say we predict the trajectory of a ball falling from a table. It’s not hard to predict: the ball will follow the law of gravity (in most cases), so we can make quite an accurate prediction. This kind of inanimate motion is relatively easy to predict since it is governed by the laws of physics.

Now, let’s consider predicting the trajectory of a boy running on a playground. Compared to the falling ball, it’s kinda harder to predict, because his action is not only governed by the laws of physics but also caused by his intention. By continuing to observe his action, we can make more precise predictions about how he’ll move. For instance, if we consider only a 1-second time span of his action, it’s hard to make an accurate prediction. But if we observe his action for several seconds, we’ll probably be able to figure out what he’s doing and what he wants (his intention), and eventually we’ll be able to make a relatively accurate prediction about his motion. The computation behind this process is prediction error minimization. Kilner et al. [4] argued that the underlying cause of an observed action could be inferred by minimizing the prediction error (PE). Similarly, the ERS can be used to infer the intention behind observations by minimizing prediction error.
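As a rough illustration of inferring an intention by minimizing prediction error, consider a runner heading toward a hidden goal position. The whole setup is hypothetical: a one-dimensional runner, a fixed approach rate `K` standing in for the trained model, a hand-picked list of disturbances instead of real noise, and a function `infer_goal` that does gradient descent on the goal estimate only.

```python
K = 0.2           # fixed approach rate of the (already trained) model; hypothetical
TRUE_GOAL = 10.0  # hidden intention: where the runner is heading; hypothetical
# Fixed, zero-sum disturbances standing in for unpredictable wobble.
DISTURBANCE = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]

def observe_positions(steps):
    """Simulate observed positions of a runner heading toward TRUE_GOAL."""
    xs = [0.0]
    for t in range(steps):
        wobble = DISTURBANCE[t % len(DISTURBANCE)]
        xs.append(xs[-1] + K * (TRUE_GOAL - xs[-1]) + wobble)
    return xs

def infer_goal(xs, goal=0.0, alpha=10.0, iters=200):
    """Gradient descent on the goal estimate; K is never changed."""
    n = len(xs) - 1
    for _ in range(iters):
        # Prediction error of each one-step prediction under the current estimate.
        errors = [xs[t + 1] - (xs[t] + K * (goal - xs[t])) for t in range(n)]
        grad = -K * sum(errors) / n  # d/d(goal) of mean(0.5 * error**2)
        goal -= alpha * grad
    return goal

goal_short = infer_goal(observe_positions(1))  # one observed step
goal_long = infer_goal(observe_positions(6))   # six observed steps
print(goal_short, goal_long)
```

In this toy setting the one-step estimate is thrown off by a single disturbance, while the six-step window averages the disturbances out and lands near the true goal, which mirrors the point about observing the boy for several seconds.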


  • The ERS is a kind of optimization technique, but it’s different from conventional training. Instead of optimizing weights and biases, the ERS updates the neurons’ values directly while the weights and biases are fixed.
  • The ERS is performed after training. During the ERS, the prediction error is minimized in an online manner, which gives a robot the ability to adapt to a changing environment.
  • The ERS can also be used to recognize the underlying intention in an observed pattern.

In later postings, I’ll talk about how we actually performed the ERS in our experiments.


  1. J. Tani, Exploring Robotic Minds: Actions, Symbols, and Consciousness As Self-Organizing Dynamic Phenomena. New York, NY, USA: Oxford Univ. Press, 2016.
  2. J. Hwang, J. Kim, A. Ahmadi, M. Choi and J. Tani, “Dealing With Large-Scale Spatio-Temporal Patterns in Imitative Interaction Between a Robot and a Human by Using the Predictive Coding Framework,” in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. PP, no. 99, pp. 1-14.
  3. S. Murata, Y. Yamashita, H. Arie, T. Ogata, S. Sugano and J. Tani, “Learning to Perceive the World as Probabilistic or Deterministic via Interaction With Others: A Neuro-Robotics Experiment,” in IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 4, pp. 830-848, April 2017.
  4. J. M. Kilner, K. J. Friston and C. D. Frith, “Predictive coding: An account of the mirror neuron system,” in Cognitive Processing, vol. 8, no. 3, pp. 159-166, 2007.