Wednesday, January 16, 2013

What is value--accumulated reward or evidence? III


Friston, K., Adams, R., & Montague, R. (2012). What is value -- accumulated reward or evidence? Frontiers in Neurorobotics, 6:11.

The mountain car problem
The car needs to park at the top of a mountain, but its engine is too weak to climb there directly. It has to first back up the opposing valley to gain enough momentum to reach the parking spot. The paper uses inference to get this right in a single trial, instead of learning the policy over many trials, which is the classic approach.

x is the continuous position and velocity (the full state space). a(u) is the real-valued action associated with the control state u, with a(u) ∈ {-2, -1, 0, 1, 2}: the car can accelerate left (negative) or right (positive), at strong or moderate levels, or coast (zero).

The state space was discretized, which effectively introduces some observation noise: the observed states are discretized versions of the continuous states. Prior beliefs about the final state specify the goal x = (1, 0), i.e., position 1 with zero velocity. The sampling probabilities R(s_{t+1} | s_t, a_t) are based on the equations of motion.
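To make the setup concrete, here's a quick sketch in Python (mine, not the paper's). The paper integrates its own continuous equations of motion over a smooth mountain landscape; the Sutton-and-Barto-style update and the grid size below are just stand-in assumptions to illustrate the action set and the discretized observations.

```python
import numpy as np

# Discrete control states and their associated real-valued actions a(u):
# strong/moderate leftward acceleration, coast, moderate/strong rightward.
ACTIONS = [-2.0, -1.0, 0.0, 1.0, 2.0]

GOAL = np.array([1.0, 0.0])  # desired final state: position 1, velocity 0

def step(x, u, dt=0.05):
    """One Euler step of stand-in mountain-car dynamics.

    x = (position, velocity); u indexes into ACTIONS. The slope term
    is the classic benchmark's, not the paper's landscape.
    """
    pos, vel = x
    vel = vel + dt * (ACTIONS[u] - 2.5 * np.cos(3.0 * pos))
    pos = pos + dt * vel
    return np.array([pos, vel])

def discretize(x, n_bins=32, lo=(-2.0, -3.0), hi=(2.0, 3.0)):
    """Map a continuous state onto a coarse grid, mimicking the
    discretized observations described above (bin count assumed)."""
    frac = (np.asarray(x) - np.array(lo)) / (np.array(hi) - np.array(lo))
    return tuple(np.clip((frac * n_bins).astype(int), 0, n_bins - 1))
```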

The value of an observed state is prescribed by a generative model, in terms of the probability that the state will be occupied.

Valuable behavior then simply involves sampling the world to ensure model predictions are fulfilled. Prior beliefs about future states have a simple form: future states will minimize uncertainty about current beliefs.

Perception corresponds to hypothesis testing -- sensory samples are experiments that generate sensory data. Eye movements, for example, are optimal experiments to test beliefs about the causes of the data gathered.


Definition: Active inference rests on the tuple (O, X, S, A, R, q, p) that comprises the following:

• A sample space O or non-empty set from which random fluctuations or outcomes ω ∈ O are drawn
• Hidden states X : X × A × O → ℝ—states of the world that cause sensory states and depend on action
• Sensory states S : X × A × O → ℝ—the agent’s sensations that constitute a probabilistic mapping from action and hidden states
• Action A : S × R → ℝ—an agent’s action that depends on its sensory and internal states
• Internal states R : R × S × O → ℝ—the states of the agent that cause action and depend on sensory states
• Generative density p(s, ψ|m)—a probability density function over sensory and hidden states under a generative model denoted by m
• Conditional density q(ψ) := q(ψ|μ)—an arbitrary probability density function over hidden states ψ ∈ X that is parameterized by internal states μ ∈ R

The imperative is to minimize the dispersion of sensory and hidden states with respect to action. The agent doesn't have access to hidden states, so it instead minimizes a decomposition of the joint entropy: the entropy of the sensory states plus the conditional entropy of hidden states given sensory states, H(X, S) = H(S) + H(X|S).
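A quick numerical check of that decomposition (my own illustration, not from the paper): for any joint distribution over hidden and sensory states, the chain rule for entropy gives H(X, S) = H(S) + H(X|S), so driving down the sensory entropy and the conditional entropy together bounds the dispersion of the joint.

```python
import numpy as np

# Toy joint distribution p(x, s): 3 hidden states (rows) x 2 sensory states (columns)
p_xs = np.array([[0.20, 0.10],
                 [0.10, 0.30],
                 [0.05, 0.25]])

def H(p):
    """Shannon entropy (in nats) of a distribution given as an array."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p_s = p_xs.sum(axis=0)        # marginal over sensory states, p(s)
p_x_given_s = p_xs / p_s      # each column j holds p(x | s = j)
H_x_given_s = -np.sum(p_xs * np.log(p_x_given_s))  # conditional entropy H(X|S)

# Chain rule: joint entropy = sensory entropy + conditional entropy
assert np.isclose(H(p_xs), H(p_s) + H_x_given_s)
```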

A lot more...

Neurobiological implementations of active inference

• The brain minimizes the free energy of sensory inputs defined by a generative model.
• This model includes prior expectations about hidden controls that maximize salience.
• The generative model used by the brain is hierarchical, non-linear, and dynamic.
• Neuronal firing rates encode the expected state of the world, under this model.

"The third assumption is motivated easily by noting that the world is both dynamic and non-linear and that hierarchical causal structure emerges inevitably from a separation of temproal scales. Finally, the fourth assumption is the Laplace assumption that, in terms of neural codes, leads to the Laplace code that is arguably the simplest and most flexible of all neural codes (Firston, 2009)


This ends up looking a lot like the Maass Bayesian work. They build a hierarchical generative model in which prediction errors are propagated up the hierarchy, while predictions are sent back down.
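Here's a minimal caricature of that message passing (my sketch, not the paper's generalized-coordinates scheme): a two-level linear hierarchy in which each level sends predictions down, receives prediction errors back up, and nudges its expectations down the gradient of squared prediction error (a crude stand-in for free energy).

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed generative weights: mu2 predicts mu1, which predicts the data.
W1 = 0.5 * rng.normal(size=(4, 3))   # level 1 -> predicted data
W2 = 0.5 * rng.normal(size=(3, 2))   # level 2 -> predicted mu1

data = rng.normal(size=4)
mu1, mu2 = np.zeros(3), np.zeros(2)
lr = 0.05

for _ in range(200):
    eps0 = data - W1 @ mu1   # ascending error: data vs. descending prediction
    eps1 = mu1 - W2 @ mu2    # ascending error: mu1 vs. prediction from above
    # Each expectation is pushed by the error it must explain from below
    # and pulled toward the prediction it receives from above.
    mu1 += lr * (W1.T @ eps0 - eps1)
    mu2 += lr * (W2.T @ eps1)

print(np.linalg.norm(data - W1 @ mu1))  # residual prediction error shrinks
```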

FIGURE 4 | Schematic detailing the neuronal architecture that might encode conditional expectations about the states of a hierarchical model. This shows the speculative cells of origin of forward driving connections that convey prediction error from a lower area to a higher area and the backward connections that construct predictions (Mumford, 1992). These predictions try to explain away prediction error in lower levels. In this scheme, the sources of forward and backward connections are superficial and deep pyramidal cells, respectively. The equations represent a generalized descent on free-energy under the hierarchical models described in the main text: see also Friston (2008). State-units are in black and error-units in red. Here, neuronal populations are deployed hierarchically within three cortical areas (or macro-columns). Within each area, the cells are shown in relation to cortical layers: supra-granular (I–III), granular (IV), and infra-granular (V and VI) layers. For simplicity, conditional expectations about control states have been absorbed into conditional expectations about hidden causes.


Advantages of the value as evidence formalism:

• A tractable approximate solution to any stochastic, non-linear optimal control problem, to the extent that standard (variational) Bayesian procedures exist. Variational or approximate Bayesian inference is well-established in statistics and data assimilation because it finesses many of the computational problems associated with exact Bayesian inference.
• The opportunity to learn and infer environmental constraints in a Bayes-optimal fashion; particularly the parameters of equations of motion and the amplitudes of observation and hidden-state noise.
• The formalism to handle system or state noise: currently, optimal control schemes are restricted to stochastic control (i.e., random fluctuations on control as opposed to hidden states). One of the practical advantages of active inference is that fluctuations in hidden states are modeled explicitly, rendering control robust to exogenous perturbations.
• The specification of control costs in terms of priors on control, with an arbitrary form (see the sketch after this list): currently, most approximate stochastic optimal control schemes are restricted to quadratic control costs. In classical schemes that appeal to path-integral solutions there are additional constraints that require control costs to be a function of the precision of control noise; e.g., Theodorou et al. (2010) and Braun et al. (2011). These constraints are not necessary in active inference.
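A small illustration of that last point (my own example, not from the paper): if control cost is just the negative log of a prior over action, nothing forces it to be quadratic; a heavy-tailed prior gives a qualitatively different, sparsity-encouraging cost.

```python
import numpy as np

def quadratic_cost(a, precision=1.0):
    """-log of a Gaussian prior on action (up to a constant):
    the classic quadratic control cost."""
    return 0.5 * precision * a**2

def laplace_cost(a, scale=1.0):
    """-log of a Laplace prior on action (up to a constant):
    a non-quadratic alternative that penalizes large actions less harshly."""
    return np.abs(a) / scale

for a in (0.5, 2.0):
    print(f"a={a}: quadratic={quadratic_cost(a):.2f}, laplace={laplace_cost(a):.2f}")
```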

