In this course, we will motivate the use of imprecise probability theories as models for situations where the available information does not allow us to elicit a (precise) probability model with certain guarantees [W1996] [Cozman]. Specifically, we shall focus on Peter Walley's theory of coherent lower previsions [M2008] [Walley1991], a theory that generalises de Finetti's approach to subjective probability in order to allow for indecision. In particular:
1. We shall see how to model the uncertainty about the outcomes of an experiment by means of a set of probability measures, or credal set;
2. We will explain how to summarise this set by means of the associated lower and upper expectation functionals, called lower and upper previsions;
3. We will review a number of rationality criteria for lower and upper previsions, such as avoiding sure loss and coherence, and show how these models can be equivalently axiomatised by means of sets of desirable gambles;
4. We will discuss how to make inferences from the existing models while still satisfying the rationality criteria, using the notion of natural extension.
Finally, we study how to extend the notion of coherent lower previsions in the light of new information. We then define the so-called conditional lower previsions, see how the notion of coherence extends to this case, and discuss the main difficulties that arise.
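To make the notions of credal set and lower/upper prevision introduced above concrete, here is a minimal sketch (with made-up numbers, not taken from the course material): a credal set on a three-element space is given by its extreme points, and the lower and upper previsions of a gamble are obtained as the minimum and maximum expectation over that set.

```python
import numpy as np

# Hypothetical extreme points of a credal set M on {x1, x2, x3}
extreme_points = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
])

gamble = np.array([1.0, -1.0, 0.5])   # a gamble f: {x1, x2, x3} -> R

expectations = extreme_points @ gamble
lower_prevision = expectations.min()  # lower prevision: min over p in M of E_p[f]
upper_prevision = expectations.max()  # upper prevision: max over p in M of E_p[f]
print(lower_prevision, upper_prevision)
```

The conjugacy relation between the two functionals shows up directly: the upper prevision of f equals minus the lower prevision of -f.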
During the course we shall present the above topics together with a number of examples and exercises.
Recommended readings for this course are [A1965] [CD2013] [CS2008] [CS2011] [D1967] [DP1988] [DP2009] [DMH2005] [DDC2008] [FKHOG2007] [K1987] [M2011] [S1976] [SK1994].
The course is divided into two parts. First, we deal with some particular imprecise probability models (capacities, random sets, possibility measures, etc.). Second, we use random sets as representations of ill-observed attributes, and we show how to solve hypothesis testing problems in this setting. More specifically, the course covers the following items:
1. Particular cases of lower/upper previsions. Formal connections between uncertainty theories and types of information justifying each representation.
a) The case of plain incomplete information: epistemic sets, Boolean possibility and necessity measures. Link with modal logic.
b) Capacities that define credal sets: supermodular set-functions, and generalisations of possibility and necessity.
c) Belief/plausibility functions: the formal setting of epistemic random sets, basic probability assignments and Moebius transforms in the finite case, and the induced set-functions (see the sketch after this outline). Examples where belief functions naturally appear (set-valued data, multivalued mappings and indirect measurement, unreliable testimonies). Shafer and Smets vs. Dempster approaches.
d) Special cases: p-boxes, numerical possibility and necessity measures (consonant belief functions).
e) Examples where possibility measures naturally appear: fuzzy sets, probabilistic inequalities, likelihood functions, confidence intervals, relative peakedness of probability densities.
2. Statistics with interval data.
a) Representation of ill-observed attributes by means of multi-valued mappings.
b) Descriptive statistics: representation of the incomplete information about different parameters such as the mean, the median, the variance, etc. (see the second sketch after this outline).
c) Belief-plausibility model associated with the ill-observed attribute. Set-valued parameters: Aumann expectation, Kruse variance, etc.
d) Hypothesis testing: generalized tests as multi-valued mappings, representation of imprecise information about the critical value by means of interval probabilities.
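As an illustration of item 1c, here is a minimal sketch (with a hypothetical basic probability assignment, not an example from the course): the belief and plausibility of an event are computed by summing the masses of the focal sets contained in, respectively intersecting, that event.

```python
def belief(event, mass):
    # Bel(A) = sum of m(B) over focal sets B contained in A
    return sum(m for B, m in mass.items() if B <= event)

def plausibility(event, mass):
    # Pl(A) = sum of m(B) over focal sets B intersecting A
    return sum(m for B, m in mass.items() if B & event)

# Hypothetical focal sets over the frame {"a", "b", "c"}
mass = {
    frozenset({"a"}): 0.3,
    frozenset({"a", "b"}): 0.5,
    frozenset({"a", "b", "c"}): 0.2,   # mass on the whole frame (vacuous part)
}

A = frozenset({"a", "b"})
print(belief(A, mass), plausibility(A, mass))   # 0.8 and 1.0
```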
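For item 2b, the simplest case is the set-valued sample mean of interval-valued data. A minimal sketch with made-up intervals follows; note that bounds for other parameters, such as the Kruse variance, are not obtained this simply.

```python
# Interval-valued observations [l_i, u_i] of an ill-observed attribute.
data = [(1.0, 1.5), (2.0, 2.0), (0.5, 1.2), (3.1, 3.4)]

# Since the sample mean is increasing in each observation, its lower (upper)
# bound is the mean of the lower (upper) endpoints.
lower_mean = sum(l for l, _ in data) / len(data)
upper_mean = sum(u for _, u in data) / len(data)
print(lower_mean, upper_mean)   # interval containing every compatible sample mean
```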
We start with the fundamentals by defining the basic problem of inductive inference. Using the simple example of inference from a series of iid Poisson observations, we introduce the differences and relationships between predictive and parametric inference. We also use this example to show how we can move from precise probabilistic to imprecise probabilistic inference.
Once the basic ideas are in place, we focus on the structural assumptions underlying all inference models. We use the practically omnipresent exchangeability assumption as the main example. We show the impact of this assumption on inference models for categorical sampling.
We then move towards the most widely used imprecise probabilistic inference models: the predictive IDMM and its parametric sibling, the IDM. We construct simple but enlightening special cases of these models starting from the exchangeability assumption, and then generalize them.
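As a concrete illustration, the IDM's posterior predictive intervals for categorical sampling take a simple closed form: with hyperparameter s and observed counts n_j out of N, the predictive probability of category j lies between n_j/(N+s) and (n_j+s)/(N+s). A minimal sketch with made-up counts:

```python
def idm_predictive_intervals(counts, s=2.0):
    """Lower/upper predictive probabilities under the IDM with hyperparameter s."""
    N = sum(counts)
    return [(n / (N + s), (n + s) / (N + s)) for n in counts]

# Hypothetical counts for three categories
for n, (lo, hi) in zip([4, 1, 0], idm_predictive_intervals([4, 1, 0])):
    print(n, lo, hi)
```

Note how a category that has never been observed still receives a non-trivial upper probability s/(N+s), which is what makes the model's inferences cautious.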
Once the IDM and IDMM are in place, we look at the further properties of these models, i.e., which other structural assessments they satisfy: representation invariance and specificity. To illustrate that not all these structural assessments are equally compelling, we compare the IDMM with other models found in the literature.
Throughout the course, students work through small exercises that push them to come to grips with the concepts introduced and to experience that using them is within their reach.
Probabilistic graphical models like Bayesian networks are important tools in AI for reasoning with uncertainty and data mining. The quantification of a Bayesian network requires sharp (i.e., precise) assessments of the model probabilities. This seems to be problematic when coping with qualitative expert knowledge or when learning probabilities from few or incomplete data. To cope with these issues, credal networks [C2000] have been proposed as a generalization of Bayesian networks based on imprecise probabilities: credal sets instead of single distributions should be assessed, thus providing higher expressiveness and realism during the modeling phase.
We describe a systematic approach to the implementation of knowledge-based expert systems based on credal networks [PAZ2010]. This technique is demonstrated, together with specific software tools, by means of a real-world application to military decision making developed during the last NATO Multinational Experiment and based on credal networks [AHZLCL2013]. High-level information fusion and decision making based on these models is also discussed.
We then consider the problem of learning credal networks from data and the corresponding classification algorithms, called credal classifiers, which generalize the traditional Bayesian classifiers based on a single prior density and a single likelihood. Credal classifiers are instead based on (i) a set of priors, thus removing the need for subjectively choosing a prior, and (ii) possibly also a set of likelihoods, to allow robust classification even with missing data. Credal classifiers can return more than one class when the assignment to a single class is too uncertain; in this way, they preserve reliability (e.g., [CZ2008] and [CA2012]). Algorithms for credal classification and comparisons with traditional classifiers on a large number of data sets are presented. The problem of evaluating the performance of a classifier that may return multiple outputs [ZCM2012], as well as alternative quantification techniques, is also discussed.
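To fix ideas, here is a minimal sketch (not the algorithms of the cited papers) of how a credal classifier can return a set of classes: each class gets a lower and an upper probability, and only the classes that are not interval-dominated are returned.

```python
def interval_dominance(bounds):
    """Classes whose upper probability is not below the best lower probability."""
    best_lower = max(lo for lo, _ in bounds.values())
    return [c for c, (_, up) in bounds.items() if up >= best_lower]

# Hypothetical class-probability intervals
print(interval_dominance({"spam": (0.55, 0.75), "ham": (0.25, 0.45)}))  # ['spam']
print(interval_dominance({"spam": (0.35, 0.65), "ham": (0.35, 0.65)}))  # both returned
```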
This short course has a simple goal: to present algorithms and methods that let one manipulate imprecision in probability values in a simple and efficient manner. Imprecision can be encoded through various formalisms, but we focus on a simple representation using sets of probability measures, and we bring in the relations with other representations as they become needed [Walley1991]. The main tools are based on linear and non-linear programming, which are used for the treatment of credal sets and the computation of expectations [AdCHZ2013] [dCC2004]. We resort to established tools from mathematical programming and optimization as much as possible [PS1998]. They are explained in enough detail that no prior knowledge of those topics is required.
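As a flavour of these tools, here is a minimal sketch (with made-up constraints) of computing a lower expectation by linear programming with scipy: minimise the expectation of a gamble over all mass functions on a finite space that satisfy a few linear assessments.

```python
import numpy as np
from scipy.optimize import linprog

f = np.array([1.0, 0.0, -1.0, 2.0])          # gamble on a 4-element space

# Assessments: P({x1, x2}) >= 0.4 and P({x4}) <= 0.3, written as A_ub @ p <= b_ub
A_ub = np.array([
    [-1.0, -1.0,  0.0, 0.0],                 # -P({x1, x2}) <= -0.4
    [ 0.0,  0.0,  0.0, 1.0],                 #  P({x4})     <=  0.3
])
b_ub = np.array([-0.4, 0.3])
A_eq = np.ones((1, 4))                        # probabilities sum to one
b_eq = np.array([1.0])

res = linprog(f, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, 1.0)] * 4)
print(res.fun)   # lower expectation; minimise -f and negate to get the upper one
```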
The concepts of stochastic independence and stochastic conditional independence are central to probability theory: for instance, they are used to build large distributions out of marginal and conditional distributions [P1988]. One such case usually appears under the name of graphical models, most notably the well-known Bayesian networks, which have spread into a wide variety of areas, from Statistics, Engineering and Computer Science to the Human and Biological Sciences. Among their goals are the simplification of modeling and the efficiency of computations. We explore the different concepts of independence that have emerged for imprecise probabilities and can be used as a basis for building graphical models, keeping our attention on computational ideas and algorithms.
Finally, we consider theoretical and practical aspects of algorithms and approximation methods related to imprecise probability models [MdCZ2012]. The goal is to clarify their computational complexity in a simplified language and to discuss practical applications of algorithms for imprecise probability in other areas. Many of the computational challenges faced by imprecise methods have appeared elsewhere, or their solutions can be put to use there; we aim to identify some of these possibilities and study their relations and applicability. Examples exist in decision making and planning, learning with censored/missing data, and the treatment of logical or qualitative assessments, among others. As before, the presentation is self-contained and no knowledge of computational complexity is needed.
Since Abraham Wald [W1939], decisions have been at the center of classical statistical inference. As we will see, they also play a central role in the interpretation of imprecise probability [SSK1995], and in the practice of inference under uncertainty when using imprecise probabilities [Walley1991].
In this course,
1. we investigate the decision theoretic foundation to imprecise probability and its link to standard Bayesian decision theory ([Walley1991] Sections 3.7-3.9);
2. we critically review some fundamental problems encountered when applying imprecise probability theory to decision making [HT2009] [HT2011];
3. we discuss the most popular decision criteria, and why you might use them or not [T2006]; and finally,
4. we demonstrate some algorithms that can be used to solve decision problems with imprecise probability in practice [HT2008] [HT2012] [KCC2005] [KCF2011] [T2007] [THF2010] [UA2005].
To achieve these goals, we will start with a simple example to highlight the typical issues one encounters when trying to make decisions under severe uncertainty.
In earlier lectures, you will have seen how lower and upper expectations are useful as models for severe uncertainty, applicable for instance when only partial probability specifications are available, and when we are worried about the consequences of implicit assumptions not reflected by data or expert opinion. We explore how such lower and upper expectations naturally arise in decision theory, simply by allowing for indecision, or incomparability between options.
Next, we discuss a few specific decision criteria, again using our simple example as a starting point: Gamma-maximin, interval dominance, maximality, E-admissibility, and info-gap. For each criterion, we explore why you might use it, how you can calculate it, and how algorithms and techniques seen in earlier lectures can be exploited most effectively for the purpose of decision making.
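To give a feel for how two of these criteria behave, here is a minimal sketch (with made-up utilities and a made-up credal set given by its extreme points): Gamma-maximin selects the option with the best lower expected utility, while maximality keeps every option that is not strictly dominated in lower expectation.

```python
import numpy as np

credal = np.array([[0.6, 0.4], [0.3, 0.7]])        # extreme points over two states
options = {"a": np.array([10.0, -2.0]),            # utility of each option per state
           "b": np.array([ 4.0,  4.0]),
           "c": np.array([ 0.0,  5.0])}

# Gamma-maximin: the option with the largest lower expected utility.
lower = {k: (credal @ u).min() for k, u in options.items()}
gamma_maximin = max(lower, key=lower.get)

# Maximality: j dominates k if E_p[u_j - u_k] > 0 for every p in the credal set.
def dominates(uj, uk):
    return (credal @ (uj - uk)).min() > 0

maximal = [k for k in options
           if not any(dominates(options[j], options[k]) for j in options if j != k)]

print(gamma_maximin, maximal)   # 'b' and ['a', 'b']: the criteria need not agree
```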
Finally, we move on to sequential decision making. We briefly discuss how our usual intuition about sequential decisions no longer applies, because subtree perfectness is necessarily violated as soon as we allow for indecision. We then review the basic principles behind backward induction, and we investigate the conditions a decision criterion must satisfy for such algorithms to work. We end by demonstrating backward induction on some practical examples.
The focus of exercises will be on solving actual decision problems.
Current practice with respect to the representation and propagation of uncertainties in the field of environmental risk analysis still relies largely upon the classical probabilistic paradigm, in which uncertain parameters are represented by single probability distributions, even in the face of imprecise/incomplete information. Over the past twenty years or so, joint propagation methods have been developed which allow the way information is represented and propagated through the estimation of risk to be more consistent with the nature of that information. Such methods have been applied to the estimation of risks in a wide variety of areas, including water contamination, human health, building stability, earthquake or landslide events, radionuclide migration, etc. Very recently, joint propagation has been applied to uncertainty treatment in Life Cycle Analysis (LCA), a comprehensive methodology aimed at evaluating the environmental impacts of a product or a service all along its life cycle.
This presentation will describe these methods from the viewpoint of a practitioner of environmental risk assessments. Three specific application areas will be addressed: health risk assessments, Life Cycle Analysis and Material Flow Analysis. These applications contrast in terms of methodology: while the first two pertain to the joint representation and propagation of stochastic and epistemic uncertainties through a "model", the third is a problem of data reconciliation under fuzzy constraints. The presentation will touch on the issue of communicating the results of these analyses to decision-makers. Recent proposals for the computation of single estimators of the likelihood of risk will be described. An important message in terms of communication is that, while decision-makers typically expect methods that promote increased "exactness", the proposed methods aim rather to promote increased "consistency" between available information and mathematical representation. Expected benefits are increased reliability and robustness.
Positron Emission Tomography (PET) is an important diagnostic imaging modality used in nuclear medicine to visualize and measure in vivo biochemical and physiological processes.
The usual reconstruction methods in PET are based on a discrete representation of both the measured data and the reconstructed image. The choice of the image basis functions before reconstruction (typically a voxel basis in 3D) leads to parametric models with a fixed and finite, but huge, number of parameters. Furthermore, the voxel size acts as a tuning parameter: the larger the voxel, the more regularized the image and the poorer the spatial resolution.
We present an approach to the problem of PET reconstruction in a continuous space. Observations are discrete projections of detected random emission locations whose spatial (3D) or space-time (3D+t) probability distribution has to be estimated. We follow a nonparametric Bayesian approach where regularization of the inverse problem relies entirely on the nonparametric prior. Namely, the spatial activity distribution is viewed as a probability density on $\mathbb{R}^3$. As a prior for this density, we consider a Dirichlet process mixture (DPM) with a Normal-Inverse Wishart base distribution. A key point of the proposed approach is to deal with the infinite representation of the distribution without resorting to an arbitrary truncation of the model. We develop a conditional sampler for the DPM using a strategy that allows a finite number of parameters to be sampled at each iteration. The MCMC algorithm is thus able to generate draws from the posterior distribution of the spatial intensity in order to estimate desired functionals and point-wise credible intervals. Finally, we present results for both synthetic and real PET data.
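For readers unfamiliar with DPM priors, here is a minimal generative sketch (purely illustrative, not the conditional sampler described above): cluster assignments and cluster means are drawn from a Dirichlet process mixture via the Chinese restaurant process, with a simple normal base distribution standing in for the Normal-Inverse Wishart used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0                 # DP concentration parameter (hypothetical value)
n = 200                     # number of emission events to simulate

assignments, means = [], []
for i in range(n):
    # join an existing cluster with probability proportional to its size,
    # or open a new one with probability proportional to alpha
    weights = np.array([assignments.count(k) for k in range(len(means))] + [alpha])
    k = rng.choice(len(weights), p=weights / weights.sum())
    if k == len(means):                      # new cluster: draw its mean from the base
        means.append(rng.normal(0.0, 5.0, size=3))
    assignments.append(k)

# one 3D emission location per event, around its cluster mean
points = np.array([rng.normal(means[k], 1.0) for k in assignments])
print(points.shape, len(means))
```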