Montpellier, France, 21-25 July 2014

In this course, we will motivate the use of imprecise probability theories as models for situations where the available information does not allow us to reliably elicit a (precise) probability model [W1996] [Cozman]. Specifically, we shall focus on Peter Walley's theory of coherent lower previsions [M2008] [Walley1991], a theory that generalises de Finetti's approach to subjective probability in order to allow for indecision. In particular:

1. We shall see how to model the uncertainty about the outcomes of an experiment by means of a set of probability measures, or credal set;

2. We will explain how to summarise this set by means of the associated lower and upper expectation functionals, called lower and upper previsions;

3. We will review a number of rationality criteria for lower and upper previsions, such as avoiding sure loss and coherence, and show how these models can be equivalently axiomatised by means of sets of desirable gambles;

4. We will discuss how to make inferences from the existing models while still satisfying the rationality criteria, using the notion of natural extension.
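As a minimal illustration of points 1 and 2 (all numbers invented for illustration), a finitely generated credal set on a three-element space can be summarised by the lower and upper prevision of a gamble, obtained as the minimum and maximum expectation over the set's extreme points:

```python
# Lower and upper previsions of a gamble over a finitely generated credal set.
# The credal set is given by its extreme points (probability mass functions
# on a three-element possibility space); numbers are illustrative only.

credal_set = [
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
    [0.1, 0.6, 0.3],
]

gamble = [1.0, -1.0, 2.0]  # reward per outcome

def expectation(p, f):
    """Expectation of gamble f under mass function p."""
    return sum(pi * fi for pi, fi in zip(p, f))

# Lower/upper prevision: min/max expectation over the credal set.
lower_prev = min(expectation(p, gamble) for p in credal_set)
upper_prev = max(expectation(p, gamble) for p in credal_set)

print(lower_prev, upper_prev)
```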

Finally, we study how to update coherent lower previsions in the light of new information. We then define the so-called conditional lower previsions, see how the notion of coherence extends to this case, and discuss the main difficulties that arise.

During the course we shall present the above topics together with a number of examples and exercises.

Recommended readings for this course are [A1965] [CD2013] [CS2008] [CS2011] [D1967] [DP1988] [DP2009] [DMH2005] [DDC2008] [FKHOG2007] [K1987] [M2011] [S1976] [SK1994].

The course is divided into two parts. First, we deal with some particular imprecise probability models (capacities, random sets, possibility measures, etc.). Second, we use random sets as representations of ill-observed attributes, and show how to solve hypothesis testing problems in this setting. More specifically, the course covers the following items:

1. Particular cases of lower/upper previsions. Formal connections between uncertainty theories and types of information justifying each representation.

a) The case of plain incomplete information: epistemic sets, Boolean possibility and necessity measures. Link with modal logic.

b) Capacities that define credal sets: supermodular set-functions, and generalisations of possibility and necessity.

c) Belief/plausibility functions: the formal setting of epistemic random sets, basic probability assignments and Möbius transforms in the finite case, and the induced set-functions. Examples where belief functions naturally appear (set-valued data, multivalued mappings and indirect measurement, unreliable testimonies). The Shafer and Smets vs. Dempster approaches.

d) Special cases: p-boxes, numerical possibility and necessity measures (consonant belief functions).

e) Examples where possibility measures naturally appear: fuzzy sets, probabilistic inequalities, likelihood functions, confidence intervals, relative peakedness of probability densities.
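The belief/plausibility pair of item (c) is straightforward to compute in the finite case. A small Python sketch, with purely illustrative focal sets and masses:

```python
# Belief and plausibility induced by a basic probability assignment
# (finite case). Focal sets and masses are purely illustrative.

frame = frozenset({'a', 'b', 'c'})

# Basic probability assignment m: focal set -> mass (masses sum to 1).
m = {
    frozenset({'a'}): 0.3,
    frozenset({'a', 'b'}): 0.5,
    frame: 0.2,
}

def belief(event):
    """Bel(A): total mass of focal sets included in A."""
    return sum(mass for focal, mass in m.items() if focal <= event)

def plausibility(event):
    """Pl(A): total mass of focal sets intersecting A; Pl(A) = 1 - Bel(A^c)."""
    return sum(mass for focal, mass in m.items() if focal & event)

A = frozenset({'a', 'b'})
print(belief(A), plausibility(A))
```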

2. Statistics with interval data.

a) Representation of ill-observed attributes by means of multi-valued mappings.

b) Descriptive statistics: representation of the incomplete information about different parameters like the mean, the median, the variance, etc.

c) Belief-plausibility model associated with the ill-observed attribute. Set-valued parameters: Aumann expectation, Kruse variance, etc.

d) Hypothesis testing: generalized tests as multi-valued mappings, representation of imprecise information about the critical value by means of interval probabilities.
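For item 2(b), the possible values of a statistic under interval data are obtained by ranging over all selections compatible with the intervals; for coordinate-wise monotone statistics such as the mean and the median, the bounds are attained at the interval endpoints. A minimal Python sketch with invented data:

```python
# Bounds on the sample mean and median for interval-valued data: each
# observation is only known to lie in [lo, hi] (illustrative numbers).
# For monotone statistics such as the mean and the median, the bounds
# over all selections are attained at the interval endpoints.

data = [(1.0, 2.0), (0.5, 3.0), (2.0, 2.5), (1.5, 4.0)]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    ys = sorted(xs)
    n = len(ys)
    mid = n // 2
    return ys[mid] if n % 2 else 0.5 * (ys[mid - 1] + ys[mid])

lows = [lo for lo, hi in data]
highs = [hi for lo, hi in data]

mean_bounds = (mean(lows), mean(highs))
median_bounds = (median(lows), median(highs))
print(mean_bounds, median_bounds)
```

Note that statistics such as the Kruse variance are not monotone in each argument, so their bounds require genuine optimisation over the selections rather than a simple endpoint evaluation.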

We start with the fundamentals by defining the basic problem of inductive inference. Using the simple example of inference from a series of iid Poisson observations, we introduce the difference and relationships between predictive and parametric inference. We use this example as well to show how we can move from precise probabilistic to imprecise probabilistic inference.

Once the basic ideas are in place, we focus on the structural assumptions underlying all inference models. We use the practically omnipresent exchangeability assumption as the main example. We show the impact of this assumption on inference models for categorical sampling.

We then move towards the most widely used imprecise probabilistic inference models: the predictive IDMM and its parametric sibling, the IDM. We construct simple but enlightening special cases of these models starting from the exchangeability assumption, and then generalize them.
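For a categorical variable with observed counts $n_j$, total $N$ and hyperparameter $s$, the IDM yields the predictive probability interval $[n_j/(N+s), (n_j+s)/(N+s)]$ for each category. A minimal Python sketch with illustrative counts:

```python
# Imprecise Dirichlet Model (IDM): lower/upper predictive probabilities for
# categorical data. With counts n_j, total N and hyperparameter s (commonly
# s = 1 or s = 2), the predictive probability of category j lies in
# [n_j / (N + s), (n_j + s) / (N + s)].

def idm_intervals(counts, s=2.0):
    N = sum(counts)
    return [(n / (N + s), (n + s) / (N + s)) for n in counts]

# Illustrative counts for three categories:
counts = [6, 3, 1]
ivs = idm_intervals(counts)
for (lo, hi), n in zip(ivs, counts):
    print(f"count={n}: [{lo:.3f}, {hi:.3f}]")
```

Note how the intervals narrow as $N$ grows relative to $s$: prior ignorance is gradually swamped by the data.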

Once the IDM and IDMM are in place, we examine the further properties of these models, i.e., the other structural assessments they satisfy: representation invariance and specificity. To illustrate that not all these structural assessments are equally compelling, we compare the IDMM with other models found in the literature.

Throughout the course, the students work through small exercises designed to push them to come to grips with the concepts introduced and to experience that using them is within their reach.

Probabilistic graphical models like Bayesian networks are important tools in AI for reasoning with uncertainty and data mining. The quantification of a Bayesian network requires sharp (i.e., precise) assessments of the model probabilities. This is problematic when coping with qualitative expert knowledge or when learning probabilities from few or incomplete data. To cope with these issues, credal networks [C2000] have been proposed as a generalization of Bayesian networks based on imprecise probabilities: credal sets instead of single distributions are assessed, thus providing higher expressiveness and realism during the modeling phase.

We describe a systematic approach to the implementation of knowledge-based expert systems based on credal networks [PAZ2010]. This technique is demonstrated, together with specific software tools, by means of a real-world credal-network application to military decision making developed during the last NATO Multinational Experiment [AHZLCL2013]. High-level information fusion and decision making based on these models is also discussed.

We then consider the problem of learning credal networks from data and the corresponding classification algorithms, called credal classifiers, which generalize the traditional Bayesian classifiers based on a single prior density and a single likelihood. Credal classifiers are instead based on (i) a set of priors, thus removing the need for subjectively choosing a prior, and (ii) possibly also a set of likelihoods, to allow robust classification even with missing data. Credal classifiers can return more than one class if the assignment to a single class is too uncertain; in this way, they preserve reliability (e.g., [CZ2008] and [CA2012]). Algorithms for credal classification and comparisons with traditional classifiers on a large number of data sets are presented. We also discuss the problem of evaluating the performance of a classifier that may return multiple outputs [ZCM2012], as well as alternative quantification techniques.
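As a toy sketch of set-valued classification (not any specific credal classifier from the literature): given class-probability intervals, which in a credal classifier would come from a set of priors and likelihoods, interval dominance discards a class only when some other class's lower probability exceeds its upper probability. The intervals below are invented for illustration:

```python
# Set-valued classification from class-probability intervals: a class is
# dropped only if another class's lower probability exceeds its upper
# probability (interval dominance). Intervals below are illustrative; in a
# credal classifier they would come from a set of priors/likelihoods.

def undominated_classes(intervals):
    """intervals: dict class -> (lower, upper). Returns the non-dominated set."""
    return {
        c for c, (lo_c, hi_c) in intervals.items()
        if not any(lo_d > hi_c for d, (lo_d, hi_d) in intervals.items() if d != c)
    }

intervals = {'A': (0.45, 0.65), 'B': (0.30, 0.50), 'C': (0.05, 0.25)}
print(undominated_classes(intervals))  # C is dominated by A; A and B remain
```

When the data are informative, the intervals shrink and a single class is returned; otherwise the classifier suspends judgement between several classes rather than guessing.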

This short course has a simple goal: to present algorithms and methods that let one manipulate imprecision in probability values in a simple and efficient manner. Imprecision can be variously encoded through different formalisms, but we focus on a simple representation using sets of probability measures, and we bring into the discussion the relations with other representations as they become needed [Walley1991]. The main tools are based on linear and non-linear programming, which are used for the treatment of credal sets and the computation of expectations [AdCHZ2013] [dCC2004]. We resort to established tools from mathematical programming and optimization as much as possible [PS1998]. They are explained in enough detail that no prior knowledge of those topics is required.
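For the special case of a credal set described by probability intervals $l_i \le p_i \le u_i$ with $\sum_i p_i = 1$ (and $\sum_i l_i \le 1 \le \sum_i u_i$), the linear programme for the lower expectation has a simple greedy solution; general credal sets require a full LP solver as discussed in the course. A sketch with invented bounds:

```python
# Lower expectation of a gamble over a credal set defined by probability
# intervals l_i <= p_i <= u_i with sum p_i = 1. This linear programme has a
# greedy solution: start from the lower bounds and push the remaining mass
# onto the outcomes with the smallest gamble values. Numbers are illustrative.

def lower_expectation(lower, upper, gamble):
    p = list(lower)
    remaining = 1.0 - sum(lower)
    # Cheapest outcomes first.
    for i in sorted(range(len(gamble)), key=lambda i: gamble[i]):
        add = min(upper[i] - lower[i], remaining)
        p[i] += add
        remaining -= add
    return sum(pi * fi for pi, fi in zip(p, gamble))

lower = [0.1, 0.2, 0.1]
upper = [0.6, 0.7, 0.5]
gamble = [3.0, -1.0, 1.0]

lo = lower_expectation(lower, upper, gamble)
hi = -lower_expectation(lower, upper, [-f for f in gamble])  # conjugacy
print(lo, hi)
```

The upper expectation is obtained by conjugacy: the upper expectation of a gamble equals minus the lower expectation of its negation.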

The concepts of stochastic independence and stochastic conditional independence are central to probability theory: for instance, they are used to build large distributions out of marginal and conditional distributions [P1988]. One such case usually appears under the name of graphical models, or more specifically as the well-known Bayesian networks, which have spread into a wide variety of areas, from Statistics, Engineering and Computer Science to the Human and Biological Sciences. Among their goals are the simplification of modeling and the efficiency of computations. We explore the different concepts of independence that have emerged for imprecise probabilities and can be used as a basis for building graphical models, keeping the focus on computational ideas and algorithms.

Finally, we consider theoretical and practical aspects of algorithms and approximation methods related to imprecise probability models [MdCZ2012]. The goal is to clarify their computational complexity in a simplified language and to discuss practical applications of algorithms for imprecise probability in other areas. Many computational challenges faced by imprecise methods have also appeared elsewhere, or can find uses there; we identify some of these possibilities and study their relations and applicability. Examples exist in decision making and planning, learning with censored/missing data, and the treatment of logical or qualitative assessments, among others. As before, the presentation is self-contained and no knowledge of computational complexity is needed.

Since Abraham Wald [W1939], decisions have been at the center of classical statistical inference. As we will see, they also play a central role in the interpretation of imprecise probability [SSK1995], and in the practice of inference under uncertainty when using imprecise probabilities [Walley1991].

In this course,

1. we investigate the decision theoretic foundation to imprecise probability and its link to standard Bayesian decision theory ([Walley1991] Sections 3.7-3.9);

2. we critically review some fundamental problems encountered when applying imprecise probability theory to decision making [HT2009] [HT2011];

3. we discuss the most popular decision criteria, and why you might use them or not [T2006]; and finally,

4. we demonstrate some algorithms that can be used to solve decision problems with imprecise probability in practice [HT2008] [HT2012] [KCC2005] [KCF2011] [T2007] [THF2010] [UA2005].

To achieve these goals, we will start with a simple example to highlight the typical issues one encounters when trying to make decisions under severe uncertainty.

In earlier lectures, you will have seen how lower and upper expectations are useful as models for severe uncertainty, applicable for instance when only partial probability specifications are available, and when we are worried about the consequences of implicit assumptions not reflected by data or expert opinion. We explore how such lower and upper expectations naturally arise in decision theory, simply by allowing for indecision, or incomparability between options.

Next, we discuss a few specific decision criteria, again using our simple example as a starting point: Gamma-maximin, interval dominance, maximality, E-admissibility, and info-gap. For each criterion, we explore why you might use it, how you can calculate it, and how algorithms and techniques seen in earlier lectures can be exploited most effectively for the purpose of decision making.
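Three of these criteria can be illustrated on a finite toy problem (all numbers invented). Lower and upper expectations over a finitely generated credal set are exact minima and maxima over its extreme points; E-admissibility is omitted here because checking it properly requires linear programming over the whole credal set, and info-gap requires a different model altogether:

```python
# Three decision criteria on a finite set of options (gambles), relative to a
# finitely generated credal set. Lower/upper expectations are exact minima and
# maxima over the extreme points; all numbers are illustrative.

credal_set = [[0.3, 0.7], [0.6, 0.4]]  # extreme points
options = {
    'a': [1.0, 0.0],
    'b': [0.0, 1.0],
    'c': [0.45, 0.45],
    'd': [0.1, 0.1],
}

def expect(p, f):
    return sum(pi * fi for pi, fi in zip(p, f))

def lower(f):
    return min(expect(p, f) for p in credal_set)

def upper(f):
    return max(expect(p, f) for p in credal_set)

# Gamma-maximin: pick the option(s) with the largest lower expectation.
best = max(lower(f) for f in options.values())
gamma_maximin = {k for k, f in options.items() if lower(f) == best}

# Interval dominance: discard k if some other option's lower expectation
# exceeds k's upper expectation.
interval_dom = {
    k for k, f in options.items()
    if not any(lower(g) > upper(f) for j, g in options.items() if j != k)
}

# Maximality: discard k if some option g satisfies lower(g - f) > 0.
maximal = {
    k for k, f in options.items()
    if not any(lower([gi - fi for gi, fi in zip(g, f)]) > 0
               for j, g in options.items() if j != k)
}

print(gamma_maximin, interval_dom, maximal)
```

On this example, option d is eliminated by both interval dominance and maximality, while Gamma-maximin singles out the safe option c; Gamma-maximin picks a subset of the maximal options, which in turn form a subset of the interval-dominance survivors.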

Finally, we move on to sequential decision making. We briefly discuss how our usual intuition about sequential decisions no longer applies, due to subtree perfectness being necessarily violated as soon as we allow for indecision. We then review the basic principles behind backward induction, and we investigate the conditions a decision criterion must satisfy for such algorithms to work. We end by demonstrating backward induction on some practical examples.

The focus of exercises will be on solving actual decision problems.

Current practice with respect to the representation and propagation of uncertainties in the field of environmental risk analysis still relies largely upon the classical probabilistic paradigm, in which uncertain parameters are represented by single probability distributions, even in the face of imprecise or incomplete information. Over the past twenty years, joint propagation methods have been developed that allow the way information is represented and propagated through the estimation of risk to be more consistent with the nature of that information. Such methods have been applied to the estimation of risks in a wide variety of areas, including water contamination, human health, building stability, earthquake or landslide events, radionuclide migration, etc. Very recently, joint propagation has been applied to uncertainty treatment in Life Cycle Analysis (LCA), a comprehensive methodology aimed at evaluating the environmental impacts of a product or a service along its entire life cycle.

This presentation will describe these methods from the viewpoint of a practitioner of environmental risk assessments. Three specific application areas will be addressed: health risk assessments, Life-Cycle Analysis and Material Flow Analysis. These applications contrast in terms of methodology. While the first two pertain to the joint representation and propagation of stochastic and epistemic uncertainties through a "model", the third is a problem of data reconciliation under fuzzy constraints. The presentation will touch on the issue of communicating the results of these analyses to decision-makers. Recent proposals for the computation of single estimators of the likelihood of risk will be described. An important message in terms of communication is that, while decision-makers typically expect methods that promote increased "exactness", the proposed methods aim rather to promote increased "consistency" between the available information and its mathematical representation. Expected benefits are increased reliability and robustness.

Positron Emission computed Tomography (PET) is an important diagnostic imaging modality used in nuclear medicine to visualize and measure in vivo biochemical and physiological processes.

The usual reconstruction methods in PET are based on a discrete representation of both the measured data and the reconstructed image. The choice of the image basis functions before reconstruction (typically voxels basis in 3D) leads to parametric models with a fixed and finite but huge number of parameters. Furthermore, the voxel size acts as a tuning parameter since the greater the voxel, the more regularized the image and the poorer the spatial resolution.

We present an approach for the problem of PET reconstruction in a continuous space. Observations are discrete projections of detected random emission locations whose spatial (3D) or space-time (3D+t) probability distribution has to be estimated. We follow a nonparametric Bayesian approach where regularization of the inverse problem relies entirely on the nonparametric prior. Namely, the spatial activity distribution is viewed as a probability density on $\mathbb{R}^3$. As a prior for this density, we consider a Dirichlet process mixture (DPM) with a Normal-Inverse Wishart base distribution. A key point of the proposed approach is to deal with the infinite representation of the distribution without resorting to an arbitrary truncation of the model. We develop a conditional sampler for the DPM using a strategy that allows a finite number of parameters to be sampled at each iteration. The MCMC algorithm is thus able to generate draws from the posterior distribution of the spatial intensity in order to estimate desired functionals and point-wise credible intervals. Finally, we present results for both synthetic and real PET data.

[AdCHZ2013] A. Antonucci, C. P. de Campos, D. Huber, and M. Zaffalon. Approximating Credal Network Inferences by Linear Programming. In Proceedings of the 12th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU), Lecture Notes in Artificial Intelligence 7958, Berlin Heidelberg, pages 13-25, 2013.

[AHZLCL2013] Antonucci, A., Huber, D., Zaffalon, M., Luginbuehl, P., Chapman, I., Ladouceur, R. (2013). CREDO: a military decision-support system based on credal networks. In Proceedings of the 16th Conference on Information Fusion (FUSION 2013), pp. 1-8.

[A1965] R. J. Aumann. Integrals of set-valued functions. J. Math. Analysis and Applications 12 (1965) 1-12.

[CZ2008] Corani, G., Zaffalon, M. (2008). Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2. Journal of Machine Learning Research 9, pp. 581-621.

[CA2012] Corani, G., Antonucci, A. (2012). Credal Ensembles of Classifiers. Computational Statistics & Data Analysis.

[CD2013] Inés Couso, Didier Dubois, Statistical reasoning with set-valued information: Ontic vs. epistemic views, International Journal of Approximate Reasoning, In Press, Corrected Proof, Available online 29 July 2013

[CS2008] Inés Couso, Luciano Sánchez, Defuzzification of fuzzy p-values, In: Soft Methods for Handling Variability and Imprecision, D. Dubois et al. (Eds.), Springer, 2008.

[CS2011] Inés Couso, Luciano Sánchez, Mark-recapture techniques in statistical tests for imprecise data, International Journal of Approximate Reasoning 52, 240-260, 2011.

[Cozman] Fabio Cozman. Introduction to the Theory of Sets of Probabilities. http://www.cs.cmu.edu/~qbayes/Tutorial/ Last visit: 13/11/2013

[C2000] Fabio Cozman. Credal networks, Artificial Intelligence Journal, vol. 120, pp. 199-233, 2000.

[dCC2004] C. P. de Campos and F. G. Cozman. Inference in credal networks using multilinear programming. In Second Starting AI Researcher Symposium, Valencia, pages 50-61, 2004. IOS Press.

[D1967] A. P. Dempster. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statistics 38 (1967) 325-339.

[DP1988] Dubois D. and Prade H., 1988. Possibility Theory, Plenum Press, New York.

[DP2009] Didier Dubois, Henri Prade. Formal representations of uncertainty. Decision-making Process- Concepts and Methods. Denis Bouyssou, Didier Dubois, Marc Pirlot, Henri Prade (Eds.), ISTE London & Wiley, Chap. 3, p. 85-156, 2009.

[DMH2005] Thierry Denoeux, Marie-Hélène Masson, Pierre-Alexandre Hébert, Nonparametric rank-based statistics and significance tests for fuzzy data, Fuzzy Sets and Systems 153 (2005) 1-28.

[DDC2008] S. Destercke, D. Dubois, E. Chojnacki, Unifying practical uncertainty representations - Part I: Generalized p-boxes; Part II: Clouds. International Journal of Approximate Reasoning, 49, 2008, 649-677.

[FKHOG2007] S. Ferson, V. Kreinovich, J. Hajagos, W. Oberkampf, L. Ginzburg. Experimental Uncertainty Estimation and Statistics for Data Having Interval Uncertainty, Technical Report. SAND2007-0939, Unlimited Release (2007).

[HT2008] Nathan Huntley and Matthias C. M. Troffaes. An efficient normal form solution to decision trees with lower previsions. In Didier Dubois, M. Asunción Lubiano, Henri Prade, María Ángeles Gil, Przemyslaw Grzegorzewski, and Olgierd Hryniewicz, editors, Soft Methods for Handling Variability and Imprecision, Advances in Soft Computing, 419-426. Springer, Sep 2008.

[HT2009] Nathan Huntley and Matthias C. M. Troffaes. Characterizing factuality in normal form sequential decision making. In Thomas Augustin, Frank P. A. Coolen, Serafín Moral, and Matthias C. M. Troffaes, editors, ISIPTA'09: Proceedings of the Sixth International Symposium on Imprecise Probability: Theories and Applications, 239-248. Durham, UK, July 2009.

[HT2011] Nathan Huntley and Matthias C. M. Troffaes. Subtree perfectness, backward induction, and normal-extensive form equivalence for single agent sequential decision making under arbitrary choice functions. ArXiv:1109.3607v1 [math.ST], September 2011. arXiv:1109.3607.

[HT2012] Nathan Huntley and Matthias C. M. Troffaes. Normal form backward induction for decision trees with coherent lower previsions. Annals of Operations Research, 195(1):111-134, May 2012. arXiv:1104.0191, doi:10.1007/s10479-011-0968-2.

[KCC2005] Daniel Kikuti, Fabio G. Cozman, and Cassio P. de Campos. Partially ordered preferences in decision trees: computing strategies with imprecision in probabilities. In Ronen Brafman and Ulrich Junker, editors, Multidisciplinary IJCAI'05 Workshop on Advances in Preference Handling, 118-123. 2005.

[KCF2011] Daniel Kikuti, Fabio Gagliardi Cozman, and Ricardo Shirota Filho. Sequential decision making with partially ordered preferences. Artificial Intelligence, 175(7-8):1346-1365, 2011. doi:10.1016/j.artint.2010.11.017.

[K1987] R. Kruse. On the variance of random sets. Journal of Mathematical Analysis and Applications 122 (1987) 469-473.

[MdCZ2012] D. D. Maua, C. P. de Campos, M. Zaffalon. Updating Credal Networks is Approximable in Polynomial Time. International Journal of Approximate Reasoning, 53(8):1183-1199, 2012.

[M2011] Gilles Mauris. Possibility distributions: A unified representation of usual direct-probability-based parameter estimation methods. International Journal of Approximate Reasoning, Volume 52, Issue 9, December 2011, Pages 1232-1242.

[M2008] Enrique Miranda. A survey of the theory of coherent lower previsions. International Journal of Approximate Reasoning, 48(2):628-658, 2008.

[PS1998] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, 1998, Dover Books.

[PAZ2010] Piatti, A., Antonucci, A., Zaffalon, M. (2010). Building knowledge-based expert systems by credal networks: a tutorial. In Baswell, A.R. (Ed), Advances in Mathematics Research 11, Nova Science Publishers, New York.

[SSK1995] Teddy Seidenfeld, Mark J. Schervish, and Jay B. Kadane. A representation of partially ordered preferences. The Annals of Statistics, 23:2168-2217, 1995.

[S1976] Shafer G. (1976). A Mathematical Theory of Evidence, Princeton University Press, Princeton.

[SK1994] Smets P., Kennes R. (1994) The transferable belief model. Artificial Intelligence, 66, 191-234.

[T2007] Matthias C. M. Troffaes. Decision making under uncertainty using imprecise probabilities. International Journal of Approximate Reasoning, 45(1):17-29, May 2007. doi:10.1016/j.ijar.2006.06.001.

[THF2010] Matthias C. M. Troffaes, Nathan Huntley, and Ricardo Shirota Filho. Sequential decision processes under act-state independence with arbitrary choice functions. In Eyke Hüllermeier, Rudolf Kruse, and Frank Hoffmann, editors, Information Processing and Management of Uncertainty in Knowledge-Based Systems, volume 80 of Communications in Computer and Information Science, 98-107. Heidelberg, 2010. Springer.

[UA2005] Lev V. Utkin and Thomas Augustin. Powerful algorithms for decision making under partial prior information and general ambiguity attitudes. In Fabio G. Cozman, Robert Nau, and Teddy Seidenfeld, editors, Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications, 349-358. July 2005.

[W1939] Abraham Wald. Contributions to the theory of statistical estimation and testing hypotheses. The Annals of Mathematical Statistics, 10(4):299-326, December 1939. doi:10.1214/aoms/1177732144.

[Walley1991] Peter Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.

[W1996] Peter Walley. Measures of uncertainty in expert systems. Artificial intelligence, 83(1):1-58, 1996.

[ZCM2012] Zaffalon, M., Corani, G., Mauá, D.D. (2012). Evaluating credal classifiers by utility-discounted predictive accuracy. International Journal of Approximate Reasoning 53(8), pp. 1282-1301.

Webmaster: Kevin Loquin <kevin.loquin@gmail.com>
Last Update: January 13th, 2014