I am currently a researcher at Orange Labs (France). My research focuses on Machine Learning and Data Mining. I’m particularly interested in the following topics:
In many supervised classification problems, labeling training examples is costly in practice. Active Learning strategies handle partially labeled datasets and iteratively select the examples to be labeled so as to improve the classifier. The main challenge of Active Learning is to find a compromise between “exploration”, which discovers unknown parts of the underlying data pattern, and “exploitation”, which refines the classifier on the parts of the pattern that are already known.
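To make the exploration/exploitation compromise concrete, here is a minimal sketch of a pool-based active learning loop, assuming an epsilon-greedy policy (with probability `epsilon`, query a random example to explore; otherwise query the most uncertain example to exploit) and a toy 1-D nearest-centroid classifier. The policy, the classifier, and every function name (`fit_centroids`, `margin`, `select`, `active_learning`) are hypothetical illustrations, not a specific published strategy.

```python
import random


def fit_centroids(labeled):
    """Toy 1-D nearest-centroid classifier: the mean position of each class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(x, centroids):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def margin(x, centroids):
    """Gap between the two closest centroids; a small gap means an uncertain example."""
    d = sorted(abs(x - c) for c in centroids.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")


def select(pool, centroids, epsilon, rng):
    """Epsilon-greedy query selection."""
    if rng.random() < epsilon:
        return rng.choice(pool)  # exploration: a random, possibly unexplored region
    # exploitation: the example closest to the current decision boundary
    return min(pool, key=lambda x: margin(x, centroids))


def active_learning(pool, oracle, labeled, budget, epsilon=0.2, seed=0):
    """Query up to `budget` labels from `oracle`, one example at a time."""
    rng = random.Random(seed)
    pool, labeled = list(pool), list(labeled)
    for _ in range(min(budget, len(pool))):
        centroids = fit_centroids(labeled)
        x = select(pool, centroids, epsilon, rng)
        pool.remove(x)
        labeled.append((x, oracle(x)))  # the costly labeling step
    return labeled, fit_centroids(labeled)


def oracle(x):
    """Hypothetical ground truth standing in for a human annotator."""
    return int(x >= 4.2)


pool = [i / 10 for i in range(1, 99)]
labeled, centroids = active_learning(pool, oracle, [(0.0, 0), (9.9, 1)], budget=10)
print(len(labeled), predict(0.5, centroids), predict(9.5, centroids))
```

Raising `epsilon` shifts the budget toward exploration; setting it to 0 gives pure uncertainty sampling, which can keep refining one boundary while missing regions it has never seen.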
The Data Stream paradigm aims at processing very large amounts of data. In a data stream, neither the order of the emitted tuples nor their emission rate is controlled. The objective of stream-processing approaches is to maximize the quality of an analysis under the constraints imposed by the data stream. Data streams are generally processed by online algorithms, which avoid storing the data exhaustively. I am particularly interested in adapting Machine Learning approaches to time series emitted within data streams.
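As an illustration of an online algorithm that avoids exhaustive storage, the sketch below uses Welford's algorithm, a standard technique (not specific to this research) that maintains the mean and variance of a stream in constant memory and a single pass. The class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory, one pass."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Consume one value from the stream; the value itself is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Unbiased sample variance of the values seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an unbounded stream
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```

Each update costs O(1) time and memory whatever the stream length, and the statistics are available at any moment, which suits streams whose emission rate is not controlled.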
MODL approaches are based on a Bayesian formalism and on MDL (Minimum Description Length) modeling. These approaches can solve a wide variety of learning problems: supervised classification, regression, multivariate density estimation, and co-clustering. This field of research provides generic and statistically reliable approaches. During my PhD, I had the opportunity to study these approaches and put them into practice. Today, I am still working with Marc Boullé, the inventor of these methods, on several problems: i) compression of time series by symbolic representations; ii) change detection in data streams; iii) clustering of functional data; iv) hierarchical co-clustering approaches, among others.
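To show how the Bayesian/MDL formalism turns into a concrete model-selection criterion, the sketch below evaluates a cost of the form used by MODL for the supervised discretization of a numerical variable: a prior term (number of intervals, interval bounds, class distribution in each interval) plus a multinomial likelihood term, with lower cost meaning a more probable model. This restates the criterion as it is commonly presented; it should be checked against Marc Boullé's publications before being relied on. The function names are hypothetical, and the search over candidate discretizations is left out.

```python
from math import lgamma, log


def log_binomial(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def modl_discretization_cost(intervals, n_classes):
    """Cost (negative log posterior) of a discretization; lower is better.

    `intervals` lists, for each interval in order, its per-class counts.
    """
    n = sum(sum(counts) for counts in intervals)
    i, j = len(intervals), n_classes
    cost = log(n)                           # prior: number of intervals, uniform on 1..n
    cost += log_binomial(n + i - 1, i - 1)  # prior: interval bounds given i intervals
    for counts in intervals:
        n_i = sum(counts)
        cost += log_binomial(n_i + j - 1, j - 1)  # prior: class distribution in the interval
        cost += lgamma(n_i + 1) - sum(lgamma(c + 1) for c in counts)  # multinomial likelihood
    return cost


single = [[10, 10]]             # one interval, classes mixed
separated = [[10, 0], [0, 10]]  # two pure intervals
mixed = [[5, 5], [5, 5]]        # two intervals with no class information
print(modl_discretization_cost(separated, 2) < modl_discretization_cost(single, 2))
print(modl_discretization_cost(mixed, 2) > modl_discretization_cost(single, 2))
```

Both comparisons print `True`: splitting pays off only when the intervals carry class information, which is the kind of built-in regularization that makes such criteria statistically reliable without a user-tuned parameter.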