I am a member of the GRAPPA team at the University of Lille. GRAPPA is a research team focused on machine learning.

Together with part of the STC team (focused on specification, checking and modeling), part of GRAPPA formed the MOSTRARE project with INRIA - Lille Nord Europe. The aim of this project is to study information extraction from the Web.

My research project

Within the MOSTRARE project, my current research consists of designing formal tools for information extraction (mainly based on tree automata) and studying their learnability. This has led to the design of tools that help users of a web browser build queries easily, without any prior knowledge.

For instance, a website that compares the prices of articles across other websites needs to query many of those websites regularly for this information (pairs of article name / price). A program can always be written to do this automatically, but it requires expertise and takes time, especially since websites change their layout regularly. The solution we propose is that, through an easy-to-use interface, the user visits a few web pages of the target website and selects the information they want (some article/price pairs). The system then infers from these examples a program that automatically selects all the article/price pairs from all the web pages of the website.

MOSTRARE implements a prototype that performs this task, and several techniques are being studied. I personally chose to study how tree automata techniques can be used for this task. This is the research topic of Jérôme Champavère's thesis, and before him, that of Julien Carme.
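To give a concrete flavour of how a tree automaton can be used to select parts of a web page, here is a toy sketch in Python. It is not the MOSTRARE prototype: the page is a simplified tree, and the labels (`table`, `tr`, `name`, `price`), the states `P`/`N` and the transition function `delta` are all hypothetical and written by hand rather than inferred. A deterministic bottom-up automaton assigns each node a state computed from its label and the states of its children; a query then selects the nodes with a given label and state. For simplicity, transitions here look only at the set of child states, whereas general automata on unranked trees constrain the whole sequence of child states.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of an unranked labelled tree (a simplified HTML element)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def delta(label: str, child_states: frozenset[str]) -> str:
    """Hypothetical transition function: state 'P' if the node is labelled
    'price' or has a child in state 'P' (its subtree contains a price),
    otherwise state 'N'."""
    if label == "price" or "P" in child_states:
        return "P"
    return "N"


def run(node: Node, states: dict[int, str]) -> str:
    """Bottom-up run: compute the children's states first, then the node's.
    Each node's state is recorded (keyed by id) so a query can inspect it."""
    child_states = frozenset(run(c, states) for c in node.children)
    q = delta(node.label, child_states)
    states[id(node)] = q
    return q


def select(root: Node, label: str, state: str) -> list[Node]:
    """Node-selecting query: return, in document order, the nodes carrying
    `label` that the run labelled with `state`."""
    states: dict[int, str] = {}
    run(root, states)
    result, stack = [], [root]
    while stack:
        n = stack.pop()
        if n.label == label and states[id(n)] == state:
            result.append(n)
        stack.extend(reversed(n.children))  # preserve document order
    return result


page = Node("table", [
    Node("tr", [Node("name"), Node("price")]),
    Node("tr", [Node("name")]),
    Node("tr", [Node("name"), Node("price")]),
])
rows = select(page, "tr", "P")
print(len(rows))
```

Here `select(page, "tr", "P")` returns the first and third rows, the only ones containing a price. In the approach described above, such a query would be inferred from the user's selections instead of being written by hand.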

As a continuation of this work, I also study the automatic inference of transformations of semi-structured documents (XML or HTML, for example). This is done using tree transducers.

