Research

Thesis topic

I defended my PhD thesis in September 2005 at the University of Lille 3.

Its topic was "Tree structure inference applied to information extraction". My supervisor was Rémi Gilleron, and I worked with some of the researchers of the INRIA Mostrare team, especially Joachim Niehren, Aurélien Lemay, Alain Terlutte and Marc Tommasi.

Summary

The theoretical part of my work concerns the study of monadic queries in semi-structured documents. By monadic queries, we mean the selection of relevant nodes in trees, that is, the selection of relevant elements in documents. This study addresses two main issues:

This second point leads to the practical part of my work: the design of a tool for extracting information from web pages or XML documents. A prototype of this tool should be available soon, as an extension for the Mozilla web browser.
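To make the notion of a monadic query described above concrete, here is a minimal sketch in Python. It is only an illustration, not the method developed in the thesis: the query is written by hand rather than inferred, and the sample document, the tag names and the function name select_book_titles are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = """
<catalog>
  <book><title>Tree Automata</title><price>30</price></book>
  <book><title>Learning Theory</title><price>45</price></book>
  <cd><title>Some Album</title><price>12</price></cd>
</catalog>
"""


def select_book_titles(root):
    """A hand-written monadic query: map a tree to its selected nodes.

    Here a node is selected when it is a <title> element whose parent
    is a <book> element.
    """
    selected = []
    for parent in root.iter():  # visits every element in document order
        if parent.tag != "book":
            continue
        for child in parent:  # direct children only
            if child.tag == "title":
                selected.append(child)
    return selected


root = ET.fromstring(DOC)
titles = [node.text for node in select_book_titles(root)]
print(titles)
```

The query returns nodes of the tree rather than a yes/no answer or tuples of nodes, which is what makes it monadic. The title under the cd element is not selected, because selection depends on the position of the node in the tree and not only on its label.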

Further information can be found on the Mostrare website.

Publications

The list of my publications is available in the Grappa publication database.

Software

SQuiRReL, a prototype Firefox extension for information extraction, is now available.

Data Sets

To evaluate the efficiency of our algorithms, I built several sets of pages from well-known web sites. They are available here.