Alberto Bartoli - RSS Central

Source code for regex generator is now public

noreply@blogger.com (Alberto Bartoli) — Tue, 28 Apr 2015 15:44:00 +0000

Alberto Bartoli Lab Blog
We have received several requests to make the source code for our regex generator by examples tool public. We were reluctant to do so because we cannot offer any support, but eventually we decided to make it public: http://ift.tt/1z9JvWG it!...

The engine is a development release that implements the algorithms published in our articles:

Bartoli, Davanzo, De Lorenzo, Medvet, Automatic Synthesis of Regular Expressions from Examples, IEEE Computer, 2014
Bartoli, De Lorenzo, Medvet, Tarlao, Learning Text Patterns using Separate-and-Conquer Genetic Programming, 18th European Conference on Genetic Programming (EuroGP), 2015, Copenhagen (Denmark)

More details about the project can be found on http://ift.tt/1GsQwPO

We hope that you find this code instructive and useful for your research or study activity. If you use our code in your research, please cite our work and please share back your enhancements, fixes and modifications....

from Machine Learning Lab - News

Learning multiple patterns for text extraction

noreply@blogger.com (Alberto Bartoli) — Fri, 09 Jan 2015 08:03:00 +0000

This new year has started very well. A paper describing one of the key improvements to our new regex generator tool has just been accepted at the prestigious 18th European Conference on Genetic Programming (EuroGP). Despite its European qualification, it is "the" reference forum for GP.

The paper is titled "Learning Text Patterns using Separate-and-Conquer Genetic Programming". Essentially, it provides the "ability of discovering automatically whether the text extraction task may be solved by a single pattern, or rather a set of multiple patterns is required", including, of course, the generation of all such patterns.

We obtain this property by implementing a separate-and-conquer approach. Once a candidate pattern provides adequate performance on a subset of the examples, the pattern is inserted into the set of final solutions and the evolutionary search continues on a smaller set of examples including only those not yet solved adequately. As the anonymous reviewers noted in their very deep and insightful observations (thanks a lot!), this approach can be seen from a different point of view: we actually implemented an ensemble of classifiers, that is, the GP search generates several slightly different classifiers, each tuned on slightly different portions of the training set.

Of course, turning this idea into practice is difficult for a number of reasons, including the choice of suitable criteria for the "adequate" level of performance that triggers the shrinking of the training set, the termination of the current search and the start of the next one.

The full abstract follows.

The problem of extracting knowledge from large volumes of unstructured textual information has become increasingly important. We consider the problem of extracting text slices that adhere to a syntactic pattern and propose an approach capable of generating the desired pattern automatically, from a few annotated examples. Our approach is based on Genetic Programming and generates extraction patterns in the form of regular expressions that may be input to existing engines without any post-processing. Key feature of our proposal is its ability of discovering automatically whether the extraction task may be solved by a single pattern, or rather a set of multiple patterns is required. We obtain this property by means of a separate-and-conquer strategy: once a candidate pattern provides adequate performance on a subset of the examples, the pattern is inserted into the set of final solutions and the evolutionary search continues on a smaller set of examples including only those not yet solved adequately. Our proposal outperforms an earlier state-of-the-art approach on three challenging datasets.

BTW, the "state-of-the-art" approach was our previous tool as described in the December 2014 issue of IEEE Computer (IEEE Xplore link, pdf on our publications page).
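
The separate-and-conquer loop described in this post can be illustrated with a minimal Python sketch. This is not the published algorithm: the Genetic Programming search is replaced by a hypothetical stand-in, find_pattern, that simply picks the best regex from a fixed candidate list, and "adequate performance" is reduced to an assumed threshold, min_solved, on the number of examples a pattern fully matches. All function names, parameters and data here are illustrative assumptions.

```python
import re

def find_pattern(examples, candidates):
    """Hypothetical stand-in for the evolutionary search: return the
    candidate regex that fully matches the most examples, together with
    the examples it solves."""
    best, best_solved = None, []
    for pattern in candidates:
        solved = [e for e in examples if re.fullmatch(pattern, e)]
        if len(solved) > len(best_solved):
            best, best_solved = pattern, solved
    return best, best_solved

def separate_and_conquer(examples, candidates, min_solved=1):
    """Collect patterns one at a time, shrinking the example set each round."""
    remaining = list(examples)
    solutions = []
    while remaining:
        pattern, solved = find_pattern(remaining, candidates)
        # Assumed stopping rule: no pattern reaches the "adequate" level.
        if pattern is None or len(solved) < min_solved:
            break
        solutions.append(pattern)
        # Continue the search only on the examples not yet solved.
        remaining = [e for e in remaining if e not in solved]
    return solutions, remaining

# Toy data: dates in two formats, which no single candidate covers.
examples = ["2015-04-28", "2014-12-19", "09/01/2015", "05/09/2014"]
candidates = [r"\d{4}-\d{2}-\d{2}", r"\d{2}/\d{2}/\d{4}", r"\d+"]
patterns, unsolved = separate_and_conquer(examples, candidates)
```

On this toy input the loop ends with two patterns, one per date format, and no unsolved examples, which mirrors the idea of discovering that a set of patterns is needed rather than a single one.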


New regex generator tool online!

noreply@blogger.com (Alberto Bartoli) — Fri, 19 Dec 2014 14:51:00 +0000

Our new regex generator is online! Let's summarize briefly what has happened.

Epoch 1

At ACM GECCO 2012 we presented a paper in which we described our work "Automatic generation of regular expressions from examples with genetic programming". We greatly improved over the existing state-of-the-art, demonstrating a tool capable of synthesizing a regex for text extraction tasks of practical complexity: automatically, based only on examples of the desired behavior, and without any external hint about what the target regex should look like (probably we may claim "for the first time").

Epoch 2

We continued to work intensively on this topic. We greatly improved our algorithm and had another paper accepted by IEEE Computer. We made this result publicly available as a webapp. This tool generates regular expressions for extracting text snippets and attempts to generalize beyond the provided examples, i.e., it attempts to infer the general pattern that the user has in mind.

A side effect of this work was another tool capable of playing regex golf automatically. Our results were published at ACM GECCO 2013 and we were also a finalist at the annual Human-Competitive awards. The regex golf tool classifies input strings and overfits the examples, i.e., no extraction, no generalization.

Epoch 3

We kept on working intensively and greatly improved our algorithm further, from many points of view. The new tool made public today is much more powerful than the previous one. That's why we have called it "Regex++".

Next month our IEEE Computer paper will go to press: we are now preparing another submission describing the new tool... stay tuned.
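
To make the difference between the two tasks concrete, here is a small Python illustration. It does not use our tools: both regexes are written by hand and all strings are made-up toy data. The first part extracts snippets from a longer text, which is what the extraction tool is meant for; the second part only checks whether a regex separates two lists of strings, which is the regex golf setting.

```python
import re

# Extraction: find every snippet in a text that follows the pattern.
text = "Contact alice@example.org or bob@example.com before 19 Dec 2014."
extractor = re.compile(r"[\w.]+@[\w.]+\.\w+")
snippets = extractor.findall(text)

# Regex golf: classify whole strings, matching one list and none of the other.
must_match = ["foo", "food", "fool"]
must_not_match = ["bar", "of", "loaf"]

def solves_golf(pattern, positives, negatives):
    """True if the pattern is found in every positive and in no negative."""
    regex = re.compile(pattern)
    return (all(regex.search(s) for s in positives)
            and not any(regex.search(s) for s in negatives))

golf_ok = solves_golf(r"fo", must_match, must_not_match)
```

The golf solution "fo" separates the two lists but says nothing useful about unseen strings, which is the sense in which that task rewards overfitting the examples.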


Automatic Generation of Access Control Policies

noreply@blogger.com (Alberto Bartoli) — Tue, 02 Dec 2014 12:41:00 +0000

New publication of the lab at the (biennial and prestigious) 8th International Conference on Evolutionary Multiobjective Optimization, in collaboration with Prof. Elena Ferrari and Prof. Barbara Carminati. In this work we consider a challenging security-related problem: how to generate access control rules expressed in a modern attribute-based access control language automatically, starting from a set of examples in the form of a log of requests to be allowed and of requests to be denied.

The interest in attribute-based access control policies is growing due to their ability to accommodate the complex security requirements of modern computer systems. With this novel paradigm, access control policies consist of attribute expressions which implicitly describe the properties of subjects and protection objects and which must be satisfied for a request to be allowed. Since specifying a policy in this framework may be very complex, approaches for policy mining, i.e., for inferring a specification automatically from examples in the form of logs of authorized and denied requests, have been recently proposed.

We solve this problem with an evolutionary approach that is capable of dealing successfully with case studies of realistic complexity. Our approach exhibits several interesting features:

We use a multi-objective optimization strategy tailored to this problem.
We designed and implemented a problem representation suitable for evolutionary computation.
We designed and implemented several search-optimizing features which have proven to be highly useful in this context: a strategy for learning a policy by learning single rules, each one focused on a subset of requests; a custom initialization of the population; a scheme for diversity promotion and for early termination.

This work greatly benefited from our strong experience in automatic generation of regular expressions from examples. It also allowed us to identify new strategies that are also extremely useful for those problems. We will describe them publicly soon, stay tuned.

This multi-objective optimization problem is probably more interesting but certainly less fun than other problems that we have considered earlier in the lab (see "Design of Football Teams").
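
As a rough illustration of the policy mining setting, the Python sketch below scores a candidate policy against a log of allowed and denied requests. Everything in it is an assumption made for illustration: rules are plain dictionaries of required attribute values rather than expressions in a real attribute-based access control language, and the two scores (false accepts and false rejects) are just one plausible pair of objectives, not necessarily the ones used in the paper.

```python
def rule_allows(rule, request):
    """A rule is satisfied when every attribute it names has the required value."""
    return all(request.get(attr) == value for attr, value in rule.items())

def policy_allows(policy, request):
    """A request is allowed when at least one rule of the policy is satisfied."""
    return any(rule_allows(rule, request) for rule in policy)

def objectives(policy, log):
    """Return (false_accepts, false_rejects) of the policy on the log."""
    false_accepts = sum(1 for req, allowed in log
                        if not allowed and policy_allows(policy, req))
    false_rejects = sum(1 for req, allowed in log
                        if allowed and not policy_allows(policy, req))
    return false_accepts, false_rejects

# Toy log: (request attributes, whether the request must be allowed).
log = [
    ({"role": "doctor", "resource": "record", "action": "read"}, True),
    ({"role": "doctor", "resource": "record", "action": "write"}, True),
    ({"role": "nurse", "resource": "record", "action": "read"}, True),
    ({"role": "nurse", "resource": "record", "action": "write"}, False),
    ({"role": "visitor", "resource": "record", "action": "read"}, False),
]

# Candidate policy: two rules, each focused on a subset of the requests.
policy = [
    {"role": "doctor", "resource": "record"},
    {"role": "nurse", "action": "read"},
]
scores = objectives(policy, log)
too_permissive = objectives([{"resource": "record"}], log)
```

A search procedure would compare candidate policies by such score pairs; here the two-rule policy makes no errors on the log, while the single permissive rule wrongly accepts both denied requests.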


Recommending the right publication venue: we'll show how at ICTAI

noreply@blogger.com (Alberto Bartoli) — Fri, 05 Sep 2014 13:37:00 +0000

Our work on Publication Venue Recommendation based on Paper Abstract has been accepted for presentation at the IEEE International Conference on Tools with Artificial Intelligence (ICTAI), which will be held in November in Cyprus.

In this paper, we propose three methods for recommending a suitable publication venue for an ongoing research work, for which only the title and abstract are available. Two methods are based on Latent Dirichlet Allocation, whereas the best performing one is based on comparing n-gram language profiles.

A contribution of our proposal, which we evaluated experimentally on a dataset of 58,000 papers, is that we use only title and abstract. No full text is needed for obtaining meaningful recommendations, nor authorship or references, as in previous works. Hence, the recommender can be used in the early stages of the authoring process. Moreover, it may greatly simplify the building and maintenance of the knowledge base.
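
The idea of comparing n-gram profiles can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the paper: it uses character trigrams and cosine similarity, both chosen here for simplicity, and the venue names and texts are invented toy data.

```python
import math
from collections import Counter

def ngram_profile(text, n=3):
    """Count the character n-grams of a text after normalizing case and spaces."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = (math.sqrt(sum(c * c for c in p.values()))
            * math.sqrt(sum(c * c for c in q.values())))
    return dot / norm if norm else 0.0

def recommend(abstract, venue_texts):
    """Return the venue whose profile is most similar to the abstract."""
    query = ngram_profile(abstract)
    return max(venue_texts,
               key=lambda v: cosine(query, ngram_profile(venue_texts[v])))

# Hypothetical venues, each described by text from its past papers.
venues = {
    "venue_gp": "genetic programming evolves regular expressions from examples",
    "venue_recsys": "recommender systems and topic models for text mining tools",
}
best = recommend("evolving regular expressions with genetic programming", venues)
```

In a realistic setting each venue profile would be built from the titles and abstracts of many past papers, so that only the title and abstract of the new work are needed at query time.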
