The MiningMart approach

The first component consists of operators that perform data transformations such as discretization, handling of null values, aggregation of attributes into a new one, or collection of sequences from time-stamped data. The operators directly access the database and are capable of handling large volumes of data. Machine learning is not restricted to the data mining step but is also applicable in preprocessing. This view opens up a variety of learning tasks that are less well investigated than learning classifiers. For instance, an important task is to acquire events and their durations (i.e., time intervals) from time series (i.e., measurements at time points).
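As a rough illustration of the last task, the sketch below discretizes a small time series and merges consecutive measurements with the same label into events with a start and end time. It is not MiningMart code: the function names, thresholds, and sample values are assumptions made for this example, and it works in memory, whereas the operators described above access the database directly.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Hypothetical helper names; not part of the MiningMart operator set.

def discretize(value: Optional[float], low: float, high: float) -> str:
    """Map a measurement to a symbolic level; null values get their own label."""
    if value is None:
        return "unknown"
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def to_intervals(
    series: List[Tuple[int, Optional[float]]], low: float, high: float
) -> List[Tuple[str, int, int]]:
    """Merge consecutive time points with the same level into (level, start, end)."""
    labelled = [(t, discretize(v, low, high)) for t, v in series]
    intervals = []
    for label, group in groupby(labelled, key=lambda pair: pair[1]):
        times = [t for t, _ in group]
        intervals.append((label, times[0], times[-1]))
    return intervals


# Invented sample data: (time point, measurement), with one missing value.
series = [(0, 72.0), (1, 75.0), (2, 131.0), (3, 140.0), (4, None), (5, 80.0)]
intervals = to_intervals(series, low=60.0, high=120.0)
print(intervals)
```

Each resulting tuple is an event with its duration given by the first and last time point it covers.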

See the available operators with some technical descriptions.

The second component consists of successful cases of knowledge discovery. Since most of the time is spent finding chains of operator applications that lead to good answers to complex questions, it is cumbersome to develop such chains over and over again for very similar discovery tasks and data. Currently, even the same task on data of the same format is implemented anew every time new data are to be analysed. Re-using successful cases would therefore speed up the process considerably. For this reason, cases of successful preprocessing are stored for re-use.

The meta-data of a case can be adapted to similar cases. A library of best-practice cases in the form of their meta-data is currently being collected. MiningMart presents cases from areas ranging from on-line monitoring in intensive care to direct mailing actions.

The particular approach of the MiningMart project is to allow the re-use of cases by means of meta-data, also called ontologies. Meta-data describe the data as well as the operator chains. A compiler generates the SQL code according to the meta-data.
Read more about the advantages of meta-data driven software generation.
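The idea of generating SQL from meta-data can be sketched as follows. This is a minimal illustration under stated assumptions, not the MiningMart compiler or its meta-data model: the dictionary layout, the operator names `replace_null` and `discretize`, and the `customers` table are all invented for the example. Each step of the chain is turned into a view over the previous step's output, and SQLite from the Python standard library is used only to check that the generated statements run.

```python
import sqlite3
from typing import Dict, List

# Hypothetical meta-data for a two-step operator chain (illustrative layout only).
CHAIN: List[Dict[str, str]] = [
    {"operator": "replace_null", "column": "age", "default": "0", "output": "step1"},
    {"operator": "discretize", "column": "age", "threshold": "40", "output": "step2"},
]


def compile_chain(chain: List[Dict[str, str]], source_table: str, key: str = "id") -> List[str]:
    """Translate each meta-data step into a CREATE VIEW statement over the previous step."""
    statements = []
    source = source_table
    for step in chain:
        col = step["column"]
        if step["operator"] == "replace_null":
            expr = f"COALESCE({col}, {step['default']}) AS {col}"
        elif step["operator"] == "discretize":
            expr = f"CASE WHEN {col} < {step['threshold']} THEN 'low' ELSE 'high' END AS {col}"
        else:
            raise ValueError(f"unknown operator: {step['operator']}")
        statements.append(f"CREATE VIEW {step['output']} AS SELECT {key}, {expr} FROM {source}")
        source = step["output"]  # the next step reads from this view
    return statements


# Invented sample table used to run the generated SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 25.0), (2, None), (3, 61.0)])

statements = compile_chain(CHAIN, "customers")
for sql in statements:
    conn.execute(sql)

rows = conn.execute("SELECT id, age FROM step2 ORDER BY id").fetchall()
print(rows)
```

Because the chain is only data, re-using it for a similar case amounts to changing the table and column names in the meta-data rather than rewriting SQL by hand.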

MiningMart Architecture

The MiningMart project has developed a model for meta-data together with its compiler and implements human-computer interfaces that allow database managers and case designers to fill in their application-specific meta-data. The system will support preprocessing and can be used stand-alone or in combination with a toolbox for the data mining step.