An implementation of the well-known Apriori algorithm for the data mining step. It works on a sample read from the database; the sample size is given by the parameter SampleSize.
The input format is fixed. There is one input concept (TheInputConcept) with three BaseAttributes: one for the customer ID (parameter: CustID), one for the transaction ID (parameter: TransID), and one for an item that belongs to the itemset of that customer/transaction (parameter: Item). The algorithm expects all entries of these BaseAttributes to be integers. No null values are allowed.
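The input layout described above can be sketched as follows. This is only an illustration, not the operator's actual code: the function name group_itemsets and the representation of rows as Python tuples are assumptions, as is the reading that one itemset is formed per customer/transaction pair.

```python
from collections import defaultdict

def group_itemsets(rows):
    """Group (cust_id, trans_id, item) rows into one itemset per customer/transaction pair."""
    itemsets = defaultdict(set)
    for cust_id, trans_id, item in rows:
        for value in (cust_id, trans_id, item):
            # All three columns must hold non-null integers.
            # bool is a subclass of int in Python, so it is rejected explicitly.
            if value is None or isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"expected a non-null integer, got {value!r}")
        itemsets[(cust_id, trans_id)].add(item)
    return dict(itemsets)

# Hypothetical example rows: (CustID, TransID, Item)
rows = [(1, 10, 100), (1, 10, 200), (1, 11, 100), (2, 20, 300)]
print(group_itemsets(rows))
```

Each key of the result is a (CustID, TransID) pair and each value is the set of item IDs seen for that pair.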
The algorithm then finds all rules that are frequent, i.e. that reach the minimal support (parameter: MinSupport), and that have at least the specified confidence (parameter: MinConfidence). Please keep in mind that these settings, especially the minimal support, are applied to the sample, not to the full database!
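To make the roles of the two thresholds concrete, here is a minimal Apriori sketch. It is not the operator's implementation: the function names are hypothetical, and it assumes that MinSupport is an absolute number of sample transactions (the parameter is only documented as an integer) and that confidence is the support of the whole itemset divided by the support of the premise.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count in how many transactions each candidate occurs.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Return (premise, conclusion, confidence) for rules reaching min_confidence."""
    result = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), r):
                premise = frozenset(premise)
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                confidence = support / frequent[premise]
                if confidence >= min_confidence:
                    result.append((premise, itemset - premise, confidence))
    return result

# Hypothetical sample of five transactions over the items 1, 2, 3.
sample = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
freq = frequent_itemsets(sample, min_support=3)
for premise, conclusion, conf in rules(freq, min_confidence=0.7):
    print(sorted(premise), "->", sorted(conclusion), conf)
```

Because the counts are taken on the sample, an absolute threshold that is sensible for the full database is generally too high for a much smaller sample.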
The output is specified by three parameters. TheOutputConcept is the
concept the output table is attached to. It has two BaseAttributes,
PremiseBA for the premises of rules and ConclusionBA for the conclusions.
Each entry for one of these attributes contains a set of whitespace-separated
item IDs (integers).
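The output encoding described above could be produced and read back as in the following sketch. The helper names format_rule and parse_items are hypothetical, and sorting the item IDs is an assumption; the description only requires whitespace-separated integers.

```python
def format_rule(premise, conclusion):
    """Render one rule as a (PremiseBA, ConclusionBA) pair of whitespace-separated item IDs."""
    return (" ".join(str(i) for i in sorted(premise)),
            " ".join(str(i) for i in sorted(conclusion)))

def parse_items(entry):
    """Turn one PremiseBA or ConclusionBA entry back into a set of integer item IDs."""
    return {int(token) for token in entry.split()}

premise_entry, conclusion_entry = format_rule({3, 1}, {2})
print(repr(premise_entry), repr(conclusion_entry))
```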
ParameterName | ObjType | Type | Remarks |
TheInputConcept | CON | IN | inherited |
CustID | BA | IN | customer id (integer, not NULL) |
TransID | BA | IN | transaction id (integer, not NULL) |
Item | BA | IN | item id (integer, not NULL) |
MinSupport | V | IN | minimal support (integer) |
MinConfidence | V | IN | minimal confidence (in [0,1]) |
SampleSize | V | IN | the size of the sample to be used |
PremiseBA | BA | OUT | premises of rules |
ConclusionBA | BA | OUT | conclusions of rules |
TheOutputConcept | CON | OUT | inherited |