

The YALE GUI Manual

Simon Fischer

Ingo Mierswa

Abstract:

This document describes the usage of the graphical user interface (GUI) of the data mining software package YALE. Since the interface is intended to make YALE easier to use by avoiding manual handling of XML configuration files, there should be almost nothing to explain. Hence this document describes only the very basic ideas of YALE and their representations in the GUI. We suggest that you first work through the online tutorial and then read this document. After that, you will find it much easier to read the YALE Tutorial and can skip a lot of its contents.


General Information

Although the YALE Tutorial and this GUI manual contain a large amount of information about YALE and all of its parts, it is often more convenient to look up information while you work. Therefore, we added an online help function to almost all parts of YALE. Each parameter, operator and GUI item displays information as tool tip text, which appears after you hold the mouse cursor above the object for a few moments.

Setting up an experiment

When YALE starts up, it presents a welcome screen that lets you choose between five possibilities.

The online tutorial

When you start YALE for the first time, you should probably work through the online tutorial. It explains the main concepts and the GUI usage, and it shows many of the operators provided by YALE.

The Wizard

We assume that you have chosen to start the Wizard (figure 1). The Wizard is also available from the File menu. The Wizard guides you during the process of creating a new experiment. You start by selecting a template experiment from a list. This template serves as a kind of skeleton for your experiment.

Figure 1: The YALE experiment Wizard is a simple way to create basic experiments.

Experiments in YALE are made up of a set of nested operators. An operator consumes a set of input objects and produces some output objects. These objects can be data files, models, performance criteria, and more. A simple operator such as a learner consumes an example set and produces a model that an applier can use for prediction. Moreover, some operators can have inner operators. For example, a k-fold cross-validation splits an example set into a training set and a test set and applies its inner operators, a learner and an applier, to them. This is repeated k times, each time with a different, disjoint test set.
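To make the nesting idea more concrete, the following Python sketch mimics a k-fold cross-validation with a learner and an applier step. It is only an illustration and not YALE code: the function names, the way examples are assigned to folds, and the trivial `majority_learner` are assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]      # (attribute vector, label)
Model = Callable[[Sequence[float]], str]   # maps attributes to a predicted label


def k_fold_splits(n_examples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (training indices, test indices) pairs with disjoint test sets."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    splits = []
    for fold in range(k):
        # Assumption: examples are assigned to folds round-robin by index.
        test = [i for i in range(n_examples) if i % k == fold]
        train = [i for i in range(n_examples) if i % k != fold]
        splits.append((train, test))
    return splits


def majority_learner(training_set: List[Example]) -> Model:
    """Hypothetical inner learner: always predicts the most frequent label."""
    labels = [label for _, label in training_set]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda attributes: majority


def cross_validate(examples: List[Example], k: int,
                   learner: Callable[[List[Example]], Model]) -> float:
    """Outer operator: run learner and applier on each fold, return mean accuracy."""
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(examples), k):
        model = learner([examples[i] for i in train_idx])         # learner step
        predictions = [model(examples[i][0]) for i in test_idx]   # applier step
        correct = sum(p == examples[i][1] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

Calling `cross_validate(examples, 3, majority_learner)` plays the role of the outer cross-validation operator, while the learner passed in corresponds to one of its inner operators.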

If you click on the radio button next to the cross validation template, the structure of the sample experiment is depicted on the right. You see an operator chain consisting of an ExampleSource that reads data from a file. This data is then passed to the cross-validation which itself has inner operators, in this case learner and applier for a support vector machine (SVM). See the YALE Tutorial for more information on SVMs and the individual operators.

Now that you have chosen the template, click on next. In this step you can enter some of the most important parameters (figure 2). For a cross-validation, for example, this is the number of validations.

Figure 2: The parameter editing step of the Wizard.

Many experiments in YALE need a data file as input, which is known in supervised learning as an example set. Example sets have to be in a special format, and their attributes must be described in a separate XML file. In other words, the data file contains the set of examples, where each example is a vector, and the XML file describes the semantics of every value in the vector.
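The split between raw example vectors and a separate description of their columns can be sketched in Python as follows. This is not the YALE file format: the class `AttributeDescription`, its fields, and the column names are hypothetical and only illustrate how a description gives meaning to each position of a vector. The sketch assumes whitespace- or comma-separated values and a question mark for a missing value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AttributeDescription:
    """Hypothetical stand-in for one entry of an attribute description file."""
    name: str
    role: str  # e.g. "attribute" or "label" in this sketch


def parse_example(line: str) -> List[Optional[str]]:
    """Split one data line into values; '?' is treated as a missing value."""
    tokens = line.replace(",", " ").split()
    return [None if token == "?" else token for token in tokens]


def describe_example(example: List[Optional[str]],
                     descriptions: List[AttributeDescription]) -> Dict[str, Optional[str]]:
    """Attach the described meaning to every position of the example vector."""
    if len(example) != len(descriptions):
        raise ValueError("example and description must have the same length")
    return {d.name: value for d, value in zip(descriptions, example)}


# Hypothetical description of a four-column data file.
descriptions = [
    AttributeDescription("outlook", "attribute"),
    AttributeDescription("temperature", "attribute"),
    AttributeDescription("humidity", "attribute"),
    AttributeDescription("play", "label"),
]
row = describe_example(parse_example("sunny, 85 ? no"), descriptions)
```

Without the description, the line `sunny, 85 ? no` is just a vector of four values; with it, each value has a name and a role.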

To create such an XML file now, press the small Edit button next to an attribute description file property. The dialog that pops up is called the Attribute Editor and is described in section 2.6. The files generated with the help of the Attribute Editor can also be used as input for the ExampleSource operator, which is the standard data input operator of YALE.

An even more convenient way of loading almost arbitrary data files into YALE and defining attribute description files for your data is to use the Example Source configuration wizard. Just press the Start Configuration Wizard... button at the top of the parameter table of the operator. Configuration wizards are also available for other operators that are hard to configure, e.g. for the DatabaseExampleSource operator.

The tree view and other experiment views

As operators can have inner operators, and each operator except the root operator is enclosed within another operator, the natural representation of an experiment is a tree. If you have used the Wizard, you see your experiment on the left side. If you did not use the Wizard, you see an empty experiment consisting only of an empty operator chain. Figure 3 shows this main experiment view, which is called the ``tree view''.
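The tree structure can be illustrated with a small Python sketch. The `Operator` class and the names `Root`, `CrossValidation`, `Learner` and `Applier` are placeholders chosen for this example, not YALE classes; only the nesting itself follows the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """Minimal tree node: an operator with an ordered list of inner operators."""
    name: str
    children: List["Operator"] = field(default_factory=list)


def render_tree(operator: Operator, depth: int = 0) -> List[str]:
    """Return one indented line per operator, parents before their children."""
    lines = ["  " * depth + operator.name]
    for child in operator.children:
        lines.extend(render_tree(child, depth + 1))
    return lines


# The cross-validation template as a tree (placeholder operator names).
experiment = Operator("Root", [
    Operator("ExampleSource"),
    Operator("CrossValidation", [Operator("Learner"), Operator("Applier")]),
])

print("\n".join(render_tree(experiment)))
```

The printed indentation corresponds to the nesting levels that the tree view shows graphically.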

Figure 3: Main tree view of the experiment editor window.

By clicking on the XML tab you see the XML configuration file that describes your experiment (figure 4). If you like, you can always edit it by hand using your favorite text editor. For more information about the XML configuration files see the YALE Tutorial. Selecting the Box tab shows a nicer box representation of your experiment, which you already know from the Wizard. You can use this view for printing. Clicking on Tree brings you back to the tree view.

Figure 4: The YALE XML configuration editor

Editing parameters

To the right of the tree you see a table with two columns labeled ``Key'' and ``Value''. Here you can enter the parameters of the currently selected operator. Mandatory parameters are set in bold face. Some parameters have a default value, which is used if no other value is specified. If the value you entered is out of range, it will be corrected automatically. Some parameters accept only numbers, others let you select from a list of values. For file parameters, you can type the file name into the text field or select the file using a file chooser dialog that pops up when you press the [...] button. All file names can be given relative to the location of the experiment file. Of course, this only works after the experiment has been saved.
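The following Python sketch illustrates three of the rules above: falling back to a default value, correcting an out-of-range number, and resolving a relative file name against the experiment file. It is a sketch under assumptions: how exactly YALE corrects an out-of-range value is not specified here, so clamping to the nearest bound is a guess, and the function names and paths are made up for the example.

```python
from pathlib import PurePosixPath
from typing import Optional


def effective_value(entered: Optional[float], default: float,
                    minimum: float, maximum: float) -> float:
    """Use the default if nothing was entered, then force the value into range."""
    value = default if entered is None else entered
    # Assumption: an out-of-range value is moved to the nearest allowed bound.
    return min(max(value, minimum), maximum)


def resolve_file_parameter(experiment_file: str, file_name: str) -> PurePosixPath:
    """Interpret a relative file name relative to the experiment file's directory."""
    path = PurePosixPath(file_name)
    if path.is_absolute():
        return path
    return PurePosixPath(experiment_file).parent / path


# Hypothetical experiment file location and parameter values.
print(effective_value(None, default=10, minimum=2, maximum=100))
print(resolve_file_parameter("/home/user/experiments/cv.xml", "data/golf.dat"))
```

The second function also shows why relative file names only work once the experiment has been saved: without a location for the experiment file there is nothing to resolve against.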

Tip:

Keep the mouse cursor above a parameter for a few moments. Useful information about the parameter will be displayed as tool tip text.

Creating, deleting, moving and replacing operators

If you started with a blank experiment or you want to modify an experiment created by the Wizard, the thing you probably want to do is create a new operator.

Since this feature seemed most important to us, it is accessible from many places: from the Edit menu, the icon bar below the menu bar, the icon bar below the tree, the context menu that pops up whenever you right-click on an operator in the tree, or simply by pressing [Ctrl-I]. Figure 5 shows the icon for adding an operator.

Figure 5: The new operator icon.

Whichever way you choose to activate this function, it will pop up an operator browser that lets you choose the operator type (select a group or input or output types first if you want to decrease the number of possible selections). This browser displays all available information about the currently selected operator. You should name the operator before you add it into your experiment. You can always rename an operator by pressing [F2] or triple clicking it in the tree view. Figure 6 shows the operator browser.

Figure 6: The Operator Browser for information and operator adding.

Another way to add an operator is to right-click on the parent operator and select the operator from the New operator submenu. Adding operators directly from the context menu is a very fast way of designing experiments. Both ways of adding are only possible if an operator chain is selected, i.e. an operator which can have children.

Tip:

Keep the mouse cursor above an operator in the submenus for a few moments. Useful information about the operator will appear as tool tip text.


You can replace an operator by selecting the new operator from the submenu Replace operator of the context menu. This is similar to adding a new operator from the context menu instead of using the operator browser. When you replace an operator chain the inner operators will remain. Therefore, operator chains containing any children can only be replaced by other operator chains.
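The replacement rule can be stated compactly in code. The Python sketch below uses a hypothetical `Operator` class with an `is_chain` flag; it is not the YALE implementation and only demonstrates that inner operators survive a replacement and that a chain with children cannot be replaced by a simple operator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """Hypothetical operator node; only chains may hold inner operators."""
    name: str
    is_chain: bool = False
    children: List["Operator"] = field(default_factory=list)


def can_replace(old: Operator, new: Operator) -> bool:
    """A replacement is allowed if no children would be lost."""
    return not old.children or new.is_chain


def replace(old: Operator, new: Operator) -> Operator:
    """Return the new operator, carrying over the inner operators of the old one."""
    if not can_replace(old, new):
        raise ValueError(f"{old.name} has inner operators; {new.name} cannot hold them")
    new.children = list(old.children)
    return new


# Placeholder operator names for illustration only.
validation = Operator("CrossValidation", is_chain=True,
                      children=[Operator("Learner"), Operator("Applier")])
other_chain = replace(validation, Operator("OtherValidation", is_chain=True))
```

Trying `replace(validation, Operator("Learner"))` raises an error in this sketch, which corresponds to the restriction described above.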


Removing an operator is even easier. Just press [Delete] or select the corresponding menu item from the context menu or the Edit menu. Figure 7 shows the icon for deleting an operator.

Figure 7: The remove operator icon.

You can use the arrows below the tree to move an operator up or down within its enclosing operator chain or wrapper. If you want to move an operator from one chain to another, you may cut and paste it.

Operator info

Pressing [F1] brings up a dialog with useful information about the currently selected operator. This option is also available from the context menu of each operator or the Tools menu. This operator info dialog displays:

In this dialog you can also add HTML comments for the operator, which are saved in the XML file. If you specify a comment for the root operator of the experiment, this comment is displayed in a dialog each time the experiment is loaded.


The attribute editor

Example sets or instance sets in YALE are described using a separate XML document. This attribute description file contains information about the type of data and its source. Data sets can be distributed over several files. This may be particularly useful if the label is stored within a file of its own. The YALE Tutorial will give help in case you want to edit this file yourself.

The GUI displays a small Edit button next to an attribute description file property (e.g. the parameter attributes of an ExampleSource) in the property editor. Pressing it opens a dialog called Attribute Editor, which contains a table with one column for each attribute (figure 8). If the property does not yet reference a proper attribute description file, the dialog will be empty. If you want to follow the instructions below, which describe how to create the XML description file, you can start from scratch by clicking on clear in the Table menu of the menu bar.

Figure 8: The Attribute Editor dialog for data loading and attribute description file creation.

Assume you have a data file containing 50 rows of whitespace- or comma-separated attribute values, five per row. Click on load data to open that file. After that you should see five columns, each with some header rows, and the data in the table cells. Question marks (``?'') indicate missing values. The following enumeration explains the meanings of the table headers:

  1. The first header row shows the source file and column index. It is not editable and is shown for your information only.
  2. The second row indicates what the data is used for. It can either be an ordinary attribute, a label for classification or regression tasks, or a weight that can be used with certain algorithms. There can be at most one label and one weight attribute.
  3. The third row is the SI unit, given as a sequence of base units and exponents. For example, the unit Newton would be specified as kg1m1s-2, because $1\,\mathrm{N} = 1\,\frac{\mathrm{kg}\cdot\mathrm{m}}{\mathrm{s}^2}$. An exponent of 1 can be omitted. This feature is useful if you want additional features to be generated automatically, e.g. by a FeatureGenerationOperator. Units are taken into account to rule out useless attribute combinations, e.g. adding a time and a distance attribute.
  4. The fourth row is the value type. The most interesting choices are real / integer and nominal. YALE should have detected these correctly automatically.
  5. The last header row is the block type. The most interesting choices are single_value (default) and value_series. For some experiments value series are treated in a special way. Do not forget to assign value_series_start and value_series_end to the first and last column of the series, respectively.

You can change the values according to your needs and load an arbitrary number of data files. Finally, click on Save attribute description file in the File menu to write the XML file to disk.
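The unit notation from the third header row can be read mechanically. The following Python sketch parses a specification such as kg1m1s-2 into exponents per base unit and uses the result to decide whether adding two attributes makes sense. It is an illustration only: the exact grammar accepted by YALE is not given here, so the list of the seven SI base unit symbols and the function names are assumptions.

```python
import re
from typing import Dict

# Assumption: units are built from the seven SI base unit symbols.
# "mol" is listed before "m" so that the longer symbol is matched first.
_TOKEN = re.compile(r"(kg|mol|cd|m|s|A|K)(-?\d+)?")


def parse_unit(spec: str) -> Dict[str, int]:
    """Parse e.g. 'kg1m1s-2' into {'kg': 1, 'm': 1, 's': -2}."""
    exponents: Dict[str, int] = {}
    position = 0
    while position < len(spec):
        match = _TOKEN.match(spec, position)
        if match is None:
            raise ValueError(f"cannot parse unit specification {spec!r}")
        unit, exponent = match.group(1), match.group(2)
        # An omitted exponent counts as 1.
        exponents[unit] = exponents.get(unit, 0) + (int(exponent) if exponent else 1)
        position = match.end()
    return exponents


def can_be_added(unit_a: str, unit_b: str) -> bool:
    """Adding two attributes is only meaningful if their units are identical."""
    return parse_unit(unit_a) == parse_unit(unit_b)


print(parse_unit("kg1m1s-2"))   # Newton
print(can_be_added("s", "m"))   # time plus distance
```

In this sketch, `kg1m1s-2` and `kgms-2` describe the same unit, while a time attribute (`s`) and a distance attribute (`m`) are reported as not addable.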

Tip:

If you do not want to use this YALE standard data format, you can use one of the provided special format operators, which can read ARFF files, comma separated value (CSV) files, BibTeX files, dBase files, C4.5 files, and more.

Tip:

An even more convenient way of loading almost arbitrary data files into YALE and defining attribute description files for your data is to use the Example Source configuration wizard. Just press the Start Configuration Wizard... button at the top of the parameter table of the operator. Configuration wizards are also available for other operators that are hard to configure, e.g. for the DatabaseExampleSource operator. The wizards are self-explanatory: just follow the steps and read the instructions.

Validating your experiment

Before you run your experiment you should validate it. You can click on Validate experiment to check whether all operators are nested correctly, are provided with their necessary input, and have their mandatory properties set (figure 9). Although this is useful, you do not need to do it manually, since these checks are performed automatically before an experiment is started.
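The kind of checks performed by a validation can be sketched in Python. The example below walks through a flat chain of operators and reports unset mandatory parameters and missing inputs. It is a simplified illustration and not the YALE validation routine: the `OperatorSpec` class, the object type names, and the rule that consumed inputs are no longer available afterwards are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class OperatorSpec:
    """Hypothetical description of what an operator needs and delivers."""
    name: str
    consumes: Set[str] = field(default_factory=set)
    produces: Set[str] = field(default_factory=set)
    mandatory: Set[str] = field(default_factory=set)
    parameters: Dict[str, str] = field(default_factory=dict)


def validate_chain(chain: List[OperatorSpec]) -> List[str]:
    """Return a list of error messages; an empty list means the chain is valid."""
    errors: List[str] = []
    available: Set[str] = set()
    for op in chain:
        for key in sorted(op.mandatory - set(op.parameters)):
            errors.append(f"{op.name}: mandatory parameter '{key}' is not set")
        for needed in sorted(op.consumes - available):
            errors.append(f"{op.name}: missing input '{needed}'")
        # Assumption: inputs are used up, outputs become available to successors.
        available = (available - op.consumes) | op.produces
    return errors


chain = [
    OperatorSpec("ExampleSource", produces={"ExampleSet"}, mandatory={"attributes"}),
    OperatorSpec("Learner", consumes={"ExampleSet"}, produces={"Model"}),
    OperatorSpec("Applier", consumes={"Model", "ExampleSet"}, produces={"ExampleSet"}),
]
for message in validate_chain(chain):
    print(message)
```

In this sketch the chain is reported as invalid for two reasons: the mandatory parameter attributes of the ExampleSource is not set, and the applier no longer finds an example set because the learner has already consumed it.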

Figure 9: The validate experiment icon.

The result of the validation, including all error messages, is printed in the message viewer at the bottom of the main frame. Additionally, errors are indicated by an exclamation mark next to the affected operator in the tree view. An example is shown in figure 10. You can display these messages and more information about the operator by pressing [F1] or selecting the Operator info menu item from the context menu or from the Tools menu.

Figure 10: Error Messages in the Message Viewer (lower part) highlighting the errors in the experiment.

Experiment validation is very important for creating proper experiments and can help you understand the concepts of YALE. We therefore recommend using experiment validation as often as possible, at least once before you start your experiment. Together with the breakpoints from the operator context menu, it usually makes designing new or complex experiments much easier.

Running your experiment

Running your experiment is quite easy. Select Run from the Experiment menu or click the corresponding play button which is shown in figure 11.

Figure 11: The experiment run icon.

You can follow the progress of your experiment by observing the output displayed in the Message Viewer. Note that in GUI mode the output does not need to be written to a log file. If you did not specify a log file, you can always save the Message Viewer contents to a file by selecting the corresponding menu item in the Message Viewer's context menu. You can also search the Message Viewer contents. This option is accessible from the context menu of the viewer as well.

Tip:

The amount of logging messages can be defined by the parameter log_verbosity of the experiment operator (the root of the experiment tree). This operator also provides other parameters useful for the complete experiment.

Evaluating the results

When your experiment has finished successfully, YALE automatically switches to the Results tab and presents the results, i.e. all output returned by the outermost operator. This can be performance statistics, a decision tree or anything else. If your experiment chain produces output, this tab shows a visualization or a text describing it. Figure 12 shows a decision tree learned from the golf data set.

Figure 12: A decision tree as result of an experiment.

At any time you can stop or pause (and resume) the experiment using the appropriate buttons or menu items in the Experiment menu. In any case the operator currently being executed will finish its execution. Since this might take some time (e.g. if the current operator was a learner) this might lead to a delay for experiment termination. Please be patient.

If you want to observe your experiment closely, you can set breakpoints before and after every operator (via the operator context menu). In that case, each time a breakpoint is reached, intermediate results are presented similar to the dialog popping up at the end of the experiment. You have to change to the Results tab to view the intermediate results. While your experiment is running you can observe the progress on the Monitor tab. You can see a plot of the memory usage and the progress bar.

Plotter

In case you used the ExperimentLogOperator, you also find the produced plots as shown in figure 13. The results can be viewed online during the run of an experiment. On the left hand side you can select the value that is assigned to the x-axis and one or more of the values that should be plotted on the y-axis. The example shows a plot of the squared error depending on the SVM parameter C.

Additionally, the standard scatter plotter can display three dimensions. YALE offers two modes for plotting a third dimension. The first one (3D (color) scatter plot) produces 3D plots, which can be rotated by dragging the mouse over the plot. The second one is the 2D color plot mode: the first two dimensions span a 2D plane and the third dimension is mapped onto different colors. Other plotters exist for further kinds of scatter plots and for histograms. Table 1 gives an overview of the existing plotters.
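The idea behind the 2D color plot mode, mapping a third dimension onto colors, can be sketched in a few lines of Python. The blue-to-red gradient and the function names are assumptions made for this illustration; the color scale that YALE actually uses is not described here.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]  # (red, green, blue), each 0..255


def value_to_rgb(value: float, minimum: float, maximum: float) -> Color:
    """Map a value linearly onto a gradient from blue (low) to red (high)."""
    if maximum == minimum:
        fraction = 0.5
    else:
        fraction = (value - minimum) / (maximum - minimum)
    fraction = min(max(fraction, 0.0), 1.0)
    return (round(255 * fraction), 0, round(255 * (1.0 - fraction)))


def color_points(points: List[Tuple[float, float, float]]) -> List[Tuple[float, float, Color]]:
    """Keep x and y as plot coordinates and turn the third dimension into a color."""
    thirds = [z for _, _, z in points]
    low, high = min(thirds), max(thirds)
    return [(x, y, value_to_rgb(z, low, high)) for x, y, z in points]


colored = color_points([(0.0, 0.0, 1.0), (1.0, 1.0, 3.0), (2.0, 2.0, 2.0)])
```

Each point keeps its position in the 2D plane, and only its color carries the information of the third dimension.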

Several other objects can also be plotted. Example sets, for instance, have a meta data view, a data view, and a plot view. In the plot view, some features can be selected as the dimensions of a plot. Figure 14 gives an example of the plot view for the well-known Iris data set.


Table 1: The most important YALE plotters.
Plotter                  Description
Scatter                  A 2D scatter plot which can also plot lines and show a third dimension as color
Scatter Matrix           A matrix plot of scatter plotters
Scatter 3D               A 3D scatter plot
Scatter 3D Color         A 3D scatter plot colorizing a 4th dimension
Parallel                 A plotter where each dimension is plotted on parallel coordinates
Survey                   A survey plot plotting sorted histograms on parallel coordinates
SOM                      A Self-Organizing Map plot using a Kohonen net for dimensionality reduction
Density                  A 2D plotter using two dimensions as axes, one dimension as density color, and one dimension for point colors
Andrews Curves           A parallel coordinates plot after a sort of Fourier transform
Histogram                A histogram plot for one of the dimensions
Histogram Color          A histogram plot for one of the dimensions where the values are binned according to another (nominal) dimension beforehand
Histogram Matrix         A histogram plot for all dimensions
Histogram Color Matrix   A color histogram plot for all dimensions binned by one (nominal) dimension beforehand
Quartile                 A quartile plot (aka box plot) for one of the dimensions
Quartile Color           A quartile plot for one of the dimensions where the values are binned according to another (nominal) dimension beforehand
Quartile Color Matrix    A color quartile plot for all dimensions binned by one (nominal) dimension beforehand
Bars 2D                  A 2D bars plot, i.e. a 2D scatter plot where all points are connected to the x-axis
Bars 3D                  A 3D bars plot, i.e. a 3D scatter plot where all points are connected to the x-y-plane
Box 2D                   A 2D scatter plot where another dimension can define a box around the points (often used for variance)
Box 3D                   A 3D scatter plot where another dimension can define a box around the points (often used for variance)
RadViz                   A plotter with radial visualisation anchors
GridViz                  A plotter with grid visualisation anchors
Surface 3D               A 3D surface plot (only available for small numbers of data points)
Hinton                   A Hinton plot for one of the dimensions
Bound                    A Bound plot for one of the dimensions
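The description of Andrews Curves in the table is rather terse, so a small worked example may help. In the usual textbook definition, an example x = (x1, x2, x3, ...) is turned into the function f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., which is then drawn for t between -pi and pi. The Python sketch below evaluates this function; whether the YALE plotter uses exactly this definition is an assumption.

```python
import math
from typing import List, Sequence


def andrews_curve(example: Sequence[float], t: float) -> float:
    """Evaluate the Andrews curve of one example at position t."""
    total = example[0] / math.sqrt(2)
    for index, value in enumerate(example[1:], start=1):
        frequency = (index + 1) // 2      # 1, 1, 2, 2, 3, 3, ...
        if index % 2 == 1:
            total += value * math.sin(frequency * t)
        else:
            total += value * math.cos(frequency * t)
    return total


def sample_curve(example: Sequence[float], steps: int = 100) -> List[float]:
    """Sample the curve at evenly spaced positions from -pi to pi."""
    return [andrews_curve(example, -math.pi + 2 * math.pi * i / (steps - 1))
            for i in range(steps)]


curve = sample_curve([5.1, 3.5, 1.4, 0.2])  # one example with four attribute values
```

Examples with similar attribute values produce similar curves, which is what makes this kind of plot useful for spotting groups in the data.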


Zooming

Drag a rectangle to zoom into the selected range. A right click resets the view to the maximal range. This type of zooming is only supported for the 2D scatter plots. For some of the other plotters, zooming may be available via a slider in the options pane on the left or even by simply turning the mouse wheel.

Figure 13: A plot of a parameter optimization run.

Figure 14: A plot of the well known Iris example set

Settings

You can open a dialog for global settings from the File menu. These settings specify the behavior of YALE for different tasks, e.g. whether the system should beep at the end of an experiment or how many examples are displayed in the result viewer pane. The settings dialog lets you set all global properties described in the YALE Tutorial, including the paths to the executables of external programs. You can apply the changed settings to the current session only or save them for future sessions.


yale-team@lists.sourceforge.net