
Fitting the Model to the Data

After choosing a category model and quantifying the data set, we now need to fit the model to the data. I will describe a one-pass procedure to accomplish this, but in principle it is also possible to ``learn'' the fit incrementally, using one example at a time.

The problem is complicated by having a continuous-valued model (the normalized Gaussian) but discrete data (member of the category or not, focal example or not). We need to choose some threshold value to be associated with the category boundaries, and another value to be associated with the foci. Given our model it is natural to choose a value of unity for the foci, and for the boundary threshold I have arbitrarily chosen a fixed value t. After some experimentation, I have obtained the best results with a fitting procedure that minimizes the following quantity (independently computed for each category):
\[
E = \sum_{i=1}^{N_b} \left( G(b_i;\, s \circ f_0,\, \sigma) - t \right)^2
  + w \sum_{j=1}^{N_f} \left( G(p_j;\, s \circ f_0,\, \sigma) - 1 \right)^2
  + \sum_{k=1}^{N_o} T\left( G(o_k;\, s \circ f_0,\, \sigma) \right)^2
\]
which is of course a sum squared error criterion. In this equation, E is the total sum squared error for a particular category, composed of three separate terms. The first term quantifies the error of fit with respect to the vertices of the bounding polygon, with N_b representing the number of vertices, G the normalized Gaussian function introduced earlier, b_i a boundary vertex vector, s a linear scaling vector for the initial focus f_0 of the normalized Gaussian, sigma the ``width'' parameter of the normalized Gaussian, and t the threshold value for category membership chosen above. The second term quantifies the error of fit with respect to the focal stimuli, with w representing a fixed weight for this error term, set by trial-and-error, N_f representing the number of focal stimuli (indicated by black dots in Figure ), p_j a focal stimulus vector, and the remaining symbols as in the first term. The third term quantifies the error of fit with respect to other-category representatives, with N_o representing the number of other-category representatives, T a threshold function (see below), o_k representing an other-category representative, and the other symbols as before. The other-category representatives are collected by selecting, from the boundary or focus stimuli of each other category, those that are closest (in terms of Euclidean distance) to the initial focus of the current category (see below), for each dimension separately. The union of the sets of such stimuli over all dimensions becomes the set of other-category representatives. It is necessary to select the nearest points in each dimension separately, because the dimensions may be scaled independently in a subsequent step. The threshold function T is defined as
\[
T(x) = \begin{cases} 0 & \text{if } x < h \\ x & \text{if } x \geq h \end{cases}
\]
The effect of using T in the third term is to create a horizon for other-category representatives: as long as their function value for the current category's model function is below some value h, it is not counted at all. I have used a fixed value for h. The minimization of E is carried out with respect to s and sigma (each component of s independently) by Mathematica's standard FindMinimum function, which finds local minima in a function by following the path of steepest descent from a given starting point. Loosely speaking, this amounts to moving the focus of the model function (the normalized Gaussian) around in the color space so as to bring the boundary stimuli's membership values as close as possible to t, the focus stimuli's membership values as close as possible to 1, and the membership values for any other category's representatives below the horizon value h.
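To make the procedure concrete, here is a minimal sketch of the error function and its minimization in Python, with scipy.optimize.minimize standing in for Mathematica's FindMinimum. It assumes one plausible form for the normalized Gaussian, G(x) = exp(-||x - f||^2 / (2 sigma^2)), which has the required value of unity at the focus; the parameter values for t, w, and h are placeholders, not the ones used here, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian(x, focus, sigma):
    # Assumed form of the normalized Gaussian: unity at the focus,
    # falling off with squared Euclidean distance to it. A stand-in
    # for the model function G of the text.
    d2 = np.sum((np.asarray(x) - focus) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def category_error(params, f0, boundary, focal, others, t=0.5, w=1.0, h=0.1):
    # Sum squared error E for one category. The values of t, w, and h
    # are placeholders for the ones chosen in the text.
    s, sigma = params[:-1], params[-1]
    focus = s * f0  # component-wise scaling of the initial focus
    e_bound = np.sum((gaussian(boundary, focus, sigma) - t) ** 2)
    e_focal = w * np.sum((gaussian(focal, focus, sigma) - 1.0) ** 2)
    g_other = gaussian(others, focus, sigma)
    # Threshold function T: contributions below the horizon h are ignored.
    e_other = np.sum(np.where(g_other < h, 0.0, g_other) ** 2)
    return e_bound + e_focal + e_other

def fit_category(f0, sigma0, boundary, focal, others):
    # Minimize E w.r.t. the scaling vector s (per component) and sigma.
    # Nelder-Mead is derivative-free, which sidesteps the kink in T.
    x0 = np.append(np.ones_like(f0), sigma0)
    res = minimize(category_error, x0, method="Nelder-Mead",
                   args=(f0, boundary, focal, others))
    return res.x[:-1] * f0, res.x[-1]  # fitted focus and width
```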

The use of other-category representatives in the error function may cause the boundaries of a category to shift to some extent as a function of the distance to, or the presence or absence of, other categories nearby. There is some evidence in Berlin and Kay's work that the extent of categories may indeed be influenced by the total number of basic categories, and hence by their mutual distance, but I have not examined this any further.

Since the minima found by the FindMinimum function are local, the initial values for the variables s and sigma are quite important. For the initial value of s I have used the identity scaling vector (1, ..., 1), which combined with a value of f_0 given by
\[
f_{0,c} = \frac{1}{N_f} \sum_{j=1}^{N_f} p_{j,c}, \qquad c = 1, \ldots, n
\]
(with n representing the number of dimensions of the space, and p_{j,c} the c-th coordinate of focal stimulus p_j) places the initial model focus at the center of gravity of the focal stimuli, and hence on or near the OCS surface. In order to determine an initial value for sigma, we find the same-category stimulus x_max (focal or boundary, but typically boundary) that is farthest away from f_0 as just defined (using Euclidean distance in the color space), and then determine sigma_0 such that G(x_max; f_0, sigma_0) = t. Since the value of the normalized Gaussian is a monotonic function of distance to the focus, and x_max is the point in the set farthest away from the focus, any value of sigma larger than sigma_0 (i.e. a ``wider'' curve) will result in a monotonically increasing square error over the complete set (for a constant focus), and values smaller than sigma_0 will lead to the nearest local minimum. Even if this minimum is not the global one, this procedure ensures it is the ``same'' minimum for all categories. In principle the error landscape as a function of sigma can have as many local minima as there are points over which the error is computed. In practice the minimum found in this way seems to be usually the global one, or at least very close to it. Figure illustrates this for the 1-dimensional case.
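Continuing the sketch above, the initialization can be computed in closed form: under the assumed Gaussian, solving G(x_max) = t for sigma gives sigma_0 = d_max / sqrt(-2 ln t), where d_max is the distance from the centroid f_0 to the farthest same-category stimulus.

```python
def initial_values(focal, boundary, t=0.5):
    # Initial focus f0: center of gravity of the focal stimuli.
    f0 = focal.mean(axis=0)
    # Farthest same-category stimulus (focal or boundary) from f0.
    pts = np.vstack([focal, boundary])
    d_max = np.sqrt(np.sum((pts - f0) ** 2, axis=1)).max()
    # Solve G(x_max) = t under the assumed Gaussian form:
    # exp(-d_max^2 / (2 sigma0^2)) = t  =>  sigma0 = d_max / sqrt(-2 ln t)
    sigma0 = d_max / np.sqrt(-2.0 * np.log(t))  # t < 1, so -2 ln t > 0
    return f0, sigma0
```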

After determining the values of s and sigma through error minimization, a second minimization step is applied, which involves distorting the Euclidean nature of the space. The first step used a Euclidean distance metric, resulting in spheroid contours of equal membership value. The second step allows the dimensions of the space to be scaled individually for each category model, which effectively results in a non-spheroid contour of equal membership value (cf. the discussion of Shepard's model in section ). The procedure for this scaling involves another error minimization comparable to the first, with the objective function defined by
\[
E' = \sum_{i=1}^{N_b} \left( G(d \circ b_i;\, d \circ f,\, \sigma^*) - t \right)^2
   + w \sum_{j=1}^{N_f} \left( G(d \circ p_j;\, d \circ f,\, \sigma^*) - 1 \right)^2
   + \sum_{k=1}^{N_o} T\left( G(d \circ o_k;\, d \circ f,\, \sigma^*) \right)^2
\]
which is basically the same as the equation for E above, but having both data points and the category focus f scaled component-wise by the vector d, and with sigma = sigma*, where sigma* (like f) is the result of the first minimization step. The minimization is carried out as before by Mathematica's FindMinimum function, with the focus and width initialized to their values after the first minimization step, and d initialized to the identity scaling vector. Informally speaking, this amounts to stretching (or compressing) the dimensions of each category to make it fit the data better. Alternatively, one can think of it as stretching the dimensions of the color space to make it fit the categories better, but only locally. The obtained scaling vectors for the XYZ, L*a*b*, and NPP spaces are shown in Figure .
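A sketch of this second step, reusing the helpers from the first sketch and under the same assumptions: the objective is the error function from the first step, with every stimulus and the category focus scaled component-wise by a vector d, and sigma held fixed at its fitted value.

```python
def scaling_error(d, focus, sigma, boundary, focal, others, t=0.5, w=1.0, h=0.1):
    # Same E as in the first step, but with all points and the category
    # focus scaled component-wise by d, and sigma held fixed.
    e_bound = np.sum((gaussian(boundary * d, focus * d, sigma) - t) ** 2)
    e_focal = w * np.sum((gaussian(focal * d, focus * d, sigma) - 1.0) ** 2)
    g_other = gaussian(others * d, focus * d, sigma)
    e_other = np.sum(np.where(g_other < h, 0.0, g_other) ** 2)
    return e_bound + e_focal + e_other

def fit_scaling(focus, sigma, boundary, focal, others):
    # Start from the identity scaling vector (1, ..., 1).
    res = minimize(scaling_error, np.ones_like(focus), method="Nelder-Mead",
                   args=(focus, sigma, boundary, focal, others))
    return res.x  # per-dimension scaling vector for this category
```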

One way to interpret this figure is that the closer the vectors cluster to the main diagonal of the unit cube, the more Euclidean the space is (or the categories are), since that corresponds to spheroid surfaces of equal membership value. There are clear differences between the spaces in this respect, but there does not seem to be a single ``best'' space.

The result of fitting the category models to Berlin and Kay's data for (American) English, using the two-step method described above, is shown in Figures to . These figures show the locations of the category centers (the fitted foci of the category models) only; the boundaries will be discussed below.

It is interesting to note that some of the foci lie outside of the OCS volume, which means that they do not correspond to physically realizable (and perceivable?) surface colors. Technically, this is a result of allowing the foci to ``float'' in the first part of the fitting procedure. While it is possible to constrain the foci to lie on or below the OCS surface, doing so results in a considerably poorer fit, and degraded performance for the algorithms using the category models, to be discussed in Chapter . I am not sure what the neurophysiological correlate of these ``virtual foci'' might be, if any.
