
A Sketch of a Developmental Model

Learning might very well be a domain where angels fear to tread, so my brief remarks on the subject in this section should not be taken as having any relevance for, or being encumbered by any knowledge of, the subject at large.

The fitting of the category model to the color naming data as discussed in Section might be considered as an optimization problem, or perhaps a problem of parameter setting in a prior structure, to use the terminology of [Brown 1994], but probably not as a learning approach. Another distinction that [Brown 1994] makes is the one between experience-expectant and experience-dependent processes. The former would depend on species-specific experience that might have (had) evolutionary importance, and might be described as ``biased learning'' or development, while the latter would depend on individual-specific experience, and would be closer to what is generally understood by learning. Biased learning is also related to Edelman's Theory of Neuronal Group Selection [G. Edelman 1989][G. Edelman 1992], in which the concept of selection plays an important role, as opposed to recognition. In a nutshell, the idea is that categorization (and, by extension, cognition) may be a matter of the selection of certain neuronal groups by certain stimuli, rather than the recognition of a stimulus by a general categorization mechanism. The connection I see between biased learning and selection is that one might consider each of the neuronal groups to ``implement'' a different bias (or ``attractor'') for the feature/parameter space over which development and/or learning takes place.

Bringing the discussion back to color categorization, one might consider the foci of the Basic Color categories to represent biases or attractors of this kind. Indeed, Berlin and Kay suggest that the 11 basic color categories represent a universal inventory out of which particular languages choose to lexicalize some number up to 11, presumably dependent on their environment and needs (Section ). I have not been able to relate the location of the foci to any particular neurophysiological phenomena, but let's assume that they are indeed universal, for the purpose of the discussion. Given that we know the locations of the foci in (a particular) color space, a simple developmental or experience-expectant algorithm for determining both the extent and the labels (names) of the categories might go like this:

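  1. Given: a stimulus, represented as a point in color space
    optional: a label, represented as a symbol or a string

  2. collect the membership values of the stimulus for every category in the set of known categories

  3. select the best candidate category, i.e. the one with the highest membership value for the stimulus

  4. if necessary, adjust the width parameter of the best candidate category such that the stimulus becomes a member of it, without changing the categorization of any other stimulus not belonging to the null category

  5. if a label was provided:
    1. if no label is associated with the best candidate category, associate the provided label with it
    2. if a label is associated with the best candidate category, and it matches the provided label, do nothing
    3. if a label is associated with the best candidate category, and it differs from the provided label, shake.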
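The steps above can be sketched in Python as follows. This is only an illustration under assumptions that are not part of the text: the membership function is taken to be an isotropic Gaussian around the focus, the membership threshold `THETA` is set to an arbitrary 0.5, step 4 only ever widens a category and simply gives up if widening would disturb an earlier example (in place of a proper minimization), and the ``shake'' of step 5.3 is merely reported, not performed. The names `Category`, `classify` and `observe` are hypothetical.

```python
import math

THETA = 0.5  # assumed membership threshold; the value is arbitrary


class Category:
    def __init__(self, focus, width):
        self.focus = focus   # focus location in color space (assumed known)
        self.width = width   # width parameter, adjusted during development
        self.label = None    # name, attached when a labeled stimulus arrives

    def membership(self, point):
        # Assumed Gaussian falloff with distance from the focus.
        d2 = sum((p - f) ** 2 for p, f in zip(point, self.focus))
        return math.exp(-d2 / (2.0 * self.width ** 2))


def classify(categories, point):
    """Best category for point, or None (the null category) below THETA."""
    best = max(categories, key=lambda c: c.membership(point))
    return best if best.membership(point) >= THETA else None


def observe(categories, examples, point, label=None):
    """One pass of steps 1-5; returns (best category, status string)."""
    # Steps 2-3: collect membership values and pick the best candidate.
    best = max(categories, key=lambda c: c.membership(point))

    # Step 4: widen the best candidate just enough to admit the stimulus,
    # unless that would change the categorization of an earlier example.
    if best.membership(point) < THETA:
        old_width = best.width
        d = math.dist(point, best.focus)
        # Width at which membership equals THETA, plus a little slack.
        best.width = d / math.sqrt(-2.0 * math.log(THETA)) * (1 + 1e-9)
        if any(classify(categories, p) is not c for p, c in examples):
            best.width = old_width  # simplification: give up on conflict
    if classify(categories, point) is best:
        examples.append((point, best))

    # Step 5: label handling.
    if label is None:
        return best, "unlabeled"
    if best.label is None:
        best.label = label
        return best, "labeled"
    if best.label == label:
        return best, "consistent"
    return best, "shake"  # step 5.3: relaxation is left unspecified
```

With two categories whose foci lie at (0, 0) and (10, 0) and initial width 1, presenting the stimulus (2, 0) with the label "red" widens the first category and names it; a later stimulus (1, 0) labeled "green" falls into the same category and yields the "shake" status.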

Step 4 could be implemented by keeping a list of examples of each category on hand, and doing a minimization as described in equation , with the appropriately chosen other-category representatives. Alternatively, such a list could be generated as needed each time around by selecting, for each other category, the point belonging to it that is closest to the focus of the best candidate category. This list has to be recomputed every time because the width parameters may change in the course of development.

Step 5.3 is a somewhat complicated case, which could be due to contradictory input, to a problem with the category model itself, or to a previously overgeneralized category. In any of these circumstances, the system has to be ``shaken up'' or relaxed into a new maximally consistent state, but I have not considered any algorithms for doing this.

The sketched algorithm also needs the width parameter associated with each category to be initialized to some default value, which affects the categorization behavior in its initial stages: choosing the initial width to be small will result in categories that are the smallest possible to contain all examples ``seen'' to date, while choosing it to be large will result in the largest possible categories that do not conflict with any examples seen to date. These two possibilities may eventually converge to the same state after sufficiently many examples have been seen, but I have not investigated that.
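For the on-demand variant of step 4, the representative of each other category can be computed in closed form if one is willing to assume, purely for illustration, an isotropic Gaussian membership function and a fixed membership threshold, and to ignore competition between neighboring categories. Under those assumptions a category is a ball around its focus, and its point closest to another focus lies on the straight line between the two foci. The names `boundary_radius` and `representative` and the threshold value are hypothetical.

```python
import math

THETA = 0.5  # assumed membership threshold; the value is arbitrary


def boundary_radius(width, theta=THETA):
    """Distance from the focus at which an assumed Gaussian membership
    exp(-d^2 / (2 * width^2)) has dropped to theta."""
    return width * math.sqrt(-2.0 * math.log(theta))


def representative(other_focus, other_width, best_focus):
    """Point of the other category that is closest to best_focus."""
    d = math.dist(other_focus, best_focus)
    r = boundary_radius(other_width)
    if d <= r:
        # The best candidate's focus already lies inside the other category.
        return tuple(best_focus)
    t = r / d  # fraction of the way from other_focus toward best_focus
    return tuple(o + t * (b - o) for o, b in zip(other_focus, best_focus))
```

Because the result depends on the current width of the other category, it has to be recomputed whenever any width changes, which is the reason given above for regenerating the list each time around.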

lammens@cs.buffalo.edu