Steps towards a Design Theory for Virtual Worlds
Steps towards a Design Theory
for Virtual Worlds
Joseph A. Goguen
Department of Computer Science and Engineering
University of California at San Diego
La Jolla CA 92093-0114 USA
phone: +1-858-534-4197 fax: +1-858-534-7029
email: jgoguen@ucsd.edu
Abstract Virtual worlds, construed in a sense broad
enough to include text-based systems, as well as video games, new media,
augmented reality, and user interfaces of all kinds, are increasingly
important in scientific research, entertainment, communication, commerce, and
art. However, we lack scientific theories that can adequately support the
design of such virtual worlds, even in simple cases. Semiotics would seem a
natural source for such theories, but this field lacks the precision needed
for engineering applications, and also fails to addresses interaction and
social issues, both of which are crucial for applications to communication
and collaboration. This paper suggests an approach called "algebraic
semiotics" to help solve these and related problems, by providing precise
application-oriented basic concepts such as sign, representation, and
representation quality, and a calculus of representation that includes
blending. This paper also includes some theory for narrative and metaphor,
and reports case studies on information visualization, proof presentation,
humor, and user interaction.
Keywords: Design, Algebraic Semiotics, Virtual World,
Semiotic Morphism, Blend, User Interface, Metaphor, Information Visualization.
1 Introduction: Motivation,
Difficulties and Approaches
The term "virtual world" is used in many ways, but perhaps virtual worlds can
be broadly characterized as the class of media experiences that provide some
sense of immersion and closure. By immersion, which is sometimes
also called virtuality, we mean a sense of being engaged with
non-physically present entities through material mediation in the immediate
real world, but not with the other aspects of the immediate real world, and by
closure we mean that the virtual world gives an appearance of
relative completeness, although it may of course be changing. A lecture, a
conversation, a movie, a magazine, a formal paper, a video game, a user
interface, can all be virtual worlds in this sense. A major factor in
creating immersion and closure is the coherence of the world; of
course, there are many other factors, relating for example to the situation,
background, and interests of participants, but this paper is focused on ways
to achieve coherent representations.
Given the enormous cultural and economic
importance of current media for communication, entertainment, and art, as well
as the promise of new media, there would be many uses for scientific theories
that could provide guidance for difficult tasks, such as the following:
- designing new media (e.g., virtual reality environments with haptics);
- creating new metaphors (e.g., beyond the desktop for PCs);
- making new hardware (such as wireless applicances) more usable;
- designing new genres (such as interactive poems); and
- supporting non-standard users (e.g., with disabilities).
Because virtual worlds are user interfaces in some broad sense, and
because user interface design is a well-developed area of computer science
(which is also known as human-computer interaction, or HCI, or sometimes
CHI), this would seem a good place to look for appropriate theory. But most
HCI results are either very precise but also highly specialized and therefore
not very useful (e.g., Fitt's law), or else they are very general but of
uncertain reliability and generality (e.g., protocol analysis,
questionnaires, case studies, usability studies).
Another plausible place to seek a theory of
virtual world design would be semiotics, a subject founded by Charles Sanders
Peirce [Peirce, 1965] and Ferdinand de
Saussure [Saussure, 1976] in
the late nineteenth century. Peirce was an American logician concerned with
problems of meaning and reference, who concluded that these are relational
rather than denotational, and who also made an influential distinction among
modes of reference, as symbolic, indexical, or iconic. Saussure, a Swiss
linguist, wanted to understand how features of language relate to meanings,
and he emphasized binary features and denotational meaning. More recent
thinkers like the French literary theorist Roland Barthes [Barthes, 1968] combined and
extended these theories, creating a powerful language for cultural and media
studies, which in various versions has been called semiotics, semiology,
structuralism, and finally post-structuralism. Unfortunately, this tradition:
- does not have the mathematical precision needed to integrate well with
engineering processes;
- does not consider representing signs in one system by signs in another,
as is needed for the study and design of interfaces;
- has not addressed dynamic signs, which are necessary for the study and
design of interaction;
- has not much considered social issues, such as arise in shared worlds;
- tends to ignore the situated, embodied aspects of sign use;
- tends towards a Platonistic view of signs, as actual existing abstract
entities; and
- often considers only single (complex) signs (e.g., a novel or a film),
rather than systems of signs;
Therefore semiotics needs to address some significant problems before it
can meet all our needs. This paper sketches how algebraic semiotics attempts
to bridge this gap. The theory originated in an attempt to understand data
from an early experimental study of multimedia learning [Goguen and Linde, 1984],
and was later elaborated for applications to user interface design; more
complete expositions appear in [Goguen,
1999a; Goguen and Harrell, 2003] and
[Goguen, 2003], though the theory
is still evolving. Here we focus more on motivation and applications.
There are at least two perspectives that one
might take towards the study of signs and representations: pragmatic
and theoretical. The first is the perspective of a designer, who has
a job to get done, often within constraints that include cost, time, and
stylistic guidelines; we may also call this an engineering perspective, and it
will generally involve negotiating trade-offs among various values and
constraints. The second is the perspective of a scientist who seeks to
understand principles of design, and is thus engaged in a process of
constructing and testing theories. From the second perspective, it makes
sense to describe semiotic theories in a detailed formal way, and to test
hypotheses by doing calculations and experiments with users. But from the
pragmatic perspective, it makes sense to formalize only where this adds value
to the design process, e.g., in especially tricky cases, and even then, only
to formalize to the minimum extent that will get the job done. Our experience
is that one can often get considerable benefit from applying principles of
algebraic semiotics, such as identifying and preserving key features of the
source theory, without doing a great deal of formalization.
From either the pragmatic or theoretical perspective, one should seek to model
semiotic theories as simply as possible, since this will simplify later tasks,
whether they are engineering design or scientific theorizing and
experimentation (not forgetting that the conceptual simplicity of a theory
does not necessarily correspond to the simplicity of its expression in any
particular language). However, from a pragmatic perspective, good
representations need not be the simplest possible, for reasons that include
engineering tradeoffs, the difficulty (and inherent ambiguity) of measuring
simplicity, and social and cultural factors, e.g., relating to esthetics.
Similar considerations apply, though to a notably lesser extent, to the
simplicity of semiotic theories, since creating such theories is itself a
design task, subject to various trade-offs. It may be reassuring to be
reminded that in general there is no unique best representation.
Sections 2 and 3 develop some basic theory of algebraic semiotics. Two
main concepts are semiotic theory and semiotic morphism, which generalize the
conceptual spaces and conceptual mappings of Fauconnier and Turner [Fauconnier and
Turner, 1998; Fauconnier and Turner,
2002], by taking account of structure and dynamics. Some measures of
quality and design principles are given, including a trade-off between form
and content. Although similar principles can be found in many places, none
seem to be either as precise or as general as those described here. This
section also discusses metaphor and blending in natural language, and gives
some basics of a calculus of representation. Section 4 describes a number of case studies, including
information visualization, proof presentations, humor, and user interaction;
a discussion of narrative is also included here. Some conclusions, future
research directions, and social implications for virtual worlds are discussed
in Section 5.
Before beginning, it may help to be clear about the philosophical orientation
of this work, because it is very common in Western culture for mathematical
formalisms to claim and be given a status beyond what is warranted. For
example, Euclid wrote, "The laws of nature are but the mathematical thoughts
of God." Similarly, the "situations" in the situation semantics of Barwise
and Perry, which resemble conceptual spaces (but are more sophisticated -
perhaps too sophisticated) are considered to be actually existing,
ideal Platonic entities [Barwise and Perry, 1983]. Somewhat less grandly, one
might consider that conceptual spaces are somehow directly instantiated in the
brain. However, the point of view of this paper is that such formalisms are
constructed in the course of some task, with the heuristic purpose of
facilitating consideration of certain issues in that task, which might be
scientific study or engineering design. Under this non-Platonist view, all
theories are situated social entities, mathematical theories no less than
others; of course, this by no means implies that they cannot be useful.
Acknowledgements
I thank Drs Kirstie Bellman and Christopher Landauer of Aerospace Corp for
inviting me to the 2001 Virtual Worlds and Simulation Conference, and for
their valuable comments. I thank Fox Harrell for many valuable conversations
and insights on topics related to this paper, and for the renderings of
Figures 5, 6 and 7. I also thank students in my UCSD classes CSE 171 and 271
for their feedback and their patience.
2 Algebraic Semiotics
The basic notions of algebraic semantics are sign (or semiotic) system,
semiotic morphism, and representation quality; these are discussed in the
following subsections.
2.1 Signs and Sign Systems
The definition of sign system below incorporates
the insight of Saussure that individual signs should not be studied in
isolation, but rather as elements of systems of related signs [Saussure, 1976], of Peirce that
signs may have parts, subparts, etc., that play different roles [Peirce, 1965], and of Goguen that sign
parts have different saliencies, depending on the roles that they play [Goguen, 1999a].
The structure of a sign system can be described
by an algebraic theory, since they are in particular abstract data types, and
it is well known that these can be defined algebraically [Goguen and Malcolm, 1996]. In addition, signs become what
they are by virtue of attributes that differ from those of other signs, as
shown for example by vowel systems (how the space of possible vowel sounds is
divided into specific vowels for a given dialect of a given language), as
well as by traffic signs, alphabets, numerals, etc. However, these
attributes need not be binary, as was supposed by Saussure and his followers
in the French structuralist movement, such as Levi-Strauss and early Barthes.
Also, the same sign in a different system can have a different meaning, as
illustrated by the way similar characters in different alphabets can take
different meanings, e.g., in the Roman and Cyrillic alphabets, the token "P"
denotes different sounds.
We formalize1 sign system as many sorted loose
algebraic theories with data, plus two additional items that are specifically
semiotic:
Definition 1 A sign system, or
semiotic system or semiotic theory, consists of:
- a signature, which declares sorts, subsorts and operations
(including constructors and selectors);
- a subsignature of data sorts2 and data functions;
- axioms (e.g. equations) as constraints;
- a level ordering on sorts, including a maximum element called
top; and
- a priority ordering on constructors at the same level.
The non-data sorts classify signs and their parts, just as in
grammar, the "parts of speech" classify sentences and their parts. There are
two kinds of operation: constructors build new signs from old signs as
parts, while selectors pull out parts from compound signs. Data sorts
classify a special kind of sign that provides values serving as attributes of
signs. Axioms act as constraints on what count as allowable signs for this
system. Levels indicate the whole/part hierarchy of a sign, with the top
sort being the level of the whole; priorities indicate the relative
significance of subsigns at a given level; social issues play a dominant role
in determining these. The above definition follows [Goguen, 1999a], where the special treatment of data sorts
follows [Goguen and Malcolm, 2000].
The first four items constitute what is called an algebraic theory
when all axioms are equations (e.g., [Goguen and
Malcolm, 1996; Goguen et al.,
1978]); it can be proved that this special case is sufficient for our
needs.
The approach of Definition
1 differs from the more traditional set-based approaches of Gentner [Gentner, 1983], Carroll [Carroll, 1982], etc., in that it
is axiomatic, i.e., it does not present signs as
particular models, but rather, a particular theory expressed in a formal
language describes a space of possible signs, which are models of
the theory, in the sense of that term in logic, providing concrete
interpretations for the things in the theory: sorts are interpreted as
sets; constant symbols are interpreted as elements; constructors are
interpreted as functions, etc.; i.e., the theory is a language for
talking about such models. This approach allows both multiple models
and open structure, both of which are important for
applications. The first point means, for example, that a semiotic space of
books should allow anything having the structure of a book as a model; it
also means that designers and implementers have the freedom to optimize
implementations so long as they respect the constraints of the given axioms.
The second point says that structure can be extended and combined without
violating the specifications, which is not necessarily the case for models.
For example, we might want to extend a basic sign system for books with some
further structure pertaining to a certain series of books from a particular
publisher. In addition, it is also more natural to treat levels and
priorities in axiomatic theories than in set-based models.
For an example of axioms, in formalizing indices
of books, we might well want to impose an axiom requiring that indexed items
must be phrases of 1, 2 or 3 words, but not more. In a sign system for
books, the top level might be occupied by the sort for books, the next level
by author, title, publisher, and the third level by the first and last names
of authors, and by the name and location of the publisher; see Figure 1. Here last name has priority over first name,
and publisher name has priority over publisher location. This is similar to
the nesting structure used in XML documents.
Figure 1: Levels for the Book Semiotic Theory
The following are some further informal examples
of sign systems: dates; times; bibliographies (in one or more fixed format);
tables of contents (e.g., for books, again in fixed formats); a fixed
website, such as the CNN homepage (in some particular instance of its
gradually evolving format); and newspapers (e.g., the New York Times
). Note that each of these has a large space of possible instances, but
a single fixed structure.
There is a basic duality between theories and
models. We have already discussed one aspect: A semiotic theory determines
the class of models that satisfy it, which we call its semiotic space3. The other aspect is
more subtle: a class of models has a unique (up to equivalence) most
restrictive theory whose models include it4. This duality helps to justify our
occasional use of the term "space" when we really mean "theory"; this is
mainly done for consistency of terminology when discussing conceptual blending
theory.
Gilles Fauconnier introduced mental
spaces for studying meaning in natural language from a cognitive point of
view [Fauconnier, 1985].
The abstract mathematical structure of a mental space is a set of atomic
elements together with a set of relation instances among those elements [Goguen, 1999a], and as such is a very special
case of a sign system. Any such representation necessarily omits the
qualitative, experiential aspects of what is represented (these aspects are
often called "qualia"), since formal representations cannot capture meaning
in any human sense. Moreover, mental spaces are not powerful enough for
designing virtual worlds or other applications where structure and
dynamics are important; obvious examples include wikis, web sites,
and music.
The conceptual spaces of Fauconnier and Turner
[Fauconnier and
Turner, 1998; Fauconnier and Turner,
2002] are mental spaces, and hence share their limitations. For example,
conceptual space theory can help us understand concepts about music,
but semiotic spaces and structural blending are needed for an adequate
treatment of the structure of music, e.g., how a melody can be
combined with a sequence of chords. Conceptual spaces are good for talking
about concepts about (e.g., how we talk about) things, but are awkward for
talking about the structure of things. It is also interesting to notice that
greater cultural variation can be found in conceptual blending than in
structural blending, because the former deals with concepts about something,
whereas they latter deals with the structure of its instances and/or its
representations. Mathematically, a conceptual space is a single model,
consisting of items and assertions that certain relations hold of certain of
those items; it is not a theory or a class of models.
Our suggested methods for determining semiotic
spaces are grounded in ideas from sociology, especially ethnomethodology, but
this paper is not the right place to discuss such issues; see [Goguen, 1997; Goguen,
1994], beyond noting that semiosis, which is the creation of
meaning, is always situated and embodied, and in particular always has a
social context. Immersion arises in part through embodiment (even if only
metaphorical embodiment, e.g. in text-based virtual worlds).
2.2 Representations
Mappings between structures became increasingly important in twentieth century
mathematics and its applications; examples include linear transformations (and
their representations as matrices), continuous maps of spaces, differentiable
and analytic functions, group homomorphisms, and much more. Mappings between
sign systems are only now appearing in semiotics, as uniform representations
for signs in a source space by signs in a target space. Since we formalize
sign systems as algebraic theories with additional structure, we should
formalize semiotic morphisms as theory morphisms; however, these must be
partial, because in general, not all of the sorts, constructors, etc. are preserved in real world examples. For example, the semiotic morphism that
produces an outline from a book, omits the sorts and constructors for
paragraphs, sentences, etc., while preserving those for chapters, sections,
etc. In addition to the formal structure of algebraic theories, semiotic
morphisms should also (partially) preserve the priorities and levels of the
source space. The extent to which a morphism preserves the various features
of semiotic theories is important in determining its quality, as the case
studies to follow will show.
The design of virtual worlds and more generally,
of user interfaces, is the art of creating representations, for example,
representing the functionality of an operating system using icons, menus,
buttons, etc., or using haptics and virtual reality. The basic insight is
that a representation is a mapping M : S1 ® S2 of sign systems that
preserves as much as is reasonable. The following formalizes this insight:
Definition 2 A semiotic morphism
M : S1 ®
S2 from a semiotic system S1 to another S2
consists of the following partial mappings:
- from sorts of S2 to sorts of S2, so as to preserve
the subsort relations,
- from operations of S2 to operations of S2, so as to
preserve their source and target sorts,
- from levels of S1 sorts to levels of S2, so as to
preserve the ordering relation, and
- from priorities of S1 constructors to priorities of
S2 constructors, so as to preserve their ordering relations,
so as to strictly preserve all data elements and their functions.
It is not always possible or even desirable
for a semiotic morphism to preserve everything. For example, sometimes we
just want to summarize some dataset, e.g., the table of contents of
a book, in which case much of the structure and information are intentionally
deleted. Another important observation is that not all representations are
equally desirable. For example, one way to parse the sentence "Time flies
like an arrow" in the following "bracket" (or "bracket-with-subscript")
notation, which is widely used in linguistics:
[[time]N[[flies]V[[like]P[[an]Det[arrow]N]NP]PP]VP]S
However, this notation is not very satisfactory
for humans, who would find it easier to discern the syntactic structure by
examining a parse tree, or using the algebraic "constructor" notation given
later. Some criteria for judging the quality of representations are discussed
in Section 2.4 below.
The duality between theories and models means
that there is an inherent ambiguity about the direction of a semiotic
morphism. For example, if B is is a sign system for books and
T is one for tables of contents, then books (which are models of
B) are mapped to their tables of contents, which are models of
T, but this map on models is determined by, and is dual to, the theory
inclusion T ® B, which
expresses the fact that the structure of tables of contents is a substructure
of that of books. In informal discussions we will often take the direction
to be that of the models, which is perhaps more intuitive; however, in formal
discussions, it is much better to use the direction of the underlying theory
morphism, which is opposite to that of the models.
There are at least three "modes" in which one
might consider representations: analytic, synthetic, and conceptual. In the
analytic mode, we are given one or more sign from the representation
(i.e., the target) space, and we seek to reconstruct both the source space
and the representation. In the synthetic mode, we are given the
source space and seek to construct a good representation for the signs in
that space, using some given technology (such as command line, or standard
GUI widgets, or virtual reality) for the target space. In the conceptual
mode, we seek to analyze the metaphorical structure of the
representation, in the style of cognitive linguistics [Turner, 1997; Fauconnier and Turner, 2002]; for example, how is Windows
XP like a desktop, or how is a scrollbar like a scroll? A treatment in this
mode will involve conceptual spaces, in the sense of cognitive linguistics
(see Section 3.1 below). In each mode, particular
aspects of the cultures involved can be very significant.
2.3 Simple Examples
This subsection gives rather informal descriptions of some simple examples of
semiotic theories and semiotic morphisms, to illustrate the concepts, rather
than to demonstrate their applicability to virtual world design, since these
examples could only be considered virtual worlds in a trivial sense. The
following sign systems are considered:
- Lists of (potential) words with punctuation, denoted
SW.
- Parse trees for sentences of a formal grammar G, denoted
ST.
- A printed page format, denoted SP.
Then the following are some interesting morphisms:
- Let P : SW ® ST give parse trees for lists from SW
that are G-sentences.
- Let H : ST ®
SP give bracket representations of parse trees.
- Let F : SW ®
SP give bracket representations of lists from
SW that are G-sentences.
The morphism F is "composed" from P and H, by first doing
P and then doing H; we denote this by P;H, where ";"
denotes the composition operation. Composing morphisms corresponds to
composing representations, which is the essence of iterative design, an
important technique for any complex design task. By Definition 2, a semiotic morphism M has four
component mappings, for sorts, operations, levels, and priorities; let us
denote these M1, M2, M3 and
M4, respectively. Then the composition M;M'
of morphisms M and M can be defined by the formula
(M;M')i = Mi;M'i for i=1,2,3,4.
The sign system SW for
punctuated lists of words can be described roughly as follows: Its sorts
are char, alpha, punc, puncword,
alphaword, word, and list, where the sorts
alpha and punc are subsorts of char, the sorts
alpha, puncword and alphaword are subsorts of
word, and the sort word is a subsort of list.
These subsort relationships are shown in Figure 2.
The sorts char, alpha, and punc are respectively
for character, alphabetic character, and punctuation character; the sorts
puncword, alphaword, and word are respectively for
words consisting of alphabetic characters with a final punctuation sign, words
with all alphabetic characters, and the union of these two.
Figure 2: Sorts and Subsorts for ST
The sort list is level 1, the "top" sort
of this system; word is level 2; alphaword and
puncword are level 3; and char, alpha and
punc are level 4. The punctuation characters are comma and period
(of course we could add more). The following defines concatenation
constructors for constructing a list of alphabetic characters as a
alphaword, a list alphaword followed by a punctuation as a
puncword, and a list of words as a list; a functional
notation is used:
_ _ : alpha alphaword -> alphaword
_ _ : alphaword punc -> puncword
_ _ : word list -> list
Here the two underbars give the syntactic form of the function, which is the
concatenatation of its two arguments, which respectively have the two types
given between the colon and the arrow; the result type then comes after the
arrow. These three operations also satisfy associativity equations, such as
the following:
(" W L L') (W L) L' =
W (L L') .
The context free grammar grammar G given
below allows some lists of punctuated words to be recognized as legal
sentences of that grammar; such sentences can be parsed, which means dividing
them into phrases, which are sublists, each with its "part of speech"
(or syntactic category) explicitly given. The grammar G will become
the signature of the sign system ST. The non-terminals of
G are S, NP, VP, N, Det,
V, PP and P, which stand for sentence, noun phrase,
verb phrase, noun, determiner, verb, prepositional phrase, and preposition,
respectively. Then the rules of a simple example G might be the
following:
S -> NP VP
NP -> N
NP -> Det N
VP -> V
VP -> V PP
PP -> P NP
The non-terminals of this grammar (i.e., the "parts of speech" S,
NP, VP, etc.) are the sorts of the sign system
ST, and the rules of the grammar will become its
constructors. For example, the first rule says that a sentence can be
constructed from an NP and a VP. There should also be some
constants of the various sorts, such as
N -> time
N -> arrow
V -> flies
Det -> an
Det -> the
P -> like
Then, for example, a parse tree for the sentence "time flies like an arrow" is
shown in Figure 3.
Figure 3: Parse Tree for ST
By the way, if we add the productions below to the grammar G, then the
sentence gets another parse, a fact which the reader might enjoy checking.
NP -> N N
V -> like
There is a systematic way to convert context free rules into constructor
operations in the signature of a sign system; for the above grammar G,
it is as follows, written in a functional notation:
sen : NP VP -> S
nnp : N -> NP
np : N Det -> N
vvp : V -> VP
vp : V PP -> VP
pp : P NP -> PP
time : -> N
flies : -> V
........
In this context, it is more elegant to regard N as a subsort of
NP, and V as a subsort of VP, rather than to have
monadic operations N -> NP and V -> VP. This sign
system gives what computer scientists call abstract syntax for
sentences; it gives an abstract algebraic representation for syntactic
structure, in which the operations above generate a free algebra of
terms that describe parses. For example, the term that represents the syntax
of our example sentence is:
sen(time, vp(flies, pp(like, np(an, arrow)))) .
Equations can be used in this algebraic setting to express constraints on
sentences, for example, that the number of the subject and of the verb agree
(i.e., both are singular or else both are plural). Each of the concrete ways
to realize abstract syntax (trees, terms, bracket notation, and lists) can be
considered to give a model of the sign system ST providing a
set of signs for each sort, and operations on those sets which build new signs
from old ones.
The sign system SP should have
sorts for lines and pages, and could also have different fonts and subscripts,
in order to display the bracket notation to display parses. We omit the
details, which are not very different from those above, except for an equation
to limit the length of lines, e.g., to 80 characters, such as the following:
(" L : line)
length(L) £ 80 .
The morphism P : SW
® ST is very partial,
since it is defined on a list l iff l can be parsed using
G; thus the subset of lists on which it is defined is the set of
sentences generated by G, which is usually denoted L(G). If
fr(t) denotes the frontier, or list of leaf nodes, of a parse
tree t, then fr(P(l)) = l for all l Î L(G), which is a strong preservation property,
although it only holds on a small subset of lists of words. The morphism
P also preserves all of the sort hierarchy in Figure 2.
We do not describe the morphism
H : ST ®
SP in as much detail as we did the morphism
P : SW ®
ST. The morphism H is essentially a pretty printer for
parse trees; it could use any of the representations we have been discussing,
and it could even just print the frontier of the parse tree, although this
would preserve much less structure (see the next subsection for more
discussion of structure preservation). As already mentioned, the morphism
F : SW ®
SP is the composition P;H; it pretty prints the parse
trees of those lists of words that can be parsed by G.
2.4 Quality of Representation
It is easy to define sort preserving, constructor preserving, level
preserving, content preserving (where content refers to the values of
selector operations, such as size and color), etc. But this is not as useful
as one might hope, because in practice, these are often not
preserved. Instead, we define the comparative notions of more sort
preserving, more level preserving, more constructor preserving, more content
preserving, etc. [Goguen, 1999a].
These notions define orderings on morphisms, which can be logically combined
to get the right one for a given application [Goguen, 1999a]. This is important because given morphisms
M,M¢, one may preserve more levels, while
the other preserves more content, and similarly for the other concepts.
Empirical work has validated the following general principles:
- It is more important to preserve structure than content (this is called
Principle F/C).
- It is more important to preserve level than priority.
- Structure and content at lower levels should be sacrificed in favor of
those at higher levels.
- Lower level constructors should be sacrificed in favor of higher level
constructors.
The first principle is perhaps the most important, and at first might
seem counter-intuitive, but many special cases can be found in the design
literature, e.g., [Tufte, 1983].
It asserts that when a trade-off is necessary, form should be weighted more
heavily than content; in general, the right balance between form and content
can only be determined after knowing how a representation will actually be
used. Also, we are fortunate that it is easier to describe structure than
content.
These principles do not explain everything, for
example, they do not explain why the tree representation of phrase structure
is better than the bracket representation, since these two representations
have exactly the same structure and content, but display them differently. In
fact, the advantage of the tree representation arises from human
visuo-cognitive capabilities, which prefer a more explicit diagrammatic
representation of phrases and subphrase relations as nodes and edges, over a
linear symbolic representation that requires counting brackets. Preservation
of form and content can respectively be formalized as preservation of
constructors and selectors, in the sense of abstract data type theory [Goguen et al., 1978; Goguen and Malcolm, 1996].
3 Fragments of a Calculus of
Representation
Section 2.3 defined the composition of semiotic
morphisms and showed that it can be important for applications. It is easy to
prove that this definition of composition obeys the following identity and
associative laws, in which A : R ® S, B : S ® T and C : T ® U,
A ; 1S = A
1S ; B = B
A ; (B ; C) = (A ; B) ; C
where 1S denotes the identity
morphism on S. These three laws are perhaps the most fundamental for a
calculus of representation, since they imply that semiotic theories and their
morphisms form what is called a "category" in the relatively new branch of
mathematics called category theory [Mac Lane, 1998]. The basic ingredients of a
category are objects, morphisms, and a composition operation that
satisfies the above three laws, and that is defined on two morphisms if and
only if they have matching source and target. Perhaps surprisingly, many
important mathematical concepts can be defined abstractly in the language of
category theory, without reference to how objects are represented, using only
morphisms and composition; moreover, many general laws can be proved about
such concepts, and these automatically apply to every category.
Three of the simplest categorical concepts are
isomorphism, sum and product. A morphism A : R ® S is an isomorphism if and only if
there is another morphism B : S ® R such that A ;B =
1R and B ;A = 1S, in which case B
is called the inverse of A and denoted A-1; it can be proved that the inverse of a
morphism is unique if it exists. The following laws can also be proved,
assuming that A : R ®
S and B : S ® T
are both isomorphisms (and no longer assuming that B is the inverse of
A).
1R-1 = 1R
(A-1)-1 = A
(A ; B)-1 = B-1 ; A-1
Because sign systems and their morphisms form a
category, these three laws apply to representations. In Section 3.2 below, we discuss sums of semiotic morphisms as a
special case of blends of semiotic morphisms (blends are discussed in the next
subsection), and we also give some laws for blends and sums in Section 3.2. The standard mathematical reference for category
theory is by Sanders Mac Lane [Mac Lane, 1998], but [Pierce, 1991] is one computer science oriented
introduction, among several others.
3.1 Metaphor and Blending
Research in cognitive linguistics by George Lakoff and others under the banner
of "conceptual metaphor theory" (abbreviated "CMT") has greatly deepened our
understanding of metaphor [Lakoff and Johnson, 1980; Lakoff, 1987], showing that many metaphors come in
families, called image schemas, that share a common pattern. One
example is BETTER IS UP, as in "I'm feeling up today," or "He's moving up into
management," or "His goals are higher than that." Some image schemas,
including this one, are grounded in the human body5 and are called basic image
schemas; they tend to yield the most persuasive metaphors, as well as to
enhance the sense of immersion in virtual worlds.
Fauconnier and Turner [Fauconnier and
Turner, 1998; Fauconnier and Turner,
2002] study blending, or conceptual integration, claiming
it is a basic human cognitive operation, invisible and effortless, but
nonetheless fundamental and pervasive, appearing in the construction and
understanding of metaphors, as well as in many other cognitive phenomena,
including grammar and reasoning. Many simple examples are blends of two
words, such as "houseboat," "jazz piano," "roadkill," "artificial life,"
"computer virus," "blend space," and "conceptual space." To explain such
phenomena, blending theory (abbreviated "BT") posits that concepts come in
clusters, called conceptual spaces, which consist of certain items
and certain relations that hold among them. Such spaces are relatively small
constructs, selected on the fly from larger domains, to meet an immediate
need, such as understanding a particular phrase or sentence6. The abstract
mathematical structure of a conceptual space consists of a set of atomic
elements together with a set of relation instances among those elements [Fauconnier, 1985].
Conceptual mappings are partial functions from the item and relation
instances of one conceptual space to those of another, and conceptual
integration networks are networks of conceptual spaces and mappings that
are to be blended together.
Figure 4: A Blend Diagram
We now describe our generalization of blending
from conceptual spaces to semiotic theories. A simple example where this
generality is needed is in the integration of a window with its scrollbar,
which is structural, not conceptual, although conceptual aspects of this blend
could also be studied; this example is discussed in considerable detail in [Goguen, 2003]. To indicate this
added generality, we will use the terms structural blending or
structural integration for the blending of semiotic systems, which in
general involves non-trivial constructors; but for consistency with BT, we use
the phrase "semiotic space" instead of "semiotic theory" in this discussion.
The simplest form of blend is as shown in Figure 4,
where I1 and I2 are called the input
spaces, B the blend space, and G the generic
space, which contains conceptual structure that is shared by the two input
spaces7. Let us call
I1, I2, and G together with the two
morphisms G ® I1
and G ® I2 an
input diagram. Then a blendoid over a given input diagram is a
space B together with morphisms I1 ® B, I2 ® B, and G ® B, called injections, such that the
diagram of Figure 4 weakly commutes, in the
sense that both compositions G ®
I1 ® B and G
® I2 ® B are weakly equal to the morphism
G ® B, in the sense that each
element in G gets mapped to the same element in B under them,
provided that both morphisms are defined on it8. The special case where all four spaces
are conceptual spaces gives conceptual blends. This diagram is "upside down"
from that used by Fauconnier and Turner, in that our arrows go up, with the
generic G on the bottom, and the blend B on the top. Our
convention is consistent with duality mentioned earlier, as well as with the
way that such diagrams are usually drawn in mathematics, and with the image
schema MORE IS UP (since B is "more"). Also, Fauconnier and Turner do
not include the map G ® B. By
definition, the maps G ®
I1 and G ®
I2 are total, not partial, and if the input spaces were
minimal, then the maps I1 ®
B, I2 ® B, G
® B would also be total.
Usually an input diagram has many blendoids,
only a few of which are interesting. Weak commutativity of the blend
diagram, which is included in the definition, is a good first step, but still
leaves too many possibilities. Therefore additional principles are needed
for identifying the most interesting possibilities, so that we can define a
blend to be a blendoid that is optimal with respect to these
principles. Fauconnier and Turner suggest a number of "optimality
principles" that serve this purpose (see Chapter 16 of [Fauconnier and Turner, 2002]), but they are too vague to
be easily formalized. A tentative and difficult but precise mathematical
approach is given in Appendix B of [Goguen and
Malcolm, 1996], based on a modification of the category theoretic notion
of "pushout" [Mac Lane,
1998]; this modification takes advantage of an ordering relation on
morphisms, along the lines discussed in Section 2.4.
The intuition is that nothing can be added to or subtracted from such an
optimal blendoid without violating consistency or simplicity in some way.
However, there can still be more than one blend in this sense, as an example
discussed below will make very clear. It should also be noted that this
notion of blend easily generalizes to any number of semiotic spaces, and even
to arbitrary diagrams of semiotic spaces and morphisms, for which there are
many significant applications. Thus, the emphasis in [Fauconnier and Turner, 2002] on blends having the form of
Figure Figure 4 seems inappropriate for algebraic
semiotics, because its major applications typically involve multiple spaces
and multiple morphisms among them.
It has perhaps not been sufficiently emphasized
in the BT literature that blending does not always give a unique result. For
example, the following are four different blends of conceptual spaces for
"house" and "boat":
- houseboat;
- boathouse;
- amphibious RV; and
- boat for moving houses.
The last may be a bit surprising, but I once saw such a boat in Oban,
Scotland, transporting prefabricated homes to a nearby island. There are also
some other, even less obvious blends [Goguen and Harrell, 2004].
In the UCSD Meaning and Computation Lab, Fox
Harrell and I have been experimenting with a blending algorithm, which has
generated novel metaphors, which in turn were used in generating poems [Goguen and Harrell, 2004], with some
success before a live audience. The algorithm uses dynamic programming to
generate blends in approximate order of optimality, and if requested, can
generate all possible blends, including even very bad ones. One surprise was
that there were so many blends, for example, 48 for the (small) house and boat
spaces.
The CMT view of metaphor associates aspects of
one domain to another, and describes this association using a mapping, of
which the target domain concerns what the metaphor is "about." On the other
hand, BT views metaphors as "cross-space mappings" that arise from blending
conceptual spaces extracted from the domains involved. For example, the
metaphor "my love is a rose" arises from blending conceptual spaces for "my
love" and "rose," such that the identification of the two items "love" and
"rose" in the blend space gives rise to a correspondence between certain items
in the rose space and the target love space. Such metaphoric blends
are asymmetric, in that as much as possible of the target space is
imported into the blend space, whereas only key aspects from the source space,
associated with elements that have been identified with elements of the target
space, are imported, e.g., sweet smell and attractive color; moreover, names
from the top space take precedence over those in the source space, so that
relations in the source space become "attributed" to items in the target
space. Our approach differs from orthodox BT not only in that we allow many
more kinds of structure in our spaces, but also in that we do not first
construct a minimal image in the blend space and then "project" that material
back to the target space, but instead, we construct the entire picture in the
blend space. Thus it is not the case for us that, in forming the blend,
elements are preferentially omitted from the target space, only to be restored
upon projection, as described in [Grady et al., 1999]. Since CMT has
been mainly concerned with families of metaphors having a shared pattern, and
BT has been more concerned with how novel metaphors can be understood, the two
theories are compatible, and can both play a role in understanding complex
language. This and related issues are discussed with many interesting details
and examples in [Grady et al., 1999].
Algebraic semiotics also goes beyond conceptual
spaces in allowing entities that have dynamic states; this is necessary for
applications to the dynamic entities that appear in user interface design and
virtual worlds. Actually, two kinds of dynamics are involved in blending: the
process of blending itself, and entities with internal states. Whereas
cognitive linguistics has so far focussed mainly on the former, algebraic
semiotics is more concerned with the latter. Another difference from BT is
that relations like causality are represented as ordinary relations rather
than being given a special ad hoc status as "vital relations."
The conceptual spaces, mappings and blending of
cognitive linguistics seem well adapted for treating many aspects of
literature, as in [Turner, 1997],
as well as some recent trends in art, including (the very aptly named)
conceptual art movement, and with the conceptual aspects of works in many
other styles, which are often designed to provoke conceptual conflicts or to
force unusual conceptual blends. One important application is the combination
of music with lyrics, as skillfully studied using cross-domain mappings by
Lawrence Zbikowski [Zbikowski,
2002]. Unfortunately, the framework of conceptual blending seems too
restricted for studying blending within music, e.g., harmony,
polyphony, polyrhythm, etc., because musical structure is inherently
hierarchical, and hence cannot be adequately described using only atomic
elements and relation instances among them. Understanding how a particular
melody, chord sequence, and rhythm can work together requires attention to the
component notes, phrases, chords and beats, as well as to their subcomponents.
However, it seems that the added generality of semiotic spaces and semiotic
morphisms is adequate for such purposes.
3.2 Some Laws
This subsection gives some further fragments of a calculus of representation;
see [Goguen, 1999a] for more detail. Here
a,b,c are semiotic morphisms, and xà
y denotes some choice of a blend that is maximal with respect to some
optimality criterion:
a à b @ b à a
a à (b à c)
@ (b;a) à c
(a à b) à c
@ a à (b; c)
In the following, A,B,C may be either
semiotic morphisms or just semiotic systems. Sums, denoted +, are the
special case of blend where the base theory is 1, which is the theory
having exactly one constant, its top element, and nothing else.
A + 1 @ A
1 + A @ A
A + B @ B + A
A + (B + C) @ (A + B) + C
It should be noted that products of models
correspond to sums of theories, that is, a model of a sum of theories is a
product of models of the summand theories, and vice versa, or even more
formally, there is an isomorphism of categories of models,
Mod(A+B) @ Mod(A) ×Mod(B) ,
where A, B and are semiotic theories; see
[Goguen, 1999a] for details.
4 Case Studies
This section surveys some case studies applying algebraic semiotics. Noting
that we have already discussed blending and metaphor, the following additional
case studies are considered:
- information visualization;
- proof presentation;
- humor; and
- user interaction.
The first category in this list actually contains three small case studies,
and the second can also be considered a special case of it; proof
visualization is our most extensive case study, part of a large project to
produce a web-based system to support theorem proving. The study of humor is
somewhat of a digression, but it is hoped that the reader will find it
amusing. Proof navigation is used to illustrate how interaction is treated in
algebraic semiotics, although many details are left out, because the formal
theory of dynamic signs is technically rather complex.
4.1 Information Visualization
Visualizing complex data can help to discover, verify and predict patterns,
and to quickly locate specific information; but it can be difficult to
construct the appropriate visualizations for these purposes. Because
visualizations are representations, our theory applies to them, and in
particular, our quality measures apply. The following subsections analyze
three real visualization systems as semiotic morphisms, and on that basis,
suggest some improvements. We found it convenient to use algebraic semiotics
in a semi-formal style, letting the ideas and results guide the re-design, and
introducing formal details only to the degree that they actually help with
decisions. Many aspects of these discussions follow [Goguen and Harrell, 2003].
Figure 5: A Code Browser
4.1.1 Code Visualization
A visualization tool for code developed at ATT Bell Labs, and discussed in [Shneiderman, 1997],
displays the large grain structure of code by omitting details; see Figure 5. This is an excellent illustration of Principle F/C
(Section 2.4): commands are indistinguishable lines,
but files and procedures are easily distinguished, and the age of code is
highlighted with color (though it shows up as shades of gray in this figure),
presumably because code age is so important for software maintenance, which
accounts for most of the cost of large software systems. Moreover, code at
the command level can be viewed in a separate window, which is activated by
"zooming in" from the main overview window. However, software engineers often
need to find other specific features of code, such as:
- occurrences of particular variables,
- certain uses procedure calls, and
- certain uses of pointers.
Or consider what would be needed to work on the Y2K problem. To support
this kind of flexibility, the system should allow users to select and
highlight a variety of features to be displayed with color, not just code age;
indeed, each feature listed above could be highlighted with a different color,
because these features are binary (i.e., they either occur or do not occur, at
any given point in the program), rather than, like age, being measured on a
(nearly) continuous scale.
Figure 6: FilmFinder Display
4.1.2 A Film Visualizer
Figure 6 shows FilmFinder, a system to help consumers find
films, designed in Ben Shneiderman's group at the University of Maryland, as
described in [Shneiderman,
1998]. The vertical axis indicates popularity, the horizontal axis
indicates the release date, and the color9 indicates the genre; the area on the
right side of the display is for controlling the system. This complex sign is
the image under an appropriate semiotic morphism of a sign in a space of
information about films. From this, we infer that the designer of the system
thought users would consider the popularity, date, and genre to be the most
important attributes of films.
Instead of thinking of it as a consumer product,
it is interesting to think of this system as a scientific tool for displaying
data about the movie industry. Using it in this way, we can see that the
density of films increases rapidly in the most recent years displayed, except
perhaps for those genres that are the least popular; and we can also easily
see some other facts, such as that there has always been a higher percentage
of drama, and that there are increasing percentages of action and horror.
However, this representation is less useful than
it could be for this purpose. The problem is again that too much content and
not enough structure have been preserved. For example, it would be better to
aggregate all films having approximately the same attributes of interest into
one blob, and then display the number of films in a blob using a distinct
visual attribute, such as size or brightness. Successive blobs of the same
kind could then be connected by lines having the same color as the blobs.
Users could click on a blob to see what's in it, preferably displayed
graphically in a new popup window. These revisions would facilitate
hypothesis formation, and would also make the tool more useful for consumers,
especially when (as in the most recent years that are not represented in the
figure) there are many more films.
Figure 7: SpotFire Version of FilmFinder
4.1.3 A Later Version
Figure 7 shows a later version (SpotFire from ivee
Development in Sweden) of the FilmFinder tool in Figure 6;
the main improvement is that users have more control over what is displayed
and how it is displayed. This particular display has length and date as its
axes, and again uses color for genre, although the genre color coding scheme
is not explicitly shown; prize winning films are highlighted by having a
larger size. It is interesting to observe a clustering at around 90 minutes
length. But once again, the display is difficult to use because there are too
many dots, even though this display cuts off at 1990! If the user is seeking
a particular film or class of films, she will want to narrow the search focus
by imposing additional constraints, but from this single display, it is
difficult to know how easily that could be done. We are presumably supposed
to assume that the (possibly imaginary) user who created this display
considered these particular attributes the most interesting at a certain point
during a sequence of displays constituting a search; but in fact, they do not
seem especially useful for any particular purpose.
We can also infer what the designer of this
version thought would be most important, by examining the controls on the
right of the display; we can hope that these were determined by polling an
adequate pool of typical users. But the key issue is how convenient these
controls are for scenarios that typical users find particularly important;
most likely, those typical users are looking for a good video to rent, rather
than analyzing trends in the movie industry, and so the controls should
reflect the key actions involved in those searches, rather than just the most
important general attributes of films. It would take some experimental work
to determine these most relevant search attributes, but we can still criticize
the design of the control console, because of its exclusive focus on simple
attributes instead of structure. And we can also criticize the fine grain
control that it gives users over length and year, and suggest instead that
soft constraints would be more appropriate; it also seems doubtful that length
is a highly significant attribute for search. Moreover, we can criticize the
design philosophy, advocating instead a more social approach that relates the
profile of one user to the profiles of other users to select films that
similar users have found interesting (there are numerous variations on this,
such as listing films that a user's friends have liked; Amazon has exploits
similar strategies very successfully). Finally, we can note that the design
ideas proposed to improve the previous version of this system also apply to
the new version.
4.2 Proof Representation and
Understanding
It is well known (perhaps too well known to many unhappy students)
that understanding mathematical proofs can be very difficult. But why
is it difficult? And how can this situation be improved?
The UCSD Tatami project [Goguen
et al., 2000] aims to make proofs more interesting and even
enjoyable to read, by viewing them as representations of their underlying
mathematics, so that algebraic semiotics, including its theory of
representation quality, can be applied.
The Kumo system generates proof websites, based
on user-provided sketches in a language called Duck. The pages are in XML,
displayed using XSL style sheets, and can be viewed over the web using any
browser. The complex signs that users actually see are called
proofwebs, consisting of English phrases and sentences, mathematical
signs, navigation buttons, formal input and output for a mechanical theorem
prover, etc. [Goguen et al.,
2000; Goguen, 1999b].
Our view of what constitutes the underlying
mathematics to be displayed is unusual: we consider it to include not just the
tree structure of proofs, decorated with formal sentences and rules, as is
common among computer scientists and logicians, but also:
- a dramatic structure, following Aristotle (see below);
- a narrative structure (following ideas of Labov [Labov, 1972] and Linde [Linde, 1981], as briefly described below);
- hyperlinks to related material, including tutorials for proof rules used,
input and output to a formal theorem prover (if available), and motivation and
explanation for proof strategies and steps; and
- image schemas (in the sense of Lakoff [Lakoff and Johnson, 1980; Lakoff, 1987]). As with any virtual world, image
schemas can make the language more direct and powerful, and hence easier to
follow.
Aristotle said, "Drama is conflict" [Aristotle, 1997], which suggests providing conflict
to add drama to proofs. Finding a non-trivial proof usually requires
exploring many dead ends, errors and misconceptions, some of which may be very
subtle. Therefore the process of proving can be full of disappointed hopes,
unexpected triumphs, repeated failures, and even fear and interpersonal
conflict. All this is typically left out when proofs are written up, leaving
only the map of a path that has been cleared through the jungle. But proofs
can be made much more interesting and understandable if some of the conflicts
that motivated their difficult steps are integrated into their structure;
proof obstacles are exactly what is needed for drama. Of course, this must be
done with care, and it should not be overdone, just as in a good novel or
movie. Our Kumo theorem proving system [Goguen et al., 2000] used these ideas to structure
the websites that it generates to display proofs. Aristotle also gave other
useful suggestions, including unity of time and place, and having a beginning,
middle, and end to a drama [Aristotle, 1997].
William Labov [Labov, 1972] showed that oral narratives of personal
experience have a precise internal structure, which includes the following:
- an optional orientation section, which provides basic orientation
information, such as the time and place of the story, and perhaps some major
characters;
- a sequence of narrative clauses which describe the events of the
story;
- the narrative presupposition, which by default assumes that the
ordering of the narrative clauses corresponds to the temporal ordering of the
events that they describe;
- evaluative material integrated with the narrative clauses, which
"evaluates" the events, in the sense of relating them to socially shared
values; and finally
- an optional closing section, which may contains a "moral" or a
summary for the story.
The above follows [Linde, 1981;
Linde, 1993], which describe
developments subsequent to the classic treatment of [Labov, 1972]. Although this empirical research used
oral narratives of personal experience as data, its results apply much more
broadly (though in general less precisely), since the class of narratives is
the core around which many other discourse types are built.
Figure 8: A Typical Tatami Homepage and Proof Page
To aid our discussion of proofs as representations, we introduce terminology
for the source and target semiotic spaces: let us call their elements
abstract proofwebs and proof displays, respectively, and perhaps
also use the term display proofweb for target signs. In addition, the
term unit refers to a block of information of the same kind in a proof
display. The display proofwebs generated by Kumo adhere to the following
style guidelines, called the tatami conventions [Goguen et al., 2000]. The first eight are justified
mainly by narratology:
- Homepages are provided for every major proof part; homepages
introduce and motivate the problem to be solved and the approach taken to the
solution for that part, and correspond to the orientation sections of Labov's
narrative structure; they may contain graphics, applets, and of course text.
Homepages appear in the same window as their tatami pages (see the next item)
because they are part of the same narrative flow. See Figure 8.
- Tatami pages, also called proof pages, are the basic
constituents of display proofwebs; they are XML pages containing one or more
proof unit, with its inference rule applications, interleaved with one or
more explanation units. This interleaving follows the integration of
narrative and evaluative material in Labov's theory. Limiting the number of
non-automatic proof steps on tatami pages to approximately 7 is justified by
classic work of Miller on limitations of human cognitive capacity [Miller, 1956]; this limitation also
makes it feasible to place both proof and explanation units on the same proof
page. See Figure 8.
- The explanation units of tatami pages are prover-supplied
informal discussions of proof concepts, strategies, obstacles, etc. They
correspond to the evaluative material in Labov's theory, and motivate
important proof steps by relating them to values shared in the appropriate
community of provers.
- Tatami pages can be browsed in an order designed by the prover to be
helpful and interesting to the reader; if possible, they should tell a story
about how obstacles were overcome (or still remain). This narrative order
again comes from Labov's theory, while including obstacles comes from work of
Campbell [Campbell, 1973] and others on "heroic" narratives.
- Major proof parts, including lemmas, have their own subwebsites, each
with the same structure as the main proof, including homepage and explanation
units. These appear in a separate dedicated persistent popup window. Having
separate hyperlinked subwebsites for major proof parts is similar to the way
that flashbacks and other temporal dislocations occur in stories. It is
helpful to have them in a separate window in order to clarify their relation
to the main sequence of proof steps.
- Tatami pages also have associated formal proof scores, which appear in
another separate popup window when summoned from a tatami page. The separate
window is convenient because users typically want to look at the formal proof
and its motivation at the same time as the proof score. Users can also
request proof score execution, and the result is displayed in the same window
as the score, so that one can easily alternate between them. (The proof
score is sent to an OBJ server and the result is returned for display.) This
hiding of routine details is similar to human proofs, which use it to
highlight the main ideas [Livingston, 1987].
- Major proof parts can have an optional closing page, to sum up
important results and lessons, again following Labov's theory. They appear
in the same window as proof pages, again because they are part of the same
narrative flow.
- A menu of open subgoals appears on each homepage, and error messages
are placed on appropriate pages. Open subgoals are important to provers when
they read a proof, since proving new results is a major value within this
community.
Now we give further style guidelines, with justifications based on algebraic
semiotics:
- Windows: The main contents of a display proofweb are its proof
steps, informal explanations, tutorials, and mechanical proof scores. These
four are also the main contents of abstract proofwebs, and their preservation
has much to do with the quality of their representation. These four basic
sorts of the abstract data type for proofwebs are reflected in our choice of
windows for displaying them. Because tatami pages are the main constituent of
proofwebs, theirs is the master window, and because explanation pages are so
closely linked, they share that window; each unit is enclosed is its own
"box." Tutorial and machine proof score pages each have a separate window.
All this preserves the hierarchical structure and priorities of the underlying
mathematics.
- Backgrounds: Each major sort of unit has its own background color:
proof units have light beige, explanations have light yellow, tutorials have
yellow marble, and proof scores have light purple. Although the choice of
colors is somewhat arbitrary, and is easily changed by editing the XSL style
file, their distinctness reflects the importance of distinguishing these four
units.
- Navigation: Similar considerations hold for navigation. Each
page has a title, supplied by the user in the Duck script (or a simple default
if no title is supplied). Buttons are used to move to other pages of the same
sort, and to open widows that display information of other sorts. Each
persistent window has somewhat different layout and navigation buttons,
reflecting its different typical uses. For example, the master tatami window
has buttons to step through the narrative ordering of tatami pages, both
forward and backward, and a button to return to the homepage.
- Mathematical Formulae: gif files are used for mathematical
symbols, in a distinctive blue color, because mathematical signs come from a
domain that is quite distinct from that of natural language.
Some additional applications of semiotic morphisms to the user interface
design of the Tatami system are described in [Goguen, 1999b], in a more precise
style than here, although they are based on an older version of the system.
For example, [Goguen, 1999b] shows that certain early designs for the status
window were incorrect because the corresponding semiotic morphisms failed to
preserve certain key constructors.
We have studied a corpus of over 50 "humorous
oxymorons" (phrases like "military intelligence," "good grief," and "almost
exactly"). Dictionaries say that an "oxymoron" is a phrase having
contradictory (or incongruous) components. But this is not what happens in a
humorous oxymoron: instead, there are two distinct meanings, one of which is
conventional, and the other of which has some contradictory components; i.e.,
there are two different blends, one of which has conflicts. When we are told
that something is an oxymoron, we seek out that second, conflictual blend,
and we feel pleasure when we find it.
We also studied over 40 newspaper cartoons, and found that about 75 percent
have a similar pattern, but instead of two blends existing simultaneously, the
reader is first led to form one blend, and then led by new information to form
a completely different blend, usually in partial conflict with the first; that
is, there is a kind of dynamic reblending.
Thus in each case, it is not just the existence of more than one blend, but
rather the process of reblending that produces the humorous effect,
and I conjecture that reblending in fact characterizes humor. This is
relevant to HCI and the design of virtual worlds, because humor is sometimes
used in computer system interfaces, often very badly. For example, the
paperclip in MicroSoft Office creates a poor impression in part because the
sensation of reblending loses loses its effect if it is repeated many times,
and eventually becomes "stale" or even unpleasant. See [Goguen, 2004a] for
additional details. These observations, which go back to about 1999, seem to
have potential for fascinating new application areas.
4.4 Interaction
Classical semiotics is concerned with static
signs; it does not allow for signs that change in response to user input, or
that move on their own. This section sketches how algebraic semiotics handles
dynamics, by extending its foundation from classical algebra to hidden
algebra. As a simple example, consider the problem of designing that part of
the Kumo interface that supports browsing proofs. Kumo provides buttons to
traverse in the proof author's chosen narrative order, labeled with iconic
triangles to indicate forward and backward motion, as well as buttons to
return to the homepage, to view the specification, etc. (see Figure 8). Common practice would suggest constructing an
automaton with a state for each proof tree node, and a transition label for
each traversal button. But this does not allow for the fact that different
proofs have different structures, and thus different automata, nor does it
account for the different displays that are produced in each state, nor for
the variety of possible implementations of transition lookup, e.g., using
lists, arrays, or hash tables. An automaton can describe how a single proof
instance can be navigated, but it cannot describe the general method which
generates proof navigation support for any given proof, nor the way that this
method is implemented, nor the quality of the resulting interface.
In fact, despite the formal character of the model itself, the construction
and use of transition diagrams (or the corresponding automata) in user
interface design is intuitive, and does not provide an adequate basis for a
rigorous mathematical analysis of possible designs. In order to address the
display, implementation and quality questions raised above, the automaton
model must be supplemented in various ad hoc ways, whereas hidden
algebra can handle all of these within a single unified framework. Another
example of dynamics in Kumo that would be difficult to handle with traditional
user interface modeling techniques is the facility to execute the proof script
for a proof part by downloading it to a BOBJ proof server and then viewing the
result on the local browser as it executes.
This is not the place for details (see [Goguen, 2003] for that), but we can
say that hidden algebra provides a precise way to handle both the display and
implementation aspects of examples like that described above, and the
corresponding extension of semiotic morphisms gives a precise basis for
comparing the quality of interface designs realizing the desired dynamics,
without bias towards any particular implementation. The dynamics of a window
with a scrollbar is discussed in considerable detail in [Goguen, 2003].
5 Summary, Future Research, and Social
Implications
This paper has presented theory and case studies to support the claim that
algebraic semiotics is a promising foundation for virtual world design, in
both theory and practice. The case studies on information visualization,
proof presentation, metaphor, humor, and interaction are encouraging, and
suggest that design problems can be successfully confronted directly, without
unreliable ad hoc methods and assumptions, such as analyses based on
prior systems that are only remotely related, or expensive, time-consuming
methods of experimental psychology and usability testing. These studies also
confirm our views that taking account of key social and cognitive factors is
crucial for success, and that formal methods can play a very helpful role, if
applied pragmatically rather than dogmatically. However, much more work is
still needed, such as:
- Combining Gibsonian affordances [Gibson, 1977] with algebraic semiotics, to provide a
socio-cognitive dimension for the interaction formalism discussed in Section
4.4.
- Studying immersion in virtual worlds, e.g., how closure and embodiment
relate to representational coherence, image schemas, affordances, choice of
media, etc.
- More work on social foundations and the processes of semiosis.
- More work on narrative structure, including flashbacks and flashforwards.
- More work on how to choose quality orderings on representations that are
appropriate to their actual use.
- More case studies, done more thoroughly.
Only the second of these is specific to virtuality, though all are
related. We hope that readers of this paper may find some benefit to the
algebraic semiotic approach, and will contribute to its further development.
I close this article with some words of warning,
along lines perhaps most closely associated with Jean Baudrillard [Baudrillard, 1994], who
wrote:
Simulation is no longer that of a territory, a referential being, or a
substance. It is the generation by models of a real without origin or
reality. ... By crossing into a space ... no longer that of the real, nor
that of truth, the era of simulation is inaugurated by a liquidation of all
referentials - worse: with their artificial resurrection in the systems of
signs, a material more malleable than meaning, in that it lends itself to all
systems of equivalences, to all binary oppositions, to all combinatory
algebra. It is no longer a question of imitation, nor duplication, not even
parody. It is a question of substituting the signs of the real for the real,
that is to say of an operation of deterring every real process via its
operational double, a programmatic, metastable, perfectly descriptive machine
that offers all the signs of the real and short-circuits all its
vicissitudes. Never again will the real have the chance to produce itself -
such is the vital function of the model in a system of death ...
If we translate this out of the stylistic conventions of recent French
intellectualism, the danger is that the virtual can replace the real in our
affections, so that we lose touch with our communities, our values, even the
very living quality of our lives. Baudrillard claims that exactly such
alienation is already characteristic of the contemporary world, and that it is
growing like a cancer. He does not offer any solution to this dilemma, but I
would like to suggest that compassion [Goguen, 2004b] is one way out of an
enervating absorption in virtuality. A sympathetic feeling for the suffering
of others, and action on their behalf, can generate positive emotionality and
re-engagement with real experience. And, contrary to Baudrillard, it seems
quite possible that technology, including virtual world technology, can assist
with such projects.
References
- [Aristotle, 1997]
-
Aristotle (1997). Poetics. Dover. Translation by
S.H. Butcher; original from approximately 330 B.C.
- [Barthes, 1968]
-
Barthes, R. (1968).
Elements of Semiology.
Hill and Wang.
Trans. Annette Lavers and Colin Smith.
- [Barwise and Perry, 1983]
-
Barwise, J. and Perry, J. (1983).
Situations and Attitudes.
MIT (Bradford).
- [Baudrillard, 1994]
-
Baudrillard, J. (1994).
Simulacra and Simulation.
Michigan.
Translated by Sheila Faria Glaser.
- [Campbell, 1973]
-
Campbell, J. (1973).
The Hero with a Thousand Faces.
Princeton.
Bollingen series.
- [Carroll, 1982]
-
Carroll, J. (1982).
Learning, using, and designing filenames and command paradigms.
Behavior and Information Technology, 1(4):327-246.
- [Fauconnier, 1985]
-
Fauconnier, G. (1985).
Mental Spaces: Aspects of Meaning Construction in Natural
Language.
Bradford: MIT.
- [Fauconnier and Turner, 1998]
-
Fauconnier, G. and Turner, M. (1998).
Conceptual integration networks.
Cognitive Science, 22(2):133-187.
- [Fauconnier and Turner, 2002]
-
Fauconnier, G. and Turner, M. (2002).
The Way We Think.
Basic.
- [Gentner, 1983]
-
Gentner, D. (1983).
Structure-mapping: A theoretical framework for analogy.
Cognitive Science, 7(2):155-170.
- [Gibson, 1977]
-
Gibson, J. (1977).
The theory of affordances.
In Shaw, R. and Bransford, J., editors, Perceiving, Acting and
Knowing: Toward an Ecological Psychology. Erlbaum.
- [Goguen, 1994]
-
Goguen, J. (1994).
Requirements engineering as the reconciliation of social and
technical issues.
In Jirotka, M. and Goguen, J., editors, Requirements
Engineering: Social and Technical Issues, pages 165-200. Academic.
- [Goguen, 1997]
-
Goguen, J. (1997).
Towards a social, ethical theory of information.
In Bowker, G., Star, L., Turner, W., and Gasser, L., editors,
Social Science, Technical Systems and Cooperative Work: Beyond the Great
Divide, pages 27-56. Erlbaum.
- [Goguen, 1999a]
-
Goguen, J. (1999a).
An introduction to algebraic semiotics, with applications to user
interface design.
In Nehaniv, C., editor, Computation for Metaphors, Analogy and
Agents, pages 242-291. Springer.
Lecture Notes in Artificial Intelligence, Volume 1562.
- [Goguen, 1999b]
-
Goguen, J. (1999b).
Social and semiotic analyses for theorem prover user interface
design.
Formal Aspects of Computing, 11:272-301.
Special issue on user interfaces for theorem provers.
- [Goguen, 2003]
-
Goguen, J. (2003).
Semiotic morphisms, representations, and blending for interface
design.
In Proceedings, AMAST Workshop on Algebraic Methods in Language
Processing, pages 1-15. AMAST Press.
Conference held in Verona, Italy, 25-27 August, 2003.
- [Goguen, 2004a]
-
Goguen, J. (2004a).
CSE 275 homepage: Social issues in science and technology.
www.cs.ucsd.edu/users/goguen/courses/275/.
- [Goguen, 2004b]
-
Goguen, J. (2004b).
Groundlessness, compassion and ethics in management and design.
In Boland, R. and Callopy, F., editors, Managing as Designing.
Stanford.
- [Goguen and Harrell, 2003]
-
Goguen, J. and Harrell, F. (2003).
Information visualization and semiotic morphisms.
In Malcolm, G., editor, Visual Representations and
Interpretations. Elsevier.
Proceedings of a workshop held in Liverpool, UK.
- [Goguen and Harrell,
2004]
- Goguen, J. and Harrell, F. (2004). Foundations for active
multimedia narrative: Semiotic spaces and structural blending. To appear in
Interaction Studies: Social Behaviour and Communication in Biological and
Artificial Systems.
- [Goguen et al., 2000]
-
Goguen, J., Lin, K., Ro su, G., Mori, A., and Warinschi, B. (2000).
An overview of the Tatami project.
In Futatsugi, K., Nakagawa, A., and Tamai, T., editors, Cafe: An
Industrial-Strength Algebraic Formal Method, pages 61-78. Elsevier.
- [Goguen and Linde, 1984]
-
Goguen, J. and Linde, C. (1984).
Optimal structures for multi-media instruction.
Technical report, SRI International.
To Office of Naval Research, Psychological Sciences Division.
- [Goguen and Malcolm, 1996]
-
Goguen, J. and Malcolm, G. (1996).
Algebraic Semantics of Imperative Programs.
MIT.
- [Goguen and Malcolm, 2000]
-
Goguen, J. and Malcolm, G. (August 2000).
A hidden agenda.
Theoretical Computer Science, 245(1):55-101.
Also UCSD Dept. Computer Science & Eng. Technical Report
CS97-538, May 1997.
- [Goguen et al., 1978]
-
Goguen, J., Thatcher, J., and Wagner, E. (1978).
An initial algebra approach to the specification, correctness and
implementation of abstract data types.
In Yeh, R., editor, Current Trends in Programming Methodology,
IV, pages 80-149. Prentice-Hall.
- [Grady et al., 1999]
-
Grady, J., Oakley, T., and Coulson, S. (1999).
Blending and metaphor.
In Gibbs, R. and Steen, G., editors, Metaphor in Cognitive
Linguistics. Benjamins.
- [Labov, 1972]
-
Labov, W. (1972).
The transformation of experience in narrative syntax.
In Language in the Inner City, pages 354-396. University of
Pennsylvania.
- [Lakoff, 1987]
-
Lakoff, G. (1987).
Women, Fire and Other Dangerous Things: What categories reveal
about the mind.
Chicago.
- [Lakoff and Johnson, 1980]
-
Lakoff, G. and Johnson, M. (1980).
Metaphors We Live By.
Chicago.
- [Linde, 1981]
-
Linde, C. (1981).
The organization of discourse.
In Shopen, T. and Williams, J. M., editors, Style and Variables
in English, pages 84-114. Winthrop.
- [Linde, 1993]
-
Linde, C. (1993).
Life Stories: the Creation of Coherence.
Oxford.
- [Livingston, 1987]
-
Livingston, E. (1987).
The Ethnomethodology of Mathematics.
Routledge & Kegan Paul.
- [Mac Lane, 1998]
-
Mac Lane, S. (1998).
Categories for the Working Mathematician.
Springer.
Second Edition.
- [Miller, 1956]
-
Miller, G. A. (1956).
The magic number seven, plus or minus two: Some limits on our
capacity for processing information.
Psychological Science, 63:81-97.
- [Peirce, 1965]
-
Peirce, C. S. (1965).
Collected Papers.
Harvard.
In 6 volumes; see especially Volume 2: Elements of Logic.
- [Pierce, 1991]
-
Pierce, B. C. (1991).
Basic category theory for computer scientists.
MIT.
- [Saussure, 1976]
-
Saussure, F. (1976).
Course in General Linguistics.
Duckworth.
Translated by Roy Harris.
- [Shneiderman, 1997]
-
Shneiderman, B. (1997).
Designing the User Interface.
Addison Wesley.
Second edition.
- [Shneiderman, 1998]
-
Shneiderman, B. (1998).
Designing the User Interface.
Addison Wesley.
Third edition.
- [Tufte, 1983]
-
Tufte, E. (1983).
The Visual Display of Quantitative Information.
Graphics Press.
- [Turner, 1997]
-
Turner, M. (1997).
The Literary Mind.
Oxford.
- [Zbikowski, 2002]
-
Zbikowski, L. (2002).
Conceptualizing Music.
Oxford.
Contents
1 Introduction: Motivation, Difficulties and Approaches
2 Algebraic Semiotics
2.1 Signs and Sign Systems
2.2 Representations
2.3 Simple Examples
2.4 Quality of Representation
3 Fragments of a Calculus of Representation
3.1 Metaphor and Blending
3.2 Some Laws
4 Case Studies
4.1 Information Visualization
4.1.1 Code Visualization
4.1.2 A Film Visualizer
4.1.3 A Later Version
4.2 Proof Representation and Understanding
4.3 Humor
4.4 Interaction
5 Summary, Future Research, and Social Implications
Footnotes:
1Due to the nature of this paper, sign
systems are not fully formalized, and in particular, signatures are treated
rather informally, because they are sufficiently complex that a formal
definition would distract from the flow of ideas; see [Goguen and Malcolm, 1996; Goguen et al., 1978] for the formal definition of
signature, and see [Goguen, 1999a] for the
formal definition of sign system.
2These are for fixed data types like
integers, booleans and colors, which are always interpreted in a standard way.
3This use of the word "space" conflicts with
the conceptual spaces of cognitive linguistics, which are discused below.
4This duality is a Galois connection between
algebraic theories and their models; it does not involve the levels or
priorities.
5The source UP is grounded in our experience
of gravity, and the schema itself is grounded in everyday experiences, such as
that when there is more beer in a glass, or more peanuts in a pile, the level
goes up, and that this is a state we often prefer; therefore the image schema
MORE IS UP, discussed in [Lakoff,
1987], is even more basic.
6However, we do not assume that they are
necessarily the minimal such spaces needed to understand a given blend,
since that can only be determined after the blend has been understood.
Moreover, different blends may ignore different elements of the input spaces,
and it may also be necessary to recruit additional information from other
spaces in order to understand a blend.
7The term "base space" is used in [Goguen, 1999a], because it is considered to
better describe how this theory is used in applications to user interface
design.
8Strict commutativity, usually called just
commutativity, means that the compositions are strictly equal, i.e.,
one morphism is defined on an element if and only if the other is, and then
they are equal.
9But as before, gray tones appear in our
rendition of the display.
File translated from TEX by
TTH, version 3.59.
On 20 Mar 2004,
18:32.
Extensively edited by Joseph Goguen 20 - 22 March 2004.