CAPE: A programming environment for a new generation of Knowledge Based Systems

Robert Inder

7th May, 1999

Background

Conventional Knowledge Based Systems (KBSs) involve the controlled manipulation of symbolic descriptions of the world.

Extracting such symbolic descriptions from sensory data was (and still is) a major challenge in its own right, addressed by specific fields such as speech recognition and vision, and most work on KBSs tries to sidestep it.

The first KBSs worked with very restricted amounts of information (e.g. chess positions), and typically solved problems described in special-format data files.

KBSs started to achieve much wider use and acceptance when Expert Systems began to ask questions of users (``Has the patient got a rash?''), thereby using the user's perceptual abilities and common sense to analyse the world into the abstract terms that the KBSs required. Subsequently, KBSs began to access data directly, either statically (i.e. a KBS linked to a database) or by analysing real-time feeds (e.g. network monitoring). To a greater or lesser extent, these systems can determine what data is relevant, and when, but only within a fixed (or at least limited) range of data whose format and semantics were known to the system builders (or maintainers).

Now, we have the Internet.

By providing ``universal'' connectivity between computers, the Internet has created a previously inconceivable range of possibilities for any program. Innumerable documents -- ranging from the ephemera of electronic ``news'' discussions through to published documents (including newspapers, literature and technical documents) -- can be retrieved from almost anywhere in the world within moments. Programs can instantly query a multitude of databases, covering everything from holiday prices to registered trade marks, and call on other computational services such as dynamic information feeds (e.g. stock prices), notification of changes and so forth. Finally, the Internet allows programs to easily interact with huge numbers of people, either by email or using Web-based forms.

The number of sources of information and services continues to grow at an incomprehensible pace. Crucially, though, those sources are outside the control of--indeed, since they are continually growing and evolving, outside even the knowledge of--the system's builders.

Fully exploiting the opportunity presented by the Internet--e.g. by building systems which can analyse and filter information for a particular purpose--will in several ways require a new breed of KBSs. They will, at the very least, have to be easily configured to deal with new sources of information, and should, more importantly, be able to explore, discover and operate within a huge and ever-changing world of information.

By making it easier both to locate and to distribute software, the rise of the Internet has also greatly increased the number of readily accessible software packages. These include not only end-user applications, ``plugins'' and extensions for system tools and programming languages, but also library components for more-or-less standard tasks, such as parsing standard document formats and mark-ups.

In particular, there are efficient implementations of data-intensive algorithms such as neural nets, statistical analysis and ``data mining'' tools, and information retrieval and other techniques for processing free text based on word-occurrence statistics. Such packages are potentially valuable new sources of information. It will therefore be increasingly important that software tools are able to work with them--either by incorporating libraries or by interfacing to stand-alone packages.

Requirements

To best exploit the opportunities presented by the rise of the Internet, a KBS programming tool should offer...

Expressive language
combining object-oriented programming and symbolic reasoning. It should also provide efficient memory management, and support both incremental program development and interactive debugging.

Searching and pattern matching
Only a tiny proportion of the oceans of material available can be readily assimilated by a program. Most is free text, and much of the rest--such as the results of database queries--consists of tables and other more-or-less regular ``arrays''. Handling such material requires searching for words and phrases, recognising repeating structures and extracting content from them.

Support for Server Building
The Internet encourages one to make a system available as a ``server'' that users--or other programs (i.e. agents)--can contact as required. However, not all KBSs are well suited to starting on demand (e.g. for CGI), and many can require long periods of processing. An ideal KBS tool should support building processes that remain responsive while reasoning.

Easy Interaction with Other software
In addition to providing services to other programs, future KBSs must be able to use the wide range of libraries and packages that are available. Tools must allow linking to C (the most common language for package distribution), and facilitate interacting with other stand-alone processes.

One way to achieve this is by extending an existing language which has some of the desired properties (e.g. by implementing a rule-based engine within Java, as the Java Expert System Shell (Jess) does--see http://herzberg1.ca.sandia.gov/jess/).

The alternative is to combine two (or more) languages which each exhibit some of the desired properties into a single programming environment, in the manner of Poplog (Sloman and Hardy, 1983). This is the approach adopted by CAPE.

CAPE

CAPE is a programming environment which allows programs to be written in an intimate mixture of CLIPS (the C Language Integrated Production System), a rule-based and object-oriented language, and Perl, a procedural programming language.

CLIPS was chosen because it closely integrates a very fast forward-chaining rule-based system with a flexible object system (supporting both message passing and generic functions). CLIPS was initially a partial re-implementation, in C, of Inference ART (Inder, 1989), which was arguably the most powerful of the Lisp-based ``knowledge representation toolkits'' that emerged during the mid/late 1980s. Its rule language features very powerful and efficient pattern-matching facilities based on the RETE algorithm (Forgy, 1982), including the ability to match against the state of objects, and a truth maintenance mechanism. Information and software related to CLIPS is accessible via http://www.ghgcorp.com/clips/ .

Perl was chosen because of its extremely powerful regular expression matching facilities and its huge library of freely available software modules. It also supports complex data structures. Information and software related to Perl is accessible via http://www.perl.com/ .

Note that CAPE takes a fundamentally different approach to combining languages from that taken by Poplog. Poplog combined languages by creating a common virtual machine and implementing the various languages within it (Smith, Sloman and Gibson, 1983). This allowed seamless integration with minimal performance overhead. In contrast, CAPE includes the standard language engines for both CLIPS and Perl. This has the advantage of ensuring that the functionality of these languages is provided correctly, so that all existing programs and libraries can be expected to work correctly. It also greatly reduces the implementation and maintenance effort, particularly when new features are added to the underlying languages.

To the two separate language sub-systems, CAPE adds inter-calling and data transfer. It also provides mechanisms for synchronising the initialisation of data structures between the two programming systems. Finally, it uses CLIPS's run-function facility to support monitoring Internet sockets, allowing the system to remain responsive to socket activity while reasoning.

CLIPS calls Perl
CLIPS programs can call specific Perl subroutines, evaluate strings or match Perl regular expressions on either the left- or right-hand side of a rule -- that is, in either the condition or the action part -- or from within the body of a CLIPS function or message handler. In particular, rule firing can be made dependent on the successful matching of Perl regular expressions. To support this, a number of CLIPS functions are defined, each of which converts the value returned by Perl into the relevant CLIPS data type.
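For illustration, a rule along the following lines -- a minimal sketch using the p-match-p and perl-call-s calls that also appear in Figures 1 and 2; the classify_priority subroutine is purely hypothetical -- makes its firing conditional on a Perl regular expression and uses a Perl result in its action:

(defrule flag-urgent-message
    "Sketch: fire only if the Perl regexp matches the message text"
    (message ?id ?text&:(p-match-p ?text "urgent|asap"))
 =>
    ;; pass the text to a (hypothetical) Perl subroutine and keep its result
    (assert (priority ?id (perl-call-s classify_priority ?text))))
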
Perl calls CLIPS
Perl programs can call functions defined in CLIPS--both ordinary functions and generic functions--and can send messages to CLIPS objects. There are also Perl subroutines for asserting facts into CLIPS working memory, and for having CLIPS evaluate an arbitrary string.
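A sketch of the Perl side might look as follows; the subroutine names clips_assert, clips_call and clips_eval are illustrative stand-ins rather than CAPE's actual interface, which is defined in the manual (Inder, 1998):

# Sketch only: clips_assert, clips_call and clips_eval stand in for
# whatever CAPE's real Perl-side interface provides.
sub note_page {
    my ($url, $size) = @_;
    clips_assert("(page-found $url $size)");   # add a fact to working memory
    my $seen = clips_call("pages-seen");       # call a CLIPS function
    clips_eval("(run)") if $seen > 100;        # hand control to the rule engine
}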

Sockets
All of CAPE's functions for initiating and configuring Internet socket handling are available from both CLIPS and Perl. The current state of port monitoring and socket connections is used to continually update a collection of CLIPS objects, which are accessible from both languages and available for pattern matching in CLIPS rules. This means that systems that provide services via an Internet port can be built using CAPE alone.

The lowest-level handling of socket activity is done in C within the core of CAPE. A function called after each rule firing accepts connections to any ports being monitored, and aggregates any data received from current connections. Then, whenever the buffer is full, or a record-break character (currently newline) is received, the accumulated byte string is passed to a connection-specific Perl function for filtering. This function could simply respond directly to the input received. Typically, though, it will map the input into CLIPS working memory, thereby allowing the full power of CLIPS pattern matching to be used to decide when and how to respond.
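The following fragment sketches such a filter; the registration call cape_monitor_port and the clips_assert interface are hypothetical names used only for illustration:

# Sketch: register a filter for a monitored port (names are illustrative).
cape_monitor_port(8080, \&filter_request);

sub filter_request {
    my ($conn_id, $line) = @_;
    # Rather than replying directly, map the input into CLIPS working
    # memory, so that rules decide when and how to respond.
    if ($line =~ m{^GET\s+(\S+)}) {
        clips_assert("(http-request $conn_id GET $1)");
    }
}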

Listener
CAPE offers the user a read/eval/print loop, which prompts for a command, sends it for evaluation by either the Perl or the CLIPS interpreter, and ensures that the result is printed. The appropriate interpreter is selected purely on the basis of the first non-blank character, but CLIPS's Lisp-like syntax means that this simplistic approach is adequate.

The listener itself is written in Perl, and so can be redefined, and thus customised or extended, by the user.
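An interaction with the listener might look roughly like this (the ``CAPE>'' prompt and the exact output are illustrative, not taken from the system): a command whose first character is ``('' naturally goes to CLIPS, while anything else is handled by Perl.

CAPE> (assert (greeting hello))
<Fact-1>
CAPE> $n = 2 + 3; print "$n\n";
5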

The CLIPS rule engine has a very powerful mechanism for deciding which rule to fire at any point. This can be used to make extremely subtle and flexible decisions about the flow of control within the system--i.e. about what the CAPE application should do next. Doing this requires structuring the system as a whole as a rule-based system, so that firing rules triggers activities, which may themselves be implemented in Perl. Such a system will also remain responsive to external events (e.g. socket activity), provided that the code executed by any particular rule does not take too long to run.

In such a system, Perl's role is to help map ``the world'' into symbols that CLIPS can then reason about, and then to translate the symbolic results of that reasoning into actions in the world.

CAPE has been designed (and primarily tested and exercised) with this model of system operation in mind. However, there is no (known!) obstacle to building an ``inverted'' system -- i.e. one in which overall control resides in a Perl program (or an interface generating call-backs into Perl), within which some subroutines specify or initiate rule-based reasoning.
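A minimal sketch of such an inverted arrangement -- again using the illustrative clips_* names, with fetch_page and report as hypothetical helper subroutines -- might be:

# Sketch: overall control stays in Perl, with rule-based reasoning
# invoked as one step of an ordinary loop.
foreach my $url (@urls_to_analyse) {
    fetch_page($url);                            # ordinary Perl processing
    clips_assert("(page-to-analyse \"$url\")");  # hand the task to CLIPS
    clips_eval("(run)");                         # let the rules do their work
    report(clips_call("collect-results"));       # act on the conclusions
}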

Using CAPE

The functionality of the current version of the system has been stable since the spring of 1998, and the system itself has been used for the development of substantial research systems since the following summer.

Three demonstration systems form part of the CAPE distribution:

Handshaking:
A minimal demonstration of communication between a pair of CAPE processes. An ``interface agent'' accepts user requests (via HTTP) for commands to be executed and forwards them to the relevant ``execution agent'', which executes them in due course. The execution agent then contacts the interface agent with the results, which the interface agent then forwards to the user.

Web server
This application operates as a (partial) web server. It accepts a limited set of HTTP requests, locates the relevant file and then replies with an appropriate HTTP response--either an error or the contents of the requested file (along with appropriate HTTP header information). However, the server also allows users to specify (using an HTML form that it generates) transformations that are subsequently applied to the pages being served. This demonstration system, which uses the ``webserve'' component application, is about 350 lines of code.

``Dungeon''
This application provides a real-time multi-user ``dungeon'' game. Players access the system via a web browser, and are able to direct the actions of one of a number of ``characters'' moving within a simple environment. In addition to moving about and manipulating objects, players are likely to encounter other characters, controlled either by other players or by the computer. All descriptions of situations are generated from information structures (i.e. there is no ``canned'' text) based on the objects and locations known to the user. Computer-controlled characters can generate and follow multi-step plans.

The entire application was built from scratch in five days. It contains a total of 2800 lines of CAPE code (about one third Perl), although re-implementation using the webserve component application would reduce this substantially.

In addition to the demonstration and component applications mentioned above, CAPE is being used to develop a number of research systems. Most of this work relates to DIME: the Distributed Information Manipulation Environment. DIME is a collection of components useful for building systems to retrieve and manipulate distributed data. These components include a highly flexible document cache (which is able to fetch documents when necessary, store them locally, and search them in various ways) and a framework for specifying and controlling interaction with remote search systems.

DIME is being used to develop two research systems. K-DIME, or Kansei DIME (Inder, Bianchi and Kato, 1999) is concerned with fetching and filtering images based on subjective criteria, or kansei. Users connect to K-DIME using a Web browser, and specify their requirements in terms of both the objective and subjective properties of the images. K-DIME then queries web-accessible search engines, such as the Alta-Vista Photofinder, for relevant images based on the objective criteria. It then analyses the results of these queries, retrieves the relevant images and then chooses and configures external ``oracles'' (currently neural nets) to assess them against the subjective criteria specified by the user. K-DIME is responsible for all aspects of the overall control of the operation, such as deciding which oracles to run and when, setting filter thresholds and so forth.

The second system being built using DIME is Maxwell, a ``smart'' agent intended to query a number of book vendors' databases on behalf of the user and to combine the results. An initial version of Maxwell was implemented in Emacs Lisp (Inder and Kato, 1998; Inder, Hurst and Kato, 1998), and a more powerful version is currently being implemented in CAPE using DIME.

Apart from DIME, CAPE has been used to build a system for generating graphical maps of web sites. Given an initial URL and a regular expression to delimit the area to be mapped, the system fetches the relevant pages and analyses them to extract the links that they contain. The eventual aim is to generate a graph and output it in the format required by a graph layout system--specifically, the one found in the ``thinking support tool'' D-Abductor (Sugiyama and Misue, 1991).

However, although one can trivially generate a graph description from a web site, such approaches scale very badly: even modest web sites give rise to graphs that contain too many nodes to be drawn effectively. Producing workable graphs, therefore, involves either generating or recognising structures that can be used as a basis for omitting or temporarily hiding parts of the graph.

The mapping system uses Perl for:

Accessing libraries:
Pages are fetched through the document cache component application, which in turn uses the HTTP package from CPAN.
Manipulating URLs:
Extracting and canonicalising protocols, host names, file extensions, etc.
Extracting links:
Using regular expressions to search the retrieved pages for links, image maps and so forth.
Gathering document statistics:
Using regular expressions to find and count links, words in anchor texts and so forth.

However, the top level of the system is written in CLIPS. In particular, the system's ontology is described by defining CLIPS objects for such concepts as pages, page sets, servers and directories. CLIPS rules and methods are then used for mapping pages into suitable graph descriptions. The current prototype contains just over 40 rules, doing such things as deciding which pages to fetch and recognising ``chains'' of similar links (see Figures 1 and 2).

This approach is taken to ensure that, as far as possible, rules can be written at the domain level, without reference to the mechanics of fetching or searching documents. The figures give some idea of the extent to which this has been achieved. In particular, the rule for detecting ``chains'' of similar links can be written entirely in terms of facts describing the links that have been found in the various documents. These facts can be generated in CAPE in a single line of code, just by requesting the matches for an appropriate regular expression. They could, of course, be generated in CLIPS alone, but that would require considerably more tedious and error-prone programming.
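As a rough illustration (the clips_assert interface is again a hypothetical stand-in), the Perl side could generate such link facts directly from regular-expression matches over a fetched page:

# Sketch: assert one "link" fact per anchor found in a fetched page.
# $did and $source identify the document being scanned.
while ($page =~ m{<a\s+href="([^"]+)"[^>]*>(.*?)</a>}gis) {
    clips_assert("(link $did $source \"$2\" \"$1\")");
}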

Note that the left-hand side of the rule decide-to-fetch-page includes a call to match a Perl regular expression, while spot-chain calls a short Perl function (strip_label) to manipulate a string.


 
Figure 1: A typical rule: note that it is written entirely in domain terms
(defrule decide-to-fetch-page
    "If we know about a page that is relevant to our target, 
       note that we should fetch it"
    (target ?query ?regexp)
    (object (is-a page)
	    (name ?doc)
	    (type html|unknown)
	    (did 0)
	    (url ?url&:(p-match-p ?url ?regexp)))
 =>
    (assert (to-fetch ?query 1 ?url ?doc)))


 
Figure 2: A typical rule: note that it is written entirely in domain terms
(defrule spot-chain
     "Find a chain of 'identical' links"
   (link ?did ?source ?anchor ?dest $?rest)
   (link ? ?dest ?anchor ?dest2 $?)
   (link ? ?dest2 ?anchor ?dest3 $?)
   (not 
      (link ? ? ?anchor ?source $?))
   (not (object
	 (is-a chain-set)
	 (source ?source)
	 (anchor ?anchor)))
   (object 
        (is-a webitem)
	(name ?obj)
	(url =(perl-call-s strip_label ?source)))
 =>
   (make-instance (gensym "chain-") of chain-set
		  (status possible)
		  (anchor ?anchor)
		  (source-obj ?obj)
		  (source ?source)))


Current Status and Future Work

CAPE is a new tool which brings together two very different programming systems. As such, there are inevitably a number of minor inconsistencies and omissions that reveal the immaturity of the system. Nevertheless, the functionality of the current version of the system has been stable since the spring of 1998, and, as described above, the system itself has been used for the development of substantial research systems since the following summer.

The system was made available by FTP and announced on the Web in February 1999, and has been downloaded by about one person per day since then.

The core CAPE environment comprises 3000 lines of C code, plus 600 lines of Perl or CLIPS. The distribution currently also includes three demonstration applications (a web-server that can be configured to add links to pages, a demonstration of bi-directional inter-process communication, and a ``dungeon'' game), and 1500 lines of CAPE code in two standard CAPE ``component applications'', or ``Capplets'' (support for regression testing CAPE applications and a simple Web server), with more being developed.

There is also a thirty-one page manual.

CAPE runs under a number of versions of UNIX, including Solaris and Linux.

The main shortcoming of CAPE is that CLIPS and Perl have different syntaxes: CLIPS is Lisp-like, while Perl is generally C-like. This unfortunately means that fully exploiting CAPE requires achieving reasonable proficiency in both languages, and in working with more than one language at a time (when programming, it is easy to write fragments of the ``wrong'' language). A single ``unified'' syntax is therefore clearly desirable, although defining one is non-trivial, since Perl syntax is already complex and very ``dense'' (in the sense that a great many characters and character combinations are already assigned meanings).

There are a number of obvious extensions to CAPE, some of which will be added to the system in the near future.

Conclusion

CAPE is a powerful tool for building a new generation of KBSs. It provides powerful mechanisms to support a number of key activities:

Symbolic reasoning
CLIPS offers a very efficient forward chaining rule-based system with extremely expressive pattern matching, coupled with a highly flexible object-oriented system and supported by a truth-maintenance system.

Data analysis/manipulation
Perl has extremely powerful regular expression matching coupled with very concise string handling and easy-to-use hash-based index-building and data structuring.

Service Provision
CAPE's socket monitoring mechanisms allow a rule-based program to remain responsive to external activity even while it is reasoning.

Standard languages/libraries
CAPE programs can use any of the enormous range of software available in CPAN, the Comprehensive Perl Archive Network (accessible via http://www.perl.com/).
Interaction with software packages
Perl provides very concise and flexible mechanisms for controlling and processing the results obtained from system commands and other external programs. CAPE programmers can also exploit the tools for generating Perl ``wrappers'' for software components written in C, and make use of Perl's ability to dynamically load compiled code at run-time.

CAPE is a powerful tool for building KBSs that can exploit the opportunities offered by the Internet. Download it now!

Acknowledgements

CAPE was developed while the author was supported by a NEDO Visiting Researcher fellowship at the Electrotechnical Laboratory (ETL) in Tsukuba, Japan. Matt Hurst helped greatly with bug hunting in the early stages, and Srinivasan (1997) was invaluable.

References

Forgy 1982
Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Charles L. Forgy. Artificial Intelligence 19 (1982), 17-37.

Giarratano and Riley, 1994
Expert Systems: Principles and Programming, 2nd Edition. J. Giarratano and G. Riley. PWS Publishing Company, 1994.

Inder, 1989
State of the ART: A review of the Automated Reasoning Tool. R. Inder. In Expert System Applications, S. Vadera (ed), Sigma Press, Wilmslow, 1989. Also available as AIAI-TR-41.

Inder 1998
CAPE Users Manual. R. Inder. ETL Technical Report ETL-TR98-3, Electrotechnical Laboratory, Tsukuba, Japan. (Note: most recent version available via http://www.hcrc.ed.ac.uk/~robert/CAPE/manual.ps )
Inder Bianchi and Kato 1999
K-DIME: A Software Framework for Kansei Filtering of Internet Material. R. Inder, N. Bianchi and T. Kato. Proceedings of IEEE Systems, Man and Cybernetics, 1999.

Inder, Hurst and Kato 1998
A Prototype Agent to Assist Shoppers. R. Inder, M. Hurst and T. Kato. Computer Networks and ISDN Systems, Vol. 30 (1998), pp. 643-645. Also appeared as a short paper in Proceedings of WWW7. Full version available as Technical Report ETL-TR98-3, Electrotechnical Laboratory, Tsukuba, Japan.

Inder and Kato 1998
Towards Shoppers' Assistants: Agents to Combine Information. R. Inder and T. Kato. In Advanced Database Systems for Integration of Media and User Environments 98, Y. Kambayashi (ed.), World Scientific, February 1998.

Riley 1991
CLIPS: An Expert System Building Tool. G. Riley. Proceedings of the Technology 2001 Conference, San Jose, CA, 1991.

Sloman and Hardy 1983
Poplog: A Multi-Purpose Multi-Language Program Development Environment. A. Sloman and S. Hardy. AISB Quarterly, Vol. 47, pp. 26-34, 1983.

Smith, Sloman and Gibson 1983
POPLOG's Two-level Virtual Machine Support for Interactive Languages. Robert Smith, Aaron Sloman and John Gibson. Cognitive Science Research Paper 153, School of Cognitive and Computing Sciences, University of Sussex, 1983.

Srinivasan 1997
Advanced Perl Programming. S. Srinivasan. O'Reilly and Associates, 1997.

Sugiyama and Misue 1991
Visualization of Structural Information: Automatic Drawing of Compound Digraphs. K. Sugiyama and K. Misue. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 21, 1991.

Wall 1996
Programming Perl (2nd Edn.) L. Wall, T. Christiansen and R. Schwartz. O'Reilly and Associates, 1996.
