Book description (p.vanbommel@cs.ru.nl)

Problem statement. Most collections of hyperdocuments suffer from one or more of the following problems: (a) structured search for documents is not sufficiently supported; (b) proving properties (e.g. correctness criteria or integrity constraints) is difficult or impossible; (c) maintenance is not supported by formal mechanisms; (d) personalisation of information, or adaptation to user groups, is difficult or impossible. As these problems are mainly caused by the lack of suitable modelling techniques, this book is an important step towards the next generation of hyperdocuments in general, and web sites in particular. We have considered both fundamentals of modelling (theory) and concrete modelling techniques (practice), and hope that further research in this area will contribute to the Internet becoming fully integrated into society.

This description gives an overview of the book, which consists of three parts: fundamentals of Internet information modelling, elaboration of modelling techniques, and additional topics. Together, these parts contain fourteen chapters. Readers can start at a later chapter (e.g. in part II) without first reading all earlier chapters (e.g. the more theoretical chapters in part I).

Fundamentals of Internet information modelling

Part I of this book is about fundamentals and consists of four chapters. The focus of chapter 1 is semistructured data: modelling and querying web data. This chapter presents a logic-based approach, since algebraic approaches have already been defined elsewhere in the literature. The logic approach uses a rule-based constraint language with both a declarative and an operational semantics, based on fix-point theory. The approach is illustrated by an XML case study.
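As a rough illustration of what an operational, fix-point semantics for a rule-based language means, the sketch below applies two Datalog-style rules repeatedly, starting from no derived facts, until nothing new can be derived (the least fix-point). The predicates `link` and `reachable` and the small example site are hypothetical; this is a generic bottom-up evaluation, not the constraint language of chapter 1.

```python
# Hypothetical facts (not from the book): link(a, b) means page a links to page b.
LINKS = {("home", "about"), ("about", "team"), ("team", "home"), ("home", "news")}


def immediate_consequences(facts, links):
    """Apply both rules once to the current set of derived facts:
    reachable(X, Y) :- link(X, Y).
    reachable(X, Z) :- reachable(X, Y), link(Y, Z).
    """
    derived = set(links)
    for (x, y) in facts:
        for (y2, z) in links:
            if y == y2:
                derived.add((x, z))
    return facts | derived


def least_fixpoint(links):
    """Iterate from the empty set until an iteration adds no new facts."""
    facts = set()
    while True:
        new_facts = immediate_consequences(facts, links)
        if new_facts == facts:
            return facts
        facts = new_facts


reachable = least_fixpoint(LINKS)

if __name__ == "__main__":
    # Pages reachable from "about" by following one or more hyperlinks.
    print(sorted(y for (x, y) in reachable if x == "about"))
```

Because each iteration only adds facts and the set of possible facts over a finite site is finite, the loop is guaranteed to terminate.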

Chapter 2 extends the logic approach of chapter 1. Here the focus is on verification of properties: verifying web site properties using computational logic. The properties of web sites include what are usually called "integrity constraints". It is well known that information systems require the ability both to specify and to check integrity constraints. In the context of web-based systems, however, these topics have not yet received much attention. In the logic approach of this chapter, web sites are defined as hypergraphs.

It is important to be able to treat the contents and properties of web sites with existing database theory. This requires mechanisms to generate hypertext views on databases, which is the subject of chapter 3: design and analysis of active hypertext views on databases. The chapter gives an overview of existing approaches to database publishing. It then considers the specification of hypertext views, together with a design methodology. Finally, specific kinds of views are introduced, including active and adaptive views (needed, for example, for personalization of web sites).

Before going into the details of concrete modelling techniques in part II of this book, chapter 4 focuses on the integration of several aspects considered in the first three chapters: an object-oriented hypermedia reference model formally specified in UML. In this model, formal constraints on hypermedia model elements can be specified, such as invariants, pre-conditions, and post-conditions. These constraints are expressed in UML using the Object Constraint Language (OCL).

Elaboration of modelling techniques

Part II of this book is about concrete modelling techniques and contains six chapters. When defining a modelling technique for web-based information systems, we should not forget the extensive experience that exists with modelling techniques for "conventional" information systems. Therefore, chapter 5 considers systematic development of Internet sites - extending approaches of conceptual modelling. This extension of conceptual modelling addresses a number of new challenges, including full flexibility, support for tracing, and just-in-time push-up of content. The site specification uses the following modelling concepts: stories, scenarios, scenes, dialogue steps, and media objects.

The Webspace method of chapter 6 uses existing database techniques to provide advanced search facilities. This is done in three stages: first, multimedia web data is modelled; then meta-data is extracted; finally, collections of documents are queried. Webspace models define concepts and allow document structures to be derived from them. Once such a structure has been derived, content and presentation functions may be added.

For the modelling of web data, the araneus data model (ADM) can be used. This model is discussed in chapter 7: specification of web applications with ADM-2. The chapter presents ADM together with new extensions for the specification of dynamic aspects. A basic dynamic aspect is interaction, which on the web typically takes place by activating links or buttons on pages. Since such activation triggers the execution of actions, special attention is given to the question of how actions should be specified.

Although modelling of web data is essential in developing a web site, designing a suitable web interface is of major relevance as well. This is discussed in chapter 8, OO-H method: extending UML to model web interfaces. In the Object-Oriented Hypermedia (OO-H) method, diagrams and their populations are used for code generation. Several kinds of links are distinguished, such as internal, traversal, requirement, exit, and service links. Other attributes associated with links include visualization, user interaction, and application scope.

Chapter 9 gives special attention to the construction of models for existing web pages, also called extraction: ontology extraction and conceptual modelling for web information. A distinction is made between data extraction and schema extraction. The approach works with HTML pages, whose words are annotated with part-of-speech tags such as verb, noun, and adverb. The conceptual model is based on extended entity-relationship models, and the meta-data are stored in a relational database.

The final chapter of part II, chapter 10, is called OODM - an object-oriented design methodology for development of web applications. Here, a comparison is made with the Hypermedia Design Model, the Relationship Management Methodology, the Object-Oriented Hypermedia Design Model, and the Object-Oriented Design Method for Hypermedia Information Systems. The chapter contains a detailed case study in which a number of page classes are identified and elaborated.

Additional topics

Part III treats several additional topics. Chapter 11 focuses on maintenance and testing: web application quality - supporting maintenance and testing. Attention is given to static analysis and transformations. Several testing criteria are distinguished: page, hyperlink, definition-use, all-uses, and all-paths testing. In this context, generation of test cases and statistical testing are considered. For example, the structure of a web application can be tested by measuring the coverage of a set of test cases for a given set of features under consideration.
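The coverage idea in the last sentence can be made concrete with a small sketch. Assuming a hypothetical navigation graph (page to linked pages) and test cases recorded as the sequence of pages each test visits, it computes page coverage and hyperlink coverage, the two simplest of the criteria listed above. The site, page names, and test cases are invented for illustration and do not reproduce the chapter's own model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical navigation graph: each page maps to the pages it links to.
SITE: Dict[str, Set[str]] = {
    "home": {"catalog", "login"},
    "catalog": {"item", "home"},
    "item": {"cart"},
    "login": {"home"},
    "cart": set(),
}

# Hypothetical test cases: the sequence of pages visited by each test.
TESTS: List[List[str]] = [
    ["home", "catalog", "item", "cart"],
    ["home", "login", "home"],
]


def all_links(site: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Every hyperlink in the site as a (source, target) pair."""
    return {(src, dst) for src, dsts in site.items() for dst in dsts}


def traversed_links(site, tests) -> Set[Tuple[str, str]]:
    """Consecutive page pairs in the tests that correspond to real hyperlinks."""
    steps = {(a, b) for test in tests for a, b in zip(test, test[1:])}
    return steps & all_links(site)


def coverage(site, tests) -> Tuple[float, float]:
    """Return (page coverage, hyperlink coverage) as fractions in [0, 1]."""
    pages_hit = {page for test in tests for page in test} & set(site)
    page_cov = len(pages_hit) / len(site)
    link_cov = len(traversed_links(site, tests)) / len(all_links(site))
    return page_cov, link_cov


if __name__ == "__main__":
    print(coverage(SITE, TESTS))
    print(all_links(SITE) - traversed_links(SITE, TESTS))  # uncovered hyperlinks
```

Here every page is visited, but the link from "catalog" back to "home" is never followed, so hyperlink coverage stays below 1. Reporting the uncovered links is what makes such a measure useful for generating additional test cases.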

Chapter 12 discusses personalization and performance: modelling data-intensive web sites for personalization, integrity, and performance. The following analysis processes are defined: data analysis, user requirements analysis, usability constraint analysis, integrity constraint analysis, and personalization provision analysis. A distinction is made between active and passive personalization. In the discussion of performance, attention is given to ranking.

It is generally recognized that the Internet is an excellent environment for working in communities. In the database area, this has led to adaptive web-based database communities. Chapter 13 discusses organizing and modelling database communities, as well as advertising and querying them. Special attention is given to inter-community relationships, particularly when these relationships change. In that case, monitoring and maintaining inter-community relationships becomes necessary. The chapter proposes to perform monitoring by means of statistics and agents that recommend changes to existing relationships or the creation of new ones.

The final chapter of this book explains several important concluding issues in building web applications. This chapter is entitled "Designing hypertext and the web with the heart and the mind". The topic of internationalisation is addressed from the perspective of users' native languages and cultural backgrounds. The chapter explains that cultural issues are often disguised, for example as illiteracy problems or user errors, rather than recognized as surmountable cultural differences. Finally, several pressing ethical questions are addressed, such as "how do business rules apply to the Internet" and "who are the regulators and what activities do they regulate".