MHEG-5: An Overview

December 6, 1995

Robert Joseph, Ph.D.,

US West Technologies, rjoseph@advtech.uswest.com

Jörgen Rosengren, M.Sc.

Philips Research, rosen@prl.philips.nl

Table of Contents

  1. Introduction
  2. Description of the MHEG-5 Functionality
    1. Basic concepts
    2. Interaction within a Scene
    3. Availability; Running Status
    4. Interactibles
    5. Object Sharing Between Scenes
    6. Object Encoding
    7. Conformance
  3. Detailed Description of the MHEG-5 Classes
    1. Link and Action
    2. Procedure
    3. Palette, Font, and CursorShape
    4. Variable
    5. TokenGroup, TemplateGroup and List
    6. Stream
    7. Audio
    8. Rectangle
    9. Bitmaps
    10. Video and RTGraphics
    11. Text; HyperText; EntryField
    12. Button
    13. Slider
  4. Acknowledgements

Introduction

The MHEG-5 standard was developed to support the distribution of interactive multimedia applications in a client/server architecture across platforms of different types and brands. MHEG-5 defines a final-form representation for application interchange. The applications consist mainly of declarative code, but provisions for calling procedural code have been made. MHEG-5 applications need only be authored once and can run on any platform that is MHEG-5 compliant. This paper gives the reader an overview of the current Draft International Standard (DIS) of the MHEG-5 standard (ISO/IEC DIS 13522-5, dated December 1995). This is an informal description; in the event of a discrepancy between this text and the actual DIS text, the DIS text takes precedence.

The global scope of MHEG-5 is to define the syntax and semantics of a set of object classes that can be used for interoperability of multimedia applications across minimal-resources platforms. The developed applications will reside on a server, and as portions of the application are needed, they will be downloaded to the client. In a broadcast environment, this download mechanism could rely, for instance, on cyclic rebroadcasting of all portions of the application. It is the responsibility of the client to have a runtime that interprets the application parts, presents the application to the user, and handles the local interaction with the user.

Figure 1: MHEG application on a server, with parts downloaded to the client

Some of the application types that MHEG-5 is targeted for are: video on demand, near video on demand, news on demand, navigation, and interactive home shopping.

The major goals of MHEG-5 are:

  1. To provide a good standard framework for the development of client/server multimedia applications intended to run on a memory-constrained Client.
  2. To define a final-form coded representation for interchange of applications across platforms of different versions and brands.
  3. To provide the basis for concrete conformance levelling, guaranteeing that a conformant application will run on all conformant terminals.
  4. To allow the runtime engine on the Client to be "small" and "easy" to implement.
  5. To be free of strong constraints on the architecture of the Client.
  6. To allow the building of a wide range of applications. This means also providing access to external libraries. An application using external libraries will only be partly portable.
  7. To allow for application code that is guaranteed to be "safe" in the sense that it cannot harm other code in the Client, nor put the Client in an abnormal state.
  8. To allow automatic static analysis of (final-form) application code in order to help ensure bug-free applications and minimize the debugging investment needed to get a robust application. Note that it should be possible to implement this analysis independently of the authoring environment.
  9. To promote rapid application development by providing high-level primitives and a declarative paradigm for application development.

The MHEG-5 work item was created in November 1994, in order to meet the urgent need of the international community for a standardized application format adapted to minimal resources platforms. In December 1995, it reached the status of Draft International Standard (DIS), and it is scheduled for promotion to International Standard in July 1996.

Description of the MHEG-5 Functionality

This section describes the functionality that is provided by the MHEG-5 standard in terms of the MHEG-5 classes.

Basic concepts

An MHEG-5 application is made up of Scenes and objects that are common to all Scenes. A Scene contains a group of objects, called Ingredients, that represent information (graphics, sound, video, etc.) along with localized behavior that is triggered by events (e.g., pushing the 'Left' button starts a sound). At most one Scene is active at any one time. Navigation in an application is done by transitioning between Scenes.

An MHEG engine has the ability to display visual objects in a rectangular coordinate system with a fixed size, and to play audible objects. A user input device (e.g., remote control, game controller, etc.) can be used with the runtime to allow interaction with the applications.

The most important classes in an MHEG application are:

An Application object contains objects that are shared between several Scenes. A Scene is a collection of objects that are intended for coordinated presentation. Both Applications and Scenes can contain objects of the class Ingredient. The Ingredient class has the subclasses Link, Procedure, Variable, Presentable, Palette, Font, and CursorShape.

A Link object consists of a LinkCondition and a LinkEffect. The LinkEffect, which is a list of elementary actions, is executed when the LinkCondition becomes true. A LinkCondition is always triggered by the occurrence of an event.

Presentable objects are objects that can be presented to the user as a part of a Scene. Presentables are either visible or audible, or combine visible and audible objects. The Presentable class has the subclasses Visible, Audio, Stream, TokenGroup, and TemplateGroup.

Visible objects are objects that can be displayed on the screen. The various types of Visibles are Lineart (including Rectangle), RTGraphics, Bitmap, Button (including SwitchButton, Hotspot, and PushButton), Slider, Text (including EntryField and HyperText), and Video. Some of these objects allow the runtime to control the user interaction with them. Such objects are called Interactibles. For example, the EntryField object is an Interactible.

Variable objects can hold a value (e.g., an integer or a text string). They are typically used to store the state of an object or to communicate with the outside world, for example with a server.

Procedure objects encapsulate calls to procedural code, the format of which is not directly defined by MHEG-5. The Procedure objects make it possible to escape from the MHEG-5 paradigm to perform a task that is better expressed in procedural code.

Below is a very simple scene that displays a bitmap and text. The user can press the 'Left' button on the input device and a transition is made from the current scene, InfoScene1, to a new scene, InfoScene2.

Figure 2: InfoScene1 is an example of a scene with presentable information and a simple link.

The pseudo-code from the above scene may look like the following:

(scene: InfoScene1
    <other scene attributes here>
    group-items:
        (bitmap: BgndInfo
            content-hook: #bitmapHook
            original-box-size: (320 240)
            original-position: (0 0)
            content-data: referenced-content: "InfoBngd"
        )
        (text:
            content-hook: #textHook
            original-box-size: (280 20)
            original-position: (40 50)
            content-data: included-content: "1. Lubricate..."
        )
        links:
            (link: Link1
                event-source: InfoScene1
                event-type: #UserInput
                event-data: #Left
                link-effect: action: transition-to: InfoScene2
            )
)

Interaction within a Scene

The MHEG application is event-driven, in the sense that all actions are called as the result of an event firing a link. Events can be divided into two main groups: synchronous events and asynchronous events. Asynchronous events are events that occur asynchronously to the processing of Links in the MHEG engine. These include timer events and user input events. An application area of MHEG-5 (such as DAVIC) must specify the permissible UserInput events within that area. Synchronous events are events that can only occur as the result of an MHEG-5 action being targeted to some objects. A typical example of a synchronous event is #IsSelected, which can only occur as the result of the MHEG-5 action Select being invoked. Synchronous events are always dealt with immediately; asynchronous events are queued.

The mechanism at the heart of the MHEG engine, therefore, is the following:

  1. After a period of idleness, an asynchronous event occurs. The event can be a user input event, a timer event, a stream event, or some other type of event.
  2. Possibly, a link that reacts on the event is found. This link is then fired. If no such link is found, the process starts again at 1.
  3. The result of a link being fired is the execution of an action object, which is a sequence of elementary actions. These can change the state of other objects, create or destroy other objects, or cause events to occur.
  4. As a result of the actions being performed, synchronous events may occur. These are dealt with immediately, i.e., before processing any other asynchronous events queued.

When all events have been processed, the process starts again at 1.
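
The processing order described above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the class Engine and its methods (add_link, post_async, raise_sync, run) are hypothetical names chosen for this illustration and are not defined by MHEG-5. Asynchronous events wait in a queue, whereas a synchronous event raised by an elementary action is dispatched before the next queued event is taken.

```python
from collections import deque


class Engine:
    """Toy model of the event-processing loop; not the normative algorithm."""

    def __init__(self):
        self.async_queue = deque()   # asynchronous events wait here
        self.links = []              # (source, event_type, effect) triples
        self.log = []                # order in which things happened

    def add_link(self, source, event_type, effect):
        self.links.append((source, event_type, effect))

    def post_async(self, source, event_type):
        """Queue an asynchronous event (user input, timer, stream, ...)."""
        self.async_queue.append((source, event_type))

    def raise_sync(self, source, event_type):
        """Synchronous events are dealt with immediately, ahead of the queue."""
        self._dispatch(source, event_type)

    def _dispatch(self, source, event_type):
        self.log.append((source, event_type))
        for link_source, link_type, effect in list(self.links):
            if (link_source, link_type) == (source, event_type):
                for elementary_action in effect:   # executed in sequence
                    elementary_action(self)

    def run(self):
        """Process queued asynchronous events until the engine is idle."""
        while self.async_queue:
            self._dispatch(*self.async_queue.popleft())


engine = Engine()
# Pressing 'Left' selects Button1; selecting causes a synchronous IsSelected.
engine.add_link("Scene1", "UserInput:Left",
                [lambda e: e.raise_sync("Button1", "IsSelected")])
engine.add_link("Button1", "IsSelected",
                [lambda e: e.log.append("effect of IsSelected link")])
engine.post_async("Scene1", "UserInput:Left")
engine.post_async("Timer1", "TimerFired")
engine.run()
print(engine.log)
```

In this run the IsSelected event and the effect of its link are logged before the queued timer event, which mirrors step 4 of the mechanism.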

Availability; Running Status

Before doing anything to an object, the MHEG-5 engine must prepare it. Preparing an object typically entails retrieving it from the server, decoding the interchange format and creating the corresponding internal data structures, and making the object available for further processing. The preparation of an object is asynchronous; its completion is signalled by an IsAvailable event.

All objects that are part of an application or a scene have a RunningStatus, which is either true or false. Objects whose RunningStatus is true are said to be running, which means that they perform the behaviour they are programmed for. More concretely:

Interactibles

The MHEG-5 mix-in class Interactible groups some functionality associated with user interface-related objects (Slider, HyperText, EntryField, Buttons). These objects can all be highlighted (by setting their HighlightStatus to True). They also have the attribute InteractionStatus, which, when set to true, allows the object to interact directly with the user, thus bypassing the normal processing of UserInput events by the MHEG-5 engine. Exactly how an Interactible reacts when its InteractionStatus is true is implementation-specific. As an example, the way that a user enters characters in an EntryField can be implemented in different ways in different MHEG-5 engines.

At most one Interactible at a time can have its InteractionStatus set to True.
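
The exclusivity rule can be sketched in Python as follows. This is an illustration only: the helper class InteractionArbiter is hypothetical, and the choice to ignore a request while another object is interacting is an assumption of this sketch, not a statement about the DIS text.

```python
class InteractionArbiter:
    """Tracks the single Interactible allowed to interact (assumed helper)."""

    def __init__(self):
        self.current = None


class Interactible:
    def __init__(self, name, arbiter):
        self.name = name
        self.arbiter = arbiter
        self.highlight_status = False
        self.interaction_status = False

    def set_interaction_status(self, status):
        """Return True if the request was honoured."""
        if status:
            if self.arbiter.current not in (None, self):
                return False      # assumption: the request is ignored
            self.arbiter.current = self
        elif self.arbiter.current is self:
            self.arbiter.current = None
        self.interaction_status = status
        return True


arbiter = InteractionArbiter()
entry_field = Interactible("EF1", arbiter)
slider = Interactible("Slider1", arbiter)

print(entry_field.set_interaction_status(True))   # granted
print(slider.set_interaction_status(True))        # refused while EF1 interacts
entry_field.set_interaction_status(False)         # EF1 releases the interaction
print(slider.set_interaction_status(True))        # granted after the release
```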

Visual Representation

For objects that are visible on the screen, the following rules apply:

  1. Objects are drawn downwards and to the right of their position on the screen. This point can be changed during the life cycle of an object, thus making it possible to move objects.
  2. Objects are drawn without scaling. Objects that do not fit within their bounding box are clipped.
  3. Objects are drawn with "natural" priority, i.e., on top of already existing objects. However, it is possible to move objects to the top or the bottom of the screen, as well as putting them before or after another object.
  4. The screen can be frozen, allowing the application to perform many (possibly slow) changes and not update the screen until it's unfrozen.
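
Rule 3 can be modelled as an ordered list of objects, drawn from bottom to top. The Python sketch below is illustrative; the class DisplayStack and its method names are assumptions made for this example, and "before" is read here as "directly on top of".

```python
class DisplayStack:
    """Drawing order from bottom (index 0) to top (last index)."""

    def __init__(self):
        self.order = []

    def add(self, name):
        # "Natural" priority: a new object is drawn on top of existing ones.
        self.order.append(name)

    def bring_to_front(self, name):
        self.order.remove(name)
        self.order.append(name)

    def send_to_back(self, name):
        self.order.remove(name)
        self.order.insert(0, name)

    def put_before(self, name, reference):
        """Place `name` directly on top of `reference`."""
        self.order.remove(name)
        self.order.insert(self.order.index(reference) + 1, name)

    def put_behind(self, name, reference):
        """Place `name` directly underneath `reference`."""
        self.order.remove(name)
        self.order.insert(self.order.index(reference), name)


stack = DisplayStack()
for visible in ("Background", "Caption", "Button"):
    stack.add(visible)
stack.bring_to_front("Caption")
print(stack.order)
```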

Object Sharing Between Scenes

It is possible within MHEG-5 to share objects between some or all scenes of an application. As an example, this can be used to have variables retain their value over scene changes, or to have an audio stream play on across a scene change. Shared objects are always contained in an Application object. Since there is always exactly one Application object running whenever a scene is running, the objects contained in an application object are visible to each of its scenes.
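
A minimal Python sketch of this sharing model is given below. The class Application and its methods are hypothetical and greatly simplified: objects are plain dictionary entries, and a scene transition simply replaces the scene-level objects while leaving the application-level objects untouched.

```python
class Application:
    """Toy container: shared objects outlive scene transitions."""

    def __init__(self, shared_objects):
        self.shared_objects = dict(shared_objects)
        self.active_scene = None
        self.scene_objects = {}

    def transition_to(self, scene_name, scene_objects):
        # Objects of the old scene are dropped; shared objects are kept.
        self.active_scene = scene_name
        self.scene_objects = dict(scene_objects)

    def set_shared(self, name, value):
        self.shared_objects[name] = value

    def get(self, name):
        # Scene objects are looked up first, then the shared ones.
        if name in self.scene_objects:
            return self.scene_objects[name]
        return self.shared_objects[name]


app = Application({"Counter": 0})
app.transition_to("InfoScene1", {"Caption": "1. Lubricate..."})
app.set_shared("Counter", 3)
app.transition_to("InfoScene2", {"Title": "second scene"})
print(app.get("Counter"))
```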

Object Encoding

The MHEG-5 specification does not prescribe any specific formats for the encoding of content. For example, it is conceivable that a Video object is encoded as MPEG or as motion-JPEG. This means that the group using MHEG-5 must define which content encoding schemes to apply for the different objects in order to achieve interoperability.

However, MHEG-5 does specify a final-form encoding of the MHEG-5 objects themselves. This encoding is an instance of ASN.1, using the Basic Encoding Rules (BER).
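
To give a feeling for what a BER encoding looks like, the following Python sketch encodes a single ASN.1 INTEGER as a tag-length-value triple. It does not reproduce any part of the MHEG-5 ASN.1 definitions; the function name ber_encode_integer is chosen for this illustration, and only short-form lengths are handled.

```python
def ber_encode_integer(value: int) -> bytes:
    """Encode an int as a BER INTEGER: tag 0x02, short-form length, content."""
    # Number of octets of the minimal two's-complement representation.
    magnitude = ~value if value < 0 else value
    length = magnitude.bit_length() // 8 + 1
    content = value.to_bytes(length, byteorder="big", signed=True)
    if len(content) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x02, len(content)]) + content


# A box width of 320, as used in the scene example, as BER octets in hex.
print(ber_encode_integer(320).hex())
```

A complete MHEG-5 object would be a nested structure of such triples, as determined by the ASN.1 types in the standard.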

Conformance

The issue of conformance, though of crucial importance, has not yet been extensively addressed by the MHEG committee. It is expected that a conformance definition for the standard will have to be drafted, probably lagging the standard itself by some period of time.

Detailed Description of the MHEG-5 Classes

This section describes certain MHEG-5 leaf classes in more detail. Classes that are described in detail above are left out.

Link and Action

Link objects express the dynamic aspects of an MHEG-5 application. A Link object consists of a LinkCondition and a LinkEffect. The LinkCondition, in turn, consists of an EventSourceCondition, an EventTypeCondition, and an EventDataCondition. A Link object can be fired as the result of an event occurring. An event always emanates from exactly one object (for example, an IsSelected event can be generated by a Button object). Some events also have a piece of data associated with them, called the EventData.

A Link fires if and only if the right type of event emanates from the right object and carries the right associated data. When a Link is fired, it executes its LinkEffect, which is an Action object.

MHEG-5 Action objects consist of a sequence of elementary actions. Elementary actions are comparable to methods in an object-oriented paradigm. The execution of an Action object means that each of its elementary actions is invoked in sequence.

As an example, consider the following Link, which transitions to another Scene when the character "A" is entered in the EntryField EF1.

Example:

(link: Link1
    event-source: EF1
    event-type: #NewChar
    event-data: 'A'
    link-effect:
        (action: transition-to: Scene2)
)
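
The firing rule for this Link can be sketched in Python as shown below. The classes Event and Link are hypothetical stand-ins for the MHEG-5 concepts, elementary actions are modelled as plain callables, and treating a missing event-data condition (None) as "any data" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Event:
    source: str
    type: str
    data: Optional[Any] = None


@dataclass
class Link:
    event_source: str
    event_type: str
    event_data: Optional[Any]          # None here means "no data condition"
    link_effect: List[Callable[[], None]]

    def matches(self, event: Event) -> bool:
        if event.source != self.event_source or event.type != self.event_type:
            return False
        return self.event_data is None or event.data == self.event_data

    def fire_if_matching(self, event: Event) -> bool:
        if not self.matches(event):
            return False
        for elementary_action in self.link_effect:   # invoked in sequence
            elementary_action()
        return True


transitions = []
link1 = Link("EF1", "NewChar", "A", [lambda: transitions.append("Scene2")])

link1.fire_if_matching(Event("EF1", "NewChar", "B"))   # wrong data: ignored
link1.fire_if_matching(Event("EF2", "NewChar", "A"))   # wrong source: ignored
link1.fire_if_matching(Event("EF1", "NewChar", "A"))   # all three match: fires
print(transitions)
```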

Procedure

As was noted above, a Procedure object is an abstraction of a piece of procedural code that performs a specific task. The task will typically be something that is not easy to express within the MHEG-5 paradigm. There are three types of Procedure objects: custom, remote, and script. Custom Procedure objects are used for calls to the runtime environment. Remote Procedure objects are used to encapsulate an RPC mechanism. Script Procedure objects, finally, are used to encapsulate a piece of exchanged procedural code that could be written in a scripting language.

Palette, Font, and CursorShape

These three classes are used to encapsulate color look-up tables, fonts, and cursors, respectively. The objects of these classes are not used stand-alone, but rather referenced from objects of other classes. For example, the only way to use a Font object is to reference it from a Text object. The CursorShape class has a similar relationship with the Scene class, and the Palette class is used by classes that have colour attributes.

Note that MHEG-5 does not specify the actual encoding of the data structure representing a colour look-up table, font, or cursor.

Variable

As was noted above, Variables are objects that can be used to store a value. That value can be either an integer, a byte string, a boolean, a content reference, or an object reference. There are actions to set the value of variables and to test the value. Variables can also be used as parameters to actions; for instance, the TransitionTo action can take a variable as the parameter that denotes the Scene to transition to.

TokenGroup, TemplateGroup and List

These classes provide the functionality of logically grouping Visibles and other objects. The TokenGroup class can be used to implement simple jumping-highlight navigation. The TemplateGroup class does the same, but also supports the description of behaviour and content that is common to all members of the group in a so-called template. The List class, finally, is a TemplateGroup that also allows certain dynamic features, such as scrolling and adding or deleting members.

All these classes inherit an abstract mixin class, TokenManager, which has the ability to administer a token over a group of items. A typical use of this feature is to let the item that has the token be highlighted. TokenManager also implements a MovementTable, which makes it possible to translate UserInput events into movements of the token between the members of the group.
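
The idea of a MovementTable can be sketched in Python as follows. The layout used here (one list of target positions per movement, indexed by the current token position, with positions counted from 1) is an assumption made for illustration, as are the movement names "Left" and "Right" and the choice to ignore unknown movements.

```python
class TokenManager:
    """Toy token administration over a group of `item_count` items."""

    def __init__(self, item_count, movement_table):
        self.item_count = item_count
        # movement name -> target position for each current position
        self.movement_table = movement_table
        self.token_position = 1      # assumption: token starts on item 1

    def move(self, movement):
        targets = self.movement_table.get(movement)
        if targets is not None:      # unknown movements leave the token alone
            self.token_position = targets[self.token_position - 1]
        return self.token_position


# Three items in a row, with wrap-around at both ends.
menu = TokenManager(3, {"Right": [2, 3, 1], "Left": [3, 1, 2]})
print(menu.move("Right"))
print(menu.move("Left"))
```

An application would typically highlight the item at token_position after each move.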

Stream

The Stream class provides the functionality of a multiplex of one or more audio or video objects that are to be presented in synchronization. It also supports the handling of stream events, which makes it possible to link an MHEG behaviour to a specific point in a stream, or to the occurrence of a specific, user-defined event.

Audio

The Audio class is used to encapsulate an audible object (such as an audio clip encoded in MPEG-2). The audio object can be either part of a stream or stand-alone.

Rectangle

Rectangles are objects which, when run, draw a rectangle on the screen. The line size and color as well as the fill color (if any) are attributes of the Rectangle object.

Example:

(rectangle: Rect1
    original-position: ( 0 0 )
    line-colour: 15
    box-size: ( 40 50 )
)

Bitmaps

Bitmaps are objects which, when run, draw a bitmap on the screen. A "bitmap", in this context, is any two-dimensional static picture, and can therefore also be used to represent, for instance, an MPEG-2 still picture.

Example:

(bitmap: BgndInfo
    content-hook: #bitmapHook
    content-data: referenced-content: "Info.bitmap" 
    box-size: ( 320 240 )
    original-position: ( 0 0 )
)

Video and RTGraphics

The Video and RTGraphics classes are used to encapsulate a visible object which changes in real time (such as a video clip encoded in MPEG-2, or a subtitle stream). Objects of both these classes can be either part of a stream or stand-alone. The difference between the classes is minimal: whereas a Video object has the action ScaleVideo, this is not true for the RTGraphics class.

Text; HyperText; EntryField

Text objects, when run, render a text string on the screen according to some attributes. The most important attributes are the Font and colours to be used. In addition, justification and more esoteric features can be specified. The text class also has two subclasses: HyperText and EntryField.

A HyperText object is different from the Text objects in that it supports the concept of hyperlinks, i.e. the possibility to associate words or groups of words with a link to, for instance, another page. More formally, it is possible to program an MHEG link to be fired when a particular hypertext link has been selected by the user.

An EntryField is a Text object which can be interacted with by the user to change the text it contains. In addition to the attributes of the Text class, it has attributes to control the type of input expected, whether the input is echoed to the user or not, and what the length of the input is expected to be.

Button

Buttons come in three varieties. All three varieties provide the functionality of rendering a button (of a certain type) and of changing the appearance of the button depending on the two internal attributes HighlightStatus (inherited from Interactible) and SelectionStatus. It is the responsibility of the application (not the MHEG-5 engine) to make sure that UserInput events are coupled to state changes in a Button. This can be done using the MHEG-5 actions SetHighlightStatus and Select/Deselect/Toggle. The different types of buttons are SwitchButton, Hotspot, and PushButton.

Example:

(switchbutton: Switch1
    style: #radiobutton
    position: ( 50 70 )
    label: "On"
)
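
The two status attributes and the actions that change them can be modelled in Python as in the sketch below. The class is a simplification for illustration: events are merely recorded in a list, and the event name IsDeselected is an assumption of the sketch (only IsSelected is mentioned in this paper).

```python
class Button:
    """Toy Button holding HighlightStatus and SelectionStatus."""

    def __init__(self, name):
        self.name = name
        self.highlight_status = False
        self.selection_status = False
        self.events = []             # synchronous events raised so far

    def set_highlight_status(self, status):
        self.highlight_status = status

    def select(self):
        if not self.selection_status:
            self.selection_status = True
            self.events.append("IsSelected")

    def deselect(self):
        if self.selection_status:
            self.selection_status = False
            self.events.append("IsDeselected")   # assumed event name

    def toggle(self):
        if self.selection_status:
            self.deselect()
        else:
            self.select()


switch1 = Button("Switch1")
switch1.set_highlight_status(True)   # e.g. the effect of a Link on UserInput
switch1.toggle()
switch1.toggle()
print(switch1.events)
```

Note that nothing in the class reacts to user input by itself; in line with the text, the application's Links are expected to invoke these actions.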

Slider

Sliders are objects which represent in a graphical form the relationship between a value and an interval. For instance, the value 1 in the interval [0,2] would be represented by an indicator which is positioned in the middle. The value represented by a slider can be changed by the user. Since it is an interactible, it also has the HighlightStatus attribute, which makes it possible for the engine to control the highlighting of the object. Sliders come in different styles, i.e., thermometer, normal, and proportional.

Example:

(slider: Slider1 
    box-size: ( 40 5 )
    original-position: ( 100  100 )
    max-value: 20
    orientation: #right
)
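
The relationship between the value, the interval and the position of the indicator amounts to a linear mapping, which can be written in Python as below. The function name, the track_length parameter and the clamping of out-of-range values are assumptions of this sketch; how an engine actually draws a slider is not specified here.

```python
def indicator_offset(value, min_value, max_value, track_length):
    """Map a value in [min_value, max_value] to an offset along the track."""
    if max_value <= min_value:
        raise ValueError("the interval must not be empty")
    # Assumption: values outside the interval are clamped to its ends.
    clamped = min(max(value, min_value), max_value)
    return (clamped - min_value) / (max_value - min_value) * track_length


# The value 1 in the interval [0, 2] lies in the middle of a 40-unit track.
print(indicator_offset(1, 0, 2, 40))
```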

Acknowledgements

The authors of this paper would like to thank all the people on the MHEG ISO committee. We would especially like to thank the chairman of the MHEG-5 group, Klaus Hofrichter, and the MHEG-5 editor, Christine Théot.