
Universal Control Hub & Pluggable User Interfaces
Gottfried Zimmermann
The Universal Plug and Play forum has developed specifications for device classes, so-called "Device Control Protocols" (DCPs). Each DCP defines a common (machine-level) interface for a class of UPnP devices with embedded services in terms of mandatory and optional actions and state variables. Control points that have foreknowledge of the DCP that a device is using can thus easily make use of its interface.
However, a DCP says nothing about how a control point should present a UPnP device and its services to a human user. The UPnP specifications explicitly stay away from defining or adopting a framework for user interfaces. The presentation page URL, an optional part of a UPnP device description, hints at the need for some common mechanism for specifying a remote user interface for a UPnP device. However, the meaning of this presentation page is unclear and its use must rely on proprietary solutions; it is therefore hardly used in UPnP implementations.
The Universal Remote Console (URC) framework, specified as a family of ANSI standards (ANSI INCITS 389-2005 through 393-2005), can fill the user interface gap in UPnP. It defines a "Protocol to Facilitate Operation of Information and Electronic Products through Remote and Alternative Interfaces and Intelligent Agents". Despite the title, its purpose is to define a user interface layer on top of any existing interoperability framework for device discovery, control and eventing. It does so by defining a family of XML-based languages for the specification of cross-device user interfaces. The most prominent component of the URC framework is the "User Interface Socket", a common low-level denominator for all user interfaces that can be used for a device. High-level, polished user interfaces can be "plugged" into the socket, thus reusing application-specific code contained in the User Interface Socket.
This paper describes how the Universal Remote Console framework can be combined with the UPnP DCP approach, with the "Universal Control Hub" being the core component of the proposed architecture. In a nutshell, the Universal Control Hub (UCH) allows for both thin UPnP devices and thin user interface clients. It provides the UPnP architecture and its DCP specifications with a common user interface layer, thus allowing UPnP devices to project their user interfaces on remote clients that they have no knowledge of.
1. Proposed Architecture

Figure 1: Universal Control Hub Architecture
At the center of the proposed architecture (see figure 1) is the Universal Control Hub (UCH), which acts as a gateway between a user interface client ("UI client") and any UPnP device that it wants to access and control (here called "back-end device"). The UCH speaks the UPnP-defined DCPs when communicating with the back-end devices. For example, it interacts with a UPnP-enabled TV or DVR through the AV Device Control Protocol, and it interacts with a UPnP thermostat through the HVAC DCP. As mentioned previously, these UPnP DCPs do not include any mechanism for specifying user interfaces or for remoting them. (One exception is the RemoteUI DCP, which is used for a different purpose; see later.) Thus a manufacturer can deploy very simple devices that don’t know anything about user interfaces and rely solely on being controlled through their corresponding DCPs.
On the other end, the UI clients don’t know anything about the DCPs to be used for controlling the back-end devices. Some of them may use UPnP RemoteUI to discover the UCH and to pick the remoting protocol of their choice, thus acting as RemoteUI client (or at least as control point to the RemoteUI server). Others may know how to initiate a specific remoting protocol with the UCH by some other means (e.g. through setup). In any case, a UI client finds a remoting protocol ("remoting channel") on the UCH through which it can remotely access and control any one of the back-end devices.
Figure 1 lists some remoting channels as examples:
- The Scalable Vector Graphics (SVG) format of the W3C offers a scalable (vector-based) format for rendering interactive graphics. It may be conveyed over HTTP (including the back-channel for control commands). In general, SVG has somewhat extensive computing demands on the rendering device, but there is also a light SVG version ("SVG Tiny") for smaller rendering devices.
- HTML is the prevalent user interface standard in the World Wide Web. It is well-known and many tools exist for the development of HTML-based user interfaces. Here HTML is paired with HTTP for UI serving and as a back-channel. It might be useful to further divide the HTML remoting channel into at least three variations:
- "HTML text" would be the fall-back option for any UI client, only containing textual elements for maximal device independence;
- "Dynamic HTML" (DHTML) would allow any kind of HTML plus JavaScript scripting;
- "CEA-2027-A" would be compatible with CEA-2027-A clients.
- Macromedia Flash is a widely used framework for rendering dynamic content over the Web. Flash Remoting, a proprietary protocol built on top of Flash, allows for a distributed architecture in which a thin UI client binds to a remote application server. However, for the user interface developer this remote binding is transparent.
- XRT2 is a light-weight binary remoting protocol. With XRT2 even ultra-thin devices with limited graphical capabilities can be used as UI clients.
- It is possible to remotely control devices through audio only, over a phone line. A user interface serving this channel could be specified in the W3C’s VoiceXML standard language. Of course, this requires that the UCH have a phone link built in.
- Basically any user interface description (standardized or proprietary) may be used as a remoting channel. The architecture does not impose any restriction in this regard. As an additional example (not included in figure 1) a "native Pocket PC" channel could convey binary code for the rendering of a graphical user interface on a Pocket PC based PDA.
Remoting channels are in most cases offered to UI clients by URI. (The phone line is an exception.) The URI scheme denotes the remoting channel, but may also contain a session identifier or other information about the state of a control interaction. For example, the URI "http://192.168.1.1/svg" may serve a portal-style SVG interface that lets the user pick a device from a list of available back-end devices. For a UI client that has already picked a channel and back-end device through the UPnP RemoteUI procedure, the URI "http://192.168.1.1/svg/dvr" may immediately provide an SVG interface for the DVR.
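The URI layout in these examples can be sketched as a small parser. This is only an illustration: the path layout (channel first, then an optional back-end device) and the "session" query parameter are inferred from the example URIs in this paper, while the names ChannelRequest and parse_channel_uri are hypothetical. A real UCH may structure its URIs differently.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class ChannelRequest(NamedTuple):
    """Hypothetical decomposition of a remoting-channel URI."""
    channel: str             # e.g. "svg", "html", "flashremoting"
    device: Optional[str]    # None means "serve the portal / device list"
    session: Optional[str]   # optional session identifier


def parse_channel_uri(uri: str) -> ChannelRequest:
    parts = urlsplit(uri)
    # "/svg/dvr" -> ["svg", "dvr"]; empty segments are ignored.
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError("URI does not name a remoting channel: " + uri)
    channel = segments[0]
    device = segments[1] if len(segments) > 1 else None
    # The "session" parameter name is an assumption of this sketch.
    session = parse_qs(parts.query).get("session", [None])[0]
    return ChannelRequest(channel, device, session)


portal = parse_channel_uri("http://192.168.1.1/svg")
dvr = parse_channel_uri("http://192.168.1.1/svg/dvr")
```

Here `portal.device` is `None`, so the hub would serve the device list, whereas `dvr.device` names the back-end device directly.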
One should not think of a channel as delivering one static version of a user interface. Server-side adaptation mechanisms may be built into the channel protocol that may facilitate delivering user interfaces that are adapted to the UI client’s properties such as screen size and user input capabilities. For example, when a UI client requests a user interface over HTTP, the HTTP header may bear information about the device’s and the user’s preferences. Also, some user interface descriptions allow client-side adaptations such as scaling and reformatting.
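As a rough sketch of such server-side adaptation, a hub could inspect the request headers before choosing which variant of a user interface to serve. Accept-Language is a standard HTTP header; the X-Screen-Width header, the 480-pixel threshold, and the function name pick_variant are assumptions made up for this illustration and are not defined by UPnP or the URC standards.

```python
from typing import Dict


def pick_variant(headers: Dict[str, str]) -> Dict[str, str]:
    """Choose a UI variant from request headers (illustrative only)."""
    # Take the first language listed in Accept-Language, ignoring q-values.
    lang_header = headers.get("Accept-Language", "en")
    language = lang_header.split(",")[0].split(";")[0].strip().lower() or "en"

    # X-Screen-Width is a hypothetical header invented for this sketch.
    try:
        width = int(headers.get("X-Screen-Width", "1024"))
    except ValueError:
        width = 1024

    # The threshold is an arbitrary assumption.
    layout = "compact" if width < 480 else "full"
    return {"language": language, "layout": layout}


phone = pick_variant({"Accept-Language": "ja,en;q=0.8", "X-Screen-Width": "320"})
desktop = pick_variant({})
```

A production implementation would honor the full Accept-Language preference order rather than only the first entry.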
Some UI clients are only capable of using one remoting channel; others can choose among multiple channels. For example, a desktop or laptop computer can easily use the following remote UI channels: SVG on HTTP, HTML on HTTP and Flash Remoting. A PDA can use HTML on HTTP and Flash Remoting. A TV set with a remote control can use HTML on HTTP or Flash Remoting, depending on its software capabilities. A cell phone could use any of Flash Remoting, XRT2, or VoiceXML over the phone line. And a plain old telephone could use voice-based VoiceXML user interfaces for remote control.
So far we have looked at the UCH as a "black box" which somehow bridges between user interface protocols on the UI client-side and UPnP DCP based protocols on the back-end. But how does the UCH generate the user interface descriptions for serving the RUI channels? Does it use some pre-defined documents that are hard-coded into such a device? The answer is "YES" and "NO". "YES" because there is some part of a user interface (the "User Interface Socket") that is pre-defined for any DCP-standardized UPnP device. "NO" since the manufacturers of the back-end devices have a great interest in being able to project "their user interfaces" (bearing their corporate identity) onto the UI clients.
The User Interface Socket is the part of the remotable user interface that doesn’t change whatever remoting channel is used to convey the user interface. It is the "common denominator" of all user interfaces for a specific back-end device, defined by its manufacturer. This includes all types of user interfaces with any output modality (visual, auditory, tactile, or any combination) and any input modality (keyboard, mouse, touch-based, stylus, hand-writing, gesture, etc., or any combination). A User Interface Socket contains a flat set of low-level user interface elements (called "socket elements") that provide a synchronized communication channel to the back-end device and its current state. Socket elements add a logical layer on top of the DCP-based constructs that is closer to actual user interfaces than the UPnP DCP constructs are. It is easier for user interface developers to bind their widgets to UI Socket elements than to DCP actions and state variables of the back-end device. Socket elements are either variables, commands or user notifications. The description of the UI Socket (the "Socket Description") also specifies how socket elements depend on each other, for example that the "volume" variable can only be modified if "mute" is off.
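The idea of a flat set of socket elements with dependencies can be illustrated with a toy model. This is a minimal sketch and not the Socket Description language of the URC standards: the class UISocket, its methods, and the use of a Python callable to express the dependency are all hypothetical. Only variables are modeled; commands and user notifications are left out for brevity.

```python
from typing import Callable, Dict, List


class UISocket:
    """Toy model of a User Interface Socket holding a flat set of variables."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._writable: Dict[str, Callable[["UISocket"], bool]] = {}
        self._listeners: List[Callable[[str, object], None]] = []

    def add_variable(self, name: str, initial: object,
                     writable: Callable[["UISocket"], bool] = lambda s: True) -> None:
        # "writable" expresses a dependency on other socket elements.
        self._values[name] = initial
        self._writable[name] = writable

    def subscribe(self, listener: Callable[[str, object], None]) -> None:
        # Widgets subscribe so that they stay synchronized with the socket.
        self._listeners.append(listener)

    def get(self, name: str) -> object:
        return self._values[name]

    def set(self, name: str, value: object) -> None:
        if not self._writable[name](self):
            raise PermissionError(name + " cannot be modified in the current state")
        self._values[name] = value
        for listener in self._listeners:
            listener(name, value)


tv = UISocket()
tv.add_variable("mute", False)
# "volume" may only be modified while "mute" is off.
tv.add_variable("volume", 10, writable=lambda s: not s.get("mute"))

changes = []
tv.subscribe(lambda name, value: changes.append((name, value)))
tv.set("volume", 12)
tv.set("mute", True)
```

After the last call, a further `tv.set("volume", 20)` raises `PermissionError`, which mirrors the volume/mute dependency described above.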
For today’s UI client devices, the User Interface Socket would not provide enough information for constructing a nice-looking user interface. What’s missing are concrete instructions on how to build the user interface, what widgets to use and how to arrange and structure them. Also, labels need to be provided for the UI Socket elements. Widgets, structure and layout are provided by a "Pluggable User Interface", a channel-specific user interface description that plugs into a particular User Interface Socket. In general, a manufacturer will provide for each of its products a User Interface Socket plus a set of Pluggable User Interfaces for the most common UI client types, and deploy them to a Resource Server. The Resource Server may be company-owned or provided by any other organization such as a consortium. Other parties may create complementary Pluggable User Interfaces and make them available through the same or other Resource Servers. A Universal Control Hub that encounters a particular back-end device will look for Pluggable User Interfaces for that back-end device, searching on any Resource Server on the Internet.
At this point it is important that there be a defined procedure by which the UCH decides which Pluggable User Interface to use if multiple are available. The UCH is part of an implicit contract between the back-end devices and the UI clients. The agreement is that if the manufacturer of a back-end device provides a Pluggable User Interface for a specific remoting channel, this Pluggable UI is the default user interface to be rendered on the UI client when using that channel. Only if there is no user interface available from the manufacturer of the back-end device, or if for some reason it is not usable by the UI client or its user, may user interfaces from other parties replace the default one. For example, if the user understands only Japanese, but the manufacturer of the back-end device provides only European language user interfaces, a Japanese user interface that was created by a third party for that back-end device may fill in.
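The selection rule described here can be sketched as a short function. The data model (a PluggableUI record with provider, channel and language) and the function name select_ui are hypothetical, and "usable" is reduced to a language match for the sake of the example; an actual UCH would consider further properties of the UI client and its user.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PluggableUI:
    provider: str   # who published it on a Resource Server
    channel: str    # e.g. "html", "flashremoting"
    language: str   # e.g. "en", "ja"


def select_ui(candidates: List[PluggableUI], manufacturer: str,
              channel: str, user_languages: List[str]) -> Optional[PluggableUI]:
    # Keep only UIs for the requested channel that the user can understand.
    usable = [ui for ui in candidates
              if ui.channel == channel and ui.language in user_languages]
    # A usable manufacturer-provided UI is always the default.
    for ui in usable:
        if ui.provider == manufacturer:
            return ui
    # Otherwise a third-party UI may fill in.
    return usable[0] if usable else None


candidates = [
    PluggableUI("Acme", "html", "en"),
    PluggableUI("Acme", "html", "de"),
    PluggableUI("ThirdParty", "html", "ja"),
]
for_japanese_user = select_ui(candidates, "Acme", "html", ["ja"])
for_bilingual_user = select_ui(candidates, "Acme", "html", ["ja", "en"])
```

In this sketch the Japanese-only user receives the third-party interface, while a user who also understands English still receives the manufacturer's default.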
2. Sample Scenarios
To illustrate how this all works together, let’s look at some example scenarios.

Figure 2: Sample scenario - Computer controlling TV
In the first scenario (figure 2), a user wants to use their desktop computer (as UI client) to control the TV in the living room; both are connected to the home network. The Universal Control Hub advertises itself as a UPnP RemoteUI server device. Since the computer is UPnP aware, it discovers the UCH and interacts with it through the RemoteUI DCP. Thus the computer finds out that the UCH provides an HTML/HTTP based RUI channel for controlling the TV. By following the corresponding URI for the HTML/HTTP channel, the computer opens an HTML/HTTP based controlling session on the UCH for TV control. The UCH is using an HTML/HTTP Pluggable User Interface for this session that it retrieved from the TV manufacturer’s Resource Server on the Internet.

Figure 3: Sample scenario - Same TV controlled by cell phone using Flash
The next scenario (figure 3) has the same TV being controlled by a cell phone instead of the computer. The cell phone cannot render HTML, but finds a remoting channel for Flash Remoting clients. Since it has a light Flash player installed, it follows the corresponding URI. From the initial list of back-end devices that the UCH projects onto the cell phone’s screen, the user picks the TV. The UCH retrieves and installs the Pluggable User Interface from the TV manufacturer for the Flash Remoting channel, if not already installed. Then the TV’s Flash user interface is rendered on the cell phone, as defined by the TV manufacturer.
Instead of starting a new session with the cell phone controlling the TV, the same session could have migrated from the computer to the cell phone by some kind of session URI manipulation. For example, if the URI "http://192.168.1.1/html/tv?session=xyz" denotes an HTML/HTTP based control session to the TV, the URI "http://192.168.1.1/flashremoting/tv?session=xyz" could denote the same session but served through the Flash Remoting channel. When migrating a session from one channel to another, the Pluggable User Interface would be replaced but the User Interface Socket would remain.
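The URI manipulation in this example amounts to swapping the channel segment of the path while keeping the device and the session identifier. A minimal sketch follows; the function name migrate_session is hypothetical, and it assumes the channel is always the first path segment, as in the example URIs.

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_session(uri: str, new_channel: str) -> str:
    """Return the URI of the same session served through another channel."""
    parts = urlsplit(uri)
    segments = parts.path.split("/")   # "/html/tv" -> ["", "html", "tv"]
    if len(segments) < 2 or not segments[1]:
        raise ValueError("URI does not name a remoting channel: " + uri)
    segments[1] = new_channel          # replace only the channel segment
    # The query string (including the session identifier) is left untouched.
    return urlunsplit(parts._replace(path="/".join(segments)))


flash_uri = migrate_session("http://192.168.1.1/html/tv?session=xyz",
                            "flashremoting")
```

On the hub side, the session identifier would be used to look up the existing User Interface Socket session, so that only the Pluggable User Interface changes.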

Figure 4: Sample scenario - Same phone controlling thermostat, using VoiceXML
In the third scenario (figure 4), a user wants to set the temperature of the thermostat at home while driving home. Because she is driving, she cannot look at the cell phone’s screen. Instead she dials the UCH’s private phone number and hears: "Here is the Universal Control Hub. Say one of these: TV, DVR, thermostat." She says "Thermostat" and hears "Thermostat selected". She: "Set temperature to 68 degrees." The UCH responds: "Thermostat set to 68 degrees". The user hangs up.
In this scenario the UCH retrieves a Pluggable User Interface for the VoiceXML channel from the thermostat manufacturer’s Resource Server, and binds it to the User Interface Socket for the UPnP enabled thermostat. A VoiceXML interpreter (with phone line connection) is acting as UI client.

Figure 5: Sample scenario - Aggregated Flash UI for TV and DVR, showing on TV
The last scenario (figure 5) illustrates how aggregated (compound) Pluggable User Interfaces may be used to project a single user interface comprising functions of multiple back-end devices. Here a TV is used to render a Flash Remoting based user interface for both the TV and the DVR. For example, this user interface could contain the volume slider for the TV and the channel selection list for the DVR.
The UCH finds a Flash Remoting based aggregated Pluggable User Interface for the TV and the DVR in the home network. In its list of available remoting protocols announced through the UPnP RemoteUI server service, it can now offer a URI for a "TV+DVR" user interface session based on Flash Remoting. The user can pick the session using the remote control of the TV, and thus have the aggregated user interface rendered on the TV screen.
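One way to picture an aggregated Pluggable User Interface is as a set of widgets, each bound to an element of one of several sockets. The sketch below is purely illustrative: sockets are reduced to plain dictionaries, and the class AggregatedUI and the widget names are hypothetical.

```python
from typing import Dict, Tuple

Socket = Dict[str, object]


class AggregatedUI:
    """Hypothetical compound UI whose widgets bind to elements of several sockets."""

    def __init__(self, sockets: Dict[str, Socket],
                 bindings: Dict[str, Tuple[str, str]]) -> None:
        # Fail early if a widget points at a socket element that does not exist.
        for widget, (device, element) in bindings.items():
            if element not in sockets.get(device, {}):
                raise KeyError(f"{widget} binds to unknown element {device}.{element}")
        self._sockets = sockets
        self._bindings = bindings

    def read(self, widget: str) -> object:
        device, element = self._bindings[widget]
        return self._sockets[device][element]

    def write(self, widget: str, value: object) -> None:
        device, element = self._bindings[widget]
        self._sockets[device][element] = value


sockets = {"tv": {"volume": 10}, "dvr": {"channel": 5}}
ui = AggregatedUI(sockets, {
    "volume_slider": ("tv", "volume"),      # widget bound to the TV socket
    "channel_list": ("dvr", "channel"),     # widget bound to the DVR socket
})
ui.write("volume_slider", 15)
```

Moving the slider changes only the TV socket, while the channel list continues to reflect the DVR socket.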
3. How the Proposed Architecture Adds Value to UPnP
By proposing a Universal Control Hub as middleware layer between the DCP based back-end devices and the remoting protocol based UI client devices, we identify the following added values:
(1) The UCH provides a solution for device independence. Through its remoting channels it offers a set of diverse user interfaces that are tailored for specific UI client devices. A "device-independent HTML version" should be provided for any back-end device by its manufacturer. This HTML version may be used as a fallback option when no tailored user interface exists for a particular UI client device. If written carefully, HTML code is suitable for almost any type of graphical user interface. Guidelines for writing "device-independent HTML" should be provided for developers of back-end devices.
(2) The UCH provides an open platform for Pluggable User Interfaces. This brings about the following features:
(a) The manufacturers of back-end devices can project their user interfaces onto UI client devices. The UCH acts as a broker of remotable user interfaces between the back-end device and a UI client device. Neither the UI client device nor the UCH needs to be made by the same manufacturer as the back-end device.
(b) Easy internationalization (i18n), since Pluggable User Interfaces are easily provided as duplicates in different languages. Also, third parties can be commissioned to translate Pluggable User Interfaces and post them to a Resource Server.
(c) Simplified programming model for user interface designers designing for complex UPnP devices. Some UPnP DCPs are very complex and push the limits of what the UPnP architecture can achieve. For example, the AVTransport service template (part of the AV DCP) defines an evented state variable "LastChange" that provides a summary of other state variables’ value changes in the form of an XML document. Therefore a user interface designer would have to write XML parsing code to be able to trigger user interface updates based on a back-end device’s state change. The User Interface Socket would free the user interface designer from having to deal with XML parsing. Instead the socket layer provides a flat set of variables, commands and user notifications that UI designers can bind their interfaces to. APIs for the User Interface Socket layer will exist for common user interface description languages.
(3) The User Interface Socket model provides an open platform for task-oriented user interfaces. The Socket Description declares how individual UI Socket elements depend on each other. UI Socket elements are suitable for forming the leaf nodes of a task-model tree, with their parent nodes being tasks of various aggregation levels. Task-model trees could be published for device classes or combinations of device classes by the vendors of these devices, or by any third party. Even without a task-model tree, the option of using an aggregated Pluggable User Interface that binds to multiple UI Sockets, is a first step toward task-oriented user interfaces.
(4) The User Interface Socket model also provides an open platform for future usage scenarios, involving intelligent user agents and natural language interaction. It is expected that this kind of next-generation user agent will provide an answer to the simplicity challenge of consumer electronics. An intelligent agent will make the User Interface Socket the basis of its device and service assessment, with possible extensions of the Socket Description toward knowledge modeling and semantic Web technologies. By introducing the basic model of a User Interface Socket today, we benefit in multiple ways today and are also ready for the user interface technologies of tomorrow.
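To make point (2)(c) concrete, the following sketch shows the kind of XML handling that a socket layer could take over from the user interface designer: it flattens a "LastChange" event document into a plain dictionary of variables. The sample document follows the general shape of an AVTransport LastChange event but is written by hand for this illustration, and the function name flatten_last_change and the "instance.variable" key format are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hand-written sample in the general shape of an AVTransport LastChange event.
SAMPLE = """
<Event xmlns="urn:schemas-upnp-org:metadata-1-0/AVT/">
  <InstanceID val="0">
    <TransportState val="PLAYING"/>
    <CurrentTrack val="3"/>
  </InstanceID>
</Event>
"""


def flatten_last_change(xml_text: str) -> Dict[str, Optional[str]]:
    """Turn a LastChange document into a flat {"instance.variable": value} map."""
    root = ET.fromstring(xml_text.strip())
    flat: Dict[str, Optional[str]] = {}
    for instance in root:                      # one element per InstanceID
        instance_id = instance.get("val", "0")
        for var in instance:                   # one element per changed variable
            name = var.tag.split("}", 1)[-1]   # strip the XML namespace prefix
            flat[instance_id + "." + name] = var.get("val")
    return flat


socket_variables = flatten_last_change(SAMPLE)
```

A user interface bound to the socket would then only see entries such as "0.TransportState" changing, without ever touching the XML.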
Appendix: Glossary of Main Components
User Interface Socket
- Based on ANSI INCITS 390-2005, a User Interface Socket is a functional user interface for a controlled device or service that can be rendered in any input and output modalities.
- It contains elements (variables, constants, commands or user notifications) which provide status information and input capabilities for a controlled device.
- The User Interface Socket provides a common model for pluggable user interfaces that can be used to access a controlled device. For user interface designers it hides the complexity of UPnP and its DCPs, providing a simplified model (API) that they can bind their user interface objects to.
- The User Interface Socket is the basis for an open platform for pluggable user interfaces on top of UPnP.
Pluggable User Interface
- A user interface description or implementation that binds to the elements of one or more User Interface Sockets.
- A Pluggable User Interface may be specified in any programming language or user interface description language, including HTML, SVG, Flash, Java code or any other binary code.
- However, the manufacturer of a controlled device must provide at least one Pluggable User Interface that works well for a wide range of controller devices. This particular Pluggable User Interface is provided as HTML code that conforms to a "device-independent" profile.
- When programming a Pluggable User Interface, its elements (or widgets) should access User Interface Socket elements for getting and setting their values, or for command invocation (this is called "binding").
Universal Control Hub
- The Universal Control Hub is a gateway between UPnP devices (controlled devices) that talk a DCP, and controller devices that talk different user interface protocols. This approach allows for light-weight controller devices and small-footprint controlled devices.
- The UCH allows a controller device to remotely access and control a UPnP device. It allows a UPnP device to project its remotable user interface on controller devices that it has no knowledge about.
- For discovery purposes, the UCH acts as RemoteUIServer device, letting the controller device pick whatever controlling protocol is suitable for it.
Resource Server
- A Resource Server provides Pluggable User Interfaces to Universal Control Hubs. Instead of building Pluggable User Interfaces into a product, its manufacturer deploys them to a Resource Server. Deployment and updating may occur even after the product has been shipped.
- Any party may deploy Pluggable User Interfaces to Resource Servers, but a UCH must prefer Pluggable User Interfaces from the manufacturer of a product to Pluggable User Interfaces from other sources, if suitable for a particular controller device and its user.
- For easy internationalization, a Resource Server may also provide Resource Sheets that contain labels, icons and help texts for a particular Pluggable User Interface.
- As a possible framework extension, a Resource Server may also provide User Interface Socket Descriptions (for UPnP devices that implement vendor-specific extensions or non-standard DCPs) and a description of how to map the elements of the User Interface Socket to the actions and events of the UPnP device description of the device.