RETRIEVING EMAILS VIA TRADITIONAL PSTN TELEPHONES, AN H.323 SERVICE
Penton JB, Terzoli A, Wentworth P Department of Computer Science
Rhodes University
Grahamstown, 6140, South Africa Email: [email protected]
Abstract
The service described in this paper enables end users to dial in via a regular touch-tone phone, enter their login ID and password, retrieve their email messages and have these messages read to them over a traditional PSTN (Public Switched Telephone Network) phone. After dialing up the system, the user interacts with it only through the touch-tone keys on a regular telephone. The service side of this system receives these tones via the H.323/ISDN (Integrated Services Digital Network) gateway and forwards them, using the H.323 protocol, to the ‘heart’ of the service, which is implemented as a callable H.323 endpoint. This endpoint supports IMAP (Internet Message Access Protocol) and Text-to-Speech interfaces, which, together with H.323, are the fundamental components of this service.
The system interacts with an IMAP mail server. It authenticates the user's login name and password on that server, tells the user how many messages they have received, and allows the user to listen to message headers and message text and to delete messages. The user may also navigate forwards, backwards or to a specific message. One important feature is that the user may interrupt the reading of any message at any point while it is being read back by the system and cancel, delete or back up the reading of the message.
I INTRODUCTION
The system developed is not based on any single new technology. Rather, it brings together a collection of existing technologies to create a new functionality for users. Its purpose is to bring mobility to email. Email is no longer restricted to a computer but can be obtained from a touch-tone phone.
A number of private companies have begun to develop and market similar systems. These companies are all trying to provide a simple, single integrated solution for handling a variety of different communication services: voicemail, email and in some cases fax.
II ARCHITECTURE
The complete system deployed is illustrated in Figure 1.
[Figure 1: Email reading service components (conventional telephone users, ISDN/H.323 gateway, PictureTel, mail server, H.323 email reader endpoint)]
The ‘heart of the service’ is a callable H.323 endpoint (the H.323 email reader in Figure 1) with a large amount of added functionality. This functionality includes the integration of enabling technologies to:
1. Connect to IMAP mail servers
2. Generate audio files from text files using text-to-speech systems
Rhodes University’s IMAP mail server is the default mail server that the service contacts when a user calls in. A few simple additions to the code would enable the service to cater for multiple IMAP mail servers.
To give PSTN telephone users access to the H.323 environment on our Ethernet network, the H.323/ISDN gateway is required to bridge the gap between these two communication networks.
In addition to the gateway, an H.323 gatekeeper is required to identify which H.323 endpoint the PSTN telephone user is trying to call. This is similar to the way extension numbers work on a PABX.
For example, there could be a number of H.323 services available to users of the PSTN. Such services could be identified with their associated extension number. So when a PSTN user connects to the gateway he/she could dial 1000 for the email reader service and 1001 for the alarm clock service.
The interaction between the gateway and gatekeeper will forward the call to the appropriate H.323 service/endpoint.
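The extension lookup described above can be pictured as a simple alias table. The following is only a minimal sketch: the host names and the `resolve_extension` helper are hypothetical, and a real gatekeeper resolves aliases through H.225.0 RAS messages rather than a static dictionary.

```python
# Hypothetical alias table: dialled extension -> H.323 endpoint address.
# The host names are placeholders; 1720 is the usual H.323 call signalling port.
REGISTERED_SERVICES = {
    "1000": "email-reader.example.org:1720",
    "1001": "alarm-clock.example.org:1720",
}


def resolve_extension(dialled_digits, table=REGISTERED_SERVICES):
    """Return the endpoint registered for an extension, or None if unknown."""
    return table.get(dialled_digits)
```

An unknown extension yields `None`, at which point the gateway would presumably ask the caller to dial again.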
III USER INTERFACE
One of the main design challenges of this service was to fit a technology that normally uses a full computer keyboard and monitor into the user interface of a telephone handset. We needed an interface that was consistent and easy to use, yet powerful.
A major difficulty here was obtaining the user's log-in ID and password, because at Rhodes University, where the service was deployed, usernames are alphanumeric strings such as g97p5142. The system was also designed so that the user must enter their full log-in information in order to be authenticated. Although this is not the most user-friendly option, it provides a high level of security.
This required our system to map every character that is acceptable in a log-in ID or password to a touch-tone sequence. The implementation uses a simple mapping from phone keys to keyboard characters, based on the way the keys on most cellular phones are mapped to alphanumeric characters.
The telephone keys are mapped as follows:
• A lower case letter is mapped to two key presses:
o The touch-tone key on which the letter appears. For example: a⇒2, b⇒2, d⇒3.
o "1", "2", "3" or "4", according to the position of the letter on that key.
o For example: "21" is used for a, "22" for b and "23" for c.
• An upper case letter is represented using:
o The two keys used for the lower case form of the letter, followed by a "1" key, which indicates upper case.
• Since punctuation can be used in passwords, "." is represented by "*".
• A digit in the log-in ID or password is mapped to the same digit.
• Every character is terminated by the * key.
• Finally, the end of an alphanumeric string is marked by the # key, i.e. log-in names and passwords must be terminated with a #.
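The rules above can be sketched as a pair of functions. This is an illustrative sketch only, not the code of the deployed service; the names `encode` and `decode` are our own, and the sketch assumes the standard letter layout of a telephone keypad.

```python
import string

# Standard telephone keypad letter layout.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

# Lower case letter -> key digit followed by position on that key, e.g. "b" -> "22".
LETTER_TO_KEYS = {letter: key + str(pos)
                  for key, letters in KEYPAD.items()
                  for pos, letter in enumerate(letters, start=1)}
KEYS_TO_LETTER = {keys: letter for letter, keys in LETTER_TO_KEYS.items()}


def encode(text):
    """Translate a log-in name or password into its touch-tone key sequence."""
    parts = []
    for ch in text:
        if ch in string.digits:
            parts.append(ch + "*")
        elif ch in LETTER_TO_KEYS:
            parts.append(LETTER_TO_KEYS[ch] + "*")
        elif ch in string.ascii_uppercase:
            parts.append(LETTER_TO_KEYS[ch.lower()] + "1*")  # trailing 1 = upper case
        elif ch == ".":
            parts.append("**")  # "." is "*", followed by the "*" terminator
        else:
            raise ValueError(f"character {ch!r} has no touch-tone mapping")
    return "".join(parts) + "#"


def _decode_token(token):
    """Decode the keys of one character (without its "*" terminator)."""
    if len(token) == 1 and token in string.digits:
        return token
    if len(token) == 2 and token in KEYS_TO_LETTER:
        return KEYS_TO_LETTER[token]
    if len(token) == 3 and token[:2] in KEYS_TO_LETTER and token[2] == "1":
        return KEYS_TO_LETTER[token[:2]].upper()
    raise ValueError(f"invalid key sequence {token!r}")


def decode(keys):
    """Translate a "#"-terminated key sequence back into text."""
    if not keys.endswith("#"):
        raise ValueError("sequence must be terminated with #")
    body, out, i = keys[:-1], [], 0
    while i < len(body):
        if body.startswith("**", i):  # the "." character
            out.append(".")
            i += 2
            continue
        end = body.find("*", i)
        if end == -1:
            raise ValueError("character not terminated with *")
        out.append(_decode_token(body[i:end]))
        i = end + 1
    return "".join(out)
```

Because every character ends with *, a digit such as "2*" cannot be confused with the start of a letter such as "21*", which is what makes the decoding unambiguous.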
The following table summarizes the mapping from keyboard characters to touch-tone telephone keys:
Char  Keys    Char  Keys    Char  Keys
1     1*      m     61*     I     431*
2     2*      n     62*     J     511*
3     3*      o     63*     K     521*
4     4*      p     71*     L     531*
5     5*      q     72*     M     611*
6     6*      r     73*     N     621*
7     7*      s     74*     O     631*
8     8*      t     81*     P     711*
9     9*      u     82*     Q     721*
0     0*      v     83*     R     731*
a     21*     w     91*     S     741*
b     22*     x     92*     T     811*
c     23*     y     93*     U     821*
d     31*     z     94*     V     831*
e     32*     A     211*    W     911*
f     33*     B     221*    X     921*
g     41*     C     231*    Y     931*
h     42*     D     311*    Z     941*
i     43*     E     321*    .     **
j     51*     F     331*
k     52*     G     411*
l     53*     H     421*
Table 1 Keyboard to touch-tone mappings
For example, a student with log-in name g97p5142 would enter:
41* 9* 7* 71* 5* 1* 4* 2* #
g   9  7  p   5  1  4  2
At each point in the user's interaction with the system they are given only certain options to choose from. For example, before retrieving any messages the user must first pass the authentication procedure. The system thus needs to maintain some state information, and so we have modeled the service as a state machine. For example, if the machine is playing a message it is in the ‘playing’ state. This means that the system expects the user to either save or delete the currently playing message; no other action is permitted. Three states were defined for the system:
1. Authenticating
2. Playing
3. Idle
When in the ‘authenticating’ state the system is gathering the user’s username and password. This state also includes the procedure of authentication with the IMAP mail server. A successful authentication results in the machine proceeding to the ‘idle’ state. In this state the user may do one of the following:
1. Jump to a specific message by entering its message number followed by the # key.
2. Play the next unread message by pressing the # key.
3. Play the previous message by pressing the * key.
If successful, the system proceeds to the ‘playing’ state and begins to read the selected message to the user. At this stage the system accepts only one of the following two options:
1. Save the current message by pressing 9.
2. Delete the current message by pressing 7.
If successful, the machine will return to the ‘idle’ state and await indication of the next message to be played.
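The three states and their transitions can be sketched as follows. This is a simplified illustration under stated assumptions, not the deployed code: the `Session` class and its attributes are hypothetical, "next unread message" is reduced to "next message number", and log-in key collection is left out.

```python
from enum import Enum


class State(Enum):
    AUTHENTICATING = 1
    PLAYING = 2
    IDLE = 3


class Session:
    """Hypothetical per-call state machine driven by touch-tone key presses."""

    def __init__(self, message_count):
        self.state = State.AUTHENTICATING
        self.message_count = message_count
        self.current = 0   # number of the last message played (0 = none yet)
        self.digits = ""   # digits typed so far for a "jump to message" request
        self.actions = []  # (action, message number) pairs, recorded for illustration

    def authenticated(self):
        """Called once the mail server has accepted the user's credentials."""
        if self.state is State.AUTHENTICATING:
            self.state = State.IDLE

    def press(self, key):
        if self.state is State.IDLE:
            self._idle(key)
        elif self.state is State.PLAYING:
            self._playing(key)
        # Keys pressed while authenticating belong to the log-in dialogue.

    def _idle(self, key):
        if key.isdigit():
            self.digits += key
            return
        if key == "#":    # jump to the typed number, or play the next message
            target = int(self.digits) if self.digits else self.current + 1
        elif key == "*":  # play the previous message
            target = self.current - 1
        else:
            return
        self.digits = ""
        if 1 <= target <= self.message_count:
            self.current = target
            self.state = State.PLAYING

    def _playing(self, key):
        if key in ("9", "7"):  # 9 = save, 7 = delete; every other key is ignored
            self.actions.append(("save" if key == "9" else "delete", self.current))
            self.state = State.IDLE
```

A request for a message number that does not exist simply leaves the machine in the ‘idle’ state, where a voice prompt could ask the user to try again.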
There are various voice prompts that guide the user through their interaction with the system to make the experience as simple as possible.
Security Considerations:
The security design goal is to allow each user to access his/her own messages and to prevent anybody else from accessing them. Since mail message security is provided by the user's login name and password, these must be entered at the beginning of each email-by-phone session before the user can retrieve any messages. Each user is allowed three attempts to log in, after which the system automatically disconnects them. This is to prevent attempts at password guessing. To make guessing harder still, the system reports only limited information when a user enters invalid data: it does not say whether the problem is with the login name or with the password.
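The attempt limit and the deliberately vague failure prompt can be sketched as below. This is a minimal sketch: `run_login`, `GENERIC_FAILURE` and the `check_credentials` callback are hypothetical names, and in the real service the check would be an authentication request to the IMAP mail server.

```python
from itertools import islice

# One prompt for every failure, so a caller cannot tell whether the
# login name or the password was wrong.
GENERIC_FAILURE = "Login failed. Please try again."


def run_login(attempts, check_credentials, max_attempts=3):
    """Consume up to max_attempts (username, password) pairs.

    Returns (authenticated, prompts); when authenticated is False the
    caller is expected to disconnect the call.
    """
    prompts = []
    for username, password in islice(attempts, max_attempts):
        if check_credentials(username, password):
            return True, prompts
        prompts.append(GENERIC_FAILURE)
    return False, prompts
```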
Security Limitations:
Anybody listening on the user's phone line can compromise the security of the mail messages. The communication over this medium cannot be encrypted, since no decryption tool is available at the phone end.
VII SYSTEM IMPLEMENTATION
The service is implemented as a callable H.323 endpoint. This means that the service is available to any H.323 terminal in addition to PSTN telephones. One difference is that when using a PSTN phone, the H.323/ISDN gateway is required to bridge the PSTN telephone network and the H.323 computer network.
The system can be logically divided into four individual components:
1. H.323/ISDN Gateway
2. H.323 API
3. IMAP API
4. Text-to-speech API
H.323/ISDN Gateway:
This gateway is an open source application developed by the University of Carlos, Spain. It provides connectivity to the PSTN via a number of ISDN interfaces. This is a fundamental component of this service as it enables the PSTN telephone user to get access to the service residing on the H.323 computer network.
The gateway itself resides on a PC that has an Ethernet interface and a number of ISDN interfaces.
When a PSTN telephone user calls our service, the gateway acts as an operator. The user must dial the extension number, using touch-tones, of the desired service (in this case the email reader service). With the help of an H.323 gatekeeper the gateway will forward the call to the corresponding service on the H.323 network.
H.323 API:
The H.323 protocol stack used in this development is an open source solution by the OpenH323 Project, founded by Equivalence.
The service is implemented as a callable H.323 endpoint. It can accept multiple simultaneous calls so that more than one user can have their emails read to them simultaneously.
IMAP API:
This API is written in C and is used to access Rhodes’ IMAP mail server. It allows the service to query the IMAP server with numerous options and download selected messages, headers and bodies.
Text-to-speech API:
The text-to-speech API was developed at the University of Edinburgh. It is written in C++ with a Scheme-based command interpreter for general control.
System Workings:
Once a user initiates a call with the service they are prompted for their username and password. If authentication with the IMAP mail server is successful, the user’s mailbox is queried for all unread messages. The headers for these messages are then downloaded using the IMAP API. These headers often contain a lot of information that is not essential to the user and that is difficult for the speech synthesis system to synthesize. As a result, the headers are parsed to create a stripped version containing only the From and Subject fields. These stripped versions are then synthesized into voice messages using the text-to-speech API.
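The header stripping step can be illustrated with Python's standard `email` package. This is a sketch under stated assumptions, not the service's own parser: the `strip_headers` function and the wording of the spoken sentence are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def strip_headers(raw_headers):
    """Reduce a raw header block to a short sentence suitable for synthesis."""
    msg = message_from_string(raw_headers)
    # Prefer the sender's display name; fall back to the bare address.
    name, address = parseaddr(msg.get("From", ""))
    sender = name or address or "unknown sender"
    subject = msg.get("Subject", "no subject")
    return f"Message from {sender}. Subject: {subject}."
```

Fields such as Received and Message-ID are simply never read, so they do not reach the speech synthesizer.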
The audio file is played out to the user through the H.323 RTP (Real-time Transport Protocol) channels. At this stage the user indicates which of the available messages he/she wants to have read. On entering a valid message number the system downloads the full message, synthesizes it, and plays it to the user.
At any point while the message is being read, or after the reading has finished, the user can either delete or save the message. Saving leaves the message in the mailbox and marks it as read. Deleting flags the message for deletion on the mail server, where it is later purged from the user’s mailbox.
When the user hangs up, the system removes all downloaded text files and synthesized audio files associated with that user, closes the connection to the IMAP server, and follows the usual H.323 call termination procedure.
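In IMAP terms, saving and deleting correspond to setting the \Seen and \Deleted flags on a message. The sketch below uses Python's standard `imaplib` interface in place of the C API used by the service; the function names are hypothetical and `conn` is assumed to be an authenticated `imaplib.IMAP4` connection with a mailbox already selected.

```python
def save_message(conn, number):
    """Keep the message in the mailbox and mark it as read."""
    conn.store(str(number), "+FLAGS", "\\Seen")


def delete_message(conn, number):
    """Flag the message for deletion; it remains in the mailbox until purged."""
    conn.store(str(number), "+FLAGS", "\\Deleted")


def purge_deleted(conn):
    """Permanently remove every message flagged as deleted."""
    conn.expunge()
```

Separating the flagging from the purge mirrors the behaviour described above, where a deleted message is only removed from the mailbox later.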
Problems: