Voice-Based Email System For Visually Impaired (Text To Speech-To-Text)
ON
BY
SUBMITTED TO
COMPUTER SCIENCE
NOVEMBER, 2023
CERTIFICATION
This is to certify that this project work was done by SALAMI OMOLOLA OLAYINKA with
Matriculation number FPA/CS/21/3-0179 under the supervision of Mr. OGUNLOLA O. O. in partial
fulfilment of the requirement for the award of Higher National Diploma (HND) in COMPUTER
STUDIES.
……………………….. ………………………
Salami Omolola Olayinka Signature/Date
Project Student
….…………………….. ……………………
PROJECT SUPERVISOR
………………….. ….…………………
HEAD OF DEPARTMENT
DEDICATION
This project is dedicated to Almighty GOD, who has been the beginning and the end, the Alpha and
the Omega. I also dedicate it to my beloved parents, MR & MRS SALAMI, who have
contributed morally, financially and spiritually towards the completion of my education. I pray that
ACKNOWLEDGEMENTS
I give thanks to Almighty God, the Alpha and Omega, the beginning and the end, first and the
My foremost appreciation goes to my parents Mr. and Mrs. SALAMI for their support and attention,
which inspired me to go my own way. May God continue to be with you and you shall live long to
My appreciation goes to my sister Mrs. AKOSILE and Mr. FEMI FATILE for their understanding and
moral support throughout my course in this programme. May God continue to help you in all your
endeavours.
My profound gratitude goes to my able supervisor Mr. OGUNLOLA O. O. for taking the time to go
through the manuscript of this project work. I remain ever grateful to you for the knowledge imparted
to me.
Without these able people my project would be incomplete; they have shown me love, care, advice also
TABLE OF CONTENTS
Certification
Dedication
Acknowledgement
2.1 Voice Based System In Desktop And Mobile For Blind People
2.2 Voice Based Search Engine And Web Page Reader
Chapter Three: System Analysis And Design
3.5 Requirements For The New Voice-Based Email System For Visually Impaired Users
References
Appendix
ABSTRACT
The rapid advancement of technology has the potential to significantly improve the lives of
individuals with disabilities, particularly those who are visually impaired. This project presents the
development and implementation of a Voice-Based Email System tailored specifically for the visually
impaired community. Leveraging the power of the C# programming language, this system seamlessly
integrates Text-to-Speech (TTS) and Speech-to-Text (STT) technologies, enabling users to interact
with their email accounts using natural language voice commands.
The project begins with a comprehensive exploration of assistive technologies, existing email systems,
and accessibility considerations. It then delves into the system's design, covering its architecture, user
interface, and database structure. Implementation details, including technology stack selection and
security measures, are discussed in depth.
One of the project's primary goals is to ensure accessibility and user-friendliness for visually impaired
individuals. User interface features were meticulously designed to facilitate email composition,
reading, and management, while also complying with accessibility standards. Extensive usability
testing with target users provided valuable feedback and insights.
The Voice-Based Email System offers functionalities for composing, sending, receiving, and
managing emails entirely through voice commands. It empowers visually impaired users to
independently navigate their email correspondence, enhancing their digital communication and
productivity. The project also addresses privacy and ethical considerations regarding user data and
system usage.
The system's evaluation involved rigorous testing scenarios, performance metrics, and user feedback,
all of which underscored its effectiveness and user satisfaction. Future recommendations include
continuous user engagement, integration with multiple email services, advanced natural language
processing, cross-platform compatibility, and collaboration with accessibility experts.
CHAPTER ONE
INTRODUCTION
In today's digital era, email has become an indispensable means of communication, allowing
individuals to exchange information, collaborate on projects, and maintain professional and personal
connections. However, the benefits of email are not equally accessible to everyone. For the visually
impaired community, traditional email interfaces pose significant challenges, as these interfaces
heavily rely on visual cues. According to the World Health Organization (WHO), an estimated 2.2
billion people globally suffer from some form of visual impairment, ranging from mild to severe. This
highlights the pressing need for innovative solutions that empower visually impaired individuals to
Voice-based interfaces have emerged as a promising avenue for enhancing digital accessibility. A voice-
based email system can bridge the accessibility gap by enabling visually impaired users to interact
with their emails through natural spoken language. This project seeks to develop and implement a
Voice-Based Email System tailored to the needs of visually impaired users, encompassing both TTS
and STT technologies.
Text-to-Speech (TTS) technology converts text-based email content into audible speech,
allowing users to listen to their messages. On the other hand, Speech-to-Text (STT) technology
transforms spoken language into text, enabling users to compose and reply to emails using voice
commands. By integrating these technologies, the proposed system aims to create a seamless and
inclusive email experience that empowers visually impaired users to efficiently manage their email
correspondence.
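As an illustration of how recognized speech might be turned into email actions, the following minimal Python sketch maps the text returned by an STT engine to one of a few commands. The command vocabulary (`COMMANDS`) and the function name `parse_command` are hypothetical and are not taken from the implemented system; a real deployment would need a richer grammar and confirmation prompts.

```python
import re

# Hypothetical command vocabulary (an assumption for illustration only).
COMMANDS = {
    "compose": ("compose", "write", "new email"),
    "read": ("read", "open", "check inbox"),
    "reply": ("reply", "respond"),
    "delete": ("delete", "remove"),
}


def parse_command(transcript: str) -> str:
    """Map an STT transcript to an email action name, or 'unknown'."""
    # Lower-case the text and replace punctuation and digits with spaces.
    text = re.sub(r"[^a-z ]", " ", transcript.lower())
    # Collapse repeated whitespace and pad so only whole words can match.
    padded = " " + " ".join(text.split()) + " "
    for action, phrases in COMMANDS.items():
        if any(f" {phrase} " in padded for phrase in phrases):
            return action
    return "unknown"
```

Matching on whole words means that a transcript such as "already sent" is not mistaken for the "read" command. Anything the sketch does not recognize is returned as "unknown", so the system can ask the user to repeat the command.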
The significance of this project extends beyond its technical implementation. Enabling visually
impaired individuals to independently access email not only enhances their productivity and
connectivity but also fosters their social inclusion and autonomy. As Ramalho et al. (2019)
underscore, assistive technologies that focus on empowering individuals with disabilities align with
the principles of universal design, promoting a more equitable and inclusive digital environment.
the voice-based email system's architecture, and rigorous user testing and evaluation, this project
aspires to contribute to the broader efforts aimed at making digital communication channels accessible
to all. By providing visually impaired users with a means to engage with email content and services
using their natural voice, this system endeavours to mitigate the challenges posed by traditional visual
The integration of technology into various aspects of daily life has undoubtedly improved
accessibility and convenience for many. However, this progress has not been evenly distributed across
all demographics. For individuals with visual impairments, the digital landscape still presents
significant challenges. Traditional email systems heavily rely on visual interfaces, making them
inherently inaccessible to visually impaired users. According to the World Health Organization
(WHO), approximately 2.2 billion people globally live with visual impairments of varying degrees,
This project addresses the pressing issue of digital exclusion faced by visually impaired
individuals concerning email communication. While several assistive technologies exist to support
these users, their effectiveness in the context of email interaction remains limited. The complexities of
reading, composing, and managing emails demand innovative approaches that combine ease of use,
natural interaction, and reliable accuracy. As highlighted by Thompson et al. (2018), while there have
been advancements in the field of accessibility technology, many existing solutions lack the
Moreover, the disparities between the visually impaired and sighted individuals in terms of email
access and communication are evident in various studies. For instance, research conducted by Johnson
et al. (2019) emphasizes that without adequate accessible email solutions, visually impaired
emails, and even personal correspondences, which can have cascading effects on their social and
professional lives.
In light of these issues, the project seeks to develop a comprehensive Voice-Based Email
System that integrates Text-to-Speech (TTS) and Speech-to-Text (STT) technologies. By doing so, it
aims to empower visually impaired users to independently manage their email communications,
bridging the gap between traditional visual interfaces and the needs of this community.
The aim of this study is to develop and implement a Voice-Based Email System tailored to the
needs of visually impaired individuals, integrating Text-to-Speech (TTS) and Speech-to-Text (STT)
technologies.
i. Gather diverse email samples and analyse them for accurate text-to-speech (TTS) and speech-
to-text (STT) conversions.
ii. Design an integrated and accessible email system architecture with an intuitive voice-based
user interface.
iii. Implement Text-to-Speech and Speech-to-Text modules, and refine the system based on user
feedback.
The project methodology consists of a structured approach to achieving the outlined
objectives. For the first objective of data collection and analysis, a diverse set of email samples will be
acquired and pre-processed using Python. This step aims to gain insights into email content variations,
essential for accurate text-to-speech (TTS) and speech-to-text (STT) conversions. For the second
objective, the system's design and architecture will be visualized using UML tools like Visio. PHP will
be utilized to script the backend, MySQL for database integration, and HTML/CSS for an accessible
user interface. The third objective involves implementing TTS and STT modules using appropriate
PHP libraries.
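The pre-processing step for the first objective can be sketched in Python as follows. This is a minimal example under stated assumptions, not the project's actual script: it uses only the standard-library `email` package, and the function name `email_to_speech_text` and the rule that URLs are spoken as the word "link" are illustrative choices.

```python
import re
from email import message_from_string
from email.policy import default


def email_to_speech_text(raw: str) -> str:
    """Turn a raw RFC 5322 message into one plain string suitable for TTS."""
    msg = message_from_string(raw, policy=default)
    # Prefer the plain-text body; HTML-only messages yield an empty body here.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    # Assumption: long URLs are unhelpful when read aloud, so say "link" instead.
    body = re.sub(r"https?://\S+", "link", body)
    # Collapse line breaks and repeated spaces into single spaces.
    body = " ".join(body.split())
    sender = msg.get("From", "unknown sender")
    subject = msg.get("Subject", "no subject")
    return f"From {sender}. Subject: {subject}. {body}"
```

The returned string can then be passed to whichever TTS engine the system uses. Running the same function over a varied set of sample emails is one way to find content, such as signatures or quoted replies, that needs further cleaning before it is read aloud.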
The user interface will be developed to support email interactions and integrate seamlessly
with MySQL for data storage. Feedback from visually impaired users will guide the iterative
refinement of the system's features, ensuring user satisfaction and accessibility. Rigorous testing will
validate the system's reliability and integration. Documentation will encompass system architecture,
In this manner, the project methodology encompasses data analysis, system design, iterative
refinement based on user feedback, rigorous testing, and comprehensive documentation, ensuring the
successful development of a voice-based email system for visually impaired users using PHP,
communication for visually impaired users. This system integrates Text-to-Speech (TTS) and Speech-
to-Text (STT) technologies for natural voice interactions. Using PHP and MySQL, the system
includes features for composing, reading, and managing emails. User testing and iterative refinement
will optimize usability. The study aims to provide comprehensive documentation of the system's
architecture and design. The scope is limited to email interaction and excludes broader assistive
technology integration.
1.6 Contribution to Knowledge
Based Email System tailored for visually impaired users. By integrating Text-to-Speech (TTS) and
Speech-to-Text (STT) technologies, the project demonstrates a practical solution that empowers this
user group to independently manage their email communication. The implementation using PHP and
MySQL showcases the feasibility of creating inclusive interfaces for visually impaired individuals.
This study's contribution extends to the iterative refinement process guided by user feedback, which
enhances the usability and effectiveness of the developed system. The comprehensive documentation
generated as part of this study not only serves as a reference for future developments but also adds to
CHAPTER TWO
LITERATURE REVIEW
2.1 Voice Based System in Desktop and Mobile Devices for Blind People
This section deals with “Voice Based System in Desktop and Mobile Devices for Blind People”. The voice
mail architecture helps blind people to access e-mail and other multimedia functions of the operating
system (songs, text). In the mobile application, SMS messages can also be read aloud by the system itself.
Advances in computer technology have opened up platforms for visually impaired people across the
world. It has been reported that nearly 60% of the world's total blind population lives
in India. In this paper, we describe the voice mail architecture used by blind people to access e-mail
and the multimedia functions of the operating system easily and efficiently (Smith et al., 2020). This
architecture also reduces the cognitive load blind users bear in remembering and typing characters on a
keyboard. There is a large body of information available on technological advances for visually impaired
people, including the development of text-to-Braille systems, screen magnifiers and screen readers.
Recently, attempts have been made to develop tools and technologies that help blind
people to access internet technologies. Among the early attempts, voice input for surfing
was adopted for blind people. IBM's Home Page Reader provides an easy-to-use interface and
converts text to speech, with voices of different genders for reading text and links. However, the
disadvantage of this is that the developer has to design a complex new interface for
complex graphical web pages to be browsed and for the screen reader to recognize. Another approach is a
simple browsing solution that divides a web page into two dimensions, which greatly simplifies the page's
structure and makes it easier to browse. Another web browser generated a tree structure from the HTML
document by analysing links. Although it attempted to structure linked pages to enhance
navigability, it did not prove very efficient for surfing. Moreover, it did not address the
navigability and usability of the current page itself. Another browser developed for visually
handicapped people was eGuideDog, which had an integrated TTS engine. This system applies an
advanced text extraction algorithm to represent the page in a user-friendly manner. However, it still
did not meet the required standards for commercial use. Considering the Indian scenario, Shruti Drishti
and Web Browser for Blind are the two web browser frameworks used by blind people to
access the internet, including email. Both systems are integrated with Indian-language ASR
and TTS systems. However, the available systems are not portable to small devices like mobile phones.
A novel voice-based search engine and web page reader, which allows users to command
and control the web browser through their voice, is introduced. Existing search engines receive
requests from the user in the form of text and respond by retrieving the relevant documents from the
server and displaying them as text (Thakur & Shinde, 2019). Even though existing web
browsers are capable of playing audio and video, the user has to make a request by typing text into the
search box and can then play the audio or video of interest with the help of a Graphical User
Interface (GUI). The proposed voice-based search engine aspires to serve users, especially the
blind, in browsing the Internet. The user can speak to the computer, and the computer will respond to
the user by voice. The computer will assist the user in reading documents as well.
A voice-enabled interface with additional support for gesture-based input and output approaches was
developed for the “Social Robot Maggie”, converting it into an aloud reader. This voice recognition and
synthesis can be affected by a number of factors, such as the pitch, speed and volume of the voice. It is based on
the Loquendo ETTS (Emotional Text-To-Speech) software. The robot also expresses its mood through
applied in a wavelet domain to separate the speech and noise components in a proposed iterative
speech enhancement algorithm. This proposed method is developed in the wavelet domain to exploit
the selected features in the time-frequency space representation. It involves two stages: a noise
estimation stage and a signal separation stage (Thakur & Shinde, 2019). A Principal Component
Analysis (PCA) based HMM for the visual modality of audio-visual recordings is used. PCA
(Principal Component Analysis) and PDF (Probabilistic Density Analysis). Another study presents an approach to
speech recognition using fuzzy modelling and decision making that ignores noise instead of
detecting and removing it. In this approach, the speech spectrogram is converted into a fuzzy linguistic description and
A voice recognition technique combined with facial feature interaction assists virtual artists
with upper limb disabilities in creating visual art in a digital medium while preserving the individuality and
authenticity of the artwork. Techniques to recover phenomena such as sentence boundaries, filler
words and disfluencies, referred to as structural metadata, are also discussed, along with an approach
that automatically adds information about the location of sentence boundaries and speech disfluencies
in order to enrich speech recognition output. Clarissa is a voice-enabled procedure browser that is
deployed on the International Space Station (ISS). The main components of the Clarissa system are a
speech recognition module, a classifier for executing the open-microphone accept/reject decision, a
semantic analyser and a dialogue manager. Another work mainly focuses on expressions. To build a prosody model for
each expressive state, an end pitch and a delta pitch for each syllable are predicted from a set of
features gathered from the text. The expression-tagged units are then pooled with the neutral data. In a
TTS system, such paralinguistic events efficiently provide clues as to the state of a transaction, and
markup specifying these events is a convenient way for a developer to achieve these types of events in
The main features are that smooth and natural-sounding speech can be synthesized, the voice
characteristics can be changed, and the system is “trainable”. A limitation of the basic system is that the
synthesized speech sounds “buzzy” since it is based on a vocoding technique; this has been overcome by a
high-quality vocoder and hidden semi-Markov model based acoustic modelling. Speech synthesis consists of three
Another work mainly focuses on formant synthesis: an array of the phonemes of a syllable with formant
frequencies is given as input, the frequencies of the input are processed and, in combination with Thai tonal
accent rules, the formant frequency format is converted to wave format so that audio is output via the soundcard.
Advances in computer-based accessible systems have opened up many avenues for the
visually impaired across much of the globe. Audio-feedback-based virtual environments, such as
screen readers, have helped blind people immensely in accessing internet applications. However, a
large section of visually impaired people in different countries, in particular on the Indian subcontinent,
could not benefit much from such systems (Leonard & D'Arrigo, 2020). This was primarily due to the
difference in the technology required for Indian languages compared with that for other
popular languages of the world. In this paper, we describe the voicemail system architecture that can
be used by a blind person to access e-mails easily and efficiently. The contribution made by this
research has enabled blind people to send and receive voice-based e-mail messages in their native
language with the help of a mobile device. Our proposed system GUI has been evaluated against the
GUI of a traditional mail server. We found that our proposed architecture performs much better than
the existing GUIs. In this project, we use speech-to-text and text-to-speech techniques to provide access for
blind people.
The navigation system uses TTS (Text-to-Speech) to provide blind users with a
navigation service through voice. The suggested system, as an independent program, is fairly cheap and can
be installed on the smartphones held by blind people, which allows them to access
the program easily. An increasing number of studies have used technology to help blind people integrate
more fully into a global world. We present software that enables blind users to use mobile devices. The
software includes an instant messenger system to favour the interaction of blind users with any other
user connected to the network. Nowadays, advances in computer technology have opened up
platforms for visually impaired people across the world. It has been observed that nearly 60% of
This project describes the voice mail architecture used by blind people to access e-mail and the
multimedia functions of the operating system easily and efficiently. This architecture also reduces the
cognitive load blind users bear in remembering and typing characters on the keyboard. It also helps
handicapped and illiterate people. In previous work, blind people could not send email using the system.
The multitude of email types, along with the ability setting, enables their use in nomadic daily contexts.
However, these emails are not usable by everyone; blind people, for example, cannot send them.
Audio-based emails are preferable for blind people. They can easily respond to the audio
Such systems are very rare, so there is little chance that audio-based email is available to
blind people. We describe the voicemail system architecture that can be used by a blind person to
access e-mails easily and efficiently. The contribution made by this research has enabled blind
people to send and receive voice-based e-mail messages in their native language with the help of a
computer or a mobile device. Our proposed system GUI has been evaluated against the GUI of a
traditional mail server. We found that our proposed architecture performs much better than
the existing GUIs.
2.3.1 Speech-to-Text Converter
The system acquires speech at run time through a microphone and processes the sampled
speech to recognize the uttered text. The recognized text can be stored in a file. We are developing this
on the Android platform using the Eclipse workbench. Our speech-to-text system directly acquires and
converts speech to text. It can supplement other larger systems, giving users a different choice for data
entry. A speech-to-text system can also improve system accessibility by providing data entry options
for blind, deaf, or physically handicapped users. A speech recognition system can be divided into
several blocks: feature extraction, an acoustic model database built from the training data, a
dictionary, a language model and the speech recognition algorithm. The analog speech signal must first be
sampled along the time and amplitude axes, that is, digitized. Samples of the speech signal are analysed at
even intervals. This period is usually 20 ms because the signal within this interval is considered stationary.
Speech feature extraction involves the formation of equally spaced discrete vectors of speech
characteristics. Feature vectors from training database are used to estimate the parameters of acoustic
models. The acoustic model describes properties of the basic elements that can be recognized. The
basic element can be a phoneme for continuous speech or word for isolated words recognition.
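The framing and feature extraction steps described above can be sketched as follows. This is a minimal illustration, assuming non-overlapping 20 ms frames and using log energy as a stand-in feature. Practical recognizers typically use overlapping windows and richer features such as MFCCs, which the text does not specify. The function names are hypothetical.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=20):
    """Split a digitized signal into equally spaced, non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("sample_rate too low for the requested frame length")
    # Trailing samples that do not fill a whole frame are dropped.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def log_energy_features(samples, sample_rate, frame_ms=20):
    """Return one log-energy value per frame (a simple stand-in feature)."""
    features = []
    for frame in frame_signal(samples, sample_rate, frame_ms):
        energy = sum(x * x for x in frame) / len(frame)
        # A small constant avoids log(0) on silent frames.
        features.append(math.log(energy + 1e-12))
    return features
```

At a sampling rate of 8000 Hz, each 20 ms frame holds 160 samples, so one second of speech yields 50 feature values. A sequence of this kind is what the acoustic model would be trained on.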
Text-to-speech converts text to voice output using speech synthesis techniques. Although initially used by
the blind to listen to written material, it is now used extensively to convey financial data, e-mail
messages, and other information via telephone for everyone (Nygren et al., 2020). Text-to-speech is
also used on handheld devices such as portable GPS units to announce street names when giving
Speech Converter accepts a string of 50 characters of text (letters and/or numbers) as input. In
this, we have interfaced the keyboard with the controller and defined all the letter and digit
keys on it. The speech processor has an unlimited dictionary and can speak almost any text
provided at the input most of the time. Hence, it has an accuracy of above 90%. It is a
microcontroller-based hardware system coded in Embedded C. Further research is to be done to
optimize various methods of inputting the text, i.e. reading the text using an optical sensor and
converting it to speech, so that almost all sorts of physical challenges faced by people while
Voice recognition software (also known as speech to text software) allows an individual to use
their voice instead of typing on a keyboard. Voice recognition may be used to dictate text into the
computer or to give commands to the computer. Voice recognition software allows for a quick method
of writing onto a computer. It is also useful for people with disabilities who find it difficult to use the
keyboard. This software can also assist those who have difficulty with transferring ideas onto paper as
it helps take the focus out of the mechanics of writing. Word recognition is measured as a matter of
speed, such that a word with a high level of recognition is read faster than a novel one. This manner of
testing suggests that comprehension of the meaning of the words being read is not required, but rather
the ability to recognize them in a way that allows proper pronunciation (Nygren et al., 2020).
Therefore, context is unimportant, and word recognition is often assessed with words presented in
isolation in formats such as flash cards. Nevertheless, ease in word recognition, as in fluency, enables
The Internet plays a vital role in today's world of communication, and much of the world now runs on
it. Little work can be done without the use of the internet. Electronic mail, i.e. email, is one of the
most important parts of day-to-day life. However, some people today do not know how to
make use of the internet, because they are blind or illiterate, so it is very difficult for them to
live in this world of the internet. Nowadays there are various technologies available, like
screen readers, ASR, TTS, STT, etc., but these are not sufficiently efficient for them. Around 39 million
people are blind and 246 million people have low vision, and 82% of people living with blindness are aged 50
and above. We have to provide internet facilities for them so that they can use the internet. Therefore,
we came up with our project, a voice-based email system for the blind, which will greatly help visually
impaired people, and also illiterate people, to send their mails (Leonard & D'Arrigo, 2020). The
users of this system do not need to remember any basic information about keyboard shortcuts or the
location of the keys. Only simple mouse click operations are needed for functions, making the system easy to
use for users of any age group. Our system announces by voice where the user is in the interface,
so that the user does not have to worry about remembering which mouse click operation
Visually challenged people find it very difficult to utilize this technology because
using it requires visual perception. Not all people can use the internet, because
in order to access it you need to know what is written on the screen. If that is
not visible, it is of no use. This makes the internet a completely useless technology for the visually
STT (Speech-to-text): here, whatever we speak is converted to text. There will be a small microphone icon;
on clicking it, the user speaks and his/her speech is converted to text format, which the
TTS (text-to-speech): this method is the exact opposite of STT. It converts the text
format of the emails to synthesized speech. A text-to-speech (TTS) system converts language text into
speech; alternative systems render symbolic linguistic representations. Synthesized speech can be
IVR (Interactive voice response): IVR is a technology that describes the interaction between
the user and the system, in which the user responds to the respective voice message using the keyboard.
IVR allows the user to interact with an email host system via a system keyboard, after which users can
easily service their own enquiries by listening to the IVR dialogue. IVR systems generally respond
Assistive technologies play a pivotal role in enhancing accessibility for individuals with visual
impairments. Screen readers and voice assistants are notable tools that have significantly improved the
digital experiences of visually impaired users. Screen readers, such as JAWS (Job Access With
Speech) and NVDA (NonVisual Desktop Access), convert textual content displayed on a screen into
synthesized speech or Braille output. This allows users to navigate, read, and interact with digital
interfaces effectively (Scherer et al., 2018). Voice assistants, like Amazon's Alexa and Apple's Siri,
provide natural language interaction and facilitate tasks such as setting reminders, querying
information, and controlling smart devices (Nygren et al., 2020). These technologies enable visually
impaired users to engage with digital environments, bridging the accessibility gap and promoting
independence.
2.5.2 Screen Readers and Voice Assistants
Screen readers, in particular, have evolved significantly over the years. NVDA, an open-source
screen reader, has gained popularity due to its cost-effectiveness and active community support
(Scherer et al., 2018). Commercial screen readers like JAWS have also introduced innovative features,
such as OCR (Optical Character Recognition) capabilities that enable the reading of content from
images and documents (Leonard & D'Arrigo, 2020). Voice assistants, on the other hand, have
integrated accessibility features that provide audio feedback, voice-controlled interactions, and audible
cues to aid navigation (Nygren et al., 2020). These tools exemplify the advancements in assistive
technology, empowering visually impaired users to perform a wide range of tasks independently.
accessibility. Braille displays are devices that generate Braille characters on a surface, allowing
visually impaired users to read content through touch. These displays provide real-time access to
digital content, enabling users to perceive textual information without auditory assistance. Advances
in Braille technology have led to the development of more compact and affordable devices, expanding
their adoption (Leung et al., 2019). Furthermore, haptic feedback mechanisms integrated into
touchscreens and wearable devices offer tactile cues, enhancing navigation and interaction with digital
interfaces (Pielot et al., 2015). The integration of auditory and tactile cues demonstrates the
Voice interaction has emerged as a powerful modality to enhance accessibility, particularly for
visually impaired users. Voice-enabled devices and applications utilize natural language processing to
interpret spoken commands, enabling hands-free interactions. These interfaces are especially
advantageous for visually impaired individuals, as they provide an intuitive way to access information
and perform tasks. Technologies like Amazon's Alexa and Google Assistant exemplify the impact of
voice interfaces on accessibility, enabling users to engage in a wide array of activities, from setting
Voice assistants have been integrated into various assistive technologies to provide visually
impaired users with seamless access to digital content. These platforms often offer features tailored to
accessibility needs, such as the ability to read aloud text, describe images, and provide audio cues for
navigation. For instance, Microsoft's Seeing AI app leverages artificial intelligence to audibly describe
scenes, recognize objects, and read text from images, significantly enhancing the user's understanding
While voice interfaces offer substantial benefits, they also present challenges in terms of
accuracy, privacy, and context-awareness. Accurate speech recognition is crucial for effective
interaction, and while advancements have been made, variability in user accents and speech patterns
can still pose difficulties (Nygren et al., 2020). Privacy concerns arise due to the nature of voice data
collection, raising questions about data security and user consent (Garcia et al., 2020). Moreover,
context-awareness, which involves interpreting user intent and context, remains an ongoing research
challenge in voice interfaces (Wang et al., 2018). Despite these challenges, voice-based technologies
hold immense potential for improving the accessibility and autonomy of visually impaired users.
Several voice-enabled email systems have been developed to address the accessibility needs of
visually impaired users. Solutions like "Read My Mail" offer TTS capabilities, allowing users to listen
to their emails (Shrestha & Zaman, 2017). These systems often integrate with email clients and voice
assistants, offering a seamless experience for email management (Thakur & Shinde, 2019). Other
solutions like "Voice Dream Mail" provide specialized interfaces that prioritize voice interactions,
enabling users to compose and manage emails through natural speech (Voice Dream, n.d.).
platforms. "Be My Eyes" is an app that connects visually impaired users with sighted volunteers via
live video calls, enabling assistance with tasks like reading labels or navigating surroundings (Be My
Eyes, n.d.). Additionally, applications like "VocalEyes" empower visually impaired users to navigate
and explore their environment through voice-guided interactions (VocalEyes, n.d.). These case studies
valuable insights into their strengths and limitations, aiding in the design of the proposed Voice-Based
Strengths: One common strength across many existing solutions is the improvement they bring to the
accessibility of digital communication for visually impaired users. Voice-based systems leverage
natural language processing to create intuitive and hands-free interactions, reducing the reliance on
visual cues. Additionally, these systems often offer seamless integration with other technologies, such
as email clients and voice assistants, creating a unified user experience (Thakur & Shinde, 2019). By
using TTS, users can listen to email content, allowing them to stay updated with their messages
without relying on a visual display. Moreover, some solutions, like "Voice Dream Mail," focus on
voice interactions, enabling users to compose, reply, and manage emails through spoken commands
(Voice Dream, n.d.). These strengths collectively enhance the usability and autonomy of visually
do have limitations that can impact user experience. Accuracy in speech recognition is crucial for
effective interaction, and deviations in user accents and speech patterns can result in errors (Nygren et
al., 2020). Moreover, in certain contexts, privacy concerns arise due to voice data collection,
necessitating robust data security measures (Garcia et al., 2020). Integration with other platforms can
also present challenges if not seamlessly executed, potentially leading to compatibility issues (Thakur
& Shinde, 2019). Furthermore, some solutions might require a learning curve, as users need to adapt
to new interfaces and interaction paradigms (Voice Dream, n.d.). Recognizing these limitations aids in
addressing them proactively during the design and development of the proposed system.
best practices and innovative features that contribute to user satisfaction and accessibility. By studying
the successes and limitations of existing systems, the proposed Voice-Based Email System can
incorporate lessons learned, addressing challenges while capitalizing on effective strategies. This
analysis informs decisions about system architecture, interface design, and the integration of TTS and
STT technologies. Ultimately, the comparative analysis sets the foundation for creating a user-centric,
robust, and inclusive voice-based email solution for visually impaired users.
Assistive technologies have transformed the lives of visually impaired individuals by enabling
them to overcome barriers and participate more fully in digital and physical environments. These
technologies offer numerous benefits that extend beyond basic accessibility, profoundly impacting the
accessibility tools, such as voice-based interfaces and screen readers, has significantly
enhanced the social and professional interactions of visually impaired users. Voice
assistants provide a bridge for real-time information retrieval, aiding in social interactions
by offering up-to-date information without relying on sight (Garcia et al., 2020). Moreover,
instant messaging, email, and social media, ensuring their active participation in digital
conversations (Thakur & Shinde, 2019). From a professional standpoint, these tools enable
engage in remote collaboration. Screen readers, for instance, facilitate the reading of
documents and web content, empowering users to stay informed and contribute effectively
in educational and workplace settings (Scherer et al., 2018). These technologies break
visually impaired users. Voice-based interfaces allow users to interact with technology
without relying on visual cues, expanding their ability to control smart devices, access
information, and navigate digital interfaces (Nygren et al., 2020). Additionally, mobile
apps that provide real-time navigation and object recognition through voice guidance
contribute to safer and more confident mobility (VocalEyes, n.d.). These technologies also
interfaces in public spaces, digital services, and online platforms ensure that visually
impaired individuals can access the same information and services as their sighted
counterparts. This not only supports individual independence but also promotes diversity
widespread implementation. One major challenge is ensuring that the technologies are seamlessly
integrated into various contexts, including education, workplaces, and public spaces (Leonard &
D'Arrigo, 2020). Inadequate awareness, training, and support can hinder users' ability to effectively
use these tools (Thakur & Shinde, 2019). Additionally, the rapidly evolving nature of technology
i. Technical Compatibility and Usability: One of the key challenges lies in ensuring that
assistive technologies are seamlessly compatible with existing digital platforms and devices.
hinder the effective integration of these tools (Leonard & D'Arrigo, 2020). Moreover, while
the development of voice-based interfaces has advanced significantly, achieving high accuracy
in speech recognition remains an ongoing challenge. Variations in accents, dialects, and speech
patterns can lead to errors in understanding user commands (Nygren et al., 2020). This
technical barrier highlights the need for continuous improvement in speech recognition
ii. Awareness and Training: A lack of awareness and training among both visually impaired
users and service providers can impede the successful adoption of assistive technologies. Users
may not be fully informed about the available tools or how to effectively use them to their
advantage (Thakur & Shinde, 2019). Additionally, professionals responsible for providing
support and training might not be adequately trained themselves. This lack of awareness can
prevent users from harnessing the full potential of these technologies and realizing the benefits
they offer.
iii. Affordability and Availability: The affordability and availability of assistive technologies can
be a significant barrier, especially in regions with limited resources. High costs associated with
specialized devices, applications, or training programs can render these solutions inaccessible
to many visually impaired individuals (Leonard & D'Arrigo, 2020). Furthermore, limited
accessibility, leaving some users without access to the tools they need.
iv. Evolving Technological Landscape: The rapid evolution of technology presents both
opportunities and challenges. While advancements offer the potential for improved
accessibility, they also demand constant updates to maintain compatibility and functionality.
Assistive technologies need to keep pace with these changes to ensure their continued
effectiveness. However, frequent updates can pose challenges for users who may find it
individuals can hinder the widespread acceptance and use of these technologies (Garcia et al.,
2020). Overcoming societal biases and promoting a more inclusive mindset is crucial for
Text-to-Speech (TTS) technology converts written text into spoken words. It is particularly valuable for individuals with visual
impairments or reading difficulties, as well as for applications like navigation systems, voice
assistants, and audiobooks. TTS systems employ a combination of linguistics, phonetics, and machine
learning to generate natural-sounding speech output. Text-to-Speech (TTS) technology stands at the
forefront of transforming written information into audible content, thereby breaking down barriers for
individuals who face challenges with reading or visual perception. With its roots dating back to early
experiments in artificial speech generation, TTS has evolved into a sophisticated technology with
applications spanning from aiding visually impaired individuals to enhancing user experiences in
TTS systems are built upon a foundation of linguistic analysis, phonetics, and increasingly
advanced machine learning techniques. The technology's components work in tandem to produce
speech that mimics human vocal patterns, tone, and rhythm. From analysing input text and
the richness and nuances of spoken language. The evolution of TTS has been characterized by the
shift from rule-based approaches to data-driven methods, including concatenative synthesis and more
The applications of TTS are both practical and profound. Visually impaired users, for whom
traditional printed text can be a challenge, benefit from TTS systems that audibly convey information
from digital interfaces, thereby facilitating independent navigation and comprehension of content.
Beyond accessibility, TTS technology is integral to the development of voice assistants, making them
more engaging and human-like in their interactions. Audiobooks, language learning apps, and
navigation systems rely on TTS to deliver content in a way that is convenient and informative. As TTS
multilingual capabilities, and reducing the robotic nature of synthesized speech remain areas of active
research. Additionally, ethical considerations about voice cloning and manipulation raise questions
about the potential misuse of this technology. Despite these challenges, the positive impact of TTS on
accessible to all, TTS is not just a technological innovation but a powerful tool for fostering inclusion
and ensuring that information reaches every corner of our diverse society.
i. Text Analysis: The process begins with analyzing the input text to determine punctuation,
sentence structure, and context. This analysis guides the pronunciation of words and the
appropriate prosody.
ii. Phonetic Transcription: Each word is broken down into its phonetic components, which
represent the sounds of the spoken language. This transcription helps ensure accurate
pronunciation.
iii. Prosody Generation: Prosody involves the rhythm, intonation, and stress patterns of speech.
TTS systems use rules or statistical models to generate natural-sounding prosody, making the
synthesis, where pre-recorded human speech segments are combined to create words and
v. Machine Learning Techniques: Modern TTS systems often leverage machine learning,
including deep learning, to improve naturalness and adapt to different speaking styles and
languages.
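The first three stages listed above (text analysis, phonetic transcription, and prosody generation) can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the implementation used in this project. The abbreviation table, the tiny pronunciation lexicon, and the function names are all hypothetical, and the prosody rule (rising pitch for questions, falling otherwise) is a deliberately simple stand-in for the rule-based or statistical models used in real TTS engines.

```python
import re

# Hypothetical abbreviation table and pronunciation lexicon. Real systems use
# far larger dictionaries or learned grapheme-to-phoneme models.
ABBREVIATIONS = {"mr.": "mister", "dr.": "doctor", "no.": "number"}
LEXICON = {
    "you": ["Y", "UW"],
    "have": ["HH", "AE", "V"],
    "new": ["N", "UW"],
    "mail": ["M", "EY", "L"],
}


def analyze_text(text):
    """Text analysis: split input into sentences of normalized word tokens."""
    sentences, words = [], []
    for token in text.split():
        lowered = token.lower()
        if lowered in ABBREVIATIONS:
            # Expand the abbreviation; its trailing dot does not end the sentence.
            words.append(ABBREVIATIONS[lowered])
            continue
        word = re.sub(r"[^a-z']", "", lowered)
        if word:
            words.append(word)
        if token[-1] in ".!?" and words:
            sentences.append({"words": words, "is_question": token.endswith("?")})
            words = []
    if words:
        sentences.append({"words": words, "is_question": False})
    return sentences


def transcribe(words):
    """Phonetic transcription: look up each word, spelling out unknown words."""
    return [LEXICON.get(word, list(word.upper())) for word in words]


def add_prosody(sentence):
    """Prosody generation (toy rule): questions rise, other sentences fall."""
    contour = "rising" if sentence["is_question"] else "falling"
    return {"phonemes": transcribe(sentence["words"]), "contour": contour}


def tts_front_end(text):
    """Run the three front-end stages and return one record per sentence."""
    return [add_prosody(sentence) for sentence in analyze_text(text)]
```

The output of `tts_front_end` would then be handed to a synthesis stage, which is outside the scope of this sketch.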
1. Accessibility for Visually Impaired Users: TTS technology plays a pivotal role in enhancing
accessibility for individuals with visual impairments. Through TTS-enabled screen readers, visually
impaired users can listen to the content of websites, documents, emails, and other digital materials.
This accessibility feature empowers them to access and navigate the digital world, ensuring they have
equal access to information, education, and various online services. By converting text into audible
speech, TTS contributes to an inclusive digital environment, allowing visually impaired users to
2. Voice Assistants and Chatbots: Voice assistants and chatbots are virtual AI-driven entities
designed to interact with users through natural language. TTS is the technology behind their ability to
speak and engage in conversations. TTS gives these virtual agents a human-like voice, making
interactions more relatable and user-friendly. Whether it's asking Siri for the weather forecast or
instructing a chatbot to book a hotel room, TTS facilitates seamless communication and assistance,
enhancing the user experience and making interactions feel more personal.
3. Navigation Systems: TTS enhances navigation systems by providing turn-by-turn directions and
location information audibly. In-car navigation systems, smartphone maps apps, and GPS devices use
TTS to guide users through unfamiliar routes. By vocalizing street names, distances, and directions,
TTS enables drivers and pedestrians to navigate safely without needing to look at a screen or map,
reading text-based content. TTS technology enables the conversion of written books, articles, and
educational materials into audio format. This is particularly valuable for individuals who prefer to
consume content through listening, whether they are commuting, exercising, or engaged in other
activities. E-learning platforms also use TTS to provide audio versions of educational materials,
5. Language Learning Tools: TTS technology aids language learners by providing accurate
pronunciation models and facilitating language comprehension. Language learning apps and platforms
utilize TTS to audibly pronounce words, phrases, and sentences in different languages. Learners can
listen to native-like pronunciation and practice their speaking skills, enhancing their ability to
6. Providing Audio Feedback in User Interfaces: TTS is integrated into user interfaces to provide
audio feedback and guidance. For instance, when visually impaired users interact with software,
applications, or devices, TTS can vocalize menu options, button labels, and other interface elements.
This ensures that users receive real-time information about their interactions, making technology more
Speech-to-Text (STT) technology, also known as Automatic Speech Recognition (ASR), converts
spoken language into written text, bridging the gap between oral communication and written content.
STT systems find applications in transcription services, voice commands for devices, and making
spoken content searchable. This capability has far-reaching implications, impacting various industries and
STT systems employ a complex interplay of computational linguistics, signal processing, and
machine learning algorithms to transcribe spoken words into written text. The process begins with
acoustic feature extraction, where audio signals are dissected into components that capture the
spectrum of sound over time. These features then undergo analysis by trained models, which map the
audio patterns to phonemes, words, and sentences. Language models enhance transcription accuracy
rapid and accurate conversion of spoken content, such as meetings, interviews, lectures, and
podcasts, into written text. This not only expedites the process but also facilitates keyword
ii. Voice Commands and Interfaces: Voice commands have become integral to modern
technology interprets user vocalizations, transforming them into actionable commands. This
seamless interaction enhances user experiences and enables hands-free control of various
iii. Real-time Captioning and Accessibility: Live captioning for videos, broadcasts, and
presentations is made possible through STT technology. This feature benefits individuals who
are deaf or hard of hearing by providing real-time textual representation of spoken content.
iv. Data Entry and Dictation: STT technology simplifies data entry tasks by allowing users to
dictate text rather than type it manually. This is particularly advantageous in scenarios where
typing is impractical, such as when driving or multitasking. It also aids individuals with
facilitating the input of spoken language for conversion into written text. This text can then be
vi. Voice Search: STT powers voice search functionalities in search engines and digital assistants,
enabling users to retrieve information by speaking their queries aloud. This streamlined search
process enhances user convenience and encourages more natural interactions with technology.
i. Acoustic Feature Extraction: Incoming audio signals are transformed into a series of
acoustic features, such as spectrograms, which represent the sound spectrum over time.
ii. Feature Matching: These acoustic features are compared to a set of trained models, often
using Hidden Markov Models (HMMs) or deep neural networks (DNNs), to determine the
iii. Language Models: STT systems employ language models to predict the most probable word
iv. Post-Processing: Post-processing techniques correct errors and improve transcription quality
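The acoustic feature extraction step described in item (i) can be sketched as follows. This is an illustrative Python example, not the project's code: it computes a small magnitude spectrogram using a naive discrete Fourier transform so that it needs only the standard library. The frame size, hop length, and function names are assumptions chosen for readability; production systems use fast Fourier transforms and usually derive further features from the spectrogram.

```python
import cmath
import math


def frame_signal(samples, frame_size, hop):
    """Split the signal into overlapping fixed-length frames."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, hop)]


def hann(n, size):
    """Hann window coefficient, used to reduce spectral leakage."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    size = len(frame)
    windowed = [x * hann(n, size) for n, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                    for n, x in enumerate(windowed)))
            for k in range(size // 2 + 1)]


def spectrogram(samples, frame_size=64, hop=32):
    """Return one magnitude spectrum per frame: the sound spectrum over time."""
    return [magnitude_spectrum(frame)
            for frame in frame_signal(samples, frame_size, hop)]


# Usage: a 1000 Hz tone sampled at 8000 Hz (both values are illustrative).
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(256)]
frames = spectrogram(tone)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in frames]
```

With 64-sample frames at 8000 Hz, each bin spans 125 Hz, so the strongest bin for the 1000 Hz tone is expected at index 8 in every frame. The feature matching stage would consume rows of this kind rather than raw audio samples.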
Speech-to-Text (STT) technology, also known as Automatic Speech Recognition (ASR), has a
revolutionizes transcription services by swiftly and accurately converting spoken content into
written text. This application streamlines administrative tasks by automating the conversion
in deep learning techniques have significantly improved transcription accuracy and made it
feasible to transcribe large volumes of spoken data efficiently. This has profound implications
for industries reliant on accurate documentation, such as legal, academic, and corporate
sectors.
b. Voice Commands for Smart Devices: Voice commands have become a ubiquitous means of
interacting with smart devices. Ghahremani et al. (2017) emphasize the role of STT
technology allows users to control devices, access information, and execute commands
through spoken language. The integration of STT ensures that voice commands are
accurately interpreted and translated into actions, enhancing user convenience and device
usability.
c. Real-time Captioning for Videos and Broadcasts: The real-time captioning of videos and
for individuals who are deaf or hard of hearing. As highlighted by Lopes et al. (2021), STT-
driven real-time captioning ensures that spoken content is transcribed into text in real-time,
providing an inclusive experience for all viewers. This application empowers individuals to
d. Accessibility for Deaf or Hard-of-Hearing Individuals: STT technology contributes to
content. As outlined by Paine et al. (2020), STT technology enables the conversion of spoken
language into written text, allowing individuals with hearing impairments to comprehend
transforms spoken content into searchable text, making audio content easily discoverable and
retrievable in databases and archives. Boucher et al. (2020) highlight the significance of STT
in indexing and organizing vast amounts of audio data, enabling users to search for specific
keywords or phrases within spoken recordings. This application enhances data management
and research by unlocking valuable insights from audio resources that were previously
challenging to navigate.
Incorporating STT technology into these applications showcases its capacity to transcend
various sectors.
CHAPTER THREE
Before implementing a Voice-Based Email System for visually impaired users, it is crucial to
analyze the shortcomings and inefficiencies of the existing system. In this case, the existing
system likely involves visually impaired individuals using screen readers or other assistive
Accessibility Challenges: The primary issue with the existing system is accessibility. Visually
impaired users heavily rely on screen readers, which may not provide a seamless and efficient
email reading and management experience. The interface might not be fully compatible with
limitations for users who prefer auditory communication. It does not effectively support voice
commands for composing or managing emails, limiting the independence of visually impaired
users.
Limited Multimodal Interaction: Visually impaired users may need to switch between
multiple assistive technologies (screen readers, speech recognition software, etc.), making the
Identifying the challenges of the existing system is crucial for understanding the need for
1. Limited Accessibility: The existing system's lack of accessibility features hinders visually
2. Low Efficiency: The current system's text-based nature and limited voice interaction
software, which can be costly and may not work seamlessly together.
The proposed Voice-Based Email System for Visually Impaired users offers several improvements and
innovations:
i. Enhanced Accessibility: The new system is designed from the ground up with accessibility in
mind. It provides a user-friendly and fully compatible interface for screen readers, ensuring a
ii. Voice Interaction: The system allows for intuitive voice interactions, enabling users to
compose, read, and manage emails through natural spoken commands. This feature reduces the
iii. Integration of TTS and STT: The integration of Text-to-Speech (TTS) and Speech-to-Text
(STT) technologies enhances the system's overall functionality. TTS ensures that email content
is read aloud naturally, while STT converts spoken user commands into text for processing.
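The way STT output can be turned into email actions may be sketched as below. The example is in Python for illustration only (the project itself is built in C#), and the command phrases, intent names, and spoken responses are hypothetical. It assumes that the STT engine has already produced a text transcript and that the returned sentence would be passed to the TTS engine.

```python
import re

# Hypothetical command grammar: each intent is matched by a simple pattern
# applied to the transcript produced by the STT engine.
COMMAND_PATTERNS = [
    ("compose", re.compile(
        r"^(compose|write|new) (an? )?(email|message)( to (?P<recipient>.+))?$")),
    ("read", re.compile(r"^read (my )?(inbox|emails?|messages?)$")),
    ("reply", re.compile(r"^reply( to (?P<recipient>.+))?$")),
    ("delete", re.compile(r"^delete (this |the )?(email|message)$")),
]


def parse_command(transcript):
    """Map an STT transcript to an (intent, arguments) pair."""
    # Normalize case, punctuation, and spacing before matching.
    text = re.sub(r"[^\w\s@.]", "", transcript.lower())
    text = re.sub(r"\s+", " ", text).strip().rstrip(".")
    for intent, pattern in COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            args = {k: v for k, v in match.groupdict().items() if v}
            return intent, args
    return "unknown", {}


def respond(transcript):
    """Return the sentence that would be spoken back to the user via TTS."""
    intent, args = parse_command(transcript)
    if intent == "compose":
        return f"Starting a new email to {args.get('recipient', 'a new recipient')}."
    if intent == "read":
        return "Reading your inbox."
    if intent == "reply":
        return "Starting a reply."
    if intent == "delete":
        return "The message will be deleted."
    return "Sorry, I did not understand that command."
```

A fixed pattern table is easy to audit and to read aloud as a help menu, at the cost of flexibility; a statistical intent classifier would tolerate more varied phrasing.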
The justification for implementing the new Voice-Based Email System lies in its ability to address the
with assistive technologies, the new system empowers visually impaired users to
ii. Efficiency and Independence: The new system's voice interaction capabilities significantly
improve efficiency and independence. Users can perform email-related tasks more quickly and
iii. Enhanced User Experience: The integration of TTS and STT technologies ensures a more
natural and user-friendly email experience. This aligns with the principle of universal design,
3.5 Requirements for the New Voice-Based Email System for Visually Impaired Users
Developing a Voice-Based Email System tailored for visually impaired users necessitates a
functional, and usability aspects, all aimed at ensuring the system is accessible, efficient, and user-
1. Accessibility Requirements:
The system must be fully compatible with popular screen reader software such as JAWS,
NVDA, and VoiceOver. All user interface elements, including buttons, menus, and text fields, must be
accurately read aloud by screen readers. The user interface should offer high-contrast colour schemes
to accommodate users with low vision. Font size and style must be adjustable to allow users to select
Voice commands should be a central feature, allowing users to navigate the interface, compose emails,
2. Functional Requirements:
The functional requirements of the Voice-Based Email System are central to its effectiveness in
providing visually impaired users with a seamless and accessible email experience.
i. Multimodal Interaction: The system must offer both voice and text-based interaction modes to
cater to users' diverse needs. This allows users to switch effortlessly between modes, selecting
the one that best suits their preference and context. Users should be able to compose emails
ii. Text-to-Speech (TTS) Module: A critical functional requirement is the inclusion of a robust
TTS module. This module should proficiently convert written email content into natural, easily
comprehensible speech. Users should have control over speech attributes, enabling them to
iii. Speech-to-Text (STT) Module: An equally important functional aspect is the STT module,
responsible for transcribing spoken user commands and messages into text. The STT module
should be trained to accurately recognize various accents, dialects, and speech patterns to
iv. Email Interaction and Management: The core functionality of the system revolves around its
capability to interact with emails. It should connect to email servers via standard email
protocols like IMAP and SMTP to facilitate email retrieval, sending, and management. Users
must be able to read, compose, reply to, and delete emails using voice commands or text input
including organizing emails into folders, marking messages as important, and flagging for
follow-up.
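The email interaction requirement in item (iv) can be illustrated with a short sketch that uses Python's standard `smtplib` and `imaplib` modules. It is not the project's C# implementation. The host names, port, credentials, and function names are placeholders to be supplied by the caller, and the sketch assumes that the mail server accepts implicit TLS connections.

```python
import email
import imaplib
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Wrap dictated text in a standard email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host, user, password, port=465):
    """Send a message over SMTP with implicit TLS (port 465 is an assumption)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)


def fetch_unread(host, user, password, mailbox="INBOX"):
    """Return (sender, subject) pairs for unread messages retrieved over IMAP."""
    results = []
    with imaplib.IMAP4_SSL(host) as client:
        client.login(user, password)
        # Read-only selection leaves the unread flags untouched.
        client.select(mailbox, readonly=True)
        _, data = client.search(None, "UNSEEN")
        for num in data[0].split():
            _, parts = client.fetch(num, "(RFC822)")
            parsed = email.message_from_bytes(parts[0][1])
            results.append((parsed["From"], parsed["Subject"]))
    return results
```

In a voice-driven flow, the body passed to `build_message` would come from the STT module, and the pairs returned by `fetch_unread` would be read aloud by the TTS module.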
Scalability is a fundamental requirement of the Voice-Based Email System for Visually Impaired
Users, ensuring that the system can evolve and expand as user needs grow. The system should be
capable of handling an increasing number of users, a growing volume of emails, and potential
enhancements to its features and capabilities, while remaining responsive and reliable as its user
base and data load increase.
Maintenance is equally vital to the system's sustainability and long-term success. Regular
maintenance activities, including bug fixes, security updates, and performance optimizations, must be
carried out promptly to ensure the system operates smoothly. Maintenance also involves addressing
user feedback and incorporating improvements based on user needs and evolving technologies.
Providing ongoing support and updates ensures that the system remains accessible and functional,
meeting the changing requirements of visually impaired users. Additionally, user training and
documentation should be continuously updated to assist users in making the most of the system's
Together, scalability and maintenance requirements are essential for the system's longevity,
adaptability, and continued effectiveness in serving the needs of visually impaired users. These
considerations underscore the commitment to ensuring that the system remains a reliable and
In the design phase of a Voice-Based Email System for Visually Impaired Users, database
design and UML (Unified Modeling Language) design play a crucial role in structuring the system's
data and functionality. Below, we will explore both aspects of the system design in detail.
Effective database design is essential for storing, managing, and retrieving user-related data,
emails, and messages efficiently. In this context, we can establish three primary tables: Users, Email,
and Message.
Users Table: The Users table is fundamental to the system, containing information about each
customization.
Email Table: The Email table stores data related to individual emails. Its fields may include:
Sender ID (Foreign Key): References the user who sent the email.
Timestamp: Date and time when the email was sent or received.
Message Table: The Message table is responsible for storing user messages and their interactions.
Sender ID (Foreign Key): References the user who sent the message.
Receiver ID (Foreign Key): References the user who received the message.
Email table
Field                      Data Type   Description
Sender ID (Foreign Key)    Reference   References the user who sent the email.
Timestamp                  Date/Time   Date and time when the email was sent or received.
Read Status                Boolean     Indicates if the email has been read or not.

Message table
Field                      Data Type   Description
Sender ID (Foreign Key)    Reference   References the user who sent the message.
Receiver ID (Foreign Key)  Reference   References the user who received the message.
Timestamp                  Date/Time   Date and time when the message was sent.
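A possible SQLite realization of the three tables is sketched below, using Python's built-in `sqlite3` module for illustration rather than the project's C# code. Only the Sender ID, Receiver ID, Timestamp, and Read Status fields come from the design described here. Every column marked "assumed" in the comments, together with the table and function names, is a hypothetical addition made so that the example is complete.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,   -- assumed key column
    name        TEXT NOT NULL,         -- assumed column
    email       TEXT NOT NULL UNIQUE   -- assumed column
);
CREATE TABLE email (
    email_id    INTEGER PRIMARY KEY,   -- assumed surrogate key
    sender_id   INTEGER NOT NULL REFERENCES users(user_id),
    subject     TEXT,                  -- assumed column
    body        TEXT,                  -- assumed column
    timestamp   TEXT NOT NULL,
    read_status INTEGER NOT NULL DEFAULT 0   -- Boolean stored as 0 or 1
);
CREATE TABLE message (
    message_id  INTEGER PRIMARY KEY,   -- assumed surrogate key
    sender_id   INTEGER NOT NULL REFERENCES users(user_id),
    receiver_id INTEGER NOT NULL REFERENCES users(user_id),
    content     TEXT,                  -- assumed column
    timestamp   TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Create the schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def store_message(conn, sender_id, receiver_id, content, timestamp):
    """Insert one message using a parameterized query."""
    conn.execute(
        "INSERT INTO message (sender_id, receiver_id, content, timestamp) "
        "VALUES (?, ?, ?, ?)",
        (sender_id, receiver_id, content, timestamp),
    )
    conn.commit()


def messages_for(conn, receiver_id):
    """Return (sender name, content) pairs for one receiver, oldest first."""
    return conn.execute(
        "SELECT u.name, m.content FROM message AS m "
        "JOIN users AS u ON u.user_id = m.sender_id "
        "WHERE m.receiver_id = ? ORDER BY m.timestamp",
        (receiver_id,),
    ).fetchall()
```

Parameterized queries are used throughout because message content originates from dictated speech and should never be concatenated into SQL text.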
These tables establish the foundational structure for the system's data management, allowing for
The UML design complements the database design by providing a visual representation of the
system's classes and their relationships. Two key diagrams are the Class Diagram and the Use Case
Diagram:
Class Diagram Model: The Class Diagram represents the system's classes and their associations. In
the context of the Voice-Based Email System, relevant classes may include:
Associations in the Class Diagram indicate how classes are related. For example, the User class may
have associations with Email and Message classes to represent user interactions with these entities.
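The associations between the User, Email, and Message classes can be sketched with Python dataclasses. This is only an illustration of the relationships and does not reproduce the project's C# classes. Attributes marked "assumed" and both method names are hypothetical; the remaining attributes mirror the database fields.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Email:
    sender_id: int
    timestamp: datetime
    read_status: bool = False
    subject: str = ""  # assumed attribute
    body: str = ""     # assumed attribute


@dataclass
class Message:
    sender_id: int
    receiver_id: int
    timestamp: datetime
    content: str = ""  # assumed attribute


@dataclass
class User:
    user_id: int
    name: str  # assumed attribute
    # One-to-many associations: a user holds many emails and many messages.
    emails: List[Email] = field(default_factory=list)
    messages: List[Message] = field(default_factory=list)

    def send_message(self, receiver: "User", content: str) -> Message:
        """Create a message and link it to both the sender and the receiver."""
        message = Message(self.user_id, receiver.user_id, datetime.now(), content)
        self.messages.append(message)
        receiver.messages.append(message)
        return message

    def unread_emails(self) -> List[Email]:
        """Return the emails whose read status is still False."""
        return [e for e in self.emails if not e.read_status]
```

The two list attributes on `User` correspond to the association lines that a class diagram would draw from User to Email and from User to Message.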
Use Case Diagram Model: The Use Case Diagram models the system's functionality from a user's
Read Email: Illustrating how users access and read their emails.
Send Message: Depicting how users send messages to other users.
Actors in the Use Case Diagram represent the system's users, including visually impaired users and
administrators. The diagram outlines how these actors interact with the system to achieve specific
tasks.
CHAPTER FOUR
SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION
Chapter Four of the Voice-Based Email System project focuses on the practical
implementation of the system and its subsequent performance evaluation. This chapter is critical as it
transforms the theoretical design and concepts into a functioning, real-world application, followed by
an assessment of how well the system meets its objectives. Both aspects are discussed in detail below.
The implementation phase involves translating the system's design and architecture into
Software Development: Developers write code based on the design specifications. In this case, C# is
used as the programming language and SQLite as the database engine to create the system's core
functionalities. Modules for text-to-speech (TTS), speech-to-text (STT), email
Database Setup: The database, consisting of tables for users, emails, and messages, is created and
configured according to the database design. SQL queries are used to manage data, including user
Integration of TTS and STT: The text-to-speech and speech-to-text modules are integrated into the
system. APIs or libraries for these technologies are used to convert email content and user commands
accurately.
User Interface Development: The user interface is developed with a focus on accessibility and user
experience. Front-end technologies such as the DevExpress and Telerik Windows Forms UI libraries and C#
Figure 4.1: Splash screen
The splash screen of the application serves as the initial introduction to the Voice-Based
Email System for Visually Impaired Users. It is designed for visually impaired users: it plays an
auditory welcome message and guides users on how to initiate voice interaction. It acts as a
reassuring entry point, indicating that the application is ready to assist users in managing their
emails through voice commands, and sets the tone for an accessible and user-centric email
experience.
Figure 4.2: Login Page
Figure 4.2 displays the login interface of the application, which serves as the gateway for users to
access their accounts. The form prominently features fields for entering an email address and
password, allowing registered users to securely log in. Additionally, a "Register" link is thoughtfully
included on the login page, offering an accessible and convenient pathway for new users to create
their accounts. This user-friendly design prioritizes both security and accessibility, enhancing the
Figure 4.3: Voice-Based Register Page
Figure 4.3 depicts the user registration form within the application. This form, accessible via a
voice-based library, captures essential user information, including their name, email address,
password, and a confirmation of the password. It offers an inclusive approach to user registration,
enabling visually impaired users to input their data using voice commands. Additionally, the presence
of a "Login" link on the registration page provides a seamless transition for users who have already
registered, enhancing the overall user experience by simplifying navigation between registration and
login processes.
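The checks that a registration form of this kind might perform can be sketched as follows. The example is in Python for illustration and does not reproduce the project's C# code. The minimum password length, the simplified email pattern, the use of PBKDF2 for password storage, and all function names are assumptions rather than details taken from the implemented system.

```python
import hashlib
import hmac
import os
import re

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MIN_PASSWORD_LENGTH = 8  # assumed policy


def validate_registration(name, email, password, confirm_password):
    """Return a list of problems that could be read back to the user via TTS."""
    problems = []
    if not name.strip():
        problems.append("Name is required.")
    if not EMAIL_PATTERN.match(email):
        problems.append("Email address is not valid.")
    if len(password) < MIN_PASSWORD_LENGTH:
        problems.append("Password is too short.")
    if password != confirm_password:
        problems.append("Passwords do not match.")
    return problems


def hash_password(password, salt=None):
    """Derive a salted hash so the plain password is never stored."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Check a login attempt against the stored salt and digest."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Returning the problems as plain sentences keeps the validation result directly usable as spoken feedback.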
Figure 4.4: Create and Read Message Page
Figure 4.4 illustrates the "Create and Read Message" form of the application, which features a tab
control with two tabs. The first tab allows users to compose and send messages, providing an interface for message
creation. The second tab serves as the inbox or repository for incoming messages, enabling users to
accessibility, as it converts the text content of messages into audible speech, ensuring that visually
impaired users can seamlessly access and engage with their messages through natural voice
interaction.
Voice Command Integration: Voice command recognition and processing functionalities are
implemented, allowing users to interact with the system using natural language commands.
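The mapping from recognized speech to an application action can be sketched as follows. This is a minimal illustration and not the project's actual code: it assumes a speech recognizer has already returned the spoken phrase as text, and the command names, keyword phrases, and the `VoiceCommandParser` class are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical command set; the report does not list the actual commands.
public enum MailCommand { Unknown, Compose, ReadInbox, Send, Logout }

public static class VoiceCommandParser
{
    // Keyword phrases checked in order. All phrases are illustrative assumptions.
    private static readonly List<KeyValuePair<string, MailCommand>> Phrases =
        new List<KeyValuePair<string, MailCommand>>
        {
            new KeyValuePair<string, MailCommand>("compose", MailCommand.Compose),
            new KeyValuePair<string, MailCommand>("new message", MailCommand.Compose),
            new KeyValuePair<string, MailCommand>("read", MailCommand.ReadInbox),
            new KeyValuePair<string, MailCommand>("inbox", MailCommand.ReadInbox),
            new KeyValuePair<string, MailCommand>("send", MailCommand.Send),
            new KeyValuePair<string, MailCommand>("log out", MailCommand.Logout),
            new KeyValuePair<string, MailCommand>("logout", MailCommand.Logout),
        };

    // Maps the text returned by a speech recognizer to a command.
    public static MailCommand Parse(string recognizedText)
    {
        if (string.IsNullOrWhiteSpace(recognizedText))
            return MailCommand.Unknown;

        string text = recognizedText.Trim().ToLowerInvariant();
        foreach (KeyValuePair<string, MailCommand> entry in Phrases)
        {
            // Simple substring match; the first matching phrase wins.
            if (text.Contains(entry.Key))
                return entry.Value;
        }
        return MailCommand.Unknown;
    }

    public static void Main()
    {
        Console.WriteLine(Parse("Please read my inbox"));   // ReadInbox
        Console.WriteLine(Parse("Compose a new message"));  // Compose
        Console.WriteLine(Parse("What is the weather"));    // Unknown
    }
}
```

Substring matching keeps the sketch short, but it will also match keywords inside longer words (for example "read" in "already"), so a real system would likely match on whole words or use a recognizer grammar. Unrecognized input should prompt the user by voice to repeat the command.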
Testing and Debugging: Rigorous testing is conducted to identify and rectify software bugs, errors,
and compatibility issues. Testing includes unit testing, integration testing, and user acceptance testing.
User Training: User training materials, including tutorials and guides, are created to assist visually
impaired users.
4.2 System Requirements: Hardware and Software
Ensuring that the hardware and software components of the Voice-Based Email System for Visually
Impaired Users meet the project's requirements is crucial for its functionality and accessibility.
i. Storage: Adequate storage space is essential to store user data, emails, and system logs.
ii. Redundancy: Implement redundancy measures such as RAID configurations and regular backups.
iii. Internet Connection: A stable and high-speed internet connection is necessary for sending and
receiving email.
iv. Load Balancer: For scalability and fault tolerance, consider load balancers to distribute
incoming traffic.
v. User Devices: Visually impaired users may access the system on various devices, including
computers. Ensure that the system is responsive and accessible on different screen sizes and
devices.
vi. For local processing of speech recognition and synthesis, users' devices may require
compatible hardware, such as microphones for input and speakers or headphones for output.
Operating System: The server should run a stable and secure operating system. Common choices
include Linux distributions (e.g., Ubuntu Server, CentOS) or Windows Server.
Web Server: Use a web server, such as Apache, Nginx, or Microsoft IIS, to serve web pages.
Database: Use a database management system (DBMS) like SQLite to store and manage user data,
emails, and messages.
Programming Languages:
Server-Side: C# is commonly used for server-side scripting in web applications.
Integrate TTS and STT libraries or APIs compatible with your chosen programming languages.
Email Protocols: The system should support standard email protocols, such as IMAP or POP3 for
email retrieval and SMTP for sending.
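As an illustration of the sending side, the sketch below builds a message and hands it to an SMTP server using the `System.Net.Mail` classes that ship with .NET. It is not taken from the project: the `SmtpSendSketch` class, the host name `smtp.example.com`, port 587, and the addresses and credentials are all placeholder assumptions. Microsoft's documentation also suggests considering third-party libraries such as MailKit for new development.

```csharp
using System;
using System.Net;
using System.Net.Mail;

public static class SmtpSendSketch
{
    // Builds the message object; all addresses used by callers here are placeholders.
    public static MailMessage BuildMessage(string from, string to, string subject, string body)
    {
        return new MailMessage(from, to, subject, body);
    }

    // Sends the message over SMTP with TLS enabled.
    // Host, port, and credentials are assumptions supplied by the caller.
    public static void Send(MailMessage message, string host, int port, string user, string password)
    {
        using (SmtpClient client = new SmtpClient(host, port))
        {
            client.EnableSsl = true;
            client.Credentials = new NetworkCredential(user, password);
            client.Send(message);
        }
    }

    public static void Main()
    {
        using (MailMessage message = BuildMessage(
            "sender@example.com", "receiver@example.com", "Hello", "Dictated body text"))
        {
            Console.WriteLine(message.Subject);
            // Uncomment with real server details to actually send:
            // Send(message, "smtp.example.com", 587, "sender@example.com", "app-password");
        }
    }
}
```

In a voice-driven flow, the subject and body strings would come from the speech-to-text step, and the result of the send would be announced to the user through text-to-speech.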
Utilize integrated development environments (IDEs), version control systems (e.g., Git), and testing
tools.
Voice Command Recognition Library: If applicable, integrate voice command recognition libraries.
The performance evaluation phase assesses the system's functionality and efficiency.
i. Usability Testing: A usability study is conducted with visually impaired users to evaluate the
system's accessibility, ease of use, and user satisfaction. User feedback is collected.
ii. Functional Testing: Functional tests are performed to verify that the system's core
functionalities, such as email retrieval, composition, and voice command recognition, are
working as intended.
iii. Load Testing: Load testing assesses how well the system performs under different levels of
user load. It ensures that the system remains responsive even during peak usage times.
iv. Security Assessment: Security testing is crucial to identify vulnerabilities and potential
threats. This includes assessing user authentication mechanisms and data encryption.
v. Scalability: The system is evaluated under increasing loads and data volumes to ensure that it
can handle future growth.
vi. Accuracy of TTS and STT: The accuracy and naturalness of the text-to-speech and
speech-to-text modules are assessed through various test cases.
vii. Performance Optimization: Based on the evaluation results, performance optimizations are
implemented to enhance system responsiveness, reduce latency, and improve overall user
experience.
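The speech-to-text accuracy mentioned in item vi is commonly quantified as the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The sketch below is an illustration only and is not part of the project's code; the `WordErrorRate` class name and the sample sentences are hypothetical.

```csharp
using System;
using System.Globalization;

public static class WordErrorRate
{
    // WER = (substitutions + deletions + insertions) / number of reference words,
    // computed with a word-level edit distance (Levenshtein) table.
    public static double Compute(string reference, string hypothesis)
    {
        string[] r = SplitWords(reference);
        string[] h = SplitWords(hypothesis);
        if (r.Length == 0)
            throw new ArgumentException("Reference transcript must contain at least one word.");

        int[,] d = new int[r.Length + 1, h.Length + 1];
        for (int i = 0; i <= r.Length; i++) d[i, 0] = i; // delete all reference words
        for (int j = 0; j <= h.Length; j++) d[0, j] = j; // insert all hypothesis words

        for (int i = 1; i <= r.Length; i++)
        {
            for (int j = 1; j <= h.Length; j++)
            {
                int cost = r[i - 1] == h[j - 1] ? 0 : 1;
                int deletion = d[i - 1, j] + 1;
                int insertion = d[i, j - 1] + 1;
                int substitution = d[i - 1, j - 1] + cost;
                d[i, j] = Math.Min(Math.Min(deletion, insertion), substitution);
            }
        }
        return (double)d[r.Length, h.Length] / r.Length;
    }

    // Lower-cases the text and splits it on whitespace.
    private static string[] SplitWords(string text)
    {
        return (text ?? string.Empty).ToLowerInvariant()
            .Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // One substituted word ("this" recognized as "the") out of five reference words.
        double wer = Compute("send this email to john", "send the email to john");
        Console.WriteLine(wer.ToString("0.00", CultureInfo.InvariantCulture)); // 0.20
    }
}
```

A lower WER means more accurate recognition. Each test case would pair a scripted sentence read by a user with the text the recognizer returned.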
CHAPTER FIVE
CONCLUSION AND RECOMMENDATIONS
5.1 Conclusion
The development and implementation of the Voice-Based Email System for Visually Impaired
individuals, with C# as the primary programming language, marks a significant
stride toward enhancing accessibility and inclusivity in the digital world. This project aimed to
address a pressing issue faced by visually impaired individuals, namely their limited access to email.
First and foremost, we designed and implemented a system architecture
that integrates Text-to-Speech (TTS) and Speech-to-Text (STT) technologies, allowing
users to interact with their email messages through natural language commands and auditory
feedback. The system's user interface was designed around the specific needs of visually
impaired users, offering intuitive navigation and accessible features that facilitate email composition.
Moreover, this project underwent testing and evaluation, both in controlled
environments and with visually impaired users. The feedback received was overwhelmingly
positive, demonstrating the system's efficacy and usability. The performance metrics and evaluation
criteria revealed that the Voice-Based Email System not only met but often exceeded expectations.
This project's significance extends beyond its immediate functionality. It contributes to the
broader field of assistive technology and human-computer interaction, highlighting the potential of
voice-controlled systems to bridge the digital divide for individuals with visual impairments. The
system was developed in line with accessibility standards and guidelines.
Looking ahead, there is substantial room for future enhancements and refinements. We
envision further improvements in natural language processing, user customization options, and
integration with additional email platforms. Additionally, collaboration with accessibility experts and
advocacy groups can help tailor the system to a wider range of visually impaired users.
In conclusion, the Voice-Based Email System for Visually Impaired individuals, implemented
using C#, not only fulfils its intended purpose of enabling accessible email communication but also
represents a significant step forward in making the digital world more inclusive and equitable. This
project underscores the transformative potential of technology in enhancing the lives of individuals
with disabilities and highlights the importance of continued innovation in the realm of assistive
technology.
5.2 Recommendations
i. Continuous User Feedback and Improvement: To ensure the system remains effective and
user-friendly, it is crucial to establish a feedback loop with visually impaired users. Regularly
solicit their input, listen to their suggestions, and incorporate their feedback into system
updates. This ongoing engagement will help keep the system aligned with their evolving needs.
ii. Integration with Multiple Email Services: While the project focused on a specific email
platform, consider expanding compatibility to a wider range of email services. This would
increase the system's utility and make it accessible to a broader user base.
iii. Enhanced Natural Language Processing: Invest in advanced natural language processing
to improve the voice interface. This includes refining the system's ability to understand and
interpret user commands, as well as enhancing the quality of the synthesized speech for better user
engagement.
iv. Customization and User Profiles: Develop features that allow users to customize their
experience according to individual preferences. This may include voice recognition profiles,
personalized command shortcuts, and the ability to configure the system's behaviour.
v. Cross-Platform Compatibility: Extend support to additional operating
systems and devices, such as smartphones, tablets, and smart speakers. This ensures that
visually impaired users can access their email conveniently on various devices.
vi. Security and Privacy Measures: Implement robust security and privacy measures to
safeguard user data and communications.
vii. Collaboration with Accessibility Experts: Engage with accessibility experts and
advocacy groups. Such collaboration can help ensure that the system remains compliant with
evolving accessibility standards and guidelines, further enhancing its usability and acceptance
within the visually impaired community.
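One concrete measure under recommendation vi is to store a salted hash of each password rather than the password itself, since the login query listed in the appendix compares the password column directly. The sketch below shows one possible approach using PBKDF2 from `System.Security.Cryptography`. It is an illustration under stated assumptions, not the project's code: it assumes .NET 6 or later (older frameworks would need the instance-based `Rfc2898DeriveBytes` API instead), and the `PasswordHasher` class, the iteration count, and the storage format are hypothetical choices.

```csharp
using System;
using System.Security.Cryptography;

public static class PasswordHasher
{
    private const int SaltSize = 16;        // bytes of random salt per password
    private const int HashSize = 32;        // bytes of derived key
    private const int Iterations = 100000;  // illustrative work factor (assumption)

    // Returns "iterations.salt.hash", with salt and hash Base64-encoded.
    public static string Hash(string password)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        byte[] hash = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, HashAlgorithmName.SHA256, HashSize);
        return Iterations + "." + Convert.ToBase64String(salt) + "." + Convert.ToBase64String(hash);
    }

    // Recomputes the hash with the stored salt and compares in constant time.
    public static bool Verify(string password, string stored)
    {
        string[] parts = stored.Split('.');
        int iterations;
        if (parts.Length != 3 || !int.TryParse(parts[0], out iterations))
            return false;

        byte[] salt = Convert.FromBase64String(parts[1]);
        byte[] expected = Convert.FromBase64String(parts[2]);
        byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, iterations, HashAlgorithmName.SHA256, expected.Length);
        return CryptographicOperations.FixedTimeEquals(actual, expected);
    }

    public static void Main()
    {
        string stored = Hash("correct horse");
        Console.WriteLine(Verify("correct horse", stored)); // True
        Console.WriteLine(Verify("wrong guess", stored));   // False
    }
}
```

With this approach, registration would store the output of `Hash` in the password column, and login would fetch the stored value by email address and call `Verify` instead of comparing passwords inside the SQL query.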
58
59
REFERENCES
Be My Eyes. (n.d.). How It Works. https://www.bemyeyes.com/how-it-works
Boucher, L. H., Arnold, M., Kumar, A., & Tsai, C. S. (2020). Illuminating History: Transcribing and
Indexing Spoken Content. IEEE MultiMedia, 27(4), 14-23.
Garcia, M., LaLone, N., & Williams, C. B. (2020). Beyond Smart Speakers: Voice Assistants for
People with Disabilities. In Proceedings of the 2020 CHI Conference on Human Factors in
Computing Systems (pp. 1-13).
Ghahremani, P., Rao, K., Jha, A. K., Peddinti, V., Povey, D., & Khudanpur, S. (2017). A factorized
language model for unsupervised word discovery. Proceedings of the IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, 5675-5679.
Johnson, L., Anderson, L., Mattingly, S., & Thompson, K. (2019). Email Accessibility for Individuals
Who Are Blind or Visually Impaired. International Journal of Information, Communication
Technology and Applications, 10(3), 48-54.
Leonard, K. E., & D'Arrigo, R. (2020). Screen Reader Awareness in the Undergraduate Population: A
Pilot Study. Journal of Visual Impairment & Blindness, 114(4), 363-370.
Leung, R., Li, S. K., & Chu, C. C. (2019). Braille Display Evaluation and the Possibility of Tactile
Internet for Information Access. Universal Access in the Information Society, 18(2), 337-348.
Lopes, C., Malheiro, R., & Santos, R. (2021). Captioning spoken content in educational videos with
an Automatic Speech Recognition system: A case study. Computers & Education, 164, 104154.
Nygren, E., Händel, P., & Allwood, C. M. (2020). Voice Assistants: Challenges and Suggestions for
Accessibility for People With Visual Disabilities. International Journal of Human–Computer
Interaction, 36(3), 213-223.
Nygren, E., Händel, P., & Allwood, C. M. (2020). Voice Assistants: Challenges and Suggestions for
Accessibility for People With Visual Disabilities. International Journal of Human–Computer
Interaction, 36(3), 213-223.
Paine, J., O'Donovan, R., & Williams, A. (2020). Real-time automatic speech recognition for deaf and
hard of hearing people. Proceedings of the 22nd International ACM SIGACCESS Conference
on Computers and Accessibility (ASSETS), 2020, 58-71.
Pielot, M., Holz, C., & Dingler, T. (2015). Ambient Light and Seated Work Performance at a Large
Display. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and
Ubiquitous Computing (UbiComp) (pp. 691-702).
Ramalho, G., Marques, F., & Madeira, R. (2019). Towards a universal design approach for inclusive
game-based learning systems. Universal Access in the Information Society, 18(3), 531-544.
Scherer, M. J., Hart, T., Minkel, J., & Feuerstein, M. (2018). Assistive Technologies and Other
Supports for People With Brain Impairment. In M. J. Scherer (Ed.), Theories, Models, and
Concepts in Human-Automation Interaction (pp. 299-322).
Shrestha, A., & Zaman, H. B. (2017). Read My Mail: An Audio Feedback Mobile Application for
Visually Impaired People. In Proceedings of the 2017 International Conference on Inventive
Communication and Computational Technologies (pp. 1849-1853).
Smith, D., Bilmes, J., & Goldstein, S. (2020). Assistive Technology Design and Development for Deaf
and Hard of Hearing Users: An HCI Perspective. ACM Transactions on Accessible Computing
(TACCESS), 11(3), 1-27.
60
Stolcke, A., Audhkhasi, K., Bastan, M., Burget, L., Chen, G., Evermann, G., ... & Watanabe, S.
(2018). Recent developments in the RASR open-source speech recognition toolkit.
Proceedings of the IEEE, 106(5), 797-814.
Thakur, A., & Shinde, G. R. (2019). Voice Command-Based Email System for Visually Impaired
People. International Journal of Scientific & Technology Research, 8(9), 1057-1060.
Thompson, M., Vardell, E., & Jovanovic, J. (2018). Advances in Accessibility Technology: What the
Internet Means for the Visually Impaired. Journal of Visual Impairment & Blindness, 112(4),
442-447.
VocalEyes. (n.d.). About VocalEyes. https://www.vocaleyes.ai/about
Voice Dream. (n.d.). Voice Dream Mail - FAQ. https://www.voicedream.com/mail-faq/
Wang, H., Tang, S., Zhang, W., & Tan, Y. H. (2018). An Analysis of Voice User Interface Usage:
Insights from Large-Scale Field Deployments. In Proceedings of the 2018 CHI Conference on
Human Factors in Computing Systems (pp. 1-12).
WHO. (2020). World Report on Vision. World Health Organization.
61
APPENDIX
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Text;
using System.Windows.Forms;
using Telerik.WinControls;

namespace Voice_based_email_system
{
    // Registration form for new users.
    public partial class frmRegister : Telerik.WinControls.UI.RadForm
    {
        // Helper that wraps access to the SQLite database.
        DatabaseHelper dbHelper;
        // Database file name and the directory the application runs from.
        string databaseFileName = "voicemail.db";
        string appDirectory = AppDomain.CurrentDomain.BaseDirectory;

        public frmRegister()
        {
            InitializeComponent();
        }
                        MessageBox.Show("User registration was successful.");
                        clearBoxes();
                    }
                }
            }
            catch (Exception ex)
            {
                // Report any registration error to the user.
                MessageBox.Show(ex.Message);
            }
            finally
            {
                // Always release the database connection.
                dbHelper.CloseConnection();
            }
        }

        // Resets the registration fields.
        private void clearBoxes()
        {
            radName.Clear();
            radPassword.Clear();
            radEmail.Clear();
            radConfirmPassword.Clear();
            radPassword.Focus();
        }

                if (result == DialogResult.No)
                {
                    // If the user clicked "No," cancel the form closing event
                    e.Cancel = true;
                }
                // If the user clicked "Yes," the form will close.
            }
        }
    }
}
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SQLite;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Voice_based_email_system
{
    // Wraps the SQLite connection and the queries used by the forms.
    public class DatabaseHelper
    {
        public SQLiteConnection connection;
        private string connectionString;
            // string query = $"INSERT INTO {tableName} ({columns}) VALUES ({values})";
            try
            {
                OpenConnection();
                if (result != null)
                {
                    userId = Convert.ToInt32(result);
                }
            }
            finally
            {
                CloseConnection();
            }
            return userId;
        }
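                // Use named parameters instead of string concatenation to prevent SQL injection.
                string query = "SELECT COUNT(*) FROM users WHERE email = @email AND password = @password";
                using (SQLiteCommand command = new SQLiteCommand(query, connection))
                {
                    command.Parameters.AddWithValue("@email", email);
                    command.Parameters.AddWithValue("@password", password);
                    // COUNT(*) always returns exactly one row holding the number of matches.
                    int count = Convert.ToInt32(command.ExecuteScalar());
                    return count > 0;
                }
            }
            catch (Exception)
            {
                // Treat any database error as a failed login.
                return false;
            }
            finally
            {
                CloseConnection();
            }
        }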
            try
            {
                OpenConnection();
                while (reader.Read())
                {
                    // MailData holds the fields of a single message.
                    MailData mail = new MailData
                    {
                        Sender = reader["sender_email"].ToString(),
                        //ReceiverId = Convert.ToInt32(reader["receiver_id"]),
                        Subject = reader["subject"].ToString(),
                        Body = reader["body"].ToString()
                    };
                    mailList.Add(mail);
                }
            }
            finally
            {
                CloseConnection();
            }
            return mailList;
        }
    }
}