Automated Scrolling Using Speech Recognition
Prof. Manikandan K¹, Apurva Singh², Sakshi Agarwal², Ankita Singh²
¹Associate Professor, SCOPE, VIT University, Vellore
²Student, SCOPE, VIT University, Vellore
*Corresponding Author E-mail: ankita.singh1110@outlook.com, shaks018@gmail.com, apurva.singh_2013@vit.ac.in, kmanikandan@vit.ac.in
ABSTRACT:
Automated scrolling using speech recognition is a technique that lets users scroll through a document with minimal effort. Speech recognition converts the user's spoken input to text, and that text is then processed to drive the scrolling. The Google speech recognition API and the Python speech recognition library are used to extract the user's speech input. A string matching algorithm tracks the position of the transcribed speech within the document being scrolled. The paper discusses each module of the automated scrolling system.
KEYWORDS: Google Speech-to-Text API, Python Speech Recognition Library, Knuth-Morris-Pratt String Matching Algorithm, Inter-Process Communication, Scroll Event.
1. INTRODUCTION:
Emerging technologies share a common goal: making human lives simpler. Scrolling is a basic operation performed in web browsers and in many types of documents. Even this simple operation can become tiresome and can break the reader's concentration. Public speakers, orators, presenters and lecturers who read from a document while addressing an audience must interrupt their flow to scroll after every page. Automated Scrolling Using Speech Recognition is designed to improve the existing scrolling mechanism by making it hands-free. It combines speech recognition, inter-process communication, string matching and event handling to provide a user-friendly scrolling mechanism. The problem described above is removed by recognizing the user's speech and scrolling the document as soon as an efficient string matching algorithm detects that the end of the page has been reached. This lets the user concentrate on the speech or lecture without worrying about scrolling, and so deliver it more effectively to the audience. It also reduces human effort by performing scrolling automatically. The proposed system uses the Google speech recognition API and the Python speech recognition library to convert the user's speech input to text. The Knuth-Morris-Pratt (KMP) string matching algorithm compares the text obtained from the user's input with the document selected by the user. A scroll event is triggered as soon as the user's speech reaches the end of the page.
2. PROJECT MODULES:
Automated scrolling using speech recognition performs scrolling without any effort from the user. Some modules use processes that store the data from the user's speech input. Another process keeps track of the document that the user has selected for automated scrolling. The project is divided into five major modules, each discussed in detail in this section.
2.1 Speech to Text:
Many speech-to-text applications are available. This project implements speech to text with the Google speech recognition API, and the code is written in Python using the speech recognition library, which provides a responsive and user-friendly environment. Speech to text is the first module of the system. It takes the user's speech input through the Google speech-to-text API, processed by the Python speech recognition library, and converts it into the corresponding text. The text obtained from the user's input is stored in a process called process 1. The document selected by the user is stored in another process, called process 2, which is used by the inter-process communication module described later in this section. The tools used in the speech-to-text module are discussed below:
2.1.1 Google speech to text API:
The Google speech-to-text API converts audio to text by applying neural network models through an easy-to-use API. It is based on the same neural network technology that powers voice search in the Google app and voice typing in Google's keyboard. The API recognizes over 80 languages, but this project uses only English. The user's speech is converted to text in real time: the API streams results and returns recognized text while the user is still speaking. The API also applies noise control, including noise cancellation and advanced signal processing. The speech input is captured through a microphone connected to the system on which the document to be scrolled is open.
2.1.2 Python speech recognition library:
A dedicated Python library is available for speech recognition. It supports several engines and APIs, both online and offline. The model uses this library to capture speech input and relies on its support for the Google Speech Recognition API to convert that input into text.
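The speech-to-text step can be illustrated with a minimal sketch. It assumes the third-party SpeechRecognition package (imported as `speech_recognition`), with PyAudio installed for microphone access. `capture_phrase` and `normalize` are hypothetical helpers, not the project's published code. `normalize` lower-cases the transcript and strips punctuation so that the text can later be compared with the document.

```python
import re


def normalize(text):
    """Lower-case a transcript, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(cleaned.split())


def capture_phrase(language="en-US", timeout=None):
    """Record one phrase from the default microphone and return normalized text.

    Returns None when the recognizer cannot make sense of the audio.
    Assumes: pip install SpeechRecognition pyaudio (imported lazily so that
    normalize() works without them).
    """
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly to calibrate the energy threshold.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, timeout=timeout)
    try:
        # Sends the audio to Google's speech recognition service (needs network access).
        return normalize(recognizer.recognize_google(audio, language=language))
    except sr.UnknownValueError:
        return None  # speech was unintelligible
    except sr.RequestError as err:
        raise RuntimeError(f"Google speech recognition request failed: {err}") from err
```

Calling `capture_phrase()` in a loop yields one normalized phrase per utterance. The library is imported inside the function so that the text-cleaning helper can be used and tested without audio hardware.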
2.2 Inter Process Communication:
The second module of the model is Inter-Process Communication (IPC). There are various ways of establishing IPC between processes, and this project uses sockets. Process 1, which stores the user's speech input, communicates with process 2, which stores the document selected by the user for scrolling. Sockets provide client-server communication between processes: process 1 acts as the client and process 2 as the server. A socket is first created on both the client and the server side, and the client then connects to the server. Both sides are described below:
2.2.1 Server side and client side:
Fig. 1. Inter-Process Communication using sockets
Fig. 1 shows IPC using sockets. Both the server and the client create a socket, depicted by the Socket() function. After creating its socket, the server binds it to an address, so that messages sent to that address are received on the socket. The Listen() function puts the server into the listening state. Accept() then accepts the first incoming request, which carries the speech input. After receiving the speech input, the server processes it according to the algorithm described later in this paper. On the client side, the client creates its socket, connects to the server through the Connect() function, and sends the user's speech input.
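The socket sequence in Fig. 1 can be sketched with Python's standard `socket` module. This is an illustrative sketch, not the authors' code. Both ends run in one process, with the server in a thread, so that the example is self-contained. The operating system chooses the port. Messages are assumed to be newline-terminated UTF-8 text. All function names are hypothetical.

```python
import socket
import threading

HOST = "127.0.0.1"  # assumption: client and server run on the same machine


def serve_one_client(server_sock, received):
    """Server side: Accept() one client, then read newline-terminated phrases until it disconnects."""
    conn, _addr = server_sock.accept()
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            received.append(line.rstrip("\n"))


def send_phrases(port, phrases):
    """Client side: Socket() + Connect(), then send each phrase on its own line."""
    with socket.create_connection((HOST, port)) as client:
        for phrase in phrases:
            client.sendall((phrase + "\n").encode("utf-8"))


def exchange(phrases):
    """Server Socket(), bind, Listen(); connect a client; return what the server received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((HOST, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []
    worker = threading.Thread(target=serve_one_client, args=(server, received))
    worker.start()
    send_phrases(port, phrases)
    worker.join()
    server.close()
    return received
```

Newline framing is a simplifying choice. TCP delivers a byte stream, so some delimiter is needed to tell where one recognized phrase ends. A phrase that itself contained a newline would have to be escaped first.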
2.3 Data Buffer:
A data buffer is a region of physical memory used to store data temporarily while it is being moved from one place to another. Its purpose is to hold data just before it is used. Buffers can be implemented at a fixed memory location in hardware, or as a virtual data buffer in software that points to a location in physical memory. This project uses the latter. The user speech input stored in process 1 is passed to process 3. Process 3 is a temporary process that is allocated memory to hold the data before it is used, and this is where the data buffer is needed. The data in process 3 is then compared with the document data in process 2 for string matching. The main reason for using a data buffer is the difference between the rate at which data is received from process 1 and the rate at which the data in process 2 is processed.
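One possible realization of the buffer role of process 3 is sketched below. It assumes that a bounded window of recent words is enough for matching. `TranscriptBuffer` and its `max_words` limit are hypothetical. A `collections.deque` created with `maxlen` discards the oldest words automatically, so a slow matcher never falls arbitrarily far behind a fast speaker.

```python
from collections import deque


class TranscriptBuffer:
    """Hypothetical stand-in for process 3: keeps only the most recent recognized words."""

    def __init__(self, max_words=50):
        # A deque with maxlen drops words from the left once the limit is reached.
        self._words = deque(maxlen=max_words)

    def push(self, phrase):
        """Append a newly recognized phrase, word by word."""
        self._words.extend(phrase.split())

    def text(self):
        """Return the buffered words as one space-separated string for matching."""
        return " ".join(self._words)

    def clear(self):
        """Empty the buffer, e.g. after the page has scrolled."""
        self._words.clear()
```

For example, with `max_words=4`, pushing `"one two three"` and then `"four five"` leaves `"two three four five"` in the buffer. If the receiver and the matcher run in separate threads, a `queue.Queue` placed in front of the buffer is the standard thread-safe way to pass phrases between them.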
2.4 String matching:
This is one of the most important modules of the project, because it determines the time complexity of the system. Of the available string matching algorithms, the project uses the Knuth-Morris-Pratt (KMP) algorithm. KMP runs in O(n + m) time in the worst case for a text of length n and a pattern of length m, which is better than the O(nm) worst case of the naïve and Rabin-Karp algorithms. After inter-process communication has been established over a socket, the data in process 1 (the user's speech input converted to text) is transferred to a temporary process, process 3. The user-selected document is stored in process 2. Each time the user speaks, the data in process 3 is updated and compared line by line with the data stored in process 2. The developers define a threshold value that sets the line of the page after which the document is scrolled automatically. For example, if a page contains 10 lines and the threshold is 8, the scroll event is triggered as soon as the KMP algorithm detects that the user is reading line 8.
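The threshold rule in the example above (trigger at line 8 of a 10-line page) can be sketched as follows. The sketch assumes that the speaker is considered to be on a line once the last few words of that line appear in the buffered transcript. It also assumes that punctuation has already been removed from both texts. `current_line`, `should_scroll` and `tail_words` are hypothetical names. Python's `in` substring test stands in for the KMP search described in Section 2.4.1.

```python
def line_tail(line, tail_words=3):
    """Return the last `tail_words` words of a line, lower-cased."""
    return " ".join(line.lower().split()[-tail_words:])


def current_line(transcript, page_lines, tail_words=3):
    """Return the 1-based number of the furthest line whose ending occurs in the transcript, or 0."""
    spoken = " ".join(transcript.lower().split())
    reached = 0
    for number, line in enumerate(page_lines, start=1):
        tail = line_tail(line, tail_words)
        if tail and tail in spoken:  # stand-in for a KMP search
            reached = number
    return reached


def should_scroll(transcript, page_lines, threshold, tail_words=3):
    """Trigger scrolling once the speaker has reached line `threshold` (e.g. 8 of 10)."""
    return current_line(transcript, page_lines, tail_words) >= threshold
```

Matching only the ending of a line keeps the trigger tied to the speaker finishing that line, rather than to a single common word that could appear anywhere on the page.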
2.4.1 Knuth-Morris-Pratt string matching algorithm:
The Knuth-Morris-Pratt algorithm is one of the most efficient string matching algorithms available. It avoids invalid shifts by precomputing a prefix function for the pattern. For every prefix of the pattern, this function gives the length of the longest proper prefix that is also a suffix. The text to be searched is stored in one array of characters and the pattern in another. The user speech input stored in process 1 is transferred to process 3, which stores the text temporarily. The text document stored in process 2 is then compared with the continuously updated text in process 3. As soon as the threshold is reached, the scroll event is triggered and the document scrolls automatically. KMP is preferred over the naïve and Rabin-Karp algorithms because its worst-case matching time is linear. It is preferred over finite-automaton matching because its preprocessing takes O(m) time, whereas the automaton's transition table needs at least O(m·|Σ|) for an alphabet Σ.
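A standard formulation of KMP is sketched below in Python for reference. The function names are illustrative and not taken from the project. `prefix_function` computes, for each position q of the pattern, the length of the longest proper prefix of `pattern[:q + 1]` that is also a suffix of it. `kmp_search` uses that table so that the index into the text never moves backwards.

```python
def prefix_function(pattern):
    """pi[q] = length of the longest proper prefix of pattern[:q + 1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0  # length of the current matched prefix
    for q in range(1, len(pattern)):
        # Fall back through shorter borders until the next character can extend one.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text, in O(n + m) time."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    matches = []
    q = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # reuse the longest border instead of re-scanning the text
        if pattern[q] == ch:
            q += 1
        if q == len(pattern):
            matches.append(i - len(pattern) + 1)
            q = pi[q - 1]  # continue, allowing overlapping matches
    return matches
```

In the scrolling setting, `text` would be the buffered transcript and `pattern` the ending of the threshold line. For example, `kmp_search("and that brings us to the end of the page", "end of the page")` returns `[26]`.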
2.5 Automatic scrolling:
Scrolling a document is an unwelcome task for presenters, orators, lecturers and anyone whose work involves addressing a large audience. Repeated manual scrolling disrupts the speaker's concentration and flow, and can leave the audience disengaged. Existing document viewers do not offer this kind of automatic scrolling. This module lets the user scroll the document using speech recognition alone. During KMP string matching, the transcribed speech from process 1 eventually reaches the threshold set by the developer. At that point a down-arrow key event is triggered through event handling. This event scrolls the document down without any effort from the user, providing a hands-free, user-friendly experience.
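The scroll event itself can be sketched as a synthetic down-arrow key press. This sketch assumes the third-party `pyautogui` package, whose `press()` function sends key events to the window that has focus. It also assumes the document viewer has keyboard focus. The `press` parameter is a hypothetical hook that lets the function run without a display. The number of presses per page is another assumption, because it depends on the viewer.

```python
def scroll_down(presses=1, press=None):
    """Send `presses` down-arrow key events to the window that currently has focus."""
    if press is None:
        # Assumption: pip install pyautogui; requires a graphical session.
        import pyautogui
        press = pyautogui.press
    for _ in range(presses):
        press("down")  # "down" is the down-arrow key name in pyautogui
```

Calling `scroll_down(5)` when `should_scroll` (or an equivalent check) first becomes true would advance the viewer by five arrow-key steps.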
3. IMPLEMENTATION:
Client program
1. Import the speech recognition and socket packages
2. Connect to the server socket
3. Obtain audio from the microphone
4. While not disconnected:
· Listen to the source
· Use the Google Speech Recognition API to recognize the audio
5. If the audio is recognized and the text is not "disconnect", send the text to the server
6. If the audio is not recognized, print "could not recognize"
7. End
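The client steps above can be turned into the following Python sketch. It assumes the SpeechRecognition package, a server listening on a hypothetical `localhost:5000`, newline-terminated messages, and that saying "disconnect" ends the session. Recognition is passed in as an iterable of phrases, so the loop can also be driven by pre-recorded text. None of these names come from the project's code.

```python
import socket

STOP_WORD = "disconnect"  # assumption: spoken keyword that ends the session


def microphone_phrases():
    """Yield recognized text for each utterance, or None when it could not be recognized."""
    import speech_recognition as sr  # assumption: pip install SpeechRecognition pyaudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # step 3: obtain audio from the microphone
        while True:
            audio = recognizer.listen(source)  # listen to the source
            try:
                yield recognizer.recognize_google(audio)  # Google Speech Recognition API
            except sr.UnknownValueError:
                yield None


def run_client(phrases, host="localhost", port=5000):
    """Send each recognized phrase to the server until the stop word is heard."""
    with socket.create_connection((host, port)) as sock:  # step 2: connect
        for text in phrases:  # step 4: loop until disconnected
            if text is None:
                print("Could not recognize the audio")  # step 6
                continue
            if text.strip().lower() == STOP_WORD:
                break
            sock.sendall((text + "\n").encode("utf-8"))  # step 5: send to the server
```

A live session would call `run_client(microphone_phrases())`. Network errors from the recognizer (`sr.RequestError`) are left to propagate in this sketch.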
Server program
1. Import the socket package
2. Create a socket
3. Bind the socket
4. Listen on the socket
5. Open the text file in read mode
6. Read the file line by line
7. Print a threshold number of lines and let the user read them
8. Accept data from the client
9. Use KMP to match the received data with the currently printed line
10. If they match, then
11. Print the next threshold number of lines
12. Continue until the end of the file is reached
13. Once the entire file has been read, display a message saying so
14. End
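A Python sketch of the server steps follows. To keep it self-contained and testable, the socket is abstracted away. `phrases` is any iterable of received text; for a real connection, this would be the lines of `conn.makefile("r")` after `accept()`. `out` is where pages are printed. The sketch assumes that a chunk is finished when a received phrase contains the last few words of the last printed line. Python's `in` operator stands in for the KMP search of step 9. `serve_document`, `threshold` and `tail_words` are hypothetical names.

```python
import sys


def _tail(line, tail_words):
    """Last `tail_words` words of a line, lower-cased (punctuation assumed already removed)."""
    return " ".join(line.lower().split()[-tail_words:])


def serve_document(lines, phrases, threshold=8, tail_words=3, out=sys.stdout):
    """Print `threshold` lines at a time, advancing when the reader reaches the last printed line.

    Returns True if the whole document was read, False if the phrases ran out first.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]  # step 6: read line by line
    phrase_iter = iter(phrases)
    start = 0
    while start < len(lines):
        chunk = lines[start:start + threshold]
        for line in chunk:  # steps 7 and 11: print the next chunk
            print(line, file=out)
        target = _tail(chunk[-1], tail_words)
        for phrase in phrase_iter:  # step 8: accept data from the client
            if target in " ".join(phrase.lower().split()):  # step 9: stand-in for KMP
                break
        else:
            return False  # client disconnected before finishing the chunk
        start += threshold  # step 12: continue until the end of the file
    print("The entire file has been read.", file=out)  # step 13
    return True
```

With a file on disk, `with open(path, encoding="utf-8") as f: serve_document(f, received_phrases)` follows steps 5 and 6. Blank lines are skipped so that every printed chunk ends on a line with words to match.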
Automatic scrolling using speech recognition provides a user-friendly approach to scrolling that is not currently available. The project is implemented on the command line, where all of its functionality, namely automatic scrolling and speech history, has been tested.
4. FUTURE ENHANCEMENTS:
Some modifications that could be made to the project are:
4.1 Application program:
A user-friendly application with an attractive user interface could be developed that also performs the proposed automatic scrolling.
4.2 Document format
The proposed project can scroll documents in formats such as text, doc or pdf. It can be extended to support other document types, which would further improve its usability.
4.3 Web browser extension
Automated scrolling is currently limited to documents and is not designed to work in web browsers. The project could be extended to web browsers, bringing automated scrolling to web applications.
4.4 Database
Since the project introduces automatic scrolling using speech recognition, a database could be added to keep a record of the files used and to store each user's speech history. Each user can be given a unique identification key. The main entities that could be added are as follows:
4.4.1 User information
The database may store information about each user. This includes a user ID (the unique identification key), the user's name, a speech ID that links to the user's speech history, and a file count giving the number of files used.
4.4.2 File information
The database could include information about each file used by a user at any time. This includes a file ID that uniquely identifies the file, the file type indicating its format (doc, txt or pdf), a description of the file, the date the file was last used and the date it was last modified.
4.4.3 Speech information
The database could store information about the speech history of every user. This includes user id, the topic of the speech, date of the speech, subject of the speech, event type and venue.
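The proposed entities could be expressed as a relational schema, sketched here with Python's built-in `sqlite3` module. Table and column names are hypothetical renderings of the fields listed above. Linking files and speeches to users through `user_id` is an assumption about how the entities relate. The sketch derives the per-user file count with a query rather than storing it, so that the count cannot drift out of date.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id   INTEGER PRIMARY KEY,          -- unique identification key
    name      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS files (
    file_id        INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL REFERENCES users(user_id),
    file_type      TEXT CHECK (file_type IN ('doc', 'txt', 'pdf')),
    description    TEXT,
    last_used      TEXT,                    -- ISO-8601 date
    last_modified  TEXT
);
CREATE TABLE IF NOT EXISTS speeches (
    speech_id   INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    topic       TEXT,
    speech_date TEXT,
    subject     TEXT,
    event_type  TEXT,
    venue       TEXT
);
"""


def open_database(path=":memory:"):
    """Create or open the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the user_id references
    conn.executescript(SCHEMA)
    return conn


def file_count(conn, user_id):
    """Derive the number of files a user has used instead of storing it."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM files WHERE user_id = ?", (user_id,)
    ).fetchone()
    return count
```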
5. CONCLUSION:
The project "Automated Scrolling Using Speech Recognition" is a step towards a more user-friendly and efficient scrolling mechanism that addresses drawbacks of existing scrolling techniques. It applies speech recognition, using the Google speech-to-text API and the Python speech recognition library, to automate scrolling. The Knuth-Morris-Pratt algorithm provides string matching with linear worst-case complexity, as the project requires. The project can serve as a prototype that can be extended with emerging technologies to make scrolling more user-friendly and interesting, as a step towards a smart society.
Received on 13.11.2016; Accepted on 09.03.2017. © EnggResearch.net. All Rights Reserved. Int. J. Tech. 2017; 7(1): 15-19. DOI: 10.5958/2231-3915.2017.00004.9