The purpose of speech, braille and sign is communication, and research in these areas deals primarily with the processing and representation of the speech signal and the recognition of typed braille or presented signs, leading to the development of voice-, braille- and sign-based interfaces for man-machine interaction. Such natural interfaces enable hands-free access to information for literate, illiterate and Divyangjan people alike. The thrust of our activity in the Speech and Multimodal Laboratory is the development of speech-, braille- and sign-based interfaces for man-machine interaction. The objective of the Speech and Multimodal Laboratory is to conduct goal-oriented basic research, addressing the fundamental issues involved in building robust speech-to-text systems, text-to-sign generation and text-to-braille generation applications.
ME Fellowship Student
Expertise in Speech Processing, Augmentation Techniques, Text Processing, Software Planning and Execution
Email: puneet.bawa@chitkara.edu.in
ME Fellowship Student
Expertise in Back-end Development, Automatic Speech Recognition
Email: taniya@chitkara.edu.in
Skills: Automation, Hardware Fabrication, Back-End Development
Skills: UI Development
Skills: Linguistics & Manual Testing
Placed (Stryker, Gurgaon)
Placed (LIDO, Noida)
Placed (Sears Holdings, Pune)
Placed (JTG, Gurgaon)
Placed (Gojek, Mumbai)
Placed (Nagarro, Gurgaon)
Placed (Privafy Technology, Bangalore)
CURIN AI Faculty Attended RPA Workshop at NITTTR Chandigarh, 22-26 July 2019
CURIN faculty members Dr Luxmi Dapra, Dr Nitin Goyal and Dr Virender Kadyan of the M Tech (AI) course attended the week-long training at NITTTR, Chandigarh on Robotic Process Automation, conducted by UiPath.
Dr. Virender Kadyan Conducted Speech Recognition Using Machine Learning Steam School Workshop, 18-22 Feb 2019
Dr Virender Kadyan, Assistant Professor-Research, and his team members Ms Sashi Bala, Mr Puneet Bawa and Mr Rishab, researchers of the Speech & Multimodal Laboratory, conducted Batch-II of the Steam School 2019 workshop on Speech Recognition using Machine Learning from 18-22 Feb 2019. Dr Kadyan gave an introductory session on machine learning approaches to the recognition of uttered words. Day 2 was designed so that students could perform hands-on sessions on speech-to-text recognition systems. On Day 3, students learnt about the formation of chatbots and tried to embed them with uttered speech signals. Finally, students built their own chatbots on Days 4 and 5 with sample text corpora in Hindi or English.
Dr Virender Kadyan Organised 10 Days Research Induced Training-V, 18 Dec to 29 Dec 2018
Dr Virender Kadyan, Assistant Professor, CURIN, organised the 10-day Research Induced Training-V from 18 Dec to 29 Dec 2018. The training comprised four modules for BE/BCA students of Chitkara University. The valedictory of the event was graced by the presence of Chief Guest Dr Archana Mantri, Pro-VC, and Dr S N Panda, Director Research, along with faculty members.
Dr Virender Kadyan, Assistant Professor, Conducted Day 2 of Introduction to Linux, Steam School 2018
Dr Virender Kadyan, Assistant Professor, CURIN, conducted Day 2 of Introduction to Linux at Steam School 2018. He taught the basics of shell scripts, and students performed hands-on practice on their Linux machines with scenario-based examples.
Summer School on Research Trends in Network Security, Machine and Deep Learning, Image and Multimedia Processing, June 2018
A 10-day summer school on Research Trends in Network Security, Machine and Deep Learning, Image and Multimedia Processing was successfully organised by Dr K R Ramkumar with the support of Dr Shefali, Dr Deepika Kaundal, Dr Virender Kadyan and Er Sarvesh, CURIN. Participants gained knowledge of interdisciplinary research domains and learned the art of writing a research paper. Around 52 research papers were generated as an outcome of the workshop.
Techno Sounds
The first and second rounds of Techno Sounds 2016 were held on 13 October 2016 and 10 November 2016 at the Speech and Multimodal Laboratory in association with the CSI Student Chapter at Chitkara University, Punjab. Er. Virender Kadyan, Assistant Professor, briefly introduced Techno Sounds 2016 and gave students the opportunity to showcase their innovative ideas. A total of 24 teams registered for the first round, out of which 9 teams were selected for the second round to present prototypes of their ideas. Mr. Vinay Kukreja, Assistant Professor, CSE department, and Mr. Jaswinder Singh, Assistant Professor, CA department, judged the event, which was successfully managed by Ms. Kanika and the event coordinators. The shortlisted teams then qualified for the third round, held on 24 January 2017, in which students showcased their complete products.
Summer School
Chitkara University Research and Innovation Network (CURIN) organised a Summer School from June 20, 2016 to July 2, 2016. The Summer School included 16 workshops on the most relevant topics of research and academics, ranging from Computer Science and Electronics and Communication to Mechanical Engineering. These workshops were specially designed to cater to the needs of state-of-the-art research and development in forefront areas. The main goal of the Summer School was to equip students, research scholars and academicians to start research and development in pioneering research areas.
ASR using Word Based Modeling, 18 March 2016
Dr. Amitoj Singh (Assistant Director-Research) and Er. Virender Kadyan (Assistant Professor-Research) organised a one-day workshop on Hindi 'ASR using Word Based Modeling' at the CU Himachal campus. The audience comprised students of the CSE/ECE departments. The workshop threw light on the practical feasibility of building an ASR system for one's own language. Dr Amitoj began his talk with an introduction to language processing and its importance in a researcher's life, focusing on how to select and process a particular language, and then explained the present state of ASR systems for Indian languages. In another session, Er Virender gave hands-on experience of recognising a particular word in the Hindi language using the HTK toolkit, and addressed the use of various acoustic modeling techniques to process a small-vocabulary Hindi ASR system.
Automatic Speech Recognition, 9-10 Oct 2015
Dr. Amitoj Singh, Assistant Director, Mr. Virender Kadyan, Assistant Professor, and Mr. Vinay Kukreja, Assistant Professor, CURIN, CU, Punjab conducted a two-day workshop on "Automatic Speech Recognition" under the ACM Student Chapter. Guest speakers enriched the workshop with their knowledge and skills and delivered practical sessions that helped students relate theory to implementation. Dr. Sumapreet Kaur from Punjabi University, Patiala gave an introduction to phonology and linguistics. Mrs. Rupinder Kaur from Thapar University, Patiala led a hands-on session on building an ASR system using HTK. Dr. Wiqas Ghai delivered a hands-on session on phone-based modelling using HTK. The workshop saw active participation of students as well as faculty members from within and outside the University.
Abstract: The success of any commercial Automatic Speech Recognition (ASR) system depends on the availability of training data, and performance degrades when a low-resource language corpus lacks sufficient signal-processing characteristics. Developing a Punjabi children's speech system is one such challenge, where zero-resource conditions prevail and children's speech differs from adult speech in speaking rate and vocal tract length. In this paper, efforts have been made to build a Punjabi children's ASR system under mismatched conditions using noise-robust front ends such as Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC). Acoustic and phonetic variations between adult and children's speech are handled using gender-based in-domain training data augmentation, and acoustic variability among speakers in the training and testing sets is then normalised using Vocal Tract Length Normalization (VTLN). We demonstrate that including pitch features with the test-normalised children's dataset significantly enhances system performance in both clean and noisy environments. The experimental results show a relative improvement of 30.94% using adult female speech pooled with limited children's speech over the adult male corpus under noise-based training data augmentation.
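The VTLN stage mentioned above warps the frequency axis by a speaker-dependent factor to compensate for vocal tract length differences. A minimal numpy sketch of the commonly used piecewise-linear warp (the function name, cutoff ratio and warp factors here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def vtln_warp(freqs, alpha, f_max=8000.0, cut_ratio=0.7):
    """Piecewise-linear VTLN warp: frequencies below a knee are scaled
    by 1/alpha; the rest are mapped linearly so that f_max stays fixed."""
    freqs = np.asarray(freqs, dtype=float)
    f_cut = cut_ratio * f_max * min(alpha, 1.0)   # knee of the warp
    lower = freqs / alpha                          # linear region
    upper = f_max - (f_max - f_cut / alpha) * (f_max - freqs) / (f_max - f_cut)
    return np.where(freqs <= f_cut, lower, upper)

# alpha > 1 compresses the spectrum (longer vocal tract), alpha < 1
# stretches it (shorter vocal tract, e.g. children); alpha = 1 is identity.
freqs = np.array([0.0, 1000.0, 4000.0, 8000.0])
warped = vtln_warp(freqs, 1.1)
```

The two branches meet at the knee frequency and pin the endpoints 0 and f_max, so the warp stays monotonic and invertible for warp factors near 1.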
Abstract: Processing of low-resource acoustic signals, both pre- and post-processing, always faces the challenge of data scarcity in the training module. It is difficult to obtain high system accuracy with a limited training corpus, which yields large discriminative feature vectors whose information is distorted by acoustic mismatch arising from the real environment and inter-speaker variation. In this paper, context-independent information of an input speech signal is pre-processed using bottleneck features, and in the modeling phase a Tandem-NN model is employed to enhance system accuracy. To address the shortage of training data, in-domain training augmentation is performed by fusing the original clean data with artificially created noisy training data; to further boost the training data, tempo modification of the input speech signal is performed while maintaining the spectral envelope and pitch of the corresponding audio. Experimental results show that relative improvements of 13.53% in clean and 32.43% in noisy conditions are achieved with the Tandem-NN system in comparison to the baseline system.
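The in-domain augmentation step above pools clean speech with artificially noised copies. A generic numpy sketch of creating such a noisy copy at a chosen signal-to-noise ratio (the function name and mixing recipe are a common convention assumed for illustration, not the paper's exact pipeline):

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the clean-to-noise power ratio equals the
    requested SNR in dB, then mix. Expects equal-length 1-D arrays."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_clean / (10.0 ** (snr_db / 10.0))
    return clean + noise * np.sqrt(target_p_noise / p_noise)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)  # 1 s tone
noisy = add_noise_at_snr(clean, rng.standard_normal(16000), snr_db=10.0)
```

Pooling several such copies at different SNRs with the original clean set is what the abstract calls fusing clean and modified noisy training data.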
Abstract: Despite the number of Automatic Speech Recognition (ASR) systems developed for different languages, no work has yet been done on children's speech in the Punjabi language. Because no children's speech corpus is available for Punjabi, collecting speech data is a challenging task. In our current work, efforts have been made to collect a Punjabi children's speech corpus and build a children's ASR system for the Indian regional language Punjabi. The recognition rate of ASR systems has been observed to improve drastically with the emergence of Deep Neural Networks (DNN). In our work, the DNN acoustic model has been implemented by varying the number of hidden layers. Approximately four hours of Punjabi children's speech has been collected and several experiments have been performed using the DNN modeling technique. Experimental results reveal that the system attains 87% accuracy.
Abstract: In this paper, a Punjabi children's speech recognition system is developed using Subspace Gaussian Mixture Model (SGMM) acoustic modeling. Initially the system relies on the Mel-frequency cepstral coefficient (MFCC) approach to control temporal variations in the input speech signals. SGMM is integrated with HMM to measure the efficiency of each state, which carries the information of a short windowed frame. To handle children's acoustic variation, speaker adaptive training (SAT) based on vocal-tract length normalization and feature-space maximum likelihood linear regression is adopted. Kaldi, an open-source speech recognition toolkit, is used to develop the robust Automatic Speech Recognition (ASR) system for Punjabi children's speech. SGMM accumulates the frame coefficients and their posterior probabilities and passes them to the HMM, which systematically fits the frame, the output resulting from the HMM states. SGMM therefore achieves a large performance margin in Punjabi children's speech recognition: a remarkable reduction in word error rate (WER) was observed using SGMM while varying the feature dimensions. The developed children's ASR system obtained a recognition accuracy of 83.66% when tested with the feature dimension set to 12.
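The word error rate cited above is the standard edit-distance metric over words; a self-contained sketch written from the textbook definition, not from the paper's code:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, scoring hypothesis "a x c" against reference "a b c d" counts one substitution and one deletion over four reference words, a WER of 0.5; recognition accuracy is often quoted as (1 - WER) x 100%.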
Abstract: Although significant progress has been made in building ASR systems for adult speech, children's ASR is still in its infancy for Indian languages. Building Punjabi children's speech recognition is one such challenge because no speech corpus is available (a zero-resource condition). In this paper, efforts have been made to build a small-vocabulary Punjabi continuous children's speech corpus. In the explored system, four variations of bMMI discriminative techniques have been applied to two context models: dependent and independent. Experimental results show that the system attains a Relative Improvement (RI) of 22-26% with the fbMMI and fMMI acoustic models compared to the other approaches. Various parameter combinations have been implemented, varying the boosting parameter and iteration values to obtain optimal values for the bMMI and fbMMI acoustic models.
Abstract: The robustness of an automatic speech recognition (ASR) system relies on the accuracy of feature extraction and classification in the training phase; a mismatch between training and testing conditions during classification of large feature vectors causes low performance. In this paper, the issue of robustness of acoustic information is addressed for a practical Punjabi dataset. The traditional feature extraction approaches, mel frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC), face the issue of high variance with leakage of spectral information, and handling the huge amount of feature information creates chaos for a large speech vocabulary. To overcome this dilemma, a principal component analysis (PCA) based multi-windowing technique is proposed, incorporating the baseline GFCC and MFCC feature approaches after tuning of the taper parameter. The proposed integrated approaches yield better feature vectors, which are further processed using a differential evolution + hidden Markov model (DE + HMM) classifier. The integrated approaches show substantial performance for word recognition compared to conventional or fused feature extraction systems.
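The PCA step described above compacts the feature vectors before classification; a minimal numpy sketch of projecting frame-level features onto the top principal components (an illustrative reimplementation, not the paper's multi-windowed pipeline):

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto the top principal
    components of their covariance matrix."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues ascending
    top = eigvecs[:, ::-1][:, :n_components]   # highest-variance axes first
    return X_centered @ top

rng = np.random.default_rng(1)
# 200 fake frames of 13 coefficients with unequal per-axis variance
frames = rng.standard_normal((200, 13)) * np.linspace(5.0, 0.5, 13)
reduced = pca_reduce(frames, 4)
```

The retained components carry the directions of greatest variance, which is how PCA discards the redundant spectral information the abstract refers to.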
Abstract: A baseline ASR system does not perform well when the training data is improperly modeled: training through the conventional HMM technique faces the issue of data lying on or near a manifold in data space. In this paper, a hybrid SGMM-HMM approach is compared with the baseline GMM-HMM technique on a Punjabi continuous simple-sentence speech corpus. The hybridised SGMM-HMM technique is examined to overcome the problem of sharing state-parameter information throughout the training and testing of the system. System testing is performed with the Kaldi 4.3.11 toolkit using MFCC and GFCC approaches at the front end. The SGMM-HMM modeling technique yields an improvement of 3-4% over the GMM-HMM approach. The experiments are performed on a real-environment dataset.
Abstract: Dengue is a viral disease that affects public health worldwide every year. Every change in climate at a particular location increases the probability of dengue spreading in that area. Governments run a number of health schemes to prevent and control dengue at early stages, and the use of information technology helps achieve this goal. There is a great need for systems that enable medical technicians to detect dengue at an early stage. To that end, the authors work toward a dengue dataset that can support a machine learning prediction model for the disease, conducting an analytical study that collects symptoms and clinical tests reported by researchers in the same domain. To identify the important factors of dengue, statistical and support vector machine methods are used. The analysis shows four important factors, fever, headache, skin rash and abdominal pain, that can be used to detect dengue at an early stage.
Abstract: India is a land of language diversity, with 22 major languages, more than 720 dialects, and 13 different scripts. Of these 22, Hindi, Bengali and Punjabi are ranked the 3rd, 7th and 10th most spoken languages around the globe. Except for Hindi, where significant research is underway, the other two major languages, and other Indian languages, do not have fully developed Automatic Speech Recognition systems. The main aim of this paper is to provide a systematic survey of the existing literature on automatic speech recognition (i.e., speech to text) for Indian languages. The survey analyses the opportunities, challenges, techniques and methods, and locates, appraises and synthesizes the evidence from studies to provide empirical answers to the scientific questions. It covers relevant research articles published from 2000 to 2018. The purpose of this systematic survey is to sum up the best available research on automatic speech recognition of Indian languages by synthesizing the results of several studies.
Abstract: Image fusion is a powerful tool in the medical domain. It is an essential method for enhancing image quality by combining complementary images captured from various sensors or cameras. The aim of multi-modal image fusion is to obtain a single fused image from images of different modalities, and it is widely used in clinical applications for better diagnosis of several types of disease. In this paper, a comparative analysis is made of various multi-modal techniques in the medical domain: guided filter, multi-resolution singular value decomposition, and principal component analysis. Both quantitative and qualitative results are reported. The experimental results indicate that the guided filter method is more efficient than the other methods in terms of the evaluation metrics, with a standard deviation of 29.8, mean of 52.3, entropy of 2.8 and fusion information score of 0.8. It is also observed that the guided filter preserves edges efficiently and is more suitable for real applications.
Abstract: HMM has been regarded as the leader for the last five decades in handling temporal variability in an input speech signal when building automatic speech recognition systems. GMM became an integral part of HMM to measure the efficiency of each state, which stores the information of a short windowed frame: to fit each frame systematically, it retains the frame coefficients and connects their posterior probabilities over the HMM states that act as the output. In this paper, a deep neural network (DNN) is tested against the GMM, utilising many hidden layers, which helps the DNN avoid overfitting on the large training dataset before its performance worsens. Implementing the DNN with a robust feature extraction approach has brought a high performance margin to the Punjabi speech recognition system. For feature extraction, the baseline MFCC and GFCC approaches are integrated with cepstral mean and variance normalization. Dimension reduction, decorrelation of vector information and speaker variability are then addressed with linear discriminant analysis, maximum likelihood linear transformation, SAT, and maximum likelihood linear regression adaptation models. Two hybrid classifiers, GMM-HMM and DNN-HMM, investigate the conceived acoustic feature vectors to obtain performance improvements on connected and continuous Punjabi speech corpora. The experimental setup shows notable improvements of 4-5% and 1-3% on the connected and continuous datasets respectively.
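The cepstral mean and variance normalization mentioned above is simple enough to sketch in numpy: per utterance, each cepstral coefficient is standardised across frames (an illustrative sketch, assuming a frames-by-coefficients feature matrix):

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalisation: make each coefficient
    zero-mean and unit-variance across all frames of one utterance."""
    X = np.asarray(features, dtype=float)
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

rng = np.random.default_rng(2)
mfcc_frames = rng.standard_normal((300, 13)) * 4.0 + 2.0  # fake MFCCs
normed = cmvn(mfcc_frames)
```

Normalising per utterance removes stationary channel and speaker offsets from the cepstra, which is why CMVN is a standard companion to MFCC/GFCC front ends.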
Abstract: Punjabi is a tonal as well as under-resourced language among the Indo-Aryan languages of the Indo-European family. The vast number of variations in the language poses challenges when designing an Automatic Speech Recognition (ASR) system, so it is of great importance to study essential features such as the tone of the language when designing an effective ASR. This paper focuses on the variation of tonal characteristics across Punjabi dialects. The speech corpus was collected from native speakers of Punjab (covering its various dialects), including the areas of the Himachali belt of Punjab. The analysis shows that tonal words and dialectal word information have a major impact on the information conveyed by the speaker, and the analysed data shows pitch variations in tonal words that vary from region to region. The experiments are performed using the Praat toolkit to calculate the F0 value; based on the pitch and frequency variations, we find that tonal words show dialectal variation when the same sentence is spoken by speakers of different regions.
Abstract: The Punjabi language has almost 105 million native speakers yet faces the challenge of limited resources, and the Punjabi ASR system has received little research attention compared to other Indian languages. This paper examines continuous-vocabulary Punjabi recognition using the Sphinx toolkit. The proposed work has been implemented for speaker-independent and speaker-dependent settings in different environmental conditions. The Punjabi ASR system has been trained on 442 phonetically rich sentences from 15 speakers (6 male and 9 female). The system adopts MFCC at the front end and HMM at the modelling phase to extract and classify feature vectors. The simulation results demonstrate a performance of 93.85% on the speaker-dependent dataset and 89.96% on the speaker-independent dataset.
Abstract: An automatic speech recognition system follows a pattern-matching approach consisting of a training phase and a testing phase. Despite advances in the training phase, the performance of the acoustic model suffers when adopting a statistical technique like the hidden Markov model (HMM): an HMM-based speech system incurs high computational complexity and struggles to provide accuracy on an isolated Punjabi lexicon. As the corpus grows, the complexity of the training phase also increases drastically, and redundancy and confusion occur between feature distributions during training. This paper proposes an approach for generating HMM parameters using two hybrid classifiers, GA+HMM and DE+HMM. The proposed technique focuses on refining the processed feature vectors after calculating their mean and variance; the refined parameters are then employed in the generation of HMM parameters, which helps reduce the training complexity of the system. The proposed techniques are compared with the existing HMM technique on benchmark databases and a self-developed corpus in clean, noisy, and real-time environments. The results show improved pattern matching of spoken utterances when demonstrated on large-vocabulary isolated Punjabi lexicons.
Abstract: The automatic speech recognition (ASR) system plays a vital role in human-machine interaction. ASR systems face the challenge of performance degradation due to inconsistency between the training and testing phases, which occurs due to the extraction and representation of erroneous, redundant feature vectors. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. These feature vectors are then passed to an optimization algorithm, which, together with the traditional statistical technique, generates refined feature vectors. This approach uses the GA + HMM and DE + HMM techniques to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement using MFCC with the DE + HMM technique when compared with RASTA-PLP and PLP using hybrid HMM classifiers.
Abstract: Speech-to-text conversion has been performed for various languages, but no process has been defined for the Kashmiri language, and no research has been done on Kashmiri speech recognition. In this work, we describe the development and implementation of the first CMU Sphinx-3 based speech recogniser for Kashmiri. Word recognition is performed using hidden Markov models (HMMs). The dictionary consists of 100 words, representing the Kashmiri digits from one (akh) to hundred (hat). We developed a speaker-independent Kashmiri Automatic Speech Recognition (K-ASR) system, trained and tested on 1200 words spoken by 12 male and female speakers. A maximum accuracy of 78.33% was achieved by the K-ASR system.
Abstract: A Hindi dialect (Bangro) Spoken Language Recognition (HD-SLR) system is designed to recognise the language from a given spoken utterance. The paper focuses on the influence of the Hindi dialect Haryanvi as spoken by males and females of different age groups ranging from 18 to 40 years. The system is trained and tested with the Sphinx3 toolkit on the Linux platform, using a semicontinuous speech corpus of around 5 h recorded in a clean environment that includes 1000 distinct Hindi dialect words spoken in different parts of Haryana. The dialectal information of the input speech signals is extracted with the MFCC technique, and the system is then tested at the utterance level. The Speaker Independent Semicontinuous (SISC) word recognition system achieves an average accuracy of 75-85% for native and nonnative speakers of the Hindi dialect.
Abstract: Natural language and human-machine interaction is a well-traversed yet challenging research domain; the main objective is a system that can communicate with humans in a well-organized manner, regardless of the operational environment. This paper presents a systematic survey of Automatic Speech Recognition (ASR) for tonal languages spoken around the globe. Tonal languages of the Asian, Indo-European and African continents are reviewed, but tonal languages of the Americas and Australasia are not. The most important part of this paper presents the work done in previous years on ASR for Asian tonal languages such as Chinese, Thai, Vietnamese, Mandarin, Mizo and Bodo; Indo-European tonal languages such as Punjabi, Lithuanian, Swedish and Croatian; and African tonal languages such as Yoruba and Hausa. Finally, a synthesis analysis is presented based on the findings, and the issues and challenges associated with tonal languages are discussed. It is observed that a lot of work has been done on the Asian tonal languages (Chinese, Thai, Vietnamese, Mandarin), but little has been reported for Mizo, Bodo, Indo-European tonal languages like Punjabi, Latvian and Lithuanian, or African tonal languages such as Hausa and Yoruba.
Abstract: Modern speech recognition systems mainly use a set of Feature Extraction Techniques (FET) such as Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) coefficients. Compared to conventional FETs such as LPCC, these approaches provide a better representation that retains the relevant information of the speech signal uttered by the speaker during the training and testing of a Speech To Text Detection System (STTDS) for different Indian languages. In this dissertation, the parameter values of FETs such as MFCC and PLP are varied at the front end along with a dynamic HMM topology at the back end, and the resulting speech signals are analysed using the HTK toolkit. The cornerstone of all current state-of-the-art STTDS is the use of HMM acoustic models. In our work, the effectiveness of the proposed FETs (MFCC and PLP features) is tested, and the MFCC and PLP acoustic features are compared for extracting the relevant information about what is being spoken from the audio signal, with experimental results computed while varying the HMM topology at the back end.
Abstract: This paper compares three feature extraction techniques for ASR systems. Compared to the primarily used MFCC (Mel Frequency Cepstral Coefficients), PNCC (Power Normalized Cepstral Coefficients) obtains impressive improvements in noisy speech recognition due to its inhibition of the high-frequency spectrum of the human voice. The techniques differ in that MFCC uses the traditional log nonlinearity, whereas PNCC processing substitutes a power-law nonlinearity. Experimental results show that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC as well as PLP (Perceptual Linear Prediction) processing in the presence of various types of additive noise and reverberant environments, at a marginally greater computational cost, and with clean speech it does not lower the decoding accuracy.
Abstract: Green computing refers to improving the efficiency of computing devices and reducing their negative impact on mankind and the environment. Over the years the idea of green computing has attracted worldwide attention due to its environmental benefits. At present, green computing is under consideration by business organizations and IT industries to improve environmental conditions for better human living. It is an effective approach to protecting our environment from the harmful effects of the toxic materials used in manufacturing computing devices. This paper discusses the need for green computing and the steps required to improve environmental conditions in the current era.
Winner of 'Most International Project' at the University of Nottingham, China at the Global Ingenuity'18 Finals
Dr. Virender Kadyan Wins 1st Prize and is Invited to United Nations Headquarters
Winner at the ACM India Chapter Technology Solutions Contest 2019
'Winner' at the Novate+ Himachal Pradesh, winning a grant of ₹2,50,00
Winner of the Punjab Innovation Summit 2019
Speech Lab members presented the BrilTab Edukit-1 and Child-Ensign projects at UN House, Delhi on 7 Sept 2019
Winner of HACKTIET'19, a 24-hour hackathon organised by DSC, Thapar and GirlScript, Patiala
Winner of Chitkara Master Code Chef - Season 3 (Code Hackathon)
Winners of the UN Influx-Global Hackathon 2017