This repository was archived by the owner on Mar 27, 2021. It is now read-only.

not-a-bot - LetMeSpeak #24

@7uhinn

Description

ℹ️ Project information

🔥 Your Pitch

Almost 3 million people in India and 30 million worldwide suffer from an acute speech or hearing disability, according to the World Disability Report by the International Journal of Speech-Language Pathology. The day-to-day life of these individuals is somewhat different, yet it involves almost everything every other human does, except for one thing: communication. The most widely used language among them is American Sign Language (ASL), which consists of hand gestures that convey letters and numbers as well as whole words. This language, however, is understood by only a minuscule fraction of people without such disabilities, creating a persistent communication barrier for the speech- and hearing-impaired.

Therefore, we decided to build a platform that takes in real-time video of ASL hand gestures and converts them into the word(s) they correspond to. To build it, we focused on four main things:

**Data Creation** We started by writing a Python script that uses OpenCV, among other libraries, to create a 'hand histogram', forming a detection boundary around the perimeter of the palm and fingers; this helps later with gesture capture. Another Python script takes video gestures as input and outputs a folder of 1200 image frames, labeled beforehand with the corresponding gesture. This makes the pipeline scalable: almost any ASL word can be incorporated without extra effort, although we restricted training to 44 words/letters for the scope of the event.
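A minimal sketch of how such a hand-histogram step might look with OpenCV is shown below. The sampling rectangle, window-free single-frame flow, and output filename are illustrative assumptions, not the project's exact script.

```python
# Sketch: build an HSV histogram of a small palm patch and use back-projection
# to get a binary mask of the hand. Values are illustrative assumptions.
import cv2
import numpy as np

def build_hand_histogram(frame, rect=(300, 100, 50, 50)):
    """Sample a patch assumed to contain skin and return its normalized HSV histogram."""
    x, y, w, h = rect
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    roi = hsv[y:y + h, x:x + w]
    hist = cv2.calcHist([roi], [0, 1], None, [180, 256], [0, 180, 0, 256])
    return cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

def segment_hand(frame, hist):
    """Back-project the histogram onto a frame to isolate the hand region."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)
    disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    back_proj = cv2.filter2D(back_proj, -1, disc)
    _, mask = cv2.threshold(back_proj, 50, 255, cv2.THRESH_BINARY)
    return mask

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
if ret:
    hist = build_hand_histogram(frame)
    mask = segment_hand(frame, hist)
    cv2.imwrite("hand_mask.png", mask)  # masked frames feed the gesture-capture script
cap.release()
```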

**Data Preparation** With 1200 labeled grayscale frames per gesture, we end up with close to a hundred thousand images once the data is augmented. We augment by flipping each frame horizontally; besides doubling the dataset, this step is necessary because the gestures are single-handed, and flipping covers the case of left-handed signers.
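The flip augmentation could look roughly like the sketch below. The `data/<label>/*.png` directory layout and the `_flipped` suffix are assumptions for illustration.

```python
# Sketch: mirror every saved grayscale frame horizontally so left-handed
# versions of each gesture are represented in training.
import os
import cv2

DATA_DIR = "data"  # assumed layout: data/<label>/<frame>.png

for label in os.listdir(DATA_DIR):
    label_dir = os.path.join(DATA_DIR, label)
    if not os.path.isdir(label_dir):
        continue
    for name in os.listdir(label_dir):
        img = cv2.imread(os.path.join(label_dir, name), cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        flipped = cv2.flip(img, 1)  # 1 = flip around the vertical axis
        stem, ext = os.path.splitext(name)
        cv2.imwrite(os.path.join(label_dir, f"{stem}_flipped{ext}"), flipped)
```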

**Model Training** We train a standard CNN on this data and save the model in the provided .h5 file. We achieved a training accuracy of 98.6% and a validation accuracy of 99%, with test accuracy reaching around 98%. Note: these figures come from deliberately using a rather simple model over a moderately sized dataset to reduce overfitting.
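As a rough sketch of what a "standard CNN" of this kind could look like in Keras, see below. The input size (50×50 grayscale), layer widths, and filename are assumptions; only the 44 output classes and the .h5 format come from the write-up.

```python
# Sketch: a small convolutional classifier for 44 gesture classes, saved as .h5.
from tensorflow.keras import layers, models

NUM_CLASSES = 44
IMG_SHAPE = (50, 50, 1)  # assumed grayscale frame size

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=IMG_SHAPE),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.4),  # light regularisation to curb overfitting
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
model.save("asl_cnn.h5")  # the .h5 file mentioned above
```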

**Prediction over live feed** The last Python script is responsible for taking in video input, converting it to grayscale, and predicting the gesture with the trained model. It prints the recognized word(s) in real time.
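A minimal sketch of that live-feed loop is below. The model path, the placeholder `LABELS` list, and the 50×50 input size are assumptions that would need to match the actual training setup.

```python
# Sketch: read webcam frames, preprocess to match training, and print predictions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("asl_cnn.h5")            # assumed model file
LABELS = [f"class_{i}" for i in range(44)]  # placeholder for the 44 gesture labels

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, (50, 50))             # must match the training size
    batch = resized.astype("float32")[None, ..., None] / 255.0
    probs = model.predict(batch, verbose=0)[0]
    print(LABELS[int(np.argmax(probs))])
    if cv2.waitKey(1) & 0xFF == ord("q"):            # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```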

🔦 Any other specific thing you want to highlight?

We chose 'Influence the Masses' because the speech- and hearing-impaired community has long been marginalized due to their disability. This project not only changes how these people can communicate in the future, but also encourages a more open and accepting public perception of the community, helping them become more 'abled'.
