The NLP Summit was an amazing experience for all of us, and even more so for the many wonderful companies, Data Scientists, NLP practitioners, and students who participated in our keynotes, panels, and presentations.
During The NLP Summit, we also ran our first 48-hour Hackathon, starting at 2 pm ET on the 6th of October. 160 participants took up the challenge, and they all poured their heart and soul into their projects.
Maziyar Panahi (Spark NLP Lead), Armaan Gupta (CEO at Kreative), and Ida Lucente (Marketing Manager at John Snow Labs) judged the submitted projects for, respectively, their Technical Accuracy, Design Development, and Social Impact.
First Prize – Dr. Phyto: Solutions in Agriculture to save our Planet Empowering our Farmers for a Better Future
The winners are Hrishabh Digaari, Harshvardhan Singh, Dhruv Bhargava, and Shikhar Vaish.
The agricultural industry is the backbone of our country’s economy, yet the ways in which we can empower and support our local farmers are limited. The team’s vision is to innovate in the agriculture industry and help local farmers increase their crop yields.
With global warming, climate change, and water pollution on the rise, we could be dealing with a harsh future.
TechCrunch reports that 40% of the world’s crops are lost to disease. The team wants to be a breakthrough for agriculture, especially in poorer countries.
It is not uncommon to hear news about farmers in our country committing suicide, mainly due to crop failures, rising debts, the failure of new farming techniques, etc. One of the banes of farming is the frequent loss of crops due to adverse weather conditions, lack of knowledge of farming techniques, and not taking the right action at the right time during pest attacks.
The farmers need to travel across villages, to their nearest Kisan Support Centre with a leaf sample, to get the right piece of advice from experts to solve their issues related to crops, fertilizers, and pesticides.
The winning team has a mission to empower the agriculture industry by bringing the latest technology to them, which could provide them with intelligent solutions to their queries.
Their goal is to use Artificial Intelligence to increase crop yields and reduce food waste, with the hope that this technology can help feed the world’s growing population by optimizing agricultural methods.
For this reason, they combined their knowledge of Machine Learning, NLP, and Computer Vision to develop a robust chatbot, Dr. Phyto, that automates tedious tasks such as measuring crop quality, answering farmers’ basic queries, eliminating the need to travel across villages during the pandemic, forecasting weather conditions, detecting crop diseases, etc.
This won’t just speed up the process, but it will also help farmers to identify plants that have diseases and take necessary actions accordingly.
Their objective was to create a platform to empower farmers and provide them with the knowledge to make them self-sufficient. They researched various problems and common queries that farmers face daily. They came across many queries made by farmers to the Kisan Call Centre (KCC) on the Open Government Data Platform India. This served as their primary dataset for chatbot queries.
The idea was to raise awareness among farmers and answer their queries while they stay on their farms, avoiding travel during the pandemic. Farmers could get all of their queries resolved through the chatbot, Dr. Phyto. The team intended to make them aware of the latest government schemes and how these would benefit them, to share weather forecasts for the next 2–3 days so that they could prepare accordingly, and to measure crop quality and look for possible pest attacks so as to suggest the right remedies at the right time. Their robust computer vision model could identify the crop and the disease possibly affecting it.
They were aware that their target audience was farmers who might or might not be used to such applications, so they ensured that the UX/UI would be not only pleasing to the eye but also easy to comprehend and navigate.
Meet Dr. Phyto – Design Innovation
User Experience was a really important aspect of this project since the target group had limited experience with technology. With time being their biggest constraint, they aimed not to create the perfect product but to create a reliable system. Using the data available on the Open Government Data Platform India, they found that the majority of farmers’ queries are related to weather conditions and the latest government schemes. Through user research, they determined that there is a communication gap between agricultural discoveries and beneficial methods on the one hand and local farmers, who still rely on traditional knowledge, on the other. The app starts with onboarding screens that help the user ease into Phyto’s experience.
The Home screen was divided into two sections:
- An article widget that displayed educational and informative articles related to agriculture and farming.
- A weather widget that displayed comprehensive and in-depth data about weather conditions.
The purpose of these sections is to provide an incentive for farmers to open the App even when they don’t have a query.
The project comprises 4 modules:
- An End-to-End Computer Vision pipeline for Classification and Detection of Crop Diseases using TensorFlow
- A mobile application developed in Flutter with an elegant interface to interact with the chatbot where the users can send their queries along with relevant images
- Google’s Dialogflow cloud services
- A js server to serve as a pipeline between the user’s queries and Google’s Dialogflow
The user is expected to upload an image of the crop leaf they want to diagnose. The image is passed through the team’s robust End-to-End Computer Vision model, which is stored on the server and generates a unique token ID for each type of crop and its disease. A few examples are as follows:
‘Grape__healthy’, ‘Strawberry_Leaf_scorch’, ‘Strawberry_healthy’, ‘Tomato_Leaf_Mold’
These tokens were then pipelined into the team’s Dialogflow chatbot service as user input for interpretation. All of this is wrapped inside a Docker container for quick deployment. The Docker image uses TensorFlow’s official Docker image as its base image to make things easier.
Upon reaching the chatbot, each token ID triggered a specific response, which was then returned to the user in the application’s interface.
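The flow described above (class scores from the vision model, then a token ID, then a chatbot reply) can be sketched roughly as follows. This is only an illustration under assumptions: the function names `scores_to_token` and `reply_for` are hypothetical, the reply texts are placeholders rather than the team’s actual responses, and a plain dictionary stands in for the Dialogflow service.

```python
# The four token IDs quoted in the article; the real model has more classes.
TOKENS = [
    "Grape__healthy",
    "Strawberry_Leaf_scorch",
    "Strawberry_healthy",
    "Tomato_Leaf_Mold",
]

# Placeholder replies (assumption): a dict stands in for the chatbot service.
RESPONSES = {
    "Grape__healthy": "The grape leaf looks healthy.",
    "Strawberry_Leaf_scorch": "Possible leaf scorch on strawberry.",
    "Strawberry_healthy": "The strawberry leaf looks healthy.",
    "Tomato_Leaf_Mold": "Possible leaf mold on tomato.",
}
FALLBACK = "Sorry, this leaf could not be identified."


def scores_to_token(scores):
    """Return the token ID whose class score is highest (argmax)."""
    if len(scores) != len(TOKENS):
        raise ValueError("expected one score per token")
    best = max(range(len(scores)), key=scores.__getitem__)
    return TOKENS[best]


def reply_for(token):
    """Look up the reply triggered by a token ID, with a fallback."""
    return RESPONSES.get(token, FALLBACK)


if __name__ == "__main__":
    # Made-up class scores for one uploaded leaf image.
    token = scores_to_token([0.05, 0.80, 0.10, 0.05])
    print(token, "->", reply_for(token))
```

Passing the token as plain text keeps the vision model and the chatbot decoupled: the chatbot only needs one entry per token ID.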
The Team Behind Dr. Phyto
They were not expecting anything from the results but were satisfied with their work.
The team said: “We still remember when we were on the call, half aware of what was happening around us, when Harshvardhan screamed: ‘We’ve Won!’ There was more of incredulity in his voice than excitement. But not to blame him, all of us were pretty stupefied even though we had read the congratulations mail about 50 times. We were so happy yet surprised that we even texted one of the officials of the summit just to make sure it was true! It was truly a magnificent experience for all 4 of us.”
Second Prize – Unicorn NLP – Language Understanding APIs for Reviews/User-Generated Content
The winners are Tom Krupa and Aleksandra Greber.
They are co-founders of UnicornNLP, a company that develops human-like Language Understanding APIs for reviews and other user-generated content. Each of their products is a set of Semantic Models designed for a specific domain (Healthcare Data, App Reviews, Hotel Reviews, Restaurant Reviews, etc.).
Tom has 13+ years of experience in the NLP/NLU field and has worked for more than 8 years with reviews and other user-generated content. Aleksandra designed Sentiment Analysis 2.0 for Reviews and wrote most of the semantic code for it (in QL4Reviews).
They make a great team and have created systems for processing reviews and other user-generated content in different areas: banking, e-commerce, cosmetics, hotels, surveys, NPS (net promoter score), and telecoms. They believe that Healthcare Data has huge potential, especially nowadays, for providing insights and tools both to patients and to organizations.
They specialize in transforming unstructured data, e.g. the patients’ drug reviews that they used during the Hackathon, into structured knowledge. The patient reviews described experiences with specific drugs and the negative and positive effects of using them. The team analyzes the opinions, extracts valuable information, and builds Models that can be used, for example, for opinion summarization.
The most frequently occurring models are:
- Nausea (6.3% of texts)
- Weight gain (6.3% of texts)
- Headache (4.8% of texts)
- Diarrhea (3.5% of texts)
- Fatigue (3.3% of texts)
- Rash (2.8% of texts)
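Shares like the ones listed above are the percentage of texts in which a given model fires at least once. A minimal sketch of that calculation follows, assuming simple keyword patterns as a stand-in for UnicornNLP’s Semantic Models (which are not public and are certainly richer); the pattern set and the function name `occurrence_share` are hypothetical.

```python
import re

# Hypothetical keyword patterns standing in for real semantic models.
PATTERNS = {
    "Nausea": re.compile(r"\bnause(a|ous|ated)\b", re.IGNORECASE),
    "Headache": re.compile(r"\bheadaches?\b", re.IGNORECASE),
}


def occurrence_share(texts, patterns=PATTERNS):
    """Percentage of texts in which each pattern matches at least once."""
    total = len(texts)
    if total == 0:
        return {name: 0.0 for name in patterns}
    return {
        name: 100.0 * sum(1 for text in texts if pattern.search(text)) / total
        for name, pattern in patterns.items()
    }


if __name__ == "__main__":
    # Made-up review snippets, for illustration only.
    reviews = [
        "I felt nauseous all day.",
        "Bad headache and nausea after the second dose.",
        "No issues so far.",
        "Worked well for me.",
    ]
    print(occurrence_share(reviews))
```

Each text is counted once per model regardless of how many times the pattern matches, which is what a "% of texts" figure implies.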
Extracting information from unstructured healthcare texts (e.g., follow-up appointments, vitals, charges, orders, encounters, and symptoms), which some experts estimate make up 80 percent of all available patient data, can have a large-scale impact on healthcare and lead to improvements.
NLP offers great analytics-driven opportunities to improve healthcare outcomes.
Third Prize – Covid-19 case fatality versus country development
The winner is Diana Roccaro.
She holds a Ph.D. in Neurosciences and is a Machine Learning Engineer and a passionate gardener with 2 daughters. She also collaborates on an Omdena project on the prevention of online violence against children.
The associations between the case fatality of infection with SARS-CoV-2 and risk factors such as hypertension and certain other medical conditions are the focus of extensive research and are already well established. Researchers active in the medical field, however, typically neglect factors outside their domain of expertise, just as researchers in the economic domain probably neglect some of the medical aspects that might be relevant to a given topic of investigation.
Since the framework of this hackathon, held at the very first NLP Summit, gave her access to data from many different domains, and more specifically allowed her to merge health-related with economic factors, she decided to investigate the relationships between the likelihood of dying from Covid-19 and factors related to a country’s developmental stage.
With the help of rather basic NLP, she scraped the CORE Repository for scientific publications related to the effects of Covid-19 on mental health and, using the GeoText package, extracted the frequency of country names mentioned in the 7,696 publications. She also accessed the official dataset of clinical trials conducted around Covid-19 from ClinicalTrials.gov to calculate the number of clinical trials conducted per country, again using GeoText.
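Counting country mentions across a set of documents can be sketched as below. The original work used the GeoText package for the extraction step; to keep this sketch free of dependencies, a regular expression over a tiny hard-coded country list stands in for it. The list and the function name `count_country_mentions` are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative list (assumption); a real run needs a full country index.
COUNTRIES = ["Italy", "India", "Switzerland", "Brazil"]
_COUNTRY_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in COUNTRIES) + r")\b"
)


def count_country_mentions(documents):
    """Count how often each known country name appears across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(_COUNTRY_RE.findall(doc))
    return counts


if __name__ == "__main__":
    # Made-up abstracts, for illustration only.
    abstracts = [
        "Lockdown stress was studied in Italy and India.",
        "A follow-up survey from Italy.",
    ]
    print(count_country_mentions(abstracts).most_common())
```

The same counting step applies to the clinical trial records: one text field per trial goes in, and a per-country tally comes out.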
She merged these two parameters obtained using NLP with selected parameters from four different datasets downloaded from the John Snow Labs collection to obtain a total of 37 parameters. Based on population size and the number of physicians, she calculated physician density for each country.
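As a rough sketch of this merging step, the code below joins several per-country tables on the country name and derives physician density from physician count and population. The field names, the "per 1,000 inhabitants" scaling, and the choice to keep only countries present in every source are assumptions, not details confirmed by the original project.

```python
def merge_by_country(*sources):
    """Inner-join dicts shaped {country: {parameter: value}} on country."""
    if not sources:
        return {}
    # Keep only countries that appear in every source.
    common = set(sources[0]).intersection(*sources[1:])
    merged = {}
    for country in sorted(common):
        row = {}
        for source in sources:
            row.update(source[country])
        merged[country] = row
    return merged


def add_physician_density(rows, per=1000):
    """Add physicians per `per` inhabitants to each country row."""
    for row in rows.values():
        row["physician_density"] = row["physicians"] * per / row["population"]
    return rows


if __name__ == "__main__":
    # Made-up values, for illustration only.
    mentions = {"A": {"mentions": 12}, "B": {"mentions": 3}}
    health = {
        "A": {"physicians": 40_000, "population": 10_000_000},
        "C": {"physicians": 5_000, "population": 2_000_000},
    }
    print(add_physician_density(merge_by_country(mentions, health)))
```

With these made-up inputs only country "A" survives the join, because it is the only one present in both tables.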
She fed these data into StatPlanet, a tool freely available for non-commercial purposes, to visualize the 37 parameters for the 51 countries that contained values for all of these parameters. A correlation matrix, together with a quick evaluation using the Data Analysis Baseline Library (dabl) revealed the Covid-19 case fatality to correlate most strongly with:
- the percentage of women in parliaments
- the country itself as an independent feature
- the fraction of the population living below the country-specific poverty threshold
- the number of Covid-19 tests conducted per 1M inhabitants
- the Gross Domestic Product
Correlation is not causation, and there are surely confounding factors underlying the link between women in parliaments and case fatality of Covid-19.
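Ranking parameters by how strongly they correlate with a target can be sketched as follows. This is a plain Pearson correlation written out by hand, not the dabl report or the original analysis code; it only handles numeric features (a categorical feature such as the country itself would need a different measure), and the names `pearson` and `rank_by_correlation` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(features, target):
    """Sort features by absolute correlation with the target, strongest first."""
    scored = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    fatality = [1.0, 2.0, 3.0, 4.0]
    features = {
        "feature_a": [4.0, 3.0, 2.0, 1.0],
        "feature_b": [1.0, 3.0, 2.0, 4.0],
    }
    for name, r in rank_by_correlation(features, fatality):
        print(f"{name}: r = {r:+.2f}")
```

Sorting by the absolute value keeps strong negative correlations near the top, since their sign says nothing about their strength.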
Data Analysis Baseline Library (dabl) Report
Altogether, these results seem to indicate that factors related to a country’s development stage are much stronger predictors of Covid-19 case fatality than health-related parameters such as the prevalence of obesity or the mean BMI.
On receiving her award, Diana said “My very spontaneous decision to still join the competition 35 minutes after its kick-off resulted in an amazing experience I won’t forget anymore. This was my very first hackathon and I was completely flashed by the number of datasets we could choose from. I’m very much looking forward to the NLP Summit’s 2nd edition and hoping to find some nice collaborators by then!”