As voice-enabled systems grow in popularity, their success depends not only on correctly recognizing what the user said but also on understanding what the user meant.
Spoken Language Understanding (SLU) focuses on interpreting users’ intentions from their spoken utterances. Compared with traditional text-based Natural Language Understanding (NLU), SLU has its own challenges because spoken language is noisier than written language. The difficulty increases when SLU is applied to short, command-style utterances that carry little context.
The Entertainment domain brings further challenges: entities such as movies, music artists, and music albums can have unique, creative names that often involve wordplay. New content is released constantly and rapidly in this domain, and the same name can appear across different entity types, such as a movie and a music album, which makes entities harder to tell apart.
This session highlights some of these key challenges and some of the state-of-the-art approaches commonly used to address them.