From the Members of the DSPG team at Digite Inc. (Siddhant Bane, Shashank M, Mitesh Gupta and Ashwin Swarup)
Some Context
With more than two decades of experience in the application life cycle management space, we regularly help companies with their digital transformation by using our suite of products.
Given that we proselytize the Agile, Kanban and Chaos way of working, stand-up calls are very important to us. In fact, wake up any one of our employees at midnight and they will swear an oath of allegiance to the “stand-up” call.
The other thing we swear by is metrics. Observable metrics like burn-down charts are one thing; we have been doing those for so long that we are good at them. But how about qualitative metrics? How do you measure qualitative things like team cohesion, team sentiment and objective priority, and then use them to manage projects? Daily stand-up calls are an excellent way of capturing this information.
With COVID-19 forcing us to move all stand-up meetings to Zoom, Hangouts and Microsoft Teams, we had an opportunity to explore these use cases (Goodbye Cocktail Party Problem!!).
The first step to attempting this large problem is to capture the summary of the meeting.
What we describe next is a low-code experiment we conducted using Google Speech-to-Text (STT) and GPT-3 (which is under development, so shoutout to the OpenAI folks), along with its results.
Problem Breakdown
We break the problem down into three broad steps.
1. Transcribe the audio: Given that all the speech is over multiple channels, we found that the Google STT engine did an excellent job. To run rapid experiments, we chose to use a Chrome extension (Meet Transcript), which dumps the transcript of the entire conversation into a Google Doc for further processing. We chose a sample from our daily calls for this experiment.
How the Google STT engine is used to store the conversation
2. Pre-process the text: All our additions in pre-processing were to improve the performance of subsequent steps. Beyond the usual removal of special characters and nonsense words, we included a few other steps in our code.
It's funny that for something that is “pre-” we think about it a lot more in “post-”.
3. Summarize the transcript: This is where GPT-3 shines. Given a meeting transcript, can the GPT-3 davinci engine extract relevant details that the team can act on? That means we are not just looking for a summary but for an actionable summary.
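The pre-processing step described above can be illustrated with a short sketch. This is not the code from our notebook: the filler-word list and the function name `preprocess_transcript` are assumptions made up for illustration, and the set of characters kept is just one reasonable choice.

```python
import re

# Hypothetical filler list; the post does not enumerate its "nonsense words".
FILLER_WORDS = {"um", "umm", "uh", "uhh", "hmm", "er"}


def preprocess_transcript(text: str) -> str:
    """Drop special characters and filler words, then collapse whitespace."""
    # Keep letters, digits, whitespace and basic punctuation (including ':'
    # so that "Speaker: ..." labels survive); replace the rest with a space.
    text = re.sub(r"[^A-Za-z0-9\s.,:?!'-]", " ", text)
    # Remove filler tokens, compared case-insensitively and ignoring any
    # punctuation attached to the token.
    kept = [
        token
        for token in text.split()
        if token.strip(".,:?!").lower() not in FILLER_WORDS
    ]
    # Joining on single spaces also collapses runs of whitespace.
    return " ".join(kept)
```

Calling `preprocess_transcript` on a raw transcript line returns the same line with symbols such as `@` or `#` removed and fillers such as "um" dropped, which keeps the prompt sent to the summarizer shorter and less noisy.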
The code for this blog post is shared in this Google Colab notebook.
What follows is a discussion of the experiment and the subsequent results. We were like kids with shiny toys after the first experiment.
Solution First Iteration:
The first iteration involved directly dumping the entire transcript into GPT-3 and looking at the results.
We wanted to take GPT-3 out for a ride with zero prework. That means we wanted to understand how robust the solution was in its “off-the-shelf” state. The STT engine was itself a source of error, but at first glance we found that GPT-3 was able to overcome this problem (with some reservations).
The Input: A typical stand-up involves a lead (manager or scrum master) directing the flow of the conversation, with each team member giving their work update in turn. We took such an extract from the transcript of our own team, pictured below. It is a 600-word transcript filled with broad updates on the AI products that we worked on the day before. We then asked GPT-3 to summarize this text as-is using the davinci engine.
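An as-is summarization request of this kind can be sketched as follows. This is a minimal illustration under assumptions, not the notebook code: the "tl;dr:" cue is a commonly used summarization prompt for GPT-3, not necessarily the prompt we used, and `complete` is a hypothetical callable standing in for whatever client function sends the prompt to the davinci engine and returns the generated text.

```python
from typing import Callable


def build_summary_prompt(transcript: str) -> str:
    """Append a summarization cue to the raw transcript.

    The "tl;dr:" suffix is an assumption; other cues would work similarly.
    """
    return f"{transcript.strip()}\n\ntl;dr:"


def summarize(transcript: str, complete: Callable[[str], str]) -> str:
    """Build the prompt and pass it to a completion function.

    `complete` is a hypothetical stand-in for the real API call
    (prompt in, generated text out), injected so the sketch stays
    runnable without network access or credentials.
    """
    prompt = build_summary_prompt(transcript)
    return complete(prompt).strip()
```

Keeping the API call behind a plain callable makes it easy to swap engines or to try different prompt cues without touching the rest of the pipeline.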
The entire conversation of about 500 words is condensed into a few lines. While this is great for sending an email, important details in the conversation are missing.
The Output:
The team is working on the first release of the Ditto App and are waiting for approval from the Zendesk representative. The team is also working on Kairon for its next release .
That is awesome! We were able to condense a conversation from 500 words to just 50 words without much work. But there is an issue.
The point of meeting minutes is to capture the “who did what” extremely well. These details are important so that one can create and assign work.
What we have is good for an email, but it is not something we can process further, for example for Agile card creation or card movement on a Scrum board.
In the next part of this blog we will discuss how we overcame this limitation by modifying the inputs to GPT-3 so that it can pick up more details from a conversation.
Shameless Plugs
Some of the other open-source work we are doing! And this!
Acknowledgements
- https://stackoverflow.com/questions/64722585/gpt-3-prompts-for-sentence-level-and-paragraph-level-text-summarization-text-s
- Meet Transcript: This is an excellent app that leverages Google's transcription https://chrome.google.com/webstore/detail/meettranscript/jkdogkallbmmdhpdjdpmoejkehfeefnb
- Chatbot playground: https://kairon.digite.com/
- The folks at OpenAI for letting us experiment with GPT-3 https://openai.com/
Check out part 2 of this blog HERE.