Note: This app was never published.
For one of my Master’s in Organizational Leadership courses, we were tasked with evaluating our group’s performance as a team.
Evaluating the chat
During the course, we discussed different methods and metrics for evaluating how well a team works together, and we were also tasked with creating our own set of criteria to self-evaluate our group. At the beginning of the class, we established group communication expectations; one of them was responding to each other’s queries within 48 hours. Because of this, I thought it would be an interesting experiment to write a program to analyze our group’s compliance with our established 48-hour norm and to have a Large Language Model (LLM) evaluate our group’s collaboration. Since most of our team interaction took place in a Microsoft Teams chat, I decided to use the Teams chat log as the content for evaluating our group.
Extracting the chat and metadata
The most difficult part of this task was getting the Teams chat log into a ChatMessage object format that I could analyze. Microsoft Teams typically only allows an organization admin to export chat logs. To get around this, I developed a macOS app that takes in screenshots of our chat and uses Optical Character Recognition (OCR) to extract the text content with high accuracy. Once the text was extracted, I looked for patterns in it to identify which parts were the message’s content, the date and time it was sent, the message’s author, whether an individual was mentioned, and whether the author was replying to another user. The function I wrote to pull this information out of the extracted String was quite complex, but in the end I was able to extract the metadata from each chat message and put it into a structured form for further analysis.
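The parsing step above can be sketched roughly as follows. This is not the app’s actual code (the app was written for macOS, and the real function was far more involved): the `ChatMessage` field names, the `HEADER` pattern, and the assumed “author, then date/time” layout of a Teams screenshot are all illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# Assumed screenshot layout: a header line like "Jane Doe 10/14/24 3:42 PM"
# starts each message; the lines that follow are the message body.
HEADER = re.compile(
    r"^(?P<author>[A-Z][\w'.-]+(?: [A-Z][\w'.-]+)+) "
    r"(?P<ts>\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2} [AP]M)$"
)

@dataclass
class ChatMessage:
    author: str
    sent: datetime
    content: str = ""
    mentions: list = field(default_factory=list)
    is_reply: bool = False

def parse_messages(ocr_lines):
    """Group OCR'd text lines into ChatMessage records.

    A line matching HEADER starts a new message; every following line
    belongs to that message's body until the next header. '@Name'
    tokens become mentions (first name only, for simplicity), and a
    bare 'Reply' marker line (an assumption) flags a reply.
    """
    messages = []
    for raw in ocr_lines:
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            messages.append(ChatMessage(
                author=m["author"],
                sent=datetime.strptime(m["ts"], "%m/%d/%y %I:%M %p"),
            ))
        elif messages:
            msg = messages[-1]
            if line == "Reply":
                msg.is_reply = True
            else:
                msg.content = (msg.content + " " + line).strip()
                msg.mentions += re.findall(r"@(\w+)", line)
    return messages
```

In practice the header detection has to be more forgiving than this, since OCR output can drop or mangle characters; matching against a known roster of group members is one way to make it robust.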
Analyzing the chat
Now that I had the chat in a format I could work with, I wrote code to determine how many messages were sent by each group member, the average response time between a chat member being mentioned and their next message (a response), and how many times they failed to respond within our group’s established communication expectation of 48 hours. This seemed to work quite well, and after a manual review, the app’s accuracy in determining whether someone failed to respond within 48 hours was quite high.
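These three metrics can be computed in a single pass over the chronological message list. A minimal sketch, assuming each parsed message exposes `.author`, `.sent` (a datetime), and `.mentions` as described above (the function name and exact bookkeeping are mine, not the app’s):

```python
from collections import defaultdict
from datetime import timedelta

DEADLINE = timedelta(hours=48)  # the group's agreed response norm

def response_metrics(messages):
    """Per-member message counts, mention-to-response latencies,
    and the number of times the 48-hour expectation was missed.

    `messages` must be in chronological order. A member's "response"
    is their next message after being mentioned; a mention that is
    never answered also counts as a miss.
    """
    counts = defaultdict(int)
    latencies = defaultdict(list)   # timedeltas from mention to response
    missed = defaultdict(int)       # responses later than DEADLINE, or never
    pending = {}                    # member -> time of oldest unanswered mention

    for msg in messages:
        counts[msg.author] += 1
        if msg.author in pending:
            delta = msg.sent - pending.pop(msg.author)
            latencies[msg.author].append(delta)
            if delta > DEADLINE:
                missed[msg.author] += 1
        for name in msg.mentions:
            pending.setdefault(name, msg.sent)

    for name in pending:            # mentions never answered at all
        missed[name] += 1

    avg = {m: sum(ds, timedelta()) / len(ds) for m, ds in latencies.items()}
    return dict(counts), avg, dict(missed)
```

One judgment call baked into this sketch: only the *oldest* unanswered mention is timed per member, so a second mention before a response doesn’t reset the clock in the mentioned person’s favor.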
When it came to analyzing the chat with an LLM, I used Llama 3, an open-source LLM created by Meta, running on a MacBook Pro through Ollama. At first the LLM had trouble keeping up with the context, and I learned that there was a ceiling on the number of tokens (sort of like words) the model could accept at once without losing track of the conversation. To get around this, I ended up feeding the LLM each person’s chat history one at a time and was able to get summaries of each of their contributions. However, I was unable to get summaries of their interactions with others in the chat, as providing everything at once was not possible with this LLM.
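The one-person-at-a-time workaround looks roughly like this against Ollama’s local HTTP API. The endpoint and request shape are Ollama’s standard `/api/generate` interface; the prompt wording and helper names are illustrative, not the app’s actual code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def group_by_author(messages):
    """Split the chat into one message list per member, preserving order."""
    by_author = {}
    for msg in messages:
        by_author.setdefault(msg.author, []).append(msg.content)
    return by_author

def summarize_member(name, lines, model="llama3"):
    """Ask a local Llama 3 instance (via Ollama) to summarize one
    member's messages. Sending one person's history per request keeps
    each prompt well under the model's context-window ceiling."""
    prompt = (
        f"Below are all chat messages sent by {name} during a group project.\n"
        "Summarize this person's contributions in 3-4 sentences.\n\n"
        + "\n".join(lines)
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def summarize_all(messages):
    """Summaries keyed by member, produced one request at a time."""
    return {name: summarize_member(name, lines)
            for name, lines in group_by_author(messages).items()}
```

This per-person chunking is exactly why cross-member interaction summaries were out of reach: each request sees only one member’s side of the conversation.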
Conclusion
This was a unique experiment and is definitely worth revisiting in the future, especially as Large Language Models continue to improve.