
Case Study: Argus Data Insights

By: Kristina Licenberger
3 minute read

Project idea

The client is a company that processes data collected from the media and extracts insights from that data. It delivers these insights to its customers as web and mobile products, in the form of dashboards and reports. Behind those products sit many complex tasks implemented by aging software and outdated services. The primary purpose of our work is to improve the existing products by replacing old microservices and to help other teams automate their software with modern solutions, built on Natural Language Processing (NLP) models and other AI techniques.

Goals

Our goal was to build Natural Language Processing (NLP) backend services that improve the performance and accuracy of the client's solutions. We identified several areas to address, including the following (a short code sketch follows the list):

  • Sentiment analysis - detects the underlying tone of a text: negative, positive, neutral, or mixed
  • Keyword extraction - extracts the most informative words from a text, which can be used in later analysis
  • Named entity recognition - identifies people, organizations, and locations mentioned in a text
  • Text summarization - produces a short summary that can be used in place of the full text
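As a rough sketch of what these tasks look like in code, the snippet below uses Hugging Face transformers pipelines with their default checkpoints; these are placeholders for illustration, not the fine-tuned models behind the actual services, and keyword extraction is omitted because it has no built-in pipeline.

    # Sentiment, NER and summarization via Hugging Face pipelines.
    # Default checkpoints are placeholders, not the production models.
    from transformers import pipeline

    text = "Argus Data Insights expanded its media monitoring services in Zurich."

    sentiment = pipeline("sentiment-analysis")            # negative / positive tone
    print(sentiment(text))

    ner = pipeline("ner", aggregation_strategy="simple")  # people, organizations, locations
    print(ner(text))

    summarizer = pipeline("summarization")                # short summary of the text
    print(summarizer(text, max_length=30, min_length=5))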

Solution

By packaging every NLP model as an independent microservice, we provide an API for each of them that can be exposed to any of the client's products mentioned above. With these APIs in place, the products need significantly less time to generate their final results.
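To illustrate the pattern, here is a minimal sketch of one such microservice, assuming FastAPI and a default sentiment pipeline; the route name and model are illustrative placeholders, not the client's actual API.

    # Minimal sketch: one NLP model wrapped in its own FastAPI microservice.
    # Route name and model are illustrative placeholders.
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI(title="Sentiment service")
    sentiment = pipeline("sentiment-analysis")   # model is loaded once, at startup

    class TextIn(BaseModel):
        text: str

    @app.post("/sentiment")
    def analyze(payload: TextIn):
        # Returns e.g. [{"label": "POSITIVE", "score": 0.99}]
        return sentiment(payload.text)

Such a service can be run locally with, for example, "uvicorn service:app"; each of the other NLP models gets an analogous service and endpoint.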

Technologies

For data wrangling, we used standard Python libraries such as NumPy, pandas, PyTorch, and NLTK. For the NLP models, we used Hugging Face, an open-source provider of NLP technologies. Most of the models are fine-tuned on the client's data in JupyterHub, hosted as an on-premises solution. Depending on the task at hand, we used models with complex architectures such as BERT, RoBERTa, and GPT-2. These models needed to be language-agnostic, since they are used in multiple countries, which was one of the main reasons we implemented the solution ourselves.
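The outline below sketches what such fine-tuning typically looks like with the transformers Trainer API; the checkpoint, the toy dataset, and the hyperparameters are placeholders rather than the client's actual setup.

    # Sketch of fine-tuning a multilingual transformer on labelled data.
    # Checkpoint, data and hyperparameters are illustrative only.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "bert-base-multilingual-cased"            # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

    # Toy stand-in for the labelled client data (0 = negative, 1 = neutral, 2 = positive)
    raw = Dataset.from_dict({"text": ["Great coverage of the launch.",
                                      "The report was inaccurate."],
                             "label": [2, 0]})
    tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                                padding="max_length", max_length=64),
                        batched=True)

    args = TrainingArguments(output_dir="sentiment-finetuned",
                             num_train_epochs=1,
                             per_device_train_batch_size=2)
    Trainer(model=model, args=args, train_dataset=tokenized).train()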

For a data scientist to contribute fully to a project, knowledge of machine learning alone is not enough. That is why we expanded our knowledge of the following technologies:

  • FastAPI library - used to build the backend services
  • Elasticsearch - used as a database
  • React - frontend technology used for the demo application
  • DevOps technologies - Microsoft Azure Cloud, Docker, Kubernetes

Results

At the beginning of the project we were already familiar with NLP, but most of the materials and documentation were written in German, so we started taking German classes. This proved helpful: we could analyze the textual results and the decisions made by the NLP models much more quickly. We actively participated in discussions about which model to use for each microservice and contributed our own suggestions, staying up to date with the latest scientific papers and libraries in the NLP field.

For each NLP model we deployed, we asked other teams in the company for feedback on its performance so we could improve our solutions and meet the client's new requirements. Based on that feedback, we proposed several options for improvement and, through iterative testing of the results, arrived at the most suitable solution.

So that other teams could understand what happens behind the scenes in the NLP microservices, we built a demo application that lets all employees of the client's company try out the models and see what the results look like. To optimize the demo app's execution time, we also stored some of the results in an Elasticsearch cluster, which distributes indexing and search tasks across all of its nodes.

This made it easier to reuse results and avoid re-running the algorithms, which greatly improved response time and user experience.
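A simplified sketch of this caching pattern is shown below, assuming the Python elasticsearch client (8.x); the index name, host, and document shape are placeholders rather than the actual cluster layout.

    # Sketch: reuse stored NLP results from Elasticsearch instead of
    # re-running the model for text that was already processed.
    # Index name, host and document shape are illustrative placeholders.
    import hashlib
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    INDEX = "nlp-results"

    def cached_result(text, analyze):
        doc_id = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if es.exists(index=INDEX, id=doc_id):
            return es.get(index=INDEX, id=doc_id)["_source"]["result"]
        result = analyze(text)   # run the NLP model only on a cache miss
        es.index(index=INDEX, id=doc_id, document={"text": text, "result": result})
        return result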

We took responsibility for our work and actively participated in the monthly meetings and sprint reviews, where we presented our progress and completed tasks. This strengthened our relationship with the client and grew into a successful partnership.

Kristina Licenberger

Data Engineer

Kristina is a well-organized team player, always interested in making sense of data and ready to contribute to problem-solving.
