Identify Languages and Filter Noise with Social Intelligence

A short while ago I wrote the first post in a series about Synthesio’s Natural Language Processing (NLP), which covered how our social intelligence tool uses Automated Sentiment Analysis. If you missed it, it’s worth checking out. This is part 2 of the series, and it focuses on language identification and sifting through social noise.

Now you may be wondering: how can you identify the sentiment of a piece of content if you don’t know what language it’s in? Good question. At Synthesio we have developed a custom language identification system (langID) designed for social media messages. Our system can detect 80 languages, even in very short messages (such as brief tweets) and in transliterated content, such as Arabic written in the Latin alphabet. For improved results, we also apply intelligent message cleaning so that symbols and unrelated text (URLs, hashtags and the like) do not degrade detection. This means our social intelligence system automatically tells you which language every mention is written in, making it easy to sort and filter by language. How convenient!
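To make the cleaning step concrete, here is a minimal Python sketch of what stripping URLs, hashtags and symbols before language identification might look like. It is an illustration only, not Synthesio's actual implementation: the function name `clean_message`, the regular expressions, and the decision to also drop @mentions and digits are all assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration; a production system would likely be more careful.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Anything that is not a word character or whitespace, plus digits and underscores.
SYMBOL_RE = re.compile(r"[^\w\s]|[\d_]")


def clean_message(text: str) -> str:
    """Remove tokens that carry little language signal, then normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)  # punctuation, emoji, digits
    return " ".join(text.split())


if __name__ == "__main__":
    raw = "Check this out!!! http://t.co/abc #wow @bob bonjour le monde :)"
    print(clean_message(raw))
```

The cleaned string is what would then be handed to a language identifier. On a short tweet, a URL or an English hashtag can easily outweigh the few real words in the message, which is why removing them first tends to help.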

But what about noise and irrelevant messages? While our algorithms are the smartest in the industry, they’re not perfect, and in some cases your search results (dashboards) will contain irrelevant content. (Anyone who tells you their system is immune to this problem is lying.) A recurring example is the term “apple”, where it’s difficult for an automatic system to determine whether users are talking about the company or the fruit.

The traditional way to solve this problem is to manually refine the search queries over time, but this requires specialized workers and can be costly. At Synthesio, our social intelligence platform can automatically filter out unwanted messages on Twitter. Over time, accounts that are identified as spammers are flagged and removed from client dashboards. The flagging is done semi-automatically, using machine learning methods.
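One common way to set up semi-automatic flagging is to let a model score each account and only act automatically on the confident cases, sending borderline ones to a person. The sketch below illustrates that general pattern in Python. It is not a description of Synthesio's system: the `Account` type, the `spam_score` field, the `triage` function and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Account:
    handle: str
    spam_score: float  # hypothetical classifier output in [0, 1]


def triage(
    accounts: Iterable[Account],
    auto_flag: float = 0.9,   # assumed threshold: flag without review
    review: float = 0.6,      # assumed threshold: queue for a human
) -> Tuple[List[str], List[str], List[str]]:
    """Split accounts into auto-flagged, needs-review, and kept."""
    flagged, to_review, kept = [], [], []
    for account in accounts:
        if account.spam_score >= auto_flag:
            flagged.append(account.handle)
        elif account.spam_score >= review:
            to_review.append(account.handle)
        else:
            kept.append(account.handle)
    return flagged, to_review, kept


if __name__ == "__main__":
    sample = [
        Account("@deals4u", 0.97),
        Account("@newsbot", 0.72),
        Account("@jane", 0.05),
    ]
    print(triage(sample))
```

In a setup like this, reviewer decisions on the middle band can be fed back as training labels, so the share of accounts needing manual attention shrinks over time.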

Interested in learning more about our Natural Language Processing tools and how they can help your business? We’d love to give you a personal tour. Put our social intelligence to work and download our guide or request a demo.
