Antivirus bot for Telegram

Last week Doctor Web released an antivirus bot for Telegram. As a direct participant in the project, I would like to explain, on behalf of the whole team, why we built this bot, how it works, and whether it is time to abandon desktop antivirus software.




Concept


Last summer Telegram introduced bots and the Bot API. Chatbots have existed for a long time, but in this case the platform offered so much room for integration experiments that only the lazy did not build a bot of their own. There are even exotic examples.

Most of the bots we came across were entertaining (such as IQ tests or sticker ratings), informational (for example, sending a weather forecast, translating words, or giving the address of the nearest ATM), or both at once, such as bots for finding Indian films. They were easy to use, and the format intrigued us so much that we wanted to use it for our own information desk: our bot could return a threat description on request. For example, a user asks the bot what Linux.Encoder.1, caught by their antivirus, actually does, and receives a detailed description of the threat in reply. But after turning the idea over for a while, we found obvious shortcomings:

  • The messenger's message format is inconvenient for reading about malware: a description of how a threat works is often very long, with code samples and a mountain of screenshots.
  • The situation itself seemed artificial: a user learns about a threat on their device, opens Telegram, finds the bot, and asks it about the threat instead of simply googling it.
  • Different antivirus companies name threats differently. A user may search for a threat under another name and fail to find the information they need.

Considering all this, we decided to go further and create a bot with real practical functionality: an experimental antivirus bot.

The task seemed interesting and useful. The messenger is responsible for traffic encryption and secure communication, and Telegram has proven itself in this respect. The safety of the device on which Telegram is installed, however, is the user's responsibility, and all the usual social engineering tricks still work. Both a computer and a smartphone can be infected by a Trojan, which at best will show tons of ads and at worst will turn the device into a lifeless lump of plastic and metal.

We wanted a bot that could check files and links on the fly and warn the user when it detects a threat. When antivirus protection is built into, say, email, the antivirus sits either on the mail hosting side or on the user's device. The Bot API lets you organize protection differently, in a new paradigm: the bot is neither on the user's machine nor on the service side, and it depends neither on the operating system nor on the device's performance. The only requirement is that the Telegram version supports bots. If a suspicious message arrives in Telegram, it can be forwarded straight to the bot. It is also convenient to send the bot a dubious link received from other sources.

Let me say at the outset that this mechanism is not a replacement for an antivirus. The bot cannot prevent the user from clicking a dangerous link or running a file; it can only warn of the danger, whereas an antivirus will protect the user even if a reckless victim of social engineering immediately downloads and runs the Trojan. At the same time, Telegram's tech-savvy audience may be interested in an antivirus product that does not restrict their actions but provides information on request. We think of the bot primarily as a research project and are interested in feedback, which is why we wrote this article.

Implementation


The bot is implemented with the Tornado framework, which, like a traffic controller at an intersection, coordinates the data flows between the Telegram Bot API and the closed APIs of our Dr.Web services. Initially we went the standard way and used Django. However, Django works in such a way that precious time is wasted during data input and output (receiving the request body, sending data, working with the database, and so on). We ran an experiment with the Siege utility and found that this model is unsuitable for efficiently processing thousands of simultaneous requests.
So we started looking at asynchronous models and chose Tornado (where asynchrony is the main feature). At the moment all of the bot's code is asynchronous, including file downloads, link checks, and even database work: when adding records to the database, the bot does not wait for a response from the server and continues performing other tasks.
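
The idea of not waiting for the database can be illustrated with a minimal sketch. It is not the bot's actual code: it uses only the standard asyncio library instead of Tornado so that it runs without dependencies, and save_record, check_item, and the verdict strings are hypothetical stand-ins for the real database and scanning calls.

```python
import asyncio

async def save_record(log, item):
    # Hypothetical stand-in for an asynchronous database insert.
    await asyncio.sleep(0.01)
    log.append(item)

async def check_item(item):
    # Hypothetical stand-in for an asynchronous call to a scanning service.
    await asyncio.sleep(0.01)
    return "threat" if "malware" in item else "clean"

async def handle_message(item, log, pending):
    # Schedule the database write and move on without awaiting its result.
    pending.add(asyncio.create_task(save_record(log, item)))
    return await check_item(item)

async def process_all(items):
    log, pending = [], set()
    # All messages are handled concurrently on a single event loop.
    verdicts = await asyncio.gather(
        *(handle_message(item, log, pending) for item in items)
    )
    # Let any outstanding writes finish before the loop shuts down.
    await asyncio.gather(*pending)
    return list(verdicts), sorted(log)

def run_demo(items):
    return asyncio.run(process_all(items))
```

Calling run_demo(["a.txt", "malware.exe"]) handles both items concurrently; the writes are scheduled as background tasks and only collected at the very end.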



When messages addressed to the bot arrive from the Telegram cloud, we need to parse the links in the received text. It is important to avoid discrepancies between how our parser works (that is, which page the bot will check) and how Telegram's parser works (that is, which page the user will open by clicking in the messenger), so we followed the way Telegram parses links, using the open-source web version as a reference. Their mechanism is probably not limited to that, though, and it occasionally caused us problems (for example, in the iOS mobile app, "test.com:8080" without a protocol appears as "test.com:8080" to the sender but as "test.com:8080" to the recipient).
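
To make the scheme-less case concrete, here is a simplified sketch of link extraction. The regular expression is an assumption made for illustration and is far cruder than Telegram's real rules; extract_links is a hypothetical helper, not part of any library.

```python
import re

# Simplified pattern (an assumption, not Telegram's actual rules):
# optional scheme, dotted host name, optional port, optional path.
LINK_RE = re.compile(
    r"(?:(?P<scheme>https?)://)?"
    r"(?P<host>(?:[a-z0-9-]+\.)+[a-z]{2,})"
    r"(?::(?P<port>\d{1,5}))?"
    r"(?P<path>/[^\s]*)?",
    re.IGNORECASE,
)

def extract_links(text):
    """Return normalized URLs found in text; http:// is assumed when no scheme is given."""
    links = []
    for match in LINK_RE.finditer(text):
        scheme = (match.group("scheme") or "http").lower()
        host = match.group("host").lower()
        port = ":" + match.group("port") if match.group("port") else ""
        path = match.group("path") or ""
        links.append(f"{scheme}://{host}{port}{path}")
    return links
```

With this sketch, "test.com:8080" is picked up even without a protocol and normalized to "http://test.com:8080", so the checker and the client at least agree on what is being opened.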

Further processing of links and files proceeds in several stages: unpacking archives, expanding shortened links, and following redirects. If a link leads to downloadable files, we download them; thanks to this, the bot can check not only files sent via Telegram but also files behind external links.
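
Expanding shortened links and following redirects can be sketched roughly as follows. This is an illustration under assumptions: resolve is a hypothetical callable standing in for a real HTTP request that returns the redirect target, and the hop limit and loop check are design choices of the sketch, not details from the article.

```python
def follow_redirects(url, resolve, max_hops=10):
    """Return the chain of URLs visited, starting with url.

    resolve(url) is a hypothetical callable that returns the redirect
    target for url, or None when url is a final destination.
    """
    chain = [url]
    seen = {url}
    while True:
        target = resolve(chain[-1])
        if target is None:
            return chain
        # len(chain) - 1 is the number of hops already followed.
        if len(chain) - 1 >= max_hops:
            raise ValueError("too many redirects")
        if target in seen:
            raise ValueError("redirect loop at " + target)
        chain.append(target)
        seen.add(target)
```

Every URL in the returned chain can then be checked, not only the final one, which matters when an intermediate hop is the malicious part.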

To better distribute the load on the servers, files and links are first checked against the cache. Then the bot passes the baton to various Dr.Web technologies via our internal web API: the Dr.Web Cloud service, the Dr.Web Scanning Engine antivirus engine, the Link Checker link-checking service, and the antivirus signature databases. Data exchange is asynchronous and multithreaded, and as the load grows we can increase capacity by adding new servers and writing the corresponding settings in the configuration files; scalability was built into the bot's architecture from the start.
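
The cache-first step might look like the sketch below. The article does not say how the cache is keyed or stored; keying by a SHA-256 digest and keeping verdicts in an in-memory dictionary are assumptions, and VerdictCache and scan are hypothetical names.

```python
import hashlib

class VerdictCache:
    """In-memory verdict cache keyed by the SHA-256 digest of the content."""

    def __init__(self):
        self._verdicts = {}
        self.scans = 0  # how many times the expensive scanner was called

    def check(self, data, scan):
        # scan is a hypothetical callable that performs the real check.
        key = hashlib.sha256(data).hexdigest()
        if key not in self._verdicts:
            self.scans += 1
            self._verdicts[key] = scan(data)
        return self._verdicts[key]
```

A file forwarded by many users in a short time is then scanned once, and later requests are answered from the cache.
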
Finally, the results of the check are returned to the bot, and it sends them to users, taking into account the limits on bot message frequency set by the Telegram Bot API.
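
One way to respect a message-frequency limit is a sliding-window limiter, sketched below. The numbers passed in are placeholders rather than Telegram's actual limits, and RateLimiter is a hypothetical class; the clock is passed in explicitly so the logic can be exercised without real delays.

```python
from collections import deque

class RateLimiter:
    """Allow at most max_messages sends per window seconds (sliding window)."""

    def __init__(self, max_messages, window):
        self.max_messages = max_messages
        self.window = window
        self._sent = deque()  # timestamps of recent sends, oldest first

    def delay_before_send(self, now):
        """Return how many seconds to wait before a send at time now is allowed."""
        # Forget sends that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_messages:
            return 0.0
        # The oldest send must leave the window before the next one.
        return self._sent[0] + self.window - now

    def record_send(self, now):
        self._sent.append(now)
```

In an asynchronous bot the returned delay would typically be passed to a non-blocking sleep before the message is sent.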

Users can check links and files in private mode (by sending suspicious content to the bot or forwarding messages received from other users) and in group chats: if you add the bot to a chat, it will react to all files and links in that chat.

The bot runs in two modes: normal and quiet. In normal mode the bot responds to every file or link with a message saying that the link is safe or that the file is not recommended for download. If the bot behaved this way in a group chat, it could get in the way of conversation, so we added a "quiet" mode. In this mode the bot speaks up only when a file or link in the chat contains a threat, warning users against a rash tap or click. Error messages about failed checks are also delivered in "quiet" mode; otherwise, receiving no answer, the user could mistakenly conclude that the link or file was checked successfully and is safe. The mode is selected with the /mode command.
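
The reply rule for the two modes is compact enough to write down as a sketch. The mode and outcome strings are hypothetical labels chosen for illustration; only the behavior follows the description above.

```python
def should_reply(mode, outcome):
    """Decide whether the bot posts a message.

    mode: "normal" or "quiet"; outcome: "clean", "threat", or "error".
    """
    if mode not in ("normal", "quiet"):
        raise ValueError("unknown mode: " + mode)
    if mode == "normal":
        # Normal mode: every checked file or link gets a reply.
        return True
    # Quiet mode: stay silent on clean results, report threats and failed checks.
    return outcome in ("threat", "error")
```

Reporting errors even in quiet mode keeps silence unambiguous: no message means the check ran and found nothing.
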
As the API evolves, we will implement new features if they are useful for our purposes. Not long ago Telegram introduced inline mode, which lets you use a bot without adding it to a chat. So far this mechanism does not allow passing a file to the bot for checking, but we are considering using it. In upcoming updates we plan to make the bot faster and more reliable, and we continuously monitor user feedback.

A few words about localization (since it is not only my profession but also my passion): our bot can communicate in Russian, English, or German. There were no particular difficulties: we use the gettext library, and the localization files are stored in .po format.
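
For readers unfamiliar with gettext, here is a minimal sketch of choosing a translator per user language with Python's standard gettext module. The domain name "bot", the "locale" directory, and the English fallback are assumptions; the article does not describe how the bot loads its catalogs.

```python
import gettext

SUPPORTED = ("ru", "en", "de")  # the three languages named in the article

def get_translator(lang, localedir="locale", domain="bot"):
    """Return a gettext function for lang, falling back to English.

    The domain and localedir values are hypothetical. With fallback=True,
    a missing catalog yields the untranslated source string instead of
    raising an error.
    """
    if lang not in SUPPORTED:
        lang = "en"
    translation = gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
    return translation.gettext
```

The .po files are compiled to .mo catalogs (for example with msgfmt) and placed under the locale directory in the usual `<lang>/LC_MESSAGES/` layout.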

As a rule, all texts for our products are written in a formal style, so using Emoji in text resources was a curious new experience. In OS X Emoji are supported out of the box, in Ubuntu it was enough to add a font to the system (sudo apt-get install ttf-ancient-fonts), and in Windows it took some tricks before translators could see Emoji in the localization files. We tried inserting Emoji into .po files as codes, but not all operating systems can read them (for example, users of the desktop clients for Windows saw text codes instead). Apparently there are two reasonable solutions: either choose a .po file editor that displays all the emoticons, or replace them with codes and convert the codes into Emoji on our side. We are leaning toward the second option; either way, the user will not even notice these torments.
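
The second option, storing codes in the .po files and converting them on the server, could be sketched like this. The ":name:" code syntax and the two entries in the table are assumptions made for illustration; the article does not say which codes are used.

```python
import re

# Hypothetical code table; a real one would cover every Emoji the bot uses.
EMOJI = {
    "warning": "\u26a0\ufe0f",
    "white_check_mark": "\u2705",
}
CODE_RE = re.compile(r":([a-z0-9_]+):")

def render_emoji(text):
    """Replace known :code: markers with Emoji; leave unknown ones untouched."""
    return CODE_RE.sub(lambda m: EMOJI.get(m.group(1), m.group(0)), text)
```

Translators then work with plain ASCII text in any editor, and the conversion happens just before the message is sent.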

Another thing we kept in mind during design: the same Emoji look different on different devices and, in general, are not supported everywhere. Emojipedia helped with this problem: there you can see how the Emoji you need looks on specific platforms, and also copy the Emoji character or its code and paste it into the .po file.

And one small hiccup we ran into: Telegram does not let you localize a bot completely; the bot's description and the hints in the input box are always in a single language (in our case English). I hope a solution will appear in future releases of the Telegram Bot API.

Overall, development, testing, and localization took a team of 7 people 3 months. Colleagues worked on the bot at a relaxed pace in parallel with their main tasks, so we had enough time to "meditate" on its logic. The hardest thing to do in this mode was load testing: for the main stress test we invited a few dozen employees with Telegram accounts, and on a prearranged signal they fed the bot a collection of thousands of files. I hope the influx of curious testers from Habr will not bring us down, but if it does, we will add extra capacity over time, so do not judge us too harshly.

As far as we know, no one else has made antivirus bots, so there is a wide field for experimentation. We will be glad if you share your thoughts and experience with our bot: @drwebbot
Article based on information from habrahabr.ru
