Hack Detection ML Model

Forta

Mar 5, 2024

TL;DR

Aleno analyzed the lifecycle of DeFi exploits—comprising funding, preparation, activation, and laundering phases—and developed machine learning models to detect malicious smart contracts during the preparation stage. By enhancing Forta's detection capabilities and incorporating funding provenance analysis, Aleno improved the model's F1 score by over 20% and achieved a precision rate of 92%. These advancements enable proactive threat detection, allowing protocols to implement timely defenses and mitigate potential losses.

How it works and how Machine Learning helps to prevent hacks

The DeFi landscape is a battleground for hackers, where protocols often serve as tempting honeypots with smart contracts that may lack rigorous audits. This has led to a growing demand for security monitoring to safeguard the value they hold.

One security measure gaining traction is the ability to pause a protocol through special triggers. This requires robust threat detection, because an unreliable trigger risks pausing the protocol unnecessarily.

In this article, we will look at how most attacks work and the various detection methods.

Together, we will explore the improvements proposed by the model we have developed and discuss its limitations.

We will also examine alternative and emerging methods of avoiding and responding to potential hacks.

1. The Exploit Pattern

Before delving into the intricacies of trigger design, let’s first understand the steps involved in an exploit: the Funding, Preparation, Activation, and Money Laundering phases.

Step 1: Funding Phase

To execute any on-chain transaction, including an attack, one needs to pay transaction fees in gas. This often involves funding an address with the chain’s main asset. To maintain anonymity, hackers use privacy tools and KYC-less services such as Tornado Cash, FixedFloat, or Railgun to transfer funds to freshly created addresses. These funds are frequently moved across several addresses to make tracing harder.

Step 2: Preparation Phase

Most exploits involve one or more smart contract deployments. Exploiting a protocol typically requires a series of complex steps (taking out a flash loan, for example), which are often automated through a smart contract. Hackers deploy these exploit contracts using the assets obtained in Step 1.

Step 3: Activation Phase

After deploying the contract, the exploiter activates it by interacting with it. At this point, funds are drained from the protocol. This can occur through a single transaction or a series of similar transactions, targeting multiple pools or chains.

Step 4: Money Laundering Phase

The exploiter cashes out and launders the stolen funds, typically through privacy platforms. From this point on, the funds can only be recovered through negotiation or legal action.

2. Detecting exploits

To enhance the security of users’ funds, the protocol can implement monitoring solutions and security procedures at two distinct stages:

Activation Phase: During this phase, the focus is on promptly identifying any ongoing exploit and initiating a swift response to protect the remaining user funds. However, it is important to note that by the time an exploit is identified in the activation phase, it may already be too late to prevent some potential losses.

Preparation Phase: Proactive measures during the preparation phase enable pre-emptive action before activation. If the threat is detected in time, 100% of user funds can be safeguarded.

Identifying Activation Phase

Identifying the activation phase requires vigilant monitoring of all transactions interacting with the protocol and assessing whether each transaction poses a threat by attempting to steal funds or is entirely benign.

Monitoring pending transactions

Whenever a transaction appears in the mempool, it can be simulated off-chain on a fork of the blockchain. This simulation allows for an evaluation of the transaction’s impact on the protocol’s balances. The algorithm triggers an alert if the impact exceeds the threshold defined in the risk policy. At this stage, a white hat could try to front-run the malicious transaction, or the protocol team can try to pause the protocol before the malicious transaction gets verified.
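The threshold check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the fork simulation has already produced the protocol's token balances before and after the pending transaction. The function names and the 5% threshold are hypothetical and not taken from any specific tool.

```python
from typing import Dict


def balance_drop(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each asset that leaves the protocol in the simulated transaction."""
    drops = {}
    for asset, start in before.items():
        if start == 0:
            continue  # nothing to lose for this asset
        end = after.get(asset, 0)
        drops[asset] = max(0.0, (start - end) / start)
    return drops


def should_alert(before: Dict[str, int], after: Dict[str, int],
                 threshold: float = 0.05) -> bool:
    """Alert when any asset's simulated loss exceeds the risk-policy threshold."""
    return any(drop > threshold for drop in balance_drop(before, after).values())


# Made-up balances: the simulated transaction removes 60% of the USDC.
before = {"USDC": 1_000_000, "WETH": 500}
after = {"USDC": 400_000, "WETH": 500}
print(should_alert(before, after))
```

In practice the threshold would come from the protocol's own risk policy, and a legitimate large withdrawal can exceed it, so the alert is a signal to investigate rather than proof of an exploit.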

Monitoring verified transactions

In many cases, monitoring the mempool alone may not suffice, as malicious actors might use private mempools. The only remaining option is to monitor the protocol’s verified transactions and check whether funds have been stolen or protocol parameters have been modified.
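As a rough sketch of the parameter check, one can compare two snapshots of the protocol's configuration taken at consecutive blocks and report every value that changed. The parameter names below are hypothetical, and how the snapshots are fetched from the chain is left out.

```python
from typing import Any, Dict, List, Tuple


def changed_parameters(previous: Dict[str, Any],
                       current: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (name, old, new) for every parameter that differs between snapshots."""
    changes = []
    for name in sorted(set(previous) | set(current)):
        old, new = previous.get(name), current.get(name)
        if old != new:
            changes.append((name, old, new))
    return changes


# Hypothetical snapshots taken at two consecutive blocks.
block_n = {"owner": "0xAAA", "fee_bps": 30, "paused": False}
block_n1 = {"owner": "0xBBB", "fee_bps": 30, "paused": False}
print(changed_parameters(block_n, block_n1))
```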

Drawbacks

While it is crucial to identify any ongoing exploit during the activation phase as early as possible to react promptly and prevent further damage, it is worth noting that in some cases, it may already be too late to secure user funds.

Most hackers employ private mempools to broadcast their transactions, making it challenging to apply preventive measures. In such scenarios, protocols can only rely on the hope that the hacker’s exploits necessitate multiple transactions to drain the entire protocol. This could potentially allow for partial fund recovery in the unaffected pool or remaining funds.

Note: Many large-scale hacks involve multiple transactions, especially when the exploiter targets multiple pools/assets.

Identifying Preparation Phase

Detecting malicious smart contracts as soon as they are deployed is crucial, as the gap between preparation and activation can range from minutes to hours. Early detection gives protocols time to take security measures against the potential threat. Malicious smart contracts can be identified through techniques such as smart contract simulation or machine learning classification models.

Identifying malicious contracts with simulations:

For exploit contracts with parameterless functions, it is feasible to simulate the use of these contracts’ functions by replicating the state of the blockchain within a simulation environment. Subsequently, monitoring any alterations in balances for stolen assets after executing the contract call enables us to evaluate whether the function was indeed an exploit.
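The idea can be sketched as a loop over candidate functions. In this illustration, `simulate` is a hypothetical callback that runs one parameterless call on a forked chain and returns the caller's balance change per asset. It is stubbed out here so the example is self-contained.

```python
from typing import Callable, Dict, Iterable, List


def flag_profitable_calls(selectors: Iterable[str],
                          simulate: Callable[[str], Dict[str, int]],
                          min_gain: int = 0) -> List[str]:
    """Return the selectors whose simulated call leaves the caller with more of any asset."""
    flagged = []
    for selector in selectors:
        deltas = simulate(selector)  # asset -> balance change for the caller
        if any(delta > min_gain for delta in deltas.values()):
            flagged.append(selector)
    return flagged


# Stub standing in for a real fork simulation (made-up selectors and amounts).
def fake_simulate(selector: str) -> Dict[str, int]:
    results = {"0xaaaaaaaa": {"USDC": 2_000_000}, "0xbbbbbbbb": {"USDC": 0}}
    return results.get(selector, {})


print(flag_profitable_calls(["0xaaaaaaaa", "0xbbbbbbbb"], fake_simulate))
```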

Identifying malicious contracts with ML models:

Malicious smart contracts exhibit distinct patterns that differ from normal ones. These patterns can be identified through contract opcode analysis (low-level machine instructions).

As shown by Forta Network’s work in this area, malicious contracts exhibit, for example:

· Few PUSH32: Malicious contracts often emit minimal logs.

· Few SHA3: Malicious contracts rarely authenticate user actions and rarely compute keccak256.

· Frequent PUSH20: They frequently hard-code victim addresses into the contract.
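To make these features concrete, here is a minimal sketch that counts the three opcodes in raw EVM bytecode. It relies only on the standard opcode values (SHA3 is 0x20, PUSH20 is 0x73, PUSH32 is 0x7f) and skips the immediate data that follows each PUSH instruction so that data bytes are not miscounted as opcodes. It is an illustration of opcode counting, not Forta's or Aleno's actual feature pipeline.

```python
from collections import Counter

PUSH1, PUSH32 = 0x60, 0x7F
TRACKED = {0x20: "SHA3", 0x73: "PUSH20", 0x7F: "PUSH32"}


def opcode_counts(bytecode_hex: str) -> Counter:
    """Count tracked opcodes in EVM bytecode given as a hex string."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    counts = Counter()
    i = 0
    while i < len(code):
        op = code[i]
        if op in TRACKED:
            counts[TRACKED[op]] += 1
        # PUSHn is followed by n bytes of immediate data, which are not opcodes.
        if PUSH1 <= op <= PUSH32:
            i += op - PUSH1 + 1
        i += 1
    return counts


# Toy bytecode: PUSH20 <address>, PUSH1 0x20, SHA3, PUSH32 <32 zero bytes>.
sample = "0x73" + "11" * 20 + "6020" + "20" + "7f" + "00" * 32
print(opcode_counts(sample))
```

The `PUSH1 0x20` in the sample shows why the data bytes must be skipped: a naive byte scan would count that 0x20 as a second SHA3.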

Drawbacks:

Simulation-based detection yields high-precision alarms, but it only covers parameterless functions, and not all exploit functions are parameterless.

Machine learning-based detection is effective but not 100% accurate. If such a model is wired naively into a pause trigger, a false positive may pause the protocol and damage its reputation, or worse, the model may fail to detect a hack.

3. Aleno’s contribution to the community

As Aleno works on monitoring financial threats for DeFi Asset Managers, it is in our interest to improve DeFi security as much as possible.

So it was only natural that when we came across Forta’s article, we used our Data Science skills and our knowledge of DeFi to create more effective models.

Improving Forta’s raw models

By implementing the enhancements Forta proposed, adding recent hacks to the dataset, and addressing the dataset imbalance caused by the scarcity of malicious contract examples, we improved the recall of the existing malicious smart contract detection bot (the share of actual malicious contracts that are correctly identified) by 30%. This in turn improved the model’s overall F1 score by over 20%.

Aleno’s raw model scores:
84% F1-Score, 87% Recall, 81% Precision
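For readers less familiar with these metrics, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall reported above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the raw model as reported above.
print(round(f1_score(0.81, 0.87), 3))  # 0.839, i.e. about 84%
```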

We believe that the techniques and insights we have employed in building this model can be invaluable for improving the other existing ML models facing the same symptoms of low recall.
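This article does not detail which imbalance technique was used. As one common option, and purely as an assumption for illustration, class weighting gives the rare malicious class more influence during training. The sketch below computes "balanced" weights, where each class weight is the number of samples divided by the number of classes times that class's count.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up dataset: 95 benign contracts for every 5 malicious ones.
labels = ["benign"] * 95 + ["malicious"] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, misclassifying a malicious contract costs the model far more than misclassifying a benign one, which typically pushes recall up at some cost in precision.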

Improvement by Funding provenance analysis

This analysis helps assess whether the detection is genuinely positive, especially considering that exploiters often use privacy tools to obscure their activities.

Additionally, thanks to the ScoreChain API, we have demonstrated an effective strategy for reducing the model’s false positives. We analyzed the funding phase of the deployers of contracts flagged as malicious, which raised precision to 92%. It is worth noting that exploiters typically source their funds from platforms such as Tornado Cash or KYC-less exchanges like FixedFloat.

Aleno model scores with ScoreChain enrichment:
87% F1-Score, 80% Recall, 92% Precision
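One simple way to combine the two signals is to confirm an alert only when the model flags the contract and the deployer's funding traces back to a high-risk source. The sketch below is an assumption about how such a filter could look, not Aleno's actual logic. The source labels are passed in as plain strings, since the ScoreChain API itself is not reproduced here.

```python
from typing import Iterable

# Examples of high-risk funding sources named in this article.
HIGH_RISK_SOURCES = {"Tornado Cash", "FixedFloat"}


def confirm_alert(model_score: float, funding_sources: Iterable[str],
                  score_threshold: float = 0.5) -> bool:
    """Confirm an alert only if the model flags the contract AND funding looks risky."""
    flagged_by_model = model_score >= score_threshold
    risky_funding = any(source in HIGH_RISK_SOURCES for source in funding_sources)
    return flagged_by_model and risky_funding


print(confirm_alert(0.9, ["Tornado Cash"]))  # model and funding agree
print(confirm_alert(0.9, ["Coinbase"]))      # model alone is not enough
```

Requiring both conditions removes false positives but also drops true positives whose funding looks clean, which matches the direction of the scores above: precision rises while recall falls.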

Open Access to Aleno’s Enhanced Model

In a community-first spirit, we released our model on Forta (available here), allowing everyone to use it in their own applications.

We also deployed a Telegram bot, available here. It automatically sends an alert every time a smart contract classified as Critical is deployed.

Link to Forta’s bot
Link to telegram’s bot

Limitations in a protocol’s pausing context

Predictive models help protocols make decisions before hacks occur. Even though their performance has improved and a second data source has been added for cross-checking, they are still trained on existing smart contract exploits.

As a result, an attacker could redeploy known exploit contracts from a Tornado-funded account purely to trigger the alarm, effectively launching a DDoS attack on a target protocol that pauses automatically. Unpausing the affected protocols can be challenging and time-consuming, especially if the contract cannot be simulated.

Reactivating protocols involves a thorough analysis of the deployed contract to identify vulnerabilities, deploy fixes, and proceed with reactivation. While this process is preferable to fund loss, it can be cumbersome and disruptive in a DDoS scenario.

4. Emerging and Alternative Solutions

We have presented existing solutions to potential hacks, along with their limitations.

However, the response to a threat must be use-case specific and will depend on the strategy and the type of actor. Here we present other promising methods that are under development.

Blacklisting

A blacklisting module can prevent potentially malicious addresses from interacting with your protocol, a solution that may be more suitable than a complete protocol freeze.

Such a module could tolerate false positives during the preparation phase while reserving the option to pause the protocol during the activation phase, which is the best course of action to ensure equitable treatment of users and prevent further exploitation of remaining funds.

However, it is important to note that an on-chain blacklist can be costly. Storing addresses on-chain consumes additional gas, and the list could be polluted through spamming. The alternative is to block all interactions with the protocol by default and then selectively whitelist trusted addresses.
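To give a sense of scale, the storage cost can be estimated with simple arithmetic. The sketch assumes one new storage slot per blacklisted address on Ethereum mainnet, where writing a previously empty slot costs 20,000 gas plus 2,100 gas for the cold slot access (EIP-2929). It ignores the base transaction cost and calldata, and the gas price is a made-up figure.

```python
SSTORE_NEW_SLOT_GAS = 20_000   # writing a non-zero value to an empty slot
COLD_SLOT_ACCESS_GAS = 2_100   # first access to the slot in a transaction (EIP-2929)


def blacklist_storage_cost_eth(n_addresses: int, gas_price_gwei: float) -> float:
    """Approximate storage cost in ETH of adding n addresses, one slot each."""
    gas = n_addresses * (SSTORE_NEW_SLOT_GAS + COLD_SLOT_ACCESS_GAS)
    return gas * gas_price_gwei / 1e9  # 1 ETH = 1e9 gwei


# Hypothetical scenario: 1,000 addresses at 30 gwei.
print(blacklist_storage_cost_eth(1_000, 30))
```

Under these assumptions, 1,000 addresses cost roughly 0.663 ETH in storage alone, and an attacker who spams fresh addresses can force the list to grow without bound.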

This approach resembles KYC procedures and may not align well with the ethos of permissionless blockchain networks, where openness and inclusivity are typically valued.

ERC-20R

As an example of a new standard, Circle Research is building on the “ERC-20R” concept with recoverable wrapper tokens. These tokens let users wrap their ERC-20 assets, protecting them from theft while maintaining most of their utility. A transfer of recoverable wrapper tokens can be reversed back to the sender within a specified time window after the transaction, providing an added layer of security. Circle Research has introduced multiple configuration sets for these tokens, offering flexibility and interoperability.
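The recovery window can be pictured with a toy model. This sketch only illustrates the time-window rule described above. The names and the 24-hour window are hypothetical and do not reflect Circle Research's actual design or parameters.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    sender: str
    recipient: str
    amount: int
    timestamp: int  # seconds since epoch


def can_recover(transfer: Transfer, now: int, window_seconds: int) -> bool:
    """A transfer can be reversed to the sender only while the window is open."""
    elapsed = now - transfer.timestamp
    return 0 <= elapsed <= window_seconds


# Hypothetical 24-hour recovery window.
WINDOW = 24 * 60 * 60
stolen = Transfer("victim", "attacker", 1_000, timestamp=1_700_000_000)
print(can_recover(stolen, now=1_700_000_000 + 3_600, window_seconds=WINDOW))
print(can_recover(stolen, now=1_700_000_000 + 2 * WINDOW, window_seconds=WINDOW))
```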

Censorship at the sequencer level

Some other innovative approaches aim to prevent problematic transactions from being integrated in the first place.

For instance, consider Zircuit’s approach, which implements censorship at the sequencer level. By monitoring the mempool, the sequencer can identify malicious transactions and choose not to include them in the block. Alternatively, in the absence of censorship, another sequencer could prioritize pausing operations within a block. In such a scenario, a detected smart contract exhibiting suspicious behavior could be closely monitored within its mempool. If a threat is detected and the protocol is configured with a circuit breaker for such events, the pausing transaction could be executed before any potential exploit, enhancing security.
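Both options can be expressed as a block-building rule. The sketch below is a simplified illustration, not Zircuit's implementation: `is_malicious` and `is_pause` are hypothetical classifiers supplied by the caller, and transactions are represented as plain strings.

```python
from typing import Callable, List


def build_block(pending: List[str],
                is_malicious: Callable[[str], bool],
                is_pause: Callable[[str], bool],
                censor: bool = True) -> List[str]:
    """Order pending transactions for inclusion in the next block.

    With censor=True, transactions flagged as malicious are left out.
    In both modes, pause transactions are moved to the front of the block.
    """
    kept = [tx for tx in pending if not (censor and is_malicious(tx))]
    # sorted() is stable, so all other transactions keep their mempool order.
    return sorted(kept, key=lambda tx: 0 if is_pause(tx) else 1)


pending = ["swap_a", "exploit", "pause", "swap_b"]
flag = lambda tx: tx == "exploit"
pause = lambda tx: tx == "pause"
print(build_block(pending, flag, pause, censor=True))
print(build_block(pending, flag, pause, censor=False))
```

Without censorship the exploit is still included, but it executes after the pause transaction, so a protocol with a working circuit breaker would already be paused when it runs.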

5. Conclusion

In this post, we have delved into the workflow of on-chain exploits and explored the current alerting techniques implemented to safeguard user funds within the protocol.

We have also shared how we enhanced the current malicious smart contract detection model, improving its F1 score by over 20%. This improved model is now available on the Forta network and through Telegram.

Despite the effectiveness of these models, it is imperative to acknowledge the substantial risk of DDoS attacks that can potentially arise from their use in pausing protocols. We strongly recommend not relying solely on a single model but instead incorporating multiple sources to mitigate this threat.
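One simple way to combine sources is a k-of-n rule: the pause is triggered only when at least `min_agreeing` independent detectors fire. The detector names below are hypothetical, and the rule is a sketch of the general idea rather than a recommended configuration.

```python
from typing import Dict


def should_pause(signals: Dict[str, bool], min_agreeing: int = 2) -> bool:
    """Pause only if at least min_agreeing independent detectors raised an alert."""
    return sum(1 for fired in signals.values() if fired) >= min_agreeing


# Hypothetical detectors: only the ML model fires, so no pause is triggered.
signals = {"ml_model": True, "funding_analysis": False, "simulation": False}
print(should_pause(signals))
```

An attacker who redeploys a known exploit contract would then have to fool several independent detectors at once to force a pause.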

Promising solutions are emerging, such as blacklisting, ERC-20R, and censorship at the sequencer level, and it is worth keeping an eye on them.

Keep in mind that all solutions have their pros and cons depending on the use case. The best solution for your use case likely lies in a combination of the methods presented in this article.

Aleno’s core business is to develop DeFi threat monitoring tools for professionals, enabling them to react to risks and opportunities. From this perspective, the aggregated predictive model provides a suitable response to the threat of hacks, as a false-positive circuit-breaker activation is far less harmful there than at the protocol level.

We hope that this article and the models will give everyone a better understanding of these risks and help create the healthiest possible environment for DeFi growth with maximum security.

Don’t miss out on our next post! Follow us on LinkedIn and Twitter for updates. If you want to fully leverage on-chain data, leave a message at contact@aleno.ai & visit our website to learn more.
