[Paper] Your Botnet is My Botnet: Analysis of a Botnet Takeover

Title: Your Botnet is My Botnet: Analysis of a Botnet Takeover
Authors: Brett Stone-Gross, Marco Cova, Lorenzo Cavallaro, Bob Gilbert, Martin Szydlowski, Richard Kemmerer, Christopher Kruegel, and Giovanni Vigna (UCSB)
Venue: CCS ’09
Year: 2009
Abstract: Botnets, networks of malware-infected machines that are controlled by an adversary, are the root cause of a large number of security problems on the Internet. A particularly sophisticated and insidious type of bot is Torpig, a malware program that is designed to harvest sensitive information (such as bank account and credit card data) from its victims. In this paper, we report on our efforts to take control of the Torpig botnet and study its operations for a period of ten days. During this time, we observed more than 180 thousand infections and recorded almost 70 GB of data that the bots collected. While botnets have been “hijacked” and studied previously, the Torpig botnet exhibits certain properties that make the analysis of the data particularly interesting. First, it is possible (with reasonable accuracy) to identify unique bot infections and relate that number to the more than 1.2 million IP addresses that contacted our command and control server. Second, the Torpig botnet is large, targets a variety of applications, and gathers a rich and diverse set of data from the infected victims. This data provides a new understanding of the type and amount of personal information that is stolen by botnets.
Summary
1. Introduction (Approach)
  • Passive approach: analyzing the secondary effects caused by the activity of compromised machines
  • Active approach (taken in this paper): infiltrating a botnet (Torpig) to study it from the inside
  • Properties of Torpig:
    a. transmits identifiers that permit us to distinguish between individual infections
    b. harvests data from various applications and information from the infected victims

2. Torpig Infrastructure and background

[Figure: Torpig network infrastructure]
(1) Background
  • Distributed to victims as part of Mebroot, a rootkit that evades detection by replacing the Master Boot Record (MBR)
  • Victims are infected through drive-by-download attacks: compromised web pages include HTML tags that make the browser request JavaScript from an attacker-controlled site
  • The installer injects a DLL into “explorer.exe”, which then loads a kernel driver that wraps the original disk driver (disk.sys)
  • Mebroot initially contacts its C&C server to obtain malicious modules (including Torpig); all communication is protected by a sophisticated, custom encryption algorithm
  • Torpig periodically uploads stolen data (e.g., stored passwords and account credentials) to its C&C server
  • Torpig also performs phishing: when the victim visits a targeted site, it injects an HTML form that asks for sensitive information
(2) Domain Flux
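  • Each bot uses a domain generation algorithm (DGA) to compute a list of domains, the “rendezvous points”, that the botmasters may register to control their bots
  • Bots query the generated domains in order; because the list changes over time (domain flux), the C&C location keeps moving. This differs from IP flux, where a single domain maps onto a frequently changing set of IP addresses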
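Domain flux can be sketched as a toy DGA in which every bot derives the same domain label from the current year and ISO week and tries it under several TLDs. The seed, the SHA-256 hash, the label length, and the TLD list are assumptions for illustration, not Torpig's actual algorithm. Because the output is deterministic, anyone who recovers the algorithm can precompute future domains, which is what makes registering them ahead of the botmasters possible.

```python
import datetime
import hashlib

# Hypothetical values: Torpig's real seed and constants are not reproduced here.
SEED = b"example-seed"
TLDS = [".com", ".net", ".biz"]  # tried in this order


def weekly_label(year: int, week: int, length: int = 9) -> str:
    """Derive a pseudo-random second-level label from (year, week)."""
    digest = hashlib.sha256(SEED + f"{year}-{week}".encode()).digest()
    # Map the first `length` bytes onto lowercase letters.
    return "".join(chr(ord("a") + b % 26) for b in digest[:length])


def rendezvous_points(day: datetime.date) -> list[str]:
    """Candidate C&C domains a bot would query, in order, on `day`."""
    year, week, _ = day.isocalendar()
    label = weekly_label(year, week)
    return [label + tld for tld in TLDS]


def upcoming_domains(start: datetime.date, weeks: int) -> list[str]:
    """Domains a defender could precompute and register in advance."""
    domains: list[str] = []
    for i in range(weeks):
        domains.extend(rendezvous_points(start + datetime.timedelta(weeks=i)))
    return domains


if __name__ == "__main__":
    for domain in upcoming_domains(datetime.date(2009, 1, 26), 3):
        print(domain)
```

Every bot running this on any day of the same ISO week computes the same three domains, so a single registration captures all of them for that week.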
(3) Taking control of the botnet
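  • Registering the .com and .net domains that the bots would use over a three-week period, ahead of the botmasters
  • Sinkholing the bots' traffic to servers under the authors' control, which yielded 8.7 GB of web logs and 69 GB of pcap data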
3. Botnet analysis
(1) Stolen Data
  • Sensitive information harvested from victims, such as bank account and credit card data; almost 70 GB of data collected by the bots was recorded during the ten days of control
(2) Botnet Size
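  • Counting bots by their nid identifier: 180,835; counting by other submission header fields: 182,914 machines
  • By contrast, 1,247,642 unique IP addresses contacted the C&C server, so counting IP addresses would overestimate the botnet size by a factor of about seven, since DHCP churn and NAT break the one-to-one mapping between IP addresses and infected machines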
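Why the two counts diverge can be sketched with a toy submission log. The nid field comes from the paper; the Submission record layout and the sample addresses are hypothetical. One infected machine that receives several DHCP leases inflates the IP count, while two machines behind one NAT address deflate it; counting distinct nid values is unaffected by either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    nid: str  # per-infection identifier reported by the bot
    ip: str   # source address observed by the C&C server (hypothetical layout)


def estimate_size(log: list[Submission]) -> tuple[int, int]:
    """Return (distinct infections by nid, distinct source IP addresses)."""
    return len({s.nid for s in log}), len({s.ip for s in log})


log = [
    Submission("a1", "198.51.100.1"),
    Submission("a1", "198.51.100.7"),  # same bot after a new DHCP lease
    Submission("a1", "198.51.100.9"),  # ...and another one
    Submission("b2", "203.0.113.5"),   # two bots behind the same NAT
    Submission("c3", "203.0.113.5"),
]
print(estimate_size(log))  # (3, 4)
```

In the paper's data the gap is far larger: 1,247,642 IP addresses versus roughly 181,000 to 183,000 bots.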

(3) Some statistics from the paper

[Figure: bot statistics]

Note: This is quite an interesting paper because it analyzes, as it arrives, sensitive data harvested from machines infected by a real botnet. The active infiltration, impersonating the C&C server to take over the botnet, is also impressive. The paper covers Torpig, one of the most notorious botnets, which was prevalent worldwide in 2009. The authors aim for a comprehensive understanding of Torpig, including its infection path, static and dynamic analysis through reverse engineering, and analysis of the collected (stolen) data from diverse perspectives.

[Paper] Dissecting Android Malware: Characterization and Evolution

Title: Dissecting Android Malware: Characterization and Evolution
Authors: Yajin Zhou and Xuxian Jiang (Computer Science, NCSU)
Email: yajin_zhou@ncsu.edu
Venue: SP ’12 (Proceedings of the 2012 IEEE Symposium on Security and Privacy)
Year: 2012
Abstract: The popularity and adoption of smartphones has greatly stimulated the spread of mobile malware, especially on the popular platforms such as Android. In light of their rapid growth, there is a pressing need to develop effective solutions. However, our defense capability is largely constrained by the limited understanding of these emerging mobile malware and the lack of timely access to related samples. In this paper, we focus on the Android platform and aim to systematize or characterize existing Android malware. Particularly, with more than one year effort, we have managed to collect more than 1,200 malware samples that cover the majority of existing Android malware families, ranging from their debut in August 2010 to recent ones in October 2011. In addition, we systematically characterize them from various aspects, including their installation methods, activation mechanisms as well as the nature of carried malicious payloads. The characterization and a subsequent evolution-based study of representative families reveal that they are evolving rapidly to circumvent the detection from existing mobile anti-virus software. Based on the evaluation with four representative mobile security software, our experiments show that the best case detects 79.6% of them while the worst case detects only 20.2% in our dataset. These results clearly call for the need to better develop next-generation anti-mobile-malware solutions.
Summary
1. Goals and contributions
  • Presenting a large collection of 1,260 Android malware samples in 49 malware families
  • Characterizing the samples and performing a timeline analysis of their discovery
  • Performing an evolution-based study of representative Android malware families

[Figure: Android malware]

2. Malware Characterization

(1) Malware Installation
  • Repackaging (86%): piggybacking malicious payloads onto popular applications
  • Update Attack: including an update component that fetches or downloads the malicious payloads at runtime
  • Drive-by Download: enticing users to download “interesting” or “feature-rich” applications
  • Others: spyware, fake apps, apps with intentionally malicious functionality, and apps that require root privilege
(2) Activation: events the malware registers for to trigger its payload, e.g., BOOT_COMPLETED, SMS_RECEIVED, ACTION_MAIN
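A triage tool could list which broadcast receivers in an app are wired to these trigger events. This minimal sketch assumes the binary AndroidManifest.xml has already been decoded to plain XML (for example with apktool); the full action strings are the standard Android names for BOOT_COMPLETED and SMS_RECEIVED, and the sample package and receiver names are hypothetical. ACTION_MAIN is omitted because it marks an activity's launch entry point rather than a broadcast.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
TRIGGER_ACTIONS = {
    "android.intent.action.BOOT_COMPLETED",
    "android.provider.Telephony.SMS_RECEIVED",
}


def triggered_receivers(manifest_xml: str) -> dict[str, set[str]]:
    """Map each <receiver> name to the trigger actions it listens for."""
    root = ET.fromstring(manifest_xml)
    found: dict[str, set[str]] = {}
    for receiver in root.iter("receiver"):
        actions = {a.get(ANDROID_NS + "name") for a in receiver.iter("action")}
        hits = actions & TRIGGER_ACTIONS
        if hits:
            found[receiver.get(ANDROID_NS + "name")] = hits
    return found


SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.sample">
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
    <receiver android:name=".SmsReceiver">
      <intent-filter android:priority="1000">
        <action android:name="android.provider.Telephony.SMS_RECEIVED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

print(triggered_receivers(SAMPLE_MANIFEST))
```

Registering for a trigger is not malicious by itself (alarm clocks listen for BOOT_COMPLETED too), so a hit only marks a component worth reading.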
(3) Malicious Payloads
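  • Privilege Escalation (36.7%): platform-level exploits
  • Remote Control (93%): bot-like capability via C&C servers
  • Financial Charge (45.3%): sending premium-rate SMS messages in the background
  • Information Collection: harvesting user information such as SMS messages, phone numbers, and account data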
(4) Permission Usage
  • Commonly requested by both benign and malicious apps: INTERNET, READ_PHONE_STATE, ACCESS_NETWORK_STATE, WRITE_EXTERNAL_STORAGE
  • Requested far more often by malicious apps: READ_SMS, WRITE_SMS, RECEIVE_SMS, SEND_SMS
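These findings suggest a simple permission-based triage rule. The sketch flags an app for closer review when it requests any of the SMS permissions and ignores the commonly requested ones. The rule and its threshold are hypothetical, not a detector proposed in the paper, and a benign messaging app would be flagged as well.

```python
SMS_PERMISSIONS = {
    "android.permission.READ_SMS",
    "android.permission.WRITE_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.SEND_SMS",
}


def triage(requested: set[str], threshold: int = 1) -> tuple[bool, list[str]]:
    """Flag an app whose requested permissions include >= threshold SMS ones.

    Permissions such as INTERNET or READ_PHONE_STATE are deliberately ignored:
    benign and malicious apps request them alike, so they carry little signal.
    """
    hits = sorted(requested & SMS_PERMISSIONS)
    return len(hits) >= threshold, hits


print(triage({"android.permission.INTERNET", "android.permission.SEND_SMS"}))
# (True, ['android.permission.SEND_SMS'])
print(triage({"android.permission.INTERNET", "android.permission.READ_PHONE_STATE"}))
# (False, [])
```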
 
3. Malware Detection
(1) Anti-Virus Products
  • AVG Antivirus Free v2.9 (or AVG)
  • Lookout Security & Antivirus v6.9 (or Lookout)
  • Norton Mobile Security Lite v2.5.0.379 (Norton)
  • TrendMicro Mobile Security Personal Edition v2.0.0.1294 (TrendMicro)
(2) Signature-based detection rates: from 20.2% (worst case) to 79.6% (best case)
Note

This paper identifies features common to Android malware by analyzing a large collection (over 1,200 samples) in chronological order. It raises the need for systematic Android malware analysis, which is still lacking despite the rapid growth in the number of malware samples. The authors collected 1,260 samples and classified them into 49 families. Their findings on malware installation, activation, malicious payloads, and permission usage provide useful insight for deciding whether an application is suspicious.

Although the authors point out that existing mobile anti-virus products detected the malware poorly, the main reason may be a lack of samples, since anti-virus detection typically relies on signatures. It would be valuable to repeat the study on a regular basis and compare the changes over time.
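The point about signatures can be illustrated with a deliberately simplified scanner that recognizes only exact SHA-256 hashes of samples it has already seen. Real anti-virus signatures are more flexible than whole-file hashes, so this is a toy built on that assumption, but it shows why a scanner that never received a sample, or saw only a slightly different variant, misses it.

```python
import hashlib


class HashSignatureScanner:
    """Toy detector: a sample is 'malicious' only if its exact hash is known."""

    def __init__(self) -> None:
        self.known: set[str] = set()

    def learn(self, sample: bytes) -> None:
        self.known.add(hashlib.sha256(sample).hexdigest())

    def is_malicious(self, sample: bytes) -> bool:
        return hashlib.sha256(sample).hexdigest() in self.known


scanner = HashSignatureScanner()
original = b"example malware payload"  # hypothetical sample bytes
scanner.learn(original)
variant = original + b"\x00"           # one extra byte, e.g. after repackaging
print(scanner.is_malicious(original), scanner.is_malicious(variant))  # True False
```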

Malware and its Use of Implemented Anonymous Networks

Modern malware grows more intelligent and sophisticated as time goes by. Sometimes it looks like an elaborately crafted creation rather than a mere by-product of an attack. Below, I summarize the features, distribution, and misuse of malware.
Modern attacks mostly involve malware, which:
• Attempts to conceal the attack itself by:
    . making itself hard to trace from a network perspective
    . wiping itself out so that few artifacts remain from a system perspective
• Employs many techniques to hinder analysis, including:
    . Anti-reversing: anti-VM, anti-disassembly, and anti-debugging
    . Encryption and encoding
• Attacks only specific targets, doing nothing harmful on machines that are not its intended prey, or until it achieves its goal
• Assaults a target in a destructive and reckless manner
• Steals useful information and/or data and trades it on the black market
• Uses update servers and/or popular applications to maximize distribution
It is also easy to imagine how future malware will evolve and/or expand. It may:
• Combine several existing, even legitimate, tools and techniques in a malicious fashion
• Appear as new variants targeting cloud computing infrastructure
• Focus entirely on a specific target (a particular device or organization) without affecting anyone else
• Use steganography more cleverly in the wild
• Turn already compromised (zombie) machines into a private peer-to-peer anonymous network, similar to Tor
• Blackmail victims by threatening to expose personal information stolen in earlier attacks, such as:
    . credit card numbers, health history, religion or beliefs, social security (resident registration) numbers, addresses, accounts, and so on
Next, let's look at existing anonymous network implementations that malware might use to hide its traces.

(1) Tor

One effective way for malware to spread while staying hidden is to use an existing anonymous network. The largest implemented one is Tor (The Onion Router), whose idea originally comes from Chaum’s mix-net design for sending anonymous email. Roger Dingledine, one of Tor's creators, presented its design in a 2004 paper, which you can read here: https://svn.torproject.org/svn/projects/design-paper/tor-design.pdf
The steps below briefly describe how Tor establishes a circuit (chain).
a. The originator obtains the list of nodes from a directory server and chooses the nodes for its circuit.
b. The originator negotiates a separate session key with each node on the circuit using Diffie-Hellman key exchange, extending the circuit one hop at a time; the links between nodes run over TLS (TLSv1).
c. The originator wraps all data, including routing information, in one layer of encryption per node, and each node removes exactly one layer with its session key (public-key cryptography is used only while negotiating the keys). As a result, a node knows only the previous node the traffic came from and the next node it goes to. No single node knows the whole path: the entry node sees the originator but not the destination, while the exit node sees the destination but not the originator.
d. The originator builds several circuits in advance and switches to a new circuit when an old one is retired.
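The layering in steps b and c can be sketched in Python as a toy under loudly stated assumptions: the Diffie-Hellman group (modulus 2**127 - 1, generator 3) is far too weak for real use, the "cipher" is a SHA-256 counter-mode keystream XORed with the data rather than the AES used by Tor, and the originator talks to each relay directly instead of extending the circuit through the relays already in it. The Relay class and the function names are hypothetical. What the sketch does show is that the originator shares a distinct key with every hop, adds one layer per hop, and each relay can remove only its own layer.

```python
import hashlib
import secrets

# Toy DH group: NOT secure, chosen only so the arithmetic is easy to follow.
P = 2**127 - 1  # a Mersenne prime
G = 3


def dh_keypair() -> tuple[int, int]:
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def derive_key(shared_secret: int) -> bytes:
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer: XOR with a SHA-256 counter-mode keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class Relay:
    def __init__(self, name: str) -> None:
        self.name = name
        self.key: bytes = b""

    def handshake(self, client_public: int) -> int:
        """Complete DH with the originator and keep the derived session key."""
        private, public = dh_keypair()
        self.key = derive_key(pow(client_public, private, P))
        return public

    def peel(self, cell: bytes) -> bytes:
        return xor_layer(self.key, cell)


def build_circuit(relays: list[Relay]) -> list[bytes]:
    """Negotiate one session key per hop; return the originator's copies."""
    keys = []
    for relay in relays:
        private, public = dh_keypair()
        relay_public = relay.handshake(public)
        keys.append(derive_key(pow(relay_public, private, P)))
    return keys


def wrap(keys: list[bytes], payload: bytes) -> bytes:
    """Encrypt for the exit first so the entry relay's layer is outermost."""
    for key in reversed(keys):
        payload = xor_layer(key, payload)
    return payload


relays = [Relay("entry"), Relay("middle"), Relay("exit")]
keys = build_circuit(relays)
cell = wrap(keys, b"GET / HTTP/1.1")
for relay in relays:
    cell = relay.peel(cell)  # each hop removes exactly one layer
print(cell)  # b'GET / HTTP/1.1'
```

Only after the exit relay peels its layer does the plaintext appear, which is why the exit sees the destination while the entry does not.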
(2) Freenet

Freenet is a separate network that runs over the Internet. Unlike Tor, however, its content can be accessed only through Freenet itself, including Freesites (websites on Freenet), in-Freenet chat forums (FMS, Sone, etc.), files shared within Freenet, and in-Freenet email.

Freenet works as a very large distributed database: the more popular a file or page, the more widely it is cached and the faster it downloads. Given the appropriate key, Freenet returns the file the user requested. Each node keeps its share of the data in C:\Users\[UserID]\AppData\Local\Freenet\datastore. Because the datastore is encrypted, the user has little or no control over what is stored in it.

There are four types of keys associated with content (CHK, SSK, USK, and KSK), and users access content through fproxy, Freenet's web interface.
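The idea behind content hash keys (CHK), one of Freenet's key types, can be sketched with a toy: the content is encrypted under a key derived from its own hash, and the ciphertext is stored under the hash of the ciphertext. The URI format, the SHA-256 keystream cipher, and the DataStore class are simplifications and assumptions, not Freenet's actual formats or code. The sketch also shows why a node has little control over its datastore: it holds only ciphertext indexed by a routing key and cannot read a block without the decryption key carried in the URI.

```python
import base64
import hashlib


def b64(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter-mode keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def make_chk(content: bytes) -> tuple[str, bytes]:
    """Return a toy CHK-style URI and the ciphertext to insert."""
    crypto_key = hashlib.sha256(content).digest()      # derived from content
    ciphertext = xor_stream(crypto_key, content)
    routing_key = hashlib.sha256(ciphertext).digest()  # where it is stored
    return f"CHK@{b64(routing_key)},{b64(crypto_key)}", ciphertext


class DataStore:
    """A node's datastore: ciphertext indexed by routing key only."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def insert(self, ciphertext: bytes) -> None:
        self.blocks[b64(hashlib.sha256(ciphertext).digest())] = ciphertext

    def fetch(self, uri: str) -> bytes:
        routing, crypto = uri.removeprefix("CHK@").split(",")
        content = xor_stream(unb64(crypto), self.blocks[routing])
        # The key doubles as an integrity check on the returned content.
        if hashlib.sha256(content).digest() != unb64(crypto):
            raise ValueError("block does not match its key")
        return content


store = DataStore()
uri, ciphertext = make_chk(b"<html>a Freesite page</html>")
store.insert(ciphertext)
print(uri)
print(store.fetch(uri))  # b'<html>a Freesite page</html>'
```

Because the key depends only on the content, inserting the same file twice yields the same URI, which is what lets popular content be cached on many nodes under one key.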

(3) GNUnet (https://gnunet.org/)

GNUnet started in late 2001 and also aims at secure peer-to-peer networking. It uses an improved content encoding called ECRS (Encoding for Censorship-Resistant Sharing). According to the official website, GNUnet is a framework for secure peer-to-peer networking that does not use any centralized database.

GNUnet mainly focuses on anonymous, censorship-resistant file sharing, and recommends Tor for anonymous web access. It provides anonymity by:
. making messages that originate from a peer indistinguishable from messages that the peer is merely routing
. having peers act as routers and use link-encrypted connections with stable bandwidth utilization
It is similar to Tor, but limited to anonymous file sharing, searching, swarming, and caching.
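The first of these anonymity techniques can be illustrated with a toy peer that pads every outgoing message to one fixed size and sends its own messages and forwarded ones in a shuffled batch, so an observer of the link cannot tell which originated locally. The message size, the length-prefix padding, and the Peer class are hypothetical and far simpler than GNUnet's actual protocol; on the wire these messages would also be link-encrypted, which is omitted here.

```python
import random

MESSAGE_SIZE = 64  # hypothetical fixed wire size in bytes


def pad(payload: bytes) -> bytes:
    """Length-prefix the payload and zero-pad it to MESSAGE_SIZE bytes."""
    if len(payload) > MESSAGE_SIZE - 1:
        raise ValueError("payload too large for one message")
    filler = bytes(MESSAGE_SIZE - 1 - len(payload))
    return bytes([len(payload)]) + payload + filler


def unpad(message: bytes) -> bytes:
    return message[1 : 1 + message[0]]


class Peer:
    def __init__(self, seed=None) -> None:
        self.outbox: list[bytes] = []
        self.rng = random.Random(seed)

    def originate(self, payload: bytes) -> None:
        """Queue a message this peer created itself."""
        self.outbox.append(pad(payload))

    def forward(self, message: bytes) -> None:
        """Queue an already padded message received from another peer."""
        self.outbox.append(message)

    def flush(self) -> list[bytes]:
        """Send everything in random order; all messages look alike."""
        batch, self.outbox = self.outbox, []
        self.rng.shuffle(batch)
        return batch


peer = Peer(seed=7)
peer.originate(b"search: report.pdf")
peer.forward(pad(b"search: song.ogg"))  # someone else's query
for message in peer.flush():
    print(len(message), unpad(message))
```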

(4) I2P

I2P, begun in 2003, is an anonymizing network: a low-latency mix network. According to its original designers, the goal is to produce a low-latency, fully distributed, autonomous, scalable, anonymous, resilient, and secure network. All data is wrapped in several layers of encryption, end to end. I2P calls its scheme garlic routing, a name chosen in contrast to Tor's onion routing; besides layering, it can bundle several messages (“cloves”) into one encrypted “garlic” message. I2P is made up of a set of nodes (“routers”) with a number of unidirectional inbound and outbound virtual paths (“tunnels”). Whereas a Tor circuit, once established, carries traffic in both directions, I2P sends and receives over separate one-way tunnels.

The network is both distributed and dynamic, with no trusted parties and no centralized resources. It also has its own internal network database (using a modification of the Kademlia algorithm) for distributing routing and contact information securely.
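The Kademlia idea underlying this network database can be sketched briefly: routers and keys live in one hash space, the distance between two identifiers is their bitwise XOR, and an entry is stored on (and looked up from) the peers closest to its key. This shows only the plain Kademlia metric with SHA-256 identifiers; I2P's modifications are not modeled, and the router and key names are hypothetical.

```python
import hashlib


def node_id(name: str) -> int:
    """Place a router or key in a 256-bit identifier space."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def closest_peers(key: str, peers: list[str], k: int = 3) -> list[str]:
    """Return the k peers whose identifiers are XOR-closest to the key."""
    target = node_id(key)
    return sorted(peers, key=lambda p: xor_distance(node_id(p), target))[:k]


routers = [f"router-{i}" for i in range(20)]
print(closest_peers("example-destination", routers))
```

In a real lookup a node does not know every router; it iteratively asks the closest peers it knows for peers that are closer still.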