MemForC - Memory Forensic Corpus Creation for Malware Analysis

Augustine Orgah

Xavier University of Louisiana

Abstract

Millions of malware samples are recorded daily with little or no information about their activity or behavior. According to an AV-Test report dated October 21, 2024, recorded malware had passed the billion mark as of 2019. VirusTotal, an online malware scanner, reported over 1.7 million distinct new malware samples over a seven-day period and over 3 million submissions on October 16, 2024, alone. As the arms race between malware authors and security professionals continues, it is imperative that we have better methods for detecting malware and for gaining better insight into malware behaviors. Memory forensics has emerged as a very promising set of techniques for detecting malware and analyzing malicious behavior. Memory forensics techniques can be used to detect code injection, hooks that malware places to monitor system activities, persistence mechanisms, and much more. There is, however, a critical need for ground truth data, both to support memory forensics investigations and to enable new research in the area. For investigators, ground truth is essential in distinguishing "normal" from "malicious". For researchers, memory forensics frameworks must carefully model important data structures and algorithms, which is both difficult and frequently dependent on specific versions of operating systems and applications. Ground truth provides essential data to support testing and verification. Currently, there are no large-scale repositories that provide "known clean" memory captures for investigators to compare against those from potentially infected systems or for developers to confirm that their tools work correctly. Development of a large-scale, freely available repository of memory captures is therefore crucial. MemForC is an open-source framework of techniques designed to create a corpus of memory captures from the successful execution of malware on Windows, Linux, and macOS systems.
MemForC is designed using best practices for creating a dynamic analysis system and leverages existing memory forensic tools. This repository will provide ground truth for investigators, allow malware research to proceed quickly and to be reproducible and verifiable, and enhance education and training to meet the demand for skilled memory forensics professionals. Our corpus will be freely available to the forensics community.

About the Speaker

Augustine Orgah is a Computer Science instructor and Program Manager for Xavier University of Louisiana’s Computer Science department. He has taught computer science and cybersecurity courses while nurturing budding computing students in their development. A security researcher and enthusiast, he enjoys teaching and is committed to building a cybersecurity pipeline at Xavier University. His main research area is Information Assurance/Cybersecurity, specifically malware analysis and memory forensics. In his spare time, he enjoys soccer, reading, and watching shows about nature.