Matt Mahoney

I am an adjunct instructor in computer science at Florida Tech I currently have an MS degree in computer engineering (1988) and a Ph.D. in computer science (2003). My Ph.D. topic was network intrusion detection. I also did postdoctoral work in time series anomaly detection in 2003-05. Vitae.

My current research interest is data compression. I helped develop the PAQ series of compressors, which are top ranked on many benchmarks, using a new algorithm called context mixing. I also maintain the large text benchmark which I hope will promote research in natural language modeling.

Classes I Teach

Fall 2007 Other classes I have taught.

What do you think of my (or anyone else's) courses at FIT? Post your reviews anonymously at

Papers on Time Series Anomaly Detection

Modeling Multiple Time Series for Anomaly Detection, P. Chan & M. Mahoney, Proc. IEEE Intl. Conf. on Data Mining, p. 90-97, 2005.

Trajectory Boundary Modeling of Time Series for Anomaly Detection, M. Mahoney and P. Chan, Workshop on Data Mining Methods for Anomaly Detection, SIGKDD Conf., 2005.
(This is a minor revision of technical report TR-2005-08)

NASA valve data used in these papers.

Papers on Intrusion Detection

Computer Security: A Survey of Attacks and Defenses (HTML, 2000, unpublished)

Detecting Novel Attacks by Identifying Anomalous Network Packet Headers by Philip K. Chan and Matthew V. Mahoney, 2001, Florida Tech. technical report CS-2001-2 (PDF, 10 pages). The paper describes how attacks can be detected because the network packet headers differ from normal traffic in the DARPA intrusion detection evaluation data set.

PHAD: Packet Header Anomaly Detection for Indentifying Hostile Network Traffic by Matthew V. Mahoney and Philip K. Chan, 2001, Florida Tech. technical report CS-2001-4 (PDF, 17 pages). This is a later version of the previous paper.

Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Proc. Eighth Intl. Conf. Knowledge Discovery and Data Mining, p376-385, 2002. (C) 2002, ACM (PDF, 10 pages). This paper describes ALAD (Application Layer Anomaly Detection). It applies techniques used in PHAD to TCP streams rather than packets, resulting in an improved detection rate when both systems are combined.

Learning Models of Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Florida Institute of Technology Technical Report CS-2002-08 (PDF, 48 pages double spaced). LERAD improves on PHAD and ALAD by automatically extracting conditional rules from the training data.

Network Traffic Anomaly Detection Based on Packet Bytes by Matthew V. Mahoney, Proc. ACM-SAC, Melbourne FL, p. 346-350, 2003 (PDF, 5 pages). NETAD improves on PHAD by prefiltering the data, modeling common subsets (IP, TCP, HTTP, etc.), modeling rare but not necessarily novel events, and by using simplified parsing (first 48 bytes of the IP packet as 48 attributes).

An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection by Matthew V. Mahoney and Philip K. Chan, TR-CS-2003-02. A similar version appears in Proc. RAID, pp. 220-237, 2003. The paper investigates simulation artifacts in the DARPA test set and proposes to solve the problem by injecting real background traffic.

Learning Rules for Anomaly Detection of Hostile Network Traffic by Matthew V. Mahoney and Philip K. Chan, Proc. Third International Conference on Data Mining (ICDM 2003), Melbourne FL, 2003 © IEEE 2003 (4 pages). A more detailed version (11 pages) appears in TR-CS-2003-16 The papers extend LERAD using both TCP stream and packet attributes, and tests it on both the DARPA IDS sets and on real traffic with real attacks.

Source Code for PHAD, ALAD, LERAD, NETAD, SAD, and EVAL.

Papers on Language Modeling and Data Compression

My dissertation originally was on estimating the cost of natural language modeling. (This was in 1999. I later changed the topic to intrusion detection because it was funded). Alan Turing predicted in 1950 that in 50 years a machine with 109 bits of memory would demonstrate artificial intelligence. For my proposal, I surveyed about 30 language models and found that the empirical relation between model size and compression ratio suggests that Turing was right (with regard to space complexity), give or take an order of magnitude.

Fusion of Information Retrieval Engines (FIRE), by S. Alaoui Mounir, N. Goharian, M. Mahoney, A. Salem, O. Frieder, 1998 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'98), Las Vegas. This class project suggests that combining the results of multiple search engines gives better recall/precision than all of the individual engines.

Text Compression as a Test for Artificial Intelligence, 1999 AAAI Proceedings (poster). I argue that text prediction or compression is a stricter test for AI than the Turing test. (I give a more detailed argument in the rationale for my large text compression benchmark).

Fast Text Compression with Neural Networks, 2000, Proc. AAAI (HTML, PostScript, or PDF, 5 pages). Includes an archiver (C++ source and a Windows executable) that gives better (but slower) compression than compress, zip, or gzip.

Adaptive Weighing of Context Models for Lossless Data Compression, 2005, Florida Tech. Technical Report TR-CS-2005-16. Describes the PAQ6 series of context mixing compressors, which are top ranked on several benchmarks.

The PAQ series of data compression programs.


Ph.D. Dissertation: A Machine Learning Approach to Detecting Attacks by Identifying Anomalies in Network Traffic, Florida Tech., 2003 (PDF, 160 pages double spaced). This covers all of the above work. Slides (PowerPoint).

MSCS Thesis: Complexity of Adaptive Spatial Indexing for Robust Distributed Data. Florida Tech, 1998. I investigated whether an index (e.g. for a search engine) could be distributed efficiently across a network instead of being centrally located on a giant server. The answer is yes.

MSEE Thesis: Grid Logic: Programmable Logic that Implements Neural Networks. Florida Tech, 1988. I simulated a form of programmable logic where the logic elements (or neurons) lie along the lines of a grid and have programmable connections at their ends or where they cross. This layout is more efficient than traditional programmable logic devices since no additional silicon is devoted to routing. (I don't have an electronic copy, sorry).


I received an A.S. from Cape Cod Community College in 1982 and a BSEE (Computer Engineering) in 1984 from UMass Dartmouth. I have 11 years experience (1984-95) as a software engineer and C/C++ instructor at Harris Corporation. I received an MSEE in computer engineering at Florida Tech. in 1988. After leaving Harris to enroll at Florida Tech. in 1995, I continued teaching C, C++, and Java part time at Koster Associates until 1998, and at Florida Tech. since then. I received an MSCS at Florida Tech. in 1998 and my Ph.D. in May, 2003.

In my spare time I occasionally compete in triathlons and ultramarathons, running beyond the 26 mile marathon. I also maintain the race results and race calendar for the Space Coast Runners in east-central Florida.

Matt Mahoney,, or if that doesn't work, try