Computer Science Technical Reports
2003
-
CS-2003-01 (8 pages) (pdf)
Title HEAT: Runtime Interception of Win32 Functions Authors Michael M. Andrews Contact Email Address Faculty Sponsor Michael M. Andrews TR number assignment date 14 January 2003 Abstract
When researching in any field, it is always important to build upon existing work. In most research disciplines, the core base is well defined and widely documented. However, the basis of innovative research in computer science, especially at the systems level, can often be proprietary information. This can be a major problem: unless you have source code available, you must be granted access to the proprietary information (usually by signing non-disclosure documents), which comes with barriers on access, ownership, and the ability to publish results.
As an example, Microsoft's Windows operating system (all versions) can be a good base for research, as it is a "real world" environment with rich usage patterns, and any potential advances made would benefit a large number of people. However, access to the Windows operating system is closely guarded via a number of APIs (Application Programming Interfaces), and Microsoft jealously guards its source code. Therefore, the ability to easily tie into and extend the functionality of applications and operating systems where source code is not available is a necessity.
With this in mind, researchers at the Florida Institute of Technology created HEAT - Hostile Environment for Application Testing. The initial goal of this research was to create a tool that would produce hard-to-reproduce environments (like low or corrupt memory, insufficient disk space, or a slow network) to help uncover defects in software. To achieve this, it was clear that HEAT would have to tie into both the operating system and the application under test.
-
CS-2003-02 (20 pages) (pdf)
Title An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection Authors Matthew V. Mahoney and Philip K. Chan Contact Email Address Faculty Sponsor Philip K. Chan TR number assignment date 27 January 2003 Revised 7 April 2003, old version (13 pages) here Abstract
We investigate potential simulation artifacts and their effects on the evaluation of network anomaly detection systems in the 1999 DARPA/MIT Lincoln Laboratory off-line intrusion detection evaluation data set. A statistical comparison of the simulated background and training traffic with real traffic collected from a university departmental server suggests the presence of artifacts that could allow a network anomaly detection system to detect some novel intrusions based on idiosyncrasies of the underlying implementation of the simulation, with an artificially low false alarm rate. The evaluation problem can be mitigated by mixing real traffic into the simulation. We compare five anomaly detection algorithms on simulated and mixed traffic. On mixed traffic they detect fewer attacks, but the explanations for these detections are more plausible.
-
CS-2003-03 (7 pages) (pdf)
Title Testing with Hostile Data Streams Author Alan A. Jorgensen Contact Email Address TR number assignment date 30 January 2003 Abstract
This note describes a method of testing software for response to malicious data streams. Systems that process data streams obtained from an external source such as the Internet are vulnerable to security issues if malicious data is not processed correctly. This note describes a testing method that creates malicious data streams, applies them to a software application and checks the appropriateness of the application response.
The note begins with a description of the problem: inadequate testing of software response to malicious data streams. I present a method of testing the response to malicious data streams and introduce the concepts of lexical, syntactic, and semantic data stream deformation. I provide a description of a system that produces and applies such tests, dividing the testing system into components and providing some detail about each component. Applying this system to Adobe® Acrobat® Reader® version 5.0.1 provides a case study. The study applied 141,306 unique test cases and revealed 11 distinct indications of buffer overrun, numerous program lock-ups, and four steganographic possibilities.
Research is on-going in the following areas: generalized buffer overrun exploitation, maliciously testing protocols and testing with encoded or encrypted data streams.
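As a hedged illustration of the three deformation levels named above (lexical, syntactic, and semantic), the following sketch applies each to a toy byte stream; the function names and specific deformations are our own, not those of the tool described in the note.

```python
import random

def lexical_deform(data: bytes, rng: random.Random) -> bytes:
    """Flip one byte: malformed at the token/character level."""
    i = rng.randrange(len(data))
    return data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]

def syntactic_deform(data: bytes) -> bytes:
    """Delete a structural delimiter so the stream no longer parses."""
    for i, b in enumerate(data):
        if b in b"[](){}<>":
            return data[:i] + data[i + 1:]
    return data

def semantic_deform(data: bytes) -> bytes:
    """Keep the syntax valid but make an embedded length field lie."""
    return data.replace(b"Length 6", b"Length 999999")

rng = random.Random(42)
stream = b"<<Length 6>>[hello]"   # a made-up, vaguely PDF-like stream
print(lexical_deform(stream, rng))
print(syntactic_deform(stream))
print(semantic_deform(stream))
```

Each deformed stream would then be fed to the application under test while observing its response, as the testing method describes.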
-
CS-2003-04 (8 pages) (pdf)
Abstract
Naturally forming multi-agent systems (aka swarms) have the ability to grow to enormous sizes without requiring any of the agents to oversee the entire system. The success of these systems comes from the fact that the agents are simple and their interaction with the environment and neighboring agents is local in nature. In this paper we look at abstractions in the field of swarms and study their applicability in the context of coordination systems. In particular, we focus on the problematic issue of the scalability of Linda systems. The purpose of this work is to look at abstractions yielded from observations of swarms and the way they are organized, and demonstrate how these abstractions may be used to implement a scalable tuple distribution mechanism in a Linda system -- to be named SwarmLinda.
-
CS-2003-05 (10 pages) (pdf)
Abstract
In this paper, we investigate machine learning techniques for discovering knowledge that can be used to monitor the operation of devices or systems. Specifically, we study methods for generating models that can detect anomalies in time series data. The normal operation of a device can usually be characterized in different temporal states. To identify these states, we introduce a clustering algorithm called Gecko that can automatically determine a reasonable number of clusters using our proposed "L" method. We then use the RIPPER classification algorithm to describe these states in logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko algorithm is comparable to a human expert in identifying states and our overall system can track normal behavior and detect anomalies.
-
CS-2003-06 (13 pages) (pdf)
Abstract
Much of the intrusion detection research focuses on signature (misuse) detection, where models are built to recognize known attacks. However, signature detection, by its nature, cannot detect novel attacks. Anomaly detection focuses on modeling the normal behavior and identifying significant deviations, which could be novel attacks. In this paper we explore two machine learning methods that can construct anomaly detection models from past behavior. The first method is a rule learning algorithm that characterizes normal behavior in the absence of labeled attack data. The second method uses a clustering algorithm to identify outliers.
-
CS-2003-07 (5 pages) (pdf)
Title A Representation Scheme for Finite Length Strings Authors Gaurav Tandon, Debasis Mitra Contact Email Address Faculty Sponsor Dr. Debasis Mitra TR number assignment date 1 April 2003 Abstract
This study is an attempt to create a canonical representation scheme for finite length strings to simplify the study of the theory behind different classes of patterns and to ease the understanding of the underlying separability issues. This could then be used to determine what kinds of techniques are suitable for what class of separability (linear, multi-linear, or non-linear). This representation can then be used in intrusion detection, biological sequences, pattern recognition/classification and numerous other applications.
-
CS-2003-08 (106 pages) (pdf)
Title Pair Programming to Facilitate the Training of Newly-Hired Programmers Authors Mark Anthony Poff Contact Email Address Faculty Sponsor David W. Clay TR number assignment date 7 April 2003 Abstract
Pair programming is touted by some software professionals as a style which allows developers to produce superior code in less time, and with fewer defects, than code produced by individuals. However, the acclaim for the pair methodology is not universal; some see it as a waste of effort which produces marginal improvements. Reported experiments that obtain quantitative results have typically been performed in an educational environment and may not reflect actual workplace conditions. This thesis reports on an experiment using pair programming in an industrial setting. Its goal is to determine if this programming style can be used to increase the technical and environmental knowledge of newly-hired programmers, and to verify the claims stated above.
-
CS-2003-09 (+++ pages) (not yet available)
Title Scenario-Based Testing-What, How, and Why? Authors Sowmya Padmanabhan Contact Email Address Faculty Sponsor Dr. Cem Kaner TR number assignment date 9 April 2003 Abstract
+++
-
CS-2003-10 (18 pages) (pdf)
Abstract
This paper discusses the concept of an algorithm designed to locate the optimal solution to a problem in a (presumably) very large solution space. The algorithm attempts to locate the optimal solution to the problem by beginning a search at an arbitrary point in the solution space and then searching in the "local" area around the start point to find better solutions. The algorithm completes either when it locates what it thinks is the optimal solution or when predefined halt conditions have been met. The algorithm is repair-based, that is, it begins with a given solution and attempts to "repair" that solution by changing one or more of the components of the solution to bring the solution closer to the optimal. The algorithm uses the natural principles of gravity that act on a body in motion through space and simulates those principles to take a given solution to the problem at hand and repair it to locate an optimal solution. In this document two versions of the algorithm, called GLSA1 (based on simple gravitational force calculation) and GLSA2 (based on gravitational field calculation), are presented, along with the manner in which an initial evaluation of the algorithm was conducted.
Then, by way of example a particular problem is given that the algorithm can be used to solve, along with a description of how the algorithm would be used to solve that problem. Finally, some conclusions and opportunities for future direction are presented.
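The abstract describes GLSA only at a high level, so the following is a generic repair-based local search sketch in that spirit (begin at an arbitrary solution, repeatedly repair one component toward a better neighbor, halt at a local optimum); it does not implement the gravitational force or field calculations, which are the papers' own contribution.

```python
def local_search(start, neighbours, cost, max_steps=1000):
    """Greedy repair loop: move to the best neighbour until no improvement."""
    current = start
    for _ in range(max_steps):
        best = min(neighbours(current), key=cost, default=current)
        if cost(best) >= cost(current):   # halt condition: local optimum
            return current
        current = best                    # "repair" one component
    return current

# Toy solution space: integer vectors, cost = squared distance from a target.
target = (3, -2)
cost = lambda v: sum((a - b) ** 2 for a, b in zip(v, target))

def neighbours(v):
    """Repair moves: change one component of the solution by +/-1."""
    for i in range(len(v)):
        for d in (-1, 1):
            yield v[:i] + (v[i] + d,) + v[i + 1:]

print(local_search((0, 0), neighbours, cost))  # converges to (3, -2)
```

On this convex toy cost the greedy loop reaches the optimum; on the large, rugged spaces the paper targets, the halt conditions and the repair heuristic matter far more.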
-
CS-2003-11 (87 pages) (pdf)
Title Strategies for Testing Web Applications from the Client Side Authors Lawrence T. Prevatte III Contact Email Address Faculty Sponsor Dr. James Whittaker TR number assignment date 25 April 2003 Abstract
This thesis is presented on the testing of computer software designed to operate over the World Wide Web. A fault model to guide Web software testing is developed, the components of Web software applications and their interfaces to each other are defined, and strategies to test Web software are discussed. The approach taken is to simplify the concepts of testing Web software, aiding a methodical treatment of the techniques required to conduct meaningful tests with a minimum of testing resources; the tests are conducted from the client side of the Web application during the "ready for users" or acceptance phase. This text presents how the industry arrived at the current state of Web application quality, how the industry views Web applications, how to test them, strategies for testing Web applications from the client side, and interesting examples using the proposed methodology.
-
CS-2003-12 (88 pages) (pdf)
Abstract
Software life cycle processes play a key role in the successful development of software products. Software processes continue to evolve as the field of software development progresses toward being a true engineering discipline. This thesis has two objectives:
(1) To apply software engineering knowledge gained during the pursuit of this degree toward the design and development of a complete, Software Engineering Institute (SEI) Capability Maturity Model (CMM) compliant process for the design phase of the software lifecycle.
(2) To evaluate a Waterfall Lifecycle Model vs. an Iterative Lifecycle Model and compare SEI CMM compliant processes to non-SEI CMM compliant processes using a "real world" software development project.
-
CS-2003-13 (160 pages) (pdf)
Abstract
The current approach to detecting novel attacks in network traffic is to model the normal frequency of session IP addresses and server port usage and to signal unusual combinations of these attributes as suspicious. We make four major contributions to the field of network anomaly detection. First, rather than just model user behavior, we also model network protocols from the data link through the application layer in order to detect attacks that exploit vulnerabilities in the implementation of these protocols. Second, we introduce a time-based model suitable for the bursty nature of network traffic: the probability of an event depends on the time since it last occurred rather than just its average frequency. Third, we introduce an algorithm for learning conditional rules from attack free training data that are sensitive to anomalies. Fourth, we extend the model to cases where attack-free training data is not available.
On the 1999 DARPA/Lincoln Laboratory intrusion detection evaluation data set, our best system detects 75% of novel attacks by unauthorized users at 10 false alarms per day after training only on attack-free traffic. However this result is misleading because the background traffic is simulated and our algorithms are sensitive to artifacts. We compare the background traffic to real traffic collected from a university departmental server and conclude that we could realistically expect to detect 30% of these attacks in this environment, or 47% if we are willing to accept 50 false alarms per day.
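A minimal sketch of the time-based idea described above: the score of a novel value grows with the time since the attribute last produced one, rather than depending only on its average frequency. The scoring formula here (time since last novelty, scaled by observations over distinct values) is our own simplified illustration, not necessarily the dissertation's exact model.

```python
class TimeBasedModel:
    """Toy time-based novelty scorer: novel values are more suspicious
    the longer the attribute has gone without producing one."""

    def __init__(self):
        self.seen = set()      # distinct values observed so far
        self.n = 0             # total observations
        self.last_novel = 0    # time the last novel value appeared

    def train(self, value, t):
        if value not in self.seen:
            self.seen.add(value)
            self.last_novel = t
        self.n += 1

    def score(self, value, t):
        """0 for familiar values; (time since last novelty) * n / r else."""
        if value in self.seen:
            self.n += 1
            return 0.0
        s = (t - self.last_novel) * self.n / max(len(self.seen), 1)
        self.train(value, t)
        return s

m = TimeBasedModel()
for t, v in enumerate(["GET", "GET", "POST", "GET"]):
    m.train(v, t)
print(m.score("GET", 10))       # 0.0: value seen before
print(m.score("\x90\x90", 11))  # positive: novel value after a quiet period
```

This captures the bursty-traffic intuition: a novelty after a long quiet period scores higher than one arriving amid constant churn.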
-
CS-2003-14 (165 pages) (pdf)
Abstract
This dissertation considers the implementation of exception handling specifically for functional languages. Some implementations incur overhead for using exception handling even when no exceptions are raised. We show the results of some experiments with the SML of New Jersey and OCAML compilers, two well-known compilers for functional languages. Imperative languages avoid this overhead by using tables, but the approach does not easily transfer to compilers using continuation passing style (CPS). This dissertation proposes an approach that works with CPS compilers like SML of New Jersey.
We first present an experiment where programs in SML are written with and without exception handlers. From these results, we conclude that programs with exception handling incur overhead even when no exceptions are raised. Then, we analyze the source of the exception handling overhead in the SML of New Jersey compiler and present a solution to the problem. The new approach uses two continuations instead of one. One continuation encapsulates the rest of the normal computation, as usual; the second is used for passing the abnormal computation. The second continuation is not passed as an extra argument but as a displacement from the first continuation.
We have implemented a basic CPS compiler for functional languages with exception handling. With it we were able to implement the new approach to exception handling and compare it side-by-side with the approach taken by the SML of New Jersey compiler. We show that the new approach to exception handling adds no overhead to the normal flow of control.
The importance of the new approach to exception handling for CPS compilers proposed in this dissertation is the improved run-time performance in every case in which an exception handler is used.
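The two-continuation scheme can be illustrated outside ML; this hedged Python sketch passes a normal continuation k and an abnormal continuation h explicitly (the dissertation's displacement encoding of the second continuation is not reproduced here, and the function names are our own).

```python
def div_cps(a, b, k, h):
    """CPS division: call k with the result, or h on an abnormal condition."""
    if b == 0:
        return h("Division_by_zero")
    return k(a / b)

def handle(expr_cps, handler, k):
    """Install `handler` as the abnormal continuation around expr_cps.
    The handler's value is fed back into the normal continuation k."""
    return expr_cps(k, lambda exn: k(handler(exn)))

result = handle(lambda k, h: div_cps(10, 0, k, h),
                lambda exn: float("inf"),   # the handler's replacement value
                lambda v: v)
print(result)  # inf
```

Note that the normal path (`b != 0`) never touches h: that is the property the dissertation measures, namely no overhead when no exception is raised.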
-
CS-2003-15 (81 pages) (pdf)
Title An Automated Oracle for Verifying GUI Objects (Master's Thesis) Authors Juichi Takahashi Contact Email Address Faculty Sponsor Dr. James A. Whittaker TR number assignment date 8 June 2003 Abstract
The promise of test automation - the automatic application of software inputs to find bugs - is tempered by the difficulty of automatically verifying behavior. Indeed, the lack of tools for behavior verification is a major factor in keeping automated testing out of the mainstream.
This thesis develops a technique that aids automatic behavior verification for a particularly difficult problem: determining the correctness of screen output. A methodology to capture and compare screen output is presented, and a case study using Microsoft® PowerPoint® is described.
-
CS-2003-16 (11 pages) (pdf)
Abstract
We introduce an algorithm called LERAD that learns rules for finding rare events in nominal time-series data with long range dependencies. We use LERAD to find anomalies in network packets and TCP sessions to detect novel intrusions. LERAD outperforms the original participants in the 1999 DARPA/Lincoln Laboratory intrusion detection evaluation, and detected most attacks that eluded a firewall in a university departmental server environment.
-
CS-2003-17 (8 pages) (pdf)
Abstract
A continuum of general to specific interests of a user, called a user interest hierarchy (UIH), represents a user's interests at different abstraction levels. A UIH can be learned from a set of web pages visited by a user. In this paper, we focus on improving the learning of the UIH by adding phrases. We propose the VPF algorithm, which can find variable-length phrases without any user-defined parameter. To identify meaningful phrases, we examine various correlation functions with respect to well-known properties and other properties that we propose.
-
CS-2003-18 (8 pages) (pdf)
Abstract
We investigate techniques to automatically determine the number of clusters to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L Method, that finds the "knee" in a '# of clusters vs. clustering evaluation metric' graph. Using the knee is a well-known but not particularly well-understood method of determining the number of clusters. We explore the feasibility of this method and attempt to determine in which situations it will and will not work.
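A sketch of the L Method idea, under our own simplified reading: for each candidate knee, fit one line to the left portion of the evaluation-metric curve and one to the right, and pick the split that minimizes the length-weighted fitting error. The paper's exact weighting and refinement details may differ.

```python
def _fit_rmse(xs, ys):
    """Least-squares line fit; returns RMSE of the residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my)
                                     for x, y in zip(xs, ys)) / sxx
    b = my - slope * mx
    return (sum((y - (slope * x + b)) ** 2
                for x, y in zip(xs, ys)) / n) ** 0.5

def l_method(xs, ys):
    """Return the x at the knee of the (x, y) evaluation-metric curve."""
    best_c, best_err = None, float("inf")
    for c in range(2, len(xs) - 1):            # need >= 2 points per side
        left = _fit_rmse(xs[:c], ys[:c])
        right = _fit_rmse(xs[c:], ys[c:])
        err = (c * left + (len(xs) - c) * right) / len(xs)
        if err < best_err:
            best_c, best_err = c, err
    return xs[best_c - 1]                      # last point of the left line

# Steep improvement up to 5 clusters, flat afterwards -> knee at x = 5.
xs = list(range(1, 11))
ys = [100, 80, 60, 40, 20, 10, 9, 8, 7, 6]
print(l_method(xs, ys))  # -> 5
```

On this synthetic curve both sides of the true knee are exactly linear, so the split at 5 has zero fitting error and is chosen unambiguously.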
-
CS-2003-19 (8 pages) (pdf)
Abstract
Detecting known vulnerabilities (Signature Detection) is not sufficient for complete security. This has raised recent interest in Anomaly Detection (AD), in which a model is built from normal behavior and significant deviations from this model are flagged as anomalous. However, most AD algorithms assume clean training data, which can be hard to obtain. Our proposed algorithm relaxes this assumption. To this end, we define the notion of a strong outlier: a point that is suspicious at both the local and global levels. Finally, we illustrate the effectiveness of our approach on the DARPA '99 dataset and find that, at 10 false alarms per day, it is on par in number of detections with the best participants in the original evaluation, who employed a hybrid of techniques.
-
CS-2003-20 (13 pages) (pdf)
Abstract
Many approaches have been suggested and various systems have been modeled to detect intrusions from the anomalous behavior of system calls as a result of an attack. Though these techniques have been shown to be quite effective, a key element seems to be missing -- the inclusion and utilization of system call arguments to create a richer, more valuable signature, and the use of this information to model the intrusion detection system more accurately. We put forth the idea of a rule-learning approach that builds rules based upon system calls and models normal traffic using system call arguments and other key attributes. We present variations of our techniques and compare the results with those from some well-known techniques based upon system call sequences. The results show that system call argument information is crucial and helps successfully detect U2R, R2L, and Data attacks while generating fewer false alarms.
-
CS-2003-21 (15 pages) (pdf)
Abstract
Most current anomaly detection methods for network traffic rely on the packet header for studying network traffic behavior. We believe that significant information lies in the payload of the packet, and hence it is important to model the payload as well. Since many protocols exist and new protocols are frequently introduced, parsing the payload based on the protocol specification is time-consuming. Instead of relying on the specification, we propose four different characteristics of streams of bytes, which can help us develop algorithms for parsing the payload into tokens. We feed the tokens extracted from the payload to an anomaly detection algorithm. Our empirical results indicate that our parsing techniques can extract tokens that improve the detection rate.
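The four byte-stream characteristics are not given in the abstract, so this sketch tokenizes a payload with one simple protocol-agnostic heuristic of our own (token boundaries wherever a non-printable or space byte occurs), purely to illustrate the step of feeding payload tokens to a detector.

```python
def tokenize_payload(payload: bytes):
    """Split a raw payload into tokens without knowing its protocol."""
    tokens, current = [], bytearray()
    for b in payload:
        if 33 <= b <= 126:             # printable, non-space: token body
            current.append(b)
        else:                          # space/control/binary: boundary
            if current:
                tokens.append(bytes(current))
                current = bytearray()
    if current:
        tokens.append(bytes(current))
    return tokens

payload = b"GET /index.html HTTP/1.0\r\nHost: example.com\r\n\r\n"
print(tokenize_payload(payload))
# -> [b'GET', b'/index.html', b'HTTP/1.0', b'Host:', b'example.com']
```

The same function works on an unknown binary protocol, which is the point: no per-protocol parser has to be written before anomaly detection can begin.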
-
CS-2003-22 (232 pages) (pdf)
Title A Taxonomy of E-Commerce Risks and Failures (Master's Thesis) Authors Giridharan Vijayaraghavan Contact Email Address Faculty Sponsor Dr. Cem Kaner TR number assignment date 16 July 2003 Abstract
Imagine being asked to test an e-commerce website. If you haven't tested one before, where would you start? What experience would you draw on? Where would you look for more information? Even very experienced testers have blind spots when they try to generate test ideas for an application that they have not tested. This thesis presents a simple outline that will help you generate test ideas and limit your blind spots. The outline is the result of about 18 months of research on classifying e-commerce related failures and risks. The result has 45 top-level categories and 700+ examples of errors (potential issues to test for) under most categories. In many cases, there is also a link to about 300+ examples of e-commerce defects that have been publicized in the press.
Using the list, you could pick a category of interest (such as accessibility or software upgrade), read descriptions of several types of problems that fit within that category, and so identify a few issues that would be appropriate to test for in your application. Based on feedback to the authors of Testing Computer Software (Kaner, et al.), I believe that many testers will be able to use this list to identify potential problems that they would otherwise have missed.
-
CS-2003-23 (100 pages) (pdf)
Title An Algorithm for Clearing Combinatorial Markets Authors Josiane Domgang Nzouonta Contact Email Address Faculty Sponsor Dr. Marius Silaghi TR number assignment date 15 December 2003 Abstract
It was recently shown to be possible to solve single-item auctions without revealing any secret except for the solution. Namely, with vMB-share [4], the seller and the buyer only learn each other's identity and the selling price for a chosen M+1 pricing scheme. No trusted party was necessary. In this thesis, we show how vMB-share can be extended to the clearing of combinatorial negotiations with several items, buyers, and sellers. We first show how the more general problem can be reduced to a virtual form relatively similar to single-item auctions. Then, some modifications are made to the cryptographic techniques of vMB-share so that it can offer a solution to problems in virtual form. While the problem is known to be NP-complete, a secure multiparty computation that avoids leaking secrets through computation-time measurements necessarily incurs an exponential computation cost. Some early experiments show that small realistic negotiations can nevertheless be solved with acceptable effort.
-
CS-2003-24 (143 pages) (pdf)
Title Algorithms for String Searching on a Beowulf Cluster Authors Kishore Ramakrishna Kattamuri Contact Email Address Faculty Sponsor Dr. William D. Shoaff TR number assignment date 18 December 2003 Abstract
String matching is a subject of both theoretical and practical interest in computer science. Theoretically, the interest lies in the time and space complexity of the algorithms; on the practical side, searching for keywords, names, phrases, etc. is common. Searching for patterns in DNA (Deoxyribose Nucleic Acid) is one such application and is the ultimate target area for this study.
This study resulted in an efficient parallel way to search for patterns, with the program operated on a parallel computer. This research provides a basis for further study and helps in selecting search algorithms for broader use.
The algorithms proposed in this research address the problem in both the single-pattern and multiple-pattern search areas. The outcome of this study was very encouraging and paved the path for possible future study. The algorithms were written in C using the MPI model on the Beowulf cluster of Florida Tech (Bluemarlin). Their timings were compared against each other. The test data set was kept uniform throughout the experiments.
Though several ideas were formulated and attempts were made to implement them, only three algorithms (KTV, KTV2 and KPrime) were designed. Of these, KTV and KTV2 showed good results, whereas KPrime took far more time than expected. Investigation of the algorithm revealed a further tweaking, which is left as part of future study.
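The KTV, KTV2, and KPrime algorithms are not specified in the abstract; the sketch below only illustrates the standard data-parallel idea behind such work: split the text into chunks with an overlap of len(pattern) - 1 so matches straddling a chunk boundary are not lost. (The thesis distributes chunks across MPI ranks in C; here each chunk is simply searched in a loop.)

```python
def search_chunked(text: str, pattern: str, n_chunks: int = 4):
    """Find all occurrences of pattern, chunk by chunk with overlap."""
    m = len(pattern)
    size = max(1, len(text) // n_chunks)
    hits = set()
    for start in range(0, len(text), size):
        # extend each chunk by m - 1 bytes so boundary matches are found
        end = min(len(text), start + size + m - 1)
        i = text.find(pattern, start, end)
        while i != -1:
            hits.add(i)
            i = text.find(pattern, i + 1, end)
    return sorted(hits)

text = "acgtacgtacgt"                 # toy DNA-like input
print(search_chunked(text, "gtac"))   # -> [2, 6]
```

Because chunks overlap, the same match can be reported by two workers; the set-based merge step deduplicates, mirroring the result-gathering phase of an MPI implementation.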