Computer Science Technical Reports
2004
-
CS-2004-01 (9 pages) (pdf)
Title: A faster technique for distributed constraint satisfaction and optimization with privacy enforcement
Authors: Marius C. Silaghi
TR number assignment date: 20 January 2004
Abstract
A problem that has received recent attention is the development of negotiation/cooperation techniques for solving naturally distributed problems with privacy requirements. A substantial amount of research has focused on problems that can be modeled with distributed constraint satisfaction, where the constraints are the secrets of the participants. Distributed AI develops techniques by which the agents solve such problems without involving trusted servers. Some of the existing techniques aim for various tradeoffs between complexity and privacy guarantees [MTSY03], some aim only at high efficiency [ZM04], while others aim to offer maximal privacy [Sil03].
While the last-mentioned work achieves a high level of privacy, it is very slow. The technique we propose builds on that work, maintaining the same level of privacy but running an order of magnitude faster, reaching optimal efficiency for this type of privacy. Unfortunately, all versions of the new technique have an exponential space requirement, namely requiring the agents to store a value for each tuple in the search space. However, all existing techniques achieving some privacy were likewise applicable only to very small problems (typically 15 variables, 3 values), which for our technique means 14MB of required memory. Practically speaking, the new technique improves the privacy with which these problems can be solved, and improves the efficiency with which n/2-privacy can be achieved, while remaining inapplicable to larger problems.
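The memory figure is easy to verify: one stored value per tuple of a 15-variable, 3-value search space gives 3^15 = 14,348,907 tuples. A quick check, assuming one byte per tuple:

    # Back-of-the-envelope check of the 14MB figure quoted above: one
    # stored value per tuple of the search space, assuming one byte each.
    variables, domain_size = 15, 3
    tuples = domain_size ** variables         # 3**15 = 14,348,907 tuples
    print(f"{tuples:,} tuples ~= {tuples / 2**20:.1f} MiB")  # ~13.7 MiB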
-
CS-2004-02 (32 pages) (pdf)
Abstract
In this work, we first propose a technique to identify failure in point-based and interval-based temporal constraint reasoning. Then, whereas much of the past literature stops searching once an inconsistency is found, we push further the possibility of resolving such failures by proposing algorithms to restore consistency for both the point and the interval algebras. To prove the correctness and efficiency of these algorithms, we develop a set of theorems and properties focusing on this particular aspect of temporal constraint reasoning.
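For orientation, failure in the point algebra is typically detected when constraint propagation empties some relation. A minimal sketch (not the report's own algorithms): relations between time points are subsets of {<, =, >}, tightened by path consistency until a fixpoint or an emptied relation signals inconsistency.

    # Minimal point-algebra consistency check (a sketch; the report's own
    # failure-identification and repair algorithms are not reproduced here).
    # A relation between two time points is a subset of {'<', '=', '>'}.
    COMPOSE = {  # composition of the basic point relations
        ('<', '<'): {'<'}, ('<', '='): {'<'}, ('<', '>'): {'<', '=', '>'},
        ('=', '<'): {'<'}, ('=', '='): {'='}, ('=', '>'): {'>'},
        ('>', '<'): {'<', '=', '>'}, ('>', '='): {'>'}, ('>', '>'): {'>'},
    }

    def compose(r1, r2):
        """Compose two point relations (sets of basic relations)."""
        out = set()
        for a in r1:
            for b in r2:
                out |= COMPOSE[(a, b)]
        return out

    def path_consistent(rel, n):
        """Tighten rel[i][j] (converse-closed input expected) until a
        fixpoint; an emptied relation signals inconsistency (failure)."""
        changed = True
        while changed:
            changed = False
            for i in range(n):
                for j in range(n):
                    for k in range(n):
                        if i in (j, k) or j == k:
                            continue
                        tighter = rel[i][j] & compose(rel[i][k], rel[k][j])
                        if tighter != rel[i][j]:
                            rel[i][j] = tighter
                            changed = True
                            if not tighter:
                                return False  # failure detected
        return True

    # Example: x < y < z but also z < x -- inconsistent.
    S, CONV = {'<', '=', '>'}, {'<': '>', '>': '<', '=': '='}
    rel = [[set(S) for _ in range(3)] for _ in range(3)]
    for i, j, r in [(0, 1, '<'), (1, 2, '<'), (2, 0, '<')]:
        rel[i][j], rel[j][i] = {r}, {CONV[r]}
    print(path_consistent(rel, 3))  # False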
-
CS-2004-03 (13 pages) (pdf)
Abstract
Naturally forming multi-agent systems (aka swarms) can grow to enormous sizes and perform seemingly complex tasks without any centralized control. Their success comes from the fact that the agents are simple and their interaction with the environment and with neighboring agents is local in nature. In this paper we discuss the implementation of SwarmLinda, a Linda-based system that abstracts Linda concepts in terms of swarm-intelligence constructs such as scents and stigmergy. The goal of this implementation is to achieve characteristics such as scalability, adaptiveness, and some level of fault tolerance. This paper describes our initial version of SwarmLinda and the next steps in improving the implementation.
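As a toy illustration of the scent/stigmergy idea (our own simplification, not SwarmLinda's actual design): each node keeps a decaying scent level per tuple template, a lookup follows the strongest scent, and successful finds reinforce it.

    import random

    # Toy scent-guided tuple lookup (an assumption-laden simplification,
    # not SwarmLinda's design): nodes keep a scent level per tuple
    # template; scents decay over time and successful finds reinforce
    # them, so paths to tuples emerge stigmergically.
    DECAY, REWARD = 0.9, 1.0  # rates chosen purely for illustration

    class Node:
        def __init__(self, name):
            self.name, self.tuples = name, []
            self.scent, self.neighbors = {}, []  # neighbors assumed non-empty

    def rd(node, template, hops=5):
        """Linda-style read: wander up the scent gradient for a match."""
        for _ in range(hops):
            match = next((t for t in node.tuples if t[0] == template), None)
            if match is not None:
                node.scent[template] = node.scent.get(template, 0.0) + REWARD
                return match
            # move to the neighbor with the strongest scent for this
            # template, breaking ties randomly
            node = max(node.neighbors,
                       key=lambda n: (n.scent.get(template, 0.0),
                                      random.random()))
        return None

    def decay_all(nodes):
        for n in nodes:
            for t in n.scent:
                n.scent[t] *= DECAY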
-
CS-2004-04 (16 pages) (pdf)
Abstract
Privacy requirements can be modeled within the distributed constraint satisfaction framework, where the constraints are secrets of the participants. In [20] we introduced a first technique allowing agents to solve distributed constraint satisfaction problems (DisCSPs) without revealing anything and without trusting each other or some server. The first technique we propose now, MPC-DisCSP2, is a d·m-times improvement for m variables of domain size d. A second technique we propose, MPC-DisCSP3, has a complexity similar to MPC-DisCSP2, being a little slower in the worst case, but guarantees that the returned solutions are picked according to a uniform distribution over the total set of solutions. A third technique, MPC-DisCSP0, is based solely on secure arithmetic circuit evaluation, and the distribution for picking its solutions can be manipulated in a more flexible way. Nevertheless, this last technique has O((d^m)! · d^m) complexity. All techniques of [20] can be extended to solve optimization for distributed weighted CSPs.
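These techniques rest on computation over secret-shared values. As background (a generic sketch, not the MPC-DisCSP protocols themselves), Shamir secret sharing over a prime field is the standard way such shared values are created and reconstructed:

    import random

    # Minimal Shamir secret sharing over a prime field -- the standard
    # building block behind secure arithmetic circuit evaluation.  This
    # is a generic sketch, not the MPC-DisCSP protocols themselves.
    P = 2_147_483_647  # a Mersenne prime; the field size is our choice

    def share(secret, n, t):
        """Split `secret` into n shares; any t of them reconstruct it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
        def poly(x):
            return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, poly(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0."""
        total = 0
        for i, (xi, yi) in enumerate(shares):
            num = den = 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    shares = share(42, n=5, t=3)
    assert reconstruct(shares[:3]) == 42  # any 3 of the 5 shares suffice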
-
CS-2004-05 (125 pages) (pdf)
Abstract
The normal operation of a device can be characterized in terms of distinct operational states. To identify these states, we introduce a segmentation algorithm called Gecko that can determine a reasonable number of segments using our proposed L method. We then use the RIPPER classification algorithm to describe these states with logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Multiple time series may be used for training by merging them into a single representative time series using dynamic time warping.
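The dynamic time warping used for that merging is a standard alignment distance; a minimal textbook formulation (the report's exact merging procedure is not reproduced here) is:

    # Classic dynamic time warping distance (a textbook formulation; the
    # report's exact merging procedure is not reproduced here).
    def dtw(a, b):
        """DTW distance between two numeric sequences."""
        inf = float("inf")
        n, m = len(a), len(b)
        d = [[inf] * (m + 1) for _ in range(n + 1)]
        d[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                d[i][j] = cost + min(d[i - 1][j],       # insertion
                                     d[i][j - 1],       # deletion
                                     d[i - 1][j - 1])   # match
        return d[n][m]

    print(dtw([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0: warping absorbs the repeat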
Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko segmentation algorithm is comparable to a human expert in identifying states, and our L method performs better than the existing permutation tests method when determining the number of segments to return in segmentation algorithms. Empirical results have also shown that our overall system can track normal behavior and detect anomalies. Additionally, if multiple time series are used for training, the model will generalize to cover unseen normal variations and time series that are "between" the time series used for training.
-
CS-2004-06 (218 pages) (pdf)
Abstract
Modern object-oriented languages like Java and C# do not support parametric polymorphism and do not have a traditional module system that supports the development of large systems. They overload the class mechanism with several tasks, and they use packages and namespaces to organize clusters of classes while providing only weak control over access to members. Other languages that support generic programming and objects lack a simple object model for their object-oriented features.
In this thesis the language MOOL is presented. MOOL is a class-based object-oriented language that supports modular programming and genericity.
The main goal in the design of MOOL was simplicity rather than efficiency. MOOL provides separate mechanisms for concepts, such as classes and modules, that are unified in other languages. MOOL is not a pure object-oriented language where everything is an object: non-object features like functions and modules are part of the language to enhance expressivity, structure programs, and support code reuse.
-
CS-2004-07 (15 pages) (pdf)
Abstract
Digital systems have simplified the management, accounting, transfer, and dissemination of all types of information. However, they have rarely been applied successfully to a very difficult and rewarding task of democratic administration: facilitating the acquisition, evaluation, and refinement of the ideas and opinions of large bodies of constituents. Consequently, administrative ideas from individuals outside a predefined set of rule-makers are lost, despite the fact that they may be substantial, novel, and provide solutions to significant economic, social, or security problems. Some digital support already exists for assessing the opinion of citizens:
- Classic polls on the Internet. The problem is that the formulation of the well-advertised polls can be shaped by only a limited number of people and can therefore miss important issues and variations. As a result, new ideas cannot be collected, only verified.
- Comments sent to websites such as www.regulations.gov. The comments can be directed only toward a narrow topic that has recently been addressed. Also, some issues can draw hundreds of thousands of comments, and there is no feasible way to score them.
- E-mails and phone calls to representatives. The main drawbacks are that they cannot reliably convey the amount of popular support, and important ideas can still be lost in the large heaps of requests received.
An enhanced way to support input from constituents in democratic organizations is via citizen-sourced and citizen-ranked polls and civic discourse, where debating participants can jointly formulate ideas. This article investigates techniques that will provide extensive digital support for gathering and refining ideas from citizens. The techniques developed will be applied and tested by obtaining administrative ideas and feedback at different levels of democratic bodies: student governments in high schools, committees in academic institutions, homeowner associations, and town administrations. We will investigate new levels of privacy that can be offered in the new framework, as well as techniques for the fair management and presentation of large numbers of ideas. Every member of a democratic organization benefits when better feedback from citizens to administrations is put in place.
-
CS-2004-08 (268 pages) (pdf)
Abstract
This dissertation represents the culmination of research into the development of a new algorithm for locating optimal solutions to difficult problems. This new algorithm is founded upon one of the most basic concepts in nature - so basic that it is in fact one of the four primary forces in physics: gravity.
It is called the Gravitational Emulation Local Search algorithm, or GELS. Four variants of the algorithm were developed, representing combinations of two basic methods of operation and two modes of search space exploration. Following development, a series of experiments were conducted to assess the capabilities of this new algorithm. Three test problems were used (Traveling Salesman, Bin Packing, and File Assignment). Instances of these problems were generated using several different problem sizes, then solved using three well-known comparison algorithms (Hill Climbing, Genetic Algorithm, and Simulated Annealing) in addition to the four variants of GELS.
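The abstract does not spell out the GELS update rules; purely as a speculative sketch of gravity-guided local search, assume each candidate move exerts a "force" proportional to its objective improvement divided by squared move distance, accumulated as velocity:

    # Speculative sketch of one gravity-guided search step (the abstract
    # does not give GELS's actual update rules; everything below is an
    # assumption for illustration).  Each candidate move exerts a "force"
    # proportional to its objective improvement over the current solution,
    # divided by squared move distance; forces accumulate as velocity.
    G = 6.672  # gravitational-constant-style scaling; any positive value works

    def gels_step(current, neighbors, cost, distance, velocity):
        for nb in neighbors:
            gain = cost(current) - cost(nb)        # "mass" ~ improvement
            d = max(distance(current, nb), 1.0)
            velocity[nb] = velocity.get(nb, 0.0) + G * gain / (d * d)
        best = max(neighbors, key=lambda nb: velocity[nb])
        return best if velocity[best] > 0 else current

    # Toy demo: descend toward the minimum of f(x) = (x - 7)^2.
    vel, x = {}, 0
    for _ in range(10):
        x = gels_step(x, [x - 1, x + 1], lambda s: (s - 7) ** 2,
                      lambda a, b: abs(a - b), vel)
    print(x)  # ends at or near the optimum x = 7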
The outcomes of the experiments were rigorously analyzed using a variety of statistical techniques. The results of the analyses showed that GELS was able to perform on a par with, and in many cases better than, the much more mature and extensively studied comparison algorithms. One of the GELS combinations achieved the best performance rating of any algorithm in solving instances of Bin Packing, and finished in a virtual tie with Simulated Annealing for solving instances of File Assignment and for general purpose performance. Two of the four GELS combinations were also shown to outperform Hill Climbing and the Genetic Algorithm.
GELS also performed its task efficiently. Two of the four combinations were shown to be more efficient in locating their solutions than any of the comparison algorithms except Hill Climbing (a greedy algorithm known to produce solutions in very few steps). The solutions produced by GELS were thus not only of comparable or better quality than those of the comparison algorithms, but usually were arrived at more efficiently.
-
CS-2004-09 (17 pages) (pdf)
Abstract
Most prevalent anomaly detection systems use training data to build models, which are then used to capture deviations resulting from possible intrusions. The efficacy of such systems is highly dependent on a training data set that is free of attacks, yet "clean" or labeled training data is hard to obtain. This paper addresses the very practical issue of refining unlabeled data to obtain a clean data set that can then train an online anomaly detection system.
Our system, called MORPHEUS, represents a system-call sequence using the spatial positions of motifs (subsequences) within the sequence. We also introduce a novel representation called sequence space, which denotes all sequences with respect to a reference sequence. Experiments on well-known data sets indicate that our sequence space can be used effectively to purge anomalies from unlabeled sequences. Although an unsupervised anomaly detection system in itself, our technique is used here for data purification: a "clean" training set thus obtained improves the performance of existing online host-based anomaly detection systems by increasing the number of attacks detected.
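A hedged sketch of the positional idea (MORPHEUS's actual motif extraction and sequence-space construction are more involved): represent a system-call sequence by where each motif first occurs.

    # Sketch of the positional idea (MORPHEUS's motif extraction and
    # "sequence space" construction are more involved): represent a
    # system-call sequence by where each motif first occurs.
    def motif_positions(seq, motifs):
        """Map each motif to its first position in seq (-1 if absent)."""
        def find(sub):
            for i in range(len(seq) - len(sub) + 1):
                if seq[i:i + len(sub)] == sub:
                    return i
            return -1
        return [find(m) for m in motifs]

    trace  = ["open", "read", "read", "write", "close"]
    motifs = [["open", "read"], ["read", "write"], ["write", "close"]]
    print(motif_positions(trace, motifs))  # [0, 2, 3]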
-
CS-2004-10 (10 pages) (pdf)
Abstract
Finding meaningful phrases in a document has been studied in various information retrieval systems as a way to improve performance. Many previous statistical phrase-finding methods had different aims, such as document classification. Some hybridize statistical and syntactic grammatical methods; others use correlation heuristics between words. We propose a new phrase-finding algorithm that adds correlated words one by one to the phrases found in the previous stage, maintaining high correlation within a phrase. Our results indicate that our algorithm finds more meaningful phrases than an existing algorithm, and that the previous algorithm can be improved by applying different correlation functions.
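The abstract does not name its correlation function; as an illustrative stand-in, pointwise mutual information can drive one stage of such phrase growth, extending each phrase by one word while the correlation stays high:

    import math
    from collections import Counter

    # One stage of greedy phrase growth, with pointwise mutual information
    # as a stand-in correlation function (the paper's own function is not
    # specified in the abstract; the threshold is arbitrary).
    def grow_phrases(tokens, phrases, min_pmi=3.0):
        """Extend each phrase by one word while correlation stays high."""
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        total = len(tokens)

        def pmi(w1, w2):
            p12 = bigrams[(w1, w2)] / max(total - 1, 1)
            if p12 == 0:
                return float("-inf")
            return math.log(p12 / ((unigrams[w1] / total) *
                                   (unigrams[w2] / total)))

        return [ph + [w] for ph in phrases for w in unigrams
                if pmi(ph[-1], w) >= min_pmi]

    # Stage 0 starts from single words; repeated stages grow longer phrases.
    tokens = "new york is far from new york city".split()
    print(grow_phrases(tokens, [[w] for w in set(tokens)], min_pmi=1.4))
    # includes ['new', 'york'], among others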
-
CS-2004-11 (23 pages) (pdf)
Abstract
Incentive auctions allow several participants to cooperate in clearing a set of offers and requests, while ensuring that no participant can do better than by bidding his true utility. This increases social welfare through efficient allocations, and is proven to have outcomes similar to the traditional English auction. The desk-mates (stable matching) problem comes from the need to place students in pairs for working on projects or sharing two-seat desks. The stable marriages problem consists of finding matches, each between a man and a woman, from two sets of men and women, respectively. Each person in these two problems has a (hopefully stable) secret preference between every two possible partners. The participants want to find an allocation satisfying their secret preferences without leaking any of these preferences, except for what a participant can infer from the identity of the partner/spouse recommended to him or her.
We use a distributed weighted constraint satisfaction (DisWCSP) framework in which the actual constraints are secrets known to no agent. They are defined by a set of functions over secret inputs from all agents. The solution is also kept secret, and each agent learns only the result of applying an agreed function to the solution. The new framework is shown to improve the efficiency of modeling the aforementioned problems. We show how to extend our previous techniques to solve securely the problems modeled in the new formalism, and we exemplify with the two problems in the title.
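For reference, the classical (non-private) Gale-Shapley algorithm solves the underlying stable marriage problem; the report's contribution is computing such matchings without revealing anyone's preference lists.

    # Classical (non-private) Gale-Shapley for stable marriage, shown only
    # as the cleartext baseline for the problem being secured.
    def gale_shapley(men_prefs, women_prefs):
        """Each prefs dict maps a person to an ordered list of the other
        side, most preferred first.  Returns a stable matching."""
        free = list(men_prefs)
        next_pick = {m: 0 for m in men_prefs}        # next woman to propose to
        rank = {w: {m: i for i, m in enumerate(p)}   # lower rank = preferred
                for w, p in women_prefs.items()}
        engaged = {}                                 # woman -> man
        while free:
            m = free.pop()
            w = men_prefs[m][next_pick[m]]
            next_pick[m] += 1
            if w not in engaged:
                engaged[w] = m
            elif rank[w][m] < rank[w][engaged[w]]:   # w prefers m: swap
                free.append(engaged[w])
                engaged[w] = m
            else:
                free.append(m)                       # w rejects m
        return engaged

    prefs_m = {"a": ["x", "y"], "b": ["x", "y"]}
    prefs_w = {"x": ["b", "a"], "y": ["a", "b"]}
    print(gale_shapley(prefs_m, prefs_w))  # {'x': 'b', 'y': 'a'}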
-
CS-2004-12 (77 pages) (pdf)
Abstract
Sparse-matrix/dense-vector multiplication algorithms are not as highly developed as algorithms for dense matrices. Dense matrix multiplication has been made efficient by exploiting data locality, parallelism, pipelining, and other optimizations. Sparse matrix algorithms, on the other hand, encounter low or no data locality, indirect addressing, and no easy way to exploit parallelism. In the effort to save storage and computational time, the topic of sparse matrix representation is therefore often revisited.
The first contribution of this thesis is the introduction of a new representation for sparse matrices. It is called the Reduced Index Sparse (RIS) representation because an ordered pair (j, v) with a single index j is used for every nonzero matrix element. This considerably reduces the required disk storage compared with other representations such as the coordinate (COO) format, which uses an ordered triple (a_ij, i, j) with two indices, i and j, for every nonzero element.
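The abstract does not spell out the RIS encoding; one natural reading is that the single index j linearizes (row, column) as j = row × ncols + column, so each nonzero costs one index instead of COO's two. A hedged sketch under that assumption:

    # Hedged sketch of index reduction.  Assumption: the single index j
    # linearizes (row, col) as j = row * ncols + col, so each nonzero costs
    # one index instead of COO's two; the thesis's exact encoding may differ.
    def coo_to_ris(rows, cols, vals, ncols):
        return [(r * ncols + c, v) for r, c, v in zip(rows, cols, vals)]

    def ris_matvec(ris, x, nrows, ncols):
        """y = A @ x for a matrix stored as (linearized index, value) pairs."""
        y = [0.0] * nrows
        for j, v in ris:
            y[j // ncols] += v * x[j % ncols]
        return y

    # 2x3 example: A = [[1, 0, 2], [0, 3, 0]]
    ris = coo_to_ris([0, 0, 1], [0, 2, 1], [1.0, 2.0, 3.0], ncols=3)
    print(ris_matvec(ris, [1.0, 1.0, 1.0], nrows=2, ncols=3))  # [3.0, 3.0]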
The second contribution of this thesis is a modified block cyclic data distribution for sparse matrices with arbitrary nonzero structure. And the third contribution is the implementation of this distribution using RIS to perform a parallel sparse-matrix / dense-vector multiplication on a distributed memory computer using MPI. The implementation achieved good load balancing for sparse matrices of different sparsity patterns.
Software was written to generate large random sparse matrices with well-preserved sparsity patterns. The desired number of nonzero elements and the matrix density are user inputs, and the sparsity pattern is also controlled by user input.
Performance of the new RIS storage scheme was measured against SPARSKIT routines. The efficiency and scalability of the parallel sparse-matrix/dense-vector multiplication were generally good, and timing analysis shows the new RIS scheme is competitive.
-
CS-2004-13 (59 pages) (pdf)
Abstract
Due to the expanding presence of old software, legacy systems, and obsolete platforms within many industries, software re-engineering has become a widespread methodology that helps engineers and software practitioners translate inflexible, unsupportable legacy software into maintainable software. Many companies today are investing in a variety of re-engineering techniques, such as the translation of source code to new code structures and target platforms, to ensure that future software maintenance can be performed efficiently and effectively. Applied with sound re-engineering principles, these techniques leverage the knowledge of previous engineering endeavors to mitigate risk and to ensure that the resulting code retains the functionality of the legacy systems while improving software quality.
This thesis evaluates the question: what effect does re-engineering legacy system software have on quality characteristics, with respect to maintainability? The research focuses on determining whether a re-engineering methodology that translates FORTRAN into C++ using an in-house developed translator can truly turn legacy procedural source code into maintainable object-oriented source code. Based on metric data and analysis, key measurements of the empirical data are interpreted to ascertain whether the translated code reflects the factors that influence software quality and maintainability. By addressing maintainability through a set of metrics tailored to assess it, a determination is made, based on the empirical evidence, whether to support the alternative hypothesis that the re-engineered translation of FORTRAN to C++ source code has produced maintainable software. The characteristics evaluated in this research include measures quantifying the class-related software quality attributes of analyzability, changeability, stability, and testability, built from metrics such as size, structure, complexity, cohesion, and coupling, with emphasis placed on object-oriented characteristics.
The results of this thesis indicate that the re-engineering effort to translate FORTRAN to C++ source code did produce maintainable software, on the basis that a majority of the metrics examined correlated with high maintainability standards. Based on this interpretation of the data, it is therefore recommended that the translator be retained and used in future re-engineering efforts.
-
CS-2004-14 (10 pages) (pdf)
Abstract
In recent years we have proposed a set of secure multiparty computations for solving distributed constraint satisfaction and optimization problems, with applications to areas like distributed scheduling, configuration, team-making, and auctions. Some of the newest versions are based on an arithmetic circuit for selecting a random element out of the elements with a given value in a (secret) array. Here we show how to improve that arithmetic circuit by a factor of at least 4. The improvement is based on an optimization of the functions and on the use of CSP solvers to exploit public constraints.
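For orientation only, the cleartext analogue of the circuit's task is simple; the report's contribution is performing it obliviously, on shared secrets, inside an arithmetic circuit:

    import random

    # Cleartext analogue of the circuit's task, for orientation only: pick
    # a uniformly random index among the entries of an array equal to a
    # given value.  The point of the report is doing this obliviously, on
    # shared secrets, inside an arithmetic circuit.
    def pick_random_matching(arr, value):
        matches = [i for i, x in enumerate(arr) if x == value]
        return random.choice(matches) if matches else None

    print(pick_random_matching([3, 7, 3, 1, 3], 3))  # one of 0, 2, 4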
-
CS-2004-15 (119 pages) (pdf)
Abstract
Malicious mobile code causes billions of dollars in damage every year, and that cost keeps increasing. Traditional signature-based anti-virus software is a reactive solution that cannot detect fast-spreading malicious code quickly enough to prevent widespread infection. If we hope to prevent widespread infection by future malicious mobile code, new prevention techniques must be developed that either stop a new infection completely, or at least limit its spread until signature-based anti-virus software can be updated.
Simulators exist that model the spread of malicious mobile code, but none currently exists that can efficiently model host-based and network-based spread-prevention techniques and the effect those techniques have on the spread of an infection. This thesis presents Hephaestus, a new simulator framework designed to meet these requirements while remaining flexible enough to meet future ones. The thesis also presents the results of four experiments: one that models spread with no prevention techniques applied, one that models the effects of a monoculture on spread, one that models the effect of lost detection, and one that shows the effects of tar pits.
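As a toy baseline of the kind of dynamics such simulators model (illustrative only; Hephaestus models host- and network-based prevention in far more detail), a random-contact infection model with a tunable blocking probability:

    import random

    # Toy random-contact infection model (illustrative only; Hephaestus
    # models host- and network-based prevention in far more detail).
    # p_block stands in for a prevention technique blocking an infection
    # attempt with the given probability.
    def simulate(n_hosts=1000, contacts_per_step=3, p_block=0.0, steps=10):
        infected = {0}                               # patient zero
        for _ in range(steps):
            newly = set()
            for _host in infected:
                for _ in range(contacts_per_step):
                    target = random.randrange(n_hosts)
                    if random.random() >= p_block:   # attempt not blocked
                        newly.add(target)
            infected |= newly
        return len(infected)

    print(simulate(p_block=0.0), simulate(p_block=0.9))  # prevention slows spread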
-
CS-2004-16 (10 pages) (pdf)
Abstract
Sending a message to a destination anonymously, so that the destination cannot figure out who the originator was, while ensuring that the message reaches the destination untampered with, is a difficult problem. Chaum's mix-net solves this problem but offers little motivation for anyone to set up a mix-net server. This paper discusses a method that not only allows monetary payments as an incentive to run a public mix-net server, but also provides some fraud detection at the time a message is sent. Compared with other methods, the method described here can also be considered reasonable from the user's perspective, because if the message is not delivered the user does not lose any money.
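The mix-net this builds on works by layered encryption: the sender wraps the message once per mix, and each mix peels exactly one layer. A structural sketch with placeholder enc/dec standing in for real public-key operations (the paper's payment and fraud-detection additions are not modeled):

    # Structure of Chaum-style layered encryption, with placeholder enc/dec
    # standing in for real public-key operations (the paper's payment and
    # fraud-detection additions are not modeled here).
    def enc(key, data):            # placeholder: seal `data` under `key`
        return ("locked", key, data)

    def dec(key, sealed):          # placeholder: open one layer
        tag, k, data = sealed
        assert tag == "locked" and k == key
        return data

    def wrap(message, route):
        """The sender encrypts in reverse route order, so each mix can peel
        only its own layer (a real mix-net layer also names the next hop)."""
        onion = message
        for mix in reversed(route):
            onion = enc(mix, onion)
        return onion

    route = ["mix1", "mix2", "mix3"]
    onion = wrap("hello", route)
    for mix in route:              # each mix peels its layer in turn
        onion = dec(mix, onion)
    print(onion)  # hello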
-
CS-2004-17 (121 pages) (pdf)
Abstract
The study of Artificial Intelligence attempts to simulate the processes of human intelligence in a set of computable algorithms. The purpose of random algorithms in this field is to provide a best-guess approach to identifying the unknown. In this thesis, research shows that random algorithms can break down many intelligent processes into sets of solvable problems. For example, solving puzzles and playing games involve the same estimating ability exercised in standard problems such as the Coupon Collector problem and the Monty Hall problem. This thesis shows applications of random algorithms in two overlapping categories of intelligent behavior: pattern recognition (to solve puzzles) and mind simulation (to play games). The first category focuses on one of the prominent intelligent processes, recognizing patterns amid randomness, which the human mind must continually and dynamically perform. The second category deals with simulating the processes of making decisions and solving problems in a more abstract and uncontrolled way, much like the unpredictable human mind.
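The Monty Hall problem cited above is a convenient concreteness check: simulation confirms that switching doors wins about two-thirds of the time.

    import random

    # The Monty Hall problem cited above, checked by simulation: switching
    # doors wins about two-thirds of the time.
    def monty_hall(trials=100_000):
        wins_switch = 0
        for _ in range(trials):
            car, pick = random.randrange(3), random.randrange(3)
            # the host opens a goat door that is neither the pick nor the car
            opened = next(d for d in range(3) if d != pick and d != car)
            switched = next(d for d in range(3) if d != pick and d != opened)
            wins_switch += (switched == car)
        return wins_switch / trials

    print(monty_hall())  # ~0.667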