Computer Science Technical Reports
2005
-
CS-2005-01 (130 pages) (pdf)
Title: Web-based Trend Analysis of Rocket Data Using XML
Authors: Mark Randall Gibson
Contact Email Address:
Faculty Sponsor: Dr. Rhoda Baggs Koss
TR posted: 22 January 2005
Abstract
This paper discusses and demonstrates the feasibility of using XML to transfer, compare, and view data created during automated testing of black boxes, where the data is stored both in simple text format and in a relational database. The system arises from the reliability department's need to review the test data from the flight hardware at all testing locations. For the purposes of this paper we refer to this system as the "Test Data Retrieval System" (TDRS). The review is performed to determine whether any negative trends in the hardware are surfacing. The data resides in two different filing systems and must be displayed together to be of much value.
This study presents a system consisting of a web server and several web pages that use PHP scripting to interface with a MySQL database and with text files to extract test data. The text files contain the acceptance test data of the hardware (the black boxes) recorded at the vendor and at the final assembly plant. The system uses Apache as the web server, with the XML and PHP modules installed. XML is used to format the data so that it remains portable to platforms other than Windows. PHP is employed to transmit the XML-formatted data; it makes dynamic web page generation relatively easy and is likewise portable to platforms other than Windows.
One of the main benefits of the proposed system is that PHP provides seamless integration of popular open-source software applications to achieve the desired results. Another benefit is that basic code is readily available from many sources and requires little modification to meet the requirements of this project.
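As an illustration of the integration described above, the following minimal Python sketch merges test records from a vendor text file and a relational database and emits them as a single XML document for trend review. The original system uses PHP, Apache, and MySQL; the file names, table name, and record layout below are assumptions, not details taken from the report.

# Minimal sketch: merge test records from a text file and a relational
# database into one XML document for side-by-side trend review.
# The file layout, table name, and column names are hypothetical.
import sqlite3
import xml.etree.ElementTree as ET

def load_text_records(path):
    """Parse 'unit_id,test_date,parameter,value' lines from vendor test files."""
    records = []
    with open(path) as f:
        for line in f:
            unit_id, test_date, parameter, value = line.strip().split(",")
            records.append(("vendor", unit_id, test_date, parameter, float(value)))
    return records

def load_db_records(db_path):
    """Read the same fields from a (hypothetical) acceptance_tests table."""
    with sqlite3.connect(db_path) as con:
        rows = con.execute(
            "SELECT unit_id, test_date, parameter, value FROM acceptance_tests"
        ).fetchall()
    return [("assembly_plant",) + tuple(r) for r in rows]

def to_xml(records):
    root = ET.Element("test_data")
    for source, unit_id, test_date, parameter, value in records:
        rec = ET.SubElement(root, "record", source=source, unit=unit_id, date=test_date)
        ET.SubElement(rec, "parameter", name=parameter).text = str(value)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    merged = load_text_records("vendor_tests.txt") + load_db_records("plant_tests.db")
    print(to_xml(merged))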
-
CS-2005-02 (7 pages) (pdf)
Abstract
Besides voting, another way of practicing democracy is by signing popular/citizen initiatives or certain types of optional referendums. We address the problems related to the electronic remote signing of support for referendums and popular initiatives (e-referendums). We propose publicly verifiable private credentials (PVPCs), a type of certified pseudonymous digital signature for which one can publicly prove that only eligible users (belonging to a specified group) received them and that no user received two of them. They can be used to sign citizen initiatives. Our technique generates PVPCs that can be publicly proven to belong to an unknown permutation of the eligible users (proving the aforementioned property). We also argue that e-referendum systems can achieve more success than remote e-voting, being more robust to the main weaknesses of the SERVE project, namely denial-of-service, man-in-the-middle, and virus attacks. In particular, we provide a technique to reduce the risk of exposure to virus attacks by incorporating manually generated passwords into computer-generated random numbers.
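The final sentence above suggests deriving protocol randomness from both a manually entered password and computer-generated random numbers, so that malware controlling only the machine cannot predict the result. A minimal Python sketch of that idea follows; the HMAC-based derivation is an illustrative assumption, not the paper's construction.

# Sketch: combine a user-memorized password with machine randomness so that
# neither source alone determines the secret value used in the protocol.
# The derivation below (HMAC of machine randomness keyed by the password)
# is an illustrative assumption.
import hashlib
import hmac
import secrets

def derive_signing_nonce(manual_password: str, num_bytes: int = 32) -> bytes:
    machine_random = secrets.token_bytes(num_bytes)      # generated by the computer
    # A virus that controls machine_random still cannot predict the output
    # without the manually entered password, and vice versa.
    return hmac.new(manual_password.encode("utf-8"), machine_random,
                    hashlib.sha256).digest()

nonce = derive_signing_nonce("correct horse battery staple")
print(nonce.hex())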
-
CS-2005-03 (12 pages) (pdf)
Abstract
Popular/citizen initiatives provide a way to include constitutional or statutory proposals on the ballot (e.g., at an election) if enough signatures are collected in support of the proposal. Once citizens are able to digitally sign such initiatives remotely, the next challenge will be to provide support for verified eligible citizens to debate running initiatives. Intelligent ways of structuring information for easy access and cooperation are a major research interest in computer science, with results such as the WWW, the Semantic Web, forums, blogs, and Slashdot. We propose here a new interaction paradigm for debates in a setting where participants are verified for eligibility and have equal weight. The estimation of the popular support for proposed initiatives and emerging comments (justifications) forms a basis (and a by-product) of such debates. The idea central to our approach (akin to Slashdot) is to use voting for visibility and visibility for voting. A novelty lies in generalizing this concept to a second layer, namely to initiatives (articles). To defend users from spam and distasteful language, we propose a scalable solution based on a collaborative filtering paradigm. The technique also helps authors write texts that are acceptable to the users, and can be applied to forums in general. The paradigm can find additional applications in supporting shareholder debates for decision making, as well as debates of working groups and committees, or the administration of other entities such as towns and countries. An implementation of the system is available.
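A minimal Python sketch of the "voting for visibility, visibility for voting" loop described above: only the currently best-supported comments are displayed, and only displayed comments can gather further votes. The data structures and the scoring rule are illustrative assumptions, not the paper's design.

# Sketch of the voting/visibility loop: the most-supported comments are shown,
# and a vote counts only for a comment the voter could actually see.
from collections import defaultdict

votes = defaultdict(int)          # comment_id -> net votes from eligible users

def visible_comments(comment_ids, k=5):
    """Show the k comments with the highest support."""
    return sorted(comment_ids, key=lambda c: votes[c], reverse=True)[:k]

def cast_vote(user_visible, comment_id, delta):
    """A vote counts only for a comment the user could actually see."""
    if comment_id in user_visible:
        votes[comment_id] += delta

comments = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
shown = visible_comments(comments, k=3)
cast_vote(shown, shown[0], +1)    # support raises visibility for later readers
print(visible_comments(comments, k=3))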
-
CS-2005-04 (14 pages) (pdf)
Abstract
We describe a multi-dimensional time series anomaly detection method in which each point in a test series is required to match the value, slope, and curvature of a point seen in training (with an optional sequential constraint), and a method of generating a model in the form of concise, human-comprehensible, and editable rules. Training time complexity is O(n log n), and testing is O(n). The method generalizes to multiple training series by allowing test points to fall within the range bounded by the training data. We use this approach to detect test failures in Marotta fuel control valves used on the Space Shuttle.
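A minimal Python sketch of the per-point test described above: each test point's (value, slope, curvature) feature vector must lie close to some feature vector seen in training. The finite-difference features, the tolerance, and the brute-force nearest-match search are illustrative assumptions; the report's O(n log n) training uses a more efficient structure.

# Sketch: a test point is anomalous if no training point matches its value,
# slope, and curvature within a tolerance. Features and tolerances are
# illustrative assumptions.
import numpy as np

def features(x):
    """Value, slope, and curvature estimated by finite differences."""
    x = np.asarray(x, dtype=float)
    return np.stack([x[2:], x[2:] - x[1:-1], x[2:] - 2 * x[1:-1] + x[:-2]], axis=1)

def train(series):
    return features(series)              # every training point is a candidate match

def is_anomalous(test_point_feats, train_feats, tol):
    """A point is anomalous if no training point matches within tolerance."""
    close = np.all(np.abs(train_feats - test_point_feats) <= tol, axis=1)
    return not close.any()

train_feats = train([0, 1, 2, 3, 4, 5, 6])
tol = np.array([0.5, 0.5, 0.5])
for i, f in enumerate(features([0, 1, 2, 3, 8, 5, 6])):
    if is_anomalous(f, train_feats, tol):
        print("anomaly at index", i + 2)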
-
CS-2005-05 (8 pages) (pdf)
Abstract
A user's interest in a web page can be estimated by unobtrusively (implicitly) observing his or her behavior rather than asking for feedback directly (explicitly). Implicit methods are naturally less accurate than explicit methods, but they do not waste a user's time or effort. Implicit indicators of a user's interests can also be used to create models that change with a user's interests over time. Research has shown that a user's behavior is related to his or her interest in a web page. We evaluate previously studied implicit indicators and examine the time spent on a page in more detail. For example, we observe whether a user is actually looking at the monitor while we measure the time spent on a web page. Our results indicate that duration is related to a user's interest in a web page regardless of the user's attention to it.
-
CS-2005-06 (318 pages) (pdf)
Abstract
Today's network security systems, such as intrusion detection systems, intrusion prevention systems, and firewalls, are good at reacting to attacks as they occur or shortly afterwards. Current security systems lack the ability to identify and detect the activity that usually precedes an attack, known as network reconnaissance. In this thesis we develop a technique that can assist current security systems in detecting hostile network reconnaissance, so that network attacks can be anticipated and mitigated.
-
CS-2005-07 (21 pages) (pdf)
Title: Zero-Knowledge Proofs for Mix-nets of Secret Shares
Authors: Dr. Marius C. Silaghi
Contact Email Address:
Faculty Sponsor: Dr. Marius C. Silaghi
TR number assignment date: 7 March 2005
TR posted: 7 March 2005
TR modified: 9 March 2005
TR modified: 14 March 2005
TR modified: 13 April 2005
TR modified: 26 April 2005
TR modified: 1 May 2005
Abstract
Mix-nets can be used to shuffle vectors of shared secrets. This operation can be an important building block for solving combinatorial problems whose constraints are secret to different participants. A main contribution of this paper is to show how participants in the mix-net can provide zero-knowledge proofs to convince each other that they do not tamper with the shuffled secrets and that inverse permutations are correctly applied. The approach is related to the proof of knowing an isomorphism between large graphs. We also provide a detailed review of, and comparison with, Chaum's and Merritt's mix-nets, including their rationales and analysis.
Mix-nets offer only computational security since participants get encrypted versions of all the shares. Information theoretically secure algorithms can be obtained using secure arithmetic circuit evaluation. The arithmetic circuit previously proposed for shuffling a vector of size k was particularly slow. Here we also propose a new arithmetic circuit for performing the operation in O(k^2) multiplications and requiring k-1 shared random numbers with different domains. Another contribution is to provide more efficient arithmetic circuits for combinatorial optimization problems, exploiting recent secure primitives. Examples are shown of how these techniques can be used in the Secure Multi-party Computation (SMC) language [Sil04b]. SMC's procedures for generating uniformly distributed random permutations are also detailed.
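The abstract mentions procedures for generating uniformly distributed random permutations using k-1 random numbers with different domains; in the clear (without secret sharing) this corresponds to the Fisher-Yates shuffle, sketched below in Python. The secure multi-party version from the paper is not shown.

# Fisher-Yates shuffle: generates each of the k! permutations with equal
# probability, using k-1 random draws whose domains have different sizes.
# This is only the in-the-clear analogue of the shared procedure above.
import random

def uniform_random_permutation(k):
    perm = list(range(k))
    for i in range(k - 1, 0, -1):
        j = random.randint(0, i)          # one random draw per position
        perm[i], perm[j] = perm[j], perm[i]
    return perm

print(uniform_random_permutation(8))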
-
CS-2005-08 (9 pages) (pdf)
Title: Trajectory Boundary Modeling of Time Series for Anomaly Detection
Authors: Matthew V. Mahoney & Philip K. Chan
Contact Email Address:
Faculty Sponsor: Dr. Philip K. Chan
TR number assignment date: 23 March 2005
TR posted: 23 March 2005
TR modified: 2 May 2005
Abstract
We address the problem of online detection of unanticipated modes of mechanical failure given a small set of time series under normal conditions, with the requirement that the anomaly detection model be manually verifiable and modifiable. We specify a set of time series features, which are linear combinations of the current and past values, and model the allowed feature values by a sequence of minimal bounding boxes containing all of the training trajectories. The model can be constructed in O(n log n) time. If there are at most three features, the model can be displayed graphically for verification; otherwise a table is used. Test time is O(n) with a guaranteed upper bound on computation time for each test point. The model compares favorably with anomaly detection algorithms based on Euclidean distance and dynamic time warping on the Space Shuttle Marotta fuel control valve data set.
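A minimal Python sketch of the trajectory-boundary idea described above: the training trajectory, in feature space, is covered by a sequence of minimal bounding boxes, and a test point is flagged if it falls outside every box. The segment length and the (value, slope) features are illustrative assumptions; the report constructs the boxes in O(n log n).

# Sketch: cover the training trajectory with per-segment bounding boxes and
# flag test points that fall outside all of them.
import numpy as np

def boxes(train_feats, seg_len=20):
    """Minimal bounding box (per-dimension min/max) for each trajectory segment."""
    out = []
    for start in range(0, len(train_feats), seg_len):
        seg = train_feats[start:start + seg_len]
        out.append((seg.min(axis=0), seg.max(axis=0)))
    return out

def outside_all_boxes(point, box_list):
    return all(np.any((point < lo) | (point > hi)) for lo, hi in box_list)

# Two-dimensional features: (value, slope) of a sine wave as toy training data.
t = np.linspace(0, 4 * np.pi, 400)
train = np.column_stack([np.sin(t)[1:], np.diff(np.sin(t))])
model = boxes(train)

test = np.array([[0.2, 0.03], [3.0, 0.0]])         # second point is far off the trajectory
print([outside_all_boxes(p, model) for p in test])  # expected: [False, True]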
-
CS-2005-09 (80 pages) (pdf)
Abstract
Keyword spotting techniques deal with the recognition of predefined vocabulary keywords in a voice stream. This research uses HMM-based keyword spotting algorithms for this purpose. The three most important components of a keyword detection system are the confidence measure, the pruning technique, and the evaluation of results. We suggest that the best match for a keyword is an alignment in which all constituent states have high emission probabilities. Therefore, the score of even the worst subsequence must be better than a threshold, and a path can be represented by the score of its worst matching subsequence. This confidence measure is called Real Fitting. The harsher the pruning in a technique, the fewer paths survive; this increases the speed as well as the risk of pruning the best match. Three levels of pruning are explored, and their results and performance are compared. Since the proposed algorithms do not follow the principle of optimality, possible optimal paths could be pruned. We therefore propose a pruning scheme that retains a set of paths in every frame instead of a single path. Finally, evaluation plays an important role in assessing the performance of these algorithms. Since the choice of keywords can affect the results, an equal-opportunity evaluation is proposed, which evaluates each algorithm on its respective best keywords.
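A minimal Python sketch of the "worst subsequence" idea behind Real Fitting: the confidence of an aligned keyword hypothesis is the lowest average frame log-probability over any sufficiently long contiguous stretch, and a hypothesis is accepted only if even that worst stretch clears the threshold. The minimum length, the brute-force search, and the toy numbers are illustrative assumptions.

# Sketch: score an alignment by its worst contiguous stretch of frames.
import numpy as np

def worst_subsequence_score(frame_logprobs, min_len=3):
    x = np.asarray(frame_logprobs, dtype=float)
    prefix = np.concatenate([[0.0], np.cumsum(x)])
    worst = np.inf
    for i in range(len(x)):
        for j in range(i + min_len, len(x) + 1):
            worst = min(worst, (prefix[j] - prefix[i]) / (j - i))
    return worst

good = [-1.0, -1.2, -0.9, -1.1, -1.0, -1.3]       # uniformly strong frames
weak = [-1.0, -1.1, -4.5, -4.8, -1.0, -1.2]       # a poorly matching stretch
threshold = -2.0
for name, path in [("good", good), ("weak", weak)]:
    score = worst_subsequence_score(path)
    print(name, round(score, 2), "accept" if score > threshold else "reject")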
-
CS-2005-10 (101 pages) (pdf)
Abstract
We present Mindshare, a system for small-group collaboration using peer-to-peer networking technology. This paper describes the motivation behind its design, how it benefits users, and the details of its construction and operation. The solution focuses on the needs of small collaborating groups with limited computing experience and resources. Mindshare allows the group to share an unlimited number of files and to visualize them in a unified hierarchical file system. Mindshare synchronizes the files between users without user input. Its robust design allows files to be shared even when the owner is offline and allows users to work with the group's files while not connected.
-
CS-2005-11 (10 pages) (pdf)
Abstract
Web search engines are usually designed to serve all users, without considering the interests of individual users. Personalized web search incorporates an individual user's interests when deciding relevant results to return. We propose to learn a user profile, called a user interest hierarchy (UIH), from web pages that are of interest to the user. The user's interest in web pages will be determined implicitly, without directly asking the user. Using the implicitly learned UIH, we study methods that rank the results from a search engine. Experimental results indicate that our personalized ranking methods, when used with a popular search engine, can yield more relevant web pages for individual users.
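A minimal Python sketch of re-ranking with a user interest hierarchy: terms deeper in the hierarchy represent more specific interests and receive higher weight when scoring a result. The example hierarchy, the depth-based weights, and the scoring rule are illustrative assumptions rather than the paper's exact functions.

# Sketch: score search results against a user interest hierarchy (UIH),
# weighting deeper (more specific) term matches more heavily.
uih = {
    "terms": {"music"},                       # general interest (depth 1)
    "children": [
        {"terms": {"guitar", "acoustic"},     # more specific (depth 2)
         "children": [{"terms": {"fingerstyle"}, "children": []}]},
    ],
}

def score_text(text, node, depth=1):
    words = set(text.lower().split())
    s = depth * len(words & node["terms"])    # deeper matches weigh more
    return s + sum(score_text(text, c, depth + 1) for c in node["children"])

results = [
    "cheap flights and hotel deals",
    "acoustic guitar music reviews",
    "fingerstyle acoustic guitar lessons",
]
reranked = sorted(results, key=lambda r: score_text(r, uih), reverse=True)
print(reranked)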
-
CS-2005-12 (204 pages) (pdf)
Abstract
Most web search engines are designed to serve all users in a general way, without considering the interests of individual users. In contrast, personalized web search engines incorporate an individual user's interests when choosing relevant web pages to return. In order to provide a more robust context for personalization, a user interest hierarchy (UIH) is presented. The UIH extracts a continuum of general to specific user interests from web pages and generates a uniquely personalized order to search results.
This dissertation consists of five main parts. First, a divisive hierarchical clustering (DHC) algorithm is proposed to group words (topics) into a hierarchy in which more general interests are represented by larger sets of words. Second, a variable-length phrase-finding (VPF) algorithm that finds meaningful phrases in a web page is introduced. Third, two new desirable properties that a correlation function should satisfy are proposed. These properties help in understanding the general characteristics of a correlation function and in choosing or devising appropriate correlation functions for an application domain. Fourth, methods are examined that rank the results from a search engine according to user interests, based on the contents of a web page and the UIH. Fifth, previously studied implicit indicators of interesting web pages are evaluated, and the time spent on a web page and other new indicators are examined in more detail.
Experimental results indicate that the personalized ranking methods presented in this study, when used with a popular search engine, can yield more relevant web pages for individual users. The precision/recall analysis showed that our weighted term scoring function could provide more accurate ranking than Google on average.
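A minimal Python sketch of the divisive hierarchical clustering (DHC) step described in the first part above: words that co-occur strongly stay together, and raising a correlation threshold splits a cluster into child clusters. The co-occurrence correlation, the threshold schedule, and the toy documents are illustrative assumptions, not the dissertation's exact method.

# Sketch: divisive hierarchical clustering of words by co-occurrence.
from itertools import combinations

def cooccurrence(docs):
    counts = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc.lower().split())), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def components(words, counts, threshold):
    """Connected components of the graph keeping only edges at or above threshold."""
    parent = {w: w for w in words}
    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w
    for (a, b), c in counts.items():
        if c >= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())

def dhc(words, counts, threshold=1, max_threshold=3):
    node = {"words": sorted(words), "children": []}
    if len(words) > 1 and threshold < max_threshold:
        for group in components(words, counts, threshold + 1):
            if len(group) < len(words):          # only genuine splits recurse
                node["children"].append(dhc(group, counts, threshold + 1, max_threshold))
    return node

docs = ["guitar chord lesson", "guitar chord tab", "guitar lesson video",
        "soccer match score", "soccer match highlights"]
counts = cooccurrence(docs)
vocabulary = sorted({w for d in docs for w in d.lower().split()})
print(dhc(vocabulary, counts))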
-
CS-2005-13 (12 pages) (pdf)
Abstract
This paper addresses the problem of detecting keywords in unconstrained speech. The proposed algorithms search for the speech segment maximizing the average observation probability along the most likely path in the hypothesized keyword model. As is known, this approach (sometimes referred to as the sliding model method) requires a relaxation of the begin/end points of the Viterbi matching, as well as a time normalization of the resulting score. This makes solutions computationally expensive (i.e., LN^2/2 basic operations for keyword HMM models with L states and utterances with N frames).
We present here two alternative (quite simple and efficient) solutions to this problem. a) First, we provide a method that finds the optimal segmentation according to the criterion of maximizing the average observation probability. It uses dynamic programming as a step, but does not require scoring all possible begin/end points. While the worst case remains O(LN^2), this technique converged in at most 3(L+2)N basic operations in every experiment across two very different applications. b) The second proposed algorithm does not provide a segmentation but can be used for the decision problem of whether the utterance should be classified as containing the keyword, given a predefined threshold on the acceptable average observation probability. This allows the algorithm to be even faster, with a fixed cost of (L+2)N.
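For comparison, a minimal Python sketch of the baseline sliding model search that motivates solution a): for every begin frame, a left-to-right Viterbi pass of the keyword HMM is run and the best path score into the final state is length-normalized at every end frame, which is where the LN^2/2 cost comes from. The two-transition topology and the synthetic emission probabilities are illustrative assumptions.

# Sketch of the baseline sliding-model search (roughly L*N^2/2 operations).
import numpy as np

def sliding_model_score(emis, log_stay=np.log(0.5), log_next=np.log(0.5)):
    """emis[t, s] = log P(observation at frame t | keyword state s)."""
    N, L = emis.shape
    best = (-np.inf, None, None)                    # (score, begin, end)
    for b in range(N):
        # delta[s] = best log-score of a path over frames b..e ending in state s
        delta = np.full(L, -np.inf)
        delta[0] = emis[b, 0]
        for e in range(b, N):
            if e > b:
                stay = delta + log_stay
                move = np.concatenate([[-np.inf], delta[:-1] + log_next])
                delta = np.maximum(stay, move) + emis[e]
            avg = delta[L - 1] / (e - b + 1)        # time-normalized score
            if avg > best[0]:
                best = (avg, b, e)
    return best

emis = np.log(np.full((50, 5), 0.1))
emis[20:35] = np.log(0.8)                           # a well-matching segment
print(sliding_model_score(emis))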
-
CS-2005-14 (11 pages) (pdf)
Title: Secure Stochastic Multi-party Computation for Combinatorial Problems
Authors: Marius C. Silaghi & Gerhard Friedrich
Contact Email Address:
Faculty Sponsor: Dr. Marius C. Silaghi
TR number assignment date: 21 May 2005
TR posted: 21 May 2005
TR modified: 23 May 2005
TR modified: 27 May 2005
Abstract
High levels of security often imply that the computation time should be independent of the values of the involved secrets. When the expected answer of the solver is either a solution or "unsatisfiable", this assumption leads to algorithms that always take the worst-case computation time. This is particularly disturbing for NP-hard combinatorial problems.
Here we start from the observation that sometimes (especially for hard problems) users find it acceptable to receive as an answer either a solution, the answer "unsatisfiable", or a failure meaning "don't know". More precisely, users accept "incomplete" solvers. For certain problems, privacy reasons lead users to prefer an answer meaning "don't know" even when the secure multi-party computation could have proven "unsatisfiable" (to avoid revealing that all alternatives are infeasible). While the solution proposed previously is slower than complete algorithms, here we show secure stochastic solutions that are faster than complete solvers, allowing larger problem instances to be addressed.
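In the clear (without the secure multi-party machinery), an "incomplete" solver of the kind accepted above can be a stochastic local search with a fixed budget that returns either a solution or "don't know", never "unsatisfiable". A minimal Python sketch over a toy constraint satisfaction problem follows; the min-conflicts-style search and the budget are illustrative assumptions.

# Sketch of an "incomplete" stochastic solver: it returns either a solution or
# "don't know", and its running time is bounded by a fixed budget.
import random

def solve_incomplete(variables, domains, constraints, max_steps=10_000):
    assignment = {v: random.choice(domains[v]) for v in variables}
    for _ in range(max_steps):
        violated = [c for c in constraints if not c(assignment)]
        if not violated:
            return assignment                      # a solution
        var = random.choice(variables)             # perturb one variable
        assignment[var] = min(
            domains[var],
            key=lambda val: sum(not c({**assignment, var: val}) for c in constraints),
        )
    return None                                    # "don't know"

# Toy CSP: three variables that must be pairwise different.
variables = ["x", "y", "z"]
domains = {v: [1, 2, 3] for v in variables}
constraints = [lambda a, p=p, q=q: a[p] != a[q]
               for p, q in [("x", "y"), ("y", "z"), ("x", "z")]]
print(solve_incomplete(variables, domains, constraints) or "don't know")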
-
CS-2005-15 (7 pages) (pdf)
Abstract
It is known that, in general, Constraint Optimization Problems (COPs) are NP-hard. Existing arithmetic circuits for secure protocols solving such problems are exponential in the number of variables, $n$. Recently a combinatorial optimization algorithm was proposed whose cost is exponential only in a parameter of a depth-first search (DFS) tree of the constraint graph, a parameter smaller than $n$. We show how to construct an arithmetic circuit with this property that solves any COP. For forest constraint graphs, this leads to a linear-cost secure solver.
-
CS-2005-16 (6 pages) (pdf)
Abstract
Until recently the state of the art in lossless data compression was prediction by partial match (PPM). A PPM model estimates the next-symbol probability distribution by combining statistics from the longest matching contiguous contexts in which each symbol value is found. We introduce a context mixing model which improves on PPM by allowing contexts which are arbitrary functions of the history. Each model independently estimates a probability and confidence that the next bit of data will be 0 or 1. Predictions are combined by weighted averaging. After a bit is arithmetic coded, the weights are adjusted along the cost gradient in weight space to favor the most accurate models. Context mixing compressors, as implemented by the open source PAQ project, are now top ranked on several independent benchmarks.
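A minimal Python sketch of the mixing loop described above: two toy context models each predict the probability that the next bit is 1, the predictions are combined by weighted averaging, and after the true bit is seen the weights are moved along the gradient of the coding cost so that the more accurate model gains influence. The model set, the learning rate, and the update details are illustrative assumptions, not PAQ's implementation.

# Sketch of context mixing: weighted averaging of bit predictions, with
# weights adjusted along the coding-cost gradient after each bit.
import math

class BitCounter:
    """Predicts P(1) from counts of 0s and 1s seen in one context."""
    def __init__(self):
        self.n = [1, 1]                      # Laplace-smoothed counts
    def predict(self):
        return self.n[1] / (self.n[0] + self.n[1])
    def update(self, bit):
        self.n[bit] += 1

class Mixer:
    def __init__(self, n_models, lr=0.02):
        self.w = [1.0 / n_models] * n_models
        self.lr = lr
    def mix(self, probs):
        p = sum(w * p for w, p in zip(self.w, probs))
        return min(max(p, 1e-6), 1 - 1e-6)
    def update(self, probs, p, bit):
        # gradient of the coding cost -log(p_bit) with respect to each weight
        grad_common = (bit - p) / (p * (1 - p))
        self.w = [w + self.lr * grad_common * pi for w, pi in zip(self.w, probs)]

order0 = BitCounter()                         # context: nothing
order1 = {0: BitCounter(), 1: BitCounter()}   # context: previous bit
mixer = Mixer(2)

bits = [1, 1, 0, 1, 1, 0, 1, 1, 0] * 20       # repetitive toy input
prev, total_cost = 0, 0.0
for bit in bits:
    probs = [order0.predict(), order1[prev].predict()]
    p = mixer.mix(probs)
    total_cost += -math.log2(p if bit else 1 - p)   # ideal arithmetic-code length
    mixer.update(probs, p, bit)
    order0.update(bit)
    order1[prev].update(bit)
    prev = bit
print(f"{total_cost:.1f} bits to code {len(bits)} bits")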