Testing AI-based Applications: Challenges and Solutions

Khaled Slhoub

Florida Tech

Abstract

Artificial Intelligence is transforming modern software through applications such as Generative AI (GenAI), which enable tasks like natural language understanding, content creation, and image generation. These AI-based applications, however, pose distinct challenges compared to traditional software, particularly because of their reliance on data, dynamic learning, and probabilistic outputs. Issues such as bias, explainability, security, and ethical considerations complicate testing and evaluation. Rigorous testing is critical to ensuring the quality of the content these systems produce, protecting against harmful outputs, and maintaining reliability and fairness. This presentation explores the key challenges of testing AI-based applications and emphasizes the importance of thorough testing to ensure correctness, accuracy, fairness, safety, and high-quality outcomes in AI-driven systems.

About the Speaker

Dr. Khaled Slhoub is an Associate Professor in the Department of Electrical Engineering and Computer Science at Florida Institute of Technology, where he also serves as the Program Chair for Computer Information Systems and Human-Centered Design. His research focuses on software engineering, particularly in software requirements, testing, and measurement, as well as the formal development of agent-based systems, social media analysis, and the evaluation and testing of AI-based systems. A key aspect of his work involves creating standardized frameworks to formalize the development of agent-based systems. Currently, he is dedicated to analyzing and improving agent-oriented methodologies with the goal of providing unified development approaches that can be applied in industrial contexts. Additionally, Dr. Slhoub is developing a framework to detect and manage disruptive behaviors in distributed social bots, employing policing bots to assess risk. His research also extends to the verification and testing of autonomous systems, focusing on detecting irregular behavior. He is further exploring methods to evaluate and test Generative AI models to ensure their accuracy, fairness, and robustness across various domains.