Two important goals in the evaluation of artificial intelligence systems are to assess the merit of alternative design decisions in the performance of an implemented computer system and to analyze the impact in the performance when the system faces problem domains with different characteristics. Achieving these objectives enables us to understand the behavior of the system in terms of the theory and design of the computational model, to select the best system configuration for a given domain, and to predict how the system will behave when the characteristics of the domain or problem change. In addition, for case-based reasoning and other machine learning systems, it is important to evaluate the improvement in the performance of the system with experience (or with learning), to show that this improvement is statistically significant, to show that the variability in performance decreases with experience (convergence), and to analyze the impact of the design decisions on this improvement in performance.
We present a methodology for the evaluation of CBR and other AI systems through systematic empirical experimentation over a range of system configurations and environmental conditions, coupled with rigorous statistical analysis of the results of the experiments. We illustrate this methodology with a case study in which we evaluate a multistrategy case-based and reinforcement learning system which performs autonomous robotic navigation. In this case study, we evaluate a range of design decisions that are important in CBR systems, including configuration parameters of the system (e.g., overall size of the case library, size or extent of the individual cases), problem characteristics (e.g., problem difficulty), knowledge representation decisions (e.g., choice of representational primitives or vocabulary), algorithmic decisions (e.g., choice of adaptation method), and amount of prior experience (e.g., learning or training). We show how our methodology can be used to evaluate the impact of these decisions on the performance of the system and, in turn, to make the appropriate choices for a given problem domain and verify that the system does behave as predicted.
Read the paper: