Harvey Mudd CS Students Publish and Present WorkApril 29, 2021
The Harvey Mudd College Department of Computer Science celebrates students’ paper acceptances and presentations at peer-reviewed archival conferences this spring.
Each Harvey Mudd CS faculty member has an active research program that involves undergraduates. A substantial portion of this research is funded by the National Science Foundation. The department has several active NSF grants, and additional research support comes from sources including the Howard Hughes Medical Institute (HHMI), the Mellon Foundation, and the Baker and Rose-Hills Foundations.
“Metrinome: Path Complexity Predicts Symbolic Execution Path Explosion”
The paper and software demo by Gabe Bessler ’20, Josh Cordova ’22, Shaheen Cullen-Baratloo ’23, Sofiane Dissem ’22, Emily Lu ’21, Ibrahim Abughararh ’22 and Sofia Devin ’22 was accepted at the International Conference on Software Engineering 2021.
Lucas Bang, advisor and assistant professor of computer science, notes that Bessler has worked on the project since 2019. “Gabe’s enthusiasm, hard work, leadership, and breadth and depth of technical skills have been crucial to the success of this long-term project,” Bang says.
The team presented Metrinome, a tool for performing automatic path complexity analysis of C functions. The path complexity of a function is an expression that describes the number of paths through the function up to a given execution depth. Metrinome constructs the control flow graph (CFG) of a C function using LLVM utilities, analyzes that CFG using algebraic graph theory and analytic combinatorics, and produces a closed-form expression for the path complexity as well as the asymptotic path complexity of the function. Our experiments show that path complexity predicts the growth rate of the number of execution paths that Klee, a popular symbolic execution tool, is able to cover within a given exploration depth. Metrinome is open-source, available as a Docker image for immediate use, and all of our experiments and data are available in our repository and included in our Docker image.
“Undecidability of Underfitting”
Sonia Sehra ’20, David Flores ’21, Professor George Montañez. Second International Conference on Computing and Data Science 2021, Jan. 28, 2021
Abstract: Using recent machine learning results that present an information-theoretic perspective on underfitting and overfitting, we prove that deciding whether an encodable learning algorithm will always underfit a dataset, even if given unlimited training time, is undecidable. We discuss the importance of this result and potential topics for further research, including information-theoretic and probabilistic strategies for bounding learning algorithm fit.
“The Gopher’s Gambit: Survival Advantages of Artifact-Based Intention Perception” | Short presentation video
Cynthia Hom ’23, Amani Maina-Kilaas ’23, K Ginta (Biola University), C. Lay ’22 CMC, George Montañez. 13th International Conference on Agents and Artificial Intelligence 2021, Feb. 4-6, 2021
Abstract: Being able to assess and calculate risks can positively impact an agent’s chances of survival. When other intelligent agents alter environments to create traps, the ability to detect such intended traps (and avoid them) could be life-saving. We investigate whether there are cases for which an agent’s ability to perceive intention through the assessment of environmental artifacts provides a measurable survival advantage. Our agents are virtual gophers assessing a series of room-like environments, which are potentially dangerous traps intended to harm them. Using statistical hypothesis tests based on configuration coherence, the gophers differentiate between designed traps and configurations that are randomly generated and most likely safe, allowing them access to the food contained within them. We find that gophers possessing the ability to perceive intention have significantly better survival outcomes than those without intention perception in most of the cases evaluated.
“A Probabilistic Theory of Abductive Reasoning”
Nico Espinosa Dice ’22, Megan Kaye ’22, H. Ahmed SCR ’23, George Montañez. 13th International Conference on Agents and Artificial Intelligence 2021, Feb. 4-6, 2021
Abstract: We present an abductive search strategy that integrates creative abduction and probabilistic reasoning to produce plausible explanations for unexplained observations. Using a graphical model representation of abductive search, we introduce a heuristic approach to hypothesis generation, comparison, and selection. To identify creative and plausible explanations, we propose 1) applying novel structural similarity metrics to a search for simple explanations, and 2) optimizing for the probability of a hypothesis’ occurrence given known observations.
“A Castro Consensus: Understanding the Role of Dependence in Consensus Formation”
Jana Allen ’22, C. Lay CMC ’22, George Montañez. Conference on Truth and Trust Online 2020, Oct. 16-17, 2020
Abstract: Consensus is viewed as a proxy for truth in many discussions of science. When a consensus is formed by the independent and free deliberations of many, it is indeed a strong indicator of truth. Yet not all consensuses are independent and freely formed. We investigate the role of dependence and pressure in the formation of consensus, showing that strong polarization, external pressure, and dependence among individuals can force consensus around an issue, regardless of the underlying truth of the affirmed position. Dependence breaks consensus, often rendering it meaningless; a consensus can only be trusted to the extent that individuals are free to disagree with it.
“An Information-Theoretic Perspective on Overfitting and Underfitting”
Daniel Bashir ’20, George Montañez, Sonia Sehra ‘20, Pedro Sandoval Segura ’19, Julius Lauw ’20. Australasian Joint Conference on Artificial Intelligence 2020, Nov. 29-30, 2020
Abstract: We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.