{"id":14070,"date":"2024-06-04T09:40:16","date_gmt":"2024-06-04T16:40:16","guid":{"rendered":"https:\/\/www.hmc.edu\/clinic\/?p=14070"},"modified":"2024-06-04T09:41:11","modified_gmt":"2024-06-04T16:41:11","slug":"automatic-spark-tuning-for-low-waste-batch-processing","status":"publish","type":"post","link":"https:\/\/www.hmc.edu\/clinic\/2024\/automatic-spark-tuning-for-low-waste-batch-processing\/","title":{"rendered":"Automatic Spark Tuning For Low Waste Batch Processing"},"content":{"rendered":"<h2 class=\"project-sponsor-dept-year\"><span class=\"project-sponsor\">Quantcast<\/span> <span class=\"project-dept-year\">Computer Science\/Mathematics, 2023\u201324<\/span><\/h2>\n<div class=\"project-team\">\n<p><strong>Liaison(s):<\/strong> Theo Bayard de Volo PZ \u201922, Scott McCoy<br \/>\n<strong>Advisor(s):<\/strong> Mark Kampe<br \/>\n<strong>Student(s):<\/strong> Tesfa Asmara (TL-S), Liam Martin (TL-F), Teja Reddy, Jimmy Chen, Jaime Pacheco<\/p>\n<\/div>\n<div class=\"project-abstract\">\n<p>Quantcast is an American technology company, founded in 2006, that specializes in AI-driven real-time advertising, audience insights, and measurement. Many of Quantcast\u2019s data workflows run atop Apache Spark. Although Spark has many built-in optimizations, Quantcast has observed that the clusters running these workflows leave a significant portion of their memory and processor capacity unused. The goal of this project was to develop a system extension that automatically tunes Spark configurations. We developed a Spark plugin that captures critical statistics that Spark does not expose directly, such as memory and CPU utilization. We have also been training linear regression and decision tree models on simulated data sets to recommend more efficient cluster configurations for Spark jobs. Next steps involve assessing the models\u2019 predictive accuracy and reliability by using our plugin to apply them to real data from Spark jobs at Quantcast.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Quantcast Computer Science\/Mathematics, 2023\u201324 Liaison(s): Theo Bayard de Volo PZ \u201922, Scott McCoy Advisor(s): Mark Kampe Student(s): Tesfa Asmara (TL-S), [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[140,185],"tags":[673],"class_list":["post-14070","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-mathematics","tag-quantcast"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/posts\/14070","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/comments?post=14070"}],"version-history":[{"count":2,"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/posts\/14070\/revisions"}],"predecessor-version":[{"id":14128,"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/posts\/14070\/revisions\/14128"}],"wp:attachment":[{"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/media?parent=14070"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/categories?post=14070"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hmc.edu\/clinic\/wp-json\/wp\/v2\/tags?post=14070"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}