{"id":5601,"date":"2026-03-23T10:19:53","date_gmt":"2026-03-23T17:19:53","guid":{"rendered":"https:\/\/www.hmc.edu\/calendar\/?post_type=event&#038;p=5601"},"modified":"2026-03-30T13:53:55","modified_gmt":"2026-03-30T20:53:55","slug":"cs-colloquium-a-theoretical-cs-lens-on-language-modeling-clayton-sanford","status":"publish","type":"event","link":"https:\/\/www.hmc.edu\/calendar\/events\/cs-colloquium-a-theoretical-cs-lens-on-language-modeling-clayton-sanford\/","title":{"rendered":"CS Colloquium: \u201cA Theoretical CS Lens on Language Modeling,\u201d Clayton Sanford"},"content":{"rendered":"<p>Multi-layer transformer models form the backbone of modern deep learning, yet little mathematical work details their benefits and deficiencies as compared with other architectures. This makes it difficult to answer practical and fundamental questions about the transformer architecture: What powers increase with model depth? Can alternative architectures improve efficiency without sacrificing expressivity? Clayton Sanford presents a communication-based theoretical framework for understanding the representational capabilities and limitations of multi-layer transformers. These results imply that parallelizability is a key property of the standard transformer that other architectures cannot easily replicate. Stanford contextualizes these results within the broader conversation about the challenges of developing a principled theory of neural networks and share opinions on how theoretical computer science can remain relevant to their study.<\/p>\n<h2>Speaker<\/h2>\n<p>Clayton Sanford is a senior research scientist at Google, where he works on distillation and pretraining for Gemini. He has a PhD in computer science from Columbia University and studied machine learning theory with advisors Rocco Servedio and Daniel Hsu. His research focuses on the theoretical capabilities of neural architectures, particularly transformers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Multi-layer transformer models form the backbone of modern deep learning, yet little mathematical work details their benefits and deficiencies as [&hellip;]<\/p>\n","protected":false},"author":185,"featured_media":0,"template":"","class_list":["post-5601","event","type-event","status-publish","hentry","event-categories-faculty","event-categories-staff","event-categories-students"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.hmc.edu\/calendar\/wp-json\/wp\/v2\/event\/5601","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hmc.edu\/calendar\/wp-json\/wp\/v2\/event"}],"about":[{"href":"https:\/\/www.hmc.edu\/calendar\/wp-json\/wp\/v2\/types\/event"}],"author":[{"embeddable":true,"href":"https:\/\/www.hmc.edu\/calendar\/wp-json\/wp\/v2\/users\/185"}],"wp:attachment":[{"href":"https:\/\/www.hmc.edu\/calendar\/wp-json\/wp\/v2\/media?parent=5601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}