I am currently a researcher at Numbers Station, where I am part of Numbers Station Labs and think about all things foundation models and data tasks.
Before Numbers Station, I was a postdoc at Stanford working with Chris Ré in the Hazy Research Lab. In August of 2019, I graduated with a PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington in Seattle. I was part of the Database Group and advised by Dan Suciu and Magdalena Balazinska.
For my undergraduate degree, I went to Carleton College in Northfield, MN (where the city's motto is "Cows, Colleges, and Contentment") and graduated in 2013 as a Computer Science and Mathematics double major.
My research interests are broadly at the intersection of artificial intelligence, foundation models, and data management. I focus on how to train, customize, and deploy foundation models for data tasks. This includes problems around data curation and management for RAG systems, efficient model training and inference for batch workloads, and agentic prompting paradigms for end-to-end analytic workflow automation.
I am a 2020 winner of the IC Postdoc Research Fellowship Program and one of the 2015 winners of the NSF GRFP in Computer Science. In the summers of 2016 and 2017, I interned at Microsoft Research as a PhD research intern, and in the summer of 2015, I interned at Tableau as a software developer. From the summer of 2012 to the spring of 2015, I interned at Sandia National Laboratories working on high-performance computing and image reconstruction.
Meadow: A Framework for Multi-Agent Data Workflows
[repo]
Manifest: Prompt Programming for Foundation Models
[repo]
Text-to-SQL That Isn’t. Laurel Orr and Chris Aberger.
Team: Numbers Station Labs
[blog]
DuckDB-NSQL: How to Quack in SQL. Laurel Orr and Sen Wu.
Team: Numbers Station Labs
[blog]
Introducing NSQL: Open-source SQL Copilot Foundation Models. Sen Wu, Laurel Orr, and Manasi Ganti.
Team: Numbers Station Labs
[blog]
SQL Coding Assistants Customized to Enterprise Logs. Laurel Orr, Xiao Ling, and Vishal Motwan.
Team: Numbers Station Labs
[blog]
Mistral - A Journey Towards Reproducible Language Model Training. Laurel Orr* and Siddharth Karamcheti*.
Team: Jason Bolton, Tianyi Zhang, Karan Goel, Avanika Narayan, Rishi Bommasani, Deepak Narayanan
Advisors: Tatsunori Hashimoto, Dan Jurafsky, Christopher D. Manning, Christopher Potts, Christopher
Ré, Percy Liang
[blog], [talk]
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation. Laurel
Orr, Megan Leszczynski, Simran Arora, Neel Guha, Xiao Ling, Sen Wu, and Christopher Ré
[blog]
Ask Me Anything: A simple strategy for prompting language models. Simran Arora, Avanika Narayan,
Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré.
arXiv 2022.
[paper]
Can Foundation Models Wrangle Your Data? Avanika Narayan, Ines Chami, Laurel Orr,
Christopher Ré.
arXiv 2022.
[paper]
Data Management Opportunities for Foundation Models. Laurel Orr, Karan Goel,
Christopher Ré. CIDR 2022.
[paper]
On the Opportunities and Risks of Foundation Models (Lead of Data Section). Laurel
Orr, Simran Arora, Karan Goel, Avanika Narayan, Michael Zhang, Christopher Ré. arXiv 2021.
[paper]
Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text. Maya Varma,
Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré. EMNLP 2021.
[paper]
Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems (Tutorial).
Laurel Orr, Atindriyo Sanyal, Xiao Ling, Karan Goel, Megan Leszczynski. VLDB 2021.
[paper], [slides]
Goodwill Hunting: Analyzing and Repurposing Off-the-Shelf Named Entity Linking Systems. Karan Goel,
Laurel Orr, Nazneen Fatema Rajani, Jesse Vig, Christopher Ré. NAACL Industry 2021.
[paper]
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation. Laurel
Orr*, Megan Leszczynski*, Simran Arora, Sen Wu, Neel Guha, Xiao Ling, Christopher Ré. CIDR
2021.
[paper], [talk]
Mosaic: A Sample-Based Database System for Open World Query Processing. Laurel Orr,
Samuel Ainsworth, Walter Cai, Kevin Jamieson, Magda Balazinska, Dan Suciu. CIDR 2020.
[paper]
Sample Debiasing in the Themis Open World Database System. Laurel Orr, Magdalena
Balazinska, and Dan Suciu. SIGMOD 2020.
[paper]
Pushing Data-Induced Predicates Through Joins in Big-Data Clusters. Srikanth Kandula, Laurel
Orr, and Surajit Chaudhuri. VLDB 2019.
[paper]
EntropyDB: A Probabilistic Approach to Approximate Query Processing. Laurel Orr,
Magdalena Balazinska, and Dan Suciu. VLDB Journal 2019.
[paper]
Probabilistic Database Summarization for Interactive Data Exploration. Laurel Orr,
Magdalena Balazinska, and Dan Suciu. VLDB 2017.
[paper]
Explaining Query Answers with Explanation-Ready Databases. Sudeepa Roy, Laurel Orr,
and Dan Suciu. VLDB 2015.
[paper]
Big-Data Management Use-Case: A Cloud Service for Creating and Analyzing Galactic Merger Trees. S.
Loebman, J. Ortiz, L. Choo, L. Orr, L. Anderson, D. Halperin, M. Balazinska, T. Quinn, F. Governato. SIGMOD
Workshop on Data Analytics in the Cloud (DanaC) 2014.
[paper]
Cluster-Based Approach to a Multi-GPU CT Reconstruction Algorithm. Laurel J. Orr, Edward S. Jimenez, Kyle R. Thompson. Conference Proceedings for the IEEE Nuclear Science Symposium and Medical Imaging Conference 2014.
Preparing for the 100-Megapixel Detector: Reconstructing a Multi-Terabyte Computed Tomography Dataset. Laurel J. Orr and Edward S. Jimenez. Conference Proceedings for the Penetrating Radiation Systems and Applications XIV Workshop at the SPIE International Symposium on SPIE Optical Engineering+Applications 2013.