Many biological advances could be made if it were possible to better identify and quantify the molecular contents of biological samples. Mass spectrometers are key instruments for measuring such samples, and software translates the experimental measurements into interpretable information. This project will develop two new software methods to identify and quantify a wider range of the molecular contents of biological samples. Each method aims to increase both the number of molecular measurements that can be extracted and the confidence in their accuracy, allowing scientists who rely on mass spectrometry to answer more experimental questions than current software tools permit. This has the potential to enable breakthroughs in basic biological sciences such as cell biology, as well as in applied fields such as medicine, where limits on the amount of information per assay drive up the cost of discovery. In addition to facilitating scientific progress, this project will provide pathways into STEM training for students through industry internships at the graduate and undergraduate levels. Formal understanding of STEM principles will be fostered through coursework modules at several levels, including some that aim to help high school students transition into undergraduate computer science and computational biology programs. In particular, high school students will learn the basic programming literacy expected in first-year computer science courses.

Mass spectrometers are hardware detectors that record signals corresponding to the identities and quantities of molecules in a sample, whether those molecules are in the liquid, solid, or gaseous phase. The raw instrument output requires signal-processing steps to make the results interpretable. Because biological samples are complex, their mass spectrometry signals overlap considerably, which makes identifying individual molecular species computationally challenging.
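To make the signal overlap concrete, here is a short Python illustration (not part of the project's methods) that computes the neutral monoisotopic masses of a few short peptides from standard residue masses. Sequences that are permutations of one another, or that swap leucine (L) for isoleucine (I), have identical precursor masses, and swapping lysine (K) for glutamine (Q) changes the mass by only about 0.036 Da, so such species cannot be distinguished by precursor mass alone. The peptide sequences are arbitrary examples and the function name `peptide_mass` is hypothetical.

```python
# Illustration only: standard monoisotopic residue masses (Da), rounded to 5 decimals.
# The helper name peptide_mass and the example sequences are hypothetical.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide accounts for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# LEAK, IEAK, and KAEL are indistinguishable by mass; LEAQ differs only slightly.
for seq in ["LEAK", "IEAK", "KAEL", "LEAQ"]:
    print(f"{seq}: {peptide_mass(seq):.5f} Da")
```

Running it prints the same mass (about 459.269 Da) for LEAK, IEAK, and KAEL, and a mass about 0.036 Da lower for LEAQ.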
Because the possible number and composition of molecules in a sample grow combinatorially, current algorithms for interpreting mass spectra do not scale well, so they generally sub-select and analyze only a few of the measurements. This limits the effectiveness of the method when samples contain multiple types of molecules, when molecules with similar chemical characteristics are present, and when molecules are present at low abundance. New algorithms have the potential to overcome these data-processing limitations. This project will develop two novel computational approaches that identify, in tractable time, the complete set of biomolecules that could produce a given spectrum from a sample. The first approach will leverage data patterns to extract latent information from tandem mass spectra. The second approach will use novel data representations to shrink the search space of potential pattern matches in protein sequences. These algorithms will increase the coverage, accuracy, and sensitivity of proteomics results from mass spectrometry measurements, for both modified and unmodified proteins, enabling tests of numerous biological hypotheses that the limitations of current methods preclude. By making these advanced data-processing algorithms publicly available, successful completion of the research will give scientists access to information in mass spectrometry experiments that currently goes unused. Additionally, the project includes two explicit outreach programs to broaden participation in bioinformatics: a curriculum for students new to computer science and computational biology, and research experience opportunities at the undergraduate and graduate levels. The results of this project will be posted at ms.cs.umt.edu. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
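The second approach described above aims to shrink the search space for pattern matches; the project's own data representations are not described here. As a point of reference only, the following Python sketch shows one conventional way to narrow candidates: sort candidate peptide masses once, then use binary search to retrieve only those within a parts-per-million (ppm) tolerance of an observed precursor mass. The function names, the 10 ppm default tolerance, and the example masses are hypothetical assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# A conventional precursor-mass filter, shown for illustration only; it is an
# assumption here and not the project's (undescribed) data representation.
# Function names and the 10 ppm default tolerance are hypothetical.


def build_index(candidates):
    """Sort (mass, name) pairs once so mass windows can be found by binary search."""
    ordered = sorted(candidates)
    masses = [mass for mass, _ in ordered]
    return masses, ordered


def candidates_in_window(index, observed_mass, tol_ppm=10.0):
    """Return names of candidates within tol_ppm of observed_mass, in mass order."""
    masses, ordered = index
    tol = observed_mass * tol_ppm * 1e-6  # convert ppm to an absolute tolerance in Da
    lo = bisect_left(masses, observed_mass - tol)
    hi = bisect_right(masses, observed_mass + tol)
    return [name for _, name in ordered[lo:hi]]


# Usage with made-up candidate masses (Da).
index = build_index([(459.26928, "LEAK"), (459.2329, "LEAQ"), (600.0, "OTHER")])
print(candidates_in_window(index, 459.2693))         # 10 ppm window
print(candidates_in_window(index, 459.2693, 100.0))  # wider 100 ppm window
```

With the 10 ppm window only LEAK is returned; widening the window to 100 ppm also admits LEAQ. Sorting costs O(n log n) once, after which each query costs O(log n + k) for k matches instead of a scan over all n candidates.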