Quickstart

Here we briefly introduce spectrum_utils’ spectrum processing and visualization functionality:

  • Load a spectrum from an online data resource by its Universal Spectrum Identifier (USI).

  • Restrict the mass range to 100–1400 m/z to filter out irrelevant peaks.

  • Remove the precursor peak.

  • Remove low-intensity noise peaks by retaining only peaks with at least 5% of the base peak intensity, and restrict the total number of peaks to the 50 most intense.

  • Scale the peak intensities by their square root to de-emphasize overly intense peaks.

  • Annotate peaks corresponding to a, b, and y peptide fragments in the spectrum based on a ProForma 2.0 peptide string.

  • Visualize the spectrum with the annotated peaks highlighted.
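
To make the peak-filtering steps above concrete, here is a minimal pure-Python sketch of the arithmetic they describe. It is not how spectrum_utils implements these steps. The function name `filter_and_scale` and the example peaks are hypothetical, and precursor removal and annotation are left out.

```python
import math


def filter_and_scale(peaks, min_mz=100.0, max_mz=1400.0,
                     min_rel_intensity=0.05, max_num_peaks=50):
    """Illustrative peak filtering on (m/z, intensity) tuples.

    Hypothetical helper for explanation only, not part of spectrum_utils.
    """
    # 1. Keep only peaks inside the m/z window.
    kept = [(mz, i) for mz, i in peaks if min_mz <= mz <= max_mz]
    if not kept:
        return []
    # 2. Drop peaks below the given fraction of the base peak intensity.
    base = max(i for _, i in kept)
    kept = [(mz, i) for mz, i in kept if i >= min_rel_intensity * base]
    # 3. Retain at most the N most intense peaks, then restore m/z order.
    kept = sorted(kept, key=lambda p: p[1], reverse=True)[:max_num_peaks]
    kept.sort(key=lambda p: p[0])
    # 4. Square-root scale the intensities to de-emphasize intense peaks.
    return [(mz, math.sqrt(i)) for mz, i in kept]


# Made-up example peaks: two fall outside 100-1400 m/z, one is below 5%.
peaks = [(50.0, 900.0), (150.0, 4.0), (300.0, 100.0),
         (700.0, 400.0), (1500.0, 250.0)]
processed = filter_and_scale(peaks)
print(processed)  # [(300.0, 10.0), (700.0, 20.0)]
```

In this sketch the base peak is determined after the mass range restriction, which mirrors the order of the steps listed above.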

IO functionality to read spectra from MS data files is not directly included in spectrum_utils. Instead, you can use dedicated libraries such as Pyteomics or pymzML, which read a variety of mass spectrometry data formats.
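
As a rough illustration of this division of labor, the sketch below reads an MGF file with Pyteomics and wraps each entry in an MsmsSpectrum. It assumes each MGF entry carries TITLE, PEPMASS, and CHARGE fields. The helper names `mgf_entry_to_kwargs` and `read_mgf_spectra` and the file name `spectra.mgf` are hypothetical, and the keyword names passed to the MsmsSpectrum constructor should be checked against the spectrum_utils version you have installed.

```python
def mgf_entry_to_kwargs(entry):
    """Map a Pyteomics MGF entry (a dict) to MsmsSpectrum-style arguments.

    Hypothetical helper; assumes TITLE, PEPMASS and CHARGE are present.
    """
    params = entry["params"]
    return {
        "identifier": params["title"],
        # Pyteomics stores PEPMASS as a (m/z, intensity) tuple.
        "precursor_mz": float(params["pepmass"][0]),
        # CHARGE is parsed as a list-like object; take the first charge.
        "precursor_charge": int(params["charge"][0]),
        "mz": entry["m/z array"],
        "intensity": entry["intensity array"],
    }


def read_mgf_spectra(path):
    """Yield one MsmsSpectrum per entry in the MGF file at `path`."""
    # Imported lazily so the mapping helper above works without
    # the third-party packages installed.
    from pyteomics import mgf
    import spectrum_utils.spectrum as sus

    with mgf.read(path) as reader:
        for entry in reader:
            yield sus.MsmsSpectrum(**mgf_entry_to_kwargs(entry))


# Usage (assumes a local file named "spectra.mgf" exists):
# for spectrum in read_mgf_spectra("spectra.mgf"):
#     print(spectrum.identifier, spectrum.precursor_mz)
```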

import matplotlib.pyplot as plt
import spectrum_utils.plot as sup
import spectrum_utils.spectrum as sus


# Retrieve the spectrum by its USI.
usi = "mzspec:PXD004732:01650b_BC2-TUM_first_pool_53_01_01-3xHCD-1h-R2:scan:41840"
peptide = "WNQLQAFWGTGK"
spectrum = sus.MsmsSpectrum.from_usi(usi)

# Process the spectrum.
fragment_tol_mass, fragment_tol_mode = 10, "ppm"
spectrum = (
    spectrum.set_mz_range(min_mz=100, max_mz=1400)
    .remove_precursor_peak(fragment_tol_mass, fragment_tol_mode)
    .filter_intensity(min_intensity=0.05, max_num_peaks=50)
    .scale_intensity("root")
    .annotate_proforma(
        peptide, fragment_tol_mass, fragment_tol_mode, ion_types="aby"
    )
)

# Plot the spectrum.
fig, ax = plt.subplots(figsize=(12, 6))
sup.spectrum(spectrum, grid=False, ax=ax)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
plt.savefig("quickstart.png", bbox_inches="tight", dpi=300, transparent=True)
plt.close()

As demonstrated, each of the processing steps can be achieved using a single, high-level function call. These calls can be chained together to easily perform multiple processing steps.
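
The processing calls above take a fragment mass tolerance of 10 ppm. As a worked illustration of what such a tolerance means, the following sketch computes the m/z window around a peak and checks whether an observed peak matches a theoretical one. The helper names `ppm_window` and `within_tolerance` are hypothetical and only illustrate the arithmetic. They are not the matching code used by spectrum_utils.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds within tol_ppm of mz."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, theoretical_mz, tol, mode="ppm"):
    """Check whether two m/z values agree within a tolerance.

    `mode` is "ppm" (relative) or "Da" (absolute).
    """
    diff = abs(observed_mz - theoretical_mz)
    if mode == "ppm":
        return diff / theoretical_mz * 1e6 <= tol
    if mode == "Da":
        return diff <= tol
    raise ValueError(f"Unknown tolerance mode: {mode!r}")


# At 1000 m/z, 10 ppm corresponds to roughly +/- 0.01 m/z.
low, high = ppm_window(1000.0, 10)
print(low, high)

# A deviation of 0.004 at 500 m/z is about 8 ppm: inside a 10 ppm tolerance.
print(within_tolerance(500.004, 500.0, 10, "ppm"))
# A deviation of 0.006 at 500 m/z is about 12 ppm: outside it.
print(within_tolerance(500.006, 500.0, 10, "ppm"))
```

Because a ppm tolerance is relative, the absolute window grows with m/z, whereas a tolerance in Da stays constant across the mass range.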

Spectrum plotting can similarly be achieved using a high-level function call, resulting in the figure that the code above saves as quickstart.png.

Note that several processing steps modify the peak m/z and intensity values and are thus not idempotent. It is recommended to make a copy of the MsmsSpectrum object prior to any processing if the raw peak values need to remain available as well.
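
The sketch below shows why this matters, using a plain dictionary as a stand-in for a spectrum so that it runs without any dependencies. The function `scale_root_in_place` is hypothetical. With a real MsmsSpectrum, calling `copy.deepcopy(spectrum)` before the processing chain would presumably play the same role, although that is an assumption to verify against your installed version.

```python
import copy
import math

# Stand-in for a raw spectrum (made-up values).
raw = {"mz": [300.0, 700.0], "intensity": [100.0, 400.0]}


def scale_root_in_place(spec):
    """Square-root scale the intensities, modifying `spec` itself."""
    spec["intensity"] = [math.sqrt(i) for i in spec["intensity"]]
    return spec


# Work on a deep copy so the raw peak values remain available.
working = copy.deepcopy(raw)

scale_root_in_place(working)
once = list(working["intensity"])

# Applying the same step again changes the values further:
# the operation is not idempotent.
scale_root_in_place(working)
twice = list(working["intensity"])

print(once)              # [10.0, 20.0]
print(twice != once)     # True
print(raw["intensity"])  # [100.0, 400.0]
```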