Python API

spectrum_utils.spectrum module

class spectrum_utils.spectrum.MsmsSpectrum(identifier: str, precursor_mz: float, precursor_charge: int, mz: numpy.ndarray | Iterable, intensity: numpy.ndarray | Iterable, retention_time: float = numpy.nan)[source]

Bases: object

Class representing a tandem mass spectrum.

__init__(identifier: str, precursor_mz: float, precursor_charge: int, mz: numpy.ndarray | Iterable, intensity: numpy.ndarray | Iterable, retention_time: float = numpy.nan) → None[source]

Instantiate a new MsmsSpectrum consisting of fragment peaks.

Parameters:

identifier (str) – Spectrum identifier. It is recommended to use a unique and interpretable identifier, such as a Universal Spectrum Identifier (USI) as defined by the Proteomics Standards Initiative.
precursor_mz (float) – Precursor ion m/z.
precursor_charge (int) – Precursor ion charge.
mz (array_like) – M/z values of the fragment peaks.
intensity (array_like) – Intensities of the corresponding fragment peaks in mz.
retention_time (float, optional) – Retention time at which the spectrum was acquired (the default is np.nan, which indicates that retention time is unspecified/unknown).

classmethod from_usi(usi: str, backend: str = 'aggregator', timeout: float | None = None, **kwargs) → MsmsSpectrum[source]

Construct a spectrum from a public resource specified by its Universal Spectrum Identifier (USI).

See the Universal Spectrum Identifier (USI) specification by the Proteomics Standards Initiative for more information on how to use USIs to refer to spectra.

Besides official USIs, the USI extension to refer to MS/MS spectra from various metabolomics resources is also supported. This behavior can be explicitly requested by specifying the “gnps” backend. See the corresponding preprint for more information.

See the Pyteomics documentation for details on how to use specific PROXI backends.

Parameters:

usi (str) – The USI from which to generate the spectrum.
backend (str) – PROXI host backend (default: ‘aggregator’).
timeout (Optional[float], optional) – Timeout in seconds for network requests (default: None, uses pyteomics default).
kwargs – Extra arguments to construct the spectrum that might be missing from the PROXI response (e.g. precursor_mz or precursor_charge) and the PROXI backend (see the pyteomics.usi.proxi documentation).

Returns:

The spectrum corresponding to the given USI

Return type:

MsmsSpectrum

Raises:

ValueError – If the PROXI response does not contain information about the precursor m/z or precursor charge. Explicit precursor m/z and precursor charge values can be provided using the precursor_mz and precursor_charge keyword arguments respectively.

property identifier: str

property precursor_mz: float

property precursor_charge: int

property retention_time: float

property mz: numpy.ndarray

The m/z values of the fragment peaks.

Returns:: The m/z values of the fragment peaks.
Return type:: np.ndarray

property intensity: numpy.ndarray

TGet or set the intensities of the fragment peaks.

When setting new intensity values it should be possible to convert the given values to a NumPy array and the number of intensity values should be equal to the number of m/z (and annotation) values.

Returns:: The intensity values of the fragment peaks.
Return type:: np.ndarray

property annotation: numpy.ndarray | None

The annotated labels of the fragment peaks.

Returns:: The labels of the fragment peaks, or None if the spectrum has not been annotated.
Return type:: Optional[np.ndarray]

round(decimals: int = 0, combine: str = 'sum') → MsmsSpectrum[source]

Round the m/z values of the fragment peaks to the given number of decimals.

Intensities of peaks that have the same m/z value after rounding will be combined using the specified strategy.

Note: This peak manipulation will reset any ProForma annotations that were applied previously.

Parameters:

decimals (int, optional) – Number of decimal places to round the mz to (default: 0). If decimals is negative, it specifies the number of positions to the left of the decimal point.
combine ({'sum', 'max'}) – Method used to combine intensities from merged fragment peaks. sum specifies that the intensities of the merged fragment peaks will be summed, max specifies that the maximum intensity of the fragment peaks that are merged is used (the default is sum).

Return type:

MsmsSpectrum

set_mz_range(min_mz: float | None = None, max_mz: float | None = None) → MsmsSpectrum[source]

Restrict the m/z values of the fragment peaks to the given range.

Note: This peak manipulation will reset any ProForma annotations that were applied previously.

Parameters:

min_mz (Optional[float], optional) – Minimum m/z (inclusive). If not set no minimal m/z restriction will occur.
max_mz (Optional[float], optional) – Maximum m/z (inclusive). If not set no maximal m/z restriction will occur.

Return type:

MsmsSpectrum

remove_precursor_peak(fragment_tol_mass: float, fragment_tol_mode: str, isotope: int = 0) → MsmsSpectrum[source]

Remove fragment peak(s) close to the precursor m/z.

Note: This peak manipulation will reset any ProForma annotations that were applied previously.

Parameters:

fragment_tol_mass (float) – Fragment mass tolerance around the precursor mass to remove the precursor peak.
fragment_tol_mode ({'Da', 'ppm'}) – Fragment mass tolerance unit. Either ‘Da’ or ‘ppm’.
isotope (int) – The number of precursor isotopic peaks to be checked (the default is 0 to check only the mono-isotopic peaks).

Return type:

MsmsSpectrum

filter_intensity(min_intensity: float = 0.0, max_num_peaks: int | None = None) → MsmsSpectrum[source]

Remove low-intensity fragment peaks.

Only the max_num_peaks most intense fragment peaks are retained. Additionally, noise peaks whose intensity is below min_intensity percentage of the intensity of the most intense peak are removed.

Note: This peak manipulation will reset any ProForma annotations that were applied previously.

Parameters:

min_intensity (float, optional) – Remove peaks whose intensity is below min_intensity percentage of the intensity of the most intense peak (the default is 0, which means that no minimum intensity filter will be applied).
max_num_peaks (Optional[int], optional) – Only retain the max_num_peaks most intense peaks (the default is None, which retains all peaks).

Return type:

MsmsSpectrum

scale_intensity(scaling: str | None = None, max_intensity: float | None = None, *, degree: int = 2, base: int = 2, max_rank: int | None = None) → MsmsSpectrum[source]

Scale the intensity of the fragment peaks.

Two types of scaling can be performed: scaling all peaks using a specific transformation and scaling the peaks relative to the most intense peak.

Parameters:

scaling ({'root', 'log', 'rank'}, optional) –
Method to scale the peak intensities (the default is None, which means that no transformation will be performed). Potential transformation options are:
- ’root’: Root-transform the peak intensities. The default is a square root transformation (degree is 2). The degree of the root can be specified using the degree kwarg.
- ’log’: Log-transform the peak intensities. The default is a log2 transformation (base is 2) after summing the intensities with 1 to avoid negative values after the transformation. The base of the logarithm can be specified using the base kwarg.
- ’rank’: Rank-transform the peak intensities. The maximum rank of the most intense peak can be specified using the max_rank kwarg, by default the number of peaks in the spectrum is used as the maximum rank.
degree (int) – The degree of the root transformation (the default is 2 for square root scaling).
base (int) – The base of the log transformation (the default is 2 for log2 scaling).
max_rank (Optional[int]) – The maximum rank of the rank transformation (the default is None, which uses the number of peaks in the spectrum as maximum rank). max_rank should be greater than or equal to the number of peaks in the spectrum.
max_intensity (Optional[float], optional) – Intensity of the most intense peak relative to which the peaks will be scaled (the default is None, which means that no scaling relative to the most intense peak will be performed).

Return type:

MsmsSpectrum

_annotate_proteoforms(proteoforms: list[Proteoform], proforma_str: str, fragment_tol_mass: float, fragment_tol_mode: str, ion_types: str = 'by', *, max_isotope: int = 0, max_ion_charge: int | None = None, neutral_losses: bool | Dict[str | None, float] = False) → MsmsSpectrum[source]

Assign fragment ion labels to the peaks from a ProForma annotation.

This is meant to be an internal function that uses a pre-parsed ProForma string instead of parsing it internally. This can be useful when the same parsed sequence is used multiple times (since parsing the sequence is a lot slower than annotating the peaks).

>>> proforma_sequence = "MYPEPTIDEK/2"
>>> spectrum.annotate_proforma(proforma_sequence, ...)

or

>>> parsed_proforma = proforma.parse(proforma_sequence)
>>> spectrum._annotate_proteoforms(parsed_proforma, proforma_sequence, ...)

Warns:

This function does not check that the passed sequence
corresponds to the passed proteoforms.

For additional information on the arguments, see the MsmsSpectrum.annotate_proforma documentation.

annotate_proforma(proforma_str: str, fragment_tol_mass: float, fragment_tol_mode: str, ion_types: str = 'by', *, max_isotope: int = 0, max_ion_charge: int | None = None, neutral_losses: bool | Dict[str | None, float] = False) → MsmsSpectrum[source]

Assign fragment ion labels to the peaks from a ProForma annotation.

Parameters:

proforma_str (str) – The ProForma spectrum annotation.
fragment_tol_mass (float) – Fragment mass tolerance to match spectrum peaks against theoretical peaks.
fragment_tol_mode ({'Da', 'ppm'}) – Fragment mass tolerance unit. Either ‘Da’ or ‘ppm’.
ion_types (str, optional) – The ion types to generate. Can be any combination of ‘a’, ‘b’, ‘c’, ‘x’, ‘y’, and ‘z’ for peptide fragments, ‘I’ for immonium ions, ‘m’ for internal fragment ions, ‘p’ for the precursor ion, and ‘r’ for reporter ions. The default is ‘by’, which means that b and y peptide ions will be generated.
max_isotope (int) – The maximum isotope to consider for the fragment ions (the default is 0 to consider only monoisotopic peaks).
max_ion_charge (Optional[int], optional) – All fragments up to and including the given charge will be annotated (by default all fragments with a charge up to the precursor minus one (minimum charge one) will be annotated).
neutral_losses (Union[bool, Dict[Optional[str], float]], optional) –
Neutral losses to consider for each peak. If None or False, no neutral losses are considered. If specified as a dictionary, keys should be the molecular formulas of the neutral losses and values the corresponding mass differences. Note that mass differences should typically be negative. If True, all of the following neutral losses are considered:
- Loss of hydrogen (H): -1.007825.
- Loss of ammonia (NH3): -17.026549.
- Loss of water (H2O): -18.010565.
- Loss of carbon monoxide (CO): -27.994915.
- Loss of carbon dioxide (CO2): -43.989829.
- Loss of formamide (HCONH2): -45.021464.
- Loss of formic acid (HCOOH): -46.005479.
- Loss of methanesulfenic acid (CH4OS): -63.998301.
- Loss of sulfur trioxide (SO3): -79.956818.
- Loss of metaphosphoric acid (HPO3): -79.966331.
- Loss of mercaptoacetamide (C2H5NOS): -91.009195.
- Loss of mercaptoacetic acid (C2H4O2S): -91.993211.
- Loss of phosphoric acid (H3PO4): -97.976896.

Return type:

MsmsSpectrum

spectrum_utils.proforma module

class spectrum_utils.proforma.LookupType(value)

Bases: Enum

An enumeration.

ACCESSION = 1

NAME = 2

MASS = 3

class spectrum_utils.proforma.LabelType(value)

Bases: Enum

An enumeration.

XL = 1

BRANCH = 2

GENERAL = 3

class spectrum_utils.proforma.Ion(ion: str)[source]

Bases: object

ion: str

__init__(ion: str) → None

class spectrum_utils.proforma.Charge(charge: int, ions: List[spectrum_utils.proforma.Ion] | None = None)[source]

Bases: object

charge: int

ions: List[Ion] | None = None

__init__(charge: int, ions: List[Ion] | None = None) → None

class spectrum_utils.proforma.ModificationSource[source]

Bases: ABC

abstract get_mass() → float[source]

Get the mass of the modification from its source.

Returns:: The modification mass.
Return type:: float

_abc_impl = <_abc._abc_data object>

class spectrum_utils.proforma.CvEntry(controlled_vocabulary: str | None = None, accession: str | None = None, name: str | None = None)[source]

Bases: ModificationSource

__init__(controlled_vocabulary: str | None = None, accession: str | None = None, name: str | None = None)[source]

property controlled_vocabulary: str

property accession: str

property name: str

get_mass() → float[source]

Get the mass of the modification, as defined by its CV term.

Returns:: The modification mass.
Return type:: float

_read_from_cv(cv: str, accession: str | None = None, name: str | None = None)[source]

Read the term in its controlled vocabulary entry.

Parameters:

cv (str) – The controlled vocabulary identifier.
accession (Optional[str]) – The accession to query the CV. Set as None to query by name,
name (Optional[str]) – The name to query the CV. Set as None to query by accession,

Raises:

KeyError – If the term was not found in its controlled vocabulary.
URLError – If the controlled vocabulary could not be retrieved from its online resource.
ValueError –
- If an unknown controlled vocabulary identifier is specified. - If no mass was specified for a GNO term or its parent terms.
SyntaxError – If the controlled vocabulary OBO file cannot be parsed (e.g., due to format issues in the upstream vocabulary file).

_abc_impl = <_abc._abc_data object>

class spectrum_utils.proforma.Mass(mass: float, controlled_vocabulary: str | None = None)[source]

Bases: ModificationSource

mass: float

controlled_vocabulary: str | None = None

get_mass() → float[source]

Get the mass of the modification.

Returns:: The modification mass.
Return type:: float

__init__(mass: float, controlled_vocabulary: str | None = None) → None

_abc_impl = <_abc._abc_data object>

class spectrum_utils.proforma.Formula(formula: str | None = None, isotopes: List[str] | None = None)[source]

Bases: ModificationSource

formula: str | None = None

isotopes: List[str] | None = None

_mass: float = None

get_mass() → float[source]

Get the mass of the modification, computed from its molecular formula.

Returns:: The modification mass.
Return type:: float
Raises:: NotImplementedError – Mass calculation using a molecular formula that contains isotopes (not supported by Pyteomics mass calculation).

__init__(formula: str | None = None, isotopes: List[str] | None = None) → None

_abc_impl = <_abc._abc_data object>

class spectrum_utils.proforma.Monosaccharide(monosaccharide: str, count: int)[source]

Bases: object

monosaccharide: str

count: int

__init__(monosaccharide: str, count: int) → None

class spectrum_utils.proforma.Glycan(composition: List[spectrum_utils.proforma.Monosaccharide])[source]

Bases: ModificationSource

composition: List[Monosaccharide]

_mass: float = None

get_mass() → float[source]

Get the mass of the modification, computed from its monosaccharide composition.

Returns:: The modification mass.
Return type:: float
Raises:: URLError – If the monosaccharide definitions could not be retrieved from its online resource.

__init__(composition: List[Monosaccharide]) → None

_abc_impl = <_abc._abc_data object>

class spectrum_utils.proforma.Info(message: str)[source]

Bases: object

message: str

__init__(message: str) → None

class spectrum_utils.proforma.Label(type: spectrum_utils.proforma.LabelType, label: str, score: float | None = None)[source]

Bases: object

type: LabelType

label: str

score: float | None = None

__init__(type: LabelType, label: str, score: float | None = None) → None

Bases: object

__init__(mass: float | None = None, position: int | Tuple[int, int] | str | None = None, source: List[ModificationSource] | None = None, label: Label | None = None)[source]

position: int | Tuple[int, int] | str

source: List[ModificationSource] = None

label: Label | None = None

_mass: Field(name=None,type=None,default=None,default_factory=<dataclasses._MISSING_TYPE object at 0x7764283e98a0>,init=False,repr=False,hash=None,compare=True,metadata=mappingproxy({}),kw_only=<dataclasses._MISSING_TYPE object at 0x7764283e98a0>,_field_type=None)

property mass: float | None

class spectrum_utils.proforma.Proteoform(sequence: str, modifications: List[spectrum_utils.proforma.Modification] | None = None, charge: spectrum_utils.proforma.Charge | None = None)[source]

Bases: object

sequence: str

modifications: List[Modification] | None = None

charge: Charge | None = None

__init__(sequence: str, modifications: List[Modification] | None = None, charge: Charge | None = None) → None

class spectrum_utils.proforma.ProFormaTransformer(*args: Any, **kwargs: Any)[source]

Bases: Transformer

__init__()[source]

_sequence: List[str]

_modifications: List[Modification]

_global_modifications: Dict[str, List[Modification]]

_range_pos: List[int]

_preferred_labels: alias of Set[str]

proforma(tree) → List[Proteoform][source]

proteoform(tree) → Proteoform[source]

peptide(_) → None[source]

aa(tree) → None[source]

AA(token) → str[source]

mod_global(mods) → None[source]

ISOTOPE(token) → str[source]

mod_unknown_pos(mods) → None[source]

mod(mod_annotations) → Modification[source]

mod_labile(mod_annotations) → None[source]

MOD_COUNT(token) → int[source]

mod_n_term(mods) → None[source]

mod_c_term(mods) → None[source]

mod_range(tree) → None[source]

mod_range_pos(_) → Tuple[int, int][source]

MOD_RANGE_L(_) → None[source]

mod_name(tree) → CvEntry[source]

CV_ABBREV_OPT(token) → str[source]

CV_ABBREV(token) → str[source]

mod_accession(tree) → CvEntry[source]

CV_NAME(token) → str[source]

mod_mass(tree) → Mass[source]

MOD_MASS_OBS(_) → str[source]

MOD_MASS(token) → float[source]

mod_formula(tree) → Formula[source]

FORMULA(token) → str[source]

mod_glycan(tree) → Glycan[source]

monosaccharide(tree) → Monosaccharide[source]

info(tree) → Info[source]

mod_label(tree) → Label[source]

MOD_LABEL_XL(token) → Tuple[LabelType, str][source]

MOD_LABEL_BRANCH(token) → Tuple[LabelType, str][source]

MOD_LABEL(token) → Tuple[LabelType, str][source]

MOD_SCORE(token) → float[source]

charge(tree) → Charge[source]

ion(tree) → List[Ion][source]

TEXT(token) → str[source]

spectrum_utils.proforma._modification_sort_key(mod: Modification)[source]

Key to sort modifications.

The sort order is: 1. Unknown modifications. 2. Labile modifications. 3. C-terminal modifications. 4. Modifications on specific amino acids and modification ranges based on the start of the range. 5. N-terminal modifications.

spectrum_utils.proforma._build_parser() → lark.Lark[source]

Build a lark parser for proforma sequences.

This function also caches the parser in-memory, thus loading it only once per process.

spectrum_utils.proforma.parse(proforma: str) → List[Proteoform][source]

Parse a ProForma-encoded string.

The string is parsed by building an abstract syntax tree to interpret the ProForma language. Next, modifications are localized to residue positions and translated into Python objects.

Parameters:

proforma (str) – The ProForma string.

Returns:

A list of proteoforms with their modifications (and additional) information specified by the ProForma string.

Return type:

List[Proteoform]

Raises:

URLError – If a controlled vocabulary could not be retrieved from its online resource.
KeyError – If a term was not found in its controlled vocabulary.
NotImplementedError – Mass calculation using a molecular formula that contains isotopes (not supported by Pyteomics mass calculation).
ValueError – If no mass was specified for a GNO term or its parent terms.

spectrum_utils.proforma._import_cv(cv_id: str, cache: str | None) → Tuple[Dict[str, Tuple[float, str]], Dict[str, Tuple[float, str]]] | Dict[str, float][source]

Import a ProForma controlled vocabulary from its online resource.

Parameters:

cv_id (str) – The controlled vocabulary identifier.
cache (Optional[str]) – Directory used to cache downloaded controlled vocabularies, or None to disable caching.

Returns:

Union[ – Tuple[Dict[str, Tuple[float, str]], Dict[str, Tuple[float, str]]], Dict[str, float],
] –
- For modification controlled vocabularies: A tuple with mappings (i) from term accession to modification mass and term name, and (ii) from term name to modification mass and term accession.
- For the monosaccharide controlled vocabulary: A dictionary with as keys the monosaccharide strings and as values the corresponding masses.

Raises:

URLError – If the controlled vocabulary could not be retrieved from its online resource.
ValueError –
- If an unknown controlled vocabulary identifier is specified. - If no mass was specified for a GNO term or its parent terms.

spectrum_utils.proforma._parse_obo(obo_fh: BinaryIO, cv_id: str) → Tuple[Dict[str, Tuple[float, str]], Dict[str, Tuple[float, str]]][source]

Parse an OBO controlled vocabulary.

Supported OBO files are: UNIMOD, MOD (including RESID), XLMOD, GNO.

Parameters:

obo_fh (BinaryIO) – The OBO file handle.
cv_id (str) – The controlled vocabulary identifier.

Returns:

A tuple with mappings (i) from term accession to modification mass and term name, and (ii) from term name to modification mass and term accession.

Return type:

Tuple[Dict[str, Tuple[float, str]], Dict[str, Tuple[float, str]]]

Raises:

ValueError – If no mass was specified for a GNO term or its parent terms.
SyntaxError – If the OBO file cannot be parsed (e.g., due to format issues in the upstream controlled vocabulary file).

spectrum_utils.proforma._store_in_cache(cache: str | None, filename: str, obj: Any) → None[source]

Store an object in the cache.

Parameters:

cache (Optional[str]) – Directory where the cached objects are stored, or None to disable caching.
filename (str) – Filename of the stored cache object.
obj (Any) – Object to store in the cache.

spectrum_utils.proforma._load_from_cache(cache: str | None, filename: str) → Any | None[source]

Load an object from the cache.

Parameters:

cache (Optional[str]) – Directory where the cached objects are stored, or None to disable caching.
filename (str) – Filename of the stored cache object.

Returns:

The object retrieved from the cache, or None if not found.

Return type:

Any

spectrum_utils.proforma.clear_cache(resource_ids: Sequence[str] | None = None) → None[source]

Clear the downloaded resources from the cache.

Parameters:: resource_ids (Optional[Sequence[str]]) – Identifiers of the resources to remove from the cache. If None, all known files will be removed.

spectrum_utils.fragment_annotation module

class spectrum_utils.fragment_annotation.FragmentAnnotation(ion_type: str, neutral_loss: str | None = None, isotope: int = 0, charge: int | None = None, adduct: str | None = None, analyte_number: int | None = None, mz_delta: Tuple[float, str] | None = None)[source]

Bases: object

Individual fragment ion annotation.

This fragment annotation format is derived from the PSI peak interpretation specification: https://docs.google.com/document/d/1yEUNG4Ump6vnbMDs4iV4s3XISflmOkRAyqUuutcCG2w/edit?usp=sharing

Fragment notations have the following format:

(analyte_number)[ion_type](neutral_loss)(isotope)(charge)(adduct)(mz_delta)

Examples:

“y4-H2O+2i^2[M+H+Na]” : Fragment annotation for a y4 ion, with a water neutral loss, the second isotopic peak, charge 2, adduct [M+H+Na].

Parameters:

ion_type (str) –
Specifies the basic type of ion being described. Possible prefixes are:
- ”?”: unknown ion
- ”a”, “b”, “c”, “x”, “y”, “z”: corresponding peptide fragments
- ”I”: immonium ion
- ”m”: internal fragment ion
- ”_”: named compound
- ”p”: precursor ion
- ”r”: reporter ion (isobaric label)
- ”f”: chemical formula
neutral_loss (Optional[str]) – A string of neutral loss(es), described by their molecular formula. The default is no neutral loss. Note that the neutral loss string must include the sign (typically “-” for a neutral loss).
isotope (int) – The isotope number above or below the monoisotope. The default is the monoisotopic peak (0).
charge (Optional[int]) – The charge of the fragment. The default is an unknown charge (only valid for unknown ions).
adduct (Optional[str]) – The adduct that ionized the fragment. The default is a hydrogen adduct matching the charge ([M+xH]).
mz_delta (Optional[Tuple[float, str]]) – The m/z delta representing the observed m/z minus the theoretical m/z and its unit (“Da” or “ppm”).

property mz_delta: Tuple[float, str] | None

property charge: int | None

class spectrum_utils.fragment_annotation.PeakInterpretation[source]

Bases: object

_unknown = ?

__init__()[source]: Fragment annotation(s) to interpret a specific peak.

spectrum_utils.fragment_annotation.get_theoretical_fragments(proteoform: Proteoform, ion_types: str = 'by', *, max_isotope: int = 0, max_charge: int = 1, neutral_losses: Dict[str | None, float] | None = None) → List[Tuple[FragmentAnnotation, float]][source]

Get fragment annotations with their theoretical masses for the given sequence.

Parameters:

proteoform (proforma.Proteoform) – The proteoform for which the fragment annotations will be generated.
ion_types (str) – The ion types to generate. Can be any combination of ‘a’, ‘b’, ‘c’, ‘x’, ‘y’, and ‘z’ for peptide fragments, ‘I’ for immonium ions, ‘m’ for internal fragment ions, ‘p’ for the precursor ion, and ‘r’ for reporter ions. The default is ‘by’, which means that b and y peptide ions will be generated.
max_isotope (int) – The maximum isotope to consider (the default is 0 to only generate the monoisotopic peaks).
max_charge (int) – All fragments up to and including the given charge will be generated (the default is 1 to only generate singly-charged fragments).
neutral_losses (Optional[Dict[Optional[str], float]]) – A dictionary with neutral loss names and (negative) mass differences to be considered.

Returns:

All possible fragment annotations and their theoretical m/z in ascending m/z order.

Return type:

List[Tuple[FragmentAnnotation, float]]

spectrum_utils.plot module

spectrum_utils.plot._format_ax(ax: matplotlib.pyplot.Axes, grid: bool | str)[source]: Set ax formatting options that are common to all plot types.

spectrum_utils.plot._get_xlim(spec: MsmsSpectrum) → Tuple[float, float][source]: Get plot x-axis limits for a given spectrum.

spectrum_utils.plot._annotate_ion(mz: float, intensity: float, annotation: FragmentAnnotation | None, color_ions: bool, annot_fmt: Callable | None, annot_kws: Dict[str, object], ax: matplotlib.pyplot.Axes) → Tuple[str, int][source]

Annotate a specific fragment peak.

Parameters:

mz (float) – The peak’s m/z value (position of the annotation on the x axis).
intensity (float) – The peak’s intensity (position of the annotation on the y axis).
annotation (Optional[fa.FragmentAnnotation]) – The annotation that will be plotted.
color_ions (bool) – Flag whether to color the peak annotation or not.
annot_fmt (Optional[Callable]) – Function to format the peak annotations. See FragmentAnnotation for supported elements. By default, only canonical b and y peptide fragments are annotated. If None, no peaks are annotated.
annot_kws (Dict[str, object]) – Keyword arguments for ax.text to customize peak annotations.
ax (plt.Axes) – Axes instance on which to plot the annotation.

Returns:

A tuple of the annotation’s color as a hex string and the annotation’s zorder.

Return type:

Tuple[str, int]

spectrum_utils.plot.annotate_ion_type(annotation: FragmentAnnotation, ion_types: Iterable[str]) → str[source]

Convert a FragmentAnnotation to a string for annotating peaks in a spectrum plot.

This function will only annotate singly-charged, mono-isotopic canonical peaks with the given ion type(s).

Parameters:

annotation (fa.FragmentAnnotation) – The peak’s fragment annotation.
ion_types (Iterable[str]) – Accepted ion types to annotate.

Returns:

The peak’s annotation string.

Return type:

str

spectrum_utils.plot.spectrum(spec: ~spectrum_utils.spectrum.MsmsSpectrum, *, color_ions: bool = True, annot_fmt: ~typing.Callable | None = functools.partial(<function annotate_ion_type>, ion_types='by'), annot_kws: ~typing.Dict | None = None, mirror_intensity: bool = False, grid: bool | str = True, ax: matplotlib.pyplot.Axes | None = None) → matplotlib.pyplot.Axes[source]

Plot an MS/MS spectrum.

Parameters:

spec (MsmsSpectrum) – The spectrum to be plotted.
color_ions (bool, optional) – Flag indicating whether or not to color annotated fragment ions. The default is True.
annot_fmt (Optional[Callable]) – Function to format the peak annotations. See FragmentAnnotation for supported elements. By default, only canonical b and y peptide fragments are annotated. If None, no peaks are annotated.
annot_kws (Optional[Dict], optional) – Keyword arguments for ax.text to customize peak annotations.
mirror_intensity (bool, optional) – Flag indicating whether to flip the intensity axis or not.
grid (Union[bool, str], optional) – Draw grid lines or not. Either a boolean to enable/disable both major and minor grid lines or ‘major’/’minor’ to enable major or minor grid lines respectively.
ax (Optional[plt.Axes], optional) – Axes instance on which to plot the spectrum. If None the current Axes instance is used.

Returns:

The matplotlib Axes instance on which the spectrum is plotted.

Return type:

plt.Axes

spectrum_utils.plot.mass_errors(spec: MsmsSpectrum, *, unit: str | None = None, plot_unknown: bool = True, color_ions: bool = True, grid: bool | str = True, ax: matplotlib.pyplot.Axes | None = None) → matplotlib.pyplot.Axes[source]

Plot mass error bubble plot for a given spectrum.

A mass error bubble plot shows the error between observed and theoretical mass (y-axis) in function of the m/z (x-axis) for each peak in the spectrum. The size of the bubble is proportional to the intensity of the peak.

Parameters:

spec (MsmsSpectrum) – The spectrum with mass errors to be plotted.
unit (str, optional) – The unit of the mass errors, either ‘ppm’, ‘Da’, or None. If None, the unit that was used for spectrum annotation is used. The default is None.
plot_unknown (bool, optional) – Flag indicating whether or not to plot mass errors for unknown peaks.
color_ions (bool, optional) – Flag indicating whether or not to color dots for annotated fragment ions. The default is True.
grid (Union[bool, str], optional) – Draw grid lines or not. Either a boolean to enable/disable both major and minor grid lines or ‘major’/’minor’ to enable major or minor grid lines respectively.
ax (Optional[plt.Axes], optional) – Axes instance on which to plot the mass errors. If None the current Axes instance is used.

Returns:

The matplotlib Axes instance on which the mass errors are plotted.

Return type:

plt.Axes

Notes

The mass error bubble plot was first introduced in Barsnes et al. (2010).

References

Barsnes, H., Eidhammer, I. and Martens, L. FragmentationAnalyzer: An open-source tool to analyze MS/MS fragmentation data. PROTEOMICS 10, 1087–1090 (2010). doi:10.1002/pmic.200900681

spectrum_utils.plot.mirror(spec_top: MsmsSpectrum, spec_bottom: MsmsSpectrum, spectrum_kws: Dict | None = None, ax: matplotlib.pyplot.Axes | None = None) → matplotlib.pyplot.Axes[source]

Mirror plot two MS/MS spectra.

Parameters:

spec_top (MsmsSpectrum) – The spectrum to be plotted on the top.
spec_bottom (MsmsSpectrum) – The spectrum to be plotted on the bottom.
spectrum_kws (Optional[Dict], optional) – Keyword arguments for plot.spectrum.
ax (Optional[plt.Axes], optional) – Axes instance on which to plot the spectrum. If None the current Axes instance is used.

Returns:

The matplotlib Axes instance on which the spectra are plotted.

Return type:

plt.Axes

spectrum_utils.plot.facet(spec_top: MsmsSpectrum, spec_mass_errors: MsmsSpectrum | None = None, spec_bottom: MsmsSpectrum | None = None, spectrum_kws: Mapping[str, Any] | None = None, mass_errors_kws: Mapping[str, Any] | None = None, height: float | None = None, width: float | None = None) → matplotlib.pyplot.Figure[source]

Plot a spectrum, and optionally mass errors, and a mirror spectrum.

Parameters:

spec_top (MsmsSpectrum) – The spectrum to be plotted on the top.
spec_mass_errors (Optional[MsmsSpectrum], optional) – The spectrum for which mass errors are to be plotted in the middle.
spec_bottom (Optional[MsmsSpectrum], optional) – The spectrum to be plotted on the bottom.
spectrum_kws (Optional[Mapping[str, Any]], optional) – Keyword arguments for plot.spectrum for the top and bottom spectra.
mass_errors_kws (Optional[Mapping[str, Any]], optional) – Keyword arguments for plot.mass_errors.
height (Optional[float], optional) – The height of the figure in inches.
width (Optional[float], optional) – The width of the figure in inches.

Returns:

The matplotlib Figure instance on which the spectra and mass errors are plotted.

Return type:

plt.Figure

spectrum_utils.iplot module

spectrum_utils.iplot.spectrum(spec: ~spectrum_utils.spectrum.MsmsSpectrum, *_, color_ions: bool = True, annot_fmt: ~typing.Callable | None = functools.partial(<function annotate_ion_type>, ion_types='by'), annot_kws: ~typing.Dict | None = None, mirror_intensity: bool = False, grid: bool = True) → LayerChart[source]

Plot an MS/MS spectrum.

Parameters:

spec (MsmsSpectrum) – The spectrum to be plotted.
color_ions (bool, optional) – Flag indicating whether or not to color annotated fragment ions. The default is True.
annot_fmt (Optional[Callable]) – Function to format the peak annotations. See FragmentAnnotation for supported elements. By default, only canonical b and y peptide fragments are annotated. If None, no peaks are annotated.
annot_kws (Optional[Dict], optional) – Keyword arguments for altair.Chart.mark_text to customize peak annotations.
mirror_intensity (bool, optional) – Flag indicating whether to flip the intensity axis or not.
grid (bool, optional) – Draw grid lines or not.

Returns:

The Altair chart instance with the plotted spectrum.

Return type:

altair.LayerChart

spectrum_utils.iplot.mirror(spec_top: MsmsSpectrum, spec_bottom: MsmsSpectrum, spectrum_kws: Dict | None = None, *_) → LayerChart[source]

Mirror plot two MS/MS spectra.

Parameters:

spec_top (MsmsSpectrum) – The spectrum to be plotted on the top.
spec_bottom (MsmsSpectrum) – The spectrum to be plotted on the bottom.
spectrum_kws (Optional[Dict], optional) – Keyword arguments for iplot.spectrum.
*_ – Ignored, for consistency with the plot.mirror API.

Returns:

The Altair chart instance with the plotted spectrum.

Return type:

altair.LayerChart

spectrum_utils.utils module

spectrum_utils.utils.mass_diff(mz1, mz2, mode_is_da)

Calculate the mass difference(s).

Parameters:

mz1 – First m/z value(s).
mz2 – Second m/z value(s).
mode_is_da (bool) – Mass difference in Dalton (True) or in ppm (False).

Return type:

The mass difference(s) between the given m/z values.

spectrum_utils.utils.da_to_ppm(delta_mz: int | numpy.ndarray, mz: int | numpy.ndarray) → int | numpy.ndarray[source]

Convert a mass difference in Dalton to ppm.

Parameters:

delta_mz (int or np.ndarray) – Mass difference in Dalton.
mz (int or np.ndarray) – m/z value of peak.

Return type:

int or np.ndarray

spectrum_utils.utils.ppm_to_da(delta_mz: int | numpy.ndarray, mz: int | numpy.ndarray) → int | numpy.ndarray[source]

Convert a mass difference in ppm to Dalton.

Parameters:

delta_mz (int or np.ndarray) – Mass difference in ppm.
mz (int or np.ndarray) – m/z value of peak.

Return type:

int or np.ndarray