Artefacts in the Mass Spectra Output from MALDI-TOF and MALDI-TOF/TOF Machines

Werner Van Belle¹* - werner@yellowcouch.org, werner.van.belle@gmail.com
Olav Mjaavatten²

1- Bioinformatics Group Norut IT; Research Park; 9294 Tromsø; Norway
2- Proteomic Unit (Probe) University of Bergen; Bergen; Norway
* Corresponding author

Abstract: MALDI-TOF mass spectrometry is a well-known and widely used technique to fingerprint and sequence proteins. A careful investigation of the mass spectra output from unnamed machines reveals a number of artefacts produced by the machines themselves. Because these artefacts complicate several procedures, we present a number of preliminary techniques we developed to remove most of them.

Keywords: matrix-assisted laser desorption ionisation, MALDI, time of flight, TOF, artefacts, noise
Reference: Werner Van Belle, Olav Mjaavatten; Artefacts in the Mass Spectra Output from MALDI-TOF and MALDI-TOF/TOF Machines; Proceeding of the VIIth International Symposium of the Protein Society section proteomics, interactomics and protein networks; April 2005

Table Of Contents

Introduction
Artefacts
    Tones in Reflection Mode
    Decaying tones in Lift
    Linear Mode

Removing Artefacts
    Baseline Extraction
    Mass Spectrum Denoising
    Data Enhancing in High-Noise Linear Mode
    Automatic Detection of important peaks
Summary

Introduction

In MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry, a sample is mixed with a matrix. When this mixture dries, it forms crystals. When such a crystallized mixture is hit by a high-energy laser beam of the correct wavelength, the matrix suddenly absorbs the incoming energy and heats up. This rapid heating causes sublimation of the matrix and the subsequent expansion of the co-crystallized molecules into the gas phase, where they are ionized. The ions are then accelerated by a strong electric field and drift through a flight tube, where they separate by their $\frac{m}{z}$ ratio because lighter ions travel faster. The ions can be detected at the end of the tube, or they can be reflected and then detected. This (optional) reflection phase substantially increases the accuracy of the technique.
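The separation step can be made concrete with a short calculation. An ion of mass $m$ and charge $ze$ that is accelerated through a potential $U$ gains kinetic energy $zeU = \frac{1}{2}mv^2$. It therefore crosses a field-free tube of length $L$ in $t = L\sqrt{\frac{m}{2zeU}}$, so the flight time grows with $\sqrt{m/z}$. The Python sketch below evaluates this idealized formula. The 20 kV acceleration voltage and the 1 m tube length are hypothetical values. The sketch also ignores the reflector, delayed extraction, and the initial velocity spread of real instruments.

```python
import math

DALTON_KG = 1.66053906660e-27          # unified atomic mass unit in kg
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge in coulomb

def flight_time(mass_da, charge=1, voltage=20e3, length_m=1.0):
    """Idealized linear-TOF drift time in seconds (voltage and tube length are hypothetical)."""
    mass_kg = mass_da * DALTON_KG
    kinetic_energy_j = charge * ELEMENTARY_CHARGE_C * voltage   # zeU
    velocity = math.sqrt(2.0 * kinetic_energy_j / mass_kg)      # from zeU = m v^2 / 2
    return length_m / velocity

for m in (1000.0, 4000.0):
    print(f"{m:7.1f} Da -> {flight_time(m) * 1e6:.2f} us")
```

Quadrupling the mass doubles the flight time. An ion of mass 1000 Da with charge 2 arrives together with a singly charged ion of mass 500 Da, which is why the axis is $\frac{m}{z}$ rather than mass.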

In a typical proteomics setup, a mass spectrum is acquired, its peaks are selected, and the resulting peak list is used to fingerprint proteins. Some machines offer an advanced lift system, which makes it possible to select an ion of a specific mass and measure the masses of the (poly)peptide fragments it produces. This makes sequencing of proteins possible.
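The fingerprinting step can be illustrated with a minimal peptide-mass-fingerprint score: count how many observed peaks lie within a mass tolerance of a candidate protein's theoretical peptide masses. The sketch below is a deliberately simplified illustration, not a description of any particular search engine. The 50 ppm tolerance and all masses in the example are hypothetical, and real tools use probabilistic scoring rather than raw hit counts.

```python
import bisect

def count_matches(observed, theoretical, tol_ppm=50.0):
    """Number of observed peak masses that lie within tol_ppm of some theoretical peptide mass."""
    theo = sorted(theoretical)
    hits = 0
    for mass in observed:
        i = bisect.bisect_left(theo, mass)
        # only the neighbours on either side of the insertion point can be the closest match
        for j in (i - 1, i):
            if 0 <= j < len(theo) and abs(mass - theo[j]) / theo[j] * 1e6 <= tol_ppm:
                hits += 1
                break
    return hits

observed = [1000.50, 1500.75, 2100.00]          # hypothetical peak list (Da)
candidates = {
    "protein A": [1000.51, 1500.70, 3000.00],   # hypothetical theoretical digests (Da)
    "protein B": [1200.00, 2100.50],
}
for name, masses in candidates.items():
    print(name, count_matches(observed, masses))
```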

Artefacts

We performed a number of measurements on different mass spectrometers. Surprisingly, the output from these machines contains a number of artifacts. The same artifacts were also present in the output of machines at other sites, such as the Flemish Biotechnology Center, and in spectra freely published online.

We believe that these artifacts complicate several possible uses of these machines:

Some of the artifacts might shift peaks slightly backward or forward along the $\frac{m}{z}$ axis.
The artifacts make it difficult to select smaller peaks automatically. For example, when the noise level is close to the peak level, a human expert can still select these peaks, but a computer cannot do so as long as the data is noise-filled. This might be important for fingerprinting multi-protein complexes.
Some of the artifacts have signal levels that might even exceed the actual signal.
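The difficulty with small peaks shows up in a typical automatic picker. The sketch below uses a generic technique, not a procedure from this work. It keeps local maxima that exceed the median by a chosen multiple of a robust noise estimate. That estimate is the median absolute deviation (MAD), scaled by 1.4826 to match a Gaussian standard deviation. A peak only slightly above the noise either falls below the threshold or, if the threshold is lowered, is joined by many noise maxima. The `snr=5` default and the synthetic spectrum are hypothetical.

```python
import numpy as np

def pick_peaks(y, snr=5.0):
    """Indices of local maxima that exceed median + snr * robust noise sigma."""
    y = np.asarray(y, dtype=float)
    median = np.median(y)
    sigma = 1.4826 * np.median(np.abs(y - median))   # MAD scaled to a Gaussian sigma
    threshold = median + snr * sigma
    # interior points strictly above the left neighbour and not below the right one
    is_max = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])
    idx = np.nonzero(is_max)[0] + 1
    return idx[y[idx] > threshold]

rng = np.random.default_rng(0)
n = np.arange(2000)
spectrum = 0.1 * rng.standard_normal(n.size)          # white noise, sigma = 0.1
for centre, height in ((500, 2.0), (1500, 1.0)):      # two synthetic peaks, width sigma = 3
    spectrum += height * np.exp(-((n - centre) ** 2) / 18.0)
print(pick_peaks(spectrum))
```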

Below we present the artifacts we found. Our investigation of the spectral output of the machines is based on a sliding-window Fourier transform (SFFT). Converting a data series to the frequency domain, window by window, shows which frequencies are present at which time, and hence at which $\frac{m}{z}$ value, and with what strength. For example, the right side of the figure has two axes. The X-axis is the $\frac{m}{z}$ axis. The Y-axis is the frequency axis, with high frequencies at the top and low frequencies at the bottom. Every position $(x,y)$ in this diagram has a color. White means that frequency $y$ is not present at time $x$. Yellow means that some signal is present, and red to dark red indicates a very strong presence of the given frequency. Typically, a particle hitting the detector gives rise to a vertical line in the frequency diagram, because a short impulse contains energy at all frequencies.
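Below is a minimal sketch of such a sliding-window Fourier analysis, written in Python with NumPy. The Hann taper, the hop of 512 samples and the normalization to the strongest component (0 dB) are our assumptions. The text specifies only the window length of 2048 samples used in the experiments. The usage lines also show why a single impulse, such as an ion hitting the detector, appears as a vertical line: its spectrum is flat, so every frequency bin lights up in the windows that contain it.

```python
import numpy as np

def sfft_db(x, window=2048, hop=512):
    """Sliding-window FFT power in dB, normalized so that the strongest cell is 0 dB.
    Rows are frequency bins (low to high); columns are window positions along the spectrum."""
    x = np.asarray(x, dtype=float)
    taper = np.hanning(window)
    frames = np.stack([x[s:s + window] * taper
                       for s in range(0, len(x) - window + 1, hop)], axis=1)
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2
    power /= power.max()
    return 10.0 * np.log10(power + 1e-12)       # small floor avoids log(0)

n = np.arange(8192)
tone = np.sin(2 * np.pi * 100 / 2048 * n)       # a static tone: horizontal line at bin 100
impulse = np.zeros(8192)
impulse[4096] = 1.0                             # a single spike: vertical line
S_tone, S_impulse = sfft_db(tone), sfft_db(impulse)
print(S_tone.shape, S_tone.argmax(axis=0))
```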

Tones in Reflection Mode

Artefacts in a typical mass spectrum acquired in reflection mode.

The first experiment concerns the typical fingerprinting of a protein, with the reflection mode turned on. The mass spectrum output consists of 158548 samples between 100.003 and 4019.170 Da. The window size of the SFFT is 2048 samples. This is a good compromise between frequency resolution and position resolution: a longer window resolves frequencies more finely but smears them over a wider stretch of the $\frac{m}{z}$ axis. In all the figures we present, both the m/z axis and the energy axis have been normalized. The frequency analysis has also been normalized and is shown in dB.
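The numbers above can be turned into sample spacings. The average spacing is simply the range divided by the number of intervals. How the spacing varies along the axis depends on the instrument. The sketch below therefore rests on a hypothetical simplification: the digitizer samples uniformly in flight time, and $\frac{m}{z}$ is proportional to $t^2$ with no calibration offsets, so that $\sqrt{m/z}$ is evenly spaced. Under that assumption, a 2048-sample window spans far fewer Daltons at the low-mass end than at the high-mass end.

```python
import numpy as np

LOW, HIGH, N_SAMPLES, WINDOW = 100.003, 4019.170, 158548, 2048   # values from the experiment

mean_spacing = (HIGH - LOW) / (N_SAMPLES - 1)

# Hypothetical axis: uniform in flight time, with m/z proportional to t**2 (no offsets).
mz = np.linspace(np.sqrt(LOW), np.sqrt(HIGH), N_SAMPLES) ** 2
spacing = np.diff(mz)
span_low = mz[WINDOW - 1] - mz[0]       # Da covered by the first 2048 samples
span_high = mz[-1] - mz[-WINDOW]        # Da covered by the last 2048 samples

print(f"mean spacing {mean_spacing:.4f} Da")
print(f"spacing {spacing[0]:.4f} Da at the low end, {spacing[-1]:.4f} Da at the high end")
print(f"one window covers {span_low:.1f} Da at the low end, {span_high:.1f} Da at the high end")
```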

This experiment (figure) clearly shows

three static tones superimposed on the signal (the three horizontal lines);
three linear upward-sweeping tones (the three slightly upward-slanted lines);
a burst of noise shortly after the deflection phase of the machine.

The tones are very likely not created by ions hitting the detector, because that would mean that the ions are released at a steady frequency, independent of their mass. Since laser desorption produces a single sublimation burst, such a steady periodic phenomenon is highly unexpected. The noise burst after the deflection phase, on the other hand, is what we would expect; nevertheless, it still makes finding peaks more difficult.
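This distinction can also be checked automatically. In the SFFT, a static tone is a frequency bin that stands out in nearly every window, whereas an ion peak raises all bins of a few windows at once. The sketch below flags bins that lie a given number of dB above their own window's median in at least a given fraction of windows. The 256-sample window, the 20 dB level and the 90% fraction are hypothetical choices. The sketch is only an illustration: the tones described above were identified by inspecting the SFFT plots.

```python
import numpy as np

def static_tone_bins(x, window=256, hop=128, level_db=20.0, fraction=0.9):
    """Frequency bins that exceed their window's median power by level_db in >= fraction of windows."""
    x = np.asarray(x, dtype=float)
    taper = np.hanning(window)
    frames = np.stack([x[s:s + window] * taper
                       for s in range(0, len(x) - window + 1, hop)], axis=1)
    db = 10.0 * np.log10(np.abs(np.fft.rfft(frames, axis=0)) ** 2 + 1e-20)
    # a broadband spike lifts every bin, including the median, so it is not flagged
    relative = db - np.median(db, axis=0, keepdims=True)
    return np.nonzero((relative > level_db).mean(axis=1) >= fraction)[0]

rng = np.random.default_rng(1)
n = np.arange(16384)
x = 0.1 * rng.standard_normal(n.size)           # white noise
x += 0.2 * np.sin(2 * np.pi * 40 / 256 * n)     # static tone at bin 40
x[[3000, 8000, 13000]] += 5.0                   # three sharp "ion" spikes
print(static_tone_bins(x))
```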

Decaying tones in Lift

Noise of the Lift Spectrum

In a second experiment we measured the lift spectrum of a peak using a MALDI-TOF/TOF machine. The mixture contained a protein fragment that was to be sequenced. The output from the machine ranges from 20.067 to 1264.626 Da, in 67873 samples. Again, the m/z axis, the energy and the frequency content are all normalized. The frequency analysis (figure) shows

two or three static tones at a low frequency;
two or three decaying tones, which start at a high frequency and decay exponentially.

These artifacts are clearly different from the ones encountered previously. Moreover, the noise level in this experiment is quite high relative to the signal, so much so that an expert is needed to select the correct peaks for further analysis.
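The decay can be read as the tone's frequency falling exponentially along the spectrum, $f(x) = f_0 e^{-x/\tau}$. Under that reading, $\ln f$ is linear in $x$. Both parameters then follow from a straight-line fit to ridge frequencies read off the SFFT, for example the strongest bin in each window. The sketch below shows that fit. The functional form is our reading of the description, and the numbers in the example are hypothetical.

```python
import numpy as np

def fit_exponential_decay(positions, frequencies):
    """Fit f(x) = f0 * exp(-x / tau) by least squares on log(f); returns (f0, tau)."""
    slope, intercept = np.polyfit(np.asarray(positions, dtype=float),
                                  np.log(np.asarray(frequencies, dtype=float)), 1)
    return np.exp(intercept), -1.0 / slope

x = np.linspace(0.0, 1000.0, 50)        # hypothetical window positions
ridge = 0.4 * np.exp(-x / 250.0)        # hypothetical ridge frequencies (cycles/sample)
f0, tau = fit_exponential_decay(x, ridge)
print(f0, tau)
```

Fitting in log space gives every point equal weight in relative terms. With noisy ridges near zero frequency, a weighted or nonlinear fit may be preferable.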

Linear Mode


Noise in linear mode. Panels: one shot, 10 shots.


Noise in linear mode. Panels: 100 shots, 1000 shots.

In a third experiment we measured the pure noise output of a MALDI machine in linear mode. The output shown in the figures covers 110296 samples between 40 kDa and 80 kDa. During the experiment the laser was switched off, so we measured only the noise generated by the machine itself. The artifacts we observed were even more interesting than the previous ones:

White noise, as could be expected. Note that this was measured with the laser turned off, so it says nothing about the signal-to-noise ratio.
A probabilistic distribution of pulses. The upper part of the first figure shows the noise of one shot, and its lower part the noise after 10 shots. The upper part of the second figure shows the noise after 100 shots, and its lower part the noise after 1000 shots. A single shot gives rise to a strong pulse at a certain position. However, as observed in the other measurements, the location of this pulse depends on the individual shot. So, depending on the number of shots, we get different noise fingerprints.

Clearly this probabilistic pulse train is a big problem, because it depends strongly on the number of shots. It can (a) easily be misinterpreted as a valid peak when only a few shots are performed, or (b) overrun the actual measurement when too many shots are performed.
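A toy simulation makes the dependence on the shot count concrete. Its assumptions are hypothetical, not measured. Each shot adds a genuine peak at a fixed position, unit-variance white noise, and one artefact pulse whose position is redrawn from a fixed distribution. The function returns the three summed components separately so that their growth can be compared.

```python
import numpy as np

def accumulate(n_shots, length=4000, peak_pos=3000, pulse_centre=1000,
               pulse_spread=50.0, pulse_amp=8.0, seed=0):
    """Sum n_shots simulated shots; return (signal, white_noise, pulse_train) separately.
    All parameters are hypothetical; the pulse position is redrawn for every shot."""
    rng = np.random.default_rng(seed)
    signal = np.zeros(length)
    noise = np.zeros(length)
    pulses = np.zeros(length)
    for _ in range(n_shots):
        signal[peak_pos] += 1.0                              # genuine ion, same place every shot
        noise += rng.standard_normal(length)                 # uncorrelated detector noise
        pos = int(np.clip(np.rint(rng.normal(pulse_centre, pulse_spread)), 0, length - 1))
        pulses[pos] += pulse_amp                             # artefact pulse, shot-dependent place
    return signal, noise, pulses

for shots in (1, 10, 100, 1000):
    signal, noise, pulses = accumulate(shots)
    print(shots, signal.max(), round(noise.std(), 1), np.count_nonzero(pulses), pulses.sum())
```

Summed over N shots, the genuine peak grows as N and the white noise only as √N, so accumulation improves the peak-to-noise ratio. The pulse train also grows as N, however. A single shot yields one sharp pulse that can pass for a peak. Many shots turn the pulses into a broad hump whose total energy keeps pace with the signal, so adding shots does not suppress it.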

Removing Artefacts

To investigate whether more information can be obtained from the spectra, we created a number of denoising and enhancement techniques, which we briefly present below.

Baseline Extraction

Baseline removal

The first step is to remove the energy offset in the measurements. This is done by removing the baseline of the spectrum with a dedicated filter technique. The result is shown in figure.
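The filter used for baseline removal is not specified here. As a stand-in (our choice, not the authors' technique), a widely used way to estimate a slowly varying baseline is a morphological opening: a rolling minimum followed by a rolling maximum over a window wider than any peak. The opening never exceeds the data, so the corrected spectrum stays non-negative. It follows the baseline while cutting off narrow peaks. The half-window of 50 samples and the synthetic spectrum are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _rolling(y, half, reducer):
    """Apply reducer (np.min or np.max) over a centred window of 2*half+1 samples, edges replicated."""
    padded = np.pad(y, half, mode="edge")
    return reducer(sliding_window_view(padded, 2 * half + 1), axis=-1)

def remove_baseline(y, half_window=50):
    """Subtract a morphological-opening baseline (rolling minimum, then rolling maximum).
    half_window must exceed the peak half-width, or the peaks leak into the baseline."""
    y = np.asarray(y, dtype=float)
    baseline = _rolling(_rolling(y, half_window, np.min), half_window, np.max)
    return y - baseline, baseline

n = np.arange(2000)
raw = 5.0 + 0.002 * n                                          # hypothetical sloping energy offset
for centre in (500, 1200):
    raw = raw + 10.0 * np.exp(-((n - centre) ** 2) / 18.0)     # two narrow peaks
corrected, baseline = remove_baseline(raw)
print(corrected[[100, 500, 900]].round(3))
```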

Mass Spectrum Denoising

Denoised sample

To denoise the data, we first tried designing a number of digital notch filters. Because we do not want to shift the peaks back or forth in time, such a filter was required to have a zero-phase response over its entire spectrum. The impulse response of the filter also needed to be as short as possible, because we did not want to broaden the peaks or introduce unwelcome echoes. A few small experiments indicated that the results of such a filter would not be very good. It also became clear that the up-sweeping tones (chirps) could not easily be removed by such a time-invariant filter, since their frequency changes along the spectrum. We therefore created another technique, whose result is shown in figure. A local close-up of the denoised data (figure) shows that the peaks stay at the same positions but now allow fully automatic detection, especially when looking at the SFFT of the data. This makes the technique very attractive for high-throughput proteomics.
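Any such filter can be made zero-phase by running it forward and then backward over the data, so that the phase shifts of the two passes cancel. The sketch below illustrates the notch-filter approach described above, which was later set aside, on synthetic data. The second-order notch design, the pole radius `r=0.98` and the tone frequency of 0.2 cycles per sample are our assumptions. In the example, a superimposed tone moves the apparent maximum of a peak by one sample, and the forward-backward notch puts it back.

```python
import numpy as np

def notch_coefficients(f0, r=0.98):
    """Second-order IIR notch at f0 (cycles/sample): zeros on the unit circle at +-f0,
    poles just inside at radius r. A pole radius closer to 1 gives a narrower notch but longer ringing."""
    w = 2.0 * np.pi * f0
    b = np.array([1.0, -2.0 * np.cos(w), 1.0])
    a = np.array([1.0, -2.0 * r * np.cos(w), r * r])
    b *= a.sum() / b.sum()          # unity gain at DC, so peak heights are preserved
    return b, a

def biquad(b, a, x):
    """Causal direct-form I filter with one second-order section (this alone delays the phase)."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y

def zero_phase_notch(x, f0, r=0.98):
    """Filter forward, then backward: the two phase delays cancel, so peaks are not shifted."""
    b, a = notch_coefficients(f0, r)
    forward = biquad(b, a, np.asarray(x, dtype=float))
    return biquad(b, a, forward[::-1])[::-1]

n = np.arange(2000)
peak = np.exp(-((n - 1000) ** 2) / 50.0)          # synthetic peak centred on sample 1000
raw = peak + 0.3 * np.sin(2 * np.pi * 0.2 * n)    # plus a static tone at 0.2 cycles/sample
clean = zero_phase_notch(raw, f0=0.2)
print(np.argmax(raw), np.argmax(clean))
```

The two passes square the magnitude response. The pole radius trades notch width against the length of the ringing, which is exactly the echo problem mentioned above. `scipy.signal.filtfilt` implements the same forward-backward idea with edge padding.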

Top: zoomed-in sample output from the machine. Bottom: the same data, denoised.

The algorithm we created is highly accurate: it retains position information exactly. However, the resolution of lower peaks is slightly worse than that of higher peaks. This should not be a problem, because these peaks are still well separated. As the previous figures show, accuracies far below 0.1 Da can be achieved for smaller peaks.
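How the algorithm reaches sub-0.1 Da accuracy is not described here. As a generic point of comparison, not the authors' method, a peak can be located more finely than the sample spacing by fitting a parabola through the logarithm of the three samples around its maximum. For a noise-free Gaussian peak this recovers the centre exactly. The fractional index is then mapped onto the $\frac{m}{z}$ axis by linear interpolation. The peak shape and the axis in the example are hypothetical.

```python
import numpy as np

def refine_peak(y, k, mz=None):
    """Sub-sample position of the peak at sample k, from a parabola through log(y) at k-1, k, k+1.
    Exact for a noise-free Gaussian peak; requires positive intensities around k."""
    left, centre, right = np.log(y[k - 1]), np.log(y[k]), np.log(y[k + 1])
    position = k + 0.5 * (left - right) / (left - 2.0 * centre + right)
    if mz is None:
        return position
    return float(np.interp(position, np.arange(len(mz)), mz))   # fractional index -> m/z

n = np.arange(2000)
y = np.exp(-((n - 1000.3) ** 2) / 50.0)     # Gaussian peak centred between two samples
mz = 500.0 + 0.01 * n                       # hypothetical, locally linear m/z axis
k = int(np.argmax(y))
print(k, refine_peak(y, k), refine_peak(y, k, mz))
```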

Data Enhancing in High-Noise Linear Mode

Another experiment we performed was data enhancement of a linear-mode mass spectrum. The mass spectrum we present is the output from a sample containing a cell lysate of HeLa cells. This is a relatively poor sample for heavy-mass linear-mode MALDI. Not only are these heavy masses difficult to get into the gas phase, but the noise level may also drown out what we actually want to measure. The figure shows how data enhancement helps filter out the noise.
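The enhancement algorithm itself is not described here. As a generic point of comparison, not the authors' method, a classic way to pull weak peaks of known width out of white noise is a matched filter: correlate the spectrum with the expected peak shape. The sketch assumes Gaussian peaks of known width in samples (8 in the example, a hypothetical value) and white noise. It does nothing about the shot-dependent pulse train.

```python
import numpy as np

def matched_filter(y, sigma):
    """Correlate y with a unit-energy Gaussian of width sigma (in samples).
    For Gaussian peaks in white noise this maximizes the peak-to-noise ratio;
    the kernel is symmetric, so peak positions are not shifted."""
    half = int(np.ceil(4 * sigma))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= np.sqrt(np.sum(kernel ** 2))      # unit energy: white-noise variance is preserved
    return np.convolve(y, kernel, mode="same")

rng = np.random.default_rng(2)
n = np.arange(4000)
noise = rng.standard_normal(n.size)                          # unit-variance white noise
peak = 1.5 * np.exp(-((n - 2000) ** 2) / (2 * 8.0 ** 2))     # weak peak, sigma = 8 samples
enhanced = matched_filter(noise + peak, sigma=8.0)
print(np.argmax(noise + peak), np.argmax(enhanced))
```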

The result of the algorithm on a standard protein mixture is shown in figure. Importantly, certain peaks that would normally not be selected if we simply looked at the highest values now show up. Whether some of these new peaks are meaningful remains to be investigated.

Top: SFFT of a measurement of a HeLa cell substrate. Bottom: SFFT of the same substrate after data enhancement (but without removal of the pulse train).

Linear-mode data enhancement of the output of a ProtMix II. Bottom: the actual output. Top: the enhanced output.

Automatic Detection of important peaks

Correlation Measure to further select peaks

Technique to detect important peaks in the enhanced linear-mode signal

A phenomenon often used to detect important peaks is that isotopes differ in mass. For singly charged ions of the same fragment, we will sometimes measure x Da, sometimes x+1 Da (if the ion carries one extra neutron), and so on. This knowledge can be used to detect important peaks automatically, as shown in figure. The graph shows the autocorrelation, which mainly measures whether a peak has 'echoes'. If it has echoes, it is probably part of a series of isotope peaks of the same fragment.
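For a singly charged ion, adjacent isotope peaks are about 1.003 Da apart, mostly because of $^{13}$C. For charge $z$, the spacing is about $1.003/z$ in $\frac{m}{z}$. One way to compute such an autocorrelation is sketched below, under our own assumptions; it is not necessarily the computation behind the figure. Because TOF samples are not evenly spaced in $\frac{m}{z}$, the spectrum is first resampled onto a uniform grid (the 0.01 Da step is a hypothetical choice). A local maximum near a lag of 1 Da then signals isotope echoes. To attribute echoes to individual peaks, the same computation would be applied to short m/z windows around each candidate peak.

```python
import numpy as np

def autocorrelation_da(mz, intensity, step=0.01, max_lag_da=3.0):
    """Normalized autocorrelation of a spectrum resampled onto a uniform m/z grid.
    Returns (lags in Da, autocorrelation), computed with a zero-padded FFT (linear, not circular)."""
    grid = np.arange(mz[0], mz[-1], step)
    y = np.interp(grid, mz, intensity)
    y = y - y.mean()
    nfft = 2 * len(y)                                   # padding avoids wrap-around
    power = np.abs(np.fft.rfft(y, nfft)) ** 2
    ac = np.fft.irfft(power, nfft)
    n_lags = int(round(max_lag_da / step))
    return np.arange(n_lags + 1) * step, ac[:n_lags + 1] / ac[0]

# Hypothetical singly charged isotope envelope with peaks ~1.00335 Da apart.
mz = np.linspace(990.0, 1010.0, 20001)
intensity = sum(h * np.exp(-((mz - (1000.0 + i * 1.00335)) ** 2) / (2 * 0.05 ** 2))
                for i, h in enumerate((1.0, 0.6, 0.25)))
lags, ac = autocorrelation_da(mz, intensity)
window = (lags > 0.5) & (lags < 1.5)
print(lags[window][np.argmax(ac[window])])   # lag of the first isotope "echo"
```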

Similarly, the autocorrelation of the enhanced linear-mode experiment clearly shows vertical bands. Very likely, the content of each band will allow us to detect which bands are important; however, this is merely an educated guess.

Summary

We have presented a number of artifacts encountered in MALDI-TOF and MALDI-TOF/TOF machines. These are:

Static tones
Up-sweeping tones
Decaying tones
Probabilistic pulse trains

We also presented the output of some preliminary techniques we developed to show the feasibility of data denoising:

Denoising of the static tones and up-sweeping tones, without shifting the peaks back or forth.
Enhancement of heavy-mass linear-mode spectra.
Automatic importance assessment when looking at multiple peaks.

http://werner.yellowcouch.org/
werner@yellowcouch.org