Opened 8 years ago

Last modified 7 years ago

#365 assigned enhancement

Multi-wavelength detector

Reported by: demeler
Owned by: gegorbet
Priority: normal
Milestone: future
Component: ultrascan3
Version:
Keywords:
Cc:

Description

Efforts led by Johannes Walter aim to provide new data acquisition software for the multiwavelength detector from Helmut Coelfen. This software aims for compatibility with the US3 data format. This ticket will track these development efforts.

Attachments (2)

protein_test.setting.mwrs.xml (1.0 KB) - added by jwalter 8 years ago.
test_simulation_55k.zip (6.9 MB) - added by jwalter 8 years ago.

Change History (13)

comment:1 Changed 8 years ago by demeler

  • Type changed from defect to enhancement

The MW detector from Helmut captures data as individual wavelength scans for each radial position. One of the issues we are dealing with is how AUC data are addressed. From an AUC perspective, looking at AUC data only makes sense if the radial domain is available. I realize that Helmut's design will give you wavelength scans at fixed radial positions, and that it would be most convenient to save the data in such a layout. From a programming perspective the best solution may be to save the data in an intermediate binary format and then write another procedure that imports this intermediate format into the UltraScan program in the radial layout, so it can be used immediately with all other existing routines that expect radial data. Conversion routines that export the data in one or the other format could also be written easily. Such a scheme would probably work best by buffering each wavelength scan in memory until every radial position has been acquired, and then dumping all buffered wavelength scans to a binary file.
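The buffering scheme described above amounts to a transpose from wavelength-major to radial-major order. A minimal Python sketch, assuming the acquisition delivers one wavelength scan (a list of readings) per radial position; the function name `to_radial_layout` is hypothetical and not part of UltraScan:

```python
def to_radial_layout(wavelength_scans):
    """Transpose buffered wavelength scans into radial scans.

    wavelength_scans[r][w] is the reading at radial index r and
    wavelength index w (the order in which the detector delivers data).
    The result is indexed [w][r]: one radial scan per wavelength.
    """
    if not wavelength_scans:
        return []
    n_wl = len(wavelength_scans[0])
    if any(len(scan) != n_wl for scan in wavelength_scans):
        raise ValueError("all wavelength scans must have the same length")
    return [list(column) for column in zip(*wavelength_scans)]


# Three radial positions with two wavelengths each.
buffered = [[10, 20], [11, 21], [12, 22]]
radial = to_radial_layout(buffered)
```

Here `radial[0]` holds the readings of the first wavelength at all three radial positions, which is the layout the existing radial routines expect.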

Another issue to consider is the large amount of data. The CCD detector has a very high resolution, I believe 1024 pixels, which translates to about 2 pixels/nanometer. However, the resolution of the diffraction grating is much worse, so it doesn't make sense to store all of these as individual wavelengths. It would be much better to average 2-5 wavelengths, maybe even more (please review the diffraction grating bandwidth separation), into one and store only this average. This would improve data quality and would represent the true information content of the data more realistically. Also, such an average would have to be calculated *after* the entire scan (or could be done after an entire run) has been recorded.
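The averaging idea can be sketched as simple binning of adjacent wavelength points. This is an illustration only: the name `average_wavelengths`, the plain arithmetic mean, and the handling of a short trailing group are assumptions, not the method used in US2 or US3.

```python
def average_wavelengths(wavelengths, readings, bin_size):
    """Average groups of `bin_size` adjacent wavelength points.

    Returns (binned_wavelengths, binned_readings). A trailing group with
    fewer than `bin_size` points is averaged over the points it has.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    if len(wavelengths) != len(readings):
        raise ValueError("wavelengths and readings differ in length")
    out_wl, out_val = [], []
    for start in range(0, len(readings), bin_size):
        wl_group = wavelengths[start:start + bin_size]
        val_group = readings[start:start + bin_size]
        out_wl.append(sum(wl_group) / len(wl_group))
        out_val.append(sum(val_group) / len(val_group))
    return out_wl, out_val


# Five points at 0.5 nm spacing, averaged in pairs.
wl, vals = average_wavelengths(
    [250.0, 250.5, 251.0, 251.5, 252.0], [100, 102, 104, 106, 110], 2
)
```

Keeping the bin size as a parameter matches the suggestion that the number of averaged wavelengths should follow the grating bandwidth rather than being fixed.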

Just to review:
In the multiwavelength detector, a scan is a three dimensional object, a different wavelength scan for each radial position, with the amplitude being either the absorbance or intensity of the data. If intensity is collected, routines exist in US3 that deal with all the postprocessing. Each scan has a w2t, time, rotorspeed and temperature associated with it. There is also the competing multi-wavelength design from Tom Laue we will integrate into the software which collects data in the radial domain, but different scans for each wavelength. Here, the acquisition is made by a linear diode array along the radial domain which avoids temporal distortion in the data and changes the diffraction grating for each wavelength. One advantage of that design is that the xenon flashlamp has a highly irregular emission intensity along the wavelength domain, and in Laue's design you can stop at one low emitting lambda and flash multiple times and integrate on the linear diode array, which has a much higher dynamic range than the CCD. This will result in better quality data devoid of any temporal distortions. Anyway, the point is, that Laue's design provides radial data streams, not wavelength data streams, and so it doesn't need the extra conversion step. I don't think however the conversion step is any problem, though.

So here is what I suggest:
In the DA software, you should dump all wavelength scans for each radial position into a very simple binary format that reflects the accuracy of the detector. A short integer will give you 65536 values (16 bit accuracy) you can use for the intensity recordings of the data, which will be more than sufficient to capture all the values. For each radial point, you also collect the radius value. At some point in the middle of the scan you also record the starting and ending wavelength, w2, temperature and rotor speed. Somewhere you should also record the wavelength resolution, but that will be fixed and doesn't need to be changed except perhaps in a config file. All those values can be written to a different file - they can be ascii, since it is a trivial amount of data.

At the end of the scan you have 400-900 wavelength scans with 1024 values each (plus the radial position) in a binary format. If you want to cache these values in RAM, you can write them to a single file as a 2D array at the end of each scan as you wait for the radial positioner to reset to the beginning of the next scan. You then collect multiple scans over the course of the experiment, resulting in a number of binary scan files that can then be imported into US3 with the appropriate reading/conversion routine.
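A minimal sketch of such a per-scan dump, under stated assumptions: values are unsigned 16-bit integers, the byte order is big-endian, and each record is the radius (stored as cm * 1000) followed by that position's wavelength scan. None of these choices is fixed by the discussion above, and the function names are hypothetical.

```python
import struct


def pack_scan(radii_cm, wavelength_scans):
    """Serialize one scan: for each radial position, the radius followed
    by its wavelength scan, all as big-endian unsigned 16-bit integers.
    The radius is stored as cm * 1000 (an assumption for this sketch).
    """
    if len(radii_cm) != len(wavelength_scans):
        raise ValueError("need one wavelength scan per radius")
    chunks = []
    for radius, scan in zip(radii_cm, wavelength_scans):
        values = [round(radius * 1000)] + list(scan)
        chunks.append(struct.pack(">%dH" % len(values), *values))
    return b"".join(chunks)


def unpack_scan(blob, n_wavelengths):
    """Inverse of pack_scan; returns (radii_cm, wavelength_scans)."""
    record = struct.Struct(">%dH" % (n_wavelengths + 1))
    radii, scans = [], []
    for fields in record.iter_unpack(blob):
        radii.append(fields[0] / 1000)
        scans.append(list(fields[1:]))
    return radii, scans


# Two radial positions, three wavelengths each: 2 * (1 + 3) * 2 bytes.
blob = pack_scan([5.8, 5.801], [[100, 200, 300], [101, 201, 301]])
radii, scans = unpack_scan(blob, 3)
```

Writing `blob` to disk once per scan, while the radial positioner resets, gives one binary file per scan as described.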

At this point, you only need to keep track of whether you are recording intensity or absorbance values. For equilibrium data you would also need to consider absorbance data, although I cannot envision at this point why anyone would want to use the instrument for that purpose. The default measurement method should definitely be intensity mode.

Next, we should import this *binary* data into the US3 format, and then we will also write some import/export functions to re-export the data in legacy Beckman format to either wavelength or radial scans as we have it in US2. A full multiwavelength viewer will need to be written that allows 3D viewing like in US2, and permit you to work with multiwavelength data in US3. The data format already supports multi-wavelength data, although we will need new run types, mwri and mwra for multi-wavelength radial intensity/absorbance scan files. Such run types could then also be re-exported to wavelength absorbance/intensity files.

comment:2 follow-up: Changed 8 years ago by jwalter

Dear US3 developers,

I would propose the following:
A binary format is used for all files to reduce memory consumption and to increase the overall performance of data acquisition as well as transmission.

There will be three to four different file types.

First there will be one file per cell and experiment:

runID.cell.setting.mwl.auc

which contains all information about the experiment:

  • runID matches the directory name
  • cell is the location of the centerpiece; i.e. the hole in the rotor. This is a digit from 1 to 8
  • setting indicates the file type

The file format will use the following structure:

 magic_number     4 bytes. Containing the letters 'UCSE' (UltraCentrifuge Settings)
 description      240 bytes.
 minimum radius   2 bytes. Short Integer. (value * 1000)
 maximum radius   2 bytes. Short Integer. (value * 1000)
 radius delta     2 bytes. Short Integer. (radial step width with value * 1000)
 scan_count       2 bytes. Short Integer. (number of scans)
 averaged scans   2 bytes. Short Integer. (number of averaged scans at each radial position)
 averaged WL      2 bytes. Short Integer. (number of averaged WL at each radial position and the already averaged scans)
 WL_count         2 bytes. Short Integer. (number of wavelengths for each scan and position)
 WL array         4 bytes. Integer.
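The field list above could be written and read back with Python's `struct` module as sketched below. Several points are assumptions that the proposal leaves open: big-endian byte order, unsigned short integers, an ASCII description padded with NUL bytes, and a wavelength array of `WL_count` 4-byte integers holding nm * 1000. The function names are hypothetical.

```python
import struct

# magic, description, then seven unsigned short integers.
HEADER = struct.Struct(">4s240s7H")  # 258 bytes, no padding


def pack_settings(description, r_min, r_max, r_delta, scan_count,
                  avg_scans, avg_wl, wavelengths_nm):
    """Build the settings file contents from the field list above."""
    desc = description.encode("ascii")
    if len(desc) > 240:
        raise ValueError("description longer than 240 bytes")
    header = HEADER.pack(
        b"UCSE", desc,  # struct pads the description with NUL bytes
        round(r_min * 1000), round(r_max * 1000), round(r_delta * 1000),
        scan_count, avg_scans, avg_wl, len(wavelengths_nm),
    )
    wl = [round(w * 1000) for w in wavelengths_nm]
    return header + struct.pack(">%di" % len(wl), *wl)


def unpack_settings(blob):
    """Parse the settings file contents back into a dictionary."""
    (magic, desc, r_min, r_max, r_delta, scan_count,
     avg_scans, avg_wl, wl_count) = HEADER.unpack_from(blob, 0)
    if magic != b"UCSE":
        raise ValueError("not a settings file")
    wl = struct.unpack_from(">%di" % wl_count, blob, HEADER.size)
    return {
        "description": desc.rstrip(b"\0").decode("ascii"),
        "radius_min": r_min / 1000,
        "radius_max": r_max / 1000,
        "radius_delta": r_delta / 1000,
        "scan_count": scan_count,
        "averaged_scans": avg_scans,
        "averaged_wl": avg_wl,
        "wavelengths_nm": [w / 1000 for w in wl],
    }


blob = pack_settings("test run", 5.8, 7.2, 0.001, 50, 1, 1, [250.0, 250.5])
info = unpack_settings(blob)
```

With radii given in cm, a value * 1000 up to about 7.2 cm stays well inside the range of a 2-byte integer.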

Second there will be two or three files per cell and scan:

scan.cell.type.mwl.auc

which contains all information about the current run:

  • scan is the number of the scan
  • cell is the location of the centerpiece; i.e. the hole in the rotor. This is a digit from 1 to 8.
  • type is the method of data acquisition
    • REFI - file contains the reference intensity (this file is optional, depending on whether absorption measurements are desired)
    • SAMI - file contains the sample intensity
    • PROP - file contains the scan properties

The REFI file format will use the following structure:

 2D array with the columns containing the radial positions and the rows the wavelengths. Each value is a 4-byte Integer.

At the moment I measure at different radial positions (not the whole reference cell) and average these values into one 1D array. Therefore, for now, the 2D array consists of duplicates.

The SAMI file format is in accordance with the REFI file.

The PROP file format will use the following structure:

 data_flag      4 bytes. Containing the letters 'DATA'.  For internal consistency checks.
 temperature    2 bytes. Short integer. Temperature in degrees C * 10.
 rpm            2 bytes. Short Integer. 
 seconds        4 bytes. Integer.
 omega2t        8 bytes. Double precision Float.
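The PROP record is small enough to express as a single `struct` layout. A sketch, assuming big-endian byte order and an unsigned 2-byte rpm field (a signed short would overflow above 32767 rpm); neither is stated in the proposal, and the function names are hypothetical.

```python
import struct

# data_flag, temperature * 10, rpm, seconds, omega2t
PROP = struct.Struct(">4shHid")  # 4 + 2 + 2 + 4 + 8 = 20 bytes


def pack_prop(temperature_c, rpm, seconds, omega2t):
    """Serialize the scan properties listed above."""
    return PROP.pack(b"DATA", round(temperature_c * 10), rpm, seconds, omega2t)


def unpack_prop(blob):
    """Return (temperature_c, rpm, seconds, omega2t) after checking the flag."""
    flag, temp10, rpm, seconds, omega2t = PROP.unpack(blob)
    if flag != b"DATA":
        raise ValueError("missing DATA flag")
    return temp10 / 10, rpm, seconds, omega2t


record = pack_prop(20.0, 50000, 3600, 9.87e10)
props = unpack_prop(record)
```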

Those are my thoughts so far on a possible file format for the MWL data.
Of course, I can also create some example binary files with my DA software this week.

Best regards
Johannes

comment:3 in reply to: ↑ 2 Changed 8 years ago by demeler

Thinking ahead a bit I feel it would be best if in US3 we developed a new module that is derived from the current us_convert program with some expanded capabilities for MW-specific needs. This new module would have the same functionality as the us_convert program, but would also function as the MW data viewer with similar capability as the us2 MW viewer. This would add functions for viewing data in 3D and 2D, viewing movies along the wavelength or radial domain, exporting data back to Beckman legacy format both for radial and wavelength files, and for limiting each dimension before committing the data to the database. I feel also that it would be the best place for doing the wavelength averaging. This means that the original DA acquisitions that Johannes is working on still show the raw data for all CCD pixels without any combining or averaging.

Further to the task at hand - development of the DA file format, I would like to reply to some specific points from Johannes:

First there will be one file per cell and experiment:
runID.cell.setting.mwl.auc

For intensity runs, it is possible to measure from both channels, as long as the absorbance is below 0.5 OD in the reference channel. I think this is also true for the MW. So we should include the channel number. Let's define 'experiment' first, we consider an experiment an entire run, all scans from all cells and all channels, and for all speeds (if speeds were changed during the run). Since right now we are discussing just the raw binary file format for the DA program, let's try to use a format that makes sense for the DA process, and worry about the US3 conversion later. To avoid confusion, I suggest we use the suffix 'mw' for multi-wavelength, and reserve the auc suffix for US3 format. I also think you should write each scan individually to the disk, producing separate files just like the Beckman does. I think this would be more reliable in case of failure during the experiment, since you are check-pointing each scan as soon as it is collected. It would also be best to write each scan's temperature, rotor speed and w2 into the header. We can very easily combine all scans, cells, channels later on in the post-processing stage. If you can collect data for both sample and reference channel in intensity mode, we need to also account for the channel somehow, so I propose to use the following file name format for the initial DA data:

runID.cell.channel.setting.scan_number.mw

I assume with 'setting' you mean intensity or absorbance mode, right? For all other settings I would recommend to create an (ASCII) XML file that keeps track of these things very similarly as we are already doing in US3, more on that later.
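A sketch of how the proposed file name could be composed and split again. The zero-padded scan number and the helper names `build_name` and `parse_name` are assumptions for illustration; splitting from the right keeps run IDs that contain dots intact.

```python
def build_name(run_id, cell, channel, setting, scan_number):
    """Compose runID.cell.channel.setting.scan_number.mw."""
    return "%s.%d.%s.%s.%04d.mw" % (run_id, cell, channel, setting, scan_number)


def parse_name(name):
    """Split a file name into its fields; the run ID may contain dots."""
    run_id, cell, channel, setting, scan_number, suffix = name.rsplit(".", 5)
    if suffix != "mw":
        raise ValueError("unexpected suffix: " + suffix)
    return run_id, int(cell), channel, setting, int(scan_number)


name = build_name("demo.run", 2, "A", "intensity", 7)
fields = parse_name(name)
```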

For these initial DA data I would not worry about averaging just yet, this may be a user-option later in the post-processing stage like we did in US2. I am also not sure if we can rely on the detector to produce regularly spaced radial data. If there are gaps, we need to keep a radius vector. Your approach seems to suggest that there are no gaps, is that so?

So, as far as I see it, the format would be:

cell

channel

rotor speed

temperature

w^2t

elapsed time (in secs)

number of radial points

if regularly spaced radii:
   radius start
   radius end (or delta-radius)

number of wavelength points

if regularly spaced wavelengths:
   wavelength start
   wavelength end (or delta-wavelength)

if wavelengths are not regular, all the wavelengths are written in sequence from low to high

2D array of readings in short int format. For each radial position, all wavelengths are written from wavelength_start to wavelength_end. Radii go from low to high. If radii are not regularly spaced, the radius position is written first.

The other files are probably not needed, or can be generated from the headers of each scan during the post processing stage. We may want to have one global XML file with this info:

Run ID
rotor type (could be added during post-processing)
for each cell
   for each channel
       description of contents
       corresponding binary file name
       centerpiece type (could be added during post-processing)
       cell contents, solution, buffer, analyte (could be added during post-processing)    
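The outline above maps naturally onto nested XML elements. The sketch below uses Python's `xml.etree.ElementTree`; all element and attribute names (`run`, `cell`, `channel`, `description`, `file`) are hypothetical and do not reflect the schema US3 uses or the attached example file.

```python
import xml.etree.ElementTree as ET


def build_run_xml(run_id, channels):
    """Build the global run description as an XML string.

    channels is a list of dicts with the keys 'cell', 'channel',
    'description' and 'file' (the corresponding binary file name).
    """
    root = ET.Element("run", {"id": run_id})
    cells = {}
    for entry in channels:
        cell = cells.get(entry["cell"])
        if cell is None:
            cell = ET.SubElement(root, "cell", {"number": str(entry["cell"])})
            cells[entry["cell"]] = cell
        chan = ET.SubElement(cell, "channel", {"name": entry["channel"]})
        ET.SubElement(chan, "description").text = entry["description"]
        ET.SubElement(chan, "file").text = entry["file"]
    return ET.tostring(root, encoding="unicode")


xml_text = build_run_xml("demo", [
    {"cell": 1, "channel": "A", "description": "sample",
     "file": "demo.1.A.intensity.0001.mw"},
    {"cell": 1, "channel": "B", "description": "reference",
     "file": "demo.1.B.intensity.0001.mw"},
    {"cell": 2, "channel": "A", "description": "sample 2",
     "file": "demo.2.A.intensity.0001.mw"},
])
```

Fields that are only known later, such as rotor type or centerpiece type, could be added as further child elements during post-processing.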
       

The data types you propose are fine.

Later on, we will write the postprocessor/viewer, which will be based on the US3 us_convert routine, and the US2 us_viewmwl routine. Does that sound acceptable?

comment:4 Changed 8 years ago by jwalter

Yes Borries this sounds good, especially the implementation and functionalities in US3.

I feel also that it would be the best place for doing the wavelength averaging. This means that the original DA acquisitions that Johannes is working on still show the raw data for all CCD pixels without any combining or averaging.

One advantage of averaging the data directly in the DA software would be the reduction in memory use. But I agree with you that it would be better to do this in US3 because of the flexibility.

For intensity runs, it is possible to measure from both channels, as long as the absorbance is below 0.5 OD in the reference channel. I think this is also true for the MW. So we should include the channel number.

This function isn't implemented yet but will be in the future. But I think we should consider always saving the intensity of the sample in channel A and saving either the reference intensity or the intensity of a second sample in channel B. This would give the opportunity to change from absorption to intensity after the experiment.
Maybe we can indicate this as follows:
runID.cell.channel.reference.scan_number.mwrs
runID.cell.channel.sample.scan_number.mwrs

I suggest we use the suffix 'mw' for multi-wavelength

I would suggest to extend this suffix with rs to mwrs to indicate that this is a radial scan, because there may be other measuring scan types in the future, too. With this indication we can keep the flexibility in our format.

So, as far as I see it, the format would be:

For the 2D array of readings a short int format will not be sufficient, because the absolute intensity of the Xe flash lamp differs considerably with wavelength whereas the relative change can be quite small. Therefore I'd like to propose multiplying all intensity data by 100 or 1000 and using a 4-byte long int format instead.
The radial points are spaced regularly whereas the wavelength is not. Hence a wavelength array is necessary.
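The fixed-point scaling proposed here can be illustrated as follows. The factor of 1000, the big-endian byte order, and the helper names are assumptions chosen for this sketch.

```python
import struct

SCALE = 1000  # one of the two factors proposed above


def encode_intensity(value):
    """Scale a reading and pack it as a big-endian signed 32-bit integer."""
    return struct.pack(">i", round(value * SCALE))


def decode_intensity(blob):
    """Unpack a 4-byte reading and undo the scaling."""
    return struct.unpack(">i", blob)[0] / SCALE


def fits_in_uint16(value):
    """True if the scaled reading would still fit in an unsigned short."""
    return 0 <= round(value * SCALE) <= 0xFFFF


encoded = encode_intensity(12345.678)
decoded = decode_intensity(encoded)
```

With this factor a 2-byte field overflows for any reading above 65.535, while a 4-byte field keeps three decimal places for readings up to roughly two million.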

We may want to have one global XML file with this info:

An XML file for the scan settings is just fine. Which naming convention would you propose?
Maybe:
runID.setting.mwrs.xml

Changed 8 years ago by jwalter

comment:5 Changed 8 years ago by jwalter

Hello everybody, please feel free to take a look at the attached file. Today I wrote a VI to create the aforementioned XML data file. I simulated a four-hole rotor with four cells, in which an intensity measurement was done in cells 1 and 3 and an absorption measurement in cells 2 and 4.

Changed 8 years ago by jwalter

comment:6 Changed 8 years ago by jwalter

I added some exemplary data using this format:

cell                          1 byte. Char. 
channel                       1 byte. Char.
scan_count                    2 bytes. Short Integer.
rotor speed                   2 bytes. Short Integer. Rotor speed in rpm.
temperature                   2 bytes. Short integer. Temperature in degrees C * 10.
omega2t                       4 bytes. Float. Omega2t in 1/s.
elapsed time                  4 bytes. Integer. Time in seconds.
number of radial points       2 bytes. Short Integer.
radius start                  2 bytes. Short Integer. Value in cm * 1000.
radius step                   2 bytes. Short Integer. Value in cm * 1000.
number of wavelength points   2 bytes. Short Integer.
wavelength 1D array           4 bytes * # of wl points . Integer. Value in nm * 1000.
DATA                          4 bytes * # of radial points for the first wl.
                              4 bytes * # of radial points for the second wl.
                              ....

comment:7 follow-up: Changed 7 years ago by jwalter

I have one more question about the endianness of the binary data. I'm using big endian at the moment but I can choose another format too (e.g. little endian). I'd like to define this as soon as possible to have consistency in my data, so a brief comment would be great. Thanks!

comment:8 in reply to: ↑ 7 Changed 7 years ago by demeler

Replying to jwalter:

I have one more question about the endianness of the binary data. I'm using big endian at the moment but I can choose another format too (e.g. little endian). I'd like to define this as soon as possible to have consistency in my data, so a brief comment would be great. Thanks!

Johannes,
It would be best if you reused existing US3 code to write out the data, and followed the schema we used in UltraScan3 already. You can find the details on how to write the data in the correct format in $ULTRASCAN3/utils/us_dataIO2.[cpp,h]. In particular, please follow the format in:
int US_DataIO2::writeRawData( const QString& file, RawData& data )

Qt has functions to convert between little and big endian:
http://harmattan-dev.nokia.com/docs/library/html/qt4/qtendian.html
which are used here to assure consistency. If you want your files to be readable in US3 it would be best if you used the already existing function to write your data.
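To show why the byte order has to be agreed on, here is a small Python illustration (not the Qt or US3 code): the same 32-bit value produces reversed byte sequences in the two conventions, and decoding with the wrong one yields a different number. The helper name `read_uint32` is hypothetical.

```python
import struct

value = 0x12345678
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first


def read_uint32(blob, byteorder):
    """Decode 4 bytes with an explicit byte order ('big' or 'little')."""
    return int.from_bytes(blob, byteorder)


right = read_uint32(big, "big")
wrong = read_uint32(big, "little")
```

Stating the byte order explicitly in every read and write, as done here, keeps the files independent of the machine they were written on.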

As I said earlier, you may have to save the data in an intermediate format because of the order in which the data are collected: all wavelengths for each radial point, which doesn't lend itself well to AUC analysis, whereas the data need to be in the form of all radial points for one wavelength. I would think that each data acquisition computer has enough memory to hold an entire radial/wavelength scan, so perhaps you could skip this step and write out the data after the entire scan has been collected. Once you have the entire data from one scan in memory, you can write it out using the writeRawData() function. There is another benefit: Gary recently wrote a function that allows the user to export any US3 data in the legacy format Beckman used, so you could use that as well if anyone really needed it.

Gary, Dan: If you have any other suggestions please share them before Johannes goes too far down this track.

-Borries

comment:9 follow-up: Changed 7 years ago by jwalter

Dear Borries, thanks for your fast reply.

Yes, the computer is fast enough to buffer all data of one complete scan. As you can see two posts above, I have already found an intermediate format that fits quite well for me. Of course I can try to write my data in LabVIEW according to the US3 schema, or rather that of the writeRawData function, but this won't work in a straightforward way after each scan because the minimum and maximum data values for the whole experiment are required.

Please correct me if I'm wrong!

So I would need to record all scans using my binary file format and then write them to the US3 format. By the way, it's not exactly clear to me how I can get the MWL data into the US3 format without creating one file per wavelength. Moreover, I thought that there would be a reading/conversion routine to import the data into US3, including wavelength averaging for example. Will this be possible if I save the data in the US3 format?

comment:10 in reply to: ↑ 9 Changed 7 years ago by demeler

Replying to jwalter:

Dear Borries, thanks for your fast reply.

Yes, the computer is fast enough to buffer all data of one complete scan. As you can see two posts above, I have already found an intermediate format that fits quite well for me. Of course I can try to write my data in LabVIEW according to the US3 schema, or rather that of the writeRawData function, but this won't work in a straightforward way after each scan because the minimum and maximum data values for the whole experiment are required.

Please correct me if I'm wrong!

Sorry, Johannes, I must still be half asleep - of course you are correct.
You will need all scans to calculate the min/max values, so your approach of storing the entire run first in an intermediate format is perfectly reasonable. And I agree to make this format binary to speed up I/O.
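The reason for the intermediate format can be seen in a two-pass sketch: the first pass walks all stored scans to find the global extremes, and only then can the final files be written. The function name `global_min_max` is hypothetical, and the scans are represented here as plain nested lists.

```python
def global_min_max(scans):
    """Return (min, max) over all readings of all scans.

    scans is an iterable of 2D lists of readings, one per scan, so the
    scans can be loaded from disk one at a time.
    """
    lo = hi = None
    for scan in scans:
        for row in scan:
            for value in row:
                if lo is None or value < lo:
                    lo = value
                if hi is None or value > hi:
                    hi = value
    if lo is None:
        raise ValueError("no readings found")
    return lo, hi


run = [
    [[120, 340], [90, 410]],   # scan 1
    [[150, 300], [75, 505]],   # scan 2
]
lo, hi = global_min_max(run)
```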

So I would need to record all scans using my binary file format and then write them to the US3 format. By the way, it's not exactly clear to me how I can get the MWL data into the US3 format without creating one file per wavelength. Moreover, I thought that there would be a reading/conversion routine to import the data into US3, including wavelength averaging for example. Will this be possible if I save the data in the US3 format?

Either way we will still have to write the conversion routine to import MWL data into UltraScan3, but I think you could write this yourself with the functions I mentioned above. The wavelength averaging will be handled in this import function. We should integrate this function into the existing us_convert GUI which has 95%+ of the required functionality already.

Given that Qt has the necessary endian conversion functions, I guess it would not matter how you store your intermediate format, as long as you use the US3 functions to write the data back out. And yes, you are correct, each (averaged) wavelength would need its own radial scan written separately. We call this entity a "triple" (cell/channel/wavelength). An MWL experiment by definition would have multiple triples for each cell/channel combination, but other than that it would be identical to the current setup. In a way, it is already possible in Beckman's software to acquire multiple wavelengths for each cell/channel (up to three); the only problem is that in practice this never works satisfactorily, since the stepping motor does a poor job of resetting to the correct wavelength after each scan.
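The regrouping into triples can be sketched as a dictionary keyed by (cell, channel, wavelength). The input structure and the name `group_into_triples` are assumptions for illustration and do not correspond to the US3 data classes.

```python
from collections import defaultdict


def group_into_triples(scans):
    """Regroup scans into cell/channel/wavelength triples.

    Each scan is a dict with the keys 'cell', 'channel', 'wavelengths'
    and 'data', where data[w] is the radial scan for wavelengths[w].
    Returns {(cell, channel, wavelength): [radial scan, ...]} with the
    radial scans of each triple kept in acquisition order.
    """
    triples = defaultdict(list)
    for scan in scans:
        for wavelength, radial_scan in zip(scan["wavelengths"], scan["data"]):
            key = (scan["cell"], scan["channel"], wavelength)
            triples[key].append(radial_scan)
    return dict(triples)


scans = [
    {"cell": 1, "channel": "A", "wavelengths": [250.0, 280.0],
     "data": [[10, 11], [20, 21]]},
    {"cell": 1, "channel": "A", "wavelengths": [250.0, 280.0],
     "data": [[12, 13], [22, 23]]},
]
triples = group_into_triples(scans)
```

Each entry of `triples` then holds what would become one file per triple: all radial scans of one wavelength for one cell and channel.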

-Borries

comment:11 Changed 7 years ago by gegorbet

  • Owner changed from johannes.walter to gegorbet
  • Status changed from new to assigned