iFit: iFiles import sub-library


  1. iLoad the main file importer
    1. Distant files (http://, ...)
    2. Compressed files (.zip, ...)
    3. Internal anchor references (file#anchor)
  2. Supported data formats (default)
    1. Legacy data formats (text, NetCDF, HDF, FITS, images, XML, CSV, ...)
    2. More specific data formats (neutrons and X-ray)
    3. Unsupported data formats: how to import them anyway ?
  3. The import filters: customizing data formats (iLoad_ini)
    1. Reading and Writing the iLoad configuration
    2. iLoad configuration structure
    3. In short: procedure to add a new data format


Commands we use in this page: iLoad

The iFiles sub-library imports a large set of file formats and returns their content as a structure. This structure can then be converted into an iData object.

iLoad the main file importer

When a new file is imported into an iData object by means of e.g.
>> a = iData([ ifitpath 'Data/ILL_IN6.dat' ]);
>> a = load(iData, [ ifitpath 'Data/ILL_IN6.dat' ]);
the request for file access is sent transparently to the iLoad function, which performs the following operations:
  1. resolves the file name, and identifies directory importation, wildcards (*, ?), distant files with URLs (file://, ftp://, http://, https://), compressed files (zip, gzip, tar, compress) ;
  2. selects a list of possible file types from e.g. file extension ;
  3. searches for patterns in text headers to better determine file type ;
  4. requests importation by all possible file types one after the other until one succeeds ;
  5. formats the file content so that it is returned as a structure containing standard fields ;
  6. executes remaining post-process scripts.
The known file formats are defined in the iLoad_ini file (see below), a list of format definitions, each with an extension, patterns to search for, and a post-process filter to run after importation. The order of these definitions matters, as they are tried one after the other. A lower-priority set of more standard file formats (mainly those that Matlab reads natively) is then appended to the iLoad_ini preferences.
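The selection by extension and the try-in-order loop (steps 2 and 4 above) can be sketched as follows. This is not the iLoad source code: the loaders list and the reader functions used (load, csvread, importdata) are assumptions chosen for illustration, and the header pattern search of step 3 is left out.

```matlab
% Hypothetical loader list, in the spirit of config.loaders (not the real iLoad_ini content)
loaders = { ...
  struct('name','Matlab workspace', 'extension','mat', 'method','load'), ...
  struct('name','Delimited text',   'extension','csv', 'method','csvread'), ...
  struct('name','Any text',         'extension','',    'method','importdata') };

% small test file, so that the sketch can run on its own
filename = [ tempname '.csv' ];
dlmwrite(filename, [1 2 3; 4 5 6]);

[p, f, ext] = fileparts(filename);
ext = strrep(ext, '.', '');

% step 2: keep the loaders whose extension matches, or which accept any extension
selected = {};
for index = 1:numel(loaders)
  this = loaders{index};
  if isempty(this.extension) || strcmpi(this.extension, ext)
    selected{end+1} = this;
  end
end

% step 4: try each candidate in turn until one succeeds
data = []; used = '';
for index = 1:numel(selected)
  try
    data = feval(selected{index}.method, filename);
    used = selected{index}.name;
    break
  catch
    % this loader failed: try the next one
  end
end
delete(filename);
```

Because candidates are tried in list order, a generic reader placed before a specific one would win, which is why the order of the definitions matters.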

A file selector, which supports multiple file selection, may be used by issuing
>> a = iLoad('')
The result of this machinery is that importation is fully transparent. The file type is determined automatically, and the file is imported and formatted accordingly. The resulting structure is then ready to be converted into an iData object.
>> a = iLoad(filename); % this is a structure
>> a = iData(a); % convert to iData object
When the automatic importation fails, it is possible to force a given format specification manually:
>> a = iLoad(filename, 'HDF5'); % explicitly use HDF5 loader

Distant files (http://, ...)

The specified file names may include URL tags such as file://, ftp://, http:// and https://. Distant files are first copied locally (requires local write permission), and then imported. A valid Internet connection is required, with proper Proxy settings if needed.
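The copy-then-import behaviour can be reproduced by hand as a minimal sketch. The address below is hypothetical, and the use of urlwrite is an assumption: the way iLoad actually retrieves distant files is not detailed here.

```matlab
url = 'http://www.example.com/path/data.dat';    % hypothetical address
[p, f, ext] = fileparts(url);
localfile = [ tempname ext ];                    % local copy: requires write permission
[localfile, status] = urlwrite(url, localfile);  % status is 0 when the transfer fails
if status
  a = iLoad(localfile);                          % import the local copy (requires iFit)
  delete(localfile);
else
  error('Could not retrieve %s', url);
end
```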

Compressed files (.zip, ...)

Compressed files (ZIP, TAR, GZIP, Z) can also be imported directly, in which case they are first extracted locally (requires write permission), and then imported. This extraction mechanism also applies to distant files.
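The extract-then-import step can be illustrated with standard Matlab functions. This is only a sketch under assumptions: it builds its own small gzip file, extracts it into a temporary directory and reads it back with dlmread, which stands in for the actual importer.

```matlab
% create a small gzip-compressed text file to work with
source = [ tempname '.dat' ];
dlmwrite(source, magic(3));
gzip(source);                          % creates [ source '.gz' ]
archive = [ source '.gz' ];
if exist(source, 'file'), delete(source); end

% extract into a temporary directory (this is why write permission is needed)
target = tempname; mkdir(target);
extracted = gunzip(archive, target);   % cell array of extracted file names
data = dlmread(extracted{1});          % stand-in for the actual import

% clean up the temporary files
delete(extracted{1}); rmdir(target); delete(archive);
```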

Internal anchor references (file#anchor)

The file name may end with an anchor reference using the '#' character, such as in

'http://path/file.zip#Data'

In this case, the anchor is searched in the imported data, and only the corresponding matching elements are returned.
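As an illustration, the anchor can be separated from the file name and matched against the names found in an imported structure. The matching rule used here (sub-string search in field names) and the sample structure are assumptions; iLoad may search the imported data differently.

```matlab
filename = 'http://path/file.zip#Data';

% split at the last '#' character
index = find(filename == '#', 1, 'last');
if isempty(index)
  anchor = '';
else
  anchor   = filename(index+1:end);
  filename = filename(1:index-1);
end

% hypothetical imported structure, and search of the anchor in its field names
s     = struct('Data', [1 2 3], 'Header', 'text', 'Source', 'file.zip');
names = fieldnames(s);
match = names(~cellfun('isempty', strfind(names, anchor)));
```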

Supported data formats (default)

The iFiles sub-library comes with a large number of predefined import filters. These are tested one after the other, based on an initial filter selection using the file name extension and, optionally, pattern (words) recognition in file headers.
A compact list of all supported formats is shown with
>> a = iLoad('formats');
The filters have been divided into two categories: those that directly use formats natively known by Matlab, and those that are more specific to neutron and X-ray communities (or other research areas).

Legacy data formats (text, NetCDF, HDF, FITS, images, XML, CSV, ...)

format (load) | Text/Binary | Description | iData/saveas Write
text | text | Any text file, using a free format, can be read. Any text editor (gedit, notepad, nedit, ...) can display such files. | Yes (DAT, M-file)
'cdf', 'nc': NetCDF 1 and 2 | binary | The NetCDF format is a compact binary format. Such files can be edited/viewed with e.g. ncview or hdfview, ncBrowse, OpenDX. Such files can of course also be imported into iData objects. | Yes
'fits': FITS astronomical image format | binary | The FITS format is a standard data format used in astronomy. Can be displayed with e.g. GIMP, xv. |
'hdf4': HDF 4 image | binary | The HDF4 format is a compact, binary storage format. Such files can be edited/viewed with e.g. hdfview. IDL and Matlab also have dedicated HDF 4 browsers (see hdftool). Also includes HDF-EOS format. | Yes (image)
'hdf5': HDF 5 | binary | The HDF5 format is a compact, compressed binary storage format. However, in its use here, it partly reconstructs the initial object, with its main values and alias/axes names. Such files can be edited/viewed with e.g. hdfview, OpenDX. Such files can of course also be imported into iData objects. IDL and Matlab also have dedicated HDF browsers (see hdftool). This format includes the NeXus format. | Yes
'mat': Matlab Mat-file | binary | The Matlab workspace binary file is compact and fast to read/write. It carries the whole object information. Such files require Matlab (or Octave) to be installed prior to importation. | Yes
'xls': Excel spreadsheet | binary | Microsoft Excel spreadsheet. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric). | Yes
'gif', 'bmp', 'png', 'tiff', 'jpeg', 'ico' | binary | Standard image formats. View with e.g. GIMP, xv. | Yes
'csv': Matlab comma separated values | text | A file spreadsheet format. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric). | Yes
'fig': Matlab figure | binary | The Matlab figure can also be opened with openfig(file), and then converted to iData with iData(gcf). | Yes
'xml': XML description file | text | The XML format (experimental) |
'wk1': Lotus1-2-3 (first spreadsheet) | text/binary | Lotus 123 spreadsheet format. A replacement for Microsoft Excel, from IBM. |
'au': NeXT/SUN (.au) sound | binary | Sound format, initially from Sun/NeXT |
'wav': Microsoft WAVE sound | binary | Sound format, standard |
'avi': Audio/Video Interleaved multimedia container | binary | A video encoding format |

More specific data formats (neutrons and X-ray)

format (load) | Text/Binary | Description | iData/saveas Write
ILL data | text | Files generated by ILL instruments. May require post-processing to assign Signal and axes right, as well as metadata. This data can also be imported with Lamp. |
PDB file | text | Protein Data Bank file describing e.g. proteins |
'spc': SPEC | text | The ESRF legacy format. May be slow to import due to the file format complexity for large files. |
'sim': McStas/PGPLOT | text | The legacy format generated by McStas. | Similar to the DAT export format
'inx': INX | text | The INX format is a simple format for reduced neutron time-of-flight data (see example). This data can also be imported/generated with Lamp. |
'edf': EDF ESRF Data format | binary with 512-multiple length text header | The EDF format is mainly used at the ESRF and can be viewed with e.g. PyMCA, Zimg, GnuPlot, EDFExplorer, Fit2D. | Yes
'spe': ISIS SPE | text | The SPE data format is obtained after processing ISIS RAW files with Horace and LibISIS. |
'nx', 'nxs', 'n4', 'n5' | binary | The NeXus files are HDF4/5 files. See above format description for more information. This format includes all types of derivatives (such as NX SPE from ISIS). |

Unsupported data formats: how to import them anyway ?

In principle, any text-based data format can be imported, whatever the internal organization of the data and comments. The text is read and parsed by the looktxt tool, and all numerical blocks are named automatically from the character strings/comments that precede these blocks. The result is a single structure which holds a Data and a Header field, plus additional identification information. The structure can be converted transparently to an iData object. However, in some cases, the Signal and Axes definitions must be set after creating the object.

Binary formats require more work. In principle, if iLoad does not know the structure of the file, there is no way to import it. However, iLoad allows rapid prototyping of new import filters by means of a simple routine that takes a file name as input, and returns a structure or numerical block. Additionally, a post-process script can be attached to the new format, and executed when the file content is converted into an iData object. The specification of the new data format is then inserted into the iLoad_ini configuration (locally or globally) together with e.g. the file extension and other identification information. These steps are detailed in the section below.

The import filters: customizing data formats (iLoad_ini)

Reading and Writing the iLoad configuration

The full list of file formats supported by iLoad and iData, as well as the iLoad configuration, can be obtained with the commands
>> iLoad('formats');	          % display the list of supported formats
>> config = iLoad('load config'); % retrieve the iLoad configuration and file loaders (from cache)
>> iLoad('force load config'); % force re-read of the configuration file
The configuration file iLoad_ini is stored by default in
[ ifitpath 'iFiles' filesep 'iLoad_ini' ]
and a local copy (which overrides the default) is stored in
prefdir
when executing
>> iLoad('save config');
>> iLoad('save config', config);
% Saved iLoad configuration into /home/farhi/.matlab/R2007a/iLoad.ini <- this is prefdir
This is where you may add your own customized format definitions. Deleting this file will revert to the default configuration.
>> delete([ prefdir filesep 'iLoad_ini.m' ]);
Finally, the config.UseSystemDialogs field of the iLoad configuration can be set to 'yes' to use the native Matlab/Java file selector, or to 'no' to use uigetfiles.
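Put together, a configuration change could look like the following sketch. It only chains the iLoad calls listed above and requires iFit to be installed; that a modified structure passed to 'save config' is stored as such is assumed from the syntax shown above.

```matlab
config = iLoad('load config');     % current configuration (from cache)
config.UseSystemDialogs = 'no';    % use uigetfiles instead of the native file selector
iLoad('save config', config);      % write the local copy into prefdir
iLoad('force load config');        % re-read the configuration
```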

iLoad configuration structure

Each format entry is a structure with members name, extension, patterns, method, options and postprocess.
Let us detail one of the iLoad_ini entries, the one for INX files.

format14.name        = 'INX tof data';
format14.extension   = 'inx';
format14.patterns    = {'INX'};
format14.method      = 'looktxt';
format14.options     = '--headers --fortran --catenate --fast --binary --comment=NULL';
format14.postprocess = 'load_ill_inx';


(...)

config.loaders = { format1, format2, format3, format4, format5, format6, ...
format7, format8, format9, format10, format11, format12, format13, format14, format15 };

Each format is a structure.
The name field gives an explicit, human-readable description of the format. The extension 'inx' indicates which string should match the file extension so that this format is selected for automatic importation. Another way to select the importer is by means of a pattern search in the file header content (by default the first 10 kb of text). The list of one or more patterns is given in the patterns field (as a cellstr, here {'INX'}). The method field is the name of the function that reads the file. It takes as input a string, the file name, and should preferably return a structure or numerical array. In this example, the looktxt method is used with syntax
struct = looktxt(filename)
The options field provides additional options that should be sent to the method. When given as a single string, these options are appended to the filename before being sent to the method, e.g.
struct = looktxt([ filename ' ' options ])
When given as a cell, they are passed as additional input arguments to the method, e.g.
struct=method(filename, options{1}, options{2}, ...)
The structure obtained from the method is then converted into an iData object when called from the iData or iData/load method. In these cases, the postprocess field is used so that the returned iData object is
object = postprocess( iData( method(filename, options...) ) )
The post-process is a script that takes as input an iData object, and returns a possibly modified object (or an array of objects). This is where Signal, Axes, Aliases and metadata are re-arranged in the object. The post-process scripts are stored in the iFiles/postprocess directory, and use a 'load_<format>' function naming convention for clarity. When the post-process is missing, the default Signal and axes definitions are used.
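The two ways of passing options can be sketched with feval. This is an illustration, not the iLoad implementation: the stand-in method simply returns the arguments it receives, and the space inserted between file name and options in the string case is an assumption.

```matlab
method   = @(varargin) varargin;    % stand-in reader: returns the arguments it receives
filename = 'file.inx';

% options given as a single string: appended to the file name
options = '--headers --fast';
if ischar(options)
  args_string = feval(method, [ filename ' ' options ]);
end

% options given as a cell: passed as additional input arguments
options = { '--headers', '--fast' };
if iscell(options)
  args_cell = feval(method, filename, options{:});
end
```

In the first case the method receives one argument, in the second case three, so the form to use depends on what the reader function expects.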

The last step required for storing a new data format is to add the format definition to the config.loaders field in the iLoad_ini file. Once done, you may re-read the configuration file to update the list of known formats:
>> iLoad('force load config'); % force to re-read the configuration files

In short: procedure to add a new data format

  1. Start by getting the documentation about the new data format you wish to implement. This gives the format name and the extension.
  2. Then write a function (locally or in ifitpath/iFiles) that takes as input a filename and returns the file content as a structure. This is where you will open the file, access its guts, and return something (or empty/error if this fails). This is your method.
  3. Edit the iLoad_ini file from prefdir or ifitpath/iFiles.
  4. Create a new structure (which you may name as you wish) and set the name (description), method, and extension fields.
  5. Optionally define some patterns to search, to complement the extension.
  6. Optionally write a post-process script and preferably place it in the ifitpath/iFiles/postprocess directory.
  7. Add the new structure to the config.loaders list (cell) at the end of the iLoad_ini file.
  8. Save the file, and optionally re-read it with iLoad('force load config').
  9. Send me [farhi (at) ill.fr] the method/postprocess and structure information so that I can include your contribution in the package.
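As an illustration of step 2, here is a minimal reader for an imaginary binary format. Everything in it is hypothetical: the function name read_mybin, the 'mybin' extension and the file layout (one int32 count followed by that many doubles) are invented for the example. The returned Data and Header fields follow the structure layout described above for text files.

```matlab
function data = read_mybin(filename)
% READ_MYBIN hypothetical reader: one int32 count, followed by 'count' doubles.
% To be saved as read_mybin.m, locally or in ifitpath/iFiles.
  fid = fopen(filename, 'r');
  if fid < 0
    error('read_mybin: can not open file %s', filename);
  end
  count  = fread(fid, 1, 'int32');       % number of values stored in the file
  values = fread(fid, count, 'double');  % the numerical block
  fclose(fid);
  if isempty(count) || numel(values) ~= count
    error('read_mybin: %s is not a valid file for this format', filename);
  end
  data.Header.count = count;             % metadata
  data.Data         = values;            % numerical content
end
```

The matching entry for steps 4 to 7 would then set the name to 'My binary format', the extension to 'mybin' and the method to 'read_mybin', and be appended to the config.loaders list at the end of the iLoad_ini file.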


E. Farhi - iFit/iFiles - $Date: 2011-10-05 14:19:34 $ $Revision: 1.16 $ - ILL, Grenoble, France <www.ill.eu>