iFit: iFiles import sub-library
- iLoad: the main file importer
- Distant files (http://, ...)
- Compressed files (.zip, ...)
- Internal anchor references (file#anchor)
- Supported data formats (default)
- Legacy data formats (text, NetCDF, HDF, FITS, images, XML, CSV, ...)
- More specific data formats (neutrons and X-ray)
- Unsupported data formats: how to import them anyway?
- The import filters: customizing data formats (iLoad_ini)
- Reading and Writing the iLoad configuration
- iLoad configuration structure
- In short: procedure to add a new data format
Commands we use in this page: iLoad
The iFiles sub-library imports a large set of file formats and returns
the content of each file as a structure. This structure can then be
converted into an iData object.
iLoad: the main file importer
When a new file is imported into an iData object by means of e.g.
>> a = iData([ ifitpath 'Data/ILL_IN6.dat' ]);
>> a = load(iData, [ ifitpath 'Data/ILL_IN6.dat' ]);
the request for file access is sent transparently to the iLoad function. The latter performs the following operations:
- resolves the file name, and handles directory imports,
wildcards (*, ?), distant files given as URLs (file://, ftp://, http://, https://), and compressed
files (zip, gzip, tar, compress);
- selects a list of possible file types, e.g. from the file extension;
- searches for patterns in text headers to narrow down the file type;
- tries each candidate file type in turn until one import succeeds;
- formats the file content so that it is returned as a structure containing standard fields;
- executes the remaining post-process scripts.
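The extension-based selection step in the list above can be pictured with a minimal sketch. The loader table below is hypothetical (its names and extensions are illustrative, not the actual iLoad_ini content), and this is not the iLoad implementation, which also searches header patterns and tries each candidate in turn.

```matlab
% Hypothetical, simplified loader table (NOT the real iLoad_ini content)
loaders = { struct('name', 'INX tof data', 'extension', {{'inx'}}), ...
            struct('name', 'HDF5',         'extension', {{'h5', 'hdf5', 'nxs'}}), ...
            struct('name', 'Text',         'extension', {{'dat', 'txt'}}) };

filename = 'sample.nxs';                 % illustrative file name

% extract the extension, without the leading dot, in lower case
[p, f, ext] = fileparts(filename);
ext = lower(strrep(ext, '.', ''));

% keep only the loaders that declare this extension
match      = cellfun(@(l) any(strcmp(ext, l.extension)), loaders);
candidates = loaders(match);

for k = 1:numel(candidates)
  disp([ 'candidate loader: ' candidates{k}.name ]);
end
```

Keeping the candidates in their original order mirrors the remark that the order of the format definitions matters.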
The list of file formats that can be imported is defined in the iLoad_ini
file (see below). It is a list of format definitions, each with extensions,
patterns to search for, and post-process filters to run after
importation. The order of these definitions matters, as they are tried
one after the other. A lower-level set of more standard file formats
(mainly those that Matlab supports natively) is then appended to the
iLoad_ini preferences.
A file selector, which supports multiple file selection, can be opened by issuing
>> a = iLoad('')
As a result of all these mechanisms, importation is fully
transparent. The file type is determined automatically, and the file is
imported and formatted accordingly. The resulting structure is then ready
to be converted into an iData object.
>> a = iLoad(filename); % this is a structure
>> a = iData(a); % convert to iData object
When the automatic importation fails, a given format specification can be forced manually:
>> a = iLoad(filename, 'HDF5'); % explicitly use HDF5 loader
Distant files (http://, ...)
The specified file names may include URL tags such as
- file://filename
- ftp://filename
- http://filename
- https://filename
where the latter two cases first fetch a copy of the distant file
(this requires local write permission), and then import it. A working
Internet connection is required, with proper proxy settings if needed.
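As an illustration of how such URL tags can be recognized, here is a minimal sketch using a regular expression. It is an assumption about one possible approach, not the iLoad implementation, and the URL is a placeholder.

```matlab
name = 'https://example.org/data/file.dat';   % placeholder URL, not a real data file

% look for one of the URL tags listed above at the start of the name
tok = regexp(name, '^(file|ftp|http|https)://', 'tokens', 'once');
if isempty(tok)
  scheme = '';          % no URL tag: plain local file name
else
  scheme = tok{1};
end

% per the text above, http:// and https:// files are copied locally first
needs_download = any(strcmp(scheme, {'http', 'https'}));
```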
Compressed files (.zip, ...)
Compressed files (ZIP, TAR, GZIP, Z) can also be imported directly, in
which case they are first extracted locally (this requires write
permission), and then imported. This extraction mechanism also applies to
distant files.
Internal anchor references (file#anchor)
The file name may end with an anchor reference using the '#' character, such as in
'http://path/file.zip#Data'
In this case, the anchor is searched for in the imported data, and only the matching elements are returned.
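Separating the anchor from the file name can be sketched as follows. This is only an illustration of the 'file#anchor' syntax; how iLoad actually parses the name and matches the anchor against the imported data is not shown here.

```matlab
name = 'http://path/file.zip#Data';   % example name taken from the text above

% split at the last '#' character, if any
idx = find(name == '#', 1, 'last');
if isempty(idx)
  file   = name;
  anchor = '';
else
  file   = name(1:idx-1);
  anchor = name(idx+1:end);
end
```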
Supported data formats (default)
The iFiles sub-library comes with a large number of predefined import
filters. These are tried one after the other, after an initial
filter selection based on the file name extension and, optionally, on
pattern (word) recognition in file headers.
A compact list of all supported formats is shown with
>> a = iLoad('formats');
The filters have been divided into two categories: those that directly
use formats natively known by Matlab, and those that are more specific
to neutron and X-ray communities (or other research areas).
A set of example data files is available from the Load page and the Data directory.
Legacy data formats (text, NetCDF, HDF, FITS, images, XML, CSV, ...)
| format (load) | Text/Binary | Description | iData/saveas Write |
|---|---|---|---|
| text | text | Any text file, using a free format, can be read. Any text editor (gedit, notepad, nedit, ...) can display such files. | Yes (DAT, M-file) |
| 'cdf','nc' NetCDF 1 and 2 | binary | The NetCDF format is a compact binary format. Such files can be edited/viewed with e.g. ncview, hdfview, ncBrowse, OpenDX. Such files can of course also be imported into iData objects. | Yes |
| 'fits' FITS astronomical image format | binary | The FITS format is a standard data format used in astronomy. Can be displayed with e.g. GIMP, xv. | |
| 'hdf4' HDF 4 image | binary | The HDF4 format is a compact, binary storage format. Such files can be edited/viewed with e.g. hdfview. IDL and Matlab also have dedicated HDF 4 browsers (see hdftool). Also includes the HDF-EOS format. | Yes (image) |
| 'hdf5' HDF 5 | binary | The HDF5 format is a compact, compressed binary storage format. However, in its use here, it partly reconstructs the initial object, with its main values and alias/axes names. Such files can be edited/viewed with e.g. hdfview, OpenDX. Such files can of course also be imported into iData objects. IDL and Matlab also have dedicated HDF browsers (see hdftool). This format includes the NeXus format. | Yes |
| 'mat' Matlab Mat-file | binary | The Matlab workspace binary file is compact and fast to read/write. It carries the whole object information. Such files require Matlab (or Octave) to be installed prior to importation. | Yes |
| 'xls' Excel spreadsheet | binary | Microsoft Excel spreadsheet. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric). | Yes |
| 'gif' 'bmp' 'png' 'tiff' 'jpeg' 'ico' | binary | Standard image formats. View with e.g. GIMP, xv. | Yes |
| 'csv' Matlab comma separated values | text | A spreadsheet file format. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric). | Yes |
| 'fig' Matlab figure | binary | The Matlab figure can also be opened with openfig(file), and then converted to iData with iData(gcf). | Yes |
| 'xml' XML description file | text | The XML format (experimental). | |
| 'wk1' Lotus1-2-3 (first spreadsheet) | text/binary | Lotus 123 spreadsheet format. An alternative to Microsoft Excel, from IBM. | |
| 'au' NeXT/SUN (.au) sound | binary | Sound format, initially from Sun/NeXT. | |
| 'wav' Microsoft WAVE sound | binary | Sound format, standard. | |
| 'avi' Audio/Video Interleaved multimedia container | binary | A video encoding format. | |
More specific data formats (neutrons and X-ray)
| format (load) | Text/Binary | Description | iData/saveas Write |
|---|---|---|---|
| ILL data | text | Files generated by ILL instruments. May require post-processing to assign the Signal and axes correctly, as well as metadata. This data can also be imported with Lamp. | |
| PDB file | text | Protein Data Bank file describing e.g. proteins. | |
| 'spc' SPEC | text | The ESRF legacy format. May be slow to import large files, due to the complexity of the file format. | |
| 'sim' McStas/PGPLOT | text | The legacy format generated by McStas. Support for 1D, 2D and event lists. | Similar to the DAT export format |
| 'inx' INX | text | The INX format is a simple format for reduced neutron time-of-flight data (see example). This data can also be imported/generated with Lamp. | |
| 'edf' EDF ESRF Data format | binary with 512-multiple length text header | The EDF format is mainly used at the ESRF and can be viewed with e.g. PyMCA, Zimg, GnuPlot, EDFExplorer, Fit2D. | Yes |
| 'spe' ISIS SPE | text | The SPE data format is obtained after processing ISIS RAW files with Horace and LibISIS. | |
| 'nx','nxs','n4','n5' | binary | The NeXus files are HDF4/5 files. See the format description above for more information. This format includes all types of derivatives (such as NX SPE from ISIS). | |
| 'cbf' ESRF/SLS binary | binary with 4096-multiple length text header | The Crystallographic Binary File format, used on some X-ray and neutron diffractometers. See the format definition. This format gathers the CIF and imgCIF standards. | |
| 'hdr'+'img' MRI 3D volume | binary | An MRI volume data format. The 'hdr' file requires an associated 'img' file. Format from the Biomedical Imaging Resource of the Mayo Clinic. | |
Unsupported data formats: how to import them anyway?
In principle, any text-based data format can be imported, whatever
the internal organization of the data and comments. The text is read
and parsed by the looktxt
tool, and all numerical blocks are named automatically from the
character strings/comments that precede these blocks. The result is a
single structure which holds a Data and a Header field, plus additional
identification information. The structure can be converted transparently to an iData object. However, in some cases, the Signal and Axes definitions will need to be set after creating the object.
Binary formats require more work. If iLoad does not know
the structure of a binary file, it cannot import it as is.
However, iLoad allows rapid prototyping of new import
filters by means of a simple routine that takes a file
name as input, and returns a structure or numerical block. Additionally, a
post-process script can be attached to the new format, and executed
when the file content is converted into an iData object. The
specification of the new data format is then inserted into the
iLoad_ini configuration (locally or globally) together with e.g. the file
extension and other identification information. These steps are detailed in
the section below.
More generally, the class2str function can write any Matlab variable as a character string. Conversely, the str2struct function can read a character string, search for [name{=: }value] pairs in single lines, and return them as fields of a structure.
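To illustrate the idea of turning 'name=value' text into a structure, here is a minimal sketch based on a regular expression. It is not the str2struct implementation: the separators handled (here '=' and ':' between name and value, ';' between pairs) and the sample string are assumptions made for the example.

```matlab
txt = 'Temperature=300 ; Wavelength: 4.1';   % illustrative input string

% capture <name> [=:] <value> pairs; a value stops at the next ';'
tok = regexp(txt, '(\w+)\s*[=:]\s*([^;]+)', 'tokens');

s = struct();
for k = 1:numel(tok)
  name  = tok{k}{1};
  value = strtrim(tok{k}{2});
  num   = str2double(value);      % NaN when the value is not numeric
  if ~isnan(num)
    value = num;                  % store numbers as numbers, the rest as text
  end
  s.(name) = value;
end
disp(s);
```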
The import filters: customizing data formats (iLoad_ini)
Reading and Writing the iLoad configuration
The full list of file formats supported by iLoad and iData, as well as the iLoad configuration, can be obtained with the commands
>> iLoad('formats'); % display the list of supported formats
>> config = iLoad('load config'); % retrieve the iLoad configuration and file loaders (from cache)
>> iLoad('force load config'); % force re-read of the configuration file
The configuration file iLoad_ini is stored by default in
[ ifitpath 'iFiles' filesep 'iLoad_ini' ]
and a local copy (which overrides the default) is stored in
prefdir
when executing
>> iLoad('save config');
>> iLoad('save config', config);
% Saved iLoad configuration into /home/farhi/.matlab/R2007a/iLoad.ini <- this is prefdir
This is where you may add your own customized format definitions. Deleting this file will revert to the default configuration.
>> delete([ prefdir filesep 'iLoad_ini.m' ]);
Finally, the config.UseSystemDialogs field of the iLoad configuration can be set to 'yes' to use the native Matlab/Java file selector, or to 'no' to use uigetfiles.
iLoad configuration structure
Each format entry is a structure with members:
- loader.name: a description of the format
- loader.method: the function used to read the file and return a structure or numerical array: data=method(filename)
- loader.extension: a single file extension, or a cell string of file extensions
- loader.patterns: a cell string of patterns to search in text headers (less than 10% of binary data in the first 10 kb) [optional]
- loader.options: additional options to be passed to the loader.method. A single string is catenated after the file name; a cell array is sent as additional arguments to the method. [optional]
- loader.postprocess: a script which handles iData commands to shape the final object, id=loader.postprocess(id) [optional]
Let us detail one of the iLoad_ini entries, the one for INX files.
format14.name       ='INX tof data';
format14.extension  ='inx';
format14.patterns   ={'INX'};
format14.method     ='looktxt';
format14.options    ='--headers --fortran --catenate --fast --binary --comment=NULL';
format14.postprocess='load_ill_inx';
(...)
config.loaders = { format1, format2, format3, format4, format5, format6, ...
                   format7, format8, format9, format10, format11, format12, format13, format14, format15 };
Each format is a structure.
The name field gives an explicit, human-readable description of the format. The extension 'inx'
indicates which string should match the file extension so that this
format is selected for automatic importation. Another way to select
the importer is by means of a pattern search in the file header content
(by default the first 10 kb of text). The list of one or more patterns
is given in the patterns field (as a cellstr, here {'INX'}). The method
field is the name of the function that reads the file. It takes as input a
string, the file name, and should preferably return a structure or
numerical array. In this example, the looktxt
method is used with syntax
struct = looktxt(filename)
The options field provides
additional options that should be sent to the method. When given as a
single string, these options are appended to the filename before being
sent to the method, e.g.
struct = looktxt([ filename options ])
When given as a cell, they are passed as additional input arguments to the method, e.g.
struct=method(filename, options{1}, options{2}, ...)
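The two ways of passing options can be sketched as follows. This is an illustration, not the iLoad code: the stand-in method simply returns the arguments it receives, and the single space inserted between the file name and the option string is an assumption.

```matlab
% Stand-in for a loader method: returns the arguments it was called with
method   = @(varargin) varargin;
filename = 'data.inx';                 % illustrative file name

% Case 1: options given as a single string, appended to the file name
options = '--fast --binary';
if ischar(options)
  args = { [ filename ' ' options ] }; % separating space is an assumption
else
  args = [ { filename }, options ];
end
out_string = feval(method, args{:});   % one argument: 'data.inx --fast --binary'

% Case 2: options given as a cell, passed as additional arguments
options = { 'verbose', 1 };            % hypothetical option values
if ischar(options)
  args = { [ filename ' ' options ] };
else
  args = [ { filename }, options ];
end
out_cell = feval(method, args{:});     % three arguments: file name, 'verbose', 1
```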
The structure obtained from the method is then converted into an iData object when called from the iData or iData/load method. In these cases, the postprocess field is used so that the returned iData object is
object = postprocess( iData( method(filename, options...) ) )
The post-process is a script that takes as input an iData object, and
returns a possibly modified object (or an array of objects). This is
where Signal, Axes, Aliases and metadata are re-arranged in the object.
The post-process scripts are stored in the iFiles/postprocess directory, and use a 'load_<format>' function naming convention for clarity. When no post-process is given, the default Signal and axes definitions are used.
The last step required to register a new data format is to add the format definition to the config.loaders field in the iLoad_ini file. Once done, you may re-read the configuration file to update the list of known formats:
>> iLoad('force load config'); % force to re-read the configuration files
In short: procedure to add a new data format
- Start by getting the documentation about the new data format you wish to implement. This gives the format name and the extension.
- Then write a function (locally or in ifitpath/iFiles)
that takes as input a filename and returns the file content as a
structure. This is where you will open the file, access its guts, and
return something (or empty/error if this fails). This is your method.
- Edit the iLoad_ini file from prefdir or ifitpath/iFiles.
- Create a new structure (which you may name as you wish) and set the name (description), method, and extension fields.
- Optionally define some patterns to search, to complement the extension.
- Optionally write a post-process script and preferably place it in the ifitpath/iFiles/postprocess.
- Add the new structure to the config.loaders list (cell) at the end of the iLoad_ini file.
- Save the file, and optionally re-read it with iLoad('force load config').
- Send me [farhi (at) ill.fr] the method/postprocess and structure information so that I can contribute your work to the package.
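As an illustration of the procedure above, here is a minimal sketch of a loader method for an imaginary binary format. Everything specific in it is hypothetical: the function name read_myformat, the 'myf' extension, and the file layout (one uint32 count followed by that many float64 values) are invented for the example, and the returned field names are only a suggestion.

```matlab
function data = read_myformat(filename)
% READ_MYFORMAT  Hypothetical loader for an imaginary binary format '.myf'.
%   Assumed layout (invented for illustration): one little-endian uint32
%   count, followed by that many float64 values.
  fid = fopen(filename, 'r', 'ieee-le');
  if fid < 0
    error('read_myformat:open', 'Cannot open file %s', filename);
  end
  n      = fread(fid, 1, 'uint32');   % number of values that follow
  signal = fread(fid, n, 'double');   % the data block, as a column vector
  fclose(fid);
  % return the file content as a structure
  data = struct('Title', filename, 'Signal', signal);
end

% Matching entry for iLoad_ini, using the fields described on this page:
%   myformat.name      = 'My binary format';
%   myformat.method    = 'read_myformat';
%   myformat.extension = 'myf';
% and then add myformat to the config.loaders cell.
```

The function is meant to be saved as read_myformat.m somewhere on the Matlab path before the configuration is re-read.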
E. Farhi - iFit/iFiles - $Date: 2012-02-14 04:24:33 $ $Revision: 1.21 $
