iFit: Loaders import sub-library


  1. iLoad: the main file importer
    1. Distant files (http://, ...)
    2. Compressed files (.zip, ...)
    3. Internal anchor references (file#anchor)
  2. Supported data formats (default)
    1. Legacy data formats (text, NetCDF, HDF, FITS, images, XML, CSV, ...)
    2. More specific data formats (neutrons and X-ray)
    3. Unsupported data formats: how to import them anyway ?
  3. The import filters: customizing data formats (iLoad_ini)
    1. Reading and Writing the iLoad configuration
    2. iLoad configuration structure
    3. In short: procedure to add a new data format


Commands we use in this page: iLoad(filename)
See also: Load (import) data sets, Save (export) data sets and Save Models

The Loaders sub-library imports a large set of file formats and returns their content as a structure. This structure can then be converted into an iData object.

iLoad: the main file importer

When a new file is imported into an iData object by means of e.g.
>> a = iData([ ifitpath 'Data/ILL_IN6.dat' ]);
>> a = load(iData, [ ifitpath 'Data/ILL_IN6.dat' ]);
the request for file access is sent transparently to the iLoad function, which performs the following operations:
  1. resolves the file name, and identifies directory importation, wildcards (*, ?), distant files with URLs (file://, ftp://, http://, https://), compressed files (zip, gzip, tar, compress) ;
  2. selects a list of possible file types from e.g. file extension ;
  3. searches for patterns in text headers to better determine file type ;
  4. requests importation by all possible file types one after the other until one succeeds ;
  5. formats the file content so that it is returned as a structure containing standard fields ;
  6. executes remaining post-process scripts.
The list of known file formats which can be imported is defined in the iLoad_ini file (see below). It is a list of format definitions, each with an extension, patterns to search for, and post-process filters to run after importation. The order of these definitions matters, as they are tried one after the other. A lower-level set of more standard file formats (mainly those that Matlab knows natively) is then appended to the iLoad_ini preferences.
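
The selection and fallback steps above can be pictured with a minimal sketch in plain Matlab. It is not the actual iLoad code: the loader table, the reader handles, and the rule that either an extension match or a pattern match is enough are all assumptions made for illustration.

```matlab
% Hypothetical loader table; names, extensions and readers are illustrative only.
failing = @(f) error('sketch:read', 'cannot read %s', f);   % a reader that always fails
reading = @(f) struct('Source', f);                          % a reader that always succeeds
loaders = { ...
  struct('name','Strict format', 'extension','inx', 'patterns',{{}},      'method',failing), ...
  struct('name','INX tof data',  'extension','inx', 'patterns',{{'INX'}}, 'method',reading), ...
  struct('name','Other format',  'extension','xyz', 'patterns',{{}},      'method',reading) };

filename = 'sample.inx';
header   = 'INX reduced data';      % stands for the first bytes of the file

% steps 2-3: keep the loaders whose extension or header patterns match
[p, n, ext] = fileparts(filename);
ext = strrep(ext, '.', '');
candidates = {};
for k = 1:numel(loaders)
  this   = loaders{k};
  by_ext = strcmpi(this.extension, ext);
  by_pat = false;
  for j = 1:numel(this.patterns)
    by_pat = by_pat || ~isempty(strfind(header, this.patterns{j}));
  end
  if by_ext || by_pat, candidates{end+1} = this; end
end

% step 4: try the candidates in order until one succeeds
data = []; used = '';
for k = 1:numel(candidates)
  try
    data = feval(candidates{k}.method, filename);
    used = candidates{k}.name;
    break
  catch
    % this reader failed: move on to the next candidate
  end
end
disp(used)
```

In this sketch the first candidate fails and the second one ('INX tof data') is retained, which mirrors why the order of the definitions matters.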

A file selector, which supports multiple file selection, may be used by issuing
>> a = iLoad('')
The result of this machinery is that importation is fully transparent: the file type is determined automatically, and the content is imported and formatted accordingly. The resulting structure is then ready to be converted into an iData object.
>> a = iLoad(filename); % this is a structure
>> a = iData(a); % convert to iData object
When the automatic importation fails, you can force the use of one format specification, chosen from the iLoad formats output:
>> iLoad('formats')
...
HDF read_hdf5/load_psi_RITA HDF5
...
>> a = iLoad(filename, 'HDF5'); % explicitly use HDF5 loader

Distant files (http://, ...)

The specified file names may include URL prefixes such as file://, ftp://, http:// and https://. For remote locations (ftp, http, https), a copy of the distant file is first retrieved (requires local write permission), and then imported. A valid Internet connection is then required, with proper proxy settings if needed.

Compressed files (.zip, ...)

Compressed files (ZIP, TAR, GZIP, Z) can also be imported directly, in which case they are first extracted locally (requires write permission), and then imported. This extraction mechanism also applies to distant files.

Internal anchor references (file#anchor)

The file name may end with an anchor reference using the '#' character, such as in

'http://path/file.zip#Data'

In this case, the anchor is searched in the imported data, and only the corresponding matching elements are returned.
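
As an illustration only, the anchor handling can be sketched in plain Matlab as follows. The imported structure and the matching rule (keep the fields whose name contains the anchor) are assumptions; the actual search performed by iLoad may differ.

```matlab
name = 'http://path/file.zip#Data';

% split the name at the last '#' character, if any
idx = find(name == '#', 1, 'last');
if isempty(idx)
  file = name; anchor = '';
else
  file = name(1:idx-1); anchor = name(idx+1:end);
end

% hypothetical imported structure; keep only the fields whose name contains the anchor
imported = struct('Data', [1 2 3], 'Header', 'some text', 'MetaData', 42);
f    = fieldnames(imported);
keep = f(~cellfun(@isempty, strfind(f, anchor)));
disp(keep)
```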

Supported data formats (default)

The Loaders sub-library comes with a large number of predefined import filters. These are tested one after the other, based on an initial filter selection using the file name extension and optionally pattern (words) recognition in file headers.
A compact list of all supported formats is shown with
>> a = iLoad('formats');
Some of these formats can be written back, providing a vast conversion capability (see the Save page).
The filters have been divided into two categories: those that directly use formats natively known by Matlab, and those that are more specific to neutron and X-ray communities (or other research areas).

A set of example data files is available from the Load page and the Data directory.

Legacy data formats (text, NetCDF, HDF, FITS, images, XML, CSV, ...)

Format (load) | Text/Binary | Description | iData/saveas Write
'xvg','dat','txt... text | text | Any text file, using a free format, can be read. Any text editor (gedit, notepad, nedit, ...) can display such files. The XVG format is the native XmGrace one. | Yes (DAT, M-file)
'cdf','nc' NetCDF | binary | The NetCDF format is a compact binary format. Such files can be edited/viewed with e.g. ncview, hdfview, ncBrowse, OpenDX, Panoply, Autoplot. | Yes
'cdf' CDF | binary | The CDF format is a compact binary format. Such files can be edited/viewed with e.g. OpenDX and Autoplot. It is incompatible with NetCDF. | Yes
'fits' FITS astronomical image format | binary | The FITS format is a standard data format used in astronomy. Can be displayed with e.g. GIMP, xv, Autoplot. | Yes
'hdf4' HDF 4 image | binary | The HDF4 format is a compact, binary storage format. Such files can be edited/viewed with e.g. hdfview. IDL and Matlab also have dedicated HDF 4 browsers (see hdftool). Also includes the HDF-EOS format. | Yes (image)
'hdf5' HDF 5 | binary | The HDF5 format is a compact, compressed binary storage format. However, in its use here, it partly reconstructs the initial object, with its main values and alias/axes names. Such files can be edited/viewed with e.g. hdfview, OpenDX, Autoplot. Such files can of course also be imported into iData objects. IDL and Matlab also have dedicated HDF browsers (see hdftool). This format includes the NeXus format. The ROOT data format (CERN) can be converted into HDF5 with rootpy (root2hdf5). You can also read and write such files using LAMP. | Yes
'mat' Matlab Mat-file | binary | The Matlab workspace serialized binary file is compact and fast to read/write. It carries the whole object information. Such files require Matlab/iFit (or Octave) to be installed prior to importation. To save a file use: save(a,'mat_file','mat'). To load it use load(iData,'mat_file'). | Yes
'xls' Excel spreadsheet | binary | Microsoft Excel spreadsheet. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric). | Yes
'gif' 'bmp' 'png' 'tiff' 'jpeg' 'ico' | binary | Standard image formats. | Yes
'ppm' 'pgm' 'pbm' | text | Standard image formats. View with e.g. GIMP, xv, ImageJ. | Yes
'csv' Matlab comma separated values | text | A file spreadsheet format. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric). | Yes
'fig' Matlab figure | binary | The Matlab figure can also be opened with openfig(file), and then converted to iData with iData(gcf). | Yes
'xml' XML description file | text | The XML format (experimental). | Yes
'au' NeXT/SUN (.au) sound | binary | Sound format, initially from Sun/NeXT. |
'wav' Microsoft WAVE sound | binary | Sound format, standard. |
'avi' Audio/Video Interleaved multimedia container | binary | A video encoding format. |
'yaml','json' meta language configuration files | text | Data serialization formats such as "YAML Ain't Markup Language" and JSON (JavaScript Object Notation). | Yes
'ibw' Igor pro Wave | binary | Igor pro wave data file. |
'lvm' 'tdms' LabView | text/binary | LabView LVM measurement file and TDMS hierarchical technical data files. |
'sav' IDL | binary | IDL saved data. You can also read and write such files using LAMP. |

More specific data formats (neutrons and X-ray, ...)

Format (load) | Text/Binary | Description | iData/saveas Write
ILL data | text | Files generated by ILL instruments, with specific support for ILL TAS. May require post-processing to assign Signal and axes properly, as well as metadata. This data can also be imported with Lamp. |
ILL Cyclops Laue camera | text/binary | Specific data format for the Cyclops Neutron Laue diffractometer, which measures the reciprocal space image of a sample structure. |
ChalkRiver CNBC/NRU | text | Files generated by the CNBC NRU instruments at Chalk River. Multi-wire and polarized data files are supported. |
PDB file | text | Protein Data Bank file describing e.g. proteins. The file structure is read, together with the gyration radius Rg, protein density, volume, excess charge, scattering structure factor S(q) and pair correlation function. These files can be viewed with JMol, PyMol, ViewMol, RasMol, Avogadro, Garlic, GDis, VMD, Chimera, Yasara ... |
'spc' SPEC | text | The SPEC ESRF legacy format. May be slow to import for large files, due to the complexity of the file format. |
'sim' McStas/PGPLOT | text | The legacy format generated by McStas. Support for 1D, 2D and event lists. | Yes, similar to the DAT export format
'sqw','laz','lau' McStas sample files | text | Sample files for the Isotropic_Sqw, PowderN and Single_crystal McStas components, respectively obtained from nMoldyn, FullProf, ICSD and Crystallographica. |
'inx' INX | text | The INX format is a simple format for reduced neutron time-of-flight data. This data can also be imported/generated with Lamp. |
'edf' EDF ESRF Data format | binary with 512-multiple length text header | The EDF format is mainly used at the ESRF and can be viewed with e.g. PyMCA, Zimg, GnuPlot, EDFExplorer, Fit2D, FabIO. | Yes
'spe' ISIS SPE | text | The SPE data format is obtained after processing ISIS RAW files with Horace and LibISIS (Homer/2d). |
'sqw' ISIS SQW | binary | The SQW data format is obtained after processing ISIS RAW files with Horace and LibISIS. |
'spe' Roper Scientific | binary | Princeton/Roper Scientific WinView / PI Acton image file. |
'sif' Andor Technology | binary | Andor Technology CCD Camera file. |
'mar','mccd' MarResearch | binary (TIFF) | MarResearch CCD Camera (Mar345) format, can also be imported as a TIFF format. |
'img' ADSC image | binary with 512 char header | ADSC CCD Camera. |
'nx','nxs','n4','n5','nxspe' NeXus | binary | The NeXus files are HDF4/5 files. See above format description for more information. This format includes all types of derivatives (such as NX SPE from ISIS). Such files can be edited/viewed with e.g. hdfview. | Yes (as HDF5)
'nxs' Mantid workspace | binary | Specific support for reading Mantid and Lamp workspaces. | Yes
'cbf' ESRF/SLS binary imgCIF | binary with 4096-multiple length text header | The Crystallographic Binary File format, used on some X-ray and neutron diffractometers. See the format definition. This format gathers CIF and imgCIF standards. These files are e.g. generated by Pilatus CCD Cameras. |
'hdr'+'img' MRI 3D volume | binary (2 files) | An MRI volume data format. The 'hdr' file requires an associated 'img' file. Format from the Analyze Biomedical Imaging Resource of the Mayo Clinic. Analyze files can be obtained from DICOM files using e.g. DicomNifti, MiTools, MRIconvert, or the Matlab tool dicm2nii. You can view them with Invesalius, MiView (MiTools), MRICron. | Yes
'nii' NIfTI MRI volume | binary | A NIfTI medical imaging volume data format (MRI). Such files can be obtained from DICOM files using e.g. DicomNifti, MiTools, MRIconvert, or the Matlab tool dicm2nii. You can view them with MiView (MiTools), MRICron. | Yes
STL/SLP/PLY/OFF/OBJ Volume/geometry | text or binary | The STL, SLP, OFF, OBJ and PLY formats are common in stereo-lithography CAD software. They describe raw unstructured triangulated surfaces/volumes by the unit normal and vertices (ordered by the right-hand rule) of the triangles using a three-dimensional Cartesian coordinate system. These files can be viewed with e.g. MeshLab, AdMesh, FreeCAD, Geomview, Chimera. | Yes
CIF, CFL/PCR, INS/RES Crystallography files (FullProf, ShelX) | text | The CIF format is the IUCr standard format for structure descriptions. The PCR and CFL file formats are used by FullProf and CrysFML. The RES, SHX and INS formats are used by ShelX. Such files can be viewed with DrawXtl, JMol, PyMol, RasMol, Avogadro, GDis, Chimera, Yasara. |
MRC/CCP4/IMOD, EZD electron density map | text (EZD) and binary | Electron density maps in MRC/CCP4 and EZD file formats. Can be visualized with PyMol, VMD, Chimera, Yasara, VEDA. | Yes
acqus, fid or ser Bruker NMR | text | NMR data set from Bruker/WinNMR. |
fid, procpar Varian NMR | binary | NMR data set from Varian. |
'hdr','jdf' JEOL NMR | text/binary | NMR data set from JEOL. |
'0001','0002'... Bruker FT-IR OPUS | binary | FT-IR Bruker OPUS format. |
'R*' and 'C*' LLB TAS | binary | LLB TAS (1T, 2T, 4F1, 4F2, G43). Experimental. May not be properly imported. Users are advised to use B. Hennion conversion tools (convasc, convdat from 'wf'). |
DAT | text | Quantum Design VMS ppms/mpms. |
MS, D, RAW Agilent and Thermo Finnigan MS | binary | Agilent Mass Spectrometry/Chromatography LC/MS GC/MS GC/FID. Thermo Finnigan Mass Spectrometry/Chromatography. |
ENDF TSL DAT Evaluated Nuclear Data File | text | ENDF Evaluated Nuclear Data File with specific support for the thermal neutron scattering law (TSL) section (MF7, MT=2 and 4). Can make use of PyNE when installed. |
MCNP ACE A compact ENDF | text/binary | ACE MCNP files ("A Compact ENDF"). Requires PyNE to be installed. |
POSCAR | text | VASP POSCAR file for molecular modelling. |


Unsupported data formats: how to import them anyway ?

Text files (ascii)
In principle, any text-based data format can be imported, whatever the internal organization of the data and comments. The text is read and parsed by the looktxt tool, and all numerical blocks are named automatically from the character strings/comments that precede these blocks. The result is a single structure which holds a Data and a Header field, plus additional identification information. The structure can be converted transparently to an iData object. However, in some cases, the Signal and Axes definitions will need to be set after creating the object.

If a text file resists importation, try the most tolerant text reader configurations:
>> iData('filename','text format with fast import method')
>> iData('filename','Data (text format)')
These commands import the raw content, without post-formatting of the object in memory. You will probably need to assign some of the Signal and Axes manually (see iData object help). If importation still fails, the looktxt MeX file may be corrupted: refer to the Changes/Bugs and Install pages.

You can tune the way a text file is read with the following syntax (each option is a separate argument):
>> a=iLoad(filename, 'text', '--catenate','--fast','--headers','--wrapped', ... other options);
where 'text' indicates that the file is not binary encoded, and any following options are forwarded to the text reader. The possible options are:

Text import option | Description
'--catenate' | Catenates similar numerical fields (which have similar dimensions and names). Recommended.
'--fast' | When numerical data blocks only use isspace(3) separators (\n \r \f \t \v and space), the reading can be made faster with even lower memory requirements. Recommended.
'--headers' | Extracts headers for each numerical field. Recommended.
'--wrapped' | Catenates single wrapped output lines with previous matrices (e.g. caused by the 80 chars per line limit in old data formats written by Fortran codes). Recommended.
'--section=SEC' | Classifies fields into sections matching word SEC. This option can be repeated with different SEC words.
'--metadata=META' | Extracts lines containing word META as user metadata. This option can be repeated with different META items.
'--makerows=NAME' | When a numerical data block label matching NAME is found, it is transformed into a row vector. This may be used for wrapped files (--wrapped option). This option can be repeated with different NAME tokens.
'--help' | Lists all possible options.

The full list of options can be obtained with:
>> iLoad(' ','text','--help')
These options can be specified in the 'options' field of an iLoad configuration entry (see below). The same options can also be used when importing directly into an iData object (replace 'iLoad' by 'iData' in the example above).


Binary files
Binary formats require more work. In principle, if iLoad does not know the structure of a binary file, there is no way to import it. However, iLoad allows rapid prototyping of new import filters by means of a simple routine that takes a file name as input, and returns a structure or numerical block. Additionally, a post-process script can be attached to the new format, and executed when the file content is converted into an iData object. The specifications of the new data format are then inserted into the iLoad_ini configuration (locally or globally) together with e.g. the file extension and other identification information. These steps are detailed in the section below.

More generally, the class2str function can write any Matlab variable as a character string, and the str2struct function can read a character string and search for [name{=: }value] pairs in single lines, to build a structure.

The import filters: customizing data formats (iLoad_ini)

Reading and Writing the iLoad configuration

The full list of file formats supported by iLoad and iData can be obtained with the commands
>> iLoad('formats');	% display the list of supported formats
>> iLoad('force'); % force re-read of the configuration file and check importers
The default configuration file iLoad_ini is stored in
[ ifitpath 'Libraries' filesep 'Loaders' filesep 'iLoad_ini' ]
and a local copy (which overrides the default) is stored in
prefdir
when executing
>> iLoad('save');
% Saved iLoad configuration into /home/farhi/.matlab/R2015a/iLoad.ini <- this is prefdir
This is where you may add your own customized format definitions. Deleting this file will revert to the default configuration.
>> delete([ prefdir filesep 'iLoad.ini' ]); % the file reported by iLoad('save') above
Finally, the config.UseSystemDialogs field of the iLoad configuration can be set to 'yes' to use the native Matlab/Java file selector, or to 'no' to use uigetfiles. The config.MeX field configures whether C/Fortran external interfaces (looktxt, cif2hkl) should be used as MeX or as separate executable files.

To update a configuration, send it to iLoad:
>> config = iLoad('config'); % retrieve the iLoad configuration and file loaders (from cache)
>> iLoad('save', config);

iLoad configuration structure

Each 'loaders' format entry is a structure, whose members are described below. Let us detail one of the iLoad_ini entries, for INX files.

format14.name       ='INX tof data';
format14.extension ='inx';
format14.patterns ={'INX'};
format14.method ='looktxt';
format14.options ='--headers --fortran --catenate --fast --comment=NULL';
format14.postprocess='load_ill_inx';


(...)

config.loaders = { format1, format2, format3, format4, format5, format6, ...
format7, format8, format9, format10, format11, format12, format13, format14, format15 };

The name field gives an explicit, human-readable description of the format. The extension 'inx' indicates which string should match the file extension so that this format is selected for automatic importation. Another way to select the importer is by means of a pattern search in the file header content (by default the first 10 kb of text). The list of one or more patterns is given in the patterns field (as a cellstr, here {'INX'}). The method field is the name of the function that reads the file. It takes as input a string, the file name, and should preferably return a structure or numerical array. In this example, the looktxt method is used with syntax
struct = looktxt(filename)
The options field provides additional options that should be sent to the method. When given as a single string, these options are appended to the filename before being sent to the method, e.g.
struct = looktxt([ filename options ])
When given as a cell, they are passed as additional input arguments to the method, e.g.
struct=method(filename, options{1}, options{2}, ...)
The structure obtained from the method is then converted into an iData object when called from the iData or iData/load method. In these cases, the postprocess field is used so that the returned iData object is
object = postprocess( iData( method(filename, options...) ) )
The post-process is a script that takes as input an iData object, and returns a possibly modified object (or an array of objects). This is where Signal, Axes, Aliases and metadata are re-arranged in the object. The post-process scripts are stored in the Loaders/postprocess directory, and use a 'load_<format>' function naming convention for clarity. When the post-process is missing, the default Signal and axes definitions are used.
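
The two ways of passing options, followed by the post-process call, can be sketched in plain Matlab as below. The method and postprocess handles are hypothetical stand-ins (for e.g. looktxt and load_ill_inx), the conversion to an iData object is left out so that the sketch runs without iFit, and the space inserted between the file name and the options string is an assumption.

```matlab
% Hypothetical stand-ins for the method and the post-process of a format entry
method      = @(varargin) struct('args', {varargin});   % records the arguments it receives
postprocess = @(s) setfield(s, 'processed', true);      % marks the structure as post-processed

filename = 'data.inx';
forms = { '--headers --fast', {'--headers', '--fast'} }; % same options, as a string and as a cell
nargs = zeros(1, numel(forms));
for k = 1:numel(forms)
  options = forms{k};
  if ischar(options)
    raw = feval(method, [ filename ' ' options ]);   % one argument: file name followed by options
  else
    raw = feval(method, filename, options{:});       % file name, then one argument per option
  end
  result   = feval(postprocess, raw);
  nargs(k) = numel(result.args);
end
disp(nargs)
```

With a single string the method receives one argument, and with a cell it receives the file name plus one argument per option, so the method must be written for the form used in the format entry.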

The last step required for storing a new data format is to add the format definition to the config.loaders field in the iLoad_ini file. Once done, you may re-read the configuration file to update the list of known formats:
>> iLoad('force load config'); % force to re-read the configuration files

In short: procedure to add a new data format

  1. Start by getting the documentation about the new data format you wish to implement. This gives the format name and the extension.
  2. Then write a function (locally or in ifitpath/Loaders) that takes as input a filename and returns the file content as a structure. This is where you will open the file, access its guts, and return something (or empty/error if this fails). This is your method.
  3. Edit the iLoad_ini file from prefdir or ifitpath/Loaders.
  4. Create a new structure (which you may name as you wish) and set the name (description), method, and extension fields.
  5. Optionally define some patterns to search, to complement the extension (as a list of words).
  6. Optionally write a post-process script and preferably place it in the ifitpath/Loaders/postprocess.
  7. Add the new structure to the config.loaders list (cell) at the end of the iLoad_ini file.
  8. Save the file, and optionally re-read it with iLoad('force load config').
  9. Send me [farhi (at) ill.fr] the method/postprocess and structure information so that I can contribute your work to the package.
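
As a hedged illustration of steps 2 and 4, the sketch below writes a small file in a made-up binary layout, reads it back the way a custom method could, and builds the matching format entry. The 'MYDT' layout, the 'myd' extension and the read_mydata name are all hypothetical; in practice the reading part would be saved as a function taking the file name as input.

```matlab
% Create a small test file in a made-up layout: 4-byte tag, uint32 count, then doubles
filename = [ tempname '.myd' ];
fid = fopen(filename, 'w');
fwrite(fid, uint8('MYDT'), 'uint8');
fwrite(fid, 3, 'uint32');
fwrite(fid, [1.5 2.5 3.5], 'double');
fclose(fid);

% Body of a hypothetical reader, to be saved as: function data = read_mydata(filename)
fid = fopen(filename, 'r');
if fid < 0, error('read_mydata:open', 'Cannot open %s', filename); end
tag    = char(fread(fid, 4, 'uint8')');   % format identification tag
n      = fread(fid, 1, 'uint32');         % number of values that follow
values = fread(fid, n, 'double')';        % the numerical block, as a row vector
fclose(fid);
if ~strcmp(tag, 'MYDT'), error('read_mydata:format', 'Not a MYDT file: %s', filename); end
data = struct('Format', tag, 'Signal', values);
delete(filename);                         % remove the test file

% Matching format definition, to be added to the config.loaders list in iLoad_ini
myformat.name      = 'My instrument data (hypothetical)';
myformat.extension = 'myd';
myformat.method    = 'read_mydata';
```

Raising an error when the tag does not match lets the importer move on to the next candidate format, in line with step 2 above.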


E. Farhi - iFit/Loaders - Mar. 22, 2017 1.9 - ILL, Grenoble, France <www.ill.eu>