iFit: Loaders import sub-library

iLoad: the main file importer
Supported data formats (default)
The import filters: customizing data formats (iLoad_ini)

Commands we use in this page: iLoad(filename)
See also: Load (import) data sets, Save (export) data sets and Save Models

The Loaders sub-library imports a large set of file formats, and returns each of them as a structure. This structure can then be converted into an iData object. Similar documentation for models is available in the iFunc page.

iLoad: the main file importer

When a new file is imported into an iData object by means of e.g.

>> a = iData([ ifitpath 'Data/ILL_IN6.dat' ]);
>> a = load(iData, [ ifitpath 'Data/ILL_IN6.dat' ]);

the request for file access is sent transparently to the iLoad function, which performs the following operations:

resolves the file name, and identifies directory importation, wildcards (*, ?), distant files with URLs (file://, ftp://, http://, https://), compressed files (zip, gzip, tar, compress);
selects a list of possible file types from e.g. the file extension;
searches for patterns in text headers to better determine the file type;
requests importation by all possible file types, one after the other, until one succeeds;
formats the file content so that it is returned as a structure containing standard fields;
executes remaining post-process scripts.

The list of known file formats which can be imported is defined in the iLoad_ini file (see below), which is a list of format definitions with extension, patterns to search for, and post-process filters to run after importation. The order of these filters matters, as they are tried one after the other. A lower-priority set of more standard file formats (mainly those that Matlab knows natively) is then appended to the iLoad_ini preferences.
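
The selection and fallback steps listed above can be sketched in plain Matlab. This is only an illustration of the idea, not the actual iLoad code: the loader table, the stand-in reader functions and the Loader and Source field names are invented for the example, and the header pattern search is left out.

```matlab
% Write a small text file to import (temporary file, for the demo only).
filename = [ tempname '.txt' ];
fid = fopen(filename, 'w'); fprintf(fid, 'title: demo\n1 2 3\n'); fclose(fid);

% Hypothetical loader table, ordered by priority as in iLoad_ini.
loaders = { ...
  struct('name','numeric matrix','extension',{{'dat','txt'}},'method',@(f) load(f,'-ascii')), ...
  struct('name','image',         'extension','png',          'method',@(f) imread(f)), ...
  struct('name','raw text',      'extension','txt',          'method',@(f) fileread(f)) };

% 1) keep the loaders whose extension matches the file extension
[~, ~, ext] = fileparts(filename);
ext = strrep(ext, '.', '');
candidates = {};
for index = 1:numel(loaders)
  if any(strcmpi(ext, loaders{index}.extension))
    candidates{end+1} = loaders{index};
  end
end

% 2) try each candidate in turn until one succeeds
result = [];
for index = 1:numel(candidates)
  try
    data = feval(candidates{index}.method, filename);
  catch
    continue   % this reader failed: try the next one
  end
  result = struct('Data', data, 'Loader', candidates{index}.name, 'Source', filename);
  break
end
delete(filename);
```

Here the numeric reader is tried first and fails on the text header line, so the raw text reader is used instead. This is why the order of the format definitions matters.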

A file selector, which supports multiple file selection, may be used by issuing

>> a = iLoad('')

The result of this machinery is that importation is fully transparent: the file type is determined automatically, and the file is imported and formatted accordingly. The resulting structure is then ready to be converted into an iData object.

>> a = iLoad(filename); % this is a structure
>> a = iData(a);        % convert to iData object

If the automatic importation fails, a format specification can be forced manually, chosen from the iLoad('formats') output:

>> iLoad('formats')
...
HDF   read_hdf5/load_psi_RITA  HDF5
...
>> a = iLoad(filename, 'HDF5'); % explicitly use HDF5 loader

Distant files (http://, ...)

The specified file names may include URL tags such as

file://filename
ftp://filename
http://filename
https://filename

where the remote schemes first fetch a copy of the distant file (this requires local write permission), and then import it. A valid Internet connection is required, with proper proxy settings if needed.

Compressed files (.zip, ...)

Compressed files (ZIP, TAR, GZIP, Z) can also be imported directly, in which case they are first extracted locally (requires write permission), and then imported. This extraction mechanism also applies to distant files.

Internal anchor references (file#anchor)

The file name may end with an anchor reference using the '#' character, such as in

'http://path/file.zip#Data'

In this case, the anchor is searched in the imported data, and only the corresponding matching elements are returned.

Supported data formats (default)

The Loaders sub-library comes with a large number of predefined import filters. These are tested one after the other, based on an initial filter selection using the file name extension and optionally pattern (words) recognition in file headers.
A compact list of all supported formats is shown with

>> a = iLoad('formats');

Some of these formats can be written back, providing a vast conversion capability (see the Save page).
The filters have been divided into two categories: those that directly use formats natively known by Matlab, and those that are more specific to neutron and X-ray communities (or other research areas).

A set of example data files is available from the Load page and the Data directory.

Legacy data formats (text, NetCDF, HDF, FITS, images, XML, CSV, ...)

format (load)	Text/Binary	Description	iData/saveas Write
text 'xvg','dat','txt'...	text	Any text file, using a free format, can be read. Any text editor (gedit, notepad, nedit, ...) can display such files. The XVG format is the native XmGrace one.	Yes (DAT, M-file)
'cdf','nc' NetCDF	binary	The NetCDF format is a compact binary format. Such files can be edited/viewed with e.g. ncview, hdfview, ncBrowse, OpenDX, Panoply, Autoplot.	Yes
'cdf' CDF	binary	The CDF format is a compact binary format. Such files can be edited/viewed with e.g. OpenDX and Autoplot. It is incompatible with NetCDF.	Yes
'fits' FITS astronomical image format	binary	The FITS format is a standard data format used in astronomy. Can be displayed with e.g. GIMP, xv, Autoplot.	Yes
'hdf4' HDF 4 image	binary	The HDF4 format is a compact, binary storage format. Such files can be edited/viewed with e.g. hdfview. IDL and Matlab also have dedicated HDF 4 browsers (see hdftool). Also includes the HDF-EOS format.	Yes (image)
'hdf5' HDF 5	binary	The HDF5 format is a compact, compressed binary storage format. In its use here, it partly reconstructs the initial object, with its main values and alias/axes names. Such files can be edited/viewed with e.g. hdfview, OpenDX, Autoplot, and can of course also be imported into iData objects. IDL and Matlab also have dedicated HDF browsers (see hdftool). This format includes the NeXus format. The ROOT data format (CERN) can be converted into HDF5 with rootpy (root2hdf5). You can also read and write such files using LAMP.	Yes
'mat' Matlab Mat-file	binary	The Matlab workspace serialized binary file is compact and fast to read/write. It carries the whole object information. Such files require Matlab/iFit (or Octave) to be installed prior to importation. To save a file use: save(a,'mat_file','mat'). To load it use load(iData,'mat_file').	Yes
'xls' Excel spreadsheet	binary	Microsoft Excel spreadsheet. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric)	Yes
'gif' 'bmp' 'png' 'tiff' 'jpeg' 'ico'	binary	Standard image formats.	Yes
'ppm' 'pgm' 'pbm'	text	Standard image formats. View with e.g. GIMP, xv, ImageJ.	Yes
'csv' Matlab comma separated values	text	A file spreadsheet format. Can be viewed with any spreadsheet software (OpenOffice, Excel, Gnumeric)	Yes
'fig' Matlab figure	binary	The Matlab figure can also be opened with openfig(file), and then converted to iData with iData(gcf).	Yes
'xml' XML description file	text	The XML format (experimental)	Yes
'au' NeXT/SUN (.au) sound	binary	Sound format, initially from Sun/NeXT
'wav' Microsoft WAVE sound	binary	Sound format, standard
'avi' Audio/Video Interleaved multimedia container	binary	A video encoding format
'yaml','json' meta language configuration files	text	Data serialization formats such as YAML ("YAML Ain't Markup Language") and JSON (JavaScript Object Notation)	Yes
'ibw' Igor pro Wave	binary	Igor pro wave data file
'lvm' 'tdms' LabView	text/binary	LabView LVM measurement file and TDMS hierarchical technical data files
'sav' IDL	binary	IDL saved data (see format here). You can also read and write such files using LAMP.
'npy' NUMPY	binary	NumPy binary array NPY format (single array)	Yes

More specific data formats (neutrons and X-ray, ...)

format (load)	Text/Binary	Description	iData/saveas Write
ILL data	text	Files generated by ILL instruments, with specific support for ILL TAS. May require post-processing to assign Signal and axes right, as well as metadata. This data can also be imported with Lamp.
ILL Cyclops Laue camera	text/binary	Specific data format for the Cyclops Neutron Laue diffractometer, which measures the reciprocal space image of a sample structure.
ChalkRiver CNBC/NRU	text	Files generated by the CNBC NRU instruments at Chalk River. Multi-wire and polarized data files are supported.
PDB file	text	Protein Data Bank file describing e.g. proteins. The file structure is read, and the gyration radius Rg, protein density, volume, excess charge, scattering structure factor S(q) and pair correlation function are evaluated. These files can be viewed with JMol, PyMol, ViewMol, RasMol, Avogadro, Garlic, GDis, VMD, Chimera, Yasara ...
'spc' SPEC	text	The SPEC ESRF legacy format. May be slow to import due to the file format complexity for large files.
'sim' McStas/PGPLOT	text	The legacy format generated by McStas. Support for 1D, 2D and event lists.	Yes, similar to the DAT export format
'sqw','laz','lau' McStas sample files	text	Sample files for the Isotropic_Sqw, PowderN and Single_crystal McStas components, respectively obtained from nMoldyn, FullProf, ICSD and Crystallographica. The McStas Sqw format differs from the Sqw Horace objects.	Yes for 'sqw'. LAU and LAZ files can be created with cif2hkl.
'inx' INX ILL	text	The INX format is a S(phi,w) simple format for reduced neutron time-of-flight data (see example). This data can also be imported/generated with Lamp.	Yes
'edf' EDF ESRF Data format	binary with 512-multiple length text header	The EDF format is mainly used at the ESRF and can be viewed with e.g. PyMCA, Zimg, GnuPlot, EDFExplorer, Fit2D, FabIO.	Yes
'spe' ISIS SPE	text	The SPE data format is a S(phi,w) obtained after processing ISIS RAW files with Horace and LibISIS (Homer/2d).	Yes
'sqw' ISIS SQW	binary	The SQW data format is obtained after processing ISIS RAW files with Horace and LibISIS. This format is binary, and differs from the McStas Sqw format.
'spe' Roper Scientific	binary	Princeton/Roper Scientific WinView / PI Acton image file
'sif' Andor Technology	binary	Andor Technology CCD Camera file
'mar','mccd' MarResearch	binary (TIFF)	MarResearch CCD Camera (Mar345) format, can also be imported as a TIFF format
'img' ADSC image	binary with 512 char header	ADSC CCD Camera
'nx','nxs','n4', 'n5','nxspe' NeXus	binary	The NeXus files are HDF4/5 files. See the HDF format descriptions above for more information. This format includes all types of derivatives (such as NX SPE from ISIS). Such files can be edited/viewed with e.g. hdfview.	Yes (as HDF5)
'nxs' Mantid workspace	binary	Specific support for reading Mantid and Lamp workspaces.	Yes
'cbf' ESRF/SLS binary imgCIF	binary with 4096-multiple length text header	The Crystallographic Binary File format, used on some X-ray and neutron diffractometers. See the format definition. This format gathers CIF and imgCIF standards. These files are e.g. generated by Pilatus CCD Cameras.
'hdr'+'img' MRI 3D volume	binary (2 files)	An MRI volume data format. The 'hdr' file requires an associated 'img' file. Format from the Analyze Biomedical Imaging Resource of the Mayo Clinic. Analyze files can be obtained from DICOM files using e.g. DicomNifti, MiTools, MRIconvert, or the Matlab tool dicm2nii. You can view them with Invesalius, MiView (MiTools), MRICron.	Yes
'nii' NIfTI MRI volume	binary	A NIfTI medical imaging volume data format (MRI). Such files can be obtained from DICOM files using e.g. DicomNifti, MiTools, MRIconvert, or the Matlab tool dicm2nii. You can view them with MiView (MiTools), MRICron.	Yes
STL/SLP/PLY/OFF/OBJ Volume/geometry	text or binary	The STL, SLP, OFF, OBJ and PLY formats are common in stereo-lithography CAD software. They describe raw unstructured triangulated surfaces/volumes by the unit normal and vertices (ordered by the right-hand rule) of the triangles using a three-dimensional Cartesian coordinate system. These files can be viewed with e.g. MeshLab, AdMesh, FreeCAD, Geomview, Chimera.	Yes
CIF, CFL/PCR, INS/RES Crystallography files (FullProf, ShelX)	text	The CIF format is the IUCr standard format for structure descriptions. The PCR and CFL file formats are used by FullProf and CrysFML. The RES, SHX and INS formats are used by ShelX. Such files can be viewed with DrawXtl, JMol, PyMol, RasMol, Avogadro, GDis, Chimera, Yasara. It is also possible to get a CIF file automatically from the Crystallography Open Database with read_cif('cod: ID') and read_cif('cod: chem_formula'), which both require a valid network connection (and proxy if any).
MRC/CCP4/IMOD, EZD electron density map	text(EZD) and binary	Electron density maps in MRC/CCP4 and EZD file formats. Can be visualized with PyMol, VMD, Chimera, Yasara, VEDA.	Yes
acqus, fid or ser Bruker NMR	text	NMR data set from Bruker/WinNMR
fid, procpar Varian NMR	binary	NMR data set from Varian
'hdr','jdf' JEOL NMR	text/binary	NMR data set from JEOL
'0001','0002'... Bruker FT-IR OPUS	binary	FT-IR Bruker OPUS format
'R' and 'C' LLB TAS	binary	LLB TAS (1T, 2T, 4F1, 4F2, G43). Experimental. May not be properly imported. Users are advised to use B. Hennion conversion tools (convasc, convdat from 'wf').
DAT	text	Quantum Design VMS ppms/mpms
MS, D, RAW Agilent and Thermo Finnigan MS	binary	Agilent Mass Spectrometry/Chromatography LC/MS GC/MS GC/FID Thermo Finnigan Mass Spectrometry/Chromatography
ENDF TSL DAT Evaluated Nuclear Data File	text	ENDF Evaluated Nuclear Data File with specific support for the thermal neutron scattering law (TSL) section (MF7, MT=2 and 4). Can make use of PyNE when installed.
MCNP ACE A compact ENDF	text/binary	ACE MCNP files ("A Compact ENDF"). Requires PyNE to be installed.
POSCAR	text	VASP POSCAR file for molecular modelling. See with VMD.
SDF HP/Agilent/Keysight	binary	HP/Agilent/Keysight Standard Data Format (SDF)
SPINWAVE LLB	text	A SPINWAVE (LLB) input file to model spin-waves
SQW4D McStas	text	Sqw 4D data set for the McStas Single_crystal_inelastic component. This format differs from the Sqw Horace objects.	Yes

Unsupported data formats: how to import them anyway?

Text files (ascii)
In principle, any text-based data format will be imported, whatever the internal organization of the data and comments. The text is read and parsed by the looktxt tool, and all numerical blocks are named automatically from the character strings/comments that precede these blocks. The result is a single structure which holds a Data and a Header field, plus additional identification information. The structure can be converted transparently to an iData object. However, in some cases, the Signal and Axes definitions will need to be set after creating the object.

If a text file resists importation, try the most tolerant text reader configurations:

>> iData('filename','text format with fast import method')
>> iData('filename','Data (text format)')

will import the raw content, without post-formatting of the object in memory. You will probably need to manually assign the Signal and some of the Axes (see iData object help). If this also fails, the looktxt MEX file may be corrupted. Refer to the Changes/Bugs and Install pages.

You can tune the way a text file is read with the syntax below (each option is an argument of its own):

>> a=iLoad(filename, 'text', '--catenate','--fast','--headers','--wrapped', ... other options);

where 'text' indicates that the file is not binary encoded, and any following options are forwarded to the text reader. The possible options are:

Text import option	Description
'--catenate'	Catenates similar numerical fields (which have similar dimensions and names). Recommended.
'--fast'	When numerical data blocks only use isspace(3) separators (\n \r \f \t \v and space), the reading can be made faster with even lower memory requirements. Recommended.
'--headers'	Extracts headers for each numerical field. Recommended.
'--wrapped'	Catenates single wrapped output lines with previous matrices (e.g. caused by the 80 chars per line limit in old data formats written by fortran codes). Recommended.
'--section=SEC'	Classifies fields into sections matching word SEC. This option can be repeated with different SEC words.
'--metadata=META'	Extracts lines containing word META as user metadata. This option can be repeated with different META items.
'--makerows=NAME'	When a numerical data block label matching NAME is found, it is transformed into a row vector. This may be used for wrapped files (--wrapped option). This option can be repeated with different NAME tokens.
'--help'	Lists all possible options.

The full list of options can be obtained with:

>> iLoad(' ','text','--help')

These options can be specified in the 'options' field of an iLoad configuration entry (see below). The same options can also be used when importing directly into an iData object (replace 'iLoad' by 'iData' in the example above).

Binary files
Binary formats require more work. If iLoad does not know the structure of a binary file, it can not import it. However, iLoad allows rapid prototyping of new import filters by means of a simple routine that takes a file name as input, and returns a structure or numerical block. Additionally, a post-process script can be attached to the new format, and executed when the file content is converted into an iData object. The specifications of the new data format are then inserted into the iLoad_ini configuration (locally or globally) together with e.g. the file extension and other identification information. These steps are detailed in the section below.
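
As an illustration, such a reader routine could look like the minimal sketch below. The function name read_mybin, the binary layout it expects and the returned field names are all hypothetical, chosen only to show the shape of a routine that takes a file name and returns a structure.

```matlab
function data = read_mybin(filename)
% READ_MYBIN  Hypothetical reader for an invented binary layout (illustration only):
%   int32 nrows, int32 ncols, then nrows*ncols float64 values stored column-wise.
  data = [];
  fid = fopen(filename, 'r', 'ieee-le');   % little-endian byte order is assumed
  if fid < 0, error('read_mybin: can not open %s', filename); end
  closer = onCleanup(@() fclose(fid));     % close the file even if an error occurs
  dims = fread(fid, 2, 'int32');
  if numel(dims) ~= 2 || any(dims <= 0)
    error('read_mybin: %s is not in the expected format', filename);
  end
  values = fread(fid, prod(dims), 'float64');
  if numel(values) ~= prod(dims)
    error('read_mybin: %s is truncated', filename);
  end
  data.Signal = reshape(values, dims(1), dims(2));  % the numerical block
  data.Size   = dims(:)';
  data.Source = filename;
end
```

Raising an error when the content does not match lets the importer move on to the next candidate format instead of returning meaningless data.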

More generally, the class2str function can write any Matlab variable as a character string. Conversely, the str2struct function reads a character string, searches single lines for [name{=: }value] pairs, and returns them as a structure.

The import filters: customizing data formats (iLoad_ini)

Reading and Writing the iLoad configuration

The full list of file formats supported by iLoad and iData can be obtained with the commands

>> iLoad('formats');	% display the list of supported formats
>> iLoad('force');	% force re-read of the configuration file and check importers

The configuration file iLoad_ini is stored by default in the

[ ifitpath 'Libraries' filesep 'Loaders' filesep 'iLoad_ini' ]

and a local copy (which overrides the default) is stored in

prefdir

when executing

>> iLoad('save');
% Saved iLoad configuration into /home/farhi/.matlab/R2015a/iLoad.ini <- this is prefdir

This is where you may add your own customized format definitions. Deleting this file will revert to the default configuration.

>> delete([ prefdir filesep 'iLoad_ini.m' ]);

Last, the config.UseSystemDialogs field of the iLoad configuration can be set to 'yes' to use the native Matlab/Java file selector, or to 'no' to use uigetfiles. The config.MeX field selects whether the C/Fortran external interfaces (looktxt, cif2hkl) should be used as MEX files or as separate executable files.

To update a configuration, send it to iLoad:

>> config = iLoad('config'); % retrieve the iLoad configuration and file loaders (from cache)
>> iLoad('save', config);

iLoad configuration structure

Each format 'loaders' entry is a structure with members:

loader.name: a description of the format
loader.method: the function used to read the file and return a structure or numerical array: data=method(filename)
loader.extension: a single file extension, or a cell string of file extensions
loader.patterns: a cell string of patterns to search in text headers (less than 10% of binary data in the first 10 kb) [optional]
loader.options: additional options to be passed to the loader.method. A single string is catenated after the file name, a cell array is sent as additional arguments to the method. [optional]
loader.postprocess: a script which handles iData commands to shape the final object, id=loader.postprocess(id) [optional]

Let us detail one of the iLoad_ini entries, for INX files.

format14.name       ='INX tof data';
format14.extension  ='inx';
format14.patterns   ={'INX'};
format14.method     ='looktxt';
format14.options    ='--headers --fortran  --catenate --fast --comment=NULL';
format14.postprocess='load_ill_inx';


(...)

config.loaders =  { format1, format2, format3, format4, format5, format6, ...
	       format7, format8, format9, format10, format11, format12, format13, format14, format15 };

Each format is a structure.
The name field gives an explicit, human-readable description of the format. The extension 'inx' indicates which string should match the file extension so that this format is selected for automatic importation. Another way to select the importer is by means of a pattern search in the file header content (by default the first 10 kb of text). The list of one or more patterns is given in the patterns field (as a cellstr, here {'INX'}). The method field is the name of the function used to read the file. It takes the file name as an input string, and should preferably return a structure or numerical array. In this example, the looktxt method is used with the syntax

struct = looktxt(filename)

The options field provides additional options that should be sent to the method. When given as a single string, these options are appended to the filename before being sent to the method, e.g.

struct = looktxt([ filename options ])

When given as a cell, they are passed as additional input arguments to the method, e.g.

struct=method(filename, options{1}, options{2}, ...)
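
The two calling conventions can be compared with the short sketch below. It is not the iLoad code: the reader is replaced by a stand-in that only returns the arguments it receives, the file name is invented, and the space inserted between the file name and the options string is an assumption of this example.

```matlab
% Stand-in for a reader such as looktxt: it only returns the arguments it
% receives, so that the two calling conventions can be compared (demo only).
method   = @(varargin) varargin;
filename = 'data.inx';

% options as a single string: appended to the file name (one argument).
% The separating space is an assumption of this sketch.
options     = '--headers --fast';
args_string = feval(method, [ filename ' ' options ]);

% options as a cell: each element becomes an additional argument.
options   = { '--headers', '--fast' };
args_cell = feval(method, filename, options{:});

disp(args_string);   % one string holding the file name and the options
disp(args_cell);     % the file name followed by one argument per option
```

The string form suits readers that parse a command line themselves, while the cell form suits ordinary Matlab functions with separate input arguments.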

The structure obtained from the method is then converted into an iData object when called from the iData or iData/load method. In these cases, the postprocess field is used so that the returned iData object is

object = postprocess( iData( method(filename, options...) ) )

The post-process is a script that takes as input an iData object, and returns a possibly modified object (or an array of objects). This is where Signal, Axes, Aliases and metadata are re-arranged in the object. The post-process scripts are stored in the Loaders/postprocess directory, and use a 'load_<format>' function naming convention for clarity. When no post-process is given, the default Signal and axes definitions are used.

The last step required for storing a new data format is to add the format definition to the config.loaders field in the iLoad_ini file. Once done, you may re-read the configuration file to update the list of known formats:

>> iLoad('force load config'); % force to re-read the configuration files

In short: procedure to add a new data format

Start by getting the documentation about the new data format you wish to implement. This gives the format name and the extension.
Then write a function (locally or in ifitpath/Loaders) that takes as input a filename and returns the file content as a structure. This is where you will open the file, access its guts, and return something (or empty/error if this fails). This is your method.
Edit the iLoad_ini file from prefdir or ifitpath/Loaders.
Create a new structure (which you may name as you wish) and set the name (description), method, and extension fields.
Optionally define some patterns to search, to complement the extension (as a list of words).
Optionally write a post-process script and preferably place it in the ifitpath/Loaders/postprocess.
Add the new structure to the config.loaders list (cell) at the end of the iLoad_ini file.
Save the file, and optionally re-read it with iLoad('force load config').
Send me [farhi (at) ill.fr] the method/postprocess and structure information so that I can contribute your work to the package.
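
The iLoad_ini part of this procedure might look like the fragment below. It is a sketch only: the format, the reader read_mybin and the post-process load_mybin are hypothetical names for files you would write yourself, and the last line assumes that config.loaders is the cell array built at the end of iLoad_ini.

```matlab
% --- added to iLoad_ini, next to the existing format definitions ---
% Hypothetical format: mybin, read_mybin and load_mybin are invented names.
mybin.name        = 'My binary data (example)';  % human readable description
mybin.extension   = 'mybin';                     % file extension to match
mybin.method      = 'read_mybin';                % reader: data = read_mybin(filename)
mybin.postprocess = 'load_mybin';                % optional: id = load_mybin(id)

% --- added after the 'config.loaders = { ... };' statement ---
config.loaders{end+1} = mybin;                   % register the new format last
```

Since formats are tried in order, appending the new entry at the end keeps the existing formats at a higher priority.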

E. Farhi - iFit/Loaders - Nov. 27, 2018 2.0.2