Man page of looktxt
looktxt
Section: USER COMMANDS (1)
Updated: July 16, 2009
NAME
looktxt - Search and export numerics from any text/ascii file
SYNOPSIS
looktxt
[-b][-c][-f FORMAT][-H][-s SEC ...][-m META ...] file1 file2 ...
DESCRIPTION
Extracting data from a text file is a never-ending story. Usually, one writes a short script or program/function to analyse each specific input data format. The
looktxt
command aims to read any text data file containing numerical blocks just as a human would read it. Specifically, it looks for contiguous numerical blocks, which are stored as matrices, and the other parts of the input file are classified as
headers
which are optionally exported. Each numerical block is labelled with the last word of the preceding header block.
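The block detection described above can be illustrated with a minimal Python sketch. It is not the looktxt implementation: it assumes, for simplicity, that a line is either fully numeric or fully header text, and the names NUMBER, is_numeric_line and split_blocks, as well as the fallback label "Data", are hypothetical.

```python
import re

# Assumption: a number is an optionally signed integer, decimal or exponent-form float.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def is_numeric_line(line):
    """True when every whitespace-separated token on the line is a number."""
    tokens = line.split()
    return bool(tokens) and all(NUMBER.fullmatch(t) for t in tokens)

def split_blocks(text):
    """Return (label, rows) pairs, one per contiguous run of numeric lines."""
    blocks = []
    header_words = []  # words of the header text seen since the last block
    rows = []          # numeric rows of the block being collected

    def flush():
        # Close the current block and label it with the last header word.
        if rows:
            label = header_words[-1] if header_words else "Data"
            blocks.append((label, list(rows)))
            rows.clear()
            header_words.clear()

    for line in text.splitlines():
        if is_numeric_line(line):
            rows.append([float(t) for t in line.split()])
        else:
            flush()
            header_words.extend(line.split())
    flush()
    return blocks
```

For instance, a header line "Temperature K" followed by two rows of three numbers would yield one 2x3 block labelled "K" in this sketch.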
Blocks read from the data file can be sorted into sections. Each section
SEC
starts when it appears in a
header
and contains all following fields until a new section is found or the end of the file.
Additionally, one may search for specific metadata keywords chosen by the user. Each data field matching the metadata keyword
META
in its
headers
will create a new entry in the
MetaData
section.
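The section and metadata rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: fields are given as hypothetical (header, label) pairs, fields found before any section are simply ignored, and a field matching a metadata keyword is recorded both in its section and in MetaData. The actual looktxt behaviour in these two cases is not specified here.

```python
def classify(fields, sections, metadata):
    """Group (header, label) fields by section keyword and metadata keyword."""
    result = {"MetaData": []}
    current = None  # name of the section currently open, if any
    for header, label in fields:
        for sec in sections:
            if sec in header:
                current = sec  # a header naming SEC opens that section
        if current is not None:
            result.setdefault(current, []).append(label)
        if any(meta in header for meta in metadata):
            result["MetaData"].append(label)
    return result
```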
The output data files may be generated using "Matlab", "Scilab", "IDL", "Octave", "XML", "HTML", and "Raw" formats (using the -f FORMAT option), using a structure-like hierarchy. This hierarchy contains all
sections, metadata
and optionally
headers
that have been found during the parsing of the input data file.
After using
looktxt foo
the data is simply loaded into memory using e.g. 'matlab> ans=foo;'. The exact method to import the data is indicated at the beginning of the output data file, and depends on the format.
The command can handle large files (hundreds of MB) within a few seconds, with minimal memory requirements.
OPTIONS
- -h | --help
-
Displays the command help
- -b | --binary
-
Sets binary mode for large numerical blocks (more than 100 elements). This option creates an additional '.bin' file, to be read according to the references indicated for each field in the output text data file. This is done transparently when reading output files with matlab(1), scilab(1), idl(1), and octave(1).
- -c | --catenate
-
Catenates similar numerical fields (which have similar dimensions and names)
- -F | --force
-
Overwrites existing files
- -f FORMAT | --format=FORMAT
-
Sets the output format for generated files
- --fortran | --wrapped
-
Catenates single wrapped output lines with previous matrices (e.g. caused by the 80 characters per line limit in old data formats written by Fortran codes)
- -H | --headers
-
Extracts
headers
for each numerical field (recommended)
- -s SEC | --section=SEC ...
-
Classifies fields into
sections
matching the word SEC. This option can be repeated
- -m META | --metadata=META ...
-
Extracts lines containing the word META as user
metadata.
This option can be repeated
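As an illustration of the --catenate idea, the following Python sketch merges a block into the previous one when both carry the same label and the same number of columns. This is an assumption about what "similar" means: looktxt may use a looser criterion, and the function name catenate and the (label, rows) representation are hypothetical.

```python
def catenate(blocks):
    """Merge each block into the previous one when label and column count match."""
    merged = []
    for label, rows in blocks:
        if merged:
            prev_label, prev_rows = merged[-1]
            same_width = len(prev_rows[0]) == len(rows[0])
            if prev_label == label and same_width:
                prev_rows.extend(rows)  # stack the new rows under the previous block
                continue
        merged.append((label, [list(r) for r in rows]))
    return merged
```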
OTHER OPTIONS
The command supports other options which are listed using
looktxt
-h
Among these are
- --fast
-
When numerical data blocks only use isspace(3) separators (\n \r \f \t \v and space), the reading can be made faster with even lower memory requirements.
- --silent
-
Silent mode, to only display fatal errors
- --verbose | -v | --debug
-
Displays detailed information
- --makerows=NAME ...
-
When a numerical data block label matching NAME is found, it is transformed into a row vector. This may be used for wrapped files (--fortran option). This option can be repeated
- -o FILE | --outfile=FILE
-
Uses FILE as the output file. The streams
stdout
and
stderr
may be used, but we then recommend specifying the --silent option to avoid unwanted messages in the output.
EXAMPLES
- Typical usage (exporting headers as well)
-
looktxt
-H foo
- For large data files (using binary float storage, catenate and fortran mode)
-
looktxt
-F -c -H -b --fortran foo
- Sorting data into sections, and searching a metadata keyword
-
looktxt
-s SEC1 -s SEC2 -m META1 -H
foo
will result in the following Matlab structure:
Creator: 'Looktxt 1.0.7 16 July 2009 Farhi E. [farhi at ill.fr]'
User: 'farhi on localhost'
Source: 'foo'
Date: 'Fri Dec 12 11:35:20 CET 2008'
Format: 'Matlab'
Command: [1x195 char]
Filename: 'foo.m'
Headers: struct SEC1, struct SEC2, struct MetaData (headers)
Data: struct SEC1, struct SEC2, struct MetaData (numerics)
- Some options that may be used for specific data formats:
-
- ILL ASCII data format:
-
--headers --fortran --catenate --fast --binary --makerows=FFFF --makerows=JJJJ --makerows=IIII
- ILL TAS ASCII data format:
-
--headers --section=PARAM --section=VARIA --section=ZEROS --section=POLAN --metadata=DATA
- SPEC data file (ESRF, X-rays...):
-
--headers --metadata="#S " --comment=
- Most text-based data files:
-
--fast --fortran --binary --force --catenate --comment=NULL
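When looktxt is driven from a script, the option sets above can be assembled programmatically. The Python sketch below only uses options documented on this page. The helper names build_command and run are hypothetical, and looktxt is assumed to be available on the PATH.

```python
import subprocess

def build_command(files, fmt=None, sections=(), metadata=(), headers=True):
    """Assemble a looktxt argument list from options documented on this page."""
    argv = ["looktxt"]
    if headers:
        argv.append("--headers")
    if fmt is not None:
        argv.append(f"--format={fmt}")
    argv += [f"--section={s}" for s in sections]   # option may be repeated
    argv += [f"--metadata={m}" for m in metadata]  # option may be repeated
    argv += list(files)
    return argv

def run(files, **options):
    """Run looktxt (assumed to be on the PATH) and return its exit status."""
    return subprocess.run(build_command(files, **options)).returncode
```

For the ILL TAS case, build_command(["foo"], sections=["PARAM", "VARIA", "ZEROS", "POLAN"], metadata=["DATA"]) produces the corresponding argument list.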
ENVIRONMENT
The
LOOKTXT_FORMAT
environment variable may be set to define the default export format. When not defined, the Matlab format is used as default.
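A script calling looktxt can set this variable for a single invocation without altering its own environment. The following Python sketch shows one way to do so. The helper names default_format and environment_for are hypothetical, and the value "Octave" is assumed to be spelled as in the format list of the DESCRIPTION section.

```python
import os

def default_format(environ=None):
    """Format used when -f is not given: LOOKTXT_FORMAT if set, else Matlab."""
    environ = os.environ if environ is None else environ
    return environ.get("LOOKTXT_FORMAT", "Matlab")

def environment_for(fmt, base=None):
    """Return a copy of the environment with LOOKTXT_FORMAT set to fmt."""
    env = dict(os.environ if base is None else base)
    env["LOOKTXT_FORMAT"] = fmt
    return env
```

The result may then be passed as the env argument of subprocess.run, e.g. subprocess.run(["looktxt", "foo"], env=environment_for("Octave")).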
BUGS
The command by itself should work properly. In case of trouble, the --verbose or --debug options provide more information. Most problems arise when importing data after running looktxt, e.g. because of idl(1) and scilab(1) limitations (lines too long, too many structure elements, ...). The --binary option may solve some of these import issues.
In case of memory allocation problems, you may try the --fast option.
EXIT STATUS
looktxt returns -1 in case of error, 0 when no file was processed, or the number of processed files.
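A calling script can interpret this status as in the Python sketch below. On POSIX systems only the low 8 bits of an exit status reach the parent process, so a return value of -1 is seen as 255, and a negative returncode reported by the subprocess module means the process was terminated by a signal. The function name describe_status is hypothetical, and the sketch cannot distinguish an error from exactly 255 processed files.

```python
def describe_status(returncode):
    """Translate a looktxt exit status, as seen by subprocess on POSIX."""
    if returncode < 0:
        # subprocess convention on POSIX: -N means killed by signal N
        return f"terminated by signal {-returncode}"
    if returncode == 255:
        return "error"  # exit status -1 truncated to 8 bits
    if returncode == 0:
        return "no file processed"
    return f"{returncode} file(s) processed"
```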
INSTALLATION
Usual procedure: ./configure; make; make install. An installer is available using
matlab> install
which may be used both from Linux/Unix and Windows systems. In principle, the
only required file is the executable
looktxt
, to be copied to a system executable location, e.g. '/usr/local/bin', '/usr/bin', or 'c:\windows\system32'.
Pre-compiled binaries for common systems are included in the package.
AUTHOR
Emmanuel FARHI (farhi (at) ill.eu) and the Institut Laue Langevin at http://www.ill.eu
SEE ALSO
matlab(1), idl(1), scilab(1), octave(1), xmlcatalog(1), html2text(1)