readcif C++ API

All of the public symbols are in the readcif namespace.

type StringVector

A std::vector of std::string objects, i.e., std::vector<std::string>.

int is_whitespace(char c)

is_whitespace and is_not_whitespace are inline functions that determine whether a character is CIF whitespace. They are similar to the C/C++ standard library’s isspace function, but only recognize ASCII HT (9), LF (10), CR (13), and SPACE (32) as whitespace characters. They are not exact inverses: ASCII NUL (0) is neither whitespace nor non-whitespace, so both functions return false for it.

int is_not_whitespace(char c)

See is_whitespace().
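The rules above fit in a few lines. The sketch below is a hypothetical re-creation of the documented behavior, not readcif's source (the sketch_ names are invented). It also shows one convenient consequence of excluding NUL: a token scan driven by the "not whitespace" test stops at the buffer's terminating null without a separate check.

```cpp
#include <cassert>

// Hypothetical re-creation of the documented rules: only HT, LF, CR and SPACE
// count as CIF whitespace (unlike isspace, VT and FF are excluded).
inline int sketch_is_whitespace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// NUL is deliberately neither whitespace nor non-whitespace.
inline int sketch_is_not_whitespace(char c) {
    return c != '\0' && !sketch_is_whitespace(c);
}

// Advance past one whitespace-delimited token; stops at whitespace or at the
// terminating null character.
const char* sketch_skip_token(const char* p) {
    while (sketch_is_not_whitespace(*p))
        ++p;
    return p;
}
```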

double str_to_float(const char *s)

Non-error-checking inline function to convert a string to a floating point number. It is similar to the C/C++ standard library’s atof function, but returns NaN if no digits are found. In isolated benchmarks it is slower than atof, but it is empirically much faster when used in shared libraries. This is probably due to CPU cache behavior, but needs further investigation.
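To make the "NaN if no digits" contract concrete, here is a hypothetical, deliberately simple parser with the same behavior. sketch_str_to_float is not readcif's function, and its naive digit accumulation is less accurate than strtod. One practical benefit of NaN over 0 is that a real zero can be told apart from CIF's ? (unknown) and . (inapplicable) placeholders; parsing also stops at the first character that cannot continue a number, such as the ( of a standard uncertainty in 1.25(3).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical sketch: optional sign, digits, optional fraction, optional
// exponent. Returns NaN when the mantissa contains no digits at all.
double sketch_str_to_float(const char* s) {
    bool negative = false;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        ++s;
    }
    double value = 0.0;
    bool saw_digit = false;
    for (; *s >= '0' && *s <= '9'; ++s) {
        value = value * 10.0 + (*s - '0');
        saw_digit = true;
    }
    if (*s == '.') {
        ++s;
        double scale = 0.1;
        for (; *s >= '0' && *s <= '9'; ++s) {
            value += (*s - '0') * scale;
            scale *= 0.1;
            saw_digit = true;
        }
    }
    if (!saw_digit)
        return std::numeric_limits<double>::quiet_NaN();  // e.g. "?" or "."
    if (*s == 'e' || *s == 'E') {
        ++s;
        bool exp_negative = false;
        if (*s == '+' || *s == '-') {
            exp_negative = (*s == '-');
            ++s;
        }
        int exponent = 0;
        for (; *s >= '0' && *s <= '9'; ++s)
            exponent = exponent * 10 + (*s - '0');
        value *= std::pow(10.0, exp_negative ? -exponent : exponent);
    }
    // Anything else, such as "(3)" or trailing whitespace, ends the number.
    return negative ? -value : value;
}
```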

int str_to_int(const char *s)

Non-error-checking inline function to convert a string to an integer. It is similar to the C/C++ standard library’s atoi function, and the rationale for using it is the same as for str_to_float(). Returns zero if no digits are found.
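The integer contract is simpler: an optional sign, then digits, stopping at the first non-digit. This is a hypothetical sketch of that behavior, not readcif's code, and like atoi it performs no overflow checking.

```cpp
#include <cassert>

// Hypothetical sketch: returns 0 when no digits are present (e.g. "?").
int sketch_str_to_int(const char* s) {
    bool negative = false;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        ++s;
    }
    int value = 0;
    for (; *s >= '0' && *s <= '9'; ++s)  // stops at whitespace or any non-digit
        value = value * 10 + (*s - '0');
    return negative ? -value : value;
}
```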

class CIFFile

The CIFFile is designed to be subclassed by an application to extract the data the application is interested in.

Public section:

type ParseCategory

A typedef for std::function<void (bool in_loop)>.

void register_category(const std::string &category, ParseCategory callback, const StringVector &dependencies = StringVector())

Register a callback function for a particular category.

Parameters
  • category – name of the category

  • callback – function to retrieve data from category

  • dependencies – a list of categories that must be parsed before this category.

A null callback function removes the category. Dependencies must be registered before the categories that depend on them. A category callback function can find out which category it is processing with category().

void set_unregistered_callback(ParseCategory callback)

Set the callback function that is called for categories that have not been registered.
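The registration rules above (a null callback removes a category, dependencies must already be registered, and an optional fallback handles everything else) can be modeled with a small map. This is a hypothetical stand-in, not CIFFile: only the StringVector and ParseCategory typedefs mirror the documentation, CategoryRegistry and dispatch() are invented for illustration, throwing std::logic_error for a missing dependency is an assumption, and the dependency-driven parse ordering is not modeled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// These two typedefs mirror the documented ones; the rest is hypothetical.
using StringVector = std::vector<std::string>;
using ParseCategory = std::function<void (bool in_loop)>;

class CategoryRegistry {
public:
    void register_category(const std::string& category, ParseCategory callback,
                           const StringVector& dependencies = StringVector()) {
        if (!callback) {  // a null callback removes the category
            entries_.erase(category);
            return;
        }
        for (const auto& dep : dependencies)  // dependencies must exist already
            if (entries_.count(dep) == 0)
                throw std::logic_error("missing dependency: " + dep);
        entries_[category] = Entry{callback, dependencies};
    }

    void set_unregistered_callback(ParseCategory callback) { unregistered_ = callback; }

    // Hypothetical driver: act as if the parser had just reached `category`.
    void dispatch(const std::string& category, bool in_loop) {
        auto it = entries_.find(category);
        if (it != entries_.end())
            it->second.callback(in_loop);
        else if (unregistered_)
            unregistered_(in_loop);
    }

private:
    struct Entry {
        ParseCategory callback;
        StringVector dependencies;
    };
    std::map<std::string, Entry> entries_;
    ParseCategory unregistered_;
};
```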

void parse_file(const char *filename)
Parameters

filename – Name of file to be parsed

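If possible, memory-map the given file to get the buffer to hand off to parse(). On POSIX systems, files whose size is an exact multiple of the system page size have to be read into an allocated buffer instead: a memory-mapped file is only followed by zero bytes in the unused remainder of its last page, so when there is no remainder there is no null character to terminate the buffer.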
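The mmap-or-copy decision described above can be sketched with POSIX calls. NullTerminatedFile and needs_copy() are hypothetical helpers, not readcif's implementation; the sketch relies on POSIX zero-filling the partial last page of a mapping, and falls back to reading the file into a buffer one byte longer than the file whenever that zero padding is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// True when the file ends exactly on a page boundary (or is empty), so a
// mapping would have no zero-filled byte after the last character.
bool needs_copy(std::size_t file_size, std::size_t page_size) {
    return file_size % page_size == 0;
}

// Hypothetical helper: exposes a file as a null-terminated buffer.
class NullTerminatedFile {
public:
    explicit NullTerminatedFile(const char* filename) {
        int fd = open(filename, O_RDONLY);
        if (fd == -1)
            throw std::runtime_error(std::string("cannot open ") + filename);
        struct stat st;
        if (fstat(fd, &st) == -1) {
            close(fd);
            throw std::runtime_error("cannot stat file");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        if (!needs_copy(size_, page)) {
            void* p = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                // The rest of the last page is zero-filled: free terminator.
                mapped_ = static_cast<const char*>(p);
                close(fd);
                return;
            }
        }
        // Fallback: read into a buffer with room for the terminating null.
        copy_.assign(size_ + 1, '\0');
        std::size_t got = 0;
        while (got < size_) {
            ssize_t n = read(fd, copy_.data() + got, size_ - got);
            if (n <= 0) {
                close(fd);
                throw std::runtime_error("cannot read file");
            }
            got += static_cast<std::size_t>(n);
        }
        close(fd);
    }
    ~NullTerminatedFile() {
        if (mapped_)
            munmap(const_cast<char*>(mapped_), size_);
    }
    NullTerminatedFile(const NullTerminatedFile&) = delete;
    NullTerminatedFile& operator=(const NullTerminatedFile&) = delete;

    // Suitable for handing to a parser that expects null-terminated text.
    const char* data() const { return mapped_ ? mapped_ : copy_.data(); }

private:
    const char* mapped_ = nullptr;
    std::size_t size_ = 0;
    std::vector<char> copy_;
};
```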

void parse(const char *buffer)

Parse the input and invoke the registered callback functions.

Parameters

buffer – Null-terminated text of the CIF file

The text must be terminated with a null character. A common technique is to memory-map a file and pass in the address of the first character. The whole file is required to simplify backtracking, since data tables (categories) may appear in any order in a file. Stylized parsing settings are reset each time parse() is called.

int get_column(const char *name, bool required = false)
Parameters
  • name – column name to search for

  • required – true if the column is required

Search the current category’s tags to find which column the name corresponds to. If the name is not present, -1 is returned, unless required is true, in which case an error is thrown.
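The lookup contract of get_column() can be sketched as a linear search over the current category's tags. The tag vector and sketch_get_column() are hypothetical stand-ins (readcif keeps the tags internally), and using std::runtime_error for a missing required column is an assumption. A -1 result can be passed straight into a ParseColumn, since parse_row() skips negative offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: `tags` plays the role of the current category's tag list.
int sketch_get_column(const std::vector<std::string>& tags, const char* name,
                      bool required = false) {
    for (std::size_t i = 0; i < tags.size(); ++i)
        if (tags[i] == name)
            return static_cast<int>(i);
    if (required)
        throw std::runtime_error(std::string("missing column: ") + name);
    return -1;  // optional column absent
}
```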

type ParseValue1

typedef std::function<void (const char* start)> ParseValue1;

type ParseValue2

typedef std::function<void (const char* start, const char* end)> ParseValue2;

class ParseColumn
int column_offset

The column offset for a given tag, returned by get_column().

bool need_end

true if the end of the column value is needed. It is not needed for numbers, since all values are terminated by whitespace.

ParseValue1 func1

The function to call if need_end is false.

ParseValue2 func2

The function to call if need_end is true.

ParseColumn(int c, ParseValue1 f)

Set column_offset and func1.

ParseColumn(int c, ParseValue2 f)

Set column_offset and func2.

type ParseValues

typedef std::vector<ParseColumn> ParseValues;

bool parse_row(ParseValues &pv)

Parse a single row of a table.

Parameters

pv – The per-column callback functions

Returns

true if a row was parsed

The category callback functions should call parse_row() to parse the values for the columns they are interested in. If in a loop, parse_row() should be called until it returns false; to skip the rest of the values, just return from the category callback. The first time parse_row() is called for a category, pv is sorted in ascending order of column offset. Columns with negative offsets are skipped.
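The dispatch rules for parse_row() (ascending sort by column offset, skipping negative offsets, and choosing func1 or func2 by need_end) can be illustrated without a real parser. In this hypothetical sketch the row has already been split into [start, end) pointer pairs; ParseColumn is a simplified mirror of the documented members, and sketch_parse_row() is invented. For simplicity it re-sorts on every call instead of only the first time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ParseValue1 = std::function<void (const char* start)>;
using ParseValue2 = std::function<void (const char* start, const char* end)>;

// Simplified mirror of the documented ParseColumn members.
struct ParseColumn {
    int column_offset;
    bool need_end;
    ParseValue1 func1;
    ParseValue2 func2;
    ParseColumn(int c, ParseValue1 f) : column_offset(c), need_end(false), func1(f) {}
    ParseColumn(int c, ParseValue2 f) : column_offset(c), need_end(true), func2(f) {}
};
using ParseValues = std::vector<ParseColumn>;

// Hypothetical: deliver one pre-split row to the per-column callbacks.
void sketch_parse_row(ParseValues& pv,
                      const std::vector<std::pair<const char*, const char*>>& row) {
    std::sort(pv.begin(), pv.end(), [](const ParseColumn& a, const ParseColumn& b) {
        return a.column_offset < b.column_offset;
    });
    for (auto& col : pv) {
        if (col.column_offset < 0)
            continue;  // column not present, e.g. get_column() returned -1
        const auto& value = row.at(col.column_offset);
        if (col.need_end)
            col.func2(value.first, value.second);
        else
            col.func1(value.first);
    }
}
```

The numeric columns use func1 because std::atof and std::atoi stop at the first whitespace, so the end pointer is never needed; the element symbol uses func2 to build a std::string.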

StringVector &parse_whole_category()

Return complete contents of a category as a vector of strings.

Returns

vector of strings

void parse_whole_category(ParseValue2 func)

Tokenize the complete contents of the category and call func for each value in it.

Parameters

func – callback function
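This overload hands each value to func as a [start, end) pointer pair. The hypothetical sketch below shows that calling convention with a tokenizer that only splits on CIF whitespace; real CIF values can also be quoted strings or semicolon-delimited text fields, which this sketch does not handle. Collecting the pairs into strings, as in the usage, gives roughly what the StringVector overload returns.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using ParseValue2 = std::function<void (const char* start, const char* end)>;

// Hypothetical simplification: split a null-terminated buffer on CIF
// whitespace and report each token's [start, end) range to func.
void sketch_tokenize(const char* p, const ParseValue2& func) {
    auto is_ws = [](char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; };
    for (;;) {
        while (is_ws(*p))
            ++p;
        if (*p == '\0')
            return;
        const char* start = p;
        while (*p != '\0' && !is_ws(*p))
            ++p;
        func(start, p);
    }
}
```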

const std::string &version()
Returns

the version of the CIF file if it is given

For mmCIF files it is typically empty.

const std::string &category()
Returns

the category that is currently being parsed

Only valid within a ParseCategory callback.

const std::string &block_code()
Returns

the data block code that is currently being parsed

Only valid within a ParseCategory callback and finished_parse().

bool multiple_rows() const
Returns

true if the current category may have multiple rows

size_t line_number() const
Returns

current line number

std::runtime_error error(const std::string &text)
Parameters

text – the error message

Returns

an exception with " on line #" appended

Return type

std::runtime_error

Localize the error message with the current line number within the input, where # is the current line number. The exception is returned rather than thrown, so callers write throw error("...").
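The return-then-throw pattern can be sketched with a hypothetical LineTracker standing in for the parser's line counter. The exact message format here (text, then " on line ", then the number) follows the description above but is otherwise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in that tracks the current line like a parser would.
struct LineTracker {
    std::size_t lineno = 1;

    // Builds, but does not throw, the localized exception.
    std::runtime_error error(const std::string& text) const {
        return std::runtime_error(text + " on line " + std::to_string(lineno));
    }
};
```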

Stylized parsing support:

void register_heuristic_stylized_detection()

Convenience function that registers both parse_audit_syntax() for the audit_syntax category and parse_audit_conform() for the audit_conform category, for detecting whether or not to use stylized parsing.

void set_PDBx_keywords(bool stylized)

Turn PDBx/mmCIF keyword styling on or off, as described in PDBx/mmCIF Styling.

Parameters

stylized – if true, assume PDBx/mmCIF keyword style

This is reset every time CIFFile::parse() or CIFFile::parse_file() is called. It may be switched on and off at any time, e.g., within a particular category callback function.

bool PDBx_keywords() const

Return whether the PDBx_keywords flag is set. See set_PDBx_keywords().

void set_PDBx_fixed_width_columns(const std::string &category)

Turn on PDBx/mmCIF fixed width column parsing for a given category as described in PDBx/mmCIF Styling.

Parameters

category – name of category

This option must be set in each category callback that needs it. It is ignored if PDBx_keywords() is false. This is not a global option because there is no reliable way to detect whether the preconditions are met for each record without losing all of the speed advantages.
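The idea behind fixed width columns can be sketched as follows, assuming (as an interpretation of PDBx/mmCIF Styling, not a statement of readcif's implementation) that every row of such a category places each value at the same character offset as in the first row. The offsets are computed once, and later values are extracted directly instead of scanning the preceding columns. Both functions are hypothetical, and neither checks the precondition, which is why such an option has to be enabled explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: record where each whitespace-separated value starts in the
// first row of a table.
std::vector<std::size_t> sketch_column_starts(const std::string& first_row) {
    std::vector<std::size_t> starts;
    bool in_value = false;
    for (std::size_t i = 0; i < first_row.size(); ++i) {
        bool ws = first_row[i] == ' ' || first_row[i] == '\t';
        if (!ws && !in_value)
            starts.push_back(i);
        in_value = !ws;
    }
    return starts;
}

// Reuse a first-row offset to pull a value out of a later row. Assumes the
// fixed-width precondition holds; nothing here verifies it.
std::string sketch_fixed_value(const std::string& row, std::size_t start) {
    std::size_t end = row.find_first_of(" \t", start);
    if (end == std::string::npos)
        return row.substr(start);
    return row.substr(start, end - start);
}
```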

bool has_PDBx_fixed_width_columns() const

Return whether any fixed width column categories were specified. See set_PDBx_fixed_width_columns().

bool PDBx_fixed_width_columns() const

Return whether the current category has fixed width columns. See set_PDBx_fixed_width_columns().

Protected section:

void data_block(const std::string &name)
Parameters

name – name of data block

data_block is a virtual function that is called whenever a new data block is found. The default implementation ignores it; override it in a subclass if needed.

void save_frame(const std::string &code)
Parameters

code – the save frame code

save_frame is a virtual function that is called when a save frame header or terminator is found. It defaults to throwing an exception. Override it if the application needs to parse a CIF dictionary, since dictionaries use save frames.

void global_block()

global_block is a virtual function that is called whenever the global_ reserved word is found. It defaults to throwing an exception. In CIF files, global_ is unused. However, some CIF-like files, e.g., the CCP4 monomer library, use the global_ keyword.

void reset_parse()

reset_parse is a virtual function that is called whenever the parse function is called. For example, PDBx/mmCIF stylized parsing can be turned on here, since those settings are reset on every call to parse().

void finished_parse()

finished_parse is a virtual function that is called whenever the parse function has successfully finished parsing.
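The override pattern for these hooks is sketched below with a hypothetical base class, HookDefaults, that reproduces only the documented defaults: data_block is ignored, and save_frame and global_block throw. reset_parse and finished_parse are assumed to do nothing by default. BlockLister and its simulate() and try_save_frame() drivers are invented for illustration; in a real CIFFile subclass the parser calls the hooks, and the stylized flag set in reset_parse() stands in for calling set_PDBx_keywords(true).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical base class with the documented default hook behavior.
class HookDefaults {
public:
    virtual ~HookDefaults() = default;

protected:
    virtual void data_block(const std::string& /*name*/) {}  // ignored
    virtual void save_frame(const std::string& code) {
        throw std::runtime_error("unexpected save frame: " + code);
    }
    virtual void global_block() { throw std::runtime_error("unexpected global_"); }
    virtual void reset_parse() {}     // assumed no-op
    virtual void finished_parse() {}  // assumed no-op
};

// Application subclass that overrides only the hooks it cares about.
class BlockLister : public HookDefaults {
public:
    std::vector<std::string> blocks;
    bool stylized = false;
    bool finished = false;

    // Hypothetical driver standing in for the parser's calls.
    void simulate(const std::vector<std::string>& codes) {
        reset_parse();
        for (const auto& c : codes)
            data_block(c);
        finished_parse();
    }
    void try_save_frame(const std::string& code) { save_frame(code); }

protected:
    void data_block(const std::string& name) override { blocks.push_back(name); }
    void reset_parse() override {
        blocks.clear();
        stylized = true;  // stands in for set_PDBx_keywords(true)
    }
    void finished_parse() override { finished = true; }
};
```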