Bibutils Library Specifications (Version 4.0+)

1.0 Introduction

The break between Bibutils 3.43 and Bibutils 4.0 was established by changes in the library interface designed to eliminate the passing of redundant information and the need for global variables in the calling programs. The following library description is for Bibutils 4.0 (but applies with some changes to previous versions). As I believe that example code speaks far more loudly than words, I'll start with a minimal C example that reads in a RIS-formatted library from the standard input and writes a BibTeX-formatted library to the standard output:

    #include <stdio.h>
    #include "bibutils.h"

    int main( int argc, char *argv[] )
    {
        bibl bibliography;
        param bibparams;
        int err;

        /* Initialize the parameters to tell library what we want to do */
        bibl_initparams( &bibparams, BIBL_RISIN, BIBL_BIBTEXOUT, "program name" );

        /* Now read the bibliography in one format and write it in another */
        bibl_init( &bibliography );
        err = bibl_read( &bibliography, stdin, "stdin", &bibparams );
        if ( err ) bibl_reporterr( err );
        else {
            err = bibl_write( &bibliography, stdout, &bibparams );
            if ( err ) bibl_reporterr( err );
        }

        /* Release the memory held by the bibliography and the parameters */
        bibl_free( &bibliography );
        bibl_freeparams( &bibparams );
        return 0;
    }

2.0 struct param

The struct param provides the instructions for how the library reads and writes files. It includes both the formats of the files and the character sets for the input and output. The struct param needs to be initialized with a call to bibl_initparams():

    void bibl_initparams( param *bibparams, int readformat, int writeformat, char *progname );

In addition to copying its arguments into the corresponding fields of struct param, bibl_initparams() also sets reasonable defaults for all of the other fields based on the reading and writing formats. All of these can be changed, but changing a few of them (such as the reading and writing formats) doesn't make sense after a particular struct param is initialized. The definition of the struct param is as follows:

    typedef struct param {
        int   readformat;
        int   writeformat;

        int   charsetin;
        uchar charsetin_src;  /* BIBL_SRC_DEFAULT, BIBL_SRC_FILE, BIBL_SRC_USER */
        uchar latexin;
        uchar utf8in;
        uchar xmlin;

        int   charsetout;
        uchar charsetout_src; /* BIBL_SRC_PROG, BIBL_SRC_USER */
        uchar latexout;       /* If true, write Latex codes */
        uchar utf8out;        /* If true, write characters encoded by utf8 */
        uchar utf8bom;        /* If true, write utf8 byte-order-mark */
        uchar xmlout;         /* If true, write characters in XML entities */

        int   format_opts;    /* options for specific formats */
        int   addcount;       /* add reference count to reference id */
        uchar output_raw;
        uchar verbose;
        uchar singlerefperfile;

        list  asis;           /* Names that shouldn't be mangled */
        list  corps;          /* Names that shouldn't be mangled-MODS corporation type */

        char *progname;
    } param;

Character set issues

Character set issues are particularly thorny, but the library deals with them reasonably logically. By default, all of these fields are set based on the read and write formats, and the average library user likely only needs to modify param.charsetin and param.charsetout.

param.charsetin, param.charsetout

These fields contain the charset that should be used for reading/writing the references. Internally, all of the references are stored in UTF8-encoded Unicode. Currently charsets 0-81 are defined, with 66 (Latin-1/ISO8859-1) being defined as the default, BIBL_CHARSET_DEFAULT. In addition, BIBL_CHARSET_UNICODE and BIBL_CHARSET_GB18030 (a Chinese character set) are also defined. Importantly, to write output in UTF8-encoded Unicode, param.charsetout must be BIBL_CHARSET_UNICODE and param.utf8out must be non-zero.

param.charsetin_src, param.charsetout_src

These fields record where the charset information comes from. They are defined as BIBL_SRC_DEFAULT (set by the program default), BIBL_SRC_FILE (set by information read from the input file), and BIBL_SRC_USER (set by the user). The order of priority is: the user's choice first, the information in the file second, and the program default if no other information is available.

param.xmlin, param.xmlout

If these fields are non-zero, XML entities are handled. The param.xmlin field is a bit "stronger" than param.xmlout: if param.xmlin is set, both the required XML entities and the larger set of entities defined for HTML are processed. If param.xmlout is set, only the required entities are converted. For any XML-formatted output (e.g. MODS or Word 2007), having param.xmlout set is very important.

param.latexin, param.latexout

param.utf8in, param.utf8out

param.utf8bom

If this field is non-zero, a UTF8 byte-order-mark will be written at the beginning of a file. Normally this should be used in conjunction with setting param.utf8out.
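
The priority order for the charset source described above (user first, then file, then program default) can be sketched in a few lines of C. This is only an illustration and not the library's own code: the struct charset_choice, the function charset_offer(), and the numeric values given to the SKETCH_SRC_* constants are all assumptions made for the example. They stand in for param.charsetin / param.charsetin_src and for BIBL_SRC_DEFAULT, BIBL_SRC_FILE and BIBL_SRC_USER.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for BIBL_SRC_DEFAULT, BIBL_SRC_FILE and BIBL_SRC_USER.
 * The numeric values are assumed; only their ordering matters here. */
enum { SKETCH_SRC_DEFAULT = 0, SKETCH_SRC_FILE = 1, SKETCH_SRC_USER = 2 };

/* Hypothetical pair mirroring param.charsetin and param.charsetin_src */
typedef struct {
    int charset;        /* charset number currently in effect */
    unsigned char src;  /* where that charset came from */
} charset_choice;

/* Offer a charset coming from a given source. The offer is accepted
 * only if its source has at least the priority of the one already
 * recorded. Returns 1 if the choice changed, 0 if it was ignored. */
int charset_offer( charset_choice *c, int charset, unsigned char src )
{
    assert( c != NULL );
    if ( src < c->src ) return 0;
    c->charset = charset;
    c->src = src;
    return 1;
}
```

Starting from the program default, a charset declared in the input file replaces the default, a user choice replaces both, and a later offer from the file is ignored because the user's choice has higher priority.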
Format-specific options: param.format_opts

For several formats, a variety of options have been requested over the years. These options can be encoded directly by setting specific bits in param.format_opts that are defined for the specific format. The BibTeX format has the most options, which are defined in the bibtexout.h header file.

Adding to param.asis and param.corps

    void bibl_readasis( param *bibparams, char *filename );
    void bibl_readcorps( param *bibparams, char *filename );

These functions read names from the file "filename", one name per line.

    void bibl_addtoasis( param *bibparams, char *entry );
    void bibl_addtocorps( param *bibparams, char *entry );

These functions add a single entry.

Freeing struct param

With the inclusion of the filename and the asis and corps name lists, a struct param can potentially take up a fair amount of heap space. This space can be released with a call to bibl_freeparams().

    void bibl_freeparams( param *bibparams );

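
Returning to the format-specific options: if param.format_opts is treated as a set of bit flags (an assumption here; the actual option names and values are the ones defined in the format's header, such as bibtexout.h), several options can be combined in a single int. The flag names and helper functions below are hypothetical placeholders for illustration, not the library's constants or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option flags, one bit each. The real names and values
 * are defined by the library (for BibTeX output, in bibtexout.h). */
#define SKETCH_OPT_A ( 1 << 1 )
#define SKETCH_OPT_B ( 1 << 2 )

/* Turn an option on without disturbing the others. */
void opts_set( int *format_opts, int flag )
{
    assert( format_opts != NULL );
    *format_opts |= flag;
}

/* Turn an option off without disturbing the others. */
void opts_clear( int *format_opts, int flag )
{
    assert( format_opts != NULL );
    *format_opts &= ~flag;
}

/* Non-zero if every bit of flag is set in format_opts. */
int opts_isset( int format_opts, int flag )
{
    return ( format_opts & flag ) == flag;
}
```

With the real library, the same pattern would presumably be applied to bibparams.format_opts after the call to bibl_initparams(), using the flag names from the relevant header instead of the placeholders.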
3.0 struct bibl

The struct bibl holds all of the bibliographic information read from the file, and a pointer to a specific instance is all that is needed to manage the bibliography. The internal details of struct bibl should be considered "private", meaning that they could change as Bibutils evolves (though in practice they tend not to). People actually interrogating the internal details of struct bibl should not complain if a new version of Bibutils breaks their code. The struct bibl must be initialized before it is used by any library code, via a call to bibl_init().

    void bibl_init( bibl *bibliography );

4.0 Reading struct bibl bibliographies

Reading bibliography information into struct bibl is performed by the bibl_read() function call.

    int bibl_read( bibl *bibliography, FILE *fp, char *filename, param *bibparams );

Note that bibl_read() appends bibliography information to the end of the current struct bibl rather than replacing it, so merging the information from multiple files is trivial. Merging information from multiple files with different formats is also trivial, as the internal state of struct bibl is independent of the struct param; however, a separate struct param should be defined for each format read.

5.0 Writing struct bibl bibliographies

Writing bibliography information uses the bibl_write() call.

    int bibl_write( bibl *bibliography, FILE *fp, param *bibparams );

6.0 Errors returned by the library

    void bibl_reporterr( int err );

7.0 Conversion from Bibutils 3.x library to Bibutils 4.x library

Converting programs that use the previous version of the library is fairly simple; almost all of it involves deleting code that is now unnecessary. Compare the following two example code fragments; changed code is highlighted in red: