Bibutils Library Specifications (Version 4.0+)
Document Version 0.1 December 10 2008
Chris Putnam

1.0 Introduction

The break between Bibutils 3.43 and Bibutils 4.0 was established by changes in the library interface designed to eliminate passing of redundant information and eliminate the need for global variables in the calling programs. The following library description is for Bibutils 4.0 (but applies with some changes to previous versions).

As I believe that example code speaks far more loudly than words, I'll start with a minimal C example that reads in a RIS-formatted library from the standard input and writes a BibTeX-formatted library to the standard output:

#include <stdio.h>
#include "bibutils.h"
int main( int argc, char *argv[] )
{
	bibl bibliography;
	param bibparams;
	int err;

	/* Initialize the parameters to tell library what we want to do */
	bibl_initparams( &bibparams, BIBL_RISIN, BIBL_BIBTEXOUT, "program name" );

	/* Now read the bibliography in one format and write it in another */
	bibl_init( &bibliography );
	err = bibl_read( &bibliography, stdin, "stdin", &bibparams );
	if ( err ) bibl_reporterr( err );
	else {
		err = bibl_write( &bibliography, stdout, &bibparams );
		if ( err ) bibl_reporterr( err );
	}
	bibl_free( &bibliography );
	bibl_freeparams( &bibparams );
}

2.0 struct param

The struct param provides the instructions for how the library reads and writes files. It includes both the format of the files and the character sets for the input and output. The struct param needs to be initialized with a call to bibl_initparams().

void bibl_initparams( param *bibparams, int readformat, int writeformat, char *progname );
  • bibparams is a pointer to a struct param to be initialized
  • readformat is the format the file should be read in:
    • BIBL_MODSIN
    • BIBL_BIBTEXIN
    • BIBL_RISIN
    • BIBL_ENDNOTEIN
    • BIBL_COPACIN
    • BIBL_ISIIN
    • BIBL_MEDLINEIN
    • BIBL_ENDNOTEXMLIN
    • BIBL_BIBLATEXIN
  • writeformat is the format the file should be written in:
    • BIBL_MODSOUT
    • BIBL_BIBTEXOUT
    • BIBL_RISOUT
    • BIBL_ENDNOTEOUT
    • BIBL_ISIOUT
    • BIBL_WORD2007OUT
    • BIBL_ADSABSOUT
  • progname is a C string (NUL-terminated) containing the program name used in error messages. Passing a NULL pointer here is acceptable and is handled properly by the library.

In addition to copying the relevant information into the corresponding fields of struct param, bibl_initparams() also sets reasonable defaults for all of the other fields based on the reading and writing formats. All of these can be changed, but a few changes (such as altering the reading and writing formats) don't make sense after a particular struct param is initialized.

The definition of the struct param is as follows:

typedef struct param {

        int readformat;
        int writeformat;

        int charsetin;
        uchar charsetin_src; /*BIBL_SRC_DEFAULT, BIBL_SRC_FILE, BIBL_SRC_USER*/
        uchar latexin;
        uchar utf8in;
        uchar xmlin;

        int charsetout;
        uchar charsetout_src; /* BIBL_SRC_PROG, BIBL_SRC_USER */
        uchar latexout;       /* If true, write Latex codes */
        uchar utf8out;        /* If true, write characters encoded by utf8 */
        uchar utf8bom;        /* If true, write utf8 byte-order-mark */
        uchar xmlout;         /* If true, write characters in XML entities */

        int format_opts; /* options for specific formats */
        int addcount;  /* add reference count to reference id */
        uchar output_raw;
        uchar verbose;
        uchar singlerefperfile;

        list asis;  /* Names that shouldn't be mangled */
        list corps; /* Names that shouldn't be mangled-MODS corporation type */

        char *progname;

} param;

Character set issues

Character set issues are particularly thorny, but are dealt with reasonably logically by the library. By default, all of these fields are defined based on the read and write formats and the average library user likely only needs to modify param.charsetin and param.charsetout.

param.charsetin, param.charsetout

These fields contain the charset that should be used for reading and writing the references, respectively. Internally, all of the references are stored in UTF8-encoded Unicode. Currently charsets 0-81 are defined, with 66 (Latin-1/ISO8859-1) being defined as the default, BIBL_CHARSET_DEFAULT. In addition, BIBL_CHARSET_UNICODE and BIBL_CHARSET_GB18030 (a Chinese character set) are also defined. Importantly, to write output in UTF8-encoded Unicode, param.charsetout must be BIBL_CHARSET_UNICODE and param.utf8out must be non-zero.

param.charsetin_src, param.charsetout_src

These fields record where the charset information comes from. The possible values are BIBL_SRC_DEFAULT (set by the program default), BIBL_SRC_FILE (set by information read from the input file), and BIBL_SRC_USER (set by the user). In order of priority, the user's choice comes first, the information in the file second, and the program default is used only if no other information is available.
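
The priority rule can be sketched in a few lines of C. This is only an illustration of the ordering, not the library's own code: the enum names and values and the helper charset_update() are hypothetical, chosen so that a larger value means a higher priority. The real BIBL_SRC_* constants in bibutils.h may have different values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BIBL_SRC_* constants; values are
 * ordered so that a larger value means a higher priority. */
enum charset_src { SRC_DEFAULT = 0, SRC_FILE = 1, SRC_USER = 2 };

struct charset_choice {
	int charset;            /* charset number currently in effect */
	enum charset_src src;   /* where that charset came from */
};

/* Accept a newly discovered charset only if its source has at least the
 * priority of the source already recorded.
 * Returns 1 if the choice changed, 0 if it was ignored. */
int charset_update( struct charset_choice *c, int charset,
		enum charset_src src )
{
	assert( c != NULL );
	if ( src < c->src ) return 0;
	c->charset = charset;
	c->src = src;
	return 1;
}
```

With this sketch, a charset announced by the input file replaces the program default, but is ignored once the user has made an explicit choice.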

param.xmlin, param.xmlout

If these fields are non-zero, handle XML entities. The param.xmlin is a bit "stronger" than param.xmlout in that if param.xmlin is set, both the required XML entities and the larger set of entities defined for HTML are all processed. If param.xmlout is set, only required entities are converted. For any XML-formatted output (e.g. MODS, or Word 2007), having param.xmlout set is very important.
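
For readers unfamiliar with the "required" entities: XML predefines five of them (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot; and &amp;apos;). The following is a minimal sketch of what output-side conversion amounts to. The functions xml_entity() and xml_escape() are hypothetical helpers written for this illustration, not Bibutils functions, and whether the library converts all five characters or only a subset is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the predefined XML entity for ch, or NULL if ch needs no escaping. */
const char *xml_entity( char ch )
{
	switch ( ch ) {
	case '&':  return "&amp;";
	case '<':  return "&lt;";
	case '>':  return "&gt;";
	case '"':  return "&quot;";
	case '\'': return "&apos;";
	default:   return NULL;
	}
}

/* Copy src into dst, replacing the five characters above with entities.
 * dst holds at most dstsize bytes, including the terminating NUL.
 * Returns 0 on success, -1 if dst is too small (dst is then unspecified). */
int xml_escape( const char *src, char *dst, size_t dstsize )
{
	size_t used = 0;
	assert( src != NULL && dst != NULL );
	for ( ; *src; src++ ) {
		const char *ent = xml_entity( *src );
		size_t n = ent ? strlen( ent ) : 1;
		/* need room for this piece plus the final NUL */
		if ( used + n + 1 > dstsize ) return -1;
		if ( ent ) memcpy( dst + used, ent, n );
		else dst[used] = *src;
		used += n;
	}
	if ( used + 1 > dstsize ) return -1;
	dst[used] = '\0';
	return 0;
}
```

Note that the ampersand itself must be converted, which is why text that already contains entities should not be passed through such a routine twice.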

param.latexin, param.latexout

param.utf8in, param.utf8out

param.utf8bom

If this is set as non-zero, a UTF8 byte-order-mark will be written at the beginning of a file. Normally this should be used in conjunction with setting param.utf8out.
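
The UTF8 byte-order-mark is the three-byte sequence EF BB BF (the UTF8 encoding of U+FEFF). As a rough sketch of what writing it involves, and not the library's actual routine, a hypothetical helper write_utf8_bom() might look like this:

```c
#include <assert.h>
#include <stdio.h>

/* The UTF-8 encoding of U+FEFF, the byte-order mark. */
static const unsigned char utf8_bom[3] = { 0xEF, 0xBB, 0xBF };

/* Write the BOM at the current position of fp (normally the very start
 * of the output). Returns 0 on success, -1 on a short write. */
int write_utf8_bom( FILE *fp )
{
	size_t n;
	assert( fp != NULL );
	n = fwrite( utf8_bom, 1, sizeof utf8_bom, fp );
	return ( n == sizeof utf8_bom ) ? 0 : -1;
}
```

Because the mark only makes sense in front of UTF8 text, a caller would write it once, before any reference is written.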

Format-specific options: param.format_opts

For several formats, a variety of options have been requested over the years. These options can be directly encoded by setting specific bits in param.format_opts that are defined for the specific format. The BibTeX format has the most options, which are defined in the bibtexout.h header file.
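
Assuming each option is a single-bit flag stored in the int, options are combined with bitwise OR and tested with bitwise AND. The sketch below shows the idiom only: the OPT_EXAMPLE_* names and values and the opts_*() helpers are hypothetical, and the real flag names are the ones defined in bibtexout.h.

```c
#include <assert.h>

/* Hypothetical option flags; each occupies its own bit. The real names
 * and values are the ones defined in bibtexout.h. */
#define OPT_EXAMPLE_A ( 1 << 0 )
#define OPT_EXAMPLE_B ( 1 << 1 )
#define OPT_EXAMPLE_C ( 1 << 2 )

/* Turn a flag on without disturbing the others. */
int opts_set( int opts, int flag )
{
	return opts | flag;
}

/* Turn a flag off without disturbing the others. */
int opts_clear( int opts, int flag )
{
	return opts & ~flag;
}

/* Return 1 if the flag is on, 0 otherwise. */
int opts_isset( int opts, int flag )
{
	assert( flag != 0 );   /* an empty flag could never test as set */
	return ( opts & flag ) != 0;
}
```

Applied to a struct param, the same idiom would be a statement of the form bibparams.format_opts |= FLAG; placed after the call to bibl_initparams(), with FLAG taken from the header for the output format in use.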

Adding to param.asis and param.corps

void bibl_readasis( param *bibparams, char *filename );
void bibl_readcorps( param *bibparams, char *filename );

These functions read names from the file "filename", one name per line, and add them to the asis or corps list, respectively.

void bibl_addtoasis( param *bibparams, char *entry );
void bibl_addtocorps( param *bibparams, char *entry );

These functions add a single entry to the asis or corps list, respectively.

Freeing struct param

With the inclusion of the filename and the asis and corps name lists, params can potentially take up a fair amount of heap space. Clearing this space can be done with a call to bibl_freeparams().

void bibl_freeparams( param *bibparams );
  • bibparams - pointer to a struct param to be freed



3.0 struct bibl

The struct bibl will contain all of the bibliographies that are read from the file, and the pointer to a specific instance is all that is needed to manage the bibliography. The internal details of struct bibl should be considered "private", meaning that they could change as Bibutils evolves (though in practice they tend not to). People actually interrogating the internal details of struct bibl should not complain if a new version of Bibutils breaks their code.

The struct bibl must be initialized before it is used by any library code, via a call to bibl_init.

void bibl_init( bibl *bibliography );
  • bibliography - pointer to struct bibl to be initialized



4.0 Reading struct bibl bibliographies

Reading bibliography information into struct bibl is performed by the bibl_read() function call.

int bibl_read( bibl *bibliography, FILE *fp, char *filename, param *bibparams );
  • bibliography - pointer to struct bibl to be read in; must already have been initialized
  • fp - the file pointer for input; the library doesn't directly handle opening and closing files and expects the calling program to handle it
  • filename - the name of the file being read; used for error messages
  • bibparams - the pointer to the struct param containing appropriate information controlling the file reading process
  • return value - bibl_read returns a non-zero value if an error has occurred. This error can be decoded by bibl_reporterr().

Note that bibl_read() appends to, rather than replaces, the bibliography information already in the struct bibl, so merging the information from multiple files is trivial. Merging information from multiple files with different formats is also trivial, as the internal state of struct bibl is independent of the struct param; however, a separate struct param should be defined for each format read.




5.0 Writing struct bibl bibliographies

Writing bibliography information uses the bibl_write call.

int bibl_write( bibl *bibliography, FILE *fp, param *bibparams );
  • bibliography - pointer to struct bibl to be written
  • fp - the file pointer for output; the library doesn't directly handle opening and closing files and expects the calling program to handle it
  • bibparams - the pointer to the struct param containing appropriate information controlling the file writing process
  • return value - bibl_write returns a non-zero value if an error has occurred. This error can be decoded by bibl_reporterr().



6.0 Errors returned by the library

void bibl_reporterr( int err );



7.0 Conversion from Bibutils 3.x library to Bibutils 4.x library

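Converting programs that use the previous version of the library is fairly simple; almost all of it involves deleting code that is now unnecessary. Compare the following two example code fragments. In the 4.x version, the global progname, asis, and corps variables, the list.h include, and the list_init() calls are gone; the program name is passed to bibl_initparams(); the format arguments are dropped from bibl_read() and bibl_write(); and bibl_freeparams() is called at the end.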

/* Version 3.x of bibutils */
#include <stdio.h>
#include "list.h"
#include "bibutils.h"

char progname[] = "program name";
list asis, corps;

int main( int argc, char *argv[] )
{
	bibl bibliography;
	param bibparams;
	int err;

	list_init( &asis );
	list_init( &corps );

	/* Initialize parameters */
	bibl_initparams( &bibparams, BIBL_RISIN, 
			BIBL_BIBTEXOUT );

	/* Read/Write bibliography */
	bibl_init( &bibliography );
	err = bibl_read( &bibliography, stdin, "stdin", 
			BIBL_RISIN, &bibparams );
	if ( err ) bibl_reporterr( err );
	else {
		err = bibl_write( &bibliography, stdout, 
				BIBL_BIBTEXOUT, &bibparams );
		if ( err ) bibl_reporterr( err );
	}
	bibl_free( &bibliography );

}

/* Version 4.0 of Bibutils */
#include <stdio.h>

#include "bibutils.h"




int main( int argc, char *argv[] )
{
	bibl bibliography;
	param bibparams;
	int err;




	/* Initialize parameters */
	bibl_initparams( &bibparams, BIBL_RISIN, 
			BIBL_BIBTEXOUT, "program name" );

	/* Read/Write bibliography */
	bibl_init( &bibliography );
	err = bibl_read( &bibliography, stdin, "stdin", 
			&bibparams );
	if ( err ) bibl_reporterr( err );
	else {
		err = bibl_write( &bibliography, stdout, 
				&bibparams );
		if ( err ) bibl_reporterr( err );
	}
	bibl_free( &bibliography );
	bibl_freeparams( &bibparams );
}