?-Prolog - A Prolog->C Compiler

Version 7

Introduction

This is "?-Prolog", a compiler for the Prolog programming language. ?-Prolog compiles a single Prolog source file into C, which can be subsequently compiled with a C compiler to an executable. The generated code is platform-independent and should be portable to a wide range of machines and operating systems. No other dependencies exist, only a gcc-compatible C compiler is required. The generated executables are usually small and can be embedded in existing applications.

This Prolog implementation has no support for dynamic predicates, or for calling user-constructed goals - all code must be statically visible. An interactive interpreter is available and can be used as a library in compiled Prolog code, but interpreted predicates will not be visible to compiled ones. Loading compiled code at runtime is not possible and manipulating the clause-database with assert/retract will have no effect on a compiled program.

Performance is reasonable, in micro-benchmarks it is roughly comparable to systems like SWI-Prolog, but will not be able to compete with such mature Prolog implementations for larger, real-world programs. Basic clause-indexing on the first argument of a predicates is performed and tail-calls are optimized in many cases, but otherwise very little optimization is currently being done.

The dialect of Prolog implemented should be sufficient for usual Prolog programming. Basic ISO and Edinburgh system predicates and control structures are available. Modules, separate compilation or dynamic loading of compiled code are not provided, and multithreading is not supported. This Prolog implementation does not attempt to be ISO compliant, event though many standard predicates are available. Coroutining is implemented (using freeze/2 and dif/2) and the handling infinite trees is supported in many cases, but probably not all.

?-Prolog has currently only been tested on Linux (x86, x86-64 and ARM), Mac OS X and Windows 7 (with the mingw32 and mingw64 toolchains.)

The system is not further developed and contains known and unknown bugs. If you experience problems or want to help, please don't hesitate to contact me.

Building and installation

Assuming you obtained the sources from the distribution archive, which can be found at here, just unpack the tarball and follow the instructions given below. There is also a source-code repository on Bitbucket, but this uses a custom build-environment and contains mostly tests and supporting code.

The compiler itself is written in ?-Prolog and must be pre-compiled to C before it can be used. In the distribution archive that contains the file your are currently reading, pre-compiled code for the compiler and interpreter are included, so all that remains to be done is to compile these using GNU make(1):

make

A C compiler with support for C99, including some GNU extensions is required, both gcc and clang have been successfully tested. You can edit the variables at the top of the makefile to choose a different compiler or different compilation options. If you don't have GNU make, then compile the files manually, see Makefile for the proper options you have to pass to the compiler.

Compilation may take quite some time, particularly with clang.

Several builtin predicates are actually implemented in Prolog and reside in separate library files in the "lib" directory. You can install the compiler and interpreter-binaries wherever you like, but to find the libraries the compiler must be able to locate the library directory. This can be done by setting the environment variable PC_LIBRARY_DIR to the name of a directory to be searched.

There is an "install" target in the makefile, which will copy all parts of ?-Prolog in your system. To do so enter

make PREFIX=<path> install

PREFIX defaults to /usr/local. After this set PC_LIBRARY_DIR to <PREFIX>/qp.

Usage

Simply invoke the compiler with the source file to be compiled:

pc <FILENAME>

The compiler understands the following command-line options:

-version                show version and exit.
-h                      show some usage information.
-v                      show clauses as they are compiled.
-i                      show intermediate code produced instead 
                        of generating a C file.
-o FILENAME             specify output-file name (defaults to
                        the name of the source file, with the
                        file-extension replaced by ".c")
-q                      do not show any output while compiling.
-compress-facts         reduce code produced for facts (see below)
-xref                   write cross-referencing information to
                        stdout
-xrefall                like `-xref`, but considers all library code
                        as well and also outputs a call-graph
-n                      ignore PC_LIBRARY_DIR

Any output preceeded with "%" is purely for informational purposes and can be disabled with the "-q" option.

Code compiled must then be passed to a C compiler:

gcc -std=gnu99 -I. <program>.c -o <program> -lm

Platform-specific issues:

      % cat hello
      #!/usr/bin/env qp
      main :- display('Hello, World!'), nl.
      % chmod +x hello
      % ./hello
      Hello, World!

Running "qp" without arguments just starts the interpreter ("pi"). Enter "qp -help" to see a list of options.

Interpreter

The interpreter can be run interactively by simply starting it, or can run and execute a Prolog source file, given on the command-line:

pi <FILENAME>

Any file listed on the command line is "consulted" (its clauses are loaded) and finally the interactive prompt will be displayed, and allows you to enter queries.

Given the command-line option "-t", execution starts in "trace" mode, which shows how predicates are called, exit, fail or re-execute during backtracking. The interpreter has all library-predicates included and does not need to have access to the library directory.

The option "-i " sets as the default initialization goal to be executed, if no initialization-goal was declared in the source file.

Files consulted or included are located relative to the current working directory. File-references of the form library(NAME) are converted into a path relative to the current library directory, which is either ./lib or the value of the environment variable "PC_LIBRARY_DIR", if it is defined.

In contrast to the compiler, the clause-database represents the program that is currently executing and manipulations of the database using arsserta/assertz will influence running code. The effect of asserting or retracting clauses for predicates that currently execute is undefined.

Note that the interpreter is a simple meta-circular evaluator and is thus rather slow.

Debugging and error messages

There are no debugging-facilities besides a "trace" mode, which writes information about predicate-calls, -exits, -fails and redo's during execution to stderr. You can enable this trace mode by running the binary with the "-:d" option to produce a trace on stderr as the program is executing.

The compiler does very little error-reporting, mostly to simplify the implementation. Syntactical errors will produce a parser error, with some rough indication about the place where the problem occurred. Runtime-errors will throw an exception, which can be caught by user code using catch/3.

Support syntax

?-Prolog fully supports the ISO Prolog syntax. Naked variables in goals are not automatically converted to call(VAR), and will result in a compiler error if used.

DCG rules, including semicontext notation, are fully supported.

Hexadecimal numeric syntax ("0x...") is allowed, and quoted atoms and strings may contain the escape sequences "\xXX", "\n", "\r" and "\t".

Builtin predicates

All predicates listed here are either built-in or provided in "auto-included" libraries in the current library directory.

Precedence of builtin operators

| Name             | Associativity | Precedence |
|------------------+---------------+------------|
| *                | yfx           |        400 |
| **               | xfx           |        200 |
| +                | fy            |        200 |
| +                | yfx           |        500 |
| ,                | xfy           |       1000 |
| -                | fy            |        200 |
| -                | yfx           |        500 |
| -->              | xfx           |       1200 |
| ->               | xfy           |       1050 |
| /                | yfx           |        400 |
| //               | yfx           |        400 |
| /\               | yfx           |        500 |
| :-               | fx            |       1200 |
| :-               | xfx           |       1200 |
| ;                | xfy           |       1100 |
| <                | xfx           |        700 |
| <<               | yfx           |        400 |
| =                | xfx           |        700 |
| =..              | xfx           |        700 |
| =:=              | xfx           |        700 |
| =<               | xfx           |        700 |
| ==               | xfx           |        700 |
| =\=              | xfx           |        700 |
| >                | xfx           |        700 |
| >=               | xfx           |        700 |
| >>               | yfx           |        400 |
| @<               | xfx           |        700 |
| @=<              | xfx           |        700 |
| @>               | xfx           |        700 |
| @>=              | xfx           |        700 |
| \                | fy            |        200 |
| \+               | fy            |        900 |
| \/               | yfx           |        500 |
| \=               | xfx           |        700 |
| \==              | xfx           |        700 |
| \\               | xfy           |        400 |
| ^                | xfy           |        200 |
| initialization   | fx            |       1150 |
| is               | xfx           |        700 |
| xor              | yfx           |        400 |
| |                | xfy           |       1105 |

Directives

The following directives are available and can be declared using the usual ":- DIRECTIVE, ... ." syntax. Note that some of these are available only in compiled code, some only in interpreted code and some in both (directives marked with "*" are ignored by the interpreter):

   compress_facts                           [compiler, interpreter*]

Equivalent to the -compress-facts compiler option.

   trace_libraries                          [compiler, interpreter*]

Enable tracing for all code, including auto-included libraries. Normally only the code of the user program is traced.

   global_variable(NAME)                    [compiler]

Declares NAME as a global variable, accessible with global_set/2 and global_ref/2.

   verbatim(CODE)                           [compiler]

Include CODE directly in the produced C file. All code declared in "verbatim" directives will be concatenated and emitted at the start of the generated code, directly after the inclusion of "pc.h".

   pre_initialization(GOAL)                 [compiler]

Declares GAOL to be a pre-initialization goal. This is similar to the "initialization" directive, but pre-initialization goals are run before all normal initialization goals.

   determinate PI, ...                      [compiler, interpreter*]

Declares the predicates specified in PI as being determinate, which gives more opportunities for tail-call optimization. PI should be of the form "/" or "".

   discontiguous PI, ...                    [compiler]

Allows the clauses of the predicates indicated by PI to be discontiguous in the source program [ISO]

   mode PREDICATE(MODE, ...), ...           [compiler, interpreter*]

Declares that one or more predicates are always called with certain types of arguments. MODE should be one of the atoms +, - or ? and indicates whether the argument will always be instantiated, non-instantiated or of unknown type. This allows the compiler to produce somewhat more efficient code in certain situations.

   include(FILENAME)                        [compiler, interpreter]

Include the contents of FILENAME at the current position in the source-code that contains this clause. If the file exists and has the extension ".pl", then the extension may be omitted in the directive. FILENAME may be of the form "library(ATOM)" [ISO]

   initialization GOAL                      [compiler, interpreter]

Declares GOAL to be an initialization goal that will be invoked when a compiled program is started or when a file given to the interpreter on the command-line has been consulted. Several initialization goals may be declared and are invoked in the order in which they appear in the source code [ISO]

In compiled code, the default initialization goal is main/0.

  op(P, A, N)                               [compiler, interpreter]

Declares an operator - identical to op/3, but done before reading any further clauses [ISO]

  ensure_loaded(FILENAME)                   [compiler, interpreter]

Similar to "include", but only includes FILENAME if it has not been earlier included using "ensure_loaded" or "include" (or "consult" in the interpreter.) FILENAME may be of the form "library(ATOM)" [ISO]

Performance tips

Compiling the generated C code with "-DNDEBUG" will disable various internal sanity checks and assertions, and will disable tracing.

Compile the generated C code with "-DUNSAFE" to disable all safety checks (implies "-DNDEBUG".)

Compiling generated C code with "-DNO_CHECK_CYCLES" drops cycle-checks in builtin operations (this does not affect acyclic_term/1).

Asserting and retrieving terms in the record- or clause-database involves copying and thus should be avoided in time-critical code.

Any use of delay/2, freeze/2 or dif/2 will have a detrimental impact on overall performance.

As large sets of facts can increase Code-size and C compile-times considerably, a mode is provided that reduces the amount of code generated for sets of body-less clauses, provided all arguments in the definition are ground. This has a negative impact on overall speed but scales better in programs that use large fact databases. To use this feature, compile your program with the "-compress-facts" option.

For large programs it may be necessary to increase the sizes of one or more data areas in the compiler (which is itself just another Prolog program.) In such a case, use the appropriate runtime options to increase the default sizes, as described in the "Limitations" section.

You can use the built-in statistical profiler to examine the time spent in the predicates of your program by compiling the generated C code with the "-DPROFILE" C compiler option. On successful termination, a file named "PROFILE." is generated in the current working directory and holds a list of predicates, with a count-number, the number of seconds spent in the predicate and the percentage of the total run-time.

The profiler works by running a separate thread that probes at regular intervals which predicate is currently executing and increases a predicate-specific counter. Before the program terminates, the total running time is divided by the total number of counts and the predicate-specific counts are then correlated with the overall time. This is not completely accurate but should be good enough to get an idea of the relative distribution of run-time required by specific parts of your program. Note that the reported time only specifies the time in the predicate alone, not in called predicates, so the time does not include nested calls.

To use performance-profiling, the executable must be linked with the "pthread" library (pass "-lpthread" when compiling the generated C code.)

There is also an experimental memory-profiler, which gathers some information about heap-allocations and the sizes of various stacks. When you compile your program with "-DPROFILE_MEMORY", a file is generated (similar to using "-DPROFILE") containing entries of the form

<predicate name>   C: <cp>   T: <trail>  E: <env>  H: <heap>

The meaning of the values is as follows:

Calling code written in other languages

To call foreign functions, a single form named foreign_call is provided. foreign_call/1 takes a structure that names the function to be called and holds the arguments to be passed. The function should take as many arguments as the structure (plus 1) and return an integer, indicating whether the foreign call failed (0) or succeeded (1). The first argument is a CHOICE_POINT *, a pointer to current entry on the choice-point stack. Further arguments are of type X, a generic union type which is either an immediate small integer ("fixnum") or a pointer to a non-immediate data object. See pc.h in the distributed source for more information about the internal data layout. All arguments are dereferenced on entry (that is, veriables are resolved as much as possible, but not in nested data).

C code can be embedded in the compiled program using the verbatim directive. The text given will be emitted directly in the generated code, after the inclusion of the "pc.h" header file.

Values used from C will only be valid until a garbage collection is triggered, which subsequently moves all live data. Therefor references to Prolog data will become invalid, unless saved in the "global" variables vector", a separate buffer that may be used to keep "live" references to heap-allocated data (only usable in code compiled with -DEMBEDDED, see below for more information.)

Note that the garbage collector ignores data that is not located in the garbage-collected heap, so you can create Prolog data structures in native code and pass them to Prolog without problems. You can use duplicate_term/2 to copy static data from native code into dynamically allocated data located in the heap (but you must of course free the old static data manually).

To get you started, here is a very simple example: let's add an interface to the gethostname(2) UNIX API function.


% gethostname.pl - implements Prolog-side 

:- verbatim('#include "mystuff.c").   % include header

% wrapper predicate
gethostname(N) :- foreign_call(my_gethostname(N1)).

/* mystuff.c - implements C-side */

#include <unistd.h>
#include <errno.h>

PRIMITIVE(my_gethostname, X name) {
  char buffer[ 257 ];

  if(gethostname(buffer, 256) < 0)
    system_error(strerror(errno));  % throws exception

  /* unifies value with output argument */
  return unify(CSYMBOL(buffer), name);
}

% test-hostname.pl

:- include('gethostname.pl').   % include interface

main :- gethostname(S), display(S), nl.  % and use it

Notes:

Automatic generation of wrapper-code

Also included in this Prolog implementation is "pb", a generator for C and Prolog files intended to simplify the process of interfacing to foreign code. It is called like this:

pb <FILENAME>

should contain C variable-definitions and function prototypes, from which "pb" will generate all code required to use these in your own programs. Not all of C is allowed, but a simple subset that should be sufficient for many uses. Two files will be produced, ".h" containing C code and ".pl" containing Prolog code that uses the former.

Comments are ignored, both C style comments ("/* ... */") and C++/C99 style ("// ..."). Lines starting with "#" will be copied unchanged into the ".h" file (including "... \" continuation lines). All other text should be hold valid variable- or function-definitions. You can re-use the binding-definitions file for declaring function prototypes by passing the "-i" option to "pb", which will add code to include "" to the generated ".h" file.

Variable definitions may be of the form

[STORAGECLASS] TYPE IDENTIFIER ["[" [SIZE] "]" ...] "," ...

(Elements in brackets are optional)

Function definitions should look like this:

[STORAGECLASS] [MODE] TYPE IDENTIFIER "(" [ARGUMENT "," ...] ")"

where ARGUMENT is

[DIRECTION] TYPE [IDENTIFIER ["[" [SIZE] "]" ...]]

Definitions may be followed by an option semicolon (";").

STORAGECLASS is optional and may be either "static" or "extern". SIZE should be an integer, if given. IDENTIFIER should be a legal C name and may be followed by /* => REALNAME */, which will result in REALNAME being chosen for the Prolog wrapper.

If MODE is given and of the form "/* success /", then the result of the C function will be interpreted as a boolean indicating success or failure (and not as an additional "output" argument). "/ fail /" means the opposite: a non-null results indicates failure. "/ fail: VARIABLE */" is the same as "fail", but sets the global variable VARIABLE to the result (use another definition to access this variable.)

TYPE is a basic type or opaque type, optionally decorated with one or more "const", "unsigned" or "*" qualifiers. Basic types are

void
char
short
int
long
float
double

An opaque type will be stored in as a byte-sequence in an atom and extracted from such an atom when passed as an argument.

For every variable-definition, a Prolog predicate with one argument is created for setting and getting the variables value, so a variable definition like

int foo;
const double xpos, ypos;

will result in the predicates

foo(-X)
set_foo(+X)
xpos(-X)
ypos(-X)

For functions, a predicate taking as many arguments as the C function, plus an additional argument for the result (if any) is created:

int bar(int, char *);

results in:

bar(+X, +Y, -Z)

A function with return-type "void" will have no final result argument.

Argument and result conversion is as follows:

C type                  <->     Prolog type
--------------------------------------------------------------
int, short, char, long          integer
float, double                   float
char *                          atom (zero-terminated)
<any> *                         foreign-pointer
<opaque>                        atom
X                               <any>

Arguments of type "char *" and "X" are only valid until the next garbage collection takes place, that is, whenever Prolog code executes. The integer "0" represents a NULL pointer, both for arguments and results of type "char *" and any other pointer type.

Finally, C "output" arguments are supported by preceding the argument with a direction specifier, either /* in */ or /* out */, so

double modf /* => int_and_frac */ (double x, /* out */ double *intpart);

will produce:

int_and_frac(+X, -Y, -Z)

where Y will be unified with the "intpart" result and Z with "modf"'s function result.

Embedding

If you compile the C code with the "-DEMBEDDED" option, then the program will not contain a "main()" function and can be called as a library. The entry-point will have the following prototype:

extern X prolog(int argc, char *argv[], X arg, int *exit_code);

The first two arguments pass the command-line (even in embedded mode), and should contain at least a single entry (what is normally the program name.) You can pass command-line options, including runtime options, here as usual. The compiled Prolog code will start like any other non-embedded program, until suspend/2 is invoked, which passes its first argument to the caller of the "prolog()" function (the result). Any re-invocation of "prolog()" will continue the execution of the Prolog code at the state where the previous execution was suspended, passing on the "arg" value to be unified with the second argument of suspend/2.

All invocations of prolog() after the first must pass 0 and NULL as values for argc and argv.

Should the program terminate, exit_code will be set to the exit status of the program, depending on how the program terminated execution:

Reason                                      Exit code
-----------------------------------------------------------
halt/0                                      0
halt/1                                      <code>
Runtime-error / uncaught exception          EXIT_EXCEPTION
Normal termination                          EXIT_SUCCESS
Initialization goal(s) failed               EXIT_FAILURE

A second entry-point called "prolog_variable" is also available to allow access to entries in the "global variable vector" (mentioned above):

extern X *prolog_variable(int index);

This returns a pointer to a slot in the global variables vector, which can be set and retrieved. Values in this vector undergo garbage collection and change their address after every invocation of the Prolog side, but will not be collected and retain their identity.

The name of the prolog() entry-point can be changed by redefining the ENTRY_POINT_NAME macro when compiling the C code of an embedded program. The name of the "prolog_variable()" access function can be changes by redefining VARIABLE_ACCESS_NAME.

The compiled Prolog code is not reentrant or thread-safe.

Limitations

call/1 and its variants are not available in the compiler, as are setof/3 and bagof/3. Goals passed to catch/2, findall/3, forall/2 and '\+'/1 must be directly callable.

Integer numbers are limited to signed integers in a range of 31 bits on 32-bit platforms and 63 bits on 64-bit platforms.

UNICODE is not supported, the system uses 8-bit character codes, dependent on the default character set of the environment.

The system is fairly minimal but robust enough to execute many Prolog programs. The compiler does provide very few error diagnostics, mainly to keep it simple and straightforward to extend.

The garbage collector used is a simple Cheney-style[6] copying collector and all storage for data-structures constructed at run-time is allocated in this GC'd heap. Execution will stop when GC occurs - to keep this interruption short it is advisable to reduce the amount of memory used.

Depending on the nature of the executing program, memory-requirements may vary accordingly. Currently, the system is not overly memory-efficient with respect to choice-points, environments and the trail-stack. Green cuts may be often necessary to increase the frequency of tail-calls and avoid piling up too many choice points.

Terms stored in the record- or clause-database are copied into normal memory (allocated with malloc(3)) and are not subject to GC. This means that the amount of data stored in databases is only limited by available memory. Terms retrieved from the databases will be copied back into the garbage-collected heap.

The data-areas used at run-time are listed here, together with their default sizes. You can override the settings by redefining a macro when compiling generated C code or by passing the appropriate run-time command-line option to the final compiled executable.

Meaning             Macro                     R/T option      Default
----------------------------------------------------------------------
Heap size           HEAP_SIZE                 -:h<SIZE>    100,000,000
Heap reserve        HEAP_RESERVE                                    20
Trail stack         TRAIL_STACK_SIZE          -:T<SIZE>      1,000,000
Choice-point stack  CHOICE_POINT_STACK_SIZE   -:C<SIZE>     10,000,000
Environment-stack   ENVIRONMENT_STACK_SIZE    -:E<SIZE>     10,000,000
Argument-stack      ARGUMENT_STACK_SIZE       -:A<SIZE>     10,000,000
Shared-term table   SHARED_TERM_TABLE_SIZE    -:S<SIZE>        100,000

All sizes are in bytes, with the exception of "heap reserve", which is a percentage of the total heap-size, which is kept free to simplify heap-exhaustion checks: once the amount of storage currently in used exceeds the heap-size minus this percentage, a garbage collection is triggered.

If you need a very large heap, you can use a file-backed memory-mapped heap by starting a compiled program with the "-:m" runtime option. This creates a file with the given name and uses it as a backing store for the complete garbage-collected heap. This file can be of arbitrary size. Note that the necessary swapping in and out of memory pages to and from the file can be quite slow. This is an experimental feature and hasn't been tested much.

Version history

License

Copyright (c) 2015, Felix L. Winkelmann All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the authors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This code uses a certain amount of code from the freely available DEC10 Prolog library:

These files are all in the "public domain" so you can use them freely, copy them, incorporate them into programs of your own and so forth without payment. The work of producing them in the first place and of organising them as detailed here has been funded over the years at Edinburgh University mainly by the Science and Engineering Research Council. Their dissemination has been encouraged by the Alvey Special Interest Group: Artificial Intelligence. We would appreciate it if you were to acknowledge these bodies when you use or re-distribute any of these files.