8. Custom Host Applications
Spicy provides a C++ API for integrating its parsers into custom host applications. There are two different approaches to doing this:
If you want to integrate just one specific kind of parser, Spicy can generate C++ prototypes for it that facilitate feeding data and accessing parsing results.
If you want to write a generic host application that can support arbitrary parsers, Spicy provides a dynamic runtime introspection API for dynamically instantiating parsers and accessing results.
We discuss both approaches in the following.
Note
Internally, Spicy is a layer on top of an intermediary framework called HILTI. The HILTI runtime library implements most of the functionality we’ll look at in this section, so you’ll see quite a bit of HILTI-side functionality. Spicy comes with a small additional runtime library of its own that adds anything that’s specific to the parsers it generates.
Note
The API for host applications isn’t considered stable at this time and specifics may change in future versions of HILTI/Spicy without any migration/deprecation process.
8.1. Integrating a Specific Parser
We’ll use our simple HTTP example from the Getting Started section as a running example for a parser we want to leverage from a C++ application.
module MyHTTP;
const Token = /[^ \t\r\n]+/;
const WhiteSpace = /[ \t]+/;
const NewLine = /\r?\n/;
type Version = unit {
: /HTTP\//;
number: /[0-9]+\.[0-9]+/;
};
public type RequestLine = unit {
method: Token;
: WhiteSpace;
uri: Token;
: WhiteSpace;
version: Version;
: NewLine;
on %done {
print self.method, self.uri, self.version.number;
}
};
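To make concrete what this grammar accepts, here is a hand-written sketch in plain C++ that matches the same patterns with std::regex. It is not what spicyc generates, and the names RequestLineFields and parse_request_line are hypothetical; it only illustrates which pieces of the input end up in which field.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical plain-C++ counterpart of the fields in MyHTTP::RequestLine.
struct RequestLineFields {
    std::string method;
    std::string uri;
    std::string version; // corresponds to Version::number
};

// Matches one request line at the start of the input, using the same
// patterns as the grammar: Token, WhiteSpace, Token, WhiteSpace,
// "HTTP/" followed by the version number, and NewLine.
inline std::optional<RequestLineFields> parse_request_line(const std::string& input) {
    static const std::regex pattern(
        R"(([^ \t\r\n]+)[ \t]+([^ \t\r\n]+)[ \t]+HTTP/([0-9]+\.[0-9]+)\r?\n)");

    std::smatch m;
    // match_continuous requires the match to begin at the first character.
    if ( ! std::regex_search(input, m, pattern, std::regex_constants::match_continuous) )
        return std::nullopt;

    return RequestLineFields{m[1].str(), m[2].str(), m[3].str()};
}
```

Calling parse_request_line("GET /index.html HTTP/1.0\n") yields the three fields GET, /index.html and 1.0, whereas a line without the HTTP version yields no result.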
First, we’ll use spicyc to generate a C++ parser from the Spicy source code:
# spicyc -x my_http my_http.spicy
The option -x (aka --output-c++-files) tells spicyc that we want it to generate C++ code for external compilation, rather than directly turning the Spicy module into executable code. This generates two C++ files that have their names prefixed with my_http_:
# ls my_http_*.cc
my_http___linker__.cc my_http_MyHTTP.cc
We don’t need to worry further about what’s in these files.
Next, spicyc can generate C++ prototypes for us that declare (1) a set of parsing functions for feeding input into our parser, and (2) a struct type providing access to the parsed fields. That’s done through option -P (aka --output-prototypes):
# spicyc -P my_http my_http.spicy -o my_http.h
That’ll leave the prototypes in my_http.h. The content of that generated header file tends to be a bit convoluted because it (necessarily) also contains a bunch of Spicy internals. But stripped down to the interesting parts, it looks like this for our example:
[...]
namespace hlt_my_http::MyHTTP {
struct RequestLine : ::hilti::rt::trait::isStruct, ::hilti::rt::Controllable<RequestLine> {
std::optional<::hilti::rt::Bytes> method{};
std::optional<::hilti::rt::Bytes> uri{};
std::optional<::hilti::rt::ValueReference<Version>> version{};
[...]
};
struct Version : ::hilti::rt::trait::isStruct, ::hilti::rt::Controllable<Version> {
std::optional<::hilti::rt::Bytes> number{};
[...]
};
[...]
extern auto parse1(::hilti::rt::ValueReference<::hilti::rt::Stream>& data, const std::optional<::hilti::rt::stream::View>& cur, const std::optional<::spicy::rt::UnitContext>& context) -> ::hilti::rt::Resumable;
extern auto parse2(::hilti::rt::ValueReference<__hlt_my_http::MyHTTP::RequestLine>& unit, ::hilti::rt::ValueReference<::hilti::rt::Stream>& data, const std::optional<::hilti::rt::stream::View>& cur, const std::optional<::spicy::rt::UnitContext>& context) -> ::hilti::rt::Resumable;
extern auto parse3(::hilti::rt::ValueReference<::spicy::rt::ParsedUnit>& gunit, ::hilti::rt::ValueReference<::hilti::rt::Stream>& data, const std::optional<::hilti::rt::stream::View>& cur, const std::optional<::spicy::rt::UnitContext>& context) -> ::hilti::rt::Resumable;
}
[...]
Todo
The struct declarations should move into the public namespace.
You can see the struct definitions corresponding to the two unit types, as well as a set of parsing functions with three different signatures:
parse1
The simplest form of parsing function receives a stream of input data, along with an optional view into the stream to limit the region to parse if desired and an optional context. parse1 will internally instantiate an instance of the unit’s struct, and then feed the unit’s parser with the data stream. However, it won’t provide access to what’s being parsed as it doesn’t pass back the struct.
parse2
The second form takes a pre-instantiated instance of the unit’s struct type, which parsing will fill out. Once parsing finishes, results can be accessed by inspecting the struct fields.
parse3
The third form takes a pre-instantiated instance of a generic, type-erased unit type that the parsing will fill out. Accessing the data requires use of HILTI’s reflection API, which we will discuss in Supporting Arbitrary Parsers.
Spicy puts all these declarations into a namespace hlt_PREFIX, where PREFIX is the argument we specified to -P. (If you leave the PREFIX empty (spicyc -P ''), you get a namespace of just hlt::*.)
Let’s start by using parse1():
#include <cassert>
#include <iostream>
#include <hilti/rt/libhilti.h>
#include <spicy/rt/libspicy.h>
#include "my_http.h"
int main(int argc, char** argv) {
assert(argc == 2);
// Initialize runtime library.
hilti::rt::init();
// Create stream with $1 as data.
auto stream = hilti::rt::reference::make_value<hilti::rt::Stream>(argv[1]);
stream->freeze();
// Feed data.
hlt_my_http::MyHTTP::RequestLine::parse1(stream, {}, {});
// Wrap up runtime library.
hilti::rt::done();
return 0;
}
This code first instantiates a stream from the data given on the command line. It freezes the stream to indicate that no further data will arrive later. Then it sends the stream into the parse1() function for processing.
We can now use the standard C++ compiler to build all this into an executable, leveraging spicy-config to add the necessary flags for finding includes and libraries:
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
# ./my_http $'GET /index.html HTTP/1.0\n'
GET, /index.html, 1.0
The output comes from the execution of the print statement inside the Spicy grammar, demonstrating that the parsing proceeded as expected.
Note
Above, when building the executable, we used clang++ assuming that that’s the C++ compiler in use on the system. Generally, you need to use the same compiler here as the one that Spicy itself was built with, to ensure that libraries and C++ ABI match. To ensure that you’re using the right compiler (e.g., if there are multiple on the system, or if it’s not in PATH), spicy-config can print out the full path to the expected one through its --cxx option. You can even put that directly into the build command line:
# $(spicy-config --cxx --cxxflags --ldflags) -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc
When using parse1() we don’t get access to the parsed information. If we want that, we can use parse2() instead and provide it with a struct to fill in:
#include <cassert>
#include <iostream>
#include <hilti/rt/libhilti.h>
#include <spicy/rt/libspicy.h>
#include "my_http.h"
int main(int argc, char** argv) {
assert(argc == 2);
// Initialize runtime libraries.
hilti::rt::init();
spicy::rt::init();
// Create stream with $1 as data.
auto stream = hilti::rt::reference::make_value<hilti::rt::Stream>(argv[1]);
stream->freeze();
// Instantiate unit.
auto request = hilti::rt::reference::make_value<__hlt_my_http::MyHTTP::RequestLine>();
// Feed data.
hlt_my_http::MyHTTP::RequestLine::parse2(request, stream, {}, {});
// Access fields.
std::cout << "method : " << *request->method << std::endl;
std::cout << "uri : " << *request->uri << std::endl;
std::cout << "version: " << *(*request->version)->number << std::endl;
// Wrap up runtime libraries.
spicy::rt::done();
hilti::rt::done();
return 0;
}
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
# ./my_http $'GET /index.html HTTP/1.0\n'
GET, /index.html, 1.0
method : GET
uri : /index.html
version: 1.0
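The expression *(*request->version)->number above unwraps three layers in turn: the optional holding the reference, the reference to the nested unit, and the optional holding the field. The following sketch mimics that chain with standard library types only, so that it can be tried without the runtime. It assumes std::string in place of hilti::rt::Bytes and std::shared_ptr as a rough stand-in for hilti::rt::ValueReference (the real class has different copy semantics); all type and function names are hypothetical.

```cpp
#include <memory>
#include <optional>
#include <string>

// Stand-ins for the generated types. std::string replaces hilti::rt::Bytes and
// std::shared_ptr replaces hilti::rt::ValueReference; both substitutions are
// assumptions made for illustration only.
struct VersionSketch {
    std::optional<std::string> number;
};

struct RequestLineSketch {
    std::optional<std::string> method;
    std::optional<std::shared_ptr<VersionSketch>> version;
};

// Returns the version number, or the fallback if any link in the chain is unset.
inline std::string version_or(const RequestLineSketch& req, const std::string& fallback) {
    if ( ! req.version || ! *req.version ) // optional unset, or null reference
        return fallback;

    const VersionSketch& v = **req.version; // unwrap the optional, then the reference
    return v.number.value_or(fallback);
}
```

Checking each layer before dereferencing, as version_or() does, avoids undefined behavior when a field was never set, for example because parsing stopped early.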
Another approach to retrieving field values goes through Spicy hooks calling back into the host application. That’s how Zeek’s Spicy support operates. Let’s say we want to execute a custom C++ function every time a RequestLine has been parsed. By adding the following code to my_http.spicy, we (1) declare that function on the Spicy-side, and (2) implement a Spicy hook that calls it:
public function got_request_line(method: bytes, uri: bytes, version_number: bytes) : void &cxxname="got_request_line";
on RequestLine::%done {
got_request_line(self.method, self.uri, self.version.number);
}
The &cxxname attribute for got_request_line indicates to Spicy that this is a function implemented externally inside custom C++ code, accessible through the given name. Now we need to implement that function:
#include <iostream>
#include <hilti/rt/libhilti.h>
#include <spicy/rt/libspicy.h>
void got_request_line(const hilti::rt::Bytes& method, const hilti::rt::Bytes& uri, const hilti::rt::Bytes& version_number) {
std::cout << "In C++ land: " << method << ", " << uri << ", " << version_number << std::endl;
}
Finally, we compile it all together like before, but now including our additional custom C++ file:
# spicyc -x my_http my_http.spicy
# spicyc -P my_http my_http.spicy -o my_http.h
# clang++ -o my_http my_http-callback.cc my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
# ./my_http $'GET index.html HTTP/1.0\n'
In C++ land: GET, index.html, 1.0
GET, index.html, 1.0
Note that the C++ function signature needs to match what Spicy expects, based on the Spicy-side prototype. If you are unsure how Spicy arguments translate into C++ arguments, look at the C++ prototype that’s included for the callback function in the output of -P.
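Because such a callback is a free function, a host that wants to keep the values rather than just print them needs storage of its own that the callback can reach. A minimal sketch of that idea follows, using std::string in place of hilti::rt::Bytes so that it builds without the runtime; SeenRequest, seen_requests and record_request_line are hypothetical names.

```cpp
#include <string>
#include <vector>

// One record per parsed request line. std::string stands in for
// hilti::rt::Bytes so that the sketch builds without the runtime.
struct SeenRequest {
    std::string method;
    std::string uri;
    std::string version;
};

// Host-owned storage; the function-local static is created on first use.
inline std::vector<SeenRequest>& seen_requests() {
    static std::vector<SeenRequest> requests;
    return requests;
}

// Hypothetical callback with the same shape as got_request_line() above.
inline void record_request_line(const std::string& method, const std::string& uri,
                                const std::string& version_number) {
    seen_requests().push_back(SeenRequest{method, uri, version_number});
}
```

After parsing finishes, the host can iterate over seen_requests() to process everything the hooks reported.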
8.2. Supporting Arbitrary Parsers
This approach is more complex, and we’ll just briefly describe the main pieces here. All of the tools coming with Spicy support arbitrary parsers and can serve as further examples (e.g., spicy-driver, spicy-dump, Zeek Integration). Indeed, they all build on the same C++ library class spicy::rt::Driver that provides a higher-level API for working with Spicy’s parsers in a generic fashion. We’ll do the same in the following.
8.2.1. Retrieving Available Parsers
The first challenge for a generic host application is that it cannot know what parsers are even available. Spicy’s runtime library provides an API to get a list of all parsers that are compiled into the current process. Continuing to use the my_http.spicy example, this code prints out our one available parser:
#include <cassert>
#include <iostream>
#include <hilti/rt/libhilti.h>
#include <spicy/rt/libspicy.h>
int main(int argc, char** argv) {
assert(argc == 2);
// Initialize runtime libraries.
hilti::rt::init();
spicy::rt::init();
// Instantiate driver providing higher level parsing API.
spicy::rt::Driver driver;
// Print out available parsers.
driver.listParsers(std::cout);
// Wrap up runtime libraries.
spicy::rt::done();
hilti::rt::done();
return 0;
}
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
# ./my_http
Available parsers:
MyHTTP::RequestLine
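A process can learn which parsers were linked into it if each parser registers itself during static initialization. The sketch below shows that general C++ pattern in isolation. Whether Spicy’s runtime is organized exactly this way is an assumption on our part, and parser_registry, RegisterParser and list_parsers are hypothetical names, not part of the Spicy API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical registry. The function-local static avoids
// initialization-order problems between translation units.
inline std::map<std::string, std::string>& parser_registry() {
    static std::map<std::string, std::string> registry;
    return registry;
}

// Each compiled-in parser would define one global instance of this type.
struct RegisterParser {
    RegisterParser(const std::string& name, const std::string& description) {
        parser_registry().emplace(name, description);
    }
};

// Registration happens before main() runs, as a side effect of linking
// this translation unit into the executable.
static const RegisterParser register_request_line{"MyHTTP::RequestLine", "HTTP request line"};

// Returns the names of all registered parsers in sorted order.
inline std::vector<std::string> list_parsers() {
    std::vector<std::string> names;
    for ( const auto& entry : parser_registry() )
        names.push_back(entry.first);
    return names;
}
```

With this pattern the host never names a parser at compile time; it only asks the registry at run time, which is what makes a generic driver possible.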
Using the name of the parser (MyHTTP::RequestLine) we can instantiate it from C++, and then feed it data:
// Retrieve meta object describing parser.
auto parser = driver.lookupParser("MyHTTP::RequestLine");
assert(parser);
// Fill string stream with $1 as data to parse.
std::stringstream data(argv[1]);
// Feed data.
auto unit = driver.processInput(**parser, data);
assert(unit);
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
# ./my_http $'GET /index.html HTTP/1.0\n'
GET, /index.html, 1.0
That’s the output of the print statement once more.
unit is of type spicy::rt::ParsedUnit, which is a type-erased class holding, in this case, an instance of __hlt_my_http::MyHTTP::RequestLine. Internally, that instance went through the parse3() function that we encountered in the previous section. To access the parsed fields, there’s a visitor API to iterate generically over HILTI types like this unit:
void print(const hilti::rt::type_info::Value& v) {
const auto& type = v.type();
switch ( type.tag ) {
case hilti::rt::TypeInfo::Bytes: std::cout << type.bytes->get(v); break;
case hilti::rt::TypeInfo::ValueReference: print(type.value_reference->value(v)); break;
case hilti::rt::TypeInfo::Struct:
for ( const auto& [f, y] : type.struct_->iterate(v) ) {
std::cout << f.name << ": ";
print(y);
std::cout << std::endl;
}
break;
default: assert(false);
}
}
Adding print(unit->value()) after the call to processInput() then gives us this output:
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
# ./my_http $'GET /index.html HTTP/1.0\n'
GET, /index.html, 1.0
method: GET
uri: /index.html
version: number: 1.0
Our visitor code implements just what we need for our example. The source code of spicy-dump shows a full implementation covering all available types.
So far we have compiled the Spicy parsers statically into the generated executable. The runtime API also supports loading them dynamically from pre-compiled HLTO files through the class hilti::rt::Library. Here’s the full example leveraging that, taking the file to load from the command line:
#include <cassert>
#include <iostream>
#include <sstream>
#include <hilti/rt/libhilti.h>
#include <spicy/rt/libspicy.h>
void print(const hilti::rt::type_info::Value& v) {
const auto& type = v.type();
switch ( type.tag ) {
case hilti::rt::TypeInfo::Bytes: std::cout << type.bytes->get(v); break;
case hilti::rt::TypeInfo::ValueReference: print(type.value_reference->value(v)); break;
case hilti::rt::TypeInfo::Struct:
for ( const auto& [f, y] : type.struct_->iterate(v) ) {
std::cout << f.name << ": ";
print(y);
std::cout << std::endl;
}
break;
default: assert(false);
}
}
int main(int argc, char** argv) {
// Usage now: "my-driver <hlto> <name-of-parser> <data>"
assert(argc == 4);
// Load pre-compiled parser. This must come before initializing the
// runtime libraries.
auto library = hilti::rt::Library(argv[1]);
auto rc = library.open();
assert(rc);
// Initialize runtime libraries.
hilti::rt::init();
spicy::rt::init();
// Instantiate driver providing higher level parsing API.
spicy::rt::Driver driver;
// Print out available parsers.
driver.listParsers(std::cout);
// Retrieve meta object describing parser.
auto parser = driver.lookupParser(argv[2]);
assert(parser);
// Fill string stream with $3 as data to parse.
std::stringstream data(argv[3]);
// Feed data.
auto unit = driver.processInput(**parser, data);
assert(unit);
// Print out content of parsed unit.
print(unit->value());
// Wrap up runtime libraries.
spicy::rt::done();
hilti::rt::done();
return 0;
}
# clang++ -o my-driver my-driver.cc $(spicy-config --cxxflags --ldflags --dynamic-loading)
# spicyc -j -o my_http.hlto my_http.spicy
# printf "GET /index.html HTTP/1.0\n\n<dummy>" > data
# ./my-driver my_http.hlto MyHTTP::RequestLine "$(cat data)"
Available parsers:
MyHTTP::RequestLine
GET, /index.html, 1.0
method: GET
uri: /index.html
version: number: 1.0
Note
Note the addition of --dynamic-loading to the spicy-config command line. That’s needed when the resulting binary will dynamically load precompiled Spicy parsers because linker flags need to be slightly adjusted in that case.
8.3. API Documentation
We won’t go further into details of the HILTI/Spicy runtime API here. Please see the C++ API documentation for more on that; the namespaces hilti::rt and spicy::rt cover what’s available to host applications.
Our examples always passed the full input at once. You don’t need to do that: Spicy’s parsers can process input incrementally as it comes in, returning to the caller to retrieve more. See the source of spicy::rt::Driver::processInput() for an example of how to implement that.
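The general shape of such incremental processing is a loop in which the host hands over one chunk at a time and the parser either finishes or signals that it needs more. The sketch below illustrates only that control flow with a hypothetical LineFeeder class that waits for a complete line; it does not use Spicy’s API or its resumable functions.

```cpp
#include <optional>
#include <string>

// Hypothetical helper that accumulates chunks until a full line has arrived.
class LineFeeder {
public:
    // Appends a chunk. Returns the completed line (without its terminator)
    // once a newline has been seen, or std::nullopt if more input is needed.
    std::optional<std::string> feed(const std::string& chunk) {
        buffer_ += chunk;

        const auto eol = buffer_.find('\n');
        if ( eol == std::string::npos )
            return std::nullopt; // suspend: the caller should supply more data

        std::string line = buffer_.substr(0, eol);
        buffer_.erase(0, eol + 1); // keep any data following the line

        if ( ! line.empty() && line.back() == '\r' )
            line.pop_back(); // accept "\r\n" as well, like the NewLine pattern

        return line;
    }

private:
    std::string buffer_;
};
```

A host would call feed() each time new data arrives from its source and continue with the next stage only once a line is returned, keeping the remaining bytes buffered for the following round.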