Part 1 - Introduction & Getting Started
What it is all about and how to set it up.
Gatery is a Hardware Construction Library for RTL design, the creation of digital circuits, which seeks to improve over common description languages such as VHDL or Verilog while offering the same level of control. Gatery is still in an early stage of its development and while the basics have converged to a stable feature set now, there will in all likelihood be changes to come. We will update this tutorial as we move along, but if you happen to find some inconsistency do drop us a note.
To use Gatery yourself, as in, write your own digital circuits with it, you need a couple of prerequisites such as an environment for building C++ programs. The goal of this part of the tutorial is to set up these prerequisites, understand the project structure, and use Gatery to generate a simple VHDL design that could be synthesized on an FPGA.
Gatery Workflow
Gatery is a C++ library that offers primitives for signals, registers, etc. such that a digital circuit can be constructed programmatically. The idea is that the hardware description is actually a C++ program that is compiled into a binary and executed. During its execution, the program assembles the digital circuit and performs operations on the resulting graph such as simulating it or exporting it to VHDL. The resulting VHDL code can then be ingested by industry standard synthesis tools.
VHDL & Verilog
VHDL and Verilog are the standard languages for describing RTL-designs. They are high level languages and much like high level programming languages need further processing akin to compilation. The final product of a digital circuit design, however, is extremely platform specific.
From a software developer’s standpoint, it might be likened to each processor having a different instruction set architecture. Gatery, seeking to improve RTL design itself and not the synthesis, placement, or routing, thus interfaces with these platform-specific tools by producing VHDL code. Software developers can think of this as source-to-source compiling a new language to C code so that the various C compilers for all ISAs (and their optimizers) can be used to target all processors and microcontrollers.
The big benefit of this approach is that the entire expressive power of C++ can be used to control the assembly of the digital circuit. In addition, regular C/C++ code can seamlessly interact with the simulation, allowing easy construction of test benches, comparison to C reference implementations, as well as running simulation and driver code together.
This is what a Gatery program may look like:
#include <gatery/frontend.h>
#include <gatery/utils.h>
#include <gatery/export/vhdl/VHDLExport.h>
#include <gatery/scl/arch/intel/IntelDevice.h>
#include <gatery/scl/synthesisTools/IntelQuartus.h>
#include <gatery/simulation/waveformFormats/VCDSink.h>
#include <gatery/simulation/ReferenceSimulator.h>
using namespace gtry;
using namespace gtry::vhdl;
using namespace gtry::scl;
using namespace gtry::utils;
int main()
{
    DesignScope design;
    // Optional: Set target technology
    {
        auto device = std::make_unique<IntelDevice>();
        device->setupCyclone10();
        design.setTargetTechnology(std::move(device));
    }
    // Build circuit
    Clock clock({.absoluteFrequency = 1'000}); // 1 kHz
    ClockScope clockScope{ clock };
    hlim::ClockRational blinkFrequency{1, 1}; // 1 Hz
    size_t counterMax = hlim::floor(clock.absoluteFrequency() / blinkFrequency);
    UInt counter = BitWidth(utils::Log2C(counterMax+1));
    auto enable = pinIn().setName("button");
    IF (enable)
        counter += 1;
    counter = reg(counter, 0);
    HCL_NAMED(counter);
    pinOut(counter.msb()).setName("led");
    design.postprocess();
    // Optional: Setup simulation
    sim::ReferenceSimulator simulator;
    simulator.compileProgram(design.getCircuit());
    simulator.addSimulationProcess([=, &clock]()->SimProcess{
        simu(enable) = '0';
        for ([[maybe_unused]] auto i : Range(300))
            co_await WaitClk(clock);
        simu(enable) = '1';
    });
    // Optional: Record simulation waveforms as VCD file
    sim::VCDSink vcd(design.getCircuit(), simulator, "waveform.vcd");
    vcd.addAllPins();
    vcd.addAllSignals();
    // Optional: VHDL export
    VHDLExport vhdl("vhdl/");
    vhdl.addTestbenchRecorder(simulator, "testbench", false);
    vhdl.targetSynthesisTool(new IntelQuartus());
    vhdl.writeProjectFile("import_IPCore.tcl");
    vhdl.writeStandAloneProjectFile("IPCore.qsf");
    vhdl.writeConstraintsFile("constraints.sdc");
    vhdl.writeClocksFile("clocks.sdc");
    vhdl(design.getCircuit());
    // Run simulation
    simulator.powerOn();
    simulator.advance(hlim::ClockRational(5000,1'000));
    return 0;
}
Prerequisites
In essence, the following things are needed:
- Gatery
- C++ 20 compiler
- Boost
- make or MS VisualStudio
- premake5
To facilitate getting started, we recommend our template project, which you can download from here or clone from GitHub. When cloning, make sure to clone recursively as Gatery is pulled in as a submodule.
Since Gatery is a C++ library, you will need a C++ compiler capable of C++20. On Windows, we recommend a recent version of Visual Studio. On Linux, we recommend gcc version 10 or higher, which can usually be installed from the distribution’s repository.
Gatery makes use of the C++ library Boost. Some Linux distributions ship with an older version of Boost that is incompatible with C++20. In those cases, and on Windows, Boost needs to be downloaded and compiled by following the Boost instructions.
Finally, we use premake5 as the build system of Gatery as well as of the template project. Similar to CMake, premake can generate Makefiles and Visual Studio project files, among others. It is a standalone executable and easy to set up.
Fedora Step-by-Step Instructions
# install gcc, boost, git (for cloning)
sudo dnf install g++ boost-devel git make
# verify gcc10 or later
gcc --version
# fetch premake5 and make globally available
curl -L https://github.com/premake/premake-core/releases/download/v5.0.0-beta2/premake-5.0.0-beta2-linux.tar.gz > /tmp/premake-5.0.0-beta2-linux.tar.gz
tar -zxf /tmp/premake-5.0.0-beta2-linux.tar.gz -C /tmp/
sudo mv /tmp/premake5 /usr/local/bin/
# fetch template project
git clone --recursive https://github.com/synogate/gatery_template.git ~/Documents/gatery/hello_world/
# Generate makefiles
cd ~/Documents/gatery/hello_world/
premake5 gmake2
Ubuntu Step-by-Step Instructions
Ubuntu is slightly more involved as gcc10 is a separate package. Also, Boost needs to be built from scratch since the repository version is not compatible with C++20.
# install gcc, boost, git (for cloning)
sudo apt install build-essential g++-10 libboost-all-dev git
# Select gcc10 as default gcc
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10
sudo update-alternatives --config gcc
sudo update-alternatives --config g++
# verify gcc10 or later
gcc --version
# fetch premake5 and make globally available
curl -L https://github.com/premake/premake-core/releases/download/v5.0.0-beta2/premake-5.0.0-beta2-linux.tar.gz > /tmp/premake-5.0.0-beta2-linux.tar.gz
tar -zxf /tmp/premake-5.0.0-beta2-linux.tar.gz -C /tmp/
sudo mv /tmp/premake5 /usr/local/bin/
# Fetch, build, and install boost
curl -L https://boostorg.jfrog.io/artifactory/main/release/1.76.0/source/boost_1_76_0.tar.gz > /tmp/boost_1_76_0.tar.gz
mkdir -p ~/Documents/software/
tar -zxf /tmp/boost_1_76_0.tar.gz -C ~/Documents/software/
cd ~/Documents/software/boost_1_76_0
./bootstrap.sh
./b2
sudo ./b2 install
# fetch template project
git clone --recursive https://github.com/synogate/gatery_template.git ~/Documents/gatery/hello_world/
# Generate makefiles
cd ~/Documents/gatery/hello_world/
premake5 gmake2
Windows Step-by-Step Instructions
NOTICE: only works with PowerShell
Go to visualstudio.microsoft.com and download/install Visual Studio with the packages for C++ development and Git for Windows. Also install from individual components “MSVC v142 - VS 2019 C++ x64/x86 build tools (v.14.28 - 16.8)”.
# Assume Visual Studio and Git for Windows installed
# Get boost via vcpkg, so fetch and install vcpkg (a package manager)
git clone https://github.com/microsoft/vcpkg "$Env:USERPROFILE/Documents/vcpkg_target_dir"
cd "$Env:USERPROFILE/Documents/vcpkg_target_dir"
.\bootstrap-vcpkg.bat
# Fetch and build boost (this may take a while)
.\vcpkg.exe install boost:x64-windows
.\vcpkg.exe integrate install
# Fetch and build premake
git clone https://github.com/premake/premake-core "$Env:USERPROFILE/Documents/premake_target_dir"
cd "$Env:USERPROFILE/Documents/premake_target_dir"
./Bootstrap.bat
# fetch template project
git clone --recursive https://github.com/synogate/gatery_template.git "$Env:USERPROFILE/Documents/gatery/hello_world/"
cd "$Env:USERPROFILE/Documents/gatery/hello_world/"
# Generate Visual Studio solutions with premake
# (the call operator & is needed to run an executable given as a quoted path)
& "$Env:USERPROFILE/Documents/premake_target_dir/bin/release/premake5.exe" vs2019
Project Structure
The directory and file structure of the template project can be summarized as the following:
- Makefile or gatery-template.sln
- premake5.lua
- source/
  - main.cpp
- lib/gatery/
  - source/gatery/
    - export/
    - frontend/
    - scl/
    - simulation/
    - ...
If you are using Windows and generated a solution file (gatery-template.sln), you can open this solution with Visual Studio.
The most important file is source/main.cpp. This is where the RTL design of this tutorial is to be built.
Gatery is placed in lib/gatery/. As the directory structure suggests, it is composed of multiple parts.
This part of the tutorial will only come in contact with the frontend and the export.
The other two noteworthy parts are the built-in simulator and the Standard Component Library (SCL).
The latter is a collection of frequently needed digital circuit components, items too small to be traded as IP-cores but too big to be rewritten on each use.
A later part of this tutorial will take a closer look at the SCL.
Hello World
To test the setup, we want to build a simple circuit and export it to VHDL.
The design to build is the "Hello World" of RTL designs: a blinking LED.
The resulting code that we will arrive at is exactly the code found in source/main.cpp.
Nonetheless, we start from scratch with an empty C++ program that does nothing.
int main()
{
return 0;
}
Next, we want to create a DesignScope.
The DesignScope holds an internal graph representation of the circuit that is being created.
While the DesignScope variable exists, all operations, such as creating or combining signals, are directed into this graph.
Every generator program that creates a digital circuit, for simulation or export, will want to create such a DesignScope.
Scopes in C++
The concept of creating instances of classes whose mere existence creates an effect is a common pattern in C++.
In C++, the lifetime of variables and, by extension, class instances is governed by very strict rules.
Any variable or instance created lives on until the { } scope in which it is created is left for any reason.
This { } scope can be the scope of a function or the block of a loop or if statement, but one can also freely create new { } sub-scopes.
If execution leaves the { } scope, either by reaching the bottom or through an exception or a return, break, or goto statement, C++ automatically destroys all (stack) objects and variables that were created in that scope, in the reverse order in which they were created.
Since destructors of classes can be freely defined in C++, it is a common paradigm to create scope classes whose constructors and destructors effect certain changes while making sure that reversing the effect cannot be forgotten.
In the case of the DesignScope, the primary effect is to use (thread-local) global variables to direct all frontend Gatery calls to the underlying graph data structure.
Gatery makes heavy use of this scope mechanism, and later parts of this tutorial will revisit this pattern.
The DesignScope, like all frontend aspects of Gatery, requires the gatery/frontend.h include and is in the “gtry” namespace.
Includes and Namespaces in C++
In any source file from which you want to call a Gatery function, you need to #include the corresponding header in the top part of the file.
For all frontend aspects of Gatery, this is gatery/frontend.h.
These includes are necessary because C++ compiles all .cpp files independently. The header files provide “declarations” of functions, classes, and types to ensure that the resulting object files are compatible and can be linked together. The linker still needs to be explicitly told to link against the Gatery library (the actual implementations of the functions, classes, etc.); however, this is handled by the premake/make/VS-project files.
All Gatery functions and definitions are in the “gtry” namespace, and sometimes in further sub-namespaces.
The using namespace gtry; directive simply frees us from having to prefix all classes and functions with gtry::.
#include <gatery/frontend.h>
using namespace gtry;
int main()
{
DesignScope design;
return 0;
}
To let an LED blink with a human visible frequency, we build a counter of sufficient width and wire the LED to the most significant bit. Since this part of the tutorial focusses on getting started, we will skim over the details a bit. The code creates a clock with a certain frequency, in this case, the default frequency of a Zybo-Z7 FPGA development board. It then computes the necessary bit width of the counter and constructs the latter. Finally, it forces the msb of the counter to become an output signal of the top level entity where it could be connected to an FPGA-board’s LED by configuring the XDC file accordingly.
Counter for reduction
Typical clock frequencies for FPGAs are in the range of 10s to 100s of MHz, way too fast for a human to perceive. A binary counter can be used to reduce this frequency. A counter that simply counts up (with overflow to zero) will see its least significant bit alternate at every rising clock edge. The second least significant bit will alternate with half that frequency. The one after again with half the frequency and so on. Every bit added to the counter halves the frequency with which it alternates. So, constructing a counter of sufficient width will see its most significant bit alternate with an appropriate frequency.
IO-Pins and constraint files
As for actually driving external IO, such as an LED on a development board: signals that drive external IO, or that are driven by it, have to be output or input signals of the top-most entity of the resulting VHDL files (Gatery takes care of this) and have to be associated with actual pins of the FPGA’s package. Which signals are connected to which package pins is specified in an .xdc constraint file, a template of which is usually supplied by the vendor of the FPGA development board.
#include <gatery/frontend.h>
using namespace gtry;
int main()
{
    DesignScope design;
    {
        Clock clock({.absoluteFrequency = 125'000'000}); // 125MHz
        ClockScope clockScope{ clock };
        hlim::ClockRational blinkFrequency{1, 1}; // 1Hz
        size_t counterMax = hlim::floor(clock.absoluteFrequency() / blinkFrequency);
        UInt counter = BitWidth(utils::Log2C(counterMax+1));
        counter = reg(counter+1, 0);
        HCL_NAMED(counter);
        pinOut(counter.msb()).setName("led");
    }
    return 0;
}
So far, we have fully defined the behavior of the design as a graph. However, this graph is still in a pretty raw form. Before it is exported (or simulated), one usually wishes to perform various graph operations such as optimizations, register retiming, etc. Some of these operations are mandatory because the raw graph can still contain illegal structures, such as apparent combinatorial loops that are only resolved through optimization. Gatery offers a function that performs those optimizations and adaptations to a target platform. The default postprocessing invocation, shown below, targets an “average FPGA” by turning e.g. block RAMs into VHDL code from which they are easily inferred, but without using any vendor-specific macros.
Hard blocks
FPGAs not only contain fully configurable logic blocks, but also hard blocks with specific, frequently needed, or difficult to build functionality. These blocks are, to some degree, vendor and device specific and thus restrict any design that explicitly uses them to those devices.
Synthesis tools can infer the use of these hard blocks from a behavioral description in VHDL. C++ developers can compare it to auto-vectorization of a compiler vs. platform specific vector intrinsics. Auto-vectorization and hard block inference are both somewhat fragile and take more time during compilation/synthesis, but are more platform independent.
int main()
{
    DesignScope design;
    {
        // ...
    }
    design.postprocess();
    return 0;
}
Finally, the adapted graph can be exported to VHDL.
The VHDL exporter VHDLExport is a class that gets instantiated with a destination path.
The instance can be configured further, though this is not shown here and usually not necessary.
Applying the circuit to the exporter triggers the actual export.
For Xilinx Vivado, the exporter can generate a .tcl script that imports the .vhdl files into an existing project.
#include <gatery/frontend.h>
#include <gatery/export/vhdl/VHDLExport.h>
using namespace gtry;
using namespace gtry::vhdl;
int main()
{
    DesignScope design;
    {
        Clock clock({.absoluteFrequency = 125'000'000}); // 125MHz
        ClockScope clockScope{ clock };
        hlim::ClockRational blinkFrequency{1, 1}; // 1Hz
        size_t counterMax = hlim::floor(clock.absoluteFrequency() / blinkFrequency);
        UInt counter = BitWidth(utils::Log2C(counterMax+1));
        counter = reg(counter+1, 0);
        HCL_NAMED(counter);
        pinOut(counter.msb()).setName("led");
    }
    design.postprocess();
    VHDLExport vhdl{"vhdl/"};
    vhdl(design.getCircuit());
    vhdl.writeVivadoScript("vivado.tcl");
    return 0;
}
The program now needs to be built into an executable. You may build it either through your IDE (e.g. Visual Studio), or from the console using make, depending on how you set it up:
# Navigate to the root of the project directory
cd ~/Documents/gatery/hello_world/
# Build using make and multiple parallel jobs
make -j
# Alternative for making an optimized "release" build
# make config=release -j
Debug vs release builds
C++ programs can be built in different configurations, in this case in a debug or release configuration. In the debug configuration, the executable contains symbols and largely unoptimized code, which enables or facilitates running the executable in a debugger. In addition, Gatery provides stack traces of offending nodes in case of errors, which helps localize why and from where something was created. These stack traces require the same debug symbols to work properly. The release configuration, on the other hand, yields an optimized executable. This can be useful for large designs where the execution time of the construction or graph operation steps can become a problem.
The build configuration of the executable has no effect on the generated RTL-design. The optimization of the executable, and its execution speed, is independent of the optimization of the resulting circuit.
The build process will add two new directories (names vary slightly for Windows):
- bin/linux-x86_64-Debug/
  - gatery-template
- obj/
- Makefile
- premake5.lua
- source/
  - main.cpp
- lib/gatery/
  - source/gatery/
    - ...
The executable(s) can be found in the bin/ folder and be executed from there, or from the IDE.
Execution should yield the vhdl/ folder relative to the current working directory, i.e.:
# Navigate to the root of the project directory
cd ~/Documents/gatery/hello_world/
# Navigate to bin directory
cd bin
# Execute
./linux-x86_64-Debug/gatery-template
yields
- bin/
  - vhdl/
    - top.vhdl
    - GateryHelperPackage.vhdl
    - clocks.xdc
    - vivado.tcl
  - linux-x86_64-Debug/
    - gatery-template
- obj/
- Makefile
- premake5.lua
- source/
- lib/gatery/
The .vhdl files can now be added to a project in a synthesis tool and synthesized.
The specifics of this depend on the tool used.
For Xilinx Vivado, the provided vivado.tcl script can be sourced.
The clocks.xdc file is meant for simulation only; it only specifies the clock frequency and should be replaced for synthesis.
In the case of FPGA devboards, the devboard’s template constraint file should be adapted accordingly.
If your FPGA’s default clock is actually driven with a different frequency, you may want to change the frequency in the code to ensure that the computations of the counter bit width are still in the right order of magnitude.
Conclusion
In this part of the tutorial, we set up a "Hello World" project and looked at the general workflow from Gatery-based C++ code to synthesizable VHDL code. However, how to actually compose RTL designs with Gatery was glossed over. The next part of this series remedies this by introducing the basic building blocks for Gatery-based RTL design.