This was originally posted by Jessy Exum as an HN comment, but it deserves its own post. I’m hosting it here until he gets a blog set up. He’s currently looking for work and you can contact him at firstname.lastname@example.org.
The lack of open tools is a huge problem. My friends on IRC are working on reverse engineering multiple chips enough that full compilation tool chains can work. My part of the project is working to make a highly generic method of loading the compiled files onto the actual chips no matter the programmer or chip used. I will expand on the steps of compiling for an FPGA, loading a program onto an FPGA, the difficulties we face in making our own tools, and then talk about my project a little (which cgag was kind enough to name drop). As you can see from the post size this will be very long but should be very educational. First thing is there are more parts to the bitstream generation, as well as other tools of the toolchain required after bitstream generation is done.
The compilation is split into the following parts.
The first part is synthesis. HDLs can be used for ASIC design, FPGA design, and more general work like mathematical proofs that are never intended to run on real hardware. This step takes in an HDL program and calculates all the gates required. The output is called a netlist, which is more of a mathematical construct of connections over perfect wires (everything is ideal, i.e. not realistic). This step should also detect repeated gate patterns in the netlist and de-duplicate them (this is crazy important since it can shrink a netlist substantially). Tools that implement this step commonly produce everything in one gate type (like NAND). There are two major open source tools. Icarus Verilog is used more for the math proofs and simulation and is not intended to generate information for real chips. Yosys is a much newer tool that is made to be part of a real-life FPGA toolchain once the next tools are available. It only works with Verilog (more popular in industry).
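To make the netlist and de-duplication idea concrete, here is a minimal sketch in Python. It is not how Yosys or any real synthesis tool is implemented. The `Netlist` class and its method names are invented for illustration, and the only assumption is that every gate is a 2-input NAND.

```python
from itertools import product

class Netlist:
    """A toy netlist in which every gate is a 2-input NAND (illustrative only)."""

    def __init__(self):
        self.gates = {}   # (input wire, input wire) -> output wire name
        self.count = 0

    def nand(self, a, b):
        key = tuple(sorted((a, b)))    # NAND is commutative, so order does not matter
        if key not in self.gates:      # de-duplicate: reuse an identical existing gate
            self.count += 1
            self.gates[key] = f"n{self.count}"
        return self.gates[key]

    def xor(self, a, b):
        # The classic XOR built from four NAND gates.
        m = self.nand(a, b)
        return self.nand(self.nand(a, m), self.nand(b, m))

    def evaluate(self, out, inputs):
        values = dict(inputs)
        # Gates were inserted in dependency order and dicts preserve insertion order.
        for (a, b), wire in self.gates.items():
            values[wire] = 1 - (values[a] & values[b])
        return values[out]

net = Netlist()
first = net.xor("a", "b")
second = net.xor("b", "a")   # the same logic requested a second time
truth = [net.evaluate(first, {"a": a, "b": b})
         for a, b in product((0, 1), repeat=2)]
```

Asking for the same XOR twice still leaves only four gates in the netlist, because the second request resolves to wires that already exist. Real designs repeat far larger patterns, which is why this step matters so much.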
The next part is technology mapping. The link OP posted shows how the FPGA has specialty hardware peppered in with the general gates to speed up certain operations. The link briefly talks about the truth tables in the LUTs, which means that each LUT is not really just a gate. Each one is a box that can be configured to implement one of many common sequences of gates to save space and gain speed. Each FPGA can be implemented differently, and since the synthesis step outputs a netlist usually in a specific gate type, the gates of the netlist often do not match the gates that can be efficiently implemented in the LUTs. This tool is the first one that has to have deep knowledge of the FPGA’s underlying technology (often called the fabric) so we can convert the netlist into something that could in theory be put on the target FPGA. Synthesis tools do not implement this step since it is FPGA specific.
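As a rough illustration of why a LUT is "a box" rather than a gate, the sketch below treats a k-input LUT as nothing more than a 2**k-bit truth table. The function names and the bit ordering (input 0 is the least significant index bit) are assumptions made for this example; real fabrics define their own ordering.

```python
def lut_init(func, k):
    """Return the truth table of func as an integer with 2**k bits.

    Bit i holds the output for the input combination whose binary
    encoding is i, with input 0 as the least significant bit.
    """
    table = 0
    for i in range(2 ** k):
        bits = [(i >> n) & 1 for n in range(k)]
        if func(*bits):
            table |= 1 << i
    return table

def lut_eval(table, *inputs):
    """Look up the output bit selected by the given input bits."""
    index = sum(bit << n for n, bit in enumerate(inputs))
    return (table >> index) & 1

# XOR takes four gates when built purely from NANDs, but fits in one 2-input LUT.
xor_table = lut_init(lambda a, b: a ^ b, 2)

# A whole cluster of gates, (a AND b) OR (c XOR d), fits in one 4-input LUT.
cluster_table = lut_init(lambda a, b, c, d: (a & b) | (c ^ d), 4)
```

Technology mapping is then, loosely speaking, the job of carving the netlist into clusters that each fit one such table on the target chip.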
The next part is place-and-route. After we generate a netlist of realistic gates we have to find out how to fit them onto the physical chip with enough room to connect to each other. This by itself is a VERY hard problem and takes most of the time of the full compilation. It is made more difficult by the fact that this requires extremely specific information about the FPGA’s fabric to know not only what paths are available, but also to make sure that the paths between LUTs are not too slow, so we can send signals around the chip fast enough. Programs for FPGAs are not like programs for CPUs, where each instruction happens one after the other. In an FPGA all the gates are running at the same time, so we have to account for the time signals take to travel across the chip, give the circuits time to execute, and move LUTs around if our timing is not met. This timing happens in a CPU too, but the CPU designer got this working so our instructions run in a predictable way.
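A toy version of the placement problem can be sketched as follows. Everything here is an assumption for illustration: a 2x2 grid of LUT sites, a made-up delay per grid step, and a made-up timing budget. The exhaustive search only works because the design is tiny; real tools have to rely on heuristics because the number of possible placements explodes.

```python
from itertools import permutations

# Hypothetical 2x2 fabric: each site is the (x, y) location of one LUT.
SITES = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Connections between LUTs in the mapped netlist (driver, receiver).
NETS = [("A", "B"), ("B", "C"), ("C", "D")]
DELAY_PER_HOP_NS = 1.5   # assumed wire delay per grid step
TIMING_BUDGET_NS = 2.0   # assumed maximum delay allowed on any net

def wire_delay(placement, net):
    """Manhattan distance between the two LUTs times the per-hop delay."""
    (x1, y1), (x2, y2) = placement[net[0]], placement[net[1]]
    return (abs(x1 - x2) + abs(y1 - y2)) * DELAY_PER_HOP_NS

def worst_delay(placement):
    return max(wire_delay(placement, net) for net in NETS)

def place(luts):
    """Try every assignment of LUTs to sites and keep the fastest one."""
    best = None
    for order in permutations(SITES, len(luts)):
        candidate = dict(zip(luts, order))
        if best is None or worst_delay(candidate) < worst_delay(best):
            best = candidate
    return best

naive = dict(zip("ABCD", [(0, 0), (1, 1), (1, 0), (0, 1)]))
best = place(["A", "B", "C", "D"])
naive_ok = worst_delay(naive) <= TIMING_BUDGET_NS   # diagonal hops are too slow
best_ok = worst_delay(best) <= TIMING_BUDGET_NS     # neighbours only, timing met
```

Even in this toy, the first placement that comes to mind misses timing and LUTs have to be moved. On a real chip the routing resources are also limited, which this sketch ignores entirely.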
The final step of the compilation toolchain, bitstream generation, is to take the LUT configuration and routing information from place-and-route and convert it to a binary file the FPGA can actually ‘run’. Like place-and-route, this step is highly FPGA dependent. We already know the configuration of the FPGA we want from the last step, so this step is just converting that configuration into the binary file format for the FPGA. This format is also kept secret.
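Since the real formats are secret, the following can only be a sketch with a completely invented file layout: a four-byte marker, a LUT count, and then one record per LUT holding its site and 16-bit truth table. It shows the shape of the step (configuration in, bytes out), not any vendor's actual format.

```python
import struct

MAGIC = b"TOYB"   # invented marker; real bitstream formats are undocumented

def generate_bitstream(luts):
    """Serialize a placement. luts maps an (x, y) site to a 16-bit truth table."""
    header = MAGIC + struct.pack("<H", len(luts))
    body = b"".join(
        struct.pack("<BBH", x, y, table)          # 1 + 1 + 2 bytes per LUT
        for (x, y), table in sorted(luts.items())
    )
    return header + body

def parse_bitstream(data):
    """Inverse of generate_bitstream, the kind of reader reverse engineers must guess at."""
    if data[:4] != MAGIC:
        raise ValueError("not a toy bitstream")
    (count,) = struct.unpack_from("<H", data, 4)
    luts = {}
    for i in range(count):
        x, y, table = struct.unpack_from("<BBH", data, 6 + 4 * i)
        luts[(x, y)] = table
    return luts

config = {(0, 0): 0x8FF8, (1, 0): 0x6666}
blob = generate_bitstream(config)
```

Writing the generator is easy once the layout is known. The hard part in practice is that nobody outside the vendor knows the layout.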
With our final bitstream file in hand, we obviously want to shove it into an FPGA. But how? Anyone who has programmed microcontrollers directly knows you need a box that lets your computer talk to the chip. For Arduino this is some SPI setup, but most FPGAs and ARM cores support JTAG. JTAG was intended as a method to do unit tests on highly complicated boards, but its easily expandable architecture let people do what they wanted, including programming their chips. Unfortunately each company implemented programming differently, and while most companies produce chips that comply with JTAG signaling, each company and sometimes each chip handles programming its own way. A new standard, IEEE 1532, was made to mitigate the damage of nonstandard interfaces. IEEE 1532 specified a format for people to write up the random nonstandard sequences required for chip configuration so that they can be interpreted and run, but sadly the standard suffers from the common problems of electrical engineers writing specifications for software. The result is a standard too loose to really standardize anything, with tool writers often having to do custom work per chip using the chip’s 1532 spec as a guideline.
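The part that is well standardized is the signaling itself: every JTAG chip contains the same 16-state TAP controller from IEEE 1149.1, stepped by the TMS pin on each clock. The sketch below encodes that state machine and searches for the shortest TMS sequence between two states. The state names are spelled as Python identifiers and the helper names are my own, not taken from any existing tool.

```python
from collections import deque

# state: (next state when TMS=0, next state when TMS=1)
TAP = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN":   ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR":       ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN":   ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}

def tms_path(start, goal):
    """Breadth-first search for the shortest list of TMS bits from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for tms in (0, 1):
            nxt = TAP[state][tms]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [tms]))
    raise ValueError(f"no path from {start} to {goal}")

def after_clocks(state, tms_bits):
    """Follow a sequence of TMS bits and return the resulting state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state

to_shift_ir = tms_path("RUN_TEST_IDLE", "SHIFT_IR")
to_shift_dr = tms_path("RUN_TEST_IDLE", "SHIFT_DR")
```

Everything up to this level works the same on every compliant chip. What gets shifted through the instruction and data registers to actually program a device is where the per-vendor, per-chip differences begin.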
In order to build tools for the TECHNOLOGY MAPPING, PLACE-AND-ROUTE, and BITSTREAM GENERATION steps, we need thorough documentation on the layout and configuration format of the FPGAs we are targeting. This information is unavailable and aggressively kept secret. One engineer bothered Xilinx until they told him that in order to receive the information his company would have to make them several million dollars a quarter, and even then he would only get it under an NDA. I have many opinions on how counterproductive it is to keep this information secret and how it does not actually help Xilinx maintain competitiveness, but that is a different post. The FPGA manufacturers are not only unhelpful, they do not want there to be open tools. I am honestly not sure why, since Xilinx gives away their compiler for free. They do sell the professional version, but Xilinx admitted that they only made several million from software sales and the cost of engineers writing it must have been on the same scale (they made 14 billion last year).
I would expect them to be indifferent to open tool chains but they actively work against tools that have been made. The licensing language in their compiler says you are not allowed to reverse engineer the tool (honestly not a big problem since the code is terrible), but you are also not allowed to look at the output bitstreams to try to figure anything out about the chip. If you think this is an idle threat, they have sent cease and desist letters to at least one tool chain that got pretty far reversing the bitstreams. It is likely that this part of the license would lose if challenged in court, but as I said above, Xilinx made 14 billion last year, and no engineer has been able to challenge them even if they are right. I hope to get help from the EFF in the future to challenge and strike down this surprisingly common clause.
Several of my friends on IRC cut FPGAs open and use high-power optical microscopes as well as electron microscopes to capture each layer of the chip’s layout. At least one is currently working on a computer vision tool to automatically map the chips and produce Verilog code of the FPGAs themselves. If this works well we should be able to get enough information to build basic tools for each chip the whole process is applied to.
I want to pave the way for FPGAs to be usable in everything from laptop/desktop computers to phones (if the static power consumption gets better). I have some very interesting ideas of what an average user could do with in-system FPGAs that can be reconfigured at runtime.
My main target is fixing the huge tangled JTAG mess. I mentioned earlier that even though the JTAG standard is pretty solid, everything built on it and around it is highly nonstandard. There is one more problem: there were no standards to address how to build boxes to let your PC talk JTAG, since that is not the IEEE’s area. Each company has its own JTAG controller, which is often USB. Each controller only works with the company’s software and only with the company’s chips. The good news is that there is no physical reason these controllers cannot talk to all JTAG devices. Each of them has a custom USB protocol for how to make the box talk JTAG, and only the company’s software knows how to talk to it. The limitation here is the software on the PC and not the controller itself. All we need is documentation on the controller’s USB protocol, which is where my project started.
As a final note on JTAG controllers, almost all of them require being loaded with firmware every time they are plugged in. This means that you have to have the proprietary firmware available. But since this firmware is not allowed to be distributed by anyone other than the company that created it, even if we know the secret USB protocol, we cannot use these controllers with our own open tools out of the box without writing custom open firmware for each one.
I first started documenting JTAG controller protocols. This is available on my GitHub (though it needs to be cleaned up). Next I started work on an open tool for talking JTAG to all chips independent of what JTAG controller is used. I only support a few chips for now, but all you have to do to add support for another JTAG controller is write a driver, and all supported chips will work with it (independent of manufacturer). To address the proprietary firmware issue, I started writing my own open source replacements and have a stable (though limited functionality) version of the Xilinx Platform Cable controller firmware available on GitHub. I am also working on firmware for Digilent boards (including the Atlys). My next step is to rewrite my tool in something other than Python (likely C, though I have a fantasy of Rust). The future version will not require controllers to be USB (and will support controllers that use Ethernet, PCIe, etc.), will support more than just JTAG (Debug Wire, SPI, etc.), and will likely be a daemon that keeps track of all JTAG hardware. I want to see tools like OpenOCD (which provides GDB interfaces to embedded systems, etc.) replace their dated controller architecture with calls to my daemon. I want to unify how all of these devices are communicated with, so we have something to use when we get configurable FPGAs on the PCIe bus and the kernel or user software can configure them as needed.
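The driver idea can be sketched roughly like this. It is not the code from my tool: the interface, the class names, and both wire protocols below are invented for illustration. The point is only that chip-side logic produces plain JTAG bits once, and each controller driver translates those bits into whatever bytes its box expects.

```python
from abc import ABC, abstractmethod

class ControllerDriver(ABC):
    """Hypothetical driver interface: turn JTAG bits into controller-specific bytes."""

    @abstractmethod
    def encode(self, bits):
        """bits is a list of (tms, tdi) pairs; return the bytes to send to the box."""

class BytePerClockDriver(ControllerDriver):
    """Invented protocol A: one byte per TCK pulse, TMS in bit 1 and TDI in bit 0."""

    def encode(self, bits):
        return bytes((tms << 1) | tdi for tms, tdi in bits)

class PackedDriver(ControllerDriver):
    """Invented protocol B: a length byte, then four clocks packed into each byte."""

    def encode(self, bits):
        out = bytearray([len(bits)])   # a single length byte caps this toy at 255 clocks
        for i in range(0, len(bits), 4):
            byte = 0
            for n, (tms, tdi) in enumerate(bits[i:i + 4]):
                byte |= ((tms << 1) | tdi) << (2 * n)
            out.append(byte)
        return bytes(out)

def reset_and_idle():
    """Chip-side logic, written once: five TMS=1 clocks reset any TAP,
    then one TMS=0 clock moves it to Run-Test/Idle."""
    return [(1, 0)] * 5 + [(0, 0)]

sequence = reset_and_idle()
for_box_a = BytePerClockDriver().encode(sequence)
for_box_b = PackedDriver().encode(sequence)
```

Adding a third controller means writing one more small class. Nothing on the chip side changes, which is the property that lets every supported chip work with every supported controller.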