Automating MATLAB for the C generation
25 February 2009
MATLAB algorithms, for all their advantages, have come unstuck in the past when incorporated into an embedded solution, because doing so requires an awkward and lengthy translation to C. Robert Yu describes a method of automating this process.
When implementing signal and image processing designs as embedded software, you must meet a challenging set of design requirements. Derived from mission specifications and hardware constraints, these requirements differ greatly from desktop application requirements. Typically, embedded software must execute in real time on a tightly integrated software and hardware platform, with constrained memory resources and limits on size, weight, power, and cost. Given these requirements, the preferred embedded software language is C, which keeps you close to the bare metal.
MATLAB, on the other hand, is the language of choice for algorithm developers. MATLAB's vector syntax, large library of functions, and dynamic typing make this language ideal for algorithm exploration. As anybody who uses both MATLAB and C knows, programming in an interpreted language like MATLAB yields substantial gains in productivity. Unfortunately, MATLAB is unsuitable for deployment on most embedded systems: MATLAB can be slow, requires large amounts of memory, supports only a limited number of platforms, and defies tight integration.
So for the time being, we are stuck with translating MATLAB algorithms to C code. Traditionally, translation is a manual, painstaking task performed by a skilled C developer. Converting MATLAB to C is tedious and error-prone; the very language features which make MATLAB a joy to use become translation pitfalls. The challenges range from the merely inconvenient (MATLAB array indices start with 1, whereas C arrays start with 0) to quite difficult (finding suitable replacements for high-level, polymorphic toolbox functions).
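The index-base mismatch is simple to state but easy to get wrong in practice. As a minimal sketch (the function name max_index_matlab is hypothetical and is not taken from MCS or from any MATLAB library), the C below keeps 0-based indices internally and converts to MATLAB's 1-based convention only at the interface, mimicking the index that MATLAB's max returns as its second output:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only.
 * Returns the MATLAB-style (1-based) index of the largest element of x,
 * choosing the first occurrence when there are ties. */
size_t max_index_matlab(const double *x, size_t n)
{
    assert(x != NULL && n > 0);

    size_t best = 0;                     /* C-side index, 0-based */
    for (size_t k = 1; k < n; ++k) {
        if (x[k] > x[best]) {            /* strict '>' keeps the first maximum */
            best = k;
        }
    }
    return best + 1;                     /* convert to 1-based at the boundary */
}
```

Confining the conversion to one place, rather than scattering "+ 1" and "- 1" through the loop bodies, is one common way a hand translator limits off-by-one errors.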
For example, consider translating the short MATLAB function below, which computes the contours for a 2-D dataset X and returns line segments for the longest contour (see Fig 1).
This code illustrates several translation challenges. For example, the functions contourc and sortrows are MATLAB toolbox functions to identify contours and sort matrix rows, respectively. These high-level functions are non-trivial to implement in C, and you must either find appropriate replacements or allocate time and resources to re-implement them. In addition, the sizes of vectors and matrices (e.g. contourc output) are not known until runtime. This adds an extra dimension of complexity to the translation task. For an embedded project, the challenges don't end once you have a first-cut C translation. Inevitably, you must refine the C to improve the algorithm performance and meet embedded software requirements.
For anything more than trivial changes, you must either modify and test in MATLAB, then propagate the changes to C, or make the changes directly in C and deal with an obsolete MATLAB implementation, plus the inconvenience of debugging and analysis outside MATLAB. Either way, the initial hand translation from MATLAB to C, plus subsequent revisions, incurs substantial labor and time. Figure 2 illustrates the process.
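To give a feel for what re-implementing even a simple toolbox function by hand involves, here is a rough sketch of a row sort in the spirit of sortrows. It is an assumption-laden illustration, not MATLAB's implementation and not code generated by MCS: the names compare_rows and sort_rows are hypothetical, the matrix is assumed to be stored row-major, only ascending lexicographic order is handled, and NaN values are not given any special treatment. An insertion sort is used so that no heap allocation is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Lexicographic comparison of two rows of ncols values each.
 * Returns a negative, zero, or positive value, like strcmp. */
static int compare_rows(const double *a, const double *b, size_t ncols)
{
    for (size_t j = 0; j < ncols; ++j) {
        if (a[j] < b[j]) return -1;
        if (a[j] > b[j]) return 1;
    }
    return 0;
}

/* Hypothetical sketch: sort the rows of a row-major nrows x ncols matrix
 * in place, in ascending lexicographic order, using insertion sort. */
void sort_rows(double *m, size_t nrows, size_t ncols)
{
    assert(m != NULL || nrows == 0);

    for (size_t i = 1; i < nrows; ++i) {
        size_t r = i;
        /* Move row r upwards until the row above is not greater. */
        while (r > 0 &&
               compare_rows(&m[r * ncols], &m[(r - 1) * ncols], ncols) < 0) {
            for (size_t j = 0; j < ncols; ++j) {
                double tmp = m[r * ncols + j];
                m[r * ncols + j] = m[(r - 1) * ncols + j];
                m[(r - 1) * ncols + j] = tmp;
            }
            --r;
        }
    }
}
```

Even this reduced version leaves out much of what the MATLAB function offers, such as sorting by selected columns, descending order, and returning the permutation index, which is part of why hand replacement of toolbox functions consumes time.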
An alternative to hand translation is automatic MATLAB-to-C translation. Agility developed MCS (MATLAB-C Synthesis) to convert plain MATLAB to C. Given the challenges of translation by hand, automatic generation of C from MATLAB offers clear advantages in terms of productivity. MCS supports most MATLAB language constructs, and translates a large number of commonly-used toolbox functions. With MCS, the lengthy first step of translating MATLAB to C takes substantially less time, as does code refinement. Let's look at how MCS accomplishes this feat.
To tackle the initial translation, you must address the fundamental gap between languages: C is a statically typed language, whereas MATLAB is dynamically typed. The first step is to assign types to longcontour's input arguments. You specify the types by annotating the MATLAB code with "must be" functions ("mbfunctions"). Consider the function declaration for longcontour:
where X is a matrix and contourLevels is a row vector. To specify the types, you insert corresponding mbfunctions at the top of longcontour.m.
X is a matrix of real values
contourLevels is a row vector of integers
With this information about the top-level input arguments, MCS can infer the types of all other variables in the MATLAB code. Note that you don't have to specify the exact dimensions of X or contourLevels.
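To make the static-typing gap concrete, the sketch below shows one way the two annotated inputs might be represented on the C side: the element type is fixed at compile time, while the dimensions travel with the data as run-time fields. This is an assumption for illustration only. The type names RealMatrix and IntRowVector and the accessor real_matrix_get are hypothetical, and nothing here is meant to describe the data structures MCS actually generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a matrix of real values whose size is known only at run time. */
typedef struct {
    size_t rows;
    size_t cols;
    const double *data;   /* rows * cols values, column-major as in MATLAB */
} RealMatrix;

/* Hypothetical: a row vector of integers of run-time length. */
typedef struct {
    size_t length;
    const int *data;
} IntRowVector;

/* Read element (r, c) using 0-based indices and column-major storage. */
double real_matrix_get(const RealMatrix *m, size_t r, size_t c)
{
    assert(m != NULL && m->data != NULL);
    assert(r < m->rows && c < m->cols);
    return m->data[c * m->rows + r];
}
```

The point of the sketch is the split it makes visible: "real" versus "integer" and "matrix" versus "row vector" must be settled before any C can be written, whereas the exact dimensions can remain open.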
You can view how MCS has assigned variable types with the Analyser GUI (shown below). The GUI provides useful feedback during the initial translation process and during code refinement.
Once MCS assigns types to all variables, translation is a matter of pushing the "C" button. In less than a minute, MCS generates 2811 lines of ANSI C for all functions in the call graph, including C for contourc and sortrows. MATLAB comments and code can be embedded in the C for easy cross-reference.
Once you have the C translation, you'll want assurance that it returns the same results as the original MATLAB. MCS automatically generates a MEX file from the C translation. The result is a C implementation which reads its inputs from MATLAB and returns its outputs to MATLAB. Thus, testing the C code is as easy as calling the function within MATLAB.
Refining the C Code for Embedded Requirements
Now that you have a first-cut C translation, you can iteratively refine the code to meet embedded requirements. Ever since language compilers first became available, programmers who are wary of machine-generated code have declared that "I could write faster/cleaner/better code by hand!" In the embedded industry, assembly language is still used (occasionally with justification). Demanding embedded programmers look for ways to improve their C code with respect to code size, performance, and memory usage. MCS allows you to quickly iterate through revisions to achieve all of these goals, while still maintaining the advantage of a MATLAB "golden master". In this article, we'll focus specifically on the topic of memory.
Memory is a common concern for embedded software, in terms of both data layout and run-time utilization. The longcontour algorithm provides a good illustration: it handles 2-D data and manipulates arrays of unknown length. Because longcontour processes 2-D data, the difference in how the two languages store data becomes significant: MATLAB arrays are column-major, while C arrays are row-major. This issue affects integration: imagine that as a C implementation, longcontour receives input from a sensor which generates row-major data. If translating by hand, you must either 1) buffer and transpose the sensor data, then apply a column-major implementation of longcontour, or 2) implement longcontour as row-major. The former solution's transpose incurs processing delays and requires more memory, while the latter solution requires the C developer to mentally "transpose the algorithm" during translation, thereby complicating the task and diverging from the MATLAB implementation. The figure above illustrates these choices. However, a single option in MCS lets you generate row-major or column-major code without any changes to the MATLAB. With minimal effort, you can test both approaches while maintaining a working MATLAB model.
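The two layouts, and the cost of the first hand-translation option, can be sketched in a few lines of C. The helper names idx_col_major, idx_row_major and row_to_col_major are hypothetical and chosen for this illustration; they are not part of MCS. The copy routine corresponds to option 1, where row-major sensor data is buffered and rearranged before a column-major algorithm runs.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (r, c), 0-based, in a column-major (MATLAB-style) buffer. */
size_t idx_col_major(size_t r, size_t c, size_t nrows)
{
    return c * nrows + r;
}

/* Offset of element (r, c), 0-based, in a row-major (C-style) buffer. */
size_t idx_row_major(size_t r, size_t c, size_t ncols)
{
    return r * ncols + c;
}

/* Option 1 from the text: copy a row-major nrows x ncols buffer into a
 * separate column-major buffer. Note the second buffer of equal size and
 * the full pass over the data, which are the extra memory and delay. */
void row_to_col_major(const double *src, double *dst,
                      size_t nrows, size_t ncols)
{
    assert(src != NULL && dst != NULL && src != dst);

    for (size_t r = 0; r < nrows; ++r) {
        for (size_t c = 0; c < ncols; ++c) {
            dst[idx_col_major(r, c, nrows)] = src[idx_row_major(r, c, ncols)];
        }
    }
}
```

Option 2 amounts to replacing every idx_col_major access in the algorithm with idx_row_major, and re-checking every loop order, which is where the mental "transposition" effort goes.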
Longcontour uses variable-size arrays: the number of elements is not known until runtime. Because of hardware limitations or reliability and safety concerns, embedded projects often forbid dynamic heap allocation. Unfortunately, since MATLAB arrays are not statically declared, dynamic heap allocation is the C translator's most direct route to manipulating variable-size arrays. To avoid this issue during hand translation, you either 1) rewrite the MATLAB to eliminate variable-size arrays, or 2) resolve the issue during translation. The former solution, while easing translation to C, effectively shackles much of MATLAB's expressiveness, while the latter introduces code complexity and renders the MATLAB implementation obsolete.
By default, MCS generates C with a combination of heap allocation, stack allocation, and static allocation. For variable-size arrays, MCS uses heap allocation. However, you can annotate variable-size arrays to specify a maximum number of elements at runtime; the Analyser GUI identifies variables representing variable-size arrays. Once the upper bound is specified, MCS will avoid dynamic allocation: variable-size arrays are "emulated" in a fixed-size array with additional code and variables to track the actual size. For fixed-size arrays, MCS gives you a great deal of control over how and where storage memory is allocated. For example, you can easily specify that all arrays be allocated statically, and thus avoid heap allocation entirely.
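The idea of emulating a variable-size array inside fixed storage can be sketched as follows. This is a generic illustration of the technique under stated assumptions, not the code MCS emits: the bound MAX_ELEMENTS, the type BoundedVec and the functions bvec_init and bvec_push are all hypothetical names, and the bound of 8 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ELEMENTS 8   /* hypothetical upper bound chosen by the developer */

/* A "variable-size" array held in fixed storage, with its actual
 * length tracked in a separate field. No heap allocation is involved. */
typedef struct {
    double data[MAX_ELEMENTS];
    size_t length;           /* number of valid elements, 0..MAX_ELEMENTS */
} BoundedVec;

void bvec_init(BoundedVec *v)
{
    assert(v != NULL);
    v->length = 0;
}

/* Append one value. Returns false, leaving the array unchanged,
 * if the declared upper bound would be exceeded. */
bool bvec_push(BoundedVec *v, double value)
{
    assert(v != NULL);
    if (v->length >= MAX_ELEMENTS) {
        return false;
    }
    v->data[v->length] = value;
    v->length += 1;
    return true;
}
```

A BoundedVec can be declared as a static or stack variable, so worst-case memory use is known at build time. The trade-off is that the bound must be chosen large enough for every input the system will see, and the overflow case has to be handled explicitly.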
Converting MATLAB to C for embedded software is more than just translation. In this article, we have seen that translation is just the first step in meeting embedded software requirements. The refinement and optimisations that occur after translation are equally important and often take more time than the original translation. Using an automation tool delivers a more predictable result, linking the MATLAB algorithm directly with the generation of embedded C code.
Agility MCS makes automatic MATLAB-to-C translation and refinement of embedded software a realistic approach. When translating MATLAB to C, using a development tool like MCS delivers significant gains in productivity over traditional manual translation.
ROBERT YU is Senior Applications Engineer, Agility Design Solutions