GPU-friendly Stroke Expansion (2024)

Raph Levien and Arman Uguray, Google, San Francisco, CA, USA

Abstract.

Vector graphics includes both filled and stroked paths as the main primitives. While there are many techniques for rendering filled paths on GPU, stroked paths have proved more elusive. This paper presents a technique for performing stroke expansion, namely the generation of the outline representing the stroke of the given input path. Stroke expansion is a global problem, with challenging constraints on continuity and correctness. Nonetheless, we implement it using a fully parallel algorithm suitable for execution in a GPU compute shader, with minimal preprocessing. The output of our method can be either line or circular arc segments, both of which are well suited to GPU rendering, and the number of segments is minimal. We introduce several novel techniques, including an encoding of vector graphics primitives suitable for parallel processing, and an Euler spiral based method for computing approximations to parallel curves and evolutes.

Vector Graphics, Stroke, Offset Curve, Path Rendering, GPU

CCS Concepts: Computing methodologies → Rendering; Computing methodologies → Parametric curve and surface models

1. Introduction

Stroke rendering is an essential part of the standard vector graphics imaging model. The standard representation of paths is a sequence of cubic Bézier segments. While there are a number of published techniques for GPU rendering of filled paths, stroke rendering is more challenging to parallelize, largely because path segments cannot be processed independently of each other; rendering of the joins between path segments depends on both adjoining segments, and the endpoints of paths (when not closed) are rendered with caps. Joins and caps can be rendered with a variety of styles within the same document. The style of stroke rendering also optionally includes dashing, which applies a pattern of dashes and gaps to the outline of the shape, as illustrated in Figure 1.

[Figure 1: Illustration of stroke styles applied to a source Bézier path (shown in black).]

A typical vector graphics document may have thousands of stroked paths. A map or CAD drawing may have tens of thousands of paths, possibly consisting of millions of path segments. At this scale, the computational cost of stroke rendering on a CPU may result in loss of interactive rendering performance. We believe the runtime performance can be improved significantly by offloading stroke rendering to a GPU. This requires that the algorithm be GPU-friendly: that it exploit the available parallelism, be work efficient, and avoid unnecessarily complex and divergent control flow. It must also remain numerically robust when computed with 32-bit floating point numbers.

There are a number of strategies for rendering strokes. Among them are local techniques that break the stroke down into individual closed primitives, and distance field techniques (or similar techniques based on point inclusion). In this paper, we focus on stroke expansion, the generation of an outline that, when filled, represents the rendered stroke. Implementing stroke rendering using stroke expansion enables a unified approach to rendering both filled and stroked paths downstream. Among other benefits, such an approach avoids a large number of draw calls when stroked and filled paths are finely interleaved, as is often the case.

Correctness is another challenging aspect of stroke rendering. We propose that the correctness of stroke outlines be divided into weak correctness and strong correctness. We define strong correctness as the computation of the outline of a line swept along the segment, maintaining normal orientation, combined with stroke caps and joins. Weak correctness, by contrast, only requires the parallel curves (also known as “offset curves”) of path segments, combined with caps and the outer contours of joins. Both Nehab (2020) and Kilgard (2020b) provide useful definitions of strongly correct stroke rendering, and Nehab in particular describes how to achieve it in stroke expansion. Briefly, when the path curvature exceeds the reciprocal of the stroke half-width, evolute segments are required in addition to the parallel curves, as well as inner contours of joins when segments are short. The technique described in this paper produces strongly correct outlines according to this definition. Examples of weakly and strongly correct rendering are shown in Figure 2.

[Figure 2: Two examples of strokes. The top left is missing an evolute section, making it appear that a half-disc has been carved out. The top right is missing inner joins, looking like spokes rather than a connected path. The bottom two examples are rendered with the full strokes.]

Correct stroke rendering also relies on accurate approximation of the parallel curves of the input paths. Determining the parallel curve of a cubic Bézier is not analytically tractable, as it is a tenth order algebraic formula[5]. Thus, all practical stroke rendering techniques employ an approximation, with error tolerance as a tunable parameter, typically representing the maximum allowable Fréchet distance between the true curve and its approximation. A typical value is 0.25 device pixel, as that is close to the threshold of visibility. A correct algorithm will produce an approximate curve within the error tolerance, but an efficient algorithm will not subdivide significantly more than is necessary to meet the error bound. Some specifications for vector graphics, including PDF[11], specify the error tolerance explicitly.

The subproblem of approximating parallel curves has spawned extensive literature, none of which is entirely satisfying, especially when it comes to algorithms that can be efficiently evaluated on GPU. An example of a reasonably good algorithm that computes flattened parallel curves is Yzerman (2020), as used in the Blend2D rendering library. For producing curved outlines, the Tiller and Hanson[24] algorithm is commonly implemented and cited, but its performance is quite poor when applied to cubic Béziers. The quadratic Bézier version is adequate, and forms the basis of the algorithm in Nehab (2020). A variant is also used in Skia[9]; instead of lowering the cubic Bézier to a quadratic approximation and then computing the parallel curve, Skia approximates the parallel curve directly with a quadratic Bézier, then measures the error to determine whether further subdivision is necessary.

Previous work on stroke rendering does not fully solve the problem of fast and correct stroking. Most existing implementations have correctness problems, in particular failing to draw the evolutes and inner joins as required for strong correctness; Nehab (2020) provides a detailed survey. In addition, most implementations are not suitable for running on GPU, requiring sequential execution to produce watertight outlines. One GPU implementation is polar stroking (Kilgard (2020b)), a local technique, but among other limitations, it does not bound the Fréchet distance error (rather, it bounds the angle error).

Our central approach to bounding the error is the use of error metrics that predict the Fréchet distance error. The error metrics presented in this paper are both straightforward to compute and invertible, which avoids excessive subdivision and also increases the amount of available parallelism to exploit. Previous approaches to error measurement generally follow a “cut then measure” approach, where the approximate curve is generated, then the error measured by sampling. Such sampling is often time consuming, involving iteration, and even then risks underestimating the error due to inadequate sampling.

Previous work, particularly Kilgard (2020b), has explained how to break the input path into dash segments of the specified length on GPU using a prefix sum of segment arc lengths. There are implementation tradeoffs to GPU-based dash processing, including potentially a larger number of dispatch stages. We anticipate that the potential performance benefit of implementing dashing on a GPU depends on the scene and other factors in the overall rendering pipeline. While we would expect further performance improvement from GPU-based dashing, we did not fully explore the solution space and we were able to achieve acceptable performance with dash processing on CPU.

This paper presents a solution to the core problem of stroke expansion, well suited to GPU implementation. It can produce either flattened polylines or an approximation consisting of circular arcs. The best choice depends on the capabilities of the path rendering mechanism following stroke expansion, but as we show, outlines consisting of circular arcs have many fewer segments and are correspondingly faster to produce. The algorithm is based on Euler spiral segments as an intermediate representation, with an iterative algorithm based on a straightforward error metric for conversion from cubic Béziers. We have also devised a compact binary encoding of paths, suitable for fully parallel computation of stroke outlines, while requiring minimal CPU-side processing. Our algorithm has been implemented in GPU compute shaders, integrated in a full rendering engine for vector graphics, and shows a dramatic speedup over CPU methods.

2. Flattening and arc approximation of curves

The core problem in stroke expansion is approximating the desired curve by segments of some other, usually simpler, curve. These segments must be within an error tolerance of the source curve, and there should ideally be close to a minimal number of them. We consider a number of source-to-target pairs, most importantly cubic Béziers to Euler spirals, and Euler spiral parallel curves to either lines or arcs.

There are generally three approaches to such curve approximation. The most straightforward but also least efficient is “cut then measure,” usually combined with adaptive subdivision. In this technique, a candidate approximate curve is produced, then the error is measured against the source curve, usually by sampling a number of points along both curves and determining a maximum error (or perhaps some other error norm). If the error is within tolerance, the approximation is accepted. Otherwise, the curve is subdivided (usually at $t = 0.5$) and each half is recursively approximated. A substantial fraction of all curve approximation methods in the literature are of this form, including Nehab (2020). The main disadvantage is the cost of computing the error metric. Another risk is underestimating the error due to inadequate sampling; this is a particular problem when the source curve contains a cusp.

The next approach is similar, but uses an error metric to estimate the error. Ideally such a metric is a closed-form computation rather than requiring iteration. A good error metric is conservative, yet tight, in that it never underestimates the error (which would allow results exceeding the error bound to slip through), and does not significantly overestimate the error, which would result in more subdivision than optimal.

By far the most efficient approach is an invertible error metric. In this approach, the error metric has an analytic inverse, or at least a good numerical approximation. Because the metric is invertible, it can predict the number of subdivisions needed, as well as the parameter value for each subdivision. If the error metric is accurate, then the approximation is near-optimal. One example of an invertible error metric is angle step, used in polar stroking (Kilgard (2020b)); the number of subdivisions is the total tangent angle traversed by the curve divided by the angle step size, and the parameter value for each subdivision is the result of solving for a tangent direction. Another widely used invertible error metric is Wang’s formula (Goldman (2003), Section 5.6.3), which gives a bound on the flattening error based on the second derivative of the curve. This metric is conservative but works well in practice; among other applications, it is used in Skia for path flattening. The chief limitation of Wang’s method is that it computes a subdivision bound for a Bézier curve without accounting for the displacement of the parallel curve due to stroking. When applied naively to generation of parallel curves, it can undershoot substantially, especially near cusps.
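As a concrete illustration of an invertible metric, the following Rust sketch computes the subdivision count given by Wang's formula for a cubic Bézier. The `Point` type and function names are ours, and the code is a generic statement of the formula rather than any particular renderer's implementation.

```rust
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

fn norm(p: Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}

/// Wang's formula for a cubic Bézier: a conservative count of line segments
/// such that the flattened polyline stays within `tol` of the curve. Note
/// that this bounds the error of the curve itself, not of its parallel
/// (offset) curve.
fn wang_subdivision_count(p0: Point, p1: Point, p2: Point, p3: Point, tol: f64) -> usize {
    // Second differences of the control polygon.
    let d0 = Point { x: p0.x - 2.0 * p1.x + p2.x, y: p0.y - 2.0 * p1.y + p2.y };
    let d1 = Point { x: p1.x - 2.0 * p2.x + p3.x, y: p1.y - 2.0 * p2.y + p3.y };
    let m = norm(d0).max(norm(d1));
    // For degree n = 3, the constant n * (n - 1) / 8 is 0.75.
    ((0.75 * m / tol).sqrt()).ceil().max(1.0) as usize
}
```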

2.1. Error metrics for flattening

The distance between a circular arc segment of length $s$ and its chord, with angle between arc and chord of $\theta$ (see Figure 3), is exactly $(1-\cos\theta)\frac{s}{2\theta}$. The curvature is $\kappa = \frac{2\theta}{s}$ (equivalently, $\theta = \frac{\kappa s}{2}$), and this remains constant even as the arc is subdivided. Rewriting, $d = (1 - \cos\frac{\kappa s}{2})\frac{1}{\kappa}$. From this, we can derive a precise, invertible error metric. Subdividing the arc into $n$ segments, the distance error for each segment is $\frac{1}{\kappa}(1 - \cos\frac{\kappa s}{2n})$. Solving for $n$, we get:

$$n = \frac{s\kappa}{2\cos^{-1}(1 - d\kappa)}$$

[Figure 3: A circular arc above its chord, labeled with the angle between arc and chord ($\theta$), the arc length ($s$), and the maximum distance from the arc to its chord ($d$).]

To flatten a finite arc, round up $n$ to the nearest integer. This will cause the error to decrease, so the result will still be within the error bounds.

Note that the number of subdivisions is proportional to the arc length. Another way of stating this relationship is that the subdivision density, the number of subdivisions per unit of arc length, is constant.

The error metric for flattening an arc is exact. It always yields the minimum number of subdivisions needed to flatten the curve, and the flattening error is the least possible given that number of subdivisions. For general curves, an exact error bound is not feasible, and we resort to an approximation. Again the circular arc provides a good example. Applying the small angle approximation $\cos\theta \approx 1 - \theta^2/2$, the approximate distance error is $d = \frac{\kappa s^2}{8n^2}$, and solving for $n$ we get $n = s\sqrt{\frac{\kappa}{8d}}$. Note that this estimate is conservative, in that it will always request more subdivision and thus produce a lower error than the exact metric.
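The following sketch (with names of our own choosing) computes the exact and approximate subdivision counts for flattening a circular arc, given its arc length, curvature, and tolerance:

```rust
/// Exact number of segments needed to flatten a circular arc of arc length
/// `s` and curvature `kappa` so the chord error stays below `d`.
/// Assumes d * kappa < 2, i.e. the tolerance is smaller than the diameter.
fn arc_segments_exact(s: f64, kappa: f64, d: f64) -> usize {
    let n = s * kappa / (2.0 * (1.0 - d * kappa).acos());
    n.ceil().max(1.0) as usize
}

/// Small-angle approximation: n = s * sqrt(kappa / (8 d)). Slightly
/// conservative relative to the exact formula.
fn arc_segments_approx(s: f64, kappa: f64, d: f64) -> usize {
    (s * (kappa / (8.0 * d)).sqrt()).ceil().max(1.0) as usize
}
```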

We are of course concerned with the flattening of more general curves (ultimately the parallel curve of a cubic Bézier), not simply circular arcs, so the curvature is not constant. An obvious approach to try would be to use the maximum curvature. This would be conservative with respect to error tolerance, but would also generate too much subdivision when the curve has a small region of high curvature. In the limit, the curve can contain a cusp of infinite curvature, which would entail infinite subdivision. Similarly, sampling the curvature at a single point can either underestimate or overestimate the error. The ideal approach would be a form of average curvature that is tractable to compute and accurately predicts the flattening error. To that end, we propose the following error metric to estimate the maximum distance between an arbitrary curve of length $\hat{s}$ and its chord.

$$d \approx \frac{1}{8}\left(\int_0^{\hat{s}} \sqrt{|\kappa(s)|}\,ds\right)^2$$

This formula is the same as the approximate error metric for circular arcs, except that instead of a constant curvature value, a norm-like average with an exponent of 1/2 is used (it is not considered a true norm because the triangle inequality does not hold). This particular formulation has two strong advantages. First, it has been empirically validated to accurately predict the distance between chord and curve for a variety of curves. In particular, we believe that when curvature is monotonic, it is a tight bound both above and below (and later we will see that in our algorithm it is always evaluated on curve segments with such monotonic curvature). Second, this formula lends itself to invertible error metrics.

This formula also has a meaningful interpretation: the quantity under the integral sign is the subdivision density, and represents the number of subdivisions per unit length of an optimal flattening as the error tolerance approaches zero. In particular, the number of subdivisions is:

$$n = \left(\int_0^{\hat{s}} \sqrt{|\kappa(s)|}\,ds\right)\sqrt{\frac{1}{8d}}$$

In addition, if the function represented by the integral is invertible, then the corresponding error metric is invertible. Evaluate the function to determine the number of subdivision points, then evenly divide the result, using the inverse of the function to map these values back into parameter values for the source curve being approximated.

2.2. Error metrics for flattening Euler spirals

We choose Euler spiral segments for our intermediate curve representation precisely because their simple formulation in terms of curvature (Cesàro equation) results in similarly simple subdivision density integrals.

An Euler spiral segment is defined by $\kappa(s) = as + b$, or alternatively $\kappa(s) = a(s - s_0)$, where $s_0 = -b/a$ is the location of the inflection point. Applying the above error metric, the subdivision density is simply $\sqrt{|a(s - s_0)|}$. The integral is $\frac{2}{3}\sqrt{a}\,(s - s_0)^{1.5}$, which is readily invertible.

An immediate consequence is that flattening an Euler spiral by choosing subdivision points $s_i = a \cdot i^{2/3}$ produces a near-optimal flattening, as visualized in Figure 4.
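As an illustrative sketch (names and structure are ours, and it assumes a nonzero curvature slope $a$), flattening an Euler spiral segment with this invertible metric amounts to evaluating the density integral, rounding up the subdivision count, and mapping evenly spaced values back through the closed-form inverse:

```rust
/// Arc-length positions of the subdivision points (endpoints included) for
/// flattening an Euler spiral segment with curvature kappa(s) = a * (s - s0)
/// over s in [0, len], to distance tolerance `tol`. Assumes a != 0.
fn euler_spiral_flatten_points(a: f64, s0: f64, len: f64, tol: f64) -> Vec<f64> {
    // Antiderivative of the subdivision density sqrt(|a (s - s0)|).
    let integral = |s: f64| {
        let u = s - s0;
        (2.0 / 3.0) * a.abs().sqrt() * u.abs().powf(1.5) * u.signum()
    };
    // Closed-form inverse of the antiderivative.
    let inv_integral = |x: f64| {
        s0 + (1.5 * x.abs() / a.abs().sqrt()).powf(2.0 / 3.0) * x.signum()
    };
    let i0 = integral(0.0);
    let i1 = integral(len);
    // Total subdivision count, per n = (integral of sqrt|kappa|) * sqrt(1 / (8 d)).
    let n = ((i1 - i0) * (1.0 / (8.0 * tol)).sqrt()).ceil().max(1.0);
    (0..=n as usize)
        .map(|i| inv_integral(i0 + (i1 - i0) * i as f64 / n))
        .collect()
}
```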

[Figure 4: An Euler spiral along with its flattened representation. The flattened curve has a consistent error, and the subdivision points are spaced according to a power law.]

3. Euler spirals and their parallel curves

It is common to approximate cubic Béziers with some intermediate curve representation more conducive to offsetting and flattening. A number of published solutions (Yzerman (2020), Nehab (2020)) use quadratic Béziers, as they are well suited to computation of parallel curves. Even so, this curve family has some disadvantages. For one, it cannot model an inflection point, so the source curve must be subdivided at inflection points.

Like these other approaches, we also use an intermediate curve, but our choice is an Euler spiral. In some ways it is similar to quadratic Béziers – it also has $O(n^4)$ scaling and is best computed using geometric Hermite interpolation – but differs in others. It has no difficulty modeling an inflection point. Further, its parallel curve has a particularly simple mathematical definition and clean behavior regarding cusps.

An Euler spiral segment is defined as having curvature linear in arc length.

The parallel curve of the Euler spiral (also known as the “clothoid”) was characterized by Wieleitner (1907) well over a hundred years ago, and has a straightforward closed-form Cesàro representation (curvature as a function of arc length) of the form $\kappa(s) = c_0(s - s_0)^{-1/2} + c_1$. Euler spirals as an intermediate curve representation for computing parallel curves were proposed in Levien (2021), which also expands on the Wieleitner reference (including an English translation) and the Cesàro equation.

4. Flattened parallel curves

The geometry of a stroke outline consists of joins, caps, and the two parallel curves on either side of the input path segments, offset by the half linewidth. The joins and caps are not particularly difficult to calculate, but parallel curves of cubic Béziers are notoriously tricky[12]. Analytically, it is a tenth order algebraic curve, which is not particularly feasible to compute directly.

Conceptually, generating a flattened stroke outline consists of computing the parallel curve of the input curve segment followed by flattening, the generation of a polyline that approximates the parallel curve with sufficient accuracy (which can be measured as Fréchet distance). However, these two stages can be fused for additional performance, obviating the need to store a representation of the intermediate curve.

Using a subpixel Fréchet distance bound guarantees that the rendered image does not deviate visibly from the exact rendering. Another choice would be uniform steps in tangent angle, as chosen by polar stroking[13]. However, at small curvature, the stroked path can be off by several pixels, and at large curvature there may be considerably more subdivision than needed for faithful rendering.

The limitation of the angle step error metric is shown in Figure 5. The top row shows the use of a distance-based error metric, as is used in our approach, which is visually consistent at varying curvature (in practice, a tolerance value of 0.25 of a pixel is below the threshold of perceptibility). The bottom row shows a consistent angle step, as implemented in polar stroking, which has excessive distance error at low curvature and excessive subdivision at high curvature. It should be noted that, to avoid the undershoot at low curvature, both the Skia[9] and Rive[10] renderers use a hybrid of the Wang and polar stroking error metrics.

[Figure 5: Four stroked arcs with different amounts of subdivision. The top row shows a low-curvature and a high-curvature arc flattened with a consistent distance error; the bottom row shows the same arcs flattened with a consistent angle step.]

4.1. The subdivision density integral

The subdivision density for the parallel curve of an Euler spiral, normalized so that its inflection point is at $-1$ and the cusp of the parallel curve is at $1$, is simply $\sqrt{|1 - s^2|}$. This function is plotted in Figure 6.

[Figure 6: A graph of the subdivision density. The range from −1 to 1 is a semicircle, and the value beyond 1 rises sharply from zero toward an asymptote of unit slope.]

The subdivision density integral for the parallel curve of an Euler spiral is given as follows:

$$f(x) = \int_0^x \sqrt{|u^2 - 1|}\,du$$

This integral has a closed-form analytic solution:

$$f(x) = \begin{cases} \frac{1}{2}\left(x\sqrt{|x^2 - 1|} + \sin^{-1}x\right) & \text{if } |x| \leq 1 \\ \frac{1}{2}\left(x\sqrt{|x^2 - 1|} - \cosh^{-1}x + \frac{\pi}{4}\right) & \text{if } x \geq 1 \end{cases}$$

Values for $x < -1$ follow from the odd symmetry of the function.

4.2. Approximation of the subdivision density integral

The subdivision density integral (Section 4.1) is fairly straightforward to compute in the forward direction, but not invertible using a straightforward closed-form equation. Numerical techniques are possible, but require multiple iterations to achieve sufficient accuracy, so are slower. In this subsection, we present a straightforward and accurate approximation, constructed piecewise from easily invertible functions. This approximation was found empirically, by interactively tuning candidate approximation functions in the Desmos graphing calculator. The exact integral and the approximation are shown in Figure 7. Visually, it is clear that the agreement is close. We also built a test suite (included in our supplemental materials) to exhaustively test the subdivision count using a property testing approach, finding that the worst case discrepancy between approximate and exact results is 6%. If higher flattening quality is desired at the expense of slower computation, this approximation could be used to determine a good initial value for numeric techniques; two iterations of Newton solving are enough to refine this guess to within 32-bit floating point accuracy.

The approximation is as follows:

$$f_{\mathit{approx}}(x) = \begin{cases} \frac{\sin(c_1 x)}{c_1} & \text{if } x < 0.8 \\ \frac{\sqrt{8}}{3}(x - 1)^{1.5} + \frac{\pi}{4} & \text{if } 0.8 \leq x < 1.25 \\ 0.6406\,x^2 - 0.81\,x + c_2 & \text{if } 1.25 \leq x < 2.1 \\ 0.5\,x^2 - 0.156\,x + c_3 & \text{if } x \geq 2.1 \end{cases}$$

$$c_1 = 1.0976991822760038, \quad c_2 = 0.9148117935952064, \quad c_3 = 0.16145779359520596$$

The primary rationale for the constants is for the approximation to be continuous. The other parameters were determined empirically; further automated optimization is possible but is unlikely to result in dramatic improvement. Further, this approximation is given for positive values. Negative values follow by symmetry, as the function is odd.
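A sketch of the forward approximation in Rust follows. The constants are those listed above, the function name is ours, and the $(x-1)^{1.5}$ term is interpreted as a signed power so that it remains real for $0.8 \leq x < 1$; this interpretation is an assumption on our part. Each branch (a sine, a 3/2 power, and a quadratic) can be inverted in closed form, which is what makes the overall metric invertible.

```rust
use std::f64::consts::FRAC_PI_4;

const C1: f64 = 1.0976991822760038;
const C2: f64 = 0.9148117935952064;
const C3: f64 = 0.16145779359520596;

/// Piecewise approximation of the subdivision density integral for the
/// parallel curve of an Euler spiral. Negative arguments are handled by
/// the odd symmetry of the function.
fn espc_int_approx(x: f64) -> f64 {
    let y = x.abs();
    let a = if y < 0.8 {
        (C1 * y).sin() / C1
    } else if y < 1.25 {
        // Signed 1.5 power so this branch stays real for y slightly below 1.
        let u = y - 1.0;
        (8.0f64).sqrt() / 3.0 * u * u.abs().sqrt() + FRAC_PI_4
    } else if y < 2.1 {
        0.6406 * y * y - 0.81 * y + C2
    } else {
        0.5 * y * y - 0.156 * y + C3
    };
    a.copysign(x)
}
```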

[Figure 7: A graph of the subdivision density integral and its approximation. It is a wiggly diagonal line, with flat regions near −1 and 1; the approximation is nearly indistinguishable from the exact value.]

5. Error metrics for approximation by arcs

The problem of approximating a curve by a sequence of arc segments has extensive literature, but none of the published solutions are quite suitable for our application. The specific problem of approximating an Euler spiral by arcs is considered in Meek and Walton (2004) using a “cut then measure” adaptive subdivision scheme, but their solution has poor quality; it scales as $O(1/n^2)$, while $O(1/n^3)$ is attainable. The result was improved “slightly” by Narayan (2014). The literature also contains optimal results, namely Maier (2014) and Nuntawisuttiwong and Dejdumrong (2021), but at considerable cost; both approaches claim $O(n^2)$ time complexity. The through-line for all these results is that they are solving a harder problem: adopting the constraint that the generated arc sequence is $G^1$ continuous. While desirable for many applications, this constraint is not needed for rendering a stroke outline. Even with this constraint relaxed, the angle discontinuities of an arc approximation are tiny compared to flattening to lines.

Our approach is based on a simple error metric, similar in flavor to the one for flattening to line segments. The details of the metric (in particular, tuning of constants) were obtained empirically, though we suspect that more rigorous analytic bounds could be obtained. In practice it works very well indeed; the best way to observe that is an interactive testing tool, which is provided in the supplemental materials.

The proposed error metric is as follows. The estimated distance error for a curve of length $\hat{s}$ is:

$$d \approx \frac{1}{120}\left(\int_0^{\hat{s}} \sqrt[3]{|\kappa'(s)|}\,ds\right)^3$$

For an Euler spiral segment, $\kappa'(s)$ is constant and thus this error metric becomes nearly trivial. With $n$ subdivisions, the estimated distance is simply $\frac{s^3\kappa'}{120n^3}$. Solving for $n$, we get $n = s\sqrt[3]{\frac{|\kappa'|}{120d}}$ subdivisions, and those are divided evenly by arc length, as the subdivision density is constant across the curve, just as is the case for flattening arcs to lines.

Remarkably, the approximation of an Euler spiral parallel curve by arc segments is almost as simple as that for Euler spirals to arcs. As in flattening to lines, the parameter for the curve is the arc length of the originating Euler spiral. The subdivision density is then constant, and only a small tweak is needed to the formula for computing the number of subdivisions, taking into account the additional curvature variation from the offset by $h$ (the half line-width). The revised formula is:

$$n = s\sqrt[3]{\frac{|\kappa'|\,(1 + 0.4\,|hs\kappa'|)}{120d}}$$

This formula was determined empirically by curve-fitting measured error values from approximating Euler spiral parallel curves to arcs, but was also inspired by applying the general error metric formula to the analytical equations for Euler spiral parallel curve, and dropping higher order terms. A more rigorous derivation, ideally with firm error bounds, remains as future work.

One consequence of this formula is that, since the error is in terms of the absolute value of $h$, independent of sign, the same arc approximation can be used for both sides of a stroke.
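A minimal sketch of the resulting subdivision count, with parameter names of our own choosing ($\kappa'$ as `k1`, the offset as `h`):

```rust
/// Number of arc segments for approximating the parallel curve of an Euler
/// spiral segment: `k1` is the (constant) curvature derivative kappa', `s`
/// the segment arc length, `h` the signed offset (half line-width), and
/// `tol` the distance tolerance.
fn es_parallel_arc_count(k1: f64, s: f64, h: f64, tol: f64) -> usize {
    let n = s * ((k1.abs() * (1.0 + 0.4 * (h * s * k1).abs())) / (120.0 * tol)).cbrt();
    n.ceil().max(1.0) as usize
}
```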

See Figure 8 for a comparison between flattening to a polyline and approximation with arc segments. The arc segment version has many fewer segments at the same tolerance, while preserving very high visual quality.

[Figure 8: Comparison of flattening to line segments and approximation by arc segments at the same tolerance.]

6. Evolutes

In the principled, correct specification for stroking[20], parallel curves are sufficient only for segments in which the curvature does not exceed the reciprocal of the half-width. When it does, additional segments must be drawn, including evolutes of the original curve. In general, the evolute of a cubic Bézier is a very complex curve, requiring approximation techniques. By contrast, the evolute of an Euler spiral ($\kappa = as$) is another spiral with a simple Cesàro equation, namely $\kappa = -a^{-1}s^{-3}$, an instance of the general result that the evolute of a log-aesthetic curve is another log-aesthetic curve[27].

Flattening this evolute is also straightforward; the subdivision density is proportional to $s^{-0.5}$, where $s$ is the arc length parameter of the underlying Euler spiral (translated so that $s = 0$ is the inflection point). Thus, the integral is $2\sqrt{s}$, and the inverse of the integral is just squaring. Flattening the evolute of an Euler spiral is therefore simpler than flattening its parallel curve.
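A sketch of evolute flattening under these formulas; the `density_scale` factor stands in for the constant of proportionality (which depends on the spiral parameters and the tolerance) and is a placeholder, not a value from this paper:

```rust
/// Subdivision points (in the arc-length parameter of the underlying Euler
/// spiral, measured from the inflection point) for flattening an evolute
/// segment over [s_start, s_end], with 0 < s_start <= s_end.
/// `density_scale` is a placeholder for the constant of proportionality.
fn evolute_flatten_points(s_start: f64, s_end: f64, density_scale: f64) -> Vec<f64> {
    // The density is proportional to s^{-1/2}; its integral is 2 sqrt(s),
    // and the inverse of the integral is squaring.
    let i0 = 2.0 * s_start.sqrt();
    let i1 = 2.0 * s_end.sqrt();
    let n = (density_scale * (i1 - i0)).ceil().max(1.0);
    (0..=n as usize)
        .map(|i| {
            let u = i0 + (i1 - i0) * i as f64 / n;
            (u * 0.5) * (u * 0.5)
        })
        .collect()
}
```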

[Figure 9: Two rows of rendered strokes, the top being weakly correct (drawing only parallel curves), the bottom being strongly correct, with additional evolute segments.]

The effect of adding evolutes to achieve strong correctness is shown in Figure 9. The additional evolute segments and connecting lines are output twice, to make the winding numbers consistent and produce a watertight outline. All winding numbers are positive, so rendering with the nonzero winding rule yields a correct final render.

7. Conversion from cubic Béziers to Euler spirals

The Euler spiral segment representation of a curve is useful for computing near-optimal flattened parallel curves, but standard APIs and document formats overwhelmingly prefer cubic Béziers as the path representation.

Many techniques for stroke expansion described in the literature apply some lowering of cubic Bézier curves to a simpler curve type that is more tractable for evaluating the parallel curve. Computing parallel curves directly on cubic Bézier curve segments is not very tractable. In particular, the widely cited Tiller-Hanson algorithm[24] performs well for quadratic Béziers but significantly worse for cubics.

A typical pattern for converting from one curve type to another is adaptive subdivision. An approximate curve is found in the parameter space of the target curve family. The error of the approximation is measured. If the error exceeds the specified tolerance, the curve is subdivided (typically at $t = 0.5$), otherwise the approximation is accepted. Subdivisions are also indicated at special points; for example, since quadratic Béziers cannot represent inflection points, and geometric Hermite interpolation is numerically unstable if the input curve is not convex, lowering to quadratic Béziers also requires calculation of inflection points and subdividing there. A good example of this pattern is Nehab (2020). One advantage of Euler spirals over quadratic Béziers is that they can represent inflection points directly, so it is not necessary to solve for the inflection points or do additional subdivision. Avoiding these steps alleviates the need for conditional logic and case analysis, which makes the algorithm more amenable to GPU execution.

The approach in this paper is another variant of adaptive subdivision, with two twists. First, it’s not necessary to actually generate the approximate curve to measure the error. Rather, a straightforward closed-form formula accurately predicts it. The second twist is that, since compute shader languages on GPUs typically don’t support recursion, the stack is represented explicitly and the conceptual recursion is represented as iterative control flow. This is an entirely standard technique, but with a clever encoding the entire state of the stack can be represented in two words, each level of the stack requiring a mere single bit.

7.1. Error prediction

A key step in approximating one curve with another is evaluating the error of the approximation. A common approach (used in Nehab (2020) among others) is to generate the approximate curve, then measure the distance, often by sampling at multiple points on the curve. All this is potentially slow, with the additional risk of underestimating the error due to undersampling.

Our approach is different. In short, we perform a straightforward analytical computation to accurately estimate the error. Our approach to the error metric has two major facets. First, we obtain a secondary cubic Bézier which is a very good fit to the Euler spiral; then we estimate the distance between that and the source cubic. By the triangle inequality, the sum of these two distances is a conservative estimate of the true Fréchet distance between the source cubic and the Euler spiral.

For mathematical convenience, the error estimation is done with the chord normalized to unit distance; the actual error is scaled linearly by the actual chord length.

The cubic Bézier approximating the Euler spiral is one in which the distance of each control point from the adjacent endpoint is $d = \frac{2}{3(1 + \cos\theta)}$, where $\theta$ is the angle of the endpoint tangent relative to the chord. This is a generalization of the standard approximation of an arc. It should be noted that this is not in general the closest possible fit, but it is computationally tractable and has near-uniform parametrization. In general it has quintic scaling; if an Euler spiral segment is divided in half, the error of this cubic fit decreases by a factor of 32. A good estimate of this error is:

$$4.6255 \times 10^{-6}\,|\theta_0 + \theta_1|^5 + 7.5 \times 10^{-3}\,|\theta_0 + \theta_1|^2\,|\theta_0 - \theta_1|$$

The distance between two cubic Bézier segments can be further broken down into two terms; in most cases, the difference in area accurately predicts the Fréchet distance between the curves. Two cubic Béziers with the same area and same tangents tend to be relatively close to each other, but there is some error stemming from the difference in parametrization. Area of a cubic Bézier segment is straightforward to compute using Green’s theorem:

$$a = \tfrac{3}{20}\left(2d_0\sin\theta_0 + 2d_1\sin\theta_1 - d_0 d_1\sin(\theta_0 + \theta_1)\right)$$

The conservative Fréchet distance estimate is 1.55 times the absolute difference in area between the source cubic and the Euler spiral approximation. The final term, accounting for imbalance in parametrization, is as follows, with $\bar{d}_0$ and $\bar{d}_1$ representing the control point distances from the endpoints of the Euler spiral approximation, and $d_0$ and $d_1$ the corresponding distances in the source cubic segment, as in the area calculation above:

$$\left(0.005\,|\theta_0 + \theta_1| + 0.07\,|\theta_0 - \theta_1|\right)\sqrt{(\bar{d}_0 - d_0)^2 + (\bar{d}_1 - d_1)^2}$$

The total estimated error is the sum of these three terms. We have validated this error metric in randomized testing. For values of $\theta$ between $0$ and $0.5$, and values of $d$ between $0$ and $0.6$, this estimate is always conservative, and it is also tight: over Bézier curves generated randomly with parameters drawn from a uniform distribution in this range, the mean ratio of estimated to true error is 1.656.
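The following sketch assembles the three terms for a chord normalized to unit length; the function name is ours, and the sign conventions are illustrative rather than taken verbatim from our implementation:

```rust
/// Estimated Fréchet distance between a source cubic Bézier (unit chord,
/// endpoint tangent angles th0 and th1, control-arm lengths d0 and d1) and
/// the Euler spiral segment with the same endpoint tangents. Scale the
/// result by the actual chord length.
fn cubic_es_error_estimate(th0: f64, th1: f64, d0: f64, d1: f64) -> f64 {
    // Signed area of a cubic with unit chord, via Green's theorem.
    let area = |a0: f64, a1: f64| {
        (3.0 / 20.0) * (2.0 * a0 * th0.sin() + 2.0 * a1 * th1.sin()
            - a0 * a1 * (th0 + th1).sin())
    };
    // Control-arm lengths of the cubic that closely fits the Euler spiral.
    let d0_es = 2.0 / (3.0 * (1.0 + th0.cos()));
    let d1_es = 2.0 / (3.0 * (1.0 + th1.cos()));
    // Term 1: error of that cubic fit to the Euler spiral itself.
    let fit = 4.6255e-6 * (th0 + th1).abs().powi(5)
        + 7.5e-3 * (th0 + th1).abs().powi(2) * (th0 - th1).abs();
    // Term 2: 1.55 times the absolute area difference between the two cubics.
    let area_err = 1.55 * (area(d0, d1) - area(d0_es, d1_es)).abs();
    // Term 3: parametrization imbalance.
    let imbalance = (0.005 * (th0 + th1).abs() + 0.07 * (th0 - th1).abs())
        * ((d0_es - d0).powi(2) + (d1_es - d1).powi(2)).sqrt();
    fit + area_err + imbalance
}
```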

A visualization of the combined error metric is shown in Figure 10, comparing measured and approximate error for a slice of the parameter space, fixing the endpoint angles to 0.1 and 0.2, and varying the distance from both endpoints to the control point.

[Figure 10: Measured and approximate error over a slice of the parameter space (two panels).]

7.2. Geometric Hermite interpolation

Given tangent angles relative to the chord, finding the Euler spiral segment that minimizes total curvature variation is a form of geometric Hermite interpolation. There are a number of published solutions to this problem, involving nontrivial numerical solving techniques: gradient descent[14], bisection[25], or Newton iteration[4]. A more direct approach is to approximate the function as a reasonably low-order polynomial in terms of the endpoint angles. Our approach is to use a 7th order polynomial, which is more precise than the 3rd order polynomial proposed in Reif and Weinmann (2021), at only modestly increased cost. The exact coefficients used are presented in Appendix A.

7.3. Cusp handling

There are two types of cusps that must be handled in the stroke expansion problem. One is when the input cubic Bézier contains a cusp (or near-cusp, which is prone to numerical robustness issues), and the other is when the parallel curve contains cusps. This section describes both in order.

A Bézier curve is expressed in parametric form, and its derivative with respect to the parameter can be zero or nearly so, causing serious numerical problems for many stroke expansion algorithms. In the limit, as the Bézier curve describes a semi-cubical parabola, the curvature can become infinite.

Our approach leverages the conversion to Euler spiral segments, which have finite curvature. The geometric Hermite interpolation depends on accurate tangents at the endpoints, which in turn are derived from derivatives. The tangents are not well defined when the derivative is zero, and are not numerically stable when near-zero. Thus, the algorithm has one major mechanism to deal with numerical robustness in these cases: when sampling the cubic Bézier, if the derivative is near zero (as determined by testing against an epsilon threshold), then the derivative is sampled at a slightly perturbed parameter value.
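A sketch of this mechanism, assuming a de Casteljau evaluation; the epsilon thresholds and names are illustrative, not the exact constants used in our shader:

```rust
#[derive(Clone, Copy)]
struct Vec2 {
    x: f64,
    y: f64,
}

struct Cubic {
    p0: Vec2,
    p1: Vec2,
    p2: Vec2,
    p3: Vec2,
}

fn lerp(a: Vec2, b: Vec2, t: f64) -> Vec2 {
    Vec2 { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t }
}

/// Evaluate a cubic Bézier and its derivative at `t` via de Casteljau. If
/// the derivative is near zero (a cusp or near-cusp), the derivative is
/// re-sampled at a slightly perturbed parameter so the tangent stays well
/// defined.
fn eval_cubic_and_deriv(c: &Cubic, t: f64) -> (Vec2, Vec2) {
    let eval = |t: f64| {
        let ab = lerp(c.p0, c.p1, t);
        let bc = lerp(c.p1, c.p2, t);
        let cd = lerp(c.p2, c.p3, t);
        let abc = lerp(ab, bc, t);
        let bcd = lerp(bc, cd, t);
        let p = lerp(abc, bcd, t);
        // The derivative of the cubic is 3 * (bcd - abc).
        let q = Vec2 { x: 3.0 * (bcd.x - abc.x), y: 3.0 * (bcd.y - abc.y) };
        (p, q)
    };
    let (p, q) = eval(t);
    if q.x * q.x + q.y * q.y < 1e-12 {
        // Nudge the parameter away from the cusp to get a usable tangent.
        let t2 = if t < 0.5 { t + 1e-4 } else { t - 1e-4 };
        let (_, q2) = eval(t2);
        return (p, q2);
    }
    (p, q)
}
```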

In practice, when rendering a cubic Bézier with a cusp, the region near the cusp is rendered with one Euler spiral segment approximately in the shape of the top of a question mark, as shown in Figure 11. Its parallel curve is well defined and has the correct shape for the outline of the stroke, given of course that the distance is within tolerance (as enforced by the error metric).

[Figure 11: A cubic Bézier with a cusp is approximated by Euler spiral segments with finite curvature, including one with a question mark shape. The resulting stroke outlines have the correct shape.]

[Figure: A selection of stroked cubic Béziers exhibiting strong and weak correctness.]

As recommended in both Nehab (2020) and Kilgard (2020b), the rendering of a cusp with infinite curvature matches that of a near-cusp. The code to detect near-zero derivatives and re-evaluate is a small amount of non-divergent logic, unlike the additional “regularization” pass proposed in Section 3.1 of Nehab (2020) to handle special cases.

7.3.1. Cusps in parallel curves

Even when the input curve is smooth, its parallel curve contains a cusp when the (signed) curvature equals the reciprocal of the offset distance (the half-width of the stroke). In published techniques, dealing with these cusps is a nontrivial effort, and involves numerical methods that are not GPU friendly. In particular, detecting locations in the cubic Bézier where the curvature crosses a given quantity requires solving a medium-degree polynomial, and in general requires numerical techniques for root finding. In the approach of Nehab (2020), the root finder not only requires a hybrid Newton/bisection method, which requires iteration, but is also recursive in that it uses the roots of a polynomial of one degree lower as a subroutine. In general, polynomials up to cubic can be considered GPU-friendly, while finding cusps in cubic Béziers requires a higher degree.

In the simplest case, weak stroke correctness with line segments as the output primitive, no additional work is needed – the flattening algorithm for Euler spiral parallel curves will naturally generate a point within the given error tolerance of the true cusp. However, generation of arc segments and drawing the evolutes needed for strong correctness both require subdivision at the parallel curve cusps.

Fortunately, finding the cusp in the parallel curve of an Euler spiral is a simple linear equation, and there is at most one cusp in any such segment. Euler spirals, serving as a piecewise linear approximation to curvature, are thus a good representation for determining cusps.
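A sketch of that computation, mirroring the cusp test in Algorithm 2; the names and the normalization by chord length follow the pseudocode, and the exact form in our shader may differ:

```rust
/// Location (as a fraction of the segment's arc length) of the cusp in the
/// parallel curve of an Euler spiral segment at offset `offset`, or None if
/// there is no cusp inside the segment. `k0` and `k1` are the curvatures at
/// the endpoints (curvature is linear in between), and `chord_len` matches
/// the normalization used in Algorithm 2.
fn parallel_curve_cusp(k0: f64, k1: f64, offset: f64, chord_len: f64) -> Option<f64> {
    let cusp0 = k0 * offset + chord_len;
    let cusp1 = k1 * offset + chord_len;
    if cusp0 * cusp1 >= 0.0 {
        // No sign change: no cusp strictly inside this segment.
        None
    } else {
        // Curvature is linear in arc length, so the crossing is found by a
        // single linear interpolation.
        Some(cusp0 / (cusp0 - cusp1))
    }
}
```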

8. GPU Implementation

We applied the techniques outlined in Section 7 to implement a data-parallel algorithm that can convert stroked 2D Bézier paths into flat geometry suitable for GPU rendering. We focused primarily on the global stroke-to-fill conversion problem in order to generate polygonal outlines for rasterization. Since the Euler spiral approximation can fit any source cubic Bézier and its parallel curves, our implementation can be used to render both filled and stroked primitives within a satisfactory error tolerance.

We aimed to satisfy the following criteria:

  1. The implementation must be able to handle a large number of inputs at real-time rates.

  2. The stroke outlines must meet the strong correctness definition.

  3. The implementation must support the standard SVG stroke end cap (butt, square, round) and join styles (bevel, miter, round).

  4. The CPU must do no work beyond the basic preparation of the input path data for GPU consumption, in order to maximize the exploited parallelism.

One of our artifacts is a GPU compute shader that satisfies these criteria. We coupled this with a simple and efficient data layout for parallel processing. Notably, our implementation is fully data-parallel over input path segments and does not require any computationally expensive curve evaluation on the CPU. We implemented our shader using general-purpose GPU compute primitives supported by all modern graphics APIs, particularly Metal, Vulkan, D3D12, and WebGPU[3]. As such, the ideas presented in this section are portable to various GPU platforms.

The rest of this section describes the design of our GPU pipeline and input encoding scheme.

8.1. Pipeline Design

An SVG path consists of a sequence of instructions (or “verbs”). The lineto and curveto instructions denote an individual path segment consisting of either a straight line or a cubic Bézier. The moveto instruction is used to begin a new subpath and the closepath instruction can be used to create a closed contour (by inserting a line connecting the current endpoint to the start of the subpath). A subpath must form a closed contour if painted as a fill. When painted as a stroke, each subpath must begin and end with a cap according to the desired cap style. In addition, each path segment in a stroked subpath must be connected to adjacent path segments of the same subpath with a join. For any given path, the cap and join styles do not vary across subpaths or individual path segments.

The standard organizational primitive for a compute shader is the workgroup. A workgroup can be organized as a 1, 2, or 3-dimensional grid of individual threads that can cooperate using shared memory. Multiple workgroups can be dispatched as part of a larger global grid, also of 1, 2, or 3 dimensions.

Our compute shader is arranged as a 1-dimensional grid, parallelized over input path segments. We chose a workgroup size of 256 as it works well over a wide range of GPUs. However, this choice of workgroup size is not particularly important for our algorithm as it does not require any cross-thread communication. Each thread in the global grid is assigned to process a single path segment. We encode the individual path segments contiguously in a GPU storage buffer such that each element has 1:1 correspondence to a global thread ID in the dispatch. This layout easily scales to hundreds of thousands of input path segments, which can be distributed across any number of paths and subpaths. We submit the kernel to the GPU as a single dispatch with as many workgroups as needed to handle the entire input.
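A sketch of the dispatch bookkeeping under these assumptions (the constant and function names are ours; the actual dispatch call depends on the graphics API in use):

```rust
/// One thread per path segment, 256 threads per workgroup, 1-D dispatch.
const WORKGROUP_SIZE: u32 = 256;

/// Number of workgroups to dispatch for a given number of encoded path
/// segments; excess threads in the last workgroup simply exit early.
fn workgroup_count(num_path_segments: u32) -> u32 {
    (num_path_segments + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE
}
```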

[Figure 13: Per-thread outputs of the stroke expansion kernel.]

Algorithm 1: Per-segment stroke expansion kernel.

    tid ← global invocation ID
    (style, segment) ← readScene(tid)
    if style is fill then
        // Output source curve (offset = 0)
        flatten_cubic(segment, 0)
    else
        // Handle stroked segment
        if segment is stroke cap marker then
            if path is open then
                output start cap
            else
                output nothing
            end if
        else
            // Output parallel curves
            flatten_cubic(segment, style.offset)
            flatten_cubic(segment, -style.offset)
            neighbor ← readScene(tid + 1)
            if neighbor is not stroke cap marker OR path is closed then
                output join
            else
                output end cap
            end if
        end if
    end if

Each thread outputs a polyline that fits a subset of the rendered subpath’s outer contour, represented by the path segment (see Figure 13). For a filled primitive, a thread only outputs the flattened approximation of the path segment. For a stroke, a thread outputs the polylines approximating multiple curves: a) the two parallel curves on both sides of the source segment, offset by the desired stroke half-width, b) the joins that connect the path segment to the next adjacent segment, and c) evolute patches in regions of high curvature. We also store metadata in our input encoding that marks the position of a path segment within the containing subpath. We use this information to determine whether to output an end cap instead of a join. We encode one additional segment designated as the stroke cap marker for every subpath. The thread assigned to the marker segment only outputs a start cap, which completes the stroke outline.

The joins between adjacent segments need to form a continuous outline when combined with the segments. This requires that a thread have access to the tangent vector at the end of its assigned segment and the start tangent of the next adjacent segment. We require that the input segments within a subpath be stored in order, which reduces access to the adjacent segment to a single buffer read at the next array offset.

Algorithm 2: flatten_cubic.

fn flatten_cubic(cubic: Cubic, offset: f32)
    TOL ← Euler spiral fit tolerance
    // 't0_u' and 'dt' track the recursion stack
    t0_u ← 0
    dt ← 1.0
    last_p ← cubic.p0
    last_q ← cubic.p1 - cubic.p0
    loop
        t0 ← f32(t0_u) * dt
        if t0 == 1.0 then
            break
        end if
        t1 ← t0 + dt
        (next_p, next_q) ← eval_cubic_and_deriv(cubic, t1)
        // Estimate error of geometric Hermite interpolation to Euler spiral
        (th0, th1, chord_len, err) ← fit_cubic_to_es(last_p, next_p, last_q, next_q, dt)
        if err <= TOL then
            es ← euler_segment(th0, th1, last_p, next_p)
            // Find location of any cusp in outline.
            // If 0 <= t < 1, draw evolute at 't'.
            cusp0 ← es_curvature(es, 0) * offset + chord_len
            cusp1 ← es_curvature(es, 1) * offset + chord_len
            t ← cusp0 * cusp1 >= 0 ? 1 : cusp0 / (cusp0 - cusp1)
            if cusp0 >= 0 then
                evolute_finalize()
                es_seg_flatten_offset(es, offset, vec2(0, t))
            end if
            if cusp0 < 0 OR t < 1 then
                evolute_start(es, offset)
                range ← cusp0 >= 0 ? vec2(t, 1) : vec2(0, t)
                es_seg_flatten_evolute(es, range)
                es_seg_flatten_offset_reverse(es, offset, range)
            end if
            if cusp0 < 0 AND t < 1 then
                evolute_finalize()
                es_seg_flatten_offset(es, offset, vec2(t, 1))
            end if
            // Pop the stack
            t0_u ← t0_u + 1
            shift ← countTrailingZeros(t0_u)
            t0_u ← t0_u >> shift
            dt ← dt * f32(1 << shift)
            last_p ← next_p
            last_q ← next_q
        else
            // Error is above the tolerance.
            // Push the stack; continue subdivision
            t0_u ← t0_u * 2
            dt ← dt * 0.5
        end if
    end loop
end

All input segments are first converted to cubic Béziers by degree elevation. Our flattening routine (see Algorithm 2) starts by computing the error estimate for geometric Hermite interpolation from the cubic Bézier to an Euler spiral segment. If the error exceeds the desired tolerance threshold, the cubic segment is iteratively subdivided in half and a new error estimate is computed. Once the error satisfies the tolerance, the Euler spiral segment is directly flattened to a polyline by invoking a subroutine parameterized by the stroke half-width (see es_seg_flatten_offset in Algorithm 2). For a regular fill, this parameter is set to 0. For strokes, the sign of the parameter determines the position of the parallel curve relative to the source segment as well as the winding direction of the output lines.
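Degree elevation of lower-order segments is exact; for example, a quadratic Bézier maps to a cubic whose inner control points lie two thirds of the way toward the quadratic’s control point. The sketch below is a minimal Rust illustration of this standard rule, with a hypothetical Point type rather than our actual data structures.

    #[derive(Clone, Copy)]
    struct Point { x: f64, y: f64 }

    fn lerp(a: Point, b: Point, t: f64) -> Point {
        Point { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t }
    }

    /// Exact degree elevation of a quadratic Bézier (p0, p1, p2) to a cubic.
    /// A line can likewise be elevated by placing the inner control points
    /// one third and two thirds of the way along the chord.
    fn quad_to_cubic(p0: Point, p1: Point, p2: Point) -> [Point; 4] {
        [p0, lerp(p0, p1, 2.0 / 3.0), lerp(p2, p1, 2.0 / 3.0), p2]
    }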

The adaptive subdivision used to lower cubic Béziers to Euler spiral segments is traditionally accomplished through recursion, with a recursive call for each of the two halves of a subdivided range. As the execution model of compute shaders does not support recursion, we unroll it into an iterative loop. The cookbook technique for such unrolling is to store the stack explicitly as an array, which is potentially expensive (it increases register pressure). We instead exploit the specific nature of this recursion and encode the entire state of the stack in two scalar values: the size of the range, dt, and the scaled start of the range, t0_u, so that the range is t0_u·dt .. (t0_u+1)·dt. The initial values are 1 and 0, respectively. Pushing the call stack is represented by halving dt and doubling t0_u, which preserves the start of the range but halves its size. Accepting an approximation (a leaf of the recursion) is represented by incrementing t0_u. At that point, the decision of whether to pop the stack or continue depends on whether t0_u is even or odd, respectively. If even, popping the stack is accomplished by doubling dt and halving t0_u, repeated until t0_u becomes odd. The repeated stack popping can be done without iteration by using the “count trailing zeros” intrinsic to determine the number of stack pops n, then multiplying dt by 2^n and shifting t0_u right by n bits.
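The bookkeeping can be shown in isolation. The following Rust sketch captures only the t0_u/dt stack encoding described above; error_for_range and emit are hypothetical stand-ins for the Euler spiral fitting error and for flattening an accepted range.

    /// Stack-free adaptive subdivision skeleton (a sketch, not the full kernel).
    fn subdivide(
        tol: f64,
        error_for_range: impl Fn(f64, f64) -> f64, // error of fitting the range [t0, t0 + dt]
        mut emit: impl FnMut(f64, f64),            // accept (flatten) the range [t0, t0 + dt]
    ) {
        let mut t0_u: u32 = 0; // scaled start of range: t0 = t0_u * dt
        let mut dt: f64 = 1.0; // size of the current range
        loop {
            let t0 = t0_u as f64 * dt;
            if t0 == 1.0 {
                break;
            }
            if error_for_range(t0, dt) <= tol {
                emit(t0, dt);
                // "Pop" the stack: advance, then collapse while t0_u is even.
                t0_u += 1;
                let shift = t0_u.trailing_zeros();
                t0_u >>= shift;
                dt *= (1u64 << shift) as f64;
            } else {
                // "Push" the stack: halve the range, keeping its start.
                t0_u *= 2;
                dt *= 0.5;
            }
        }
    }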

This procedure is sufficient to generate weakly correct stroke outlines. If the outline contains a cusp, we output an evolute patch (as shown in Figure 9) in order to achieve strong correctness. A selection of renderings produced by our algorithm that demonstrate both strong and weak correctness is shown in Figure 12.

The output of the compute shader is a collection of line segments stored in a GPU storage buffer. Every element in the output contains the viewport coordinates of the two endpoints of a single line segment. Threads reserve storage buffer regions for their output by incrementing an atomic index stored in GPU device memory. Because the segments are processed in parallel, no ordering can be guaranteed across path segments; the output is unordered by construction. We refer to this output data structure as line soup.

An application may want to sort the line soup, depending on the requirements of the chosen rendering algorithm. For instance, our own renderer employs tile-based scanline rasterization in a compute shader, with a pipeline architecture that resembles cudaraster [15] and MPVG [6]. In this design, the line soup elements are spatially sorted by subsequent compute stages into a grid of 16×16 pixel tiles. The final compute stage of our renderer (called the fine rasterizer) operates at tile granularity and computes pixel coverage for filled polygons outlined by the line soup (Figure 14 illustrates the overall structure of our renderer, the details of which are beyond the scope of this paper).

[Figure 14: Our GPU pipeline architecture in broad strokes.]

We also implemented a variant that outputs circular arc segments instead of lines, in which the output entries store curvature in addition to the endpoints. For both primitive types, our renderer also outputs a 32-bit path ID used by downstream pipeline stages to identify the path that a segment belongs to, in order to retrieve the correct paint (solid or gradient color) for rendering.
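For illustration, a plausible layout of one line soup entry and the bump allocation of output regions is sketched below in Rust. The field layout and names are assumptions rather than the exact GPU buffer format, and on the GPU the counter is an atomic in device memory rather than a std atomic.

    use std::sync::atomic::{AtomicU32, Ordering};

    /// Plausible line soup entry (assumed layout). The arc variant would
    /// additionally carry a curvature value.
    #[repr(C)]
    #[derive(Clone, Copy)]
    struct LineSoup {
        path_id: u32,  // identifies the source path for paint lookup downstream
        p0: [f32; 2],  // first endpoint, viewport coordinates
        p1: [f32; 2],  // second endpoint, viewport coordinates
    }

    /// CPU analogue of the bump allocation: each thread reserves a contiguous
    /// region of the output buffer with a single atomic add and writes its
    /// line segments starting at the returned index.
    fn reserve(bump: &AtomicU32, count: u32) -> usize {
        bump.fetch_add(count, Ordering::Relaxed) as usize
    }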

8.2. Input encoding

The input to our pipeline consists of a sequence of paths, each with an associated 2D affine transformation matrix and a style data structure containing information about the stroke style or fill rule. We encode paths, transforms, and styles as parallel input streams. Unlike a conventional structure-of-arrays arrangement, we permit a one-to-many mapping across elements between certain streams: each transform and style entry can apply to one or more path entries with an array index greater than or equal to the transform or style index.

Paths are encoded as two parallel streams: a path tag stream containing an element for every segment in every subpath, and a path data stream of point coordinates. Each segment/verb contributes a variable number of points: 1 for moveto/lineto, 2 for quadratic Béziers, and 3 for cubic Béziers. This variable-length stream encoding allows for a highly compact representation of SVG-type content in memory. A GPU thread assigned to a path segment must be able to reference the corresponding element in each stream in order to obtain the right curve coordinates, transform, and style information. This is done by computing a set of stream offsets for every element in the path tag stream.

We compute the offsets on the GPU in a separate compute shader that runs prior to the flattening stage. A path tag consists of 8 bits that store stream offset increments, such that the inclusive prefix sums of all increments for every element in the tag stream yield the stream offsets. We call this structure the tag monoid and have implemented the computation as a parallel prefix scan.

Table 1. Fields of the 8-bit path tag.

Path Tag Bits    Monoid Field
0-1              coordinate count
2                subpath end bit
3                16/32 bit coordinates
4                path bit
5                transform bit
6                style bit

Table 1 shows the individual fields of the 8-bit path tag. The offset increments for the transform and style streams can each be stored in a single bit, set to 1 only if a path tag marks the beginning of a new transform and/or style entry. We also encode an additional path bit in the final segment of all paths, which allows us to access additional streams (such as fill color or gradient parameters) using a per-path offset in later rendering stages.
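As an illustration of the tag monoid, the Rust sketch below decodes one 8-bit tag into offset increments following the bit assignments in Table 1 and shows the element-wise combine used by the prefix scan. The struct and field names are ours, the actual implementation may differ in detail, and the coordinate-width bit (bit 3) is ignored here for brevity.

    /// Offset increments contributed by one path tag (a sketch; field names are ours).
    #[derive(Clone, Copy, Default)]
    struct TagMonoid {
        coords: u32,     // points consumed from the path data stream
        subpaths: u32,   // subpath end bit
        paths: u32,      // path bit
        transforms: u32, // transform bit
        styles: u32,     // style bit
    }

    fn monoid_from_tag(tag: u8) -> TagMonoid {
        TagMonoid {
            coords: (tag & 0x3) as u32,          // bits 0-1: coordinate count
            subpaths: ((tag >> 2) & 1) as u32,   // bit 2: subpath end
            paths: ((tag >> 4) & 1) as u32,      // bit 4: path
            transforms: ((tag >> 5) & 1) as u32, // bit 5: transform
            styles: ((tag >> 6) & 1) as u32,     // bit 6: style
        }
    }

    /// The monoid combine is element-wise addition, so an inclusive prefix
    /// scan of the per-tag monoids yields the stream offsets for every element.
    fn combine(a: TagMonoid, b: TagMonoid) -> TagMonoid {
        TagMonoid {
            coords: a.coords + b.coords,
            subpaths: a.subpaths + b.subpaths,
            paths: a.paths + b.paths,
            transforms: a.transforms + b.transforms,
            styles: a.styles + b.styles,
        }
    }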

The handling of the subpath end bit allows for overlap in coordinates: except at subpath boundaries, the first coordinate pair of each segment overlaps with the last coordinate pair of the previous one. During the CPU-side encoding, subpath boundaries are moved from the beginning of a subpath (moveto) to the end. The encoding process inserts an additional line segment into the tag and coordinate streams to close the subpath if the final point does not coincide with the start point. The subpath end bit is set to 1 for the final segment of every subpath. For strokes, the subpath end segment represents the stroke cap marker defined in Section 8.1. This special segment is assigned a coordinate count of 2, and the corresponding two points in the path data stream encode the tangent vector of the subpath’s initial segment, used to correctly draw the start cap or the connecting join in a closed contour.

9. Results

We present two versions of our stroking method: a sequential CPU stroker and the GPU compute shader outlined in Section 8. Both versions can generate stroked outlines with line and arc primitives. We focused our evaluation on the number of output primitives and execution time, using the timings dataset provided by Nehab (2020), in addition to our own curve-intensive scenes to stress the GPU implementation. We present our findings in the remainder of this section.

9.1. CPU: Primitive Count and Timing

We measured the timing of our CPU implementation, generating both line and arc primitives, against the Nehab and Skia strokers, which both output a mix of straight lines and quadratic Bézier primitives. The results are shown in Figure 15. The time shown is the total for test files from the timings dataset of Nehab (2020), and the tolerance was 0.25 when adjustable as a parameter.

Similarly to Nehab (2020)’s observations, Skia is the fastest stroke expansion algorithm we measured. The speed is attained with some compromises, in particular an imprecise error estimation that can sometimes generate outlines exceeding the given tolerance. We confirmed that this behavior is still present. The Nehab paper proposes a more careful error metric, but a significant fraction of the total computation time is dedicated to evaluating it.

[Figure 15: A stacked bar chart of the timings for stroke expansion.]

We also compared the number of primitives generated, for tolerance values varying from 1.0 to 0.001. The number of arcs generated is significantly lower than the number of line primitives. The number of arcs and quadratic Bézier primitives is comparable at practical tolerances, but the count of quadratic Béziers becomes more favorable at finer tolerances. The vertical axis shows the sum of output segment counts for test files from the timings dataset, plus mmark-70k (as described in more detail in Section 9.2), and omitting the waves example (waves triggers Skia issue https://issues.skia.org/issues/336617138 at finer tolerances).

[Figure 16: A line chart showing the primitive count.]

9.2. GPU: Execution Time

We evaluated the runtime performance of our compute shader on four GPU models across multiple form factors: the Google Pixel 6 equipped with the Arm Mali-G78 MP20 mobile GPU, the Apple M1 Max integrated laptop GPU, and two discrete desktop GPUs, the mid-2015 NVIDIA GeForce GTX 980 Ti and the late-2022 NVIDIA GeForce RTX 4090. We authored our entire GPU pipeline in the WebGPU Shading Language (WGSL) and used the wgpu [7] framework to interoperate with the native graphics APIs (we tested the M1 Max on macOS via the Metal API; the mobile and desktop GPUs were tested via the Vulkan API on Android and Ubuntu systems).

We instrumented our pipeline to obtain fine-grained pipeline stage execution times using the GPU timestamp query feature supported by both Metal and Vulkan. This allowed us to collect measurements for the curve flattening and input processing stages in isolation. We ran the compute pipeline on the timings SVG files provided by Nehab (2020). The largest of these SVGs is waves (see rendering (a) in Figure 20), which contains 42 stroked paths and a total of 13,308 path segments.

We authored two additional test scenes in order to stress the GPU with a higher workload. The first is a very large stroked path with ~500,000 individually styled dashed segments, which we adapted from Skia’s open-source test corpus; we used two variants of this scene with butt and round cap styles. For the second scene we adapted the Canvas Paths test case from MotionMark [2], which renders a large number of randomly generated stroked line, quadratic, and cubic Bézier paths. We made two versions of different sizes, mmark-70k and mmark-120k, with 70,000 and 120,000 path segments, respectively. Table 2 shows the payload sizes of our largest test scenes.

Table 2. Payload sizes of our largest test scenes.

Test                 Input Segments    Lines        Arcs
waves.svg            13,308            475,855      181,229
mmark-70k            70,000            1,577,705    1,162,117
mmark-120k           120,000           2,709,013    1,994,946
long dash (butt)     503,304           1,672,561    1,672,561
long dash (round)    503,304           2,625,300    1,672,561

The test scenes were rendered to a GPU texture with dimensions 2088x1600. The contents of each scene were uniformly scaled up (by encoding a transform) to fit the size of the texture, in order to increase the required number of output primitives. As with the CPU timings, we used an error tolerance of 0.25 pixels in all cases. We recorded the execution times across 3,000 GPU submissions for each scene.

[Figure 17: A stacked bar chart of the GPU timings for stroke expansion on the Nehab (2020) timings SVGs.]

[Figure 18: A stacked bar chart of the GPU timings for stroke expansion on our test scenes.]

Figures 17 and 18 show the execution times for the two data sets. The entire Nehab (2020) timings set adds up to less than 1.5 ms on all GPUs except the mobile unit. Our algorithm is ~14× slower on the Mali-G78 than on the M1 Max, but it is still capable of processing waves in 3.48 ms on average. We observed that the performance of the kernel scales well with increasing GPU capability even at very large workloads, as demonstrated by the order-of-magnitude differences in run time between the mobile, laptop, and high-end desktop form factors we tested. We also confirmed that outputting arcs instead of lines can lead to a 2× decrease in execution time (particularly in scenes with high curve content, like waves and mmark), as predicted by the lower overall primitive count.

9.3. GPU: Input Processing

Figure 19 shows the run time of the tag monoid parallel prefix sums with increasing input size. The prefix sums scale better at large input sizes than our stroke expansion kernel, which can be attributed to the absence of expensive computations and to high control-flow uniformity in the tag monoid kernel. The CPU encoding times (prior to the GPU prefix sums) for some of the large scenes are shown in Table 3. Our implementation applies dashing on the CPU. It is conceptually straightforward to implement on the GPU; the core algorithm is a prefix sum applied to segment arc lengths, for which the Euler spiral representation is particularly well suited. However, we believe the design of an efficient GPU pipeline architecture involves complex tradeoffs which we did not sufficiently explore for this paper. Therefore the dash style applied to the long dash scene was processed on the CPU, and dashes were sequentially encoded as individual strokes. Including CPU-side dashing, the time to encode long dash on the Apple M1 Max was approximately 25 ms on average, while the average time to encode the path after dashing was 4.14 ms.
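A minimal sketch of that idea follows, in Rust: given per-segment arc lengths and a dash pattern, an inclusive prefix sum of the lengths lets each dash transition be located with a search. The function and variable names are illustrative, and the sketch assumes a non-empty pattern and ignores the dash phase.

    /// Locate dash/gap transitions from a prefix sum of segment arc lengths.
    /// Returns (segment index, arc length into that segment) for every transition.
    fn dash_boundaries(seg_len: &[f64], pattern: &[f64]) -> Vec<(usize, f64)> {
        // Inclusive prefix sum of segment arc lengths (the step that maps
        // naturally to a parallel prefix scan on the GPU).
        let mut cum = Vec::with_capacity(seg_len.len());
        let mut total = 0.0;
        for &l in seg_len {
            total += l;
            cum.push(total);
        }
        let mut out = Vec::new();
        let mut s = 0.0; // global arc length of the next dash/gap transition
        let mut i = 0;   // index into the dash pattern
        loop {
            s += pattern[i % pattern.len()];
            i += 1;
            if s >= total {
                break;
            }
            // Locate the segment containing arc length `s`.
            let seg = cum.partition_point(|&c| c < s);
            let into_seg = s - if seg == 0 { 0.0 } else { cum[seg - 1] };
            out.push((seg, into_seg));
        }
        out
    }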

Our GPU implementation targets graphics APIs that do not support dynamic memory allocation in shader code (in contrast to CUDA, which provides the equivalent of malloc). This makes it necessary to allocate the storage buffer in advance with enough memory to hold the output; however, the precise memory requirement is not known before the shader runs.

We augmented our encoding logic to compute a conservative estimate of the required output buffer size, based on an upper bound on the number of line primitives per input path segment. We found that Wang’s formula [8] works well in practice, as it is relatively inexpensive to evaluate, though it results in a higher memory estimate than strictly required (see Table 4). For parallel curves, Wang’s formula does not guarantee an accurate upper bound on the required subdivisions, which we worked around by inflating the estimate for the source segment by a scale factor derived from the linewidth. We observed that enabling estimation increases our CPU pre-processing time by 2–3× on the largest scenes, but overall the impact is negligible when the scene size is modest.
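For reference, Wang’s formula bounds the number of lines needed to flatten a cubic Bézier within a tolerance using the second differences of the control polygon. The Rust sketch below shows the base estimate only; the linewidth-derived inflation factor mentioned above is applied on top and is not reproduced here.

    /// Wang's formula: an upper bound on the number of line segments needed to
    /// flatten a cubic Bézier with control points `p` to within `tol`, used to
    /// size the output buffer conservatively.
    fn wang_cubic_line_count(p: [[f64; 2]; 4], tol: f64) -> u32 {
        let d = |a: [f64; 2], b: [f64; 2], c: [f64; 2]| {
            let x = a[0] - 2.0 * b[0] + c[0];
            let y = a[1] - 2.0 * b[1] + c[1];
            (x * x + y * y).sqrt()
        };
        // Maximum norm of the second differences of the control polygon.
        let l_max = d(p[0], p[1], p[2]).max(d(p[1], p[2], p[3]));
        ((3.0 * l_max / (4.0 * tol)).sqrt().ceil() as u32).max(1)
    }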

[Figure 19: Mean GPU execution time of the tag monoid kernel.]

Table 3. CPU encoding times, with and without buffer size estimation.

Test          Encoding     Encoding + Estimation
roads.svg     398.56 µs    618.24 µs
waves.svg     197.53 µs    418.96 µs
mmark-70k     2.69 ms      6.89 ms
mmark-120k    4.28 ms      11.59 ms

Table 4. Actual versus estimated output buffer sizes.

Test          Actual Size    Estimated Size
roads.svg     2.76 MB        8.35 MB
waves.svg     10.89 MB       80.69 MB
mmark-70k     36.10 MB       103.06 MB
mmark-120k    60.01 MB       177.33 MB

[Figure 20: Renderings of the test scenes.]

10. Conclusions

Path stroking has received attention in recent years, with Nehab (2020) and Kilgard (2020b) both having proposed a complete theory of correct stroking in vector graphics. While there are a number of implementations of path filling on the GPU, the goal of also implementing stroke expansion on the GPU had not yet been realized. Building upon the theory proposed by Nehab (2020), we implemented an approach to stroke expansion that is GPU-friendly, and achieves high performance on commodity hardware.

We presented the Euler spiral as an intermediate representation for fast and precise approximation of both filled and stroked Bézier paths. Our method for lowering to both line and arc primitives avoids recursion and minimizes divergence, both potentially serious problems for parallel evaluation on the GPU. We also proposed a novel error metric that directly estimates the approximation error, as opposed to the commonly employed cut-then-measure approaches, which often consume the lion’s share of the computational expense.

Our approach includes an efficient encoding scheme that alleviates the need for expensive CPU pre-computations and unlocks fully GPU-driven rendering of the entire vector graphics model.

11. Future work

  • The GPU implementation sequentializes some work that could be parallel, and is thus not as well load-balanced as it might be. An appealing future direction is to split the pipeline into separate stages executed by the GPU as nodes in a work graph [22], where load balancing is handled by the hardware.

  • The pre-allocation requirement for the bump-allocated line soup buffer is a limitation imposed by today’s graphics APIs. A more accurate and better-optimized buffer size estimation heuristic is left as future work. It may also be interesting to explore whether graphics APIs can be extended to facilitate GPU-driven scheduling of the compute workload under a bounded memory footprint.

  • Many of the error metrics were empirically determined. The mathematical theory behind them should be developed more rigorously, and that process will likely uncover opportunities to fine-tune the technique.

  • As mentioned earlier, we did not have time to sufficiently explore a pipeline architecture for GPU-based dashing. Given its parameterization by arc length, we believe the Euler spiral representation lends itself well to an approach based on parallel prefix sums of segment arc lengths.

References

Appendix A Geometric Hermite interpolation for Euler spiral

This appendix gives the detailed algorithm for Geometric Hermite interpolation, determining Euler spiral segment parameters given the tangent angles at the endpoints relative to the chord.

The Euler spiral is represented by $\kappa(s) = k_0 + k_1 s$, where $s$ ranges from 0 to 1, i.e., the arc length is unit normalized. This segment is scaled, rotated, and translated into place so that its endpoints match the desired locations. Immediately, $k_0$ can be determined as $\theta_0 + \theta_1$. Therefore, the remaining parameters to compute are $k_1$ and the ratio of chord length to arc length. The latter could be computed from $k_0$ and $k_1$ by numerical integration, but it is more efficient to obtain it at the same time as $k_1$. Setting $k = \theta_0 + \theta_1$ and $\Delta = \theta_1 - \theta_0$, we have:

k1=6ΔΔ3/70Δ5/10780+Δ72.769178184818219×107k2Δ/10+k2Δ3/4200+k2Δ51.6959677820260655×105k4Δ/1400+k4Δ36.84915970574303×105k6Δ7.936475029053326×106subscript𝑘1absent6ΔsuperscriptΔ370superscriptΔ510780superscriptΔ72.769178184818219superscript107superscript𝑘2Δ10superscript𝑘2superscriptΔ34200superscript𝑘2superscriptΔ51.6959677820260655superscript105superscript𝑘4Δ1400superscript𝑘4superscriptΔ36.84915970574303superscript105superscript𝑘6Δ7.936475029053326superscript106\begin{array}[]{rl}k_{1}=&6\Delta\\-&\Delta^{3}/70\\-&\Delta^{5}/10780\\+&\Delta^{7}\cdot 2.769178184818219\times 10^{-7}\\-&k^{2}\Delta/10\\+&k^{2}\Delta^{3}/4200\\+&k^{2}\Delta^{5}\cdot 1.6959677820260655\times 10^{-5}\\-&k^{4}\Delta/1400\\+&k^{4}\Delta^{3}\cdot 6.84915970574303\times 10^{-5}\\-&k^{6}\Delta\cdot 7.936475029053326\times 10^{-6}\end{array}start_ARRAY start_ROW start_CELL italic_k start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = end_CELL start_CELL 6 roman_Δ end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL roman_Δ start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT / 70 end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL roman_Δ start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT / 10780 end_CELL end_ROW start_ROW start_CELL + end_CELL start_CELL roman_Δ start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT ⋅ 2.769178184818219 × 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Δ / 10 end_CELL end_ROW start_ROW start_CELL + end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Δ start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT / 4200 end_CELL end_ROW start_ROW start_CELL + end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Δ start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT ⋅ 1.6959677820260655 × 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT roman_Δ / 1400 end_CELL end_ROW start_ROW start_CELL + end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT roman_Δ start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ⋅ 6.84915970574303 × 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT roman_Δ ⋅ 7.936475029053326 × 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT end_CELL end_ROW end_ARRAY

Similarly, the formula for the chord-to-arc-length ratio is:

c=1Δ2/40+Δ40.00034226190482569864Δ61.9349474568904524×106k2/24+k2Δ20.0024702380951963226k2Δ43.7297408997537985×105+k4/1920k4Δ24.87350869747975×105k63.1001936068463107×106𝑐absent1superscriptΔ240superscriptΔ40.00034226190482569864superscriptΔ61.9349474568904524superscript106superscript𝑘224superscript𝑘2superscriptΔ20.0024702380951963226superscript𝑘2superscriptΔ43.7297408997537985superscript105superscript𝑘41920superscript𝑘4superscriptΔ24.87350869747975superscript105superscript𝑘63.1001936068463107superscript106\begin{array}[]{rl}c=&1\\-&\Delta^{2}/40\\+&\Delta^{4}\cdot 0.00034226190482569864\\-&\Delta^{6}\cdot 1.9349474568904524\times 10^{-6}\\-&k^{2}/24\\+&k^{2}\Delta^{2}\cdot 0.0024702380951963226\\-&k^{2}\Delta^{4}\cdot 3.7297408997537985\times 10^{-5}\\+&k^{4}/1920\\-&k^{4}\Delta^{2}\cdot 4.87350869747975\times 10^{-5}\\-&k^{6}\cdot 3.1001936068463107\times 10^{-6}\end{array}start_ARRAY start_ROW start_CELL italic_c = end_CELL start_CELL 1 end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL roman_Δ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 40 end_CELL end_ROW start_ROW start_CELL + end_CELL start_CELL roman_Δ start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT ⋅ 0.00034226190482569864 end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL roman_Δ start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT ⋅ 1.9349474568904524 × 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 24 end_CELL end_ROW start_ROW start_CELL + end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Δ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ 0.0024702380951963226 end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Δ start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT ⋅ 3.7297408997537985 × 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL + end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT / 1920 end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT roman_Δ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ 4.87350869747975 × 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL - end_CELL start_CELL italic_k start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT ⋅ 3.1001936068463107 × 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT end_CELL end_ROW end_ARRAY

Note that the latter quantity is equal to $\operatorname{sinc}(k/2)$ when $\Delta = 0$.

The coefficients were determined by numerical differentiation, using a Newton-style solver as the source of truth. The resulting formulas are extremely accurate over a wide range of inputs.
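A direct transcription of these two series into code is straightforward. The Rust sketch below evaluates them, returning $k_0$, $k_1$, and the chord-to-arc-length ratio; the function name and the packaging of the result are ours, not the implementation's exact interface.

    /// Evaluate the series above. Given tangent angles th0, th1 relative to
    /// the chord, returns (k0, k1, chord-to-arc-length ratio).
    fn es_params_from_angles(th0: f64, th1: f64) -> (f64, f64, f64) {
        let k = th0 + th1; // k0, also the k of the series
        let d = th1 - th0; // Δ
        let (d2, d3, d4, d5, d6, d7) =
            (d * d, d.powi(3), d.powi(4), d.powi(5), d.powi(6), d.powi(7));
        let (k2, k4, k6) = (k * k, k.powi(4), k.powi(6));
        let k1 = 6.0 * d - d3 / 70.0 - d5 / 10780.0 + d7 * 2.769178184818219e-7
            - k2 * d / 10.0 + k2 * d3 / 4200.0 + k2 * d5 * 1.6959677820260655e-5
            - k4 * d / 1400.0 + k4 * d3 * 6.84915970574303e-5
            - k6 * d * 7.936475029053326e-6;
        let ch = 1.0 - d2 / 40.0 + d4 * 0.00034226190482569864 - d6 * 1.9349474568904524e-6
            - k2 / 24.0 + k2 * d2 * 0.0024702380951963226 - k2 * d4 * 3.7297408997537985e-5
            + k4 / 1920.0 - k4 * d2 * 4.87350869747975e-5
            - k6 * 3.1001936068463107e-6;
        (k, k1, ch)
    }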
