Advanced Tessellation With Metal: Dynamic B-Spline Surface
Tessellation is a GPU-supported technique for generating a finely meshed surface from far fewer control points. The GPU geometrically subdivides a rectangular or triangular area called a patch into much smaller triangles through automatic triangulation. This article focuses on the rectangular patches called ‘quad patches.’ The following are some examples of surfaces rendered with tessellation.
- Real-time water simulation — See the picture with a duckling above. This is a simulation of the water surface using the shallow water equations [LM02]. The original grid of control points is 80 x 80. The tessellation can theoretically expand it to 5,120 x 5,120 for rendering.
- Real-time cloth simulation — See the picture with a flag above. This is a simulation of cloth using a simple spring model. The original grid of control points is 22 x 18. The tessellation can expand it up to 1,408 x 1,152.
The examples above combine tessellation with B-Spline interpolation for rendering, and this combination is where the true power of tessellation lies. In this article, I’ll walk you through the key concepts and techniques for dynamically rendering B-Spline surfaces. Finally, I present a sample implementation of an elementary water surface simulation in AR. You can use it as a basis for your own apps with cool surface effects and simulations.
Intended Audience
I assume you are familiar with rendering in Metal and its tessellation API. If you are not yet familiar with tessellation, I recommend you try the following tutorial at Metal by Example first and then come back here.
This is a super nice and compact tutorial that comes with some sample code.
Organization
In this article, I briefly touch on basic tessellation first, and then I go through B-Splines and B-Spline surfaces so that you can write your own interpolation code in the vertex shader. In the end, I present a demo app for water surface animation that illustrates tessellation and walk through some important points of the implementation. The Xcode project for the demo app and the Python scripts used in this article can be downloaded from my git repository.
Brief Description of Tessellation
In a normal rendering shader pipeline, you provide an array of vertex data, such as position, color, and surface normal, in a buffer, along with an array of indices into the vertex data to specify the triangles (or a triangle strip, lines, or points in some cases). The vertex shader is invoked per vertex.
In a rendering pipeline with tessellation, on the other hand, you provide an array of control points, similar to the vertex data, and an array of indices into the control points to specify the set of control points for each patch.
See Fig. 2 above for a schematic diagram of the API. The user defines the control points’ semantics, format, and size. In this article, we use 16 control points per quad patch, and one control point consists of just one point (x, y, z, 1.0) in the local coordinate system (LCS).
The vertex shader is invoked per vertex of the subdivided triangles. To be precise, the vertex shader is given all 16 control points and a local point in the (u,v) coordinates of the patch. For a quad patch, (u,v) is normalized to the range (0.0, 0.0)-(1.0, 1.0). In the vertex shader, you generate the geometric data for the vertex of the triangles, such as the point in the projected coordinate system, the surface normal in the global coordinate system (GCS), texture coordinates, etc.
Again, if you are unfamiliar with any of those concepts, please read Introduction to Tessellation in Metal.
B-Spline and B-Spline Surface
B-Spline basics
B-Spline is a set of functions called basis functions of the same order n. Roughly speaking, the order n defines the smoothness of the functions. In this article, we distinguish the functions within the same set (i.e., the same order n) by k, the knot index. The basis functions are best explained graphically, with the graphs in Fig. 3, 4, and 5 below and the following three tiny Python scripts that plot them. The scripts are found in my git repo; you need Python 3 and Matplotlib to run them.
- basis_order_0.py: sample script to plot the basis functions of order 0.
- basis_order_1.py: sample script to plot the basis functions of order 1.
- basis_order_2.py: sample script to plot the basis functions of order 2.
Fig. 3, 4, and 5 show sample basis functions of order 0, 1, and 2, respectively. Each basis function is determined by points called ‘knots,’ indexed by the ‘knot index.’ The three graphs share the same knots at the following points, shown as green or orange dots in the graphs.
[-15.0, -13.0, -11.5, -7.0, -5.0, -1.0, 3.0, 4.0, 4.5, 5.5, 7.0, 9.5, 13.0, 15.0]
See the function definitions N_order0(knots, k, u), N_order1(knots, k, u), and N_order2(knots, k, u) in the scripts. They define the basis functions for each order. As you can see in the code, the basis functions of order n depend recursively on the basis functions of order n-1.
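For reference, this recursion (the Cox-de Boor recursion) can be written compactly. The snippet below is a minimal sketch of my own, not code from the scripts, and it assumes strictly increasing knots so that no denominator becomes zero.

def N( knots : list, k : int, n : int, u : float ) -> float:
    '''Basis function of order n at knot index k (Cox-de Boor recursion).'''
    if n == 0:
        # Order 0: a step function that is 1.0 on [ knots[k], knots[k+1] ).
        return 1.0 if knots[k] <= u and u < knots[k + 1] else 0.0
    # Each order-n function blends two order-(n-1) functions with linear ramps.
    left  = ( u - knots[k] ) / ( knots[k + n] - knots[k] )
    right = ( knots[k + n + 1] - u ) / ( knots[k + n + 1] - knots[k + 1] )
    return left * N( knots, k, n - 1, u ) + right * N( knots, k + 1, n - 1, u )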
The graphs in Fig. 3, 4, and 5 show the following characteristics of the basis functions:
- The strictly positive part of each basis function starts at knot k and ends at knot k + 1 + n, where n denotes the order 0, 1, or 2.
- For any u, the sum of all the basis functions is always 1.0.
- For any u, there are exactly n + 1 strictly positive basis functions.
- If n = 2, which we use in this article, each basis function peaks between knots k + 1 and k + 2 and crosses its neighboring basis functions around those knots.
This article uses only knots at regular unit intervals (…, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, …). The offset 0.5 is added to the knots so that the peaks occur at (…, -1.0, 0.0, 1.0, …). With this simplification, the peak of each function conveniently occurs at u = float(k), and we can get rid of the explicit array of knots (called the ‘knot vector’). Those simplified unit basis functions are implemented in the following Python scripts, which generate the plots in Fig. 6, 7, and 8.
As you can see in the scripts, the basis functions are no longer defined recursively. They are piece-wise polynomials of order n. The following is a snippet from the scripts that defines the basis function of order 2.
def N_order2_unit( k : int, u : float ) -> float:
    '''Basis function of order 2.

    :param k: knot index.
    :param u: input
    :returns: output
    '''
    fk = float(k) + 0.5
    if fk - 2.0 <= u and u < fk - 1.0:
        return 0.5 * ( u - (fk - 2.0) ) * ( u - (fk - 2.0) )
    if fk - 1.0 <= u and u < fk:
        return -1.0 * ( u - (fk - 1.0) ) * ( u - (fk - 1.0) ) + ( u - (fk - 1.0) ) + 0.5
    if fk <= u and u < fk + 1.0:
        return 0.5 * ( u - fk ) * ( u - fk ) - ( u - fk ) + 0.5
    return 0.0
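As a quick sanity check (a snippet of my own, assuming N_order2_unit as defined above), we can confirm the second characteristic listed earlier: at any u, only the three basis functions peaking nearest to u are strictly positive, and they sum to 1.0.

for u in ( 0.0, 0.3, 1.7, 4.2 ):
    k = round( u )  # knot index of the nearest peak
    total = sum( N_order2_unit( k + d, u ) for d in ( -1, 0, 1 ) )
    print( f'u = {u}: sum of the active basis functions = {total:.6f}' )  # 1.000000 every time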
That’s pretty much it about the B-Spline basis functions.
B-Spline Curve
Now we are ready to interpolate a list of points pₖ = (uₖ, xₖ) in (u,x)-coordinates, where the u-coordinates must lie at consecutive integers, i.e., pₖ = (k, xₖ). The ‘k’s represent knot indices (and hence knots themselves) for the simplified basis functions. Here’s an example of points to be interpolated.
Those points are called control points. Now we can calculate the x-coordinate of the interpolated point for any u as follows:

x_{interp}(u) = Σₖ xₖ Nₖ(u)
The lower charts of Fig. 6, 7, and 8 show sample interpolations generated by those Python scripts. In this article, we use the interpolation with the basis functions of order 2. Please note that the interpolated curves with order 2 may not coincide with the points pₖ unless three consecutive xₖ have the same value. This is obvious because the peak of the basis functions never reaches 1.0 for order 2, as shown in Fig. 8.
Since only three basis functions are strictly positive at any u for order 2, if u ∈ [k − 0.5, k + 0.5], the summation reduces to the three active basis functions and their associated control points:

x_{interp}(u) = x_{k−1} N_{k−1}(u) + xₖ Nₖ(u) + x_{k+1} N_{k+1}(u)
This is pretty much all there is to know about the basics of B-Spline interpolation in one dimension. If we extend the control points by one more dimension, as in pₖ = (k, xₖ, yₖ) in (u, x, y)-coordinates, we get a B-Spline curve ( x_{interp}(u), y_{interp}(u) ) in the parametric form of u in the two-dimensional plane. If we extend the control points by two dimensions, as in pₖ = (k, xₖ, yₖ, zₖ) in (u, x, y, z)-coordinates, we get a B-Spline curve ( x_{interp}(u), y_{interp}(u), z_{interp}(u) ) in the parametric form of u in three-dimensional space.
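In code, the one-dimensional interpolation boils down to just a few lines. The following is a minimal sketch of my own (not from the scripts) on top of N_order2_unit; it assumes u stays far enough from the ends so that all three active control points exist.

def x_interp( xs : list, u : float ) -> float:
    '''Order-2 B-Spline interpolation; control point k sits at u = k.
    Valid for 1.0 <= u <= len(xs) - 2.0.'''
    k = round( u )  # only N_{k-1}, N_k, and N_{k+1} are strictly positive at u
    return sum( xs[k + d] * N_order2_unit( k + d, u ) for d in ( -1, 0, 1 ) )

xs = [ 0.0, 1.0, 3.0, 2.0, 1.0 ]
print( x_interp( xs, 2.0 ) )  # 2.625, not 3.0: the curve does not pass through p_2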
Let’s move on to the B-Spline surface interpolation in the next section.
B-Spline Surface
We can extend the same argument to a 2D surface in 3D space. With the control points p(i, j), the interpolated surface point is:

p(u, v) = Σᵢ Σⱼ p(i, j) Nᵢ(u) Nⱼ(v)
Fig. 9 shows an example of a 3×3 grid of cells (4×4 control points), where the 11 x 11 points in the center cell are interpolated from the surrounding 4×4 control points. It was generated by the Python script 2d_patch.py. The pale blue triangles represent the grid of 4×4 control points, and B-Spline interpolation produces the brown surface in the center patch.
Here we have to distinguish three coordinate systems:
- The 2-dim integer [i,j]-coordinates that express the knots and the locations of the control points. In this example, the [i,j]-coordinates are in the range [0,0]-[3,3], with one control point assigned at each [i,j].
- The 2-dim real (u,v)-coordinates in the range (0.0,0.0)-(1.0,1.0) in the case of Metal. Those values address, with an offset, the quad patch you want to interpolate. For example, if you want to interpolate the center patch [1,1]-[2,2] in the [i,j]-coordinates, as in the sample demo program described later, the interpolation occurs at the coordinates (1.0 + u, 1.0 + v). The u-axis is aligned with the i-axis, and the v-axis with the j-axis.
- The 3-dim real (x,y,z) world coordinate system in which the geometric surface you want to render resides. The control points p(i,j) are in this (x,y,z)-coordinate system.
Surface tangents
Since the equation

p(u, v) = Σᵢ Σⱼ p(i, j) Nᵢ(u) Nⱼ(v)

is an explicit parametric surface equation in (u, v), we can take the partial derivatives with respect to u and v to obtain the surface tangents:

tᵤ(u, v) = ∂p/∂u = Σᵢ Σⱼ p(i, j) Dᵢ(u) Nⱼ(v)
tᵥ(u, v) = ∂p/∂v = Σᵢ Σⱼ p(i, j) Nᵢ(u) Dⱼ(v)

where Dₖ(t) = dNₖ(t)/dt. The following is sample code for Dₖ(t) in Python from 2d_patch.py.
def D_Unit_Order2( k : int, t : float ) -> float:
    '''1st derivative of the basis function: D(t) = (d/dt)N.

    :param k: knot.
    :param t: input
    :returns: output
    '''
    fk = float(k) + 0.5
    if fk - 2.0 <= t and t < fk - 1.0:
        return 0.5 * 2.0 * ( t - (fk - 2.0) )
    if fk - 1.0 <= t and t < fk:
        return -1.0 * 2.0 * ( t - (fk - 1.0) ) + 1.0
    if fk <= t and t < fk + 1.0:
        return 0.5 * 2.0 * ( t - fk ) - 1.0
    return 0.0
If you have difficulty understanding this, imagine u has moved slightly to u + ∆u. Then p(u + ∆u, v) changes slightly from p(u, v) by (∆x, ∆y, ∆z), and the direction of the change can be expressed by ( ∆x/∆u, ∆y/∆u, ∆z/∆u ). This direction approaches the tangent ∂p/∂u as we make ∆u smaller and smaller. The same argument applies to v with ∆v. The green and blue vectors in Fig. 10 show the (normalized) tangent vectors.
Surface normal
The surface normal is found by taking the normalized cross product of those tangents:

n(u, v) = normalize( tᵤ(u, v) × tᵥ(u, v) )

This works because the normal is perpendicular to both tangent vectors. The orientation is such that (tᵤ(u, v), tᵥ(u, v), n(u, v)) forms a right-handed coordinate system. The red vectors in Fig. 10 show the normal vectors. Those vectors are calculated in the function sample_surface_tangents_and_normal_in_patch() in 2d_patch.py.
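To make the whole derivation concrete, here is a NumPy sketch of my own that evaluates the point, the two tangents, and the normal on the center patch of a 4×4 control grid. It mirrors what the vertex shader does later in Metal and assumes the N_order2_unit and D_Unit_Order2 functions above.

import numpy as np

def eval_center_patch( cp : np.ndarray, u : float, v : float ):
    '''cp has shape (4, 4, 3): cp[j, i] is the control point at [i, j].
    The center patch corresponds to 1.0 <= u, v <= 2.0.'''
    p, t_u, t_v = np.zeros( 3 ), np.zeros( 3 ), np.zeros( 3 )
    for j in range( 4 ):
        nj, dj = N_order2_unit( j, v ), D_Unit_Order2( j, v )
        for i in range( 4 ):
            ni, di = N_order2_unit( i, u ), D_Unit_Order2( i, u )
            p   += cp[j, i] * ( ni * nj )  # interpolated point
            t_u += cp[j, i] * ( di * nj )  # tangent along u
            t_v += cp[j, i] * ( ni * dj )  # tangent along v
    n = np.cross( t_u, t_v )               # right-handed (t_u, t_v, n)
    return p, t_u, t_v, n / np.linalg.norm( n )

A quick finite-difference check, e.g., comparing ( eval_center_patch( cp, u + 1e-4, v )[0] − p ) / 1e-4 against t_u, confirms the ∆u argument above.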
Now we have covered enough of B-Spline surface interpolation. Let’s look at the code of the sample app.
Sample App: Tessellation in Action
Now, let’s look at the sample code in action. Please download the Xcode project from my GitHub repo, build it, and run it on an iPhone or iPad. I have tested it on an iPhone 13 Mini with iOS 16.3.1, and it should work on any recent iOS device. It renders an animated water surface in an AR environment.
The app’s orientation is fixed to portrait to simplify the code. After starting the app, hold it steady for a while to let the device calibrate for AR, and point the camera somewhere with a flat surface for the device to detect. Once a surface has been detected, the app starts showing the animated water surface on the screen. It has three controls:
- The slider Factor controls the overall granularity of the tessellation of the rendered patches. The lower the value, the finer the mesh; the greater the value, the coarser the mesh.
- The slider Intensity controls the animation. The greater the value, the more agitated the surface gets.
- The toggle switch Show Wireframe shows the wireframe of the triangulated meshes. The wireframe shows the underlying tessellation more clearly. If you move the camera around, you can see how the triangulation of the patches changes depending on the Factor and the distance of the patch from the camera.
Code organization
Before going into the details of the tessellation code, let me give you an overview of the code organization. The demo app is based on the template in my repo for using ARKit with SwiftUI instead of a Storyboard.
Take a look at Fig. 11.
There are three important classes in the demo:
- TessellationRender takes care of rendering the surface with tessellation; this is the main topic of this article.
- TessellationFactors takes care of finding the tessellation factors for the patches. This has been isolated from TessellationRender for better readability of the code.
- WaterSurfaceAnimator takes care of the control points. It constructs the control points and updates them repeatedly to animate the surface.
Those three objects are managed by TessellationMain, which coordinates the communication among the three objects and with WorldManager.
Before going into the code for those classes, let me briefly explain the overall architecture of the template code for ARKit + SwiftUI.
Template code for ARKit + SwiftUI
The template is found in my GitHub repo. The top-level class is TessellationSampleApp, which instantiates WorldManager and ContentView. ContentView instantiates MetalView, which is a SwiftUI adaptor to MTKView. MetalView instantiates TouchMTKView and ARCoordinator. TouchMTKView is derived from MTKView to handle screen touches and to provide a canvas for Metal to render to.
ARCoordinator takes care of all the AR-related tasks and events. It also renders the images captured by the camera onto the screen in TouchMTKView, on which the water surface is superimposed. All the events, such as user inputs to SwiftUI and the Metal- and AR-related events, are sent to WorldManager, which coordinates actions with TessellationMain. The code for each class is relatively small, and it should be easy to read through the entire codebase and understand what’s going on.
Arrangement of the control points and the indices
Probably the most cumbersome and important, yet not well-documented, part is the arrangement of the control points and the indices into the control points. It is best explained graphically in Fig. 13.
Conceptually, the control points are laid out as an N x M grid, as shown in the upper part of Fig. 13. There are in total (N-3)*(M-3) patches, each of which consists of 4×4 control points. As you can see, the same control point can appear in up to 16 patches. In the buffer, the control points are arranged in a one-dimensional array in column-major format, as in CP_{buf} in Fig. 13. Each patch is specified by 16 consecutive elements in the index array, as in Index_{buf}. Since there are (N-3)*(M-3) patches, the length of Index_{buf} is (N-3)*(M-3)*16.
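To make the layout concrete, the following is a small sketch of my own that generates Index_{buf} for an N x M grid. It assumes the control point at [i, j] is stored at CP_{buf}[j * N + i]; check Fig. 13 and the repo for the exact addressing convention. The per-patch order, with i running fastest within j, matches the expression control_points[ j * 4 + i ] used in the vertex shader later.

def build_index_buffer( N : int, M : int ) -> list:
    '''Indices of the 16 control points for each of the (N-3)*(M-3) patches.'''
    indices = []
    for pj in range( M - 3 ):          # patch origin along the j-axis
        for pi in range( N - 3 ):      # patch origin along the i-axis
            for j in range( 4 ):       # 4x4 control points per patch,
                for i in range( 4 ):   # i runs fastest within each patch
                    indices.append( ( pj + j ) * N + ( pi + i ) )
    return indices

assert len( build_index_buffer( 6, 5 ) ) == ( 6 - 3 ) * ( 5 - 3 ) * 16  # 96 indices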
In the sample code, the buffer of the control points CP_{buf} corresponds to the following:
- WaterSurfaceAnimator.buffersControlPoints
- TessellationFactors.bufferControlPoints
- TessellationRender.bufferControlPoints
And the buffer of the control point indices Index_{buf} corresponds to the following:
- TessellationFactors.bufferControlPointIndices
- TessellationRender.bufferControlPointIndices
In the render pipeline, the vertex shader can access the control points (not the indices) as if they were arranged consecutively as an array of 16 elements in patch_control_point<CP> cp [[ stage_in ]], as shown in Fig. 13. The vertex function is invoked in parallel many times for the same 16 control points and patch id. The number of invocations is the number of vertices in the subdivided and triangulated quad patch. Each vertex, and hence each invocation, is identified by the (u,v) coordinates given in the argument const float2 uv [[ position_in_patch ]].
Vertex shader
Let’s look at the vertex shader in tessellation_render.metal. This is one of the most important parts of the code:
typedef struct _ControlPoint {
    float4 pos [[ attribute(0) ]]; // <= (3)
} ControlPoint;

[[ patch( quad, 16 ) ]] // <= (1)
vertex VertexOut vertex_tessellation(
    patch_control_point<ControlPoint> control_points [[ stage_in ]], // <= (2)
    constant float4x4& M_model [[ buffer( 1 ) ]],
    constant float4x4& M_view  [[ buffer( 2 ) ]],
    constant float4x4& M_proj  [[ buffer( 3 ) ]],
    const float2 uv [[ position_in_patch ]] // <= (4)
) {
    const auto u = 1.0f + uv.x; // [1.0 <= u <= 2.0] for the center patch <= (5)
    const auto v = 1.0f + uv.y; // [1.0 <= v <= 2.0] for the center patch <= (5)

    float3 P_interp { 0.0f, 0.0f, 0.0f };
    float3 T_u      { 0.0f, 0.0f, 0.0f };
    float3 T_v      { 0.0f, 0.0f, 0.0f };

    for ( int32_t j = 0; j < 4; j++ ) { // along j-axis
        const auto Nj = N_Unit_Order2( j, v );
        const auto Dj = D_Unit_Order2( j, v );
        for ( int32_t i = 0; i < 4; i++ ) { // along i-axis
            const auto Ni = N_Unit_Order2( i, u );
            const auto Di = D_Unit_Order2( i, u );
            const auto P_ij = control_points[ j * 4 + i ].pos.xyz;
            P_interp += ( P_ij * ( Ni * Nj ) );
            T_u      += ( P_ij * ( Di * Nj ) );
            T_v      += ( P_ij * ( Ni * Dj ) );
        }
    }
    const auto N_lcs = normalize( cross( T_u, T_v ) ); // <= (6)
    const auto N_gcs = M_model * float4( N_lcs, 0.0f );
    const auto P_gcs = M_model * float4( P_interp, 1.0f );

    return VertexOut {
        .position       = M_proj * M_view * P_gcs,
        .world_position = P_gcs.xyz / P_gcs.w,
        .normal         = N_gcs.xyz
    };
}
The attribute at (1) indicates that this function is for tessellation with quad patches and that each patch is given 16 control points. The parameter at (2) is the special stage_in argument for the control points. In this vertex shader, you can access the control points in the array control_points as if the elements were arranged in the order specified in the controlPointIndexBuffer of drawIndexedPatches().
The type of the control points is specified by the template parameter ControlPoint, which has only one member, with attribute index 0, at (3). This corresponds to the vertex descriptor given to the render pipeline descriptor in TessellationRender.createRenderPipelineState(). The parameter uv at (4) is another special argument that specifies one point inside the triangulated quad patch. It is normalized to (0.0, 0.0) − (1.0, 1.0). Since we need the (u,v)-coordinates for the center patch, specified by [1, 1] − [2, 2] in the [i,j]-coordinates, we add 1.0 to each of u and v at (5).
In the body of the function, we accumulate the surface point and the tangents into P_interp, T_u, and T_v in the local coordinate system. The normal N_lcs is calculated by taking the cross product of T_u and T_v at (6). The rest is the standard coordinate transformation by the model, view, and projection matrices.
Fragment shader
fragment float4 fragment_tessellation(
    VertexOut in [[ stage_in ]],
    constant float4x4& M_camera [[ buffer( 1 ) ]]
) {
    ...
    // diffuse
    const auto cos_theta     = dot( in.normal, light_direction * -1.0f );
    const auto diffuse_coeff = cos_theta * cos_theta; // hack to emphasize the surface gradient <= (7)
    const auto diffuse_color = material_color * diffuse_coeff * 0.7f;
    ...
}
There is not much to comment on in the fragment shader, as no special treatment is required for tessellation. The function is fragment_tessellation() in tessellation_render.metal. It is a simplified standard ambient-diffuse-specular model using the vertex normal we calculated in the vertex shader. One thing to note is that the diffuse coefficient uses cos²θ instead of cos θ to emphasize the slight changes in the surface gradient when the surface is almost flat, at (7).
Compute shader for tessellation factors
Rendering with tessellation involves one extra task, performed by a compute shader: calculating the six tessellation factors per patch, which control the granularity of the triangulation. The closer a patch is to the camera, the finer the granularity; the further away, the coarser the triangulation. The calculated factors are then given to setTessellationFactorBuffer() of the encoder in the render pipeline. The per-patch data is defined by MTLQuadTessellationFactorsHalf in Metal, which consists of six half-floats. (Apparently, there is no counterpart structure defined in Swift.)
kernel void find_tessellation_factors(
    constant TessellationConfig& config [[ buffer( 0 ) ]],
    constant float4x4& M_model [[ buffer( 1 ) ]],
    constant float4x4& M_view  [[ buffer( 2 ) ]],
    device const float4* control_points [[ buffer( 3 ) ]],
    device const int32_t* control_point_indices [[ buffer( 4 ) ]],
    device MTLQuadTessellationFactorsHalf* factors [[ buffer( 5 ) ]],
    uint tid [[ thread_position_in_grid ]]
) {
    if ( static_cast<int32_t>( tid ) < config.num_patches ) {
        const auto Mvm = M_view * M_model;
        const int32_t control_point_indices_begin = tid * 16;

        const auto i_09 = control_point_indices[ control_point_indices_begin + 9  ]; // <= (8)
        const auto i_05 = control_point_indices[ control_point_indices_begin + 5  ]; // <= (8)
        const auto i_06 = control_point_indices[ control_point_indices_begin + 6  ]; // <= (8)
        const auto i_10 = control_point_indices[ control_point_indices_begin + 10 ]; // <= (8)

        const auto p_09_gcs = ( Mvm * control_points[ i_09 ] ).xyz; // <= (9)
        const auto p_05_gcs = ( Mvm * control_points[ i_05 ] ).xyz; // <= (9)
        const auto p_06_gcs = ( Mvm * control_points[ i_06 ] ).xyz; // <= (9)
        const auto p_10_gcs = ( Mvm * control_points[ i_10 ] ).xyz; // <= (9)

        const auto dist_09_05 = length( ( p_09_gcs + p_05_gcs ) * 0.5f ); // edge factor [0] <= (10)
        const auto dist_05_06 = length( ( p_05_gcs + p_06_gcs ) * 0.5f ); // edge factor [1] <= (10)
        const auto dist_06_10 = length( ( p_06_gcs + p_10_gcs ) * 0.5f ); // edge factor [2] <= (10)
        const auto dist_10_09 = length( ( p_10_gcs + p_09_gcs ) * 0.5f ); // edge factor [3] <= (10)
        const auto dist_avg   = ( dist_09_05 + dist_05_06 + dist_06_10 + dist_10_09 ) / 4.0f;

        const auto F = static_cast<float>( config.max_tessellation_factor );
        factors[tid].edgeTessellationFactor[0]   = min( F, max( 1.0f, F / ( dist_09_05 * config.factor_coeff ) ) );
        factors[tid].edgeTessellationFactor[1]   = min( F, max( 1.0f, F / ( dist_05_06 * config.factor_coeff ) ) );
        factors[tid].edgeTessellationFactor[2]   = min( F, max( 1.0f, F / ( dist_06_10 * config.factor_coeff ) ) );
        factors[tid].edgeTessellationFactor[3]   = min( F, max( 1.0f, F / ( dist_10_09 * config.factor_coeff ) ) );
        factors[tid].insideTessellationFactor[0] = min( F, max( 1.0f, F / ( dist_avg * config.factor_coeff ) ) );
        factors[tid].insideTessellationFactor[1] = min( F, max( 1.0f, F / ( dist_avg * config.factor_coeff ) ) );
    }
}
Let’s take a look at the shader code. It’s the function find_tessellation_factors() in tessellation_factors.metal, and it’s invoked per patch, i.e., one thread ID is assigned per patch. The first step is to find the four control points surrounding the center quad of the grid formed by the 16 control points. The indices surrounding the center patch can be found in Fig. 14.
The offsets from the beginning are 5, 6, 10, and 9 in a counterclockwise pattern. Part (8) retrieves those four indices. Then the four control points in the view coordinate system are retrieved from those indices at (9). In the view coordinate system, the camera is located at the origin (0,0,0). The distance to the midpoint of each of the four edges formed by the four control points is calculated at (10). The rest of the code calculates the tessellation factors based on those distances.
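The clamped formula at the end of the kernel is easy to explore on its own. Here is a tiny sketch of my own in Python, where F and coeff stand in for config.max_tessellation_factor and config.factor_coeff:

def edge_factor( dist : float, F : float = 64.0, coeff : float = 1.0 ) -> float:
    '''Distance-based edge tessellation factor, clamped to [1, F].'''
    return min( F, max( 1.0, F / ( dist * coeff ) ) )

for d in ( 0.5, 1.0, 4.0, 100.0 ):
    print( f'distance {d:6.2f} -> factor {edge_factor( d ):5.1f}' )
# distance   0.50 -> factor  64.0 (clamped to the max)
# distance   1.00 -> factor  64.0
# distance   4.00 -> factor  16.0
# distance 100.00 -> factor   1.0 (clamped to the min)

Note that with tessellationPartitionMode = .pow2, the GPU additionally rounds the factors up to powers of two, so the wireframe changes in discrete steps as the camera moves.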
Compute shaders for animation
This mimics the water surface by repeatedly updating the height (y-component) of the control points. There are two shaders. The first one initializes the control points. Since the buffers for the control points are created with the private storage mode, the best way to construct the initial control points in those buffers is to use a compute shader. The shader is invoked per control point, i.e., one thread is assigned to each control point.
The grid is arranged such that the z-axis corresponds to the i-axis and the u-axis (denoted the primary axis at (11)), the x-axis corresponds to the j-axis and the v-axis (denoted the secondary axis at (12)), and the y-axis is along the height/depth of the water surface. The height is initialized to zero. We arrange the three private buffers for the control points as a circular buffer: previous, current, and new, at (13).
kernel void water_surface_init(
    constant WaterSurfaceAnimationConfig& config [[ buffer( 0 ) ]],
    device float4* control_points_prev [[ buffer( 1 ) ]], // <= (13)
    device float4* control_points_cur  [[ buffer( 2 ) ]], // <= (13)
    device float4* control_points_new  [[ buffer( 3 ) ]], // <= (13)
    uint tid [[ thread_position_in_grid ]]
) {
    const auto w  = config.width;
    const auto h  = config.height;
    const auto hw = static_cast<float>( w ) * 0.5f;
    const auto hh = static_cast<float>( h ) * 0.5f;
    const auto i  = static_cast<float>( tid % w );
    const auto j  = static_cast<float>( tid / w );

    if ( static_cast<int32_t>( tid ) < w * h ) {
        const float4 p {
            ( j - hh ) * config.edge_length, // secondary axis. <= (12)
            0.0f,                            // height
            ( i - hw ) * config.edge_length, // primary axis. <= (11)
            1.0f
        };
        control_points_prev[ tid ] = p;
        control_points_cur [ tid ] = p;
        control_points_new [ tid ] = p;
    }
}
The second shader is invoked repeatedly to update the control points for the animation. The update is based on a simplified surface wave equation, discretized so that the height of each control point is updated from its previous value and the four surrounding points. Reading the update off the kernel below, with c = config.coeff and d = config.decay:

y_new = ( 2 * y_cur + c * ( y_E + y_W + y_N + y_S − 4 * y_cur ) − y_prev ) * d

where the heights on the right-hand side also include the per-point agitation term.
kernel void water_surface_animate(
    constant WaterSurfaceAnimationConfig& config [[ buffer( 0 ) ]],
    device const float* agitation [[ buffer( 1 ) ]],
    device const float4* control_points_prev [[ buffer( 2 ) ]],
    device const float4* control_points_cur  [[ buffer( 3 ) ]],
    device float4* control_points_new [[ buffer( 4 ) ]],
    uint tid [[ thread_position_in_grid ]]
) {
    const auto w = config.width;
    const auto h = config.height;
    const int32_t i = tid % w;
    const int32_t j = tid / w;

    if ( static_cast<int32_t>( tid ) < w * h ) {
        const auto iC = j * w + i;                        // center
        const auto iN = clamp( j + 1, 0, h - 1 ) * w + i; // north
        const auto iS = clamp( j - 1, 0, h - 1 ) * w + i; // south
        const auto iE = j * w + clamp( i + 1, 0, w - 1 ); // east
        const auto iW = j * w + clamp( i - 1, 0, w - 1 ); // west

        const auto C = control_points_cur[ iC ].y + agitation[ iC ];
        const auto N = control_points_cur[ iN ].y + agitation[ iN ];
        const auto S = control_points_cur[ iS ].y + agitation[ iS ];
        const auto E = control_points_cur[ iE ].y + agitation[ iE ];
        const auto W = control_points_cur[ iW ].y + agitation[ iW ];
        const auto Cprev = control_points_prev[ iC ].y;

        control_points_new[ iC ].y = ( C * 2.0f + ( E + W + N + S - C * 4.0f ) * config.coeff - Cprev ) * config.decay;
    }
}
Render pipeline arrangement
We have covered the shader part, so let’s take a look at the pipelines in Swift, especially the render pipeline, which has some key differences from a normal render pipeline. Let’s look at the function createRenderPipelineState() below, which builds the render pipeline once the pixel formats of the destination become available. The main difference is in the vertex descriptor.
Please take a look at the code marked by (19). We have only one attribute, of type float4, for the control points. The other differences are the five new parameters for tessellation at (20):
- tessellationFactorStepFunction = .perPatch: the factors are specified per patch.
- maxTessellationFactor = maxNumTessellationFactor: the maximum number of subdivisions. The hardware limit is 64.
- tessellationPartitionMode = .pow2: the edges and the interior are split recursively by 2.
- tessellationOutputWindingOrder = .counterClockwise: the vertices of the subdivided triangles are assumed to be in counterclockwise order.
- tessellationControlPointIndexType = .uint32: the index array is of type uint32_t.
Now, let’s look at the function TessellationRender.encode() in TessellationRender.swift, which is called repeatedly, mostly at the frame rate.
public init( ... ) {
    ...
    var indirectArguments = MTLDrawPatchIndirectArguments( // <= (17)
        patchCount:    UInt32( numPatches ), // <= (21)
        instanceCount: 1,
        patchStart:    0,
        baseInstance:  0
    )
    ...
    bufferIndirectArguments = device.makeBuffer( // <= (18)
        bytes:   &indirectArguments,
        length:  MemoryLayout<MTLDrawPatchIndirectArguments>.stride,
        options: .storageModeShared
    )
    ...
}

public func createRenderPipelineState( ... )
{
    ...
    let vertexDescriptor = MTLVertexDescriptor()                                 // <= (19)
    vertexDescriptor.attributes[0].format      = .float4                         // <= (19)
    vertexDescriptor.attributes[0].offset      = 0                               // <= (19)
    vertexDescriptor.attributes[0].bufferIndex = 0                               // <= (19)
    vertexDescriptor.layouts[0].stride       = MemoryLayout<SIMD4<Float>>.stride // <= (19)
    vertexDescriptor.layouts[0].stepFunction = .perPatchControlPoint             // <= (19)
    pipelineDescriptor.vertexDescriptor = vertexDescriptor                       // <= (19)
    ...
    pipelineDescriptor.tessellationFactorStepFunction    = .perPatch                 // <= (20)
    pipelineDescriptor.maxTessellationFactor             = maxNumTessellationFactor  // <= (20)
    pipelineDescriptor.tessellationPartitionMode         = .pow2                     // <= (20)
    pipelineDescriptor.tessellationOutputWindingOrder    = .counterClockwise         // <= (20)
    pipelineDescriptor.tessellationControlPointIndexType = .uint32                   // <= (20)
    ...
}

public func encode(
    encoder       : MTLRenderCommandEncoder,
    controlPoints : MTLBuffer,
    ...
) {
    ...
    encoder.setTessellationFactorBuffer(
        bufferTessellationFactors, offset: 0, instanceStride: 0 ) // <= (14)
    encoder.setVertexBuffer( controlPoints, offset: 0, index: 0 )
    ...
    encoder.drawIndexedPatches( // <= (15)
        numberOfPatchControlPoints:    16,
        patchIndexBuffer:              nil,
        patchIndexBufferOffset:        0,
        controlPointIndexBuffer:       bufferControlPointIndices,
        controlPointIndexBufferOffset: 0,
        indirectBuffer:                bufferIndirectArguments, // <= (16)
        indirectBufferOffset:          0
    )
    ...
}
There are two key differences. One is at (14), where the buffer for the tessellation factors is set by setTessellationFactorBuffer(). The tessellation factors have been calculated in find_tessellation_factors(), explained above. The other is the draw call drawIndexedPatches() at (15), which takes an indirect argument in an MTLBuffer at (16). It is set up once and for all in init() at (17) and (18) and does not change. The most important parameter is patchCount at (21), which specifies the number of quad patches to render.
A bit about AR
The demo app uses the basic functionality of ARKit to make it more fun. AR may be a better way to look at 3D objects from different angles than sliders, 3D wheels, and other controls. All the AR-related work is done in the ARCoordinator object. It uses just the bare minimum functionality for the demo; it does not manage the AR session rigorously but simply monitors the state of the camera. I’ll briefly go through how the demo uses AR in iOS. Let’s look at the code snippet below:
import ARKit

class ARCoordinator: ... ARSessionDelegate {
    ...
    var arSession: ARSession!

    init(...) {
        ...
        arSession = ARSession() // <= (22)
        arSession.delegate = self
    }

    func mtkView(...) {
        ...
        let arConfiguration = ARWorldTrackingConfiguration() // <= (22)
        arConfiguration.planeDetection = [ .horizontal ]     // <= (22)
        arSession.run( arConfiguration )                     // <= (22)
    }

    func draw(...) {
        guard let currentFrame = arSession.currentFrame else {
            return
        }
        if currentFrame.camera.trackingState != .normal { // <= (23)
            surfaceTransform = nil
            return
        }
        if surfaceTransform == nil {
            for anchor in currentFrame.anchors { // <= (24)
                if let planeAnchor = anchor as? ARPlaneAnchor {
                    surfaceTransform = planeAnchor.transform // <= (24)
                    break
                }
            }
        }
        ...
        let V = currentFrame.camera.viewMatrix( for: .portrait )            // <= (25)
        let P = currentFrame.camera.projectionMatrix( for: .portrait, ... ) // <= (25)
        let T = currentFrame.camera.transform                               // <= (25)
        ...
    }
    ...
}
- Create and configure a session to detect surfaces at (22).
- Monitor the state of the camera tracking at (23).
- Try to get one detected surface and keep it at (24).
- Get the current camera location and orientation at (25). The camera also provides convenience properties for the view and projection matrices.
Recap
In this article, we have covered all the important points for using tessellation with B-Splines, from the theory to the code in the sample app. In my view, there are two crucial points to understand clearly:
- Clear distinction of [i,j], (u,v), and (x,y,z) coordinate spaces.
- The arrangement of the control points and the indices in the Metal buffers.
I hope those points are clear now that you have read this far.
Also, as a practical tip, we should keep the implementation of the basis functions as simple as possible, like the ones we used above. We should not add special conditions to the basis functions to treat the endpoints, as that would unnecessarily complicate the code and make it harder to debug. Instead, I think we should repeat the same values at both ends, as in the sketch below. Anyway, I hope you enjoyed this article and the sample app, and I hope you will make some apps with tessellation. I appreciate your feedback.
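For example, a small sketch of that idea in Python (my own, not from the repo):

def pad_with_edge_values( xs : list ) -> list:
    '''Repeat the end values so the interpolation is defined over the whole
    original range without special-casing the basis functions.'''
    return [ xs[0] ] + list( xs ) + [ xs[-1] ]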
Happy programming.
References
There are already a few good tutorials available on the basics of tessellation:
- Introduction to Tessellation in Metal. This is a super nice compact tutorial that comes with some sample code.
- Metal by Tutorials: 11. Tessellation & Terrains. This is behind a paywall.
- Official guide by Apple.
The sample app is based on the following template:
- Template Xcode Project for ARKit, Metal, and SwiftUI (without Storyboard)
- An article on shallow water simulation, cited as [LM02] above. The program for the video with the duckling is based mainly on this article.