Do you know how to achieve perfection? Just combine Midjourney, Stable Diffusion, depth maps, face morphing, and a little bit of 3D
For some time, I have been working on an internal project whose output, among many other things, is point clouds. When I was thinking about a proper visualization tool for those clouds, I made a list of things it should be able to do:
- Customization
- Ability to pass custom vertex/fragment shaders
- Easy to share
The first two points are reasonably well covered by today’s tools. However, the third one is an issue. Ideally, it should be so streamlined that the other party can just open a browser and preview the data. Then I got an idea — as always, a bad one. I decided to write my own visualizer in Three.js. Customization, check. Custom shaders, check. Easy to share, check. Another advantage is that I can “easily” migrate my code later to, e.g., OpenGL and scale the performance on more powerful hardware.
As always, this side project did not end with just the visualization. Since I wanted to make it interesting for others, in the next 15 minutes, I will tell you something about depth maps/MiDaS, face recognition/TensorFlow, the Beier–Neely morphing algorithm, and the parallax effect. Don’t worry, there will be plenty of images and videos as well. This article is not a detailed tutorial but rather a showcase of a few concepts. However, together with the GitHub repository, it should serve as decent study material.
About Point Clouds and Three.js
In general, a point cloud is just a set of points in space. Such point clouds can be obtained via lidars or even modern iPhones. Top lidars can generate several million points per second, so optimization and use of the GPU are a must for proper visualization. I might write another article about visualization approaches with respect to dataset size, but for now, Three.js and WebGL are good enough. Hopefully, we will soon be able to use WebGPU as well — it is already available, but nothing official so far in terms of Three.js.
Three.js gives us the Points 3D object, which is internally rendered with the gl.POINTS flag. The Points class expects two parameters, a geometry and a material. Let’s have a closer look at both of them.
Geometry
There are a lot of helper geometries, like Plane or Sphere, that let you prototype faster. However, I decided to use BufferGeometry. Although creating something meaningful takes more time, you get full control over everything geometry-related. Let’s have a look at the snippet below:
if (this.guiWrapper.textureOptions.textureUsage) {
    geometry.setAttribute('uv', new THREE.BufferAttribute(pointUv, this.UV_NUM))
} else {
    geometry.setAttribute('color', new THREE.BufferAttribute(pointsColors, this.COL_NUM))
}
geometry.setAttribute('position', new THREE.BufferAttribute(pointsToRender, this.POS_NUM))
geometry.morphAttributes.position = []
geometry.morphAttributes.color = []
// Add flat mapping
geometry.morphAttributes.position[0] = new THREE.BufferAttribute(pointsToRenderFlat, this.POS_NUM)
geometry.morphAttributes.color[0] = new THREE.BufferAttribute(pointsColors, this.COL_NUM)
// Add "natural" mapping
geometry.morphAttributes.position[1] = new THREE.BufferAttribute(pointsToRenderMapped, this.POS_NUM)
geometry.morphAttributes.color[1] = new THREE.BufferAttribute(pointsColors, this.COL_NUM)
As you can see, we are passing several arrays as BufferAttributes.
Everything related to this geometry is represented as a linear buffer. Apart from the position, we are also passing UV or color buffers. The UV buffer is used when we pass the color data as a texture in uniforms (basically, global data available to every vertex), and the color buffer is used when we want to pass RGB data for every vertex directly.
There are other buffer types as well. Do not forget about morphAttributes: you can use these buffers as interpolation targets for every vertex in the geometry when animating. Apart from animation, I am using them as gallery pages, as we will see later. Thanks to that, you do not have to keep several separate objects in memory.
In simple cases like 2D images, you could use a depth map as a displacement map, but in my use case, I need more complex behavior for the point clouds and 3D models.
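To give a rough idea of how those morph buffers double as gallery pages, here is a minimal sketch (the helper names are mine, not from the repository) that switches or blends pages simply by driving the morph target influences of the rendered Points object:
// Sketch: treat morph targets as gallery pages.
// `points` is assumed to be the THREE.Points object built from the geometry above.
function showPage(points: THREE.Points, pageIndex: number) {
    const influences = points.morphTargetInfluences
    if (!influences) return
    // Index 0 = flat mapping, index 1 = "natural" mapping (see morphAttributes above)
    for (let i = 0; i < influences.length; i++) {
        influences[i] = i === pageIndex ? 1 : 0
    }
}

// Blend between two pages for a smooth/bouncy transition, t in [0, 1]
function blendPages(points: THREE.Points, from: number, to: number, t: number) {
    const influences = points.morphTargetInfluences
    if (!influences) return
    influences.fill(0)
    influences[from] = 1 - t
    influences[to] = t
}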
Material
Again, there are a lot of predefined materials in Three.js, but in our case, I used ShaderMaterial since it lets us provide custom shaders and gives us more flexibility and better performance. The next snippet shows a basic vertex and fragment shader. Some of the includes are Three.js-specific, so you can leverage the API accessible on the TypeScript side. Since the Three.js code is available on GitHub, you can always check whatever is needed.
export const VERTEX_SHADER = `
#ifdef GL_ES
precision highp float;
#endif

uniform float size;
uniform float scale;

#define USE_COLOR
#define USE_MORPHCOLORS

#include <common>
#include <color_pars_vertex>
#include <morphtarget_pars_vertex>

attribute vec3 color;
varying vec2 vUv;

uniform float u_time;
uniform float u_minZ;
uniform float u_maxZ;
uniform float u_scale;
uniform vec3 u_camera_angle;
uniform int u_parallax_type;

void main()
{
    #include <color_vertex>
    #include <begin_vertex>
    #include <morphtarget_vertex>

    vColor = color;
    vUv = uv;
    gl_PointSize = 2.0;

    #if defined( MORPHTARGETS_COUNT )
        vColor *= morphTargetBaseInfluence;
        for ( int i = 0; i < MORPHTARGETS_COUNT; i ++ ) {
            if ( morphTargetInfluences[ i ] != 0.0 ) vColor += getMorph( gl_VertexID, i, 2 ).rgb * morphTargetInfluences[ i ];
        }
    #endif

    if (u_parallax_type == 1) {
        transformed.x += u_scale*u_camera_angle.x*transformed.z*0.05;
        transformed.y += u_scale*u_camera_angle.y*transformed.z*0.02;
    } else if (u_parallax_type == 2) {
        transformed.x += transformed.z*cos(0.5*u_time)*0.01;
        transformed.y += transformed.z*sin(0.5*u_time)*0.005;
    }

    vec4 mvPosition = modelViewMatrix * vec4( transformed, 1.0 );
    gl_Position = projectionMatrix * mvPosition;
}
`;
export const FRAGMENT_SHADER = `
#ifdef GL_ES
precision highp float;
#endif

varying vec3 vColor;
varying vec2 vUv;

uniform sampler2D u_texture;
uniform bool u_use_texture;

void main()
{
    if (u_use_texture) {
        gl_FragColor = texture2D(u_texture, vUv);
    } else {
        gl_FragColor = vec4( vColor, 1.0 );
    }
}
`;
In the previous section, I mentioned the color and uv buffers. As you can see, the fragment shader uses vColor when a color buffer was passed, or the UV coordinates to sample a texture otherwise. Also, note that the texture is declared as a uniform, while vColor is a varying since it comes from a vertex attribute. My data is mostly in a JSON-like format, so it is easier to pass colors via a buffer, but I added the texture option as well since users might not want to convert an image to data — or they might want to compare the two approaches. This variation is available in the Farm scenario. The vertex shader is mostly responsible for the parallax effect, which will be shown and discussed in the next section.
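For completeness, this is roughly how the shaders and uniforms above are wired into a material (a sketch with placeholder values and a hypothetical texture path, not the exact setup from the repository):
const material = new THREE.ShaderMaterial({
    vertexShader: VERTEX_SHADER,
    fragmentShader: FRAGMENT_SHADER,
    uniforms: {
        size: { value: 2.0 },
        scale: { value: 1.0 },
        u_time: { value: 0.0 },
        u_minZ: { value: 0.0 },
        u_maxZ: { value: 1.0 },
        u_scale: { value: 1.0 },
        u_camera_angle: { value: new THREE.Vector3() },
        u_parallax_type: { value: 0 },
        u_use_texture: { value: false },
        // Only sampled when u_use_texture is true; the path is just an example
        u_texture: { value: new THREE.TextureLoader().load('assets/farm.png') },
    },
})
const points = new THREE.Points(geometry, material)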
An important note regarding the live demo website: the GIFs and videos in this article were recorded while the visualizer used raw/HQ assets. The live demo uses assets scaled down by a factor of roughly 10, about 15 MB in total.
Also, while on a local environment I can afford to load raw data and wait a few seconds, the live demo uses some internal shortcuts to make the experience better. For example, geometry construction is partially offloaded to a Web Worker, and assets are loaded as textures into an offscreen canvas so I can read the ImageData and fill the basic and morphing buffers. Desperate times call for desperate measures. :]
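To illustrate the offscreen canvas trick, reading an asset back as pixel data boils down to drawing it into an offscreen canvas, roughly like this (a sketch using standard browser APIs and a hypothetical asset path, not the exact repository code):
// Load an image asset, draw it into an offscreen canvas and read its pixels,
// so the values can be written into the position/morph buffers.
async function loadImageData(url: string): Promise<ImageData> {
    const response = await fetch(url)
    const bitmap = await createImageBitmap(await response.blob())
    const canvas = new OffscreenCanvas(bitmap.width, bitmap.height)
    const ctx = canvas.getContext('2d')!
    ctx.drawImage(bitmap, 0, 0)
    return ctx.getImageData(0, 0, bitmap.width, bitmap.height)
}

// Example: the red channel of a grayscale depth map drives the z coordinate
const depth = await loadImageData('assets/farm_depth.png')
for (let i = 0; i < depth.width * depth.height; i++) {
    pointsToRender[i * 3 + 2] = depth.data[i * 4] / 255 // RGBA layout, red channel
}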
Let’s Make Those Clouds Dance!
In the next few sections, I will tell you more about the concepts used and describe the demo scenarios you can try later. I am not going to describe pipelines like [Image set -> Face recognition -> Data export -> Three.js integration] in detail, but you should be able to reproduce them yourself by following the provided links.
MiDaS and the parallax effect
MiDaS is an amazing machine-learning model that estimates a depth map from a single image (monocular depth estimation). It was trained on multiple datasets, gives nice results, and you can run it on your own device. Most of the setup is done via conda, and you have to download the desired model (up to 1.3 GB). What a time to be alive. You can easily leverage the knowledge of a bunch of clever people at home. Once you have an image and its respective depth map, nothing stops you from using them in the point cloud renderer, as seen in the next video:
But wait, there is more! Text2Image, Image2Image, Text2Mesh, and similar models are trending, and we cannot stay behind. The next video shows a few AI-generated images (Midjourney and Stable Diffusion) processed through MiDaS and visualized. (When the images change, we are only swapping the morphing buffers mentioned earlier.)
You probably noticed that the images are moving. That’s the work of our vertex shader. There are two types of parallax effect. The first one tracks the camera position and morphs the cloud so the center of the image stares directly at the camera. This effect works kinda nicely, but it should be more sophisticated because the visual center of gravity is not always in the middle of the image. I am planning to write a better approach soon. The code snippet below shows how the parallax parameters are passed to uniforms so every vertex can adjust accordingly. Ultimately, this will be moved to shaders as well.
//Will be simplified once testing / prototyping is finished
const imgCenterPoint = this.geometryHelper.imgCenterPoints[this.guiWrapper.renderingType.getValue()][this.interpolationOptions.frame]
this.lastCameraPosition = this.camera.position.clone()
let angleX = this.camera.position.clone().setY(0).angleTo(imgCenterPoint)
let angleY = this.camera.position.clone().setX(0).angleTo(imgCenterPoint)
let normalX = new THREE.Plane().setFromCoplanarPoints(new THREE.Vector3(), this.camera.position.clone().setY(0), imgCenterPoint).normal
let normalY = new THREE.Plane().setFromCoplanarPoints(new THREE.Vector3(), this.camera.position.clone().setX(0), imgCenterPoint).normal
this.parallaxUniforms.u_scale.value = 1 + this.currentSceneObject.scale.z
this.parallaxUniforms.u_camera_angle.value = new THREE.Vector3(-angleX*normalX.y, angleY*normalY.x, 0)
The second type of parallax effect moves points in a circle or ellipse, with the magnitude weighted linearly by depth, which means closer objects move faster. I also experimented with an inverted exponential decay/envelope, similar to energy decay, but it did not look great, even though I thought it might be pleasing since we are used to that kind of behavior as humans. The above-mentioned video examples are accessible via the Farm and ai_images entries in the Rendering type UI dropdown.
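On the TypeScript side, the time-driven variant only needs a clock fed into the u_time uniform every frame, roughly like this (a sketch in the same class context as the snippet above; this.renderer and this.scene are assumptions, and the parallax type would normally come from the UI):
const clock = new THREE.Clock()

const animate = () => {
    requestAnimationFrame(animate)
    // Type 2 parallax: the shader moves each point on a small ellipse scaled by its depth
    this.parallaxUniforms.u_parallax_type.value = 2
    this.parallaxUniforms.u_time.value = clock.getElapsedTime()
    this.renderer.render(this.scene, this.camera)
}
animate()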
Image morphing and Koalas
Depth maps and AI-generated images are cool and all, but we also need to give some respect to nature. There is a scenario named Koala with two features: image morphing between two images and a “natural” image representation. Let’s start with the latter. In this scenario, colors are mapped onto the z-axis as 24-bit numbers and then normalized.
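The mapping itself is tiny: pack the RGB triplet into a single 24-bit integer and normalize it, something along these lines (a sketch of the idea rather than the repository code; whether red or blue ends up in the high bits is my assumption):
// Map an 8-bit-per-channel RGB color to a normalized z value.
function colorToZ(r: number, g: number, b: number): number {
    const packed = (r << 16) | (g << 8) | b // 24-bit integer, 0 .. 16,777,215
    return packed / 0xffffff                // normalize to 0 .. 1
}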
There is something magical about a spatial image representation, especially because it lets us feel the objects, and it also gives us a closer intuition into why neural networks for object recognition work. You can almost feel the characterization of the objects in space, as shown in this video:
Regarding the morphing, in this scenario we have two different koalas and ten intermediate morphing images — you can browse through them via a slider or let WebGL animate them in a bouncy way. The images between the originals were generated via the Beier–Neely morphing algorithm. For this algorithm, you need to map a 1–1 set of lines, where each line represents some feature like the eyes or the chin.
I will talk about the algorithm in more detail in a later section. In the video below, you can see some weird fragments because I used just ten feature lines and wanted to see the results. But even when imperfect, the animated version is pretty nice, especially the morphing scarf — very psychedelic:
Image morphing, Portrait Depth API, and waifus
It is time to combine everything above and look at a more complex example. In this section, we will interpolate between two images together with their depth maps. As I mentioned, MiDaS is generally amazing, but when you want a depth map of a human face, you need a more detailed model trained on human faces. That does not mean you cannot get a pretty good depth map from MiDaS, though.
Earlier this year, TensorFlow presented the Portrait Depth API, and similarly to MiDaS, it can run on your own device. For this section, I selected Anne Hathaway and Olivia Munn as the morphing targets. In the dropdown, this scenario is named Waifus. Don’t ask me why. Before morphing, let’s have a look at Anne’s depth map as a point cloud:
We can see that the model gave us pretty good results. We would not be able to create a realistic mesh of a human face from it, but there are nice details: the teeth have volume, the head sits at a different level than the torso, and the facial relief is also accurate. One could say there is an image glitch because it seems like Anne has a second chin, but we know it is not true.
As Nietzsche said, “Only people with double chins are those looking down into the abyss of their failing lives.” Also, never trust any quote on the internet. :]
The Portrait Depth API works in two steps. First, it detects the face, hair, neck, and other close-to-face parts (I am not sure exactly how this is weighted; this comes from my experience while messing around with the model). After that, it masks out the rest of the image in black and finally produces a depth map. The masking step is not always accurate, so the final depth map has sharp noise along the edges. Fortunately, it is well-behaved noise — it can be removed in the same pass while writing to the buffers. I wrote a heuristic to remove it, and it worked on the first try.
removeNoise(depthMapColor: number, depthmap: Array<Array<number>>, i: number, j: number, width: number, height: number, pointPointer: number, pointsToRender: Float32Array) {
    if (depthMapColor != 0) {
        const percentage = depthMapColor/100
        let left = depthMapColor
        let right = depthMapColor
        let top = depthMapColor
        let down = depthMapColor
        const dropThreshold = 5*percentage

        // Sample the four direct neighbors (guarding the image borders)
        if (j > 0) left = depthmap[i][j-1]
        if (j < width-1) right = depthmap[i][j+1]
        if (i > 0) top = depthmap[i-1][j]
        if (i < height-1) down = depthmap[i+1][j]

        // A large jump against either neighbor pair means edge noise, so push the point to the background
        if (Math.abs(left - depthMapColor) > dropThreshold || Math.abs(right - depthMapColor) > dropThreshold) {
            pointsToRender[pointPointer*3 + 2] = 0
        } else if (Math.abs(top - depthMapColor) > dropThreshold || Math.abs(down - depthMapColor) > dropThreshold) {
            pointsToRender[pointPointer*3 + 2] = 0
        } else {
            // Otherwise keep the point, with an extra scale on the depth
            pointsToRender[pointPointer*3 + 2] = 3*(1 - depthMapColor)
        }
    }
}
It basically checks the closest neighbors in pairs (left-right and top-bottom), and if there is a big difference (a large derivative), the pixel is considered noise and sent to the background. In this particular case, I have to say the point cloud representation was super helpful because I could see the noise as a 3D shape.
Since the morphing of the koalas produced fragments, when working on morphing the faces, I decided to use TensorFlow and its face/feature recognition model. I used about 80 feature lines, eight times more than for the koalas. Here’s another video:
The result is much better, and it looks really cool, even with the depth maps! There is one glitch with Olivia’s hair, but nothing that can’t be fixed later. When you animate this morphing, each frame cuts away a chunk of pixels. This is because the background is considered flat, but it could also be fixed in post-processing: we would need to map that chunk onto a smooth function and accelerate the animation based on the distance from the head.
Image morphing, in general
Regarding the Beier–Neely morphing algorithm, I could not find any parallelized or GPU-accelerated implementation, so computing intermediate frames on large images (say, 2k × 2k pixels) with tens of feature lines takes a lot of time. In the future, I plan to write such implementations: the first for local/server usage via CUDA, and the second utilizing GPGPU and shaders. GPGPU is especially interesting since you can be more creative there.
There are several projects (e.g., DiffMorph) that involve neural networks in the morphing process, either to speed it up or to replace it entirely. The reason for using and experimenting with the good old deterministic algorithm is that I wanted to dig into a few papers and do a GPU-related implementation.
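To show why the algorithm maps so naturally onto one GPU thread per destination pixel, here is the core per-pixel transform for a single feature-line pair, following the original Beier–Neely formulation (the helper functions and the parameter values a, b, p are mine):
type Vec2 = { x: number; y: number }
const sub = (a: Vec2, b: Vec2): Vec2 => ({ x: a.x - b.x, y: a.y - b.y })
const add = (a: Vec2, b: Vec2): Vec2 => ({ x: a.x + b.x, y: a.y + b.y })
const mul = (a: Vec2, s: number): Vec2 => ({ x: a.x * s, y: a.y * s })
const dot = (a: Vec2, b: Vec2) => a.x * b.x + a.y * b.y
const len = (a: Vec2) => Math.sqrt(dot(a, a))
const perp = (a: Vec2): Vec2 => ({ x: -a.y, y: a.x })

// For a destination pixel X and one line pair (P,Q) -> (P2,Q2), compute the
// corresponding source position X2 and the weight of this pair.
function warpOneLine(X: Vec2, P: Vec2, Q: Vec2, P2: Vec2, Q2: Vec2, a = 0.5, b = 1.25, p = 0.25) {
    const PQ = sub(Q, P)
    const u = dot(sub(X, P), PQ) / dot(PQ, PQ)   // position along the line
    const v = dot(sub(X, P), perp(PQ)) / len(PQ) // signed distance from the line
    const PQ2 = sub(Q2, P2)
    const X2 = add(add(P2, mul(PQ2, u)), mul(perp(PQ2), v / len(PQ2)))
    // Distance from X to the segment PQ, used for weighting
    const dist = u < 0 ? len(sub(X, P)) : u > 1 ? len(sub(X, Q)) : Math.abs(v)
    const weight = Math.pow(Math.pow(len(PQ), p) / (a + dist), b)
    return { X2, weight }
}

// The full warp averages the displacements (X2 - X) over all line pairs, weighted by
// `weight`, adds the result to X, and samples the source image there. Every pixel is
// independent, which is exactly what a CUDA kernel or a fragment/compute shader wants.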
Side Projects are Great
It started as a quick prototype to visualize point clouds from my different projects, but in the end, I spent about a week testing and adding new features just for fun, and I also learned a few things on the way.
Below are links to my GitHub repository with the complete code and to the live demo website. Feel free to fork it or even make a PR with new ideas. I plan to add more features but do not want to spoil them yet.
Just a disclaimer, I am not a web developer, so I might be using some weird constructions, but I am open to criticism!
GitHub repository
Live web demo
The project is a static website without the possibility of uploading custom images. For custom usage, you need to clone the repository and provide the images/depth maps yourself. If you have ever used conda and npm, you should have no issues with MiDaS and the other tools. I might make the project dynamic and generate depth maps/morphing/3D photos on the server side, but that depends on my available time. I am, however, definitely planning to add features and refactor. Some scenarios cause a page reload on the iPhone in Safari and Chrome; I will debug that soon.
Thank you for your time and for reading this article. See you soon!