Julia Tutorial for 3D Data Science

Discover the make-it-all alternative to Python, Matlab, R, Perl, Ruby, and C through a 6-step workflow for 3D point cloud and mesh processing.

A 3D point cloud of a library segmented automatically then visualize within Julia. © Florent Poux
A 3D point cloud of a library is segmented automatically then visualized within Julia. © Florent Poux

If you are always on the lookout for great ideas and new “tools” that make them easier to achieve, then you may have heard of Julia before. A very young language that is less than ten years old 👶, and that is a super cool way to get into fast scripting/coding for quickly achieving working ideas 🌼.

Supposing you are into scientific computing, machine learning, data mining, 3D data science, large-scale linear algebra, distributed and parallel computing, I think it is worth following up on this hands-on tutorial to get started with Julia and start doing fun 3D stuff!

In this newcomer tutorial, I will cut straight through the matter and provide you with a laser-focused 6-step workflow to get up and running with 3D data within the next ten minutes! Ready?

A bit of history?

Julia logo

Ha, but before, I want to highlight the ambitions of the creators of Julia. The creators of Julia, namely Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman. They blogged in 2012 about their aspirations and essentially why they initiated the Julia movement:

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled. It should be as fast as C.

But do these claims hold 🤔? Is this an “I want it all” language? Well, I first discovered Julia in early 2019, when I was making a research visit to the Visual Computing Institute (Computer Graphics) in RWTH Aachen University (Deutschland).

Unsupervised Segmentation of Indoor 3D Point Cloud: Application to Object-based Classification
Florent Poux, Christian Mattes, Leif Kobbelt 3D GeoInfo Conference 2020 Point cloud data of indoor scenes is primarily…www.vci.rwth-aachen.de

Since then, I swear almost essentially through Julia! And that comes from a Pythonista/C “programmer” mindset. It is super clear, effortless to pick up, super-fast, and you can loop python scripts within, call your favorite libraries directly, and pure Julia will execute crazy fast! Indeed, Julia is compiled, not interpreted. For faster runtime performance, Julia is just-in-time (JIT) compiled using the LLVM compiler framework. At its best, Julia can approach or match the speed of C, and that is awesome 🚀! In three words, I would say that Julia is Fast, Dynamic with Reproducible environments.

⚠️ A warning, though, at the time of the writing, the documentation and tutorials are still very sparse, and the community is also pretty small compared to what you would be used to if using Python. But hey, I intend to change that a bit. Shall we dive in?

Step 1: Downloading and Installing Julia

Okay, now that you have your coffee ☕ or teacup 🍵 next to you, ready to find our way into the fog, let us dive in! First, we need to download Julia from the official website: https://julialang.org/downloads/

Note: This tutorial is made with Julia version 1.6.2, 64bits using windows, but you can use any stable release, which should work (in principle 😉).

Install the pretty stuff once the executable is downloaded until you have shiny “software” ready for your beautiful hands!

Step 2: Setting up your Julia development environment

Julia supports various editors like VSCode, Atom, Emacs, Sublime, Vim, Notepad++, and IDEs like IntelliJ. If you like Jupyter notebooks, you can also use them directly with Julia. For this tutorial, I will show you my favorite three choices to choose from.

Julia with Jupyter Notebook

It would be the first choice to demo your project in a powerpoint-style fashion or if what you intend to do is more linked to an exploratory data analysis. For this, nothing simpler, launch your freshly installed Julia executable, then a window like below should pop up.

Julia.exe makes this little window appear, which acts as a REPL for you to work with. © Florent Poux
Julia.exe makes this little window appear, which acts as a REPL for you to work with. © Florent Poux

From there, all you have to do is to type the command using Pkg and then press Enter followed by the command Pkg.add("IJulia") and Enter. The process of installing the IJulia package necessary to use Julia within Jupyter Notebooks takes around 1 minute.

Adding a package with the Pkg.add() commands results in Julia updating the registry then fetching the necessary elements to install the package in the current environment. © Florent Poux
Adding a package with the Pkg.add() commands results in Julia updating the registry then fetching the necessary elements to install the package in the current environment. © Florent Poux

Then, you can launch Anaconda Navigator (with GUI) from which Jupyter Notebook on the environment of your choosing.

Note: If you do not have Anaconda installed, you can follow the tutorial below:

Once the GUI appears, you can launch Jupyter directly. © Florent Poux
Once the GUI appears, you can launch Jupyter directly. © Florent Poux

Once opened, you can create a new notebook with Julia acting as the programming language (see below for starting a new notebook)

Once in Jupyter, you can click on new and select Julia 1.X.X for creating a new Julia-based notebook. © F. Poux
Once in Jupyter, you can click on new and select Julia 1.X.X for creating a new Julia-based notebook. © F. Poux

Julia with Atom IDE and Juno

I tend to favor the Atom+Juno combination, which allows you to benefit from an interactive REPL mode, much like what you would be used to with Spyder. If you choose to follow in these footsteps, you first need to install Atom in your system following the instructions given at https://atom.io/.

A snapshot of the GUI in ATOM. It makes the process of creating fully-fledged software much handier. © Florent Poux
A snapshot of the GUI in ATOM. It makes the process of creating fully-fledged software much handier. © Florent Poux

Then, you can launch Atom and head over to the package installation settings through Packages>Settings view>Open or the shortcut Ctrl+,. There search for juno and then click the install button for uber-juno and if not installed juno-client. Once Juno is installed, you can try opening the REPL with Juno > Open REPL or Ctrl+J Ctrl+O (Cmd+J Cmd+O on macOS), and then pressing Enter in the REPL to start a Julia session. And that is it! We are ready to code!

When searching for Juno in the bar, I look for the necessary pieces of Packages that would be useful, of which uber-juno and juno-client. © F. Poux

Note: If the REPL does not seem to launch correctly and does not display the Julia logo, head over to the Juno Client Settings from the Package menu and verify the path to tour Julia executable.

The path to my Julia executable. © Florent Poux
The path to my Julia executable. © Florent Poux

Bonus: Julia with Google Colab

You can also use a google Colab environment, but that needs a specific block of code to use Julia instead of Python. For this, you will find at this link a template that makes it possible to work directly within Colab. It also contains the main code of this tutorial.

Note: Every time you want to use Julia on the Google Cloud, you need to run the first block, refresh the page, and continue directly to the second block without re-running the first until every element is ready for you.

Step 3: Loading datasets

Great, so now we are ready to Julia code. First, let us discover where we are working, so our current working directory, using the command pwd(). Hmm, it looks like we are in the base directory, so let us change it to a project directory that you would create where you want to store most of your project (data, results, …) with cd(“C://DEV/JULIA/TUTORIALS/3D_PROJECT_1”), followed by pwd() to check out the new path.

Okay, we are all set. Let us download a dataset, for starters, a small noisy point cloud. For this, very handy, you can use the following command:

download("https://raw.githubusercontent.com/florentPoux/3D-small-datasets/main/tree_sample.csv","point_cloud_sample.csv")

Note: The download() command first takes the link to the data you want to download, which I dropped on my GitHub account, and then specifies its name locally after download.

Great, we now have a local version of the data in our working directory that we specified with the cd() command. Now, how do we load it in the script? well, we will make use of a Package called DelimitedFiles.

Note: A Package is a handy set of functions, methods, classes, and more that allow you to build on existing code without writing everything from scratch. The DelimitedFiles package allows manipulating (e.g., read and write) delimited files like our current point cloud at hand.

For using a package, we first have to load the package manager utility by just typing using Pkg. To add a new package, it is pretty simple; we just write Pkg.add(“DelimitedFiles”), and wait until the download + requirement checks are finished.

What is great about this is that you do not have to worry about dependencies (other packages needed by the current) as all is handled for you! Cool, huh? And on top, we can create excellent packages easily to ensure reproducibility of the results, for example, and independent environments, but that is for another tutorial 😉.

Note: Managing packages is then pretty straightforward, and we have a bunch of functions to update the packages, to know their current status, if we have any conflicts between them (rare), or to load unregistered packages by other like-minded coders, even one of your future local package 😉. I usually manage them by using the REPL and entering the package manager with the command ] in the right environment. To exit the package manager, one needs just to do Ctrl+C .

Ok, now that the package is installed (you only need to run this once per environment). You can use it in your current project by typing using DelimitedFiles, and also, if there are no conflicts of function names, you do not need to write from which package a function comes. to read a delimited file DelimitedFiles.readdlm() is equivalent to readdlm().

From there, let us read the point cloud at hand and store the data in the variable pointlist:

pointlist=readdlm(“point_cloud_sample.csv”,’;’)

The first lines should look like below.

Step 4: First pre-processing operations

Okay, pretty cool up until now, hun? And now, let us see the first real surprise if you are used to other programming languages: indexing. You can try doing pointlist[0] to retrieve the first element. What are we getting? a BoundsError.

Haha, in Julia, indexes start at 1, so if you want to retrieve the first element (the X coordinate of the first point), you just input pointlist[1] that returns 41.61793137. A bit confusing at first, it is pretty handy and logical, at least from a scientific point of view 😅. So now, if you want to retrieve the first point, then you need to know that indexes work on the first axis (rows) first, followed by the second axis (column(s)), and so on. Thus, to retrieve the first point (first row and every column):

pointlist[1,:]

Very cool, and now, to get further, if we want to store the coordinates in the points variables and the normals in the normals variable, we just do:

points  = pointlist[:,1:3]
normals = pointlist[:,4:6]

Note: If you want to know the type of a variable, typeof() is what you are looking for. typeof(points)will display that we deal with Matrices, which are an alias of 2-dimensional arrays. Also Float64 is a computer number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. Double precision may be chosen when the range or precision of single-precision (Float32) would be insufficient.

One last straightforward pre-processing step would be to know how to quickly sample the variable extracting, let say, 1 line every tenth. For this, we can do the following (a bit like Python, but we need to put the word end to treat the total variable):

points_sampled=points[1:10:end,:] 

We first work on rows, taking one every tenth, and for each line, we keep all columns with : following ,.

Hint: There are many ways to sample a dataset, but it is always important to balance the gain performance/results. This way is pretty quick and straightforward to execute; thus, it usually acts to get first visual results to know what we are dealing with without using too much memory.

Step 5: 3D Data vizualisation

We now are ready to address 3D data visualization, a critical step to grasp what we are dealing with! Here, one library is the go-to solution: Makie. Because we didn’t use it yet, we first need to import the package, along with two other “backends” depending on your paradigm (webGL visualization or OpenGL visualization), namely WGLMakie and GLMakie. Nothing more straightforward, we just run the following three lines only once:

Pkg.add(“Makie”)
Pkg.add(“WGLMakie”)
Pkg.add(“GLMakie”)

Note: Once executed, if anytime you want to explore the packages you have installed in your current environment, you can use the command Pkg.installed().

Once the needed packages are installed, to make them usable in your working session, you can add the line using Makie, GLMakie (or using Makie, WGLMakie if you want interactivity on the web with Colab or Jupyter), much like import matplotlib in Python.

Nice, now, before visualizing, let us prepare a bit of color for our colorless points, depending on their z-value. We will create a color vector the size of the sampled point cloud that will range from 0 to 1, depending on how close to the maximum value it is, by dividing each point:

zcoloring_scale=points_sampled[:,3]/maximum(points_sampled[:,3])

Hint: Julia automatically understands that you want to divide each element from points_sampled[:,3] by the maximum value with maximum() . Handy, huh? But it is a particular case of broadcasting, which in Julia, you just write by inputting a . before your function or mathematic symbol. If you do ./ here, you will get the same result.

Okay, now, all we have to do is plot our results as a scattered point cloud in 3D. For this, we use the scatter function from Makie, and we pass as arguments the coordinates of the points (points_sampled), the color vector zcoloring_scale, and also some parameters like the size of each point with markersize, and if we want to display the axis.

scatter(points_sampled, color=zcoloring_scale, markersize=100, show_axis=true)
On the left, the plot using the scatter function. On the right the plot using the meshscatter function. In both case, we can now grasp with what kind of point cloud we are dealing with. © Florent Poux
On the left, the plot using the scatter function. On the right the plot using the meshscatter function. In both case, we can now grasp with what kind of point cloud we are dealing with. © Florent Poux

You also can do a meshscatter plot that will generate a small sphere for each point (like the image above on the right)

scene_1 = meshscatter(points_sampled, color=zcoloring_scale, markersize=0.2, show_axis=true)

How very cool! If you want some interactivity on the web, you should make sure to use WGLMakie instead of GLMakie, which will just give you a fixed backend to generate a view.

Hint: If you want to print your figure to a file, an effortless way is to first install the FileIO package using Pkg.add(“FileIO”) and using FileIO that holds the necessary methods and function to deal with a lot of file formats incl. meshes, images, vectors … Once available in your running session, a simple save(“scatter.png”, scene_1) will save the plot to an image in your working directory.

[BONUS] 3D Mesh I/O

An effortless way to display a mesh is to use FileIO, like hinted above. In two simple lines, you can display your mesh. Take the one available on my GitHub like the point cloud before with:

download("https://raw.githubusercontent.com/florentPoux/3D-small-datasets/main/hippo_1k.obj","hippo_1k.obj")

and then, you can just store it in the obj variable with:

obj = load(“hippo_1k.obj”)

that you pass to the Makie.mesh() function shortened to mesh() to display your 3D model.

mesh(obj, color=”honeydew2”, show_axis=false)
The results of using the 3D mesh visualizer from Makie. © Florent Poux
The results of using the 3D mesh visualizer from Makie. © Florent Poux

If this is not optimized code clarity, I do not know what it is 😆!

Hint: If you want to access the single-shot color names, check out this link that gives all the available possibilities, and there are a lot 🙃.

Step 6: 3D Data Analysis

Now we can explore a bit the data and play around with some functions. The first thing would be to know how to access a specific portion of the 3D point cloud dataset. In our case, we would like to find and separate (more or less) the ground from the rest. First, we would find through visualization the higher z of the ground and then use that as a threshold. Let us plot the 2D projection of the dataset:

scatter(points_sampled[:,2:3], color=zcoloring_scale, markersize=100, show_axis=true)
A 2D projection of the point cloud on the y-axis, to try and grasp a potential threshold line to separate our 3D point cloud dataset.
A 2D projection of the point cloud on the y-axis, to try and grasp a potential threshold line to separate our 3D point cloud dataset.

We can identify that we get a threshold at more or less 3 meters (red line). Let us use it directly for our experiments.

Hint: If you want to display other elements on the same figure, a simple way to do is to add ! to the following plots, like if you want to display the y=3 line in red like above, you would do lines!(range(38, 60, length=30),(x*0).+3, color=”red”) and admire the results with the commandcurrent_figure() 😀.

Now we want to find all points below and above the visually defined threshold and store them in a separate variable. We will use the findall function that returns the indexes of the points satisfying the condition in brackets. Let us work solely on points[:,3] as we just need to study this and retrieve indexes that we can later use to filter our full point cloud. How cool 😆?

indexes_ground=findall(x->x < 3, points[:,3])
indexes_tree=findall(x->x>=3, points[:,3])

Very nice! Now, if we would like to get the points corresponding to these indexes list, we just have to pass indexes_ground or indexes_tree as a row selector in the points variable, for example, for getting all the points that belong to the ground and all the rest. And if we want to plot this, then we can do:

meshscatter(points[indexes_ground,:], color=”navajowhite4”, markersize=0.3, show_axis=false)
meshscatter!(points[indexes_tree,:], color=”honeydew3”, markersize=0.3, show_axis=false)
current_figure()

Awesome! We just did a manual instance segmentation step, where we have one ground element and one tree element, and that with a new language; well done!

Conclusion

In Julia, you just learned how to import, sub-sample, export, and visualize a point cloud composed of hundreds of thousands of points! Well done! But the path does not end here, and future posts will dive deeper into point cloud spatial analysis, file formats, data structures, visualization, animation, and meshing. We will especially look into managing big point cloud data as defined in the articles below.

This Article is derived from the one originally published in Towards Data Science

Leave a Comment

Your email address will not be published.