By Bella Ratmelia, Librarian, Data Services
If you have been delving into data science and machine learning (ML) for a while, you have most likely heard of, or are even familiar with, Python and R, the two most established languages in these areas. But you may not have heard of Julia, which is touted as an up-and-coming language made specifically for data science.
What is Julia? What’s the difference between Julia, Python, and R?
Julia is a general-purpose programming language, and if you’re familiar with Python, you will notice that Julia’s syntax looks remarkably similar. All three languages are open source, which means they are free to use and distribute, even for commercial purposes. One of Julia’s biggest selling points is its speed: various benchmarking tests have shown that Julia executes code much faster than Python and R. (Here’s one conducted by NASA, for example.)
The major differences between these three languages are outlined in the table below:
(Note: There are quite a few technical differences between the three languages, but this article will focus on the high-level differences between them.)
Aspect | Python | R | Julia |
---|---|---|---|
Release date | 1991 | 1993 | 2012 |
Example syntax | if a < b: print("a is smaller") | if (a < b) { print("a is smaller") } | if a < b; println("a is smaller"); end |
Syntax readability | Python emphasises code readability, so its syntax looks close to natural human language. | R is still somewhat readable, but much less so than Python or Julia. | As Julia’s syntax is similar to Python’s, it is also very readable. |
Data science packages | Python has a lot of well-established packages for data science (e.g., NumPy, pandas, etc.). Python has more established machine learning, text analysis, and AI packages. | R has a lot of well-established packages for data science (e.g., tidyverse, tidyr, etc.). R has more established mathematical and statistical packages. | Currently there are fewer well-established packages than for Python and R, but more are continuously being developed. Julia can call and use Python packages through the PyCall.jl package. |
Key “selling point” | One of the most popular programming languages; robust support for machine learning and AI; also used in other sectors such as information security, game development, and web development. | Robust support for data mining, statistical, and numerical analysis tasks; known as the better alternative to SAS, SPSS, and Stata; commonly used in academia. | Designed for data science and ML; great code readability; better speed than Python and R. |
Learning support for beginners | Python has strong community support and has been around for decades, so many tutorials and other support resources are available. | R has strong community support and has been around for decades, so many tutorials and other support resources are available. | Julia is still relatively new, so tutorials, while available, may not be as abundant as those for Python or R. |
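To make the example-syntax row a little more concrete, here is a minimal sketch of the same comparison written as a complete Python script. The function name `describe` and the extra branches are assumptions added for illustration; they are not part of the table. Note that when `a < b` holds, `a` is the smaller of the two values.

```python
def describe(a, b):
    """Return a short message comparing a with b (hypothetical helper)."""
    if a < b:
        return "a is smaller"
    elif a > b:
        return "a is bigger"
    # Neither smaller nor bigger, so the two values are equal.
    return "a and b are equal"


if __name__ == "__main__":
    # Try the three possible outcomes.
    for a, b in [(2, 5), (7, 3), (4, 4)]:
        print(a, b, describe(a, b))
```

The R and Julia versions follow the same shape; only the punctuation around the condition and the body differs.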
You have probably seen plenty of examples of how Python and R are used by now. But how about Julia? Despite its relatively young age, Julia has been used by major companies in the pharmaceutical, aviation, biomedical, and other industries.
So, which one should I learn if I want to learn Data Science or Machine Learning?
The answer to this question is undoubtedly “it depends”. Here are some things to consider, in general:
- What are your peers using? The pragmatic approach is to see what’s more popular in your current field of study/research and go with that. You can ask your fellow students, researchers, PI, or faculty. Another way to find out is to look at the research papers in your field and see which coding languages/tools were used in that research. (Tip: you can use the library databases to do this. Feel free to approach our librarians for help!)
- What kind of tasks are you trying to complete? Consider the main things that you want to carry out. If most of your work involves complex statistical analysis, then learning R first is probably the better choice. If you want to eventually learn about machine learning or do some text analysis, then perhaps learning Python or Julia first is more effective. Learning what’s currently popular might seem like the safer way, but sometimes what’s popular might not be the best tool to solve your problem.
- Consider the support available, especially if you are a beginner. More established languages like Python and R have more support and more tutorials available than Julia does. In a way, it depends on your appetite for a challenge.
However, the coding language is but a tool. Tools come and go, and they will always change and evolve. Python is undoubtedly the most popular programming language right now, but who knows? Maybe in the next 3-5 years or so, a new language will take its place.
Rather than getting too fixated on the tool, it would be more effective to have a good grasp of the fundamentals (i.e., basic programming concepts, math/statistics, and domain knowledge in your field) and most importantly, how to apply them. Even if you know how to code in R or are proficient in statistics, it will not help much if you do not know how to identify the problem or ask the right questions. Having good foundations will make learning any data science or data analytics tools a breeze! (Or flatten the learning curve a little bit.)
As always, if you have any questions or comments, I can be reached at bellar@smu.edu.sg