This post is a part of a series, and you can use the following links to go directly to the other parts of the series:
- Part 1: Introduction
- Part 2: Setting up a VPS
- Part 3: Setting up RStudio Server
- Part 4: Setting up Jupyter Notebook
- Part 5: Setting up Visual Studio Code
What is a VPS?
Also known as a Virtual Dedicated Server, a VPS (Virtual Private Server) is essentially a traditional server machine (you know, one of those big ones you see in movies) that runs virtualisation technology to share its resources among the multiple “virtual” servers living on it.
Each virtual server is self-contained, meaning that it cannot interact with any of the other virtual servers even if they’re hosted on the same physical machine. In fact, a virtual server cannot even “look” beyond its own virtual environment to see whether any other virtual servers share its physical machine. Furthermore, each virtual server has its own dedicated IP address and firewall setup.
What all this means is that for all practical intents and purposes, a virtual server feels and acts exactly like a dedicated server machine at your disposal.
Most of them also come with full root access, so you are completely free to install additional software and configure the server however you want.
How to Choose a VPS Provider
There are literally thousands of VPS providers all over the world, and you need a clear idea of what you want a VPS for in order to shortlist the ones most suitable for you.
For our purpose (building a cloud computing data science platform), these are the key considerations:
Full Root Access
VPS providers can be largely split into two categories, managed and unmanaged.
Managed hosting is, as its name suggests, where the hosting provider has professionals managing your server for you and performing tasks such as installing software, applying patches and backing up data.
This is a great option if you’re not tech-savvy and you don’t have a dedicated IT team, but you need a VPS for purposes such as hosting your company website and email.
Of course this convenience comes at a cost: most obviously the premium you pay for the professional management, but also the loss of freedom to install the software you want and to configure the system just the way you want it.
Since we are technically savvy (or, if not savvy yet, have the desire to become so) and we have a specific combination of not-so-standard software we want to install on our server, we want an unmanaged hosting service where we are given full root access (that’s Unix/Linux jargon for full administrative privileges) but left to our own devices when it comes to managing the server.
Guaranteed Resources
Even though a VPS runs on shared resources, most VPS providers these days guarantee the minimum resources that will be available to your particular server at any given point in time. For example, if your VPS is guaranteed 16GB of memory, your server will always have at least this much memory available to it no matter how many other “tenants” are sharing the physical server.
Naturally, the more resources you’re guaranteed, the more expensive the plan will be.
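Once you have a server, you can check what it actually reports. The following is a small sketch, assuming a Linux server with Python 3 installed (the function name is made up for illustration). It shows the totals the operating system sees, which is a sanity check rather than proof of the provider's guarantee.

```python
import os


def total_memory_gib() -> float:
    """Total physical memory visible to the operating system, in GiB."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1024 ** 3


print(f"CPU cores: {os.cpu_count()}")
print(f"Memory:    {total_memory_gib():.1f} GiB")
```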
Dedicated IP Address
An IP address on the internet is in many ways similar to a physical address in the real world. If your family is the sole occupant of an entire house, any mail addressed to that house will reach your family. Even if you’re living in an apartment (i.e. a shared building), your unit number uniquely identifies your particular home.
In a similar sense, as long as your VPS is given a dedicated IP address, your server can have a unique “home” on the internet.
This is important, because without a unique IP address you won’t be able to connect directly to your server as a cloud computing platform.
Reliability
This indicates how stable the VPS service is, in particular how often your server will lose access to the internet or go through a forced reboot.
It is less of a problem these days than it used to be, mostly because providers with low reliability have lost customers and been crowded out of an increasingly competitive market.
Still, it is only prudent to check the provider’s uptime guarantee and read reviews from past users to get a sense of how reliable the provider is, especially if it is cheap. If something sounds too good to be true, that’s most probably because it’s not true.
Price
The price of the service matters to different degrees for different people, depending on their use case and requirements.
Hobbyists just starting out on their data science journey (as I assume the majority of readers would be) will want a VPS that is as cheap as realistically possible, and will be willing to accept some compromise in terms of guaranteed resources or reliability.
On the other end of the spectrum, companies seeking to run mission-critical operations will treat stability and reliability as the highest priority, even if that comes at a hefty price.
Location
The location of the physical server hosting your VPS should not be a concern for the vast majority of people, unless you have a very specific reason for preferring a certain location (e.g. continental United States) over others (e.g. China).
You might have concerns such as economic stability, geopolitical stability or data security regarding the country where the server is hosted, and this might be a determining factor when choosing a VPS provider.
My Recommendation
My personal recommendation is Contabo because they offer VPS plans running on 100% SSD (solid state drive) with high reliability and generous resource guarantees at a very attractive price.
Disclaimer: I will be paid a small referral fee if you purchase a VPS plan from Contabo through the links on this post. However, even without the referral fee my recommendation for Contabo still stands, as I genuinely believe it is one of the best value-for-money VPS providers for hobbyists and small businesses.
Their VPS S SSD plan, for example, comes with a resource guarantee of 4 CPU cores, 8GB memory and 200GB storage, with full root access and a dedicated IP, for only 4.99 EUR a month. That’s a decently powered VPS for about the cost of a cup of Starbucks coffee each month!
While 8GB memory is a bit on the low side for proper data science work, it’s more than sufficient for budding hobbyists just getting started, especially if the server is running on Linux instead of Windows. If you can spare an additional 4 EUR a month, I would strongly recommend the VPS M SSD plan at 8.99 EUR a month instead.
Compare this against Liquid Web, one of the biggest and oldest names in VPS hosting. A VPS with 8 CPU cores and 8GB memory will set you back 139 USD a month.
Another alternative could be cloud computing platforms such as Google Cloud Platform, Amazon Web Services or Microsoft Azure, which have gained huge popularity in recent years.
They have a deceptively appealing “pay as you go” pricing model, where you’re only billed for the actual amount of resources you use. So if you were to log into their console to spin up your server only when you need it, and always remembered to turn it off once you’re done, you could probably run a similarly powered server (4-8 cores, 8GB memory) for around 10-20 USD a month.
But if you were to keep the server up and running 24/7, the monthly bill would come to 100 USD or more!
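To see how those two figures relate, here is a rough back-of-the-envelope sketch. The hourly rate is not a quoted price from any provider; it is derived purely from the assumption above that an always-on server costs about 100 USD a month, and the function names are made up for illustration.

```python
# Back-of-the-envelope pay-as-you-go estimate.
# Assumption: an always-on server costs about 100 USD per 30-day month,
# which is the ballpark figure used in this post, not a quoted price.
HOURS_PER_MONTH = 24 * 30          # 720 hours in a 30-day month
ALWAYS_ON_COST_USD = 100.0

hourly_rate = ALWAYS_ON_COST_USD / HOURS_PER_MONTH


def monthly_cost(hours_used: float) -> float:
    """Cost in USD for the given number of billed hours in a month."""
    return hours_used * hourly_rate


def hours_for_budget(budget_usd: float) -> float:
    """Number of hours a monthly budget buys at the assumed hourly rate."""
    return budget_usd / hourly_rate


for budget in (10, 20):
    print(f"{budget} USD buys about {hours_for_budget(budget):.0f} hours a month")
print(f"Running 24/7 costs about {monthly_cost(HOURS_PER_MONTH):.0f} USD a month")
```

Under that assumption, a 10-20 USD bill corresponds to roughly 72 to 144 hours of use a month, or about 2.5 to 5 hours a day.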
Setting up the VPS
For the rest of this article, let’s assume you’re going with my recommended VPS provider of Contabo, and say you’re getting a VPS M SSD plan.
You can customise your order with a few options before you’re done.
By far the most important and impactful decision is the choice of operating system: Windows or Linux.
While I would be the first to recommend Windows to most people for their daily use on their personal PC/laptop, Linux should be the no-brainer choice for our purpose of setting up a data science platform:
- It’s designed to run as a server
- It’s free, and therefore does not require additional license fees
- Some of the software we want to install (RStudio Server, JupyterHub) only works with Linux
Once you’ve decided on Linux, there comes the next decision of which distro (Linux jargon for “distribution”) to choose. While there are armies of strong proponents defending each Linux distro, let’s go with Ubuntu 18.04 which is the latest stable LTS (Long Term Support) version of one of the most popular distros in recent years.
Choose the Webmin + LAMP option, since it can be added at no additional cost. LAMP stands for “Linux, Apache, MySQL and PHP”, which is by far the most common combination of software used for web hosting, so much so that it even has its own acronym.
If you need the server to be physically located in the United States and you’re willing to pay an extra 2 EUR each month for it, choose the “St. Louis” option.
If you’re fine with Germany, you can choose the free Nuremberg option.
There’s nothing particularly important among the remaining options. An SSL certificate is definitely important to secure your server, but I’ll show you how to get a free one later.
The final choice is what your payment frequency will be.
If you choose the monthly option, you can cancel at any time but will be charged a one-time setup fee of 4.99 EUR. This setup fee is discounted to 3.99 EUR for quarterly payment and 2.99 EUR for half-yearly payment, and is waived for yearly payment.
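To put the setup fees in perspective, the sketch below adds up the first-year cost of the VPS M SSD plan under each payment frequency. It assumes the monthly price stays at 8.99 EUR regardless of the billing period and that there are no other charges; check the provider's order page for the actual figures.

```python
# First-year cost of the VPS M SSD plan under each payment frequency.
# Assumptions: the monthly price stays at 8.99 EUR whatever the billing
# period, and the setup fee is charged once, as described in this post.
MONTHLY_PRICE_EUR = 8.99

SETUP_FEE_EUR = {
    "monthly": 4.99,
    "quarterly": 3.99,
    "half-yearly": 2.99,
    "yearly": 0.00,
}


def first_year_cost(frequency: str) -> float:
    """Twelve months of service plus the one-time setup fee, in EUR."""
    return round(12 * MONTHLY_PRICE_EUR + SETUP_FEE_EUR[frequency], 2)


for frequency in SETUP_FEE_EUR:
    print(f"{frequency:<12} {first_year_cost(frequency):.2f} EUR")
```

Under these assumptions the setup fee adds at most about 5 EUR to the first year, so the flexibility of the monthly option is fairly cheap.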
Connecting to Your Server
Once you’re happy with your choices, hit “Order now”, make payment, and your shiny new VPS will be ready in a couple of days!
Once your server is ready, you'll receive an email with your login details. Take note of the `user name` and `password` fields, as these are what you'll need to connect to your server.
If Windows has been the only operating system you've been using all your life, the next steps might feel shocking because your server doesn't have a graphical interface! This is for good reason, as running a graphical interface uses up precious system resources which are better reserved for "real work" such as running your latest big data analysis.
We will be connecting to the server using SSH (Secure Shell), which is already available by default if your desktop/laptop runs Mac OS or Linux, but you'll need to download additional software such as PuTTY if you're on Windows.
For the next step, let's assume that your server's IP address is 18.104.22.168.
On Mac OS / Linux
Open up a terminal, and type `ssh root@18.104.22.168`.
This establishes a connection to the server with IP address 18.104.22.168, and requests authentication with user name `root`. When prompted for the password, type in the password you received by email.
On Windows
Open up PuTTY, and in the `Host Name (or IP address)` section type your server's IP address (18.104.22.168 in this example).
Make sure your `Connection Type` is selected as `SSH`, and click on the "Open" button to connect.
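If the connection does not go through, it can help to check whether the server is reachable on the SSH port at all. The following is a minimal sketch, assuming the server listens on the default SSH port 22; the function name is made up for illustration.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `ssh_port_open("18.104.22.168")` with your own server's address in place of the example one returns `True` if something is accepting connections on port 22. It only tests reachability and says nothing about whether your user name and password are correct.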
Important: Changing Your Initial Password
Once you've successfully logged into your server, the very first thing you should do is change your password. This is especially important for the `root` user, because this user has full administrative privileges over the entire server, and someone with access to the `root` credentials can do whatever they want with your server.
In order to change your password, type `passwd`.
The server will then ask you to enter your new password, and to retype it to confirm. Note that you won't see anything on the screen while typing your new password. This is by design, as it ensures that someone looking over your shoulder won't even be able to tell how long the password is.
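The hidden input and the retype-to-confirm step can be illustrated in a few lines of Python using the standard `getpass` module, which reads a line without echoing it to the screen. This is only a sketch of the behaviour, not how `passwd` itself is implemented; the function name and prompts are made up for illustration.

```python
import getpass


def choose_new_password(read=getpass.getpass) -> str:
    """Ask for a new password twice and return it once both entries match."""
    while True:
        first = read("New password: ")          # nothing is echoed while typing
        second = read("Retype new password: ")
        if first == second:
            return first
        print("Sorry, passwords do not match. Please try again.")
```

The `read` parameter defaults to `getpass.getpass`, so calling `choose_new_password()` in a terminal prompts without showing what you type. Passing a different function is mainly useful for testing.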
Creating a New User
Because incorrect use of the `root` user is so sensitive and potentially disastrous, it is best practice never to log in as `root` other than during the initial setup.
Instead, create a new user for yourself (even if you're going to be the one and only user of the server) and use this to subsequently log in.
One way to do this on Ubuntu is `useradd -m max`, which creates a new user named `max` along with a home directory. Next you'll want to set the password for this new user by typing `passwd max`.
Note how the `passwd` command by itself changes the password of the user currently logged in, but if you append a username to the command it changes the password of that user. Naturally, this can only be done by `root` or a superuser.
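After creating the account, you can confirm that it exists by looking it up in the system's user database. Here is a small sketch using Python's standard `pwd` module, assuming a Linux server with Python 3 installed; the function name is made up for illustration.

```python
import pwd


def user_exists(username: str) -> bool:
    """Return True if the system's user database has an entry for username."""
    try:
        pwd.getpwnam(username)   # raises KeyError when the user is unknown
        return True
    except KeyError:
        return False
```

On the server, `user_exists("max")` should return `True` once the new user has been created.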
Invoking Superuser Rights
Now log out and log in again using your newly created user account.
The next time you need to perform an operation (such as installing new software) which requires administrative privileges, you can use the `su` command to elevate your rights to `root` level. We'll explore this in future posts.
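If you ever need to tell from inside a script whether it is running with `root` privileges, you can look at the effective user ID, which is 0 for `root` on Unix-like systems. A minimal sketch, with a function name made up for illustration:

```python
import os


def is_root() -> bool:
    """Return True when the current process runs with root privileges."""
    return os.geteuid() == 0


print("Running as root" if is_root() else "Running as a regular user")
```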
That wraps up the process of getting your own VPS and logging into it.
In the next posts in this series, I'll show you how to install the necessary software and how to configure it so that you can have your very own cloud computing data science platform.