The articles in this series:
Before deploying a home server: the differences between GPU rental, Colab, and Kaggle
Selection of home server chassis and purchasing advice for server components
What Operating System should you choose for your home server and why
Before We Start
Since there is a lot to cover when deploying a home server, I will split this topic into several articles. If you are also interested in setting up your own home server, I am happy to share my experiences with you 🙂

This is the first article in the series. I will mainly discuss the differences between several platforms that provide free GPUs for AI learning, and the reasons why I decided to build my own server after trying all of them.
No ads in any of these articles!!!
The story starts during my first year of high school…
Starting to Self-Learn AI
I started learning AI in 2022, exploring some AI code and models at the time. Because I already had some Python knowledge, I picked up the basics of TensorFlow. As a student very interested in AI, I was always curious about what I could build with the knowledge I had learned. Maybe the result would be meaningless, but I still wanted to try.
I registered on Kaggle (an online AI learning platform powered by Google), downloaded several datasets, and combined them into a "large" one containing 50,000 images across 35 animal categories.

Here is the link to my dataset; if you want to try out a basic classification neural network, feel free to download it: https://www.kaggle.com/datasets/tylerhong/animals-testing35
Google Colab
As we all know, AI training requires GPUs. However, at the time, I only had a low-performance Mac with no ability to run neural networks. What could I do? The project had to keep going. A friend told me that Google Colab provides free GPUs for learning AI and deploying AI applications. I was very excited until I actually tried it :(. In my opinion, it is not a platform that encourages you to train your own models. It is only suitable for inference and small deployment tasks; for example, linking a Colab notebook from a GitHub repository lets users quickly try out the key points of your project. The picture below shows the Stable Diffusion Web UI running in Colab.
Because Colab has a forced-logout mechanism, the time available to free users gradually decreases. For instance, if you were forced offline after 8 hours this time, the likelihood is that next time you won't reach 8 hours, and might be forced offline after just 4 hours. Speaking of free users, it's worth mentioning Colab's paid plans. In my opinion, unless you have a strong need for the Google ecosystem, they are not worth trying, as the cost-performance ratio is quite poor. Below is a screenshot of their membership plan:
As you can see, for 10 US dollars per month, what do you get? 100 compute units. Many people have no concept of Google's compute units, so here are the hourly compute-unit costs for each GPU type, based on data provided by Reddit users (https://www.reddit.com/r/GoogleColab/comments/xs62o9/what_exactly_is_a_compute_unit/):
| GPU Type | Compute Units per Hour |
|---|---|
| P100 | 4 |
| V100 | 5 |
| A100 40G | 15 |
Let me briefly note that the P100 and V100 are already considered ancient GPUs, especially the P100. Your 10 dollars buys just over 6 hours of usage on the A100 40G, the only current-generation option of the three. So that's about all there is to say regarding Google Colab.
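The cost math above is easy to sanity-check. A short sketch, using the community-reported consumption rates from the table (not official Google figures):

```python
# Colab Pro: $10/month buys 100 compute units.
# Rates below are the Reddit-reported compute units consumed per GPU-hour.
RATES = {"P100": 4, "V100": 5, "A100 40G": 15}
UNITS = 100  # compute units included in the $10 plan

for gpu, units_per_hour in RATES.items():
    hours = UNITS / units_per_hour
    print(f"{gpu}: {hours:.1f} hours for $10")
```

This prints 25 hours for the P100, 20 for the V100, and roughly 6.7 for the A100 40G, which is where the "just over 6 hours" figure comes from.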
Kaggle Platform's GPUs
Actually, Kaggle does a pretty good job of supporting students learning AI by offering each free user 20 hours of free GPU time per month. But since Kaggle is a large platform, and it's free, Google can't afford to deploy a very large number of servers. This means you have to queue almost every time you want to use it. Sometimes you might wait a few minutes, which is still okay, but when there are hundreds of people ahead of you, it can be quite discouraging. For someone as impatient as me, this setup was a hard pass.
GPU Cloud Rentals in China
Although I had never heard of GPU rentals before, upon further thought, renting cloud servers seemed reasonable, so renting GPUs made sense as well. After comparing prices across various platforms, I found that AutoDL's services were very cheap, costing just 1.5 yuan per hour for a single RTX 3090, which is quite acceptable. They also offer some unique services. Since this is not an advertisement, I won't go into details here, but interested folks can look it up themselves.
Training Completion
Eventually, I ran my project code on AutoDL and obtained a trained neural network checkpoint. Honestly, I was very excited at that moment. Later, on a test dataset, the accuracy came out around 90%, which is not bad, though still far from the top international standards. None of that could dampen my joy, though. Here, I also want to share some results: images I randomly pulled from the internet and ran through my trained network.
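For readers new to this, the test-set accuracy above is just the fraction of images whose highest-scoring class matches the label. A minimal sketch with toy numbers (in the real project, `logits` would come from the trained network's predictions on the test images):

```python
import numpy as np

# Toy stand-in for the network's raw outputs on 4 test images, 3 classes.
logits = np.array([
    [0.1, 0.7, 0.2],   # predicted class 1
    [0.8, 0.1, 0.1],   # predicted class 0
    [0.3, 0.3, 0.4],   # predicted class 2
    [0.6, 0.2, 0.2],   # predicted class 0
])
labels = np.array([1, 0, 2, 1])  # ground-truth classes

# Top-1 accuracy: how often the argmax of the logits equals the label.
predictions = logits.argmax(axis=1)
accuracy = (predictions == labels).mean()
print(f"accuracy: {accuracy:.0%}")  # 3 of 4 correct -> 75%
```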
I’m just randomly posting three images here, as this isn’t the main focus of this chapter.
Problem Discovery
Actually, if it were just small projects like this, renting a cloud server when needed would be enough. But why did I eventually go down the road of building my own server?
The main reason lies in debugging. Here, I'm not referring to environment setup, but code debugging. For my small project, there were only a few lines of code, so debugging was easy. However, as projects grew in size and difficulty, the practice of debugging on a local computer and then uploading to a cloud server became impractical: small projects can still run locally, but large ones simply can't, so everything from development to debugging to training has to happen in the cloud. That incurs significant costs. Uploading data is also cumbersome, and although AutoDL provides a cloud storage service, it's understandable that not everyone wants to store their carefully organized datasets on someone else's cloud.
To Be Continued
In the next chapter, I will discuss my own experience building a server, including the selection process and some pitfalls to avoid. Thank you for reading this far.
The next article’s URL: https://global.tylerhong.cn/selection-of-home-server-chassis-and-purchasing-advice-for-server-components/