This guide is part of a larger roadmap to data engineering. Please refer back for context.
Ah, Ubuntu – the bread and butter of many data scientists and engineers. It’s like stepping into a well-organized workshop where every tool you need is within arm’s reach. Your Ubuntu machine is fairly ready for data work right out of the box.
Start by updating your package list and upgrading your system. In the Terminal, run sudo apt update && sudo apt upgrade. It’s like giving your car a thorough service before a long journey.
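Ubuntu comes with Python, but it’s wise to install a version manager like pyenv. This lets you switch between Python versions per project without touching the system Python. pyenv is not packaged in Ubuntu’s default apt repositories, so sudo apt install pyenv will not find it; use the project’s installer script instead: curl https://pyenv.run | bash. Then follow the instructions it prints to add pyenv to your shell startup file.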
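Once you start switching versions, it helps to confirm which interpreter is actually active. Here is a minimal sketch using only the standard library; the function name describe_interpreter and the default minimum of 3.10 are illustrative assumptions, not anything pyenv requires.

```python
import sys


def describe_interpreter(minimum=(3, 10)):
    """Report the running interpreter and whether it meets a minimum version.

    `minimum` is a hypothetical project requirement; adjust it to your needs.
    """
    current = sys.version_info[:3]  # (major, minor, micro)
    return {
        "executable": sys.executable,  # path of the Python that is running
        "version": ".".join(str(part) for part in current),
        "meets_minimum": current >= tuple(minimum),
    }
```

Calling print(describe_interpreter()) after changing versions shows the path and version of the Python you are really running, which is a quick way to catch a shell that has not picked up pyenv yet.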
For managing Python packages, ensure pip is installed by running sudo apt install python3-pip.
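Virtual environments (like venv or virtualenv) are crucial for managing project-specific dependencies. venv is part of Python’s standard library; on Ubuntu, enable it with sudo apt install python3-venv. If you prefer virtualenv, install it with sudo apt install python3-virtualenv, or with pip install virtualenv where pip allows it. Recent Ubuntu releases block system-wide pip installs by default, so run pip inside an environment rather than against the system Python.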
Create a new environment for each project to keep your workspace clean and organized.
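The one-environment-per-project habit can be sketched with Python’s built-in venv module. This is only an illustration: the helper name create_project_env and the .venv folder name are assumptions (a common convention, not a requirement).

```python
import venv
from pathlib import Path


def create_project_env(project_dir, with_pip=True):
    """Create a virtual environment at <project_dir>/.venv and return its path.

    with_pip=True bootstraps pip into the environment, which on Ubuntu
    needs the python3-venv package to be installed.
    """
    env_dir = Path(project_dir) / ".venv"
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

After create_project_env("my_project") runs, activate the environment from the Terminal with source my_project/.venv/bin/activate. Running python3 -m venv .venv inside the project folder does the same job in a single command.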
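Install essential Python libraries for data science such as NumPy, Pandas, Matplotlib, and Scikit-Learn using pip, ideally inside the project’s virtual environment: pip install numpy pandas matplotlib scikit-learn.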
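To check that the libraries landed in the environment you think they did, a small standard-library sketch like the one below can help. The function name missing_packages is made up for this example. Note that Scikit-Learn is imported as sklearn, even though it is installed as scikit-learn.

```python
from importlib.util import find_spec

# Import names for the libraries mentioned above (scikit-learn imports as sklearn).
DATA_SCIENCE_IMPORTS = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(import_names=DATA_SCIENCE_IMPORTS):
    """Return the top-level import names that this interpreter cannot find."""
    return [name for name in import_names if find_spec(name) is None]
```

An empty list from missing_packages() means everything is importable from the current interpreter. Anything it returns still needs installing in this environment.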
Jupyter Notebooks are a staple in data science. Install JupyterLab, which includes the notebook interface, via pip: pip install jupyterlab. Then start it from the Terminal with jupyter lab.