Phase 2: Setting Up Your Learning Environment

2.3 Software Installation - Setting Up Ubuntu

This guide is part of a larger roadmap to data engineering. Refer back to the roadmap for context.

Set Up Ubuntu Linux

Ah, Ubuntu, the bread and butter of many data scientists and engineers. It’s like stepping into a well-organized workshop where every tool you need is within arm’s reach. Out of the box, Ubuntu already includes Python 3 and the apt package manager, so only a few additions are needed to make it ready for data work.

Update and Upgrade the System:

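Start by refreshing your package list and upgrading your system. In the Terminal, run sudo apt update && sudo apt upgrade. The first command fetches the latest package index from Ubuntu’s repositories; the second installs newer versions of the packages already on your machine. It’s like giving your car a thorough service before a long journey.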

Install Python and Python Environment Management Tools:

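Ubuntu comes with Python 3, but it’s wise to add a version manager like pyenv, which lets you switch between Python versions seamlessly, globally or per project. pyenv is not in Ubuntu’s default apt repositories, so sudo apt install pyenv will not find it. Instead, install the build dependencies listed in the pyenv documentation (pyenv compiles each Python version from source), then run the official installer: curl https://pyenv.run | bash. Add pyenv to your shell startup file (for example ~/.bashrc) as the installer instructs. From there, pyenv install <version> builds a version, and pyenv global <version> or pyenv local <version> selects it.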
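To confirm which interpreter pyenv has actually put in charge, you can ask Python itself. The sketch below is a minimal check assuming Python 3 on your PATH; the function name describe_interpreter and the (3, 10) minimum are hypothetical placeholders for your own project’s requirement, not something pyenv defines.

```python
import sys

# Hypothetical minimum version; replace it with your project's requirement.
MINIMUM_VERSION = (3, 10)


def describe_interpreter(minimum=MINIMUM_VERSION):
    """Report which Python is running and whether it meets a minimum version."""
    version = tuple(sys.version_info[:3])  # e.g. (3, 12, 4)
    return {
        # With a pyenv-managed version active, this path typically sits
        # under ~/.pyenv/versions/.
        "executable": sys.executable,
        "version": ".".join(str(part) for part in version),
        "meets_minimum": version >= tuple(minimum),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it once in your home directory and once inside a project where you used pyenv local; the executable path should change if the two settings differ.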

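For managing Python packages, ensure pip is installed by running sudo apt install python3-pip, and prefer invoking it as python3 -m pip so it always targets the interpreter you name. On Ubuntu 23.04 and later, pip refuses to install packages into the system Python (an “externally-managed-environment” error), which is one more reason to work inside virtual environments.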
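A quick way to check that pip is visible to the interpreter you are running is to read its installed version from package metadata. This is a minimal standard-library sketch (importlib.metadata, Python 3.8+); pip_version is a hypothetical helper name. From the shell, python3 -m pip --version reports the same version.

```python
from importlib import metadata


def pip_version():
    """Return the pip version visible to this interpreter, or None if pip is absent."""
    try:
        return metadata.version("pip")
    except metadata.PackageNotFoundError:
        # On Ubuntu's system Python this usually means python3-pip is missing.
        return None


if __name__ == "__main__":
    version = pip_version()
    print(f"pip {version}" if version else "pip not found: sudo apt install python3-pip")
```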

Set Up a Virtual Environment:

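Virtual environments are crucial for managing project-specific dependencies. Python’s built-in venv module covers most needs; on Ubuntu, install it with sudo apt install python3-venv. If you prefer the third-party virtualenv tool, install it through apt (sudo apt install python3-virtualenv) rather than with pip install virtualenv, because recent Ubuntu releases block system-wide pip installs.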
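Whichever tool creates it, an activated environment changes what python and pip point to. The sketch below, assuming environments made by venv or by virtualenv 20 and later, detects this from inside Python: in such an environment sys.prefix points at the environment folder while sys.base_prefix still points at the base installation. in_virtual_environment is a hypothetical helper name; the sys.real_prefix check covers older virtualenv releases.

```python
import sys


def in_virtual_environment():
    """Return True when this interpreter is running inside a virtual environment."""
    # venv and virtualenv 20+ point sys.prefix at the environment while
    # sys.base_prefix keeps pointing at the base installation.
    if sys.prefix != sys.base_prefix:
        return True
    # Releases of virtualenv older than 20 set sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Inside an environment at {sys.prefix}")
    else:
        print("Using the base interpreter; activate an environment first")
```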

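Create a new environment for each project, for example with python3 -m venv .venv inside the project folder, then activate it with source .venv/bin/activate. One environment per project keeps your workspace clean and stops different projects’ dependency versions from clashing.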
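The same per-project setup can be scripted with the standard library’s venv.create function. This is a minimal sketch: the helper create_project_env and the .venv folder name are illustrative assumptions, and with_pip=True relies on ensurepip, which Ubuntu ships in the python3-venv package.

```python
import sys
import venv
from pathlib import Path


def create_project_env(project_dir, env_name=".venv", with_pip=True):
    """Create an isolated environment inside project_dir; return its python path.

    The ".venv" folder name is a common convention, not a requirement.
    """
    env_dir = Path(project_dir) / env_name
    venv.create(
        env_dir,
        # Symlink the interpreter on Linux/macOS, as `python3 -m venv` does there.
        symlinks=(sys.platform != "win32"),
        # Bootstraps pip via ensurepip; on Ubuntu that needs python3-venv.
        with_pip=with_pip,
    )
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Calling create_project_env("my-project") would create my-project/.venv and return the path of its python executable; you still activate the environment in your shell with source my-project/.venv/bin/activate.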

Install Data Science Libraries and Jupyter Notebook:

With the project’s environment activated, install the essential data science libraries (NumPy, pandas, Matplotlib, and scikit-learn) with pip: pip install numpy pandas matplotlib scikit-learn.
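A library’s pip name and its import name do not always match: scikit-learn is installed as scikit-learn but imported as sklearn. The sketch below is a minimal standard-library check (Python 3.8+) that reports which of the four libraries are installed in the active environment; DATA_STACK and installed_versions are hypothetical names.

```python
from importlib import metadata

# PyPI distribution name -> module name used with `import`.
DATA_STACK = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",  # installed as scikit-learn, imported as sklearn
}


def installed_versions(packages=DATA_STACK):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for dist_name in packages:
        try:
            versions[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            versions[dist_name] = None
    return versions


if __name__ == "__main__":
    for dist_name, version in installed_versions().items():
        module = DATA_STACK[dist_name]
        status = version or f"missing (pip install {dist_name})"
        print(f"{dist_name} (import {module}): {status}")
```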

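Jupyter notebooks are a staple in data science. Install JupyterLab, Jupyter’s browser-based notebook interface, via pip: pip install jupyterlab. Launch it with jupyter lab, which opens the interface in your web browser.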
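If jupyter lab starts from a different installation than you expect, check which jupyter command your shell actually finds. This minimal sketch (Python 3.9+ for Path.is_relative_to; jupyter_location is a hypothetical helper) reports the launcher’s path and whether it sits inside the active environment. A pyenv shim lives outside any environment, so expect False in that case even when the shim delegates correctly.

```python
import shutil
import sys
from pathlib import Path


def jupyter_location():
    """Return (path of the `jupyter` launcher or None, whether it is in sys.prefix)."""
    found = shutil.which("jupyter")
    if found is None:
        return None, False
    # A launcher installed into the active environment lives in its
    # bin/ (Scripts\ on Windows) folder, i.e. somewhere under sys.prefix.
    inside = Path(found).resolve().is_relative_to(Path(sys.prefix).resolve())
    return found, inside


if __name__ == "__main__":
    path, inside = jupyter_location()
    if path is None:
        print("jupyter not found on PATH; run: pip install jupyterlab")
    else:
        print(f"jupyter at {path} (inside active environment: {inside})")
```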