Phase 2: Setting Up Your Learning Environment

2.3 Software Installation - Setting up Windows

This guide is part of a larger roadmap to data engineering; refer back to the roadmap for context.

Set Up Windows


Setting up your Windows machine for data engineering or data science is like gearing up for a deep-sea dive: you need the right equipment to explore the depths! Unfortunately, out of the box, Windows is the least well-equipped of the major operating systems for data work, because most data tooling targets Linux first. No worries, explorer! Once everything is set up, it’s smooth sailing.


Let’s dive into some Windows-specific instructions and configurations to ensure your machine is ready for the adventure:


1. Windows Update:

First things first, make sure Windows is up to date. It’s like checking the weather before setting sail. On Windows 11, go to ‘Settings’ > ‘Windows Update’ (on Windows 10, ‘Settings’ > ‘Update & Security’ > ‘Windows Update’) and click ‘Check for updates’.


We highly recommend Windows 11 Pro. Some features that matter for data work, such as Hyper-V, are only available on the Pro, Enterprise, and Education editions; the Home edition does not include Hyper-V.
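If you would rather confirm your version and edition from a script than dig through Settings, here is a minimal sketch using Python’s standard platform module. It rests on two assumptions worth verifying on your own machine: Windows 11 still reports itself as version 10.0 with a build number of 22000 or higher, and Home editions report an edition ID beginning with ‘Core’ (Pro reports ‘Professional’). The helper names are our own.

```python
import platform
from typing import Optional

# Assumption: Windows 11 still reports "10.0.<build>"; builds >= 22000 are Windows 11.
FIRST_WINDOWS_11_BUILD = 22000


def build_number(version: str) -> int:
    """Return the build from a version string such as '10.0.22631'."""
    return int(version.split(".")[2])


def is_windows_11(version: str) -> bool:
    return build_number(version) >= FIRST_WINDOWS_11_BUILD


def edition_has_hyperv(edition: Optional[str]) -> bool:
    """Assumption: Home editions report an EditionID starting with 'Core'."""
    return edition is not None and not edition.startswith("Core")


if __name__ == "__main__":
    if platform.system() == "Windows":
        version = platform.version()        # e.g. '10.0.22631'
        edition = platform.win32_edition()  # e.g. 'Professional' or 'Core'
        print(f"Build {build_number(version)}, Windows 11: {is_windows_11(version)}")
        print(f"Edition {edition}, edition supports Hyper-V: {edition_has_hyperv(edition)}")
    else:
        print("Not running on Windows.")
```

This only reports; installing updates still happens in ‘Windows Update’.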


2. Enable and Configure Hyper-V:

For virtualization needs, like running Docker or virtual machines, Hyper-V is key. It’s like having a remotely operated vehicle (ROV) for deeper exploration.


Enable Hyper-V through ‘Control Panel’ > ‘Programs’ > ‘Turn Windows features on or off’. Check ‘Hyper-V’, click OK, and restart when prompted.


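Note: Hyper-V needs hardware virtualization (Intel VT-x or AMD-V) switched on in your BIOS/UEFI firmware settings. Task Manager’s ‘Performance’ > ‘CPU’ view shows whether ‘Virtualization’ is enabled. If it is not, search for instructions specific to your motherboard or laptop model. Changing this setting is fine; just be careful not to change anything else.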


3. Install Windows Subsystem for Linux (WSL):

WSL is your submarine in the ocean of data engineering.


What is WSL and Why Do You Need It?

WSL stands for Windows Subsystem for Linux. It lets you run a Linux environment directly on Windows, without the overhead of a traditional virtual machine or dual-boot setup. For data engineers and scientists, this is a game changer. Why? Because many data tools are developed for and run best on Linux; Apache Airflow, for example, does not run natively on Windows. With WSL, you can use these tools seamlessly on your Windows machine. It’s like speaking both languages fluently at an international conference!


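On current builds of Windows 10 and Windows 11, the quickest route is to open PowerShell as Administrator and run ‘wsl --install’, which enables the required features and installs Ubuntu by default; new installations use WSL 2. Restart when prompted. For more detailed installation instructions and troubleshooting, Microsoft’s WSL documentation is our treasure chest. Check out the Microsoft WSL Documentation.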
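Once WSL is installed, you can confirm from inside your Linux distribution that you really are in WSL, and which generation, which matters because Docker Desktop builds on WSL 2. The sketch below uses a heuristic, not an official API: WSL kernels mention ‘microsoft’ in their release string, WSL 2 kernels carry a ‘microsoft-standard’ suffix (for example 5.15.153.1-microsoft-standard-WSL2), and WSL 1 reports something like 4.4.0-19041-Microsoft. From Windows itself, ‘wsl -l -v’ lists each distribution with its WSL version.

```python
import platform
from typing import Optional


def wsl_generation(kernel_release: str) -> Optional[int]:
    """Return 2 for WSL 2, 1 for WSL 1, or None outside WSL (heuristic)."""
    release = kernel_release.lower()
    if "microsoft" not in release:
        return None
    # WSL 2 runs a real Linux kernel built by Microsoft ("...-microsoft-standard...").
    return 2 if "microsoft-standard" in release else 1


if __name__ == "__main__":
    release = platform.uname().release
    generation = wsl_generation(release)
    if generation is None:
        print(f"Not inside WSL (kernel {release})")
    else:
        print(f"Inside WSL {generation} (kernel {release})")
```

Run it with python3 inside your Ubuntu (WSL) shell, not from PowerShell.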


4. Set Up a Python Environment:

Python is your diving gear, essential for data work. Download and install the latest Python version from the official website (python.org). During installation, remember to check the box that adds Python to PATH (‘Add python.exe to PATH’ in recent installers).
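After installing, it is worth confirming that the interpreter on PATH is the one you just installed. A common Windows pitfall is the Microsoft Store ‘App execution alias’ stub in the WindowsApps folder, which opens the Store instead of running Python. The sketch below checks for both problems; MIN_VERSION is a placeholder of ours, not a requirement of this guide.

```python
import shutil
import sys
from typing import Optional

# Hypothetical minimum; set it to whatever your projects and libraries require.
MIN_VERSION = (3, 10)


def meets_minimum(version, minimum=MIN_VERSION) -> bool:
    """Compare the (major, minor) part of a version tuple against the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


def is_store_alias(path: Optional[str]) -> bool:
    """True if the path points at the Microsoft Store stub in the WindowsApps folder."""
    return path is not None and "windowsapps" in path.lower()


if __name__ == "__main__":
    found = shutil.which("python")
    print(f"'python' on PATH: {found}")
    if found is None:
        print("Not found: rerun the installer with the PATH option checked.")
    elif is_store_alias(found):
        print("This is the Store alias, not your python.org install; check your PATH order.")
    print(f"This interpreter: {sys.version.split()[0]}, "
          f"meets {MIN_VERSION}: {meets_minimum(sys.version_info)}")
```

If the Store alias wins, you can switch it off under ‘App execution aliases’ in Settings.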


5. Install Data Science Tools:

Equip your mission with tools like Jupyter Notebook and pandas, installed either with Python’s pip installer (for example, ‘python -m pip install jupyter pandas’) or through Anaconda, which you can download from the Anaconda website.
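To check that the tools landed in the interpreter you will actually use, you can ask the standard library’s importlib.metadata (Python 3.8+) for each package’s installed version. The TOOLS list is only an example. If you went the Anaconda route, run the script with Anaconda’s Python, since each conda environment keeps its own packages.

```python
from importlib import metadata

# Example tool list (PyPI distribution names); edit it to match your own stack.
TOOLS = ["jupyter", "pandas"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing(versions):
    """Names whose version is None, sorted for stable output."""
    return sorted(name for name, version in versions.items() if version is None)


if __name__ == "__main__":
    versions = installed_versions(TOOLS)
    for name, version in versions.items():
        print(f"{name:10} {version or 'not installed'}")
    if missing(versions):
        print("Install with: python -m pip install " + " ".join(missing(versions)))
```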


6. Install Git:

Git is your underwater communication system, crucial for version control. Download and install Git from Git for Windows.
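After installing Git for Windows, a quick check confirms that git is on PATH and that your commit identity (user.name and user.email, which Git records on every commit) is configured. The git commands used here are standard; the helper names are ours.

```python
import re
import shutil
import subprocess


def parse_git_version(output: str):
    """Extract (major, minor, patch) from e.g. 'git version 2.45.2.windows.1'."""
    match = re.search(r"git version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised git output: {output!r}")
    return tuple(int(part) for part in match.groups())


def git_global_config(git: str, key: str):
    """Return a global git config value, or None if it is unset."""
    result = subprocess.run([git, "config", "--global", key],
                            capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    git = shutil.which("git")
    if git is None:
        print("git is not on PATH; install Git for Windows first.")
    else:
        out = subprocess.run([git, "--version"], capture_output=True, text=True).stdout
        print("git", ".".join(str(n) for n in parse_git_version(out)))
        for key in ("user.name", "user.email"):
            value = git_global_config(git, key)
            print(f"{key}: {value}" if value else f"{key}: not set")
```

If either value is missing, set it once with ‘git config --global user.name "Your Name"’ and the matching user.email command.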


7. Install Visual Studio Code:

VS Code will be your control panel, a versatile editor for coding. Download it from the VS Code website and install it. Enhance it with extensions like the Python extension or the WSL extension (formerly ‘Remote - WSL’) for integrated Linux development.
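You can also check your extensions from a script. VS Code’s command-line tool, code, supports ‘--list-extensions’ and ‘--install-extension’; the sketch below compares the installed list against the Marketplace IDs we assume these two extensions use (ms-python.python and ms-vscode-remote.remote-wsl). Confirm the IDs on each extension’s Marketplace page before relying on them.

```python
import shutil
import subprocess

# Assumed Marketplace IDs: the Python extension and the WSL extension.
WANTED = {"ms-python.python", "ms-vscode-remote.remote-wsl"}


def missing_extensions(listed: str, wanted=WANTED):
    """Given 'code --list-extensions' output, return the wanted IDs that are absent."""
    installed = {line.strip().lower() for line in listed.splitlines() if line.strip()}
    return sorted(ext for ext in wanted if ext.lower() not in installed)


if __name__ == "__main__":
    # VS Code's Windows launcher is a batch file, code.cmd; fall back to plain 'code'.
    code = shutil.which("code.cmd") or shutil.which("code")
    if code is None:
        print("The 'code' command is not on PATH.")
    else:
        listed = subprocess.run([code, "--list-extensions"],
                                capture_output=True, text=True).stdout
        for ext in missing_extensions(listed):
            print(f"Missing {ext}: run  code --install-extension {ext}")
```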


8. Set Up Docker:

For containerization (think of containers as mini-submarines), install Docker Desktop from the Docker website. Set up WSL 2 first: Docker Desktop recommends its WSL 2 backend, and on Windows Home it is the only option (on Pro you can also use Hyper-V).
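Docker Desktop has to be running before the docker command can reach its engine, which is a frequent source of confusing errors. The sketch below distinguishes ‘not installed’ from ‘installed but unreachable’ by running ‘docker info’, which fails when the engine is not available; the function name and status strings are our own.

```python
import shutil
import subprocess


def docker_status(timeout: float = 30.0) -> str:
    """Return 'missing', 'unreachable', or 'running' for the local Docker engine."""
    docker = shutil.which("docker")
    if docker is None:
        return "missing"
    try:
        # 'docker info' must talk to the engine, so it fails while Docker Desktop is stopped.
        result = subprocess.run([docker, "info"], capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return "unreachable"
    return "running" if result.returncode == 0 else "unreachable"


if __name__ == "__main__":
    status = docker_status()
    print(f"Docker: {status}")
    if status == "unreachable":
        print("Start Docker Desktop and wait until it reports the engine is running.")
```

Once it reports running, ‘docker run hello-world’ is the usual end-to-end smoke test.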


9. Install Windows Terminal:

The Windows Terminal is your dashboard. It comes preinstalled on recent Windows 11 builds; otherwise, install it from the Microsoft Store. Customize it for a better experience: try multiple tabs and the built-in profiles for PowerShell and Ubuntu (WSL).


10. Optimize Performance Settings:

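Ensure your machine is set for high performance. Go to ‘Control Panel’ > ‘System and Security’ > ‘Power Options’ and select ‘High performance’ (it may be hidden under ‘Show additional plans’). If that plan is not listed, as on many laptops, open ‘Settings’ > ‘System’ > ‘Power & battery’ > ‘Power mode’ and choose ‘Best performance’ instead.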
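The same check can be scripted with the built-in powercfg tool: ‘powercfg /list’ prints each plan’s GUID and name, marking the active one with an asterisk, and ‘powercfg /setactive GUID’ switches plans. The sketch below parses that listing by GUID and parenthesised name rather than by the English labels. The High performance GUID is the value we understand Windows uses for the built-in plan; treat it as an assumption and compare it with your own listing.

```python
import re
import subprocess
import sys

# Assumed GUID of the built-in "High performance" plan; verify against 'powercfg /list'.
HIGH_PERFORMANCE = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

# Matches "<guid>  (<name>)" plus an optional trailing "*" that marks the active plan.
SCHEME = re.compile(
    r"([0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})\s+\(([^)]*)\)[ \t]*(\*)?"
)


def parse_schemes(output: str):
    """Turn 'powercfg /list' output into (guid, name, is_active) tuples."""
    return [(m.group(1).lower(), m.group(2), m.group(3) is not None)
            for m in SCHEME.finditer(output)]


def active_scheme(schemes):
    """Return the tuple marked active, or None."""
    return next((s for s in schemes if s[2]), None)


if __name__ == "__main__" and sys.platform == "win32":
    listing = subprocess.run(["powercfg", "/list"], capture_output=True, text=True).stdout
    schemes = parse_schemes(listing)
    active = active_scheme(schemes)
    print("Active plan:", active[1] if active else "unknown")
    if active and active[0] != HIGH_PERFORMANCE:
        if any(guid == HIGH_PERFORMANCE for guid, _, _ in schemes):
            print(f"Switch with: powercfg /setactive {HIGH_PERFORMANCE}")
        else:
            print("High performance is not listed; use the Power mode setting instead.")
```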


11. Regular Maintenance:

Regularly clean up your system, update software, and check for any security issues. It’s like maintaining your diving equipment for optimal performance.
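Part of this routine can be scripted. The sketch below reports free disk space with shutil.disk_usage and, when asked, lists outdated packages for the current interpreter using pip’s ‘list --outdated --format=json’ output. The ‘--updates’ switch is a hypothetical flag of this script, so the network call only happens when you request it.

```python
import json
import os
import shutil
import subprocess
import sys


def free_fraction(path=None) -> float:
    """Fraction of free space on the drive holding path (default: the current drive's root)."""
    usage = shutil.disk_usage(path or os.path.abspath(os.sep))
    return usage.free / usage.total


def parse_outdated(text: str):
    """Turn 'pip list --outdated --format=json' output into (name, current, latest) tuples."""
    return [(p["name"], p["version"], p["latest_version"]) for p in json.loads(text)]


def outdated_packages():
    """Ask pip, for this interpreter, which installed packages have newer releases."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        capture_output=True, text=True, check=True)
    return parse_outdated(result.stdout)


if __name__ == "__main__":
    print(f"Free disk space: {free_fraction():.0%}")
    # Checking for updates needs network access, so it only runs when requested.
    if "--updates" in sys.argv:
        for name, current, latest in outdated_packages():
            print(f"{name}: {current} -> {latest}  (python -m pip install --upgrade {name})")
```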


By following these steps, your Windows machine will be fully equipped and ready to dive into the world of data engineering and data science, prepared to handle the challenges and opportunities that lie in the depths of data exploration and analysis.