Memory Profiling in Python

Data scientists often need to sharpen their tools. If you use Python for analyzing data or running predictive models, here's a tool to help you avoid those dreaded out-of-memory issues that tend to come up with large datasets: memory_profiler. This profiler is designed to measure the memory usage of Python programs, line by line. It's cross-platform and should work on any modern Python version (2.7 and up). To use it, you'll need to install it first (pip is the preferred way).
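
Here's a minimal sketch of how that looks in practice, assuming memory_profiler has been installed with pip install memory_profiler (the function below is just an illustrative example, not from the original):

```python
from memory_profiler import profile

@profile  # prints a line-by-line memory report when the function runs
def build_big_list():
    # Allocate a large list so the memory increment is visible in the report
    data = [i * i for i in range(1000000)]
    total = sum(data)
    del data  # release the list before returning
    return total

if __name__ == "__main__":
    build_big_list()
```

Run the script as usual (python script.py); because @profile is imported explicitly, the report prints when build_big_list returns. Alternatively, running python -m memory_profiler script.py injects the @profile decorator for you, so the explicit import becomes unnecessary.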

Using Jupyter on Remote Servers

As a data scientist, it really helps to have a powerful computer nearby when you need it. But even with an i7 laptop and 16 GB of RAM, you'll sometimes find yourself needing more power. Whether your task is compute-bound or memory-bound, you'll eventually look to the cloud for more resources. Today I'll outline how to be more effective when you have to compute remotely. For the basics, I like to refer folks to this great article on setting up SSH configs.
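
As a sketch of the kind of setup that article covers (the host alias, address, and username below are placeholders I've invented, not values from the article), an ~/.ssh/config entry plus a local port forward makes a remote Jupyter session feel local:

```
# ~/.ssh/config -- "bigbox" is a hypothetical alias for your cloud machine
Host bigbox
    HostName 203.0.113.10              # placeholder address
    User ubuntu                        # placeholder username
    LocalForward 8888 localhost:8888   # tunnel Jupyter's default port

# On the remote machine, start Jupyter without opening a browser:
#   jupyter notebook --no-browser --port=8888
# Then connect from your laptop with `ssh bigbox` and open
# http://localhost:8888 locally.
```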

Getting Up and Running With Python Virtual Environments

Python is a great tool to have available for all sorts of tasks, including data analysis and machine learning. It's a great language to start with if you're a beginner, and there are loads of tutorials out there. So, if you're a neophyte Pythonista, head over to one of those tutorials and come back here later. Beyond the language itself, plenty of great developers have been working on tools that just get the job done, including pandas for wrangling your data (and turning it into something that looks like a spreadsheet) and scikit-learn for running anything from basic statistics to more complex learning algorithms on your data.
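
Before installing those libraries, it's worth isolating them in a virtual environment. Here's a minimal sketch using the standard-library venv module (the .venv directory name is just a common convention, not a requirement):

```
# Create a virtual environment in a .venv directory
python3 -m venv .venv

# Activate it (macOS/Linux; on Windows run .venv\Scripts\activate instead)
source .venv/bin/activate

# Install the libraries mentioned above into the isolated environment
pip install pandas scikit-learn
```

Once activated, pip installs into .venv rather than your system Python, so each project can pin its own dependency versions without conflicts.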

You Probably Need a Database

When I see organizations using and talking about their data, they love to present the tools they use to handle and wrangle it. You've probably heard terms like Hadoop, Spark, Shark, PostgreSQL, MySQL, MongoDB, and (rarely) Excel. (If you haven't, there's a good list to look up on Wikipedia.) Taming data certainly takes good tools, but my argument is that the right tools depend on the scale of your data.
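
To make the scale argument concrete, here's a hedged sketch using Python's built-in sqlite3 module: when your data fits comfortably on one machine, a single-file database often does the job before you need anything on that list (the table and file names are illustrative):

```python
import sqlite3

# A single-file database: no server, no cluster, just the standard library.
conn = sqlite3.connect("measurements.db")  # illustrative filename
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
cur.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.7), ("a", 3.1)],  # toy rows for illustration
)
conn.commit()

# Plain SQL queries -- the same skill transfers to PostgreSQL or MySQL.
for sensor, avg_value in cur.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"
):
    print(sensor, avg_value)

conn.close()
```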