Eigenlayouts

For a long time, I’ve been interested in web technology. In high school, I read Jesse Liberty’s Complete Idiot’s Guide to a Career in Computer Programming and learned about Perl, CGI (Common Gateway Interface), HTML, and other technologies. It wasn’t until I finished a degree in mathematics that I really started learning the basics, namely HTML, CSS, and JavaScript. At that point, folks were just starting to come out of the dark ages of table-based layouts and were experimenting with separating content (HTML) from presentation (CSS) and behavior (JavaScript).

Memory Profiling in Python

Data scientists often need to sharpen their tools. If you use Python for analyzing data or running predictive models, here’s a tool to help you avoid those dreaded out-of-memory errors that tend to come up with large datasets. Enter memory_profiler for Python. This memory profiler was designed to assess the memory usage of Python programs. It’s cross-platform and should work on any modern Python version (2.7 and up). To use it, you’ll need to install it (pip is the preferred way).
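memory_profiler itself is a pip install away. As a dependency-free illustration of the same idea (measuring how much memory a piece of code allocates), here is a minimal sketch using the standard library’s tracemalloc module instead. Note that this is a different tool from memory_profiler: tracemalloc tracks allocations made through Python’s allocator rather than the process’s overall memory. The helper `peak_memory_bytes` and the toy `build_squares` workload are hypothetical names for illustration.

```python
import tracemalloc


# Hypothetical helper (not part of memory_profiler): run a function and
# report the peak number of bytes Python allocated while it was running.
def peak_memory_bytes(func, *args, **kwargs):
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Toy workload standing in for "load a big dataset".
def build_squares(n):
    return [i * i for i in range(n)]


if __name__ == "__main__":
    squares, peak = peak_memory_bytes(build_squares, 1_000_000)
    print(f"{len(squares)} items, peak {peak / 1024 / 1024:.1f} MiB")
```

Wrapping the call this way makes it easy to compare two versions of the same step (say, a list versus a generator) before a full dataset pushes you out of memory.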

Using Jupyter on Remote Servers

As a data scientist, it really helps to have a powerful computer nearby when you need it. Even with an i7 laptop with 16GB of RAM, you’ll sometimes find yourself needing more power. Whether your task is compute-bound or memory-bound, you’ll eventually find yourself looking to the cloud for more resources. Today I’ll outline how to work more effectively when you have to compute remotely. I like to refer folks to this great article on setting up SSH configs.
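One common way to use Jupyter on a remote machine is to start it without a browser on the server and forward its port to your laptop over SSH. The hypothetical helper below just assembles those two commands so the moving parts are visible; the host name, user, and port are placeholders, not values from the post.

```python
import shlex


# Hypothetical helper: build the server-side Jupyter command and the
# laptop-side SSH tunnel command for the same port.
def remote_jupyter_commands(host, port=8888, user=None):
    # If `host` is a Host alias from ~/.ssh/config, leave user=None.
    target = f"{user}@{host}" if user else host
    # On the server: start Jupyter without trying to open a browser there.
    remote_cmd = ["jupyter", "notebook", "--no-browser", f"--port={port}"]
    # On the laptop: forward local `port` to the server's localhost:`port`.
    # -N means "don't run a remote command, just hold the tunnel open".
    tunnel_cmd = ["ssh", "-N", "-L", f"{port}:localhost:{port}", target]
    return remote_cmd, tunnel_cmd


if __name__ == "__main__":
    remote, tunnel = remote_jupyter_commands("my-gpu-box", user="alice")
    print("on the server:", shlex.join(remote))
    print("on your laptop:", shlex.join(tunnel))
```

With the tunnel open, browsing to http://localhost:8888 on your laptop reaches the notebook server running remotely. This is where a good SSH config pays off: a short Host alias replaces the user@host pair.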

Bloom Filters in Practice

Today we’re going to talk about what a Bloom filter is and discuss some of its applications in data science. In a later post, we’ll build a simple implementation with the goal of learning more about how Bloom filters work. What is a Bloom filter? A Bloom filter is a probabilistic data structure. Let’s break that term down. Any time you hear the word “probabilistic,” the first thing that should come to mind is “error.”
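To make the “probabilistic” part concrete, here is a minimal sketch of a Bloom filter (not the implementation from the later post): a bit array of m bits, with each item mapped to k bit positions. Deriving the k positions by salting SHA-256 with the hash index is an illustrative choice, as are the default sizes.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: num_bits bits, num_hashes hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Pack the bit array into bytes, rounding up.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by prefixing the item with the hash index
        # and taking 8 bytes of the SHA-256 digest modulo num_bits.
        data = str(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present" (could be a false positive);
        # False means "definitely never added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

That asymmetry is where the error lives: an item you added always tests positive, but an item you never added can also test positive if all of its bits were set by other items. With n items inserted, the false-positive rate is roughly (1 - e^(-kn/m))^k, which for the defaults above and 100 items works out to about 1.6%.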

Getting Up and Running With Python Virtual Environments

Python is a great tool to have available for all sorts of tasks, including data analysis and machine learning. It’s a great language to start with if you’re a beginner, and there are loads of tutorials out there, so if you’re a neophyte Pythonista, head over to one of those and come back here later. Additionally, plenty of great developers have been building tools that just get the job done, including pandas for wrangling your data (and turning it into something that looks like a spreadsheet) and scikit-learn for running anything from basic statistics to more complex learning algorithms on your data.
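Before installing pandas or scikit-learn, it helps to give each project its own isolated environment. The usual command-line route is `python -m venv <dir>`; the sketch below does the same thing from Python with the standard library’s venv module. The function name `create_env` is a hypothetical choice for illustration.

```python
import sys
import venv
from pathlib import Path


# Hypothetical helper: create a virtual environment and return the path
# to the Python interpreter inside it.
def create_env(env_dir, with_pip=True):
    env_path = Path(env_dir)
    # Equivalent in spirit to `python -m venv env_dir`.
    venv.create(env_path, with_pip=with_pip)
    # Windows puts executables in Scripts\, other platforms in bin/.
    if sys.platform == "win32":
        return env_path / "Scripts" / "python.exe"
    return env_path / "bin" / "python"
```

Calling `create_env(".venv")` gives you an interpreter whose installs stay inside `.venv`; running `subprocess.run([str(python), "-m", "pip", "install", "pandas", "scikit-learn"], check=True)` with that path installs the libraries into the environment rather than into your system Python.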