Beginner data analysts and data scientists often wonder how many and which tools they should learn to do data analysis.
In my experience, choosing your tools for analysis is very similar to choosing your tools for painting. There are three approaches you can take:
1. One Tool to Rule Them All
2. Four Tools (aka The Bob Ross Approach)
3. Specialized Tools for Specialized Projects
1. One Tool to Rule Them All
I have a few paintings hanging up at my house that I’ve painted at my neighborhood art studio. The fact that they’re hanging up means that I’ve been pretty proud of the results! The number of brushes that I used? One mid-sized round brush.
Professionally, I’ve worked on a lot of data analysis projects throughout my 15-year data career. For roughly half of them, I’ve used just one tool — Microsoft Excel.
While one paint brush or one data tool may not seem like enough to do extensive painting or analysis, the reality is that sometimes you don’t need all the bells and whistles. You just need to get the job done well, and that’s what a tool like Excel can help you do.
If you’re just starting out in data analytics, the one tool you should learn is Excel. It will allow you to manipulate, aggregate and visualize data to answer the majority of the data questions that you come across.
2. Four Tools (aka The Bob Ross Approach)
Did you know that Bob Ross used just four tools — three paint brushes and a palette knife — to create the majority of the paintings on his show?
Personally, there are four tools that I use for pretty much every data science project that I work on:
1. Excel: I usually start by importing my data into a spreadsheet and quickly viewing, filtering and sorting the data to come up with quick insights
2. SQL: If I need to gather more data that’s sitting in a company’s database, then I use SQL to query the database and extract the data that I need
3. Python: I use Python for several things:
If I need to gather more data that’s on a webpage or through an API, then I write a Python script to do so
If I need to clean or analyze millions of rows of data, then I use Python to automate the tasks
If I’m applying any machine learning algorithms, then I’ll use Python to do so
4. Data Viz Tool: Tableau is my interactive data visualization tool of choice, but there are many to choose from. I like to create visualizations not just at the end of my analysis, but also as I’m exploring a data set to visually find trends and anomalies in the data
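To make the SQL and Python steps above a bit more concrete, here is a minimal sketch that uses only Python's standard library. The `orders` table, its columns, and the sample rows are all invented for illustration, and an in-memory SQLite database stands in for a company database. A real project would likely use a different database and a library such as pandas.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: an in-memory SQLite database stands in
# for a company database. Table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 120.0), ("west ", 80.0), ("East", None), ("West", 40.0)],
)

# SQL step: extract only the rows needed for the analysis.
rows = conn.execute(
    "SELECT region, amount FROM orders WHERE amount IS NOT NULL"
).fetchall()
conn.close()

# Python step: clean inconsistent labels, then aggregate by region.
totals = defaultdict(float)
for region, amount in rows:
    cleaned = region.strip().title()  # "west " becomes "West"
    totals[cleaned] += amount

print(dict(totals))
```

The same filter-clean-aggregate pattern is what you would do by hand in a spreadsheet. The script just makes it repeatable when the data grows.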
After a decade as a data scientist, this foundational tool set hasn’t changed much and with the knowledge of these four tools, I feel empowered to tackle pretty much any analysis project.
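There are alternatives to each tool I’ve mentioned, so I want to generalize the tool list as well. Python covers two roles for me (automating tasks and applying machine learning), so the generalized list has five entries: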
1. Spreadsheet software: Excel is the most popular, but Google Sheets and others have similar functionality
2. Database language: SQL is the language you need to know to work with relational databases, and it comes in many flavors, such as MySQL and Oracle, as well as cloud-based options like Snowflake and BigQuery
3. General purpose programming language: a coding language that allows you to automate tasks, such as Python, Java and others
4. Machine learning tool: a tool that allows you to apply algorithms, such as Python and R (open source), or SAS and MATLAB (proprietary)
5. Data visualization tool: while some of these other tools have data viz capabilities, tools like Tableau and Power BI allow you to easily create beautiful, interactive visualizations
With Excel to work with spreadsheets, SQL to work with databases, Python to both automate tasks and apply ML algorithms, and a data viz tool to create interactive visualizations, you can create Bob Ross-level masterpieces… with data. 😉
3. Specialized Tools for Specialized Projects
While for many paintings, you can just use a medium-sized round brush, imagine you’re tasked with painting a fence or painting with watercolors. You would need to slightly modify your techniques and use more specialized tools.
The same thing goes for data analysis. If you happen to be working with streaming data or data that’s structured as a network instead of within tables, you would need more specialized tools. You may have to use Kafka to process streaming data or use a NoSQL database instead of a SQL database to store non-relational data.
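As a small illustration of why network-shaped data calls for different techniques, the sketch below stores relationships as an adjacency list and walks them with a breadth-first search, which is awkward to express over a flat table. The names and connections are invented, and the `reachable` helper is hypothetical. A real project would more likely rely on a graph database or a dedicated library.

```python
from collections import deque

# Hypothetical "who follows whom" network, stored as an adjacency list.
follows = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": ["dee"],
    "dee": [],
}

def reachable(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.append(neighbor)
                queue.append(neighbor)
    return seen[1:]  # exclude the starting node itself

print(reachable(follows, "ana"))
```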
As a data professional, you will often find the number of tools and techniques introduced each year overwhelming. When you look at job descriptions, it can feel like you have to be a pro at each tool listed. But the reality is that no one knows how to use every tool well. No one.
Bob Ross was an expert oil painter, but probably not the best fence painter or watercolor painter. That said, because he had foundational painting skills and many years of experience, I’m sure if he were tasked with painting a fence or creating some watercolor paintings, he would pick it up in no time.
The same thing applies to data tools. While the number of tools and techniques is vast, if you have the foundational skills (Excel + SQL + Python + a data viz tool), you can confidently walk into a company and quickly get up to speed with whatever specialized data tools they use.
The main takeaway is this: don’t feel like you need to learn all the specialized tools. You can quickly learn them on the job as long as you have the foundational data skills.
I really like the Bob Ross quote, “All you need to paint is a few tools, a little instruction, and a vision in your mind”. I think it applies to a lot of areas of life, including data analysis. With the knowledge of just a few tools, you can do some pretty amazing things.
Alice Zhao
Data Science Instructor
Alice Zhao is a seasoned data scientist and author of the book, SQL Pocket Guide, 4th Edition (O'Reilly). She has taught numerous courses in Python, SQL, and R as a data science instructor at Maven Analytics and Metis, and as a co-founder of Best Fit Analytics.