Picture

Data Analytics - Tools of the trade

November 13, 2024 (1mo ago)

At its core, Data Analytics is simple: Make sense of data and turn it into useful insights.

But getting this far needs many steps and each step has different tools. So where does one even start ?

Let’s look at some of the steps and some of the tools they use

  • Data Collection

    • Collecting relevant data from various sources.
    • Tools: SQL, Python (requests, web scraping with BeautifulSoup, Selenium), R (rvest)
  • Data Cleaning/Preparation

    • Preparing and structuring data for analysis by handling missing values, correcting errors, and transforming formats.
    • Tools: Python (Pandas, NumPy), R (dplyr, tidyr), Excel, Alteryx, Apache Spark, SQL, Trifacta
  • Data Analysis

    • Applying statistical and computational methods to interpret data and extract insights.
    • Tools: Python (Pandas, Scikit-Learn, SciPy), R (caret, stats), Excel, SPSS, SAS, Apache Spark
  • Data Visualization

    • Presenting data in visual formats to enhance understanding and communicate findings.
    • Tools: Tableau, Power BI, Python (Matplotlib, Seaborn, Plotly), R (ggplot2), Excel
  • Machine Learning/Modeling

    • Using algorithms to create predictive or prescriptive models based on data.
    • Tools: Python (Scikit-Learn, TensorFlow, Keras), R (caret), Spark MLlib, Databricks, Google Colab
  • Data Storage/Warehousing

    • Storing large volumes of data for quick access and analysis.
    • Tools: Amazon Redshift, Google BigQuery, Snowflake, SQL databases (PostgreSQL, MySQL), Microsoft Azure
  • Business Intelligence (BI)

    • Generating actionable insights and analytics for business decision-making.
    • Tools: Tableau, Power BI, Looker, Qlik, Qlik Sense, Microsoft Excel
  • Reporting and Dashboards

    • Creating automated reports and dashboards to monitor data and key metrics.
    • Tools: Tableau, Power BI, Looker, QlikView, Google Data Studio, Excel
  • ETL (Extract, Transform, Load)

    • Extracting data from sources, transforming it to fit the analysis, and loading it into storage or visualization tools.
    • Tools: Apache NiFi, Talend, Alteryx, Informatica, Airflow, SQL

If you have a specific, well defined requirement in a given environment then you need to pick a specific tool.

But if you are looking to enter the area or improve your skillset, note that a few tools appear in many steps:

  1. Python: Collection, Cleaning, Prep, Analysis, Visualization, Machine learning
  2. SQL: Collection, Cleaning, Prep
  3. Excel: Cleaning, Prep, Analysis, Visualization, BI, Dashboards
  4. Tableau or Power BI: Analysis, Visualization, BI, Dashboard

These 4 tools are a great place to start if you’re trying to expand your skillset.

Here’s some more tips on where to start:

  • If you are a complete newbie, start with Excel
  • If you are already familiar with Excel, Tableau would be a good place to start
  • If you are familiar with programming ( in any language), Data Analysis with Python would be a good place to start
  • If you are mostly interested in Dashboards and Presentation, Tableau or PowerBI would be the right starting point
  • If you are looking for a flexible tool that can help in almost all the steps then Python is good. But remember that Python has a learning curve and you have to master several libraries for each step
  • If you are mostly interested in databases and the technical aspects of data, SQL is a good starting point

All tools overlap and all have unique aspects. But you cannot go wrong with these four.