Data exploration and hypothesis building: how to properly start your data analysis

Before starting an in-depth data analysis, you need to make sure that the data you have is suited to solving your business problem. Data exploration helps you find out whether it is, and our training session on data exploration helps you on your way.

Data exploration is the first step in the data science workflow. It gives you a deeper understanding of the important properties of your data, such as its relevance, volume, completeness and quality. Based on the insights from this exploratory step, you can start preparing the data and, more importantly, formulate the hypotheses that will serve as the basis for further analysis and modelling. In this session, we will present a wide variety of approaches for exploring data and building hypotheses.

Before you move on to the in-depth analysis, data exploration tells you whether your data is suited to the problem you want to study. Among other things, it helps you answer the following questions:

  • Does the data cover all the situations you need to consider? For example, if you want to predict when a machine will fail, does your data include sufficient instances of different machines with different failures?
  • Does the data span a sufficient period of time? For example, if you expect seasonal patterns that depend on the time of year, does your data cover several years?
  • Is the data complete? For example, are all the influencing factors (weather, processed material characteristics, configuration parameters, maintenance history, …) represented in the data?
  • Is the data of sufficient quality? How much data is missing or inaccurate, are there significant outliers, …?
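Several of these questions can be turned into simple automated checks. The sketch below is a minimal illustration in plain Python (standard library only), assuming the data is a list of records (dicts) with hypothetical fields `day`, `machine`, `failure` and `temp`; the helper names are ours, not part of any library. For outliers it uses Tukey's 1.5 × IQR rule, which is only one common heuristic and may not suit every dataset.

```python
from collections import Counter
from datetime import date
from statistics import quantiles


def missing_fraction(records, field):
    """Share of records in which `field` is absent or None."""
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def span_in_years(dates):
    """Approximate period covered by the data, in years."""
    return (max(dates) - min(dates)).days / 365.25


def coverage(records, *fields):
    """Number of records for each combination of the given fields."""
    return Counter(tuple(r.get(f) for f in fields) for r in records)


# Hypothetical example records: one row per machine inspection.
records = [
    {"day": date(2021, 1, 4), "machine": "A", "failure": "bearing", "temp": 61.0},
    {"day": date(2021, 7, 1), "machine": "A", "failure": "none", "temp": 63.0},
    {"day": date(2022, 1, 3), "machine": "B", "failure": "motor", "temp": None},
    {"day": date(2023, 1, 2), "machine": "B", "failure": "bearing", "temp": 140.0},
]

print(span_in_years([r["day"] for r in records]))  # about 2 years: enough for seasonality?
print(missing_fraction(records, "temp"))            # 0.25
print(coverage(records, "machine", "failure"))      # combinations with too few rows?
print(iqr_outliers([60, 61, 62, 63, 64, 65, 66, 140]))  # [140]
```

On a real dataset you would run these checks per field and per machine; a combination that barely appears in the coverage count is a sign that the data may not represent that situation well enough.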

This first analysis of your dataset also yields preliminary insights, such as clearly observable patterns and trends, that can be exploited later on and that help you better understand the problem at hand. Finally, the exploration helps you define a working hypothesis: the reference point you will test in the subsequent in-depth analysis.
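As a small illustration of how an observed pattern becomes a working hypothesis, the sketch below averages a measurement per calendar month, pooled over several years. It assumes hypothetical `(date, value)` pairs, for instance a monthly failure count; the `monthly_profile` helper and the sample values are made up for this example. A clear difference between months does not prove seasonality, but it gives you a concrete hypothesis to test later.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_profile(observations):
    """Average value per calendar month, pooled over all years.

    `observations` is an iterable of (date, value) pairs; months without
    data are simply absent from the result.
    """
    by_month = defaultdict(list)
    for day, value in observations:
        by_month[day.month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Hypothetical observations from two years of data.
obs = [
    (date(2021, 1, 15), 2.0),
    (date(2021, 7, 15), 8.0),
    (date(2022, 1, 15), 4.0),
    (date(2022, 7, 15), 10.0),
]
profile = monthly_profile(obs)
print(profile)  # {1: 3.0, 7: 9.0}
```

Here July averages three times the January value, which suggests the hypothesis "failures are more frequent in summer" for the in-depth analysis to confirm or reject.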

Training session

Do you want to learn more about data exploration and hypothesis building? Then check out our training session ‘the importance of data exploration and hypothesis building’ on 23 March. During this session, we will present data exploration techniques and best practices that help you formulate your first hypotheses, including:

  • An overview of several statistical and visual data exploration techniques to identify data quality issues and check completeness
  • Data selection methods to construct representative datasets
  • Advanced visualisation techniques to discover patterns and structure in the data