Friday 1:55 p.m.–2:25 p.m. in Room 26A/B/C
The importance of exploratory data analysis and data visualization in machine learning
All the data in the world is useless if you cannot understand it. EDA and data visualization are the most crucial yet overlooked stage in analytics process. This is because they give insights on the most relevant features in a particular data set required to build an accurate model. It is often said that the more the data, the better the model but sometimes, this can be counter-productive as more data can be a disadvantage. EDA helps avoid that. EDA is useful for professionals while data visualization is useful for end-users. For end-users: A good sketch is better than a long speech. The value of a machine learning model is not known unless it is used to make data driven decisions. It is therefore necessary for data scientists to master the act of telling a story for their work to stay relevant. This is where data visualization is extremely useful. We must remember that the end-users of the results are not professionals like us but people who know little or nothing about data analysis. For effective communication of our analysis, there is need for a detailed yet simple data visualization because the work of a data scientist is not done if data-driven insights and decisions are not made. For professionals: How do you ensure you are ready to use machine learning algorithms in a project? How do you choose the most suitable algorithms for your data set? How do you define the feature variables that can potentially be used for machine learning? Most data scientists ask these questions. EDA answers these questions explicitly. Also, EDA helps in understanding the data. Understanding the data brings familiarity with the data, giving insights on the best models that fit the data set, the features in the dataset that will be useful for building an accurate machine learning model, making feature engineering an easy process. In this talk, I will give a detailed explanation on what EDA and data visualization are and why they are very helpful in building accurate machine learning models for analytics as well as enhancing productivity and better understanding for clients. I will also discuss the risks of not mastering EDA and data visualization as a data scientist.