The second course in the EmbRaceR series teaches attendees how to perform an initial data overview. Statistical and graphical methods are equally important.
A serious data science project always starts with a data overview. You need to understand in depth how the values are stored and become familiar with their distribution. To gain a good comprehension, you need both graphs and numbers. R is a powerful language for all data science tasks, including data overview. In this session, you will learn about datasets, cases and variables, and the types of variables. You will gain a basic understanding of distributions through introductory statistics for discrete variables and descriptive statistics for continuous variables. The course also covers presenting data with graphs. Finally, you will learn important statistical terms such as sampling, confidence level, and confidence interval.
The first module is an introduction. It discusses the datasets you analyze and introduces cases and variables. Then you will learn how to check a distribution for normality.
In the second module, you will learn how to produce an overview of the two basic types of variables: discrete and continuous.
R has extremely powerful graphing capabilities. This module shows how to draw both basic and enhanced plots.