Data Science with SQL Server (and Azure SQL Database)

This course teaches you how to start doing data science with Microsoft SQL Server (and Azure SQL Database) using the in-database Machine Learning Services (ML Services). During the course, you will learn about all stages of a real-life data science project, starting with business and data understanding, continuing with data overview and data preparation, and then switching to basic statistical analysis, followed by using more advanced algorithms, evaluating, and finally deploying the model in a SQL Server database. You will learn how to use languages that come with SQL Server, from Transact-SQL (T-SQL) to ML Services (in-database) with R and Python. Code examples are written in all three languages mentioned. You will also learn how to select the right algorithm for specific tasks and get a basic understanding of the mathematics behind each algorithm.

Length: 6 Hours and 33 Minutes

$259.00

The course is based on my Data Science with SQL Server Quick Start Guide, Packt, August 2018.

Coach/Instructor/Course Author:

Dejan Sarka, MCT and Data Platform MVP, is an independent trainer and consultant who focuses on the development of database and business intelligence applications. Besides projects, he spends about half of his time on training and mentoring. He is the founder of the Slovenian SQL Server and .NET Users Group. He is the main author or co-author of eighteen books about databases and SQL Server, and he has also developed many courses and seminars for Microsoft, SolidQ, and Pluralsight.

Modules

Module 01: Reviewing querying with T-SQL

This first module is not a comprehensive reference guide to T-SQL; rather, I will focus only on the mighty SELECT statement, the statement you need as soon as your data is located in a SQL Server database. Besides the basic clauses, I will also explain advanced techniques such as window functions, common table expressions, and the APPLY operator.

Lessons

  • Installing SQL Server
  • Data science projects lifecycle
  • Basic SELECT
  • Aggregating data
  • Introducing subqueries and common table expressions
  • Window functions
  • TOP and APPLY
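
As a rough illustration of a common table expression combined with aggregation (not taken from the course material), the sketch below runs a query through Python's built-in sqlite3 module. SQLite stands in for SQL Server here purely so the example is self-contained; the sales table, its columns, and the sample rows are made up. The WITH ... AS (...) form shown is also accepted by T-SQL, but other dialect details differ.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Ana", 10.0), ("Ana", 30.0), ("Bor", 5.0), ("Cene", 20.0), ("Cene", 40.0)],
)

# The CTE computes one aggregated row per customer;
# the outer SELECT then filters and sorts that intermediate result.
query = """
WITH customer_totals AS (
    SELECT customer, SUM(amount) AS total, COUNT(*) AS orders
    FROM sales
    GROUP BY customer
)
SELECT customer, total, orders
FROM customer_totals
WHERE total > 10
ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Naming the aggregated result first keeps the filtering step readable, which is the main reason to reach for a CTE over a nested subquery.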

Module 02: Starting with R

When you talk about statistics, data mining, and machine learning, many people, especially those working in academia, think of R. R is both an engine and the language that the engine executes. You can choose among multiple R environments (engines and development tools); however, there is only one basic R language. Of course, in order to use R, you need to learn how to program in this language. Module two introduces the R language.

Lessons

  • Obtaining R
  • R language basics
  • Expressions and variables
  • Vectors and packages
  • Collections and objects
  • R data frame

Module 03: Introducing Python

Python is a general-purpose, high-level, interpreted language. Because it is a general-purpose language, it is probably even more popular than R or T-SQL. Python was created by Guido van Rossum, with the first release in 1991. SQL Server 2016 introduced support for R and, with it, all of the infrastructure needed to support additional languages. Therefore, it was easy for SQL Server 2017 to add support for Python. This module teaches you how to select a Python environment, get started with basic Python, and understand Python data structures.

Lessons

  • Selecting the Python environment
  • Introducing Python
  • Basic operations
  • Strings and variables
  • Getting matrix operations in Python with numpy
  • The mighty pandas library

Module 04: Initial data overview

Before doing some advanced analyses, you need to understand your data. In this stage, you typically do some preliminary statistical analysis and quite a few data visualizations.

Lessons

  • Datasets, descriptive statistics, and graphs
  • Introductory statistics for discrete variables
  • Centers and spreads for continuous variables
  • Higher population moments
  • Advanced graphs for data overview
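
To make the terms "centers", "spreads", and "higher population moments" concrete, here is a minimal sketch using only Python's standard library. It is not the course's code; the sample values are made up, and the population (not sample) formulas are an assumption chosen for simplicity, so results may differ slightly from tools that default to sample statistics.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

# Centers
mean = statistics.fmean(values)
median = statistics.median(values)

# Spread: population standard deviation (divides by n, not n - 1)
std_dev = statistics.pstdev(values)


def standardized_moment(data, k):
    """Return the k-th population standardized moment of data."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** k for x in data) / len(data)


# Higher moments: skewness (3rd) and excess kurtosis (4th minus 3)
skewness = standardized_moment(values, 3)
kurtosis = standardized_moment(values, 4) - 3.0

print(mean, median, std_dev, skewness, kurtosis)
```

A positive skewness, as in this sample, indicates a longer tail to the right; an excess kurtosis below zero indicates lighter tails than a normal distribution.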

Module 05: The hard part: Data Preparation

Unfortunately, much of the data you get to work with is not immediately useful for a data science project. A major part of the work on such a project is data preparation.

Lessons

  • Handling missing values and outliers
  • Creating numerics from strings
  • Derived variables
  • Discretizing continuous variables
  • Introducing entropy
  • Data smoothing and normalization
  • Data manipulation with dplyr
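
Three of the preparation techniques named in this module can be sketched in a few lines of plain Python. This is an illustrative sketch, not the course's implementation: the function names and sample values are hypothetical, equal-width binning is just one of several discretization schemes, and both numeric helpers assume the input contains at least two distinct values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a discrete variable."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 25, 30, 40, 60]  # made-up values
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 4)
balanced = entropy(["a", "a", "b", "b"])  # evenly split: maximum entropy for 2 states
skewed = entropy(["a", "a", "a", "b"])    # uneven split: lower entropy

print(normalized, bins, balanced, skewed)
```

Entropy is useful when judging a discretization: bins that hold very uneven shares of the cases carry less information than evenly populated ones.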

Module 06: Calculating intermediate-level statistics and graphing the data

In this module, we will start to analyze associations between pairs of variables.

Lessons

  • Associations between continuous variables
  • Dependencies between discrete variables
  • Associations between discrete and continuous variables pairs
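
Two standard measures of pairwise association are the Pearson correlation coefficient for two continuous variables and the chi-squared statistic for two discrete variables. The sketch below computes both from scratch; it is an illustration under the assumption that these are the measures meant, and the function names and sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def chi_squared(table):
    """Chi-squared statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # perfectly linear pair
dependent = chi_squared([[10, 20], [20, 10]])     # counts deviate from independence
independent = chi_squared([[10, 10], [20, 20]])   # counts match independence exactly

print(r, dependent, independent)
```

A chi-squared value of zero means the observed counts equal the counts expected under independence; larger values indicate a stronger dependency, which is then judged against the chi-squared distribution for the table's degrees of freedom.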

Module 07: Basic classification and prediction algorithms and feature selection

After measuring associations between pairs of variables in the previous module, we switch to multivariate analysis. This module introduces some of the most important basic multivariate algorithms. With linear regression, you try to express a continuous target variable as a linear function of one or more input variables. Naïve Bayes is the first classification algorithm introduced in this course, where all input variables and the predictable variable are discrete. Principal component analysis is useful when you want to reduce the number of variables used in further analysis, and exploratory factor analysis tries to find the non-observable latent variables.

Lessons

  • The mother of all predictive algorithms: linear regression
  • Naïve Bayes
  • Principal component analysis and exploratory factor analysis
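
The idea of expressing a continuous target as a linear function of an input can be shown with ordinary least squares for a single input variable. This is a minimal sketch, not the course's code; the function name and the data points are made up, and real projects would typically use a library and more than one input.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # made-up data lying exactly on y = 2x + 1

slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # prediction for a new input x = 5

print(slope, intercept, predicted)
```

Because the sample points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the line that minimizes the sum of squared residuals.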

Module 08: Unsupervised learning

Unsupervised learning is like fishing in the mud; you hope you will catch something, without an explicit plan for what to catch. The most important unsupervised algorithms include all kinds of clustering for finding hidden groups of cases and association rules for market basket analysis.

Lessons

  • K-means clustering
  • Hierarchical clustering
  • Association rules
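
K-means clustering, the first lesson listed, can be sketched with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The version below is a simplified illustration, not the course's implementation. The points and starting centroids are made up, the starting centroids are fixed by hand to keep the run deterministic, and the loop runs a fixed number of iterations without a convergence check.

```python
import math


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])

print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why practical implementations usually try several random starts and keep the best solution.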

Module 09: Supervised learning

Supervised algorithms have a target, or dependent, variable. They try to explain the values of that variable with a formula and the values of the independent variables. This explanation is stored in a model, which you use to predict the target variable value on a new dataset. The dependent variable supervises the development of the model. In a real project, you create many models and then deploy the one that best suits your business needs. Therefore, you need to evaluate the models before deployment.

Lessons

  • Evaluating predictive models
  • Neural network and logistic regression
  • Decision trees
  • K-nearest neighbors and look-alike modeling
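
One common way to evaluate a classification model before deployment is to compare its predictions with the known outcomes on held-out data and derive accuracy, precision, and recall from the confusion matrix counts. The sketch below is an assumption about what such an evaluation can look like, not the course's method; the function name and the label vectors are made up, and it assumes at least one actual and one predicted positive case so that no division by zero occurs.

```python
def evaluate(actual, predicted, positive=1):
    """Return accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]     # made-up true outcomes
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # made-up model predictions

metrics = evaluate(actual, predicted)
print(metrics)
```

Comparing these numbers across candidate models on the same held-out data gives a basis for choosing which one to deploy; which metric matters most depends on the business cost of false positives versus false negatives.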

Module 10: Deploying and using models in a SQL database

In the last module of this course, you will learn how to use the data science models developed with R or with Python inside SQL Server, directly from T-SQL.

Lessons

  • SQL Server and ML integration
  • Installing libraries and using the sys.sp_execute_external_script stored procedure
  • Deploying models and performing native predictions