When it comes to choosing a programming language, people love to debate endlessly.

Should you start with R or Python programming?

Why is Python seemingly preferred over R?

Evidently, both R and Python play a significant role in the life of a data science professional. Both languages are useful and are among the skills most frequently required by top employers. However, each offers certain advantages and disadvantages for data science work, so the right language can be chosen based on the kind of project.

According to Stack Overflow's 2019 survey, Python continues to be the fastest-growing programming language. Python also topped the list of most wanted programming languages at 25.7%, while R stood at 4.9%.

Here’s an interesting article on how to represent a categorical feature with hundreds of levels in an R model.

In this post, we will discuss using an embedding matrix as an alternative to one-hot encoded categorical features in modeling. We usually find references to embedding matrices in natural language processing applications, but they may also be used on tabular data. An embedding matrix replaces the sparse one-hot encoded matrix with an array of vectors, where each vector represents one level of the feature. Using an embedding matrix can greatly reduce the memory needed to handle categorical features.
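To make the idea in that excerpt concrete, here is a minimal sketch in plain Python (not the R code from the linked article). The feature name, the 300 levels, the embedding width of 8, and the helper functions are all made up for illustration. The embedding values are random here, whereas in a real model they would be learned during training.

```python
import random

def one_hot(index, n_levels):
    """Return a one-hot row: all zeros except a 1.0 at position `index`."""
    row = [0.0] * n_levels
    row[index] = 1.0
    return row

def make_embedding_matrix(n_levels, dim, seed=0):
    """Build an n_levels x dim matrix of small random values.
    Assumption: random initialisation stands in for learned weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(n_levels)]

def row_times_matrix(row, matrix):
    """Multiply a row vector by a matrix (used only to check equivalence)."""
    dim = len(matrix[0])
    return [sum(row[k] * matrix[k][j] for k in range(len(row)))
            for j in range(dim)]

# Hypothetical categorical feature with 300 levels.
levels = [f"store_{i}" for i in range(300)]
level_to_index = {name: i for i, name in enumerate(levels)}
embedding = make_embedding_matrix(len(levels), dim=8)

observations = ["store_7", "store_42", "store_7", "store_299"]
indices = [level_to_index[name] for name in observations]

# One-hot: each observation becomes a 300-wide, mostly-zero row.
one_hot_rows = [one_hot(i, len(levels)) for i in indices]

# Embedding: each observation becomes an 8-wide dense row, found by lookup.
embedded_rows = [embedding[i] for i in indices]

one_hot_width = len(one_hot_rows[0])
embedded_width = len(embedded_rows[0])
print(one_hot_width, embedded_width)
```

Looking up row `i` of the embedding matrix gives the same result as multiplying the one-hot row by the matrix, which is why the lookup can stand in for the sparse encoding while storing far fewer numbers per observation.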

My latest MSDN article is now available.

This month, I explore R and the TidyVerse.

Loading Data with readr

The readr package provides a fast and easy way to read rectangular data files, such as .csv files. It can flexibly parse many types of data files, while handling errors robustly. To get started, create a new R language Jupyter Notebook. For details on Jupyter […]

Yes, you read the title correctly.

Keras and deep learning are not just for Pythonic people; R developers can play along, too. Here’s a great article on how to use Keras from R.

This talk introduces you to using Keras from within R, highlighting the packages and supporting tools (and some unique tools) available that make R an excellent option for deep learning.

Here’s an interesting read on the 4 most important big data programming languages: Python, R, Scala, and Java. While debates over programming languages tend to quickly devolve into shouting matches, this article seems quite level-headed.

Programming languages, just like spoken languages, have their own unique structures, formats, and flows. While spoken languages are typically determined by geography, the use of programming languages is determined more by the coder’s preference, IT culture, and business objectives. When it comes to data science, there are four programming […]

In my latest column in MSDN Magazine, I explore R and what makes it a powerful and elegant language for exploring and manipulating data.

A robust developer community has emerged around R, with the most popular repository for R packages being the Comprehensive R Archive Network (CRAN). CRAN has packages that cover everything from Bayesian Accrual Prediction to Spectral Processing for High Resolution Flow Infusion Mass Spectrometry. A complete list of R packages available in CRAN is online at bit.ly/2DGjuEJ. Suffice it to say that R and CRAN provide robust tools for any data science or scientific research project.