Tag Archives: data science

programming languages

Programming languages analysts should learn

1 Star2 Stars3 Stars4 Stars5 Stars (9 votes, average: 4.44 out of 5)
Loading...

Programming is a skill often associated only with Developers and not so much with analysts or data scientists. Why learn to code when you have tools such as Excel with it’s PivotTables, filtering and formulas. On the other hand, when thinking about analytic tools it seems like Tableu, ClickView, SAS or IBM SPSS is the way to go.programming languagesThese tools are great when working with Big Data. But most of us work with data sets counting not more than a few hundred thousand records. So why use a sledgehammer to crack a nut? Programming languages such as VBA, SQL or R let you squeeze much more of Excel and traditional tools .

VBA (Visual Basic for Applications) (MS Office)

USEFULNESS:
EASY TO LEARN:

Excel is the entry-level tool for most analysts and data scientists, allowing for easy data crunching tasks. In most cases formulas and PivotTables suffice in everyday analytics tasks. However, VBA extends Excels capabilities almost without limits – allowing for creating complex algorithms, extraction of data from various data sources, web scraping and load others. VBA owns it popularity to Excel beings its programming and testing environment at the same time. Write a piece of code and see it run in seconds.
If you want to progress as an analyst / data scientist – learn VBA.

One other perk to learning Visual Basic for Applications (VBA) is that the syntax is identical to VBscript. If you are not familiar – VBscript is a Windows scripting language working alongside Windows Batch Scripting (used in BAT files). VBscript can be used to write Windows scripts that don’t need Excel or MS Office installed on the workstation. VBA code can often be copied identically and used to create VBscript (with some differences like VBscript cannot include data types e.g. “Dim str as String” would raise an exception).

Example uses of VBA

programming languages vbaWhen VBA can prove useful?

Well there are ample examples to consider below. One of the most often proven ways to start is by using the Record Macro feature on the Developer tab.

  • Automate mundane Excel tasks e.g. formatting, editing etc.
  • Creating new custom Excel functions (UDF)
  • Implementing algorithms or complex formulas e.g. complex computations
  • Extending capabilities e.g. working with files, connecting with Internet resources
  • Creating Excel-based apps with User Forms and interaction

Code example

Here is a simple VBA script to show a simple message box with Hello World!

Sub HelloWorldProcedure()
  MsgBox "Hello World!"
End Sub

Where to start?

SQL (Structured Query Language) (RDBMS)

USEFULNESS:
EASY TO LEARN:

SQL lets you unleash the potential of database development. While Visual Basic (VBA is an implementation of Visual Basic) is a general-purpose scripting programming language, SQL is a special-purpose programming language- aimed at running queries and CRUD (Create, Read, Update, Delete) operations on relational databases such as MySQL, MS SQL, Oracle, MS Access etc. SQL is ubiquitous in the data world.

What is often not appreciated about SQL is that it supported by MS Office. Microsoft developed a database engine (JET.OLEDB and the newer ACE.OLEDB), that supports SQL, and can be used to query several Microsoft products such as MS Excel and MS Access (but also other formats like text files). Why is this important? SQL let’s you carry out many operations much easier and quicker e.g. merging Excel worksheets, filtering Excel worksheets on multiple complex conditions, linking Excel Workbooks with other Workbooks or Access databases. If you wanna learn more check out my Excel SQL Add-In

When will you want to reach out for SQL? Whenever you are analyzing/processing large data sets. Excel will typically start falling apart when processing 1000k record databases with the use of formulas, PivotTables etc.

Feel free to check-out this SQL Quick start learning video for Beginners:

Example uses of SQL

programming languages sqlSQL currently rules the world of data, even though NoSQL databases have enriched the database mix. Although many have announced the end of SQL times to be near, recent developments in approaches to programming such as micro-services and the overall ease of use and integrity of SQL database seem to negate these forecasts. Most datasets sit on SQL databases (with MySQL in the lead) so as an analyst sooner or later you will benefit from knowing how to query these datasets directly looking for insights and when building reports.

But does SQL come in handy when you are not dealing with large datasets or structured relational databases? Sure both in Access and Excel SQL can and should be used when dealing with data tables. Here are some of the main aspects SQL is valued for:

  • CRUD: SQL enables Create, Update, Update, Delete operations on relational databases (relations of tabular data sets)
  • Queries: SQL allows you to create queries based on multiple tables of data
  • Performance: SQL is fast and efficient for processing databases including large amounts of data

Code example

List all people at the age above 30, and with red eyes from a hypothetical People table :

SELECT * FROM People
WHERE Age > 30 AND EyeColor = "Red"

Where to start?

R programming language

USEFULNESS:
EASY TO LEARN:

R is a powerful programming languages for statistical computing and graphics. It’s one of the top programming languages used by data scientists (along with e.g. Python) out there (at least on Kaggle as I am aware). R has a moderately steep learning curve as its not a typical programming language, and was historically created by statisticians not developers, although it certainly is worth the effort. R focuses on operations on data frames (tables) and its whole design is oriented around this. What is more important in the case of R is its vast CRAN library of packages that provide additional statistical, graphics and machine-learning capabilities. R is also the go to language when processing so called Big Data. It make slicing and dicing data easy and pleasant.

Another nice thing about R is that it comes with a software environment (RStudio) similarly like you have the Visual Basic environment packed together with MS Office applications. This makes starting to work with R much easier, especially for beginners.

Here is a nice zero-to-hero learning series on R:

Example uses of R

Typically R is considered a great tool for the below. However, keep in mind that you can almost extend R with any capability by downloading the right package for R’s CRAN library e.g. Web Scraping etc.

  • Data exploration e.g. plotting statistical data relationships
  • Statistical data analysis
  • Data cleansing – correcting, formatting series of data e.g. CSV and text files
  • Machine-learning algorithms

Code Example

A simple function that introduces itself:

hello <- function( name ) {
  sprintf( "Hello, %s", name );
}
hello("World")

Just how easy it is to load CSV/text file data with headers:

data <- read.csv("data.csv", header = TRUE)
x <- c(1,2,3,4,5,6)   # Vector of numbers
y <- x^2  #Result: c(1, 4, 9, 16, 25, 36)

Where to start?

Here are some useful links to get you started with the R programming language:

Other programming languages worth mentioning

Now for some other honorable mentions worth exploring:

Python

USEFULNESS:
EASY TO LEARN:

Python is one of the most popular script programming languages out there currently. Especially valued for its ease of use and for being applicable to many environments and scenarios. Python is great for writing quick scripts to process files, data or automate easy tasks. It doesn’t have the overhead of object oriented programming languages and is very flexible. I personally use Python most often for Web Scraping or for automating Linux/UNIX tasks.

If you already know VBA and Excel is your primary working environment, Python might be of less value to you.

C#

USEFULNESS:
EASY TO LEARN:

C# is Microsoft’s flagship programming language and a rival to the all-popular Java. C# is a typical Object Oriented programming language and it’s primary software development environment is Microsoft’s Visual Studio. C# is very useful when making efficient and optimal solutions. Visual Studio is a big asset of C# as programming in VS is pleasant and easy. When to use C#? When you need a Graphical User Interface to put a face on your solution and when you want to “click out” most of your code instead of writing it.