Tag Archives: sorting

distinct list

Dynamic Distinct Column in Excel using Array Formulas


I often see users struggle with managing lists of values and turning them into unique/distinct lists in which no value repeats. There are many ways to tackle this problem, and you may be surprised that there is even a formula to handle this task. Let’s dive into this subject.

What we have (unsorted list) Vs. What we want (distinct list of sorted values)

Let’s assume a simple table of data, where column A is full of repeating Names and column B is a list of corresponding Values. We will be working with this data set throughout this post. It is a typical example of the issue we often have with repeating rows of data. What we want to do is summarize this table as a list of distinct Names and aggregated Values.

What we have

Name (column A) Value (column B)
Tom 60
Matthew 98
James 19
John 16
Matthew 45
John 26
John 70
James 60

What we want

The result of this operation should look somewhat like this:

Name Value
James 79
John 112
Matthew 143
Tom 60

Now let’s look at 2 approaches to this issue.

Method 1: Distinct column using a PivotTable

The most obvious solution is of course to create a PivotTable from this data set. All we need to do is add the Names as ROWS and Values as VALUES to the Pivot to get a simple summary with the exact data we need. See below:

Pivot Table to aggregate Values

Method 2: Distinct column using Array Formulas

Putting aside the obvious, in some cases Pivots are not the answer, especially if we don’t want to use VBA Macros and want a dynamic table which simply updates itself with the latest Values and Names.

Provide list of distinct Names

The first issue we stumble upon is how to produce a list of distinct Names. At first this may seem impossible to do with an Excel function. Fortunately, Array Formulas come in handy for this task. This site features the elegant solution to this problem, which I will try to explain in detail.

To provide a list of distinct Names we must use the following formula:

=INDEX(A$1:A$9,MATCH(0,COUNTIF(E$1:E1,A$1:A$9),0))

This is the final outcome when we hit CTRL+SHIFT+ENTER to make the formula an Array Formula and drag it down:

Distinct column of Names

Now let’s ponder for a second on how the formula works, as it might not be as straightforward as it seems:

'INDEX(A$1:A$9, ...): gets an item from A$1:A$9 with an index provided by the MATCH function
'MATCH(0, ..., 0): finds the value "0" in the column of values provided by the COUNTIF function
'COUNTIF(E$1:E1, A$1:A$9): returns an array. Counts the items from array A$1:A$9 if they appear
'on the E$1:E1 list
'Notice that this range has only 1 static reference. The first cell (E$1) is the
'anchor, whereas the second cell (E1) will move to include all rows above
'the current one

The exciting thing is how the MATCH function is used above. Usually the MATCH function is provided with a range of cells. This time, however, we are providing it with an array being the result of the COUNTIF function. What the MATCH function does is search for the first item resulting from the COUNTIF that has 0 counts i.e. it has not yet been provided in the list.
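
To make the row-by-row logic concrete outside Excel, here is a minimal Python sketch of the same idea. It is only an illustration, not the spreadsheet formula itself: the function name `next_distinct` and the variable names are made up for this example, and the header cell is left out for brevity.

```python
def next_distinct(names, already_listed):
    """Mimic the INDEX / MATCH(0, ...) / COUNTIF chain for one output row."""
    # COUNTIF step: for every source name, count how often it already
    # appears in the output cells above the current row.
    counts = [already_listed.count(name) for name in names]
    # MATCH(0, ...) step: find the first name with a zero count.
    # If every name is already listed there is no zero (Excel shows #N/A).
    if 0 not in counts:
        return None
    # INDEX step: fetch the name at that position.
    return names[counts.index(0)]


names = ["Tom", "Matthew", "James", "John", "Matthew", "John", "John", "James"]

distinct = []
while True:
    name = next_distinct(names, distinct)
    if name is None:
        break
    distinct.append(name)  # same effect as dragging the formula down one row

print(distinct)  # ['Tom', 'Matthew', 'James', 'John']
```

Each pass through the loop only looks at what has already been written out, which is exactly why the range anchored at the top of the output column has to grow as the formula is dragged down.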

Now the formula will start producing #N/A errors when dragged beyond the number of distinct elements in the Names column. You can correct this by wrapping it with an IFERROR function:

=IFERROR(INDEX(A$1:A$9,MATCH(0,COUNTIF(E$1:E1,A$1:A$9),0)),"")

Sort the list of distinct Names

Now that we know how to provide a list of distinct names, we still need to make sure that the list is sorted alphabetically, otherwise it will be provided in the same order as the items on the initial list. Here again we need to resort to Array Formulas to help us with this task.

Let’s start with explaining how to get the index of each name (concept from Chandoo.org):

Get index of elements in our unsorted distinct list

Notice the code in column E:

=COUNTIF(D$2:D$5,"<"&D2)

What we are doing is counting, for each distinct name, the number of items on the list that come alphabetically before the current Name (D2).

Let’s now consider an Array Formula based on this:

=COUNTIF(D$2:D$5,"<"&D$2:D$5)

If we hit F9 we will see that the Array Formula evaluates to:

{3;2;0;1}

This is the correct sequence of our list of distinct items.

If we combine this with an INDEX-MATCH combo we can iterate through this sequence:

Sorted list of distinct Names

=INDEX(D$2:D$5,MATCH(ROW()-2,COUNTIF(D$2:D$5,"<"&D$2:D$5),0))

'INDEX: gets an item from D$2:D$5 with an index provided by the MATCH function
'MATCH: finds the value of the current ROW-2 (this is simply to sequence through
'the array) in the column of values provided by the COUNTIF function
'COUNTIF: evaluates to {3;2;0;1} - this is the correct sequence of our distinct
'array of Names
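
The counting trick can be checked with a short Python sketch. This is an illustration under assumptions rather than a translation of Excel's behaviour: Python compares strings case-sensitively, whereas COUNTIF does not, and the approach relies on the list being distinct (ties would produce duplicate ranks). The variable names are made up for the example.

```python
# Unsorted distinct list, in order of first appearance (column D)
distinct = ["Tom", "Matthew", "James", "John"]

# COUNTIF step: for each name, count the names that sort strictly before it.
ranks = [sum(other < name for other in distinct) for name in distinct]
print(ranks)  # [3, 2, 0, 1]

# INDEX/MATCH step: output row k (0, 1, 2, ...) takes the name whose rank is k,
# just like matching ROW()-2 against the array of counts.
sorted_names = [distinct[ranks.index(k)] for k in range(len(distinct))]
print(sorted_names)  # ['James', 'John', 'Matthew', 'Tom']
```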


Now to summarize what we have.

  • First, we provided the distinct list of Names in column D
  • Secondly, we provided a separate column which sorts the distinct Names in column D
  • Lastly, if we add a simple SUMIF function as shown below we can sum all values for each distinct Name in the Names column
Final data table
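
As a quick cross-check of the expected totals, the aggregation step can be sketched in Python. The helper `sumif` is a hypothetical stand-in written for this example; it only mimics an exact-match SUMIF.

```python
names = ["Tom", "Matthew", "James", "John", "Matthew", "John", "John", "James"]
values = [60, 98, 19, 16, 45, 26, 70, 60]


def sumif(criteria_range, criterion, sum_range):
    """Sum the entries of sum_range whose matching criteria cell equals criterion."""
    return sum(v for n, v in zip(criteria_range, sum_range) if n == criterion)


# One total per distinct name, listed alphabetically
summary = {name: sumif(names, name, values) for name in sorted(set(names))}
print(summary)  # {'James': 79, 'John': 112, 'Matthew': 143, 'Tom': 60}
```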


Excel Top 10 Features

Top 10 Excel Features


Today: the Top 10 Excel features you need to know. I am a frequent visitor to StackOverflow to see what is troubling the Excel community (and other developer/analyst communities). It is no surprise that many users have issues because they are not aware of some of the most significant features in Excel like PivotTables, Array Formulas, Tables and others. More than once I have seen users reach for VBA macros instead of much easier PivotTables/Charts. Although I value VBA, it should be the solution of last resort compared to the more commonly used “native” Excel features.

I have therefore decided to list the top 10 features often regarded as the most significant and useful to know. So let’s start with the 10 Top Excel Features…

No. 1: PivotTables

You can’t call yourself an advanced Excel user without knowing about PivotTables! There is no other feature in Excel that I use more often and with more success. Almost the first thing I do when analyzing data is pivot the data in multiple ways and analyze the patterns/results. Pivots can process a large amount of data in a short period of time and are optimized in the Excel back-end for performance. This is definitely one of the 10 top Excel features!

PivotTables allow you to transform and analyze data in a structured manner. Just select a range of data (data in columns with headers) and select the ROWS, COLUMNS and VALUES for your Pivot Table! You can also create custom columns (based on formulas), summarize data by groups/rows/columns etc. There is almost no limit to the possibilities.

How to find it in Excel?
Insert->Tables->PivotTable

No. 2: Filtering and sorting data

Filtering and sorting your data is just as useful as using PivotTables. Excel is meant to transform and analyze data and filtering/sorting is one of the key elements. When provided with a table of data you will probably want to sort the data in a descending/ascending manner or filter out rows based on some features (values in certain columns). This is a must-know feature.

How to find it in Excel?
Data->Sort & Filter

No. 3: Excel Tables

If you want your data tables to be neat and structured you need to use Excel Tables. What do you get when using Tables in Excel? A consistent structure and formatting of your entire data table, formulas copied automatically down each column, non-repeating column headers and more. It is always good to resort to Excel Tables, as you will have less work managing your data table and can focus on more interesting work like data transformation/analysis.

How to find it in Excel?
Insert->Tables->Table

No. 4: Conditional formatting

Analyzing/transforming data is important, but it is just as useful to be able to identify variances in a range of values using graphics like colors, bars or icons. Conditional formatting can allow you to notice patterns in data values which might not be obvious when looking at raw numbers.

How to find it in Excel?
Home->Styles->Conditional Formatting

No. 5: Lookup Excel functions

Although these functions are not exactly a separate feature, they are considered to be among the most useful and most often used when analyzing data. I can’t stress enough how often I have stumbled upon articles/posts about these functions. These functions are also frequently used by recruiters for Excel jobs.

  • VLOOKUP – search the first column of a range of cells, and then return a value from any cell on the same row of the range
  • HLOOKUP – as above, but searches the first row of a range of cells and returns a value from any cell in the same column of the range
  • INDEX – returns the value of an element in a table or an array, selected by provided index
  • MATCH – searches for a specified item in a range of cells, and then returns the relative position of that item in the range

How to use these functions?
The VLOOKUP function looks for a value in the first column of the data table and returns the corresponding value from another cell in the same row.

The INDEX and MATCH functions are best used together (see the links section below). Why? They allow you to achieve basically the same result as the LOOKUP functions, however they are more flexible. I encourage you to read the links below.
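
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of exact-match lookups, not of how Excel implements them; the helper names `vlookup`, `match` and `index` and the sample rows are made up for the example.

```python
# Hypothetical sample table: (Id, Name, Value)
table = [
    ("T-01", "Tom", 60),
    ("M-02", "Matthew", 98),
    ("J-03", "James", 19),
]


def vlookup(key, rows, col_index):
    """Exact-match VLOOKUP: search the FIRST column, return column col_index (1-based)."""
    for row in rows:
        if row[0] == key:
            return row[col_index - 1]
    return None  # Excel would show #N/A here


def match(key, values):
    """Exact-match MATCH: 1-based position of key in values."""
    return values.index(key) + 1


def index(values, position):
    """INDEX on a single column: the value at a 1-based position."""
    return values[position - 1]


ids = [row[0] for row in table]
names = [row[1] for row in table]

print(vlookup("M-02", table, 3))          # 98
print(index(ids, match("James", names)))  # J-03
```

The second lookup searches the Name column and returns the Id to its left. VLOOKUP cannot do that, because it always searches the first column of the range.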

No. 6: Array Formulas

Array Formulas are one of the greatest knowledge gaps in the Excel community in my opinion. I so often see questions which could be easily answered if someone had at least made an effort to learn them. Many Excel users fall into the trap of writing a lot of custom VBA just because they are not aware of, or are too lazy to use, a neat Array Formula.

How to use Array Formulas?
Go to the links section for a decent tutorial. However, the process itself is quite simple:

  1. Create a formula using an Excel range e.g. A1:A10
  2. Hit CTRL+SHIFT+ENTER instead of ENTER to confirm it as an Array Formula

So easy and yet so powerful! Let’s jump into a simple example:

Excel Array Formula Example
Let’s assume we have a range of values for a certain period of time. We want to get the maximum value for dates after the 1st of March 2015. We can get this in a single Array Formula! See below.

Excel Array Formula Example

How does it work?

=MAX(IF(B2:B8>DATE(2015,3,1),A2:A8))

'IF keeps only the cells from A2:A8 whose dates in B2:B8 are later than 2015-03-01
'MAX returns the largest of the values that remain

See the logic? You can also multiply/divide/sum and do other cool things with Array Formulas. See the tutorial link below.
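
For readers who think more easily in code, the same filter-then-aggregate idea looks like this in Python. The dates and values below are invented sample data, not the numbers from the screenshot.

```python
from datetime import date

# Hypothetical sample data: one value per date
dates = [
    date(2015, 1, 15), date(2015, 2, 10), date(2015, 3, 1), date(2015, 3, 5),
    date(2015, 3, 20), date(2015, 4, 2), date(2015, 4, 18),
]
values = [120, 95, 300, 80, 150, 110, 60]

cutoff = date(2015, 3, 1)

# Keep only the values dated strictly after the cutoff, then take the maximum.
latest_max = max(v for v, d in zip(values, dates) if d > cutoff)
print(latest_max)  # 150
```

Note that the value 300 is ignored because its date falls exactly on the cutoff rather than after it.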

No. 7: Data Analysis Excel Tools

All tools needed in doing basic data cleansing can be found in the Data Tools section of the Data ribbon. Working on data often? You need to know how and when to use Text to Columns, Data Validation and all the other neat tools.

How to find it in Excel?
Data->Data Tools

No. 8: Naming fields / NameManager

Naming cells/ranges comes in handy when you repeatedly reference certain cells or arrays e.g. USD/EUR currency field, interest rate used in formulas etc. This is a nice and clean way to manage all references to those fields and allows you to easily relocate these cells or ranges.

How to set a Defined Name to an Excel cell/range?

  • Click on the text field in the upper-left with the cell/range reference
  • Type in Defined Name for this cell/range – it must not have whitespaces
  • Hit ENTER

Now when trying to reference the cell/range in a formula simply type the new Defined Name.

Setting a Defined Name to a Cell

How to find it in Excel?
Formulas->Defined Names

No. 9: VBA macros and recording macros

The reason VBA is placed second to last on the 10 Top Excel features list is that, in my opinion, it is so often abused by users who refuse to properly learn the remaining Excel features. VBA macros fill the gap left by all the missing features/functions in Excel. Macros let you program almost anything in Excel, you name it: forms, database connectivity, analytics, web browsing etc. You basically can’t consider yourself an Excel Pro without being able to program macros in VBA. However, it is important to stress that the problem with VBA is that once learned it tends to make analysts lazy: instead of Excel Array Formulas you will see custom macros or hideous UDFs. VBA should be used as a tool of last resort!

Typical applications of VBA

  • Cleansing/filtering/sorting/copying data
  • Custom algorithms (custom analysis of data sets)
  • Custom Excel UDF functions (User Defined Functions)
  • Excel Forms (making custom user forms to input data or create a custom UI)

How to find it in Excel?
Developer->Code->Visual Basic

Recording macros

The other fantastic thing about Excel is that it enables you to record macros, by recording your actions in Excel and translating them into VBA code. In some cases you need not even understand the code to be able to reuse it e.g. applying custom formatting to selected cells. This is definitely a great feature on its own.

How to find it in Excel?
Developer->Code->Record Macro

No. 10: Microsoft Power Add-Ins

Last but not least on the 10 Top Excel features list are the PowerPivot, PowerQuery and PowerMap powerful Microsoft developed Add-Ins for Excel. Harness the power of Big Data, SQL, complex pivots and charts with these fantastic Add-ins! The PowerMap is a relatively new member of the family delivering nice bells and whistles to your Workbooks!

PowerPivot basically pumps Excel with more Analytics features by extending the PivotTable with summarization, cross-tabulation, expanded data capacity, advanced calculations, the ability to import data from multiple sources, and the ability to publish workbooks as interactive web applications.

PowerQuery allows you to easily access external data sources such as files, the Web, databases etc. and to more easily manipulate and cleanse data. PowerQuery enables you to process enormous data sources/tables counting millions of records (more than an Excel Worksheet can contain).

PowerMap (as quoted on the official MS site) is a three-dimensional (3-D) data visualization tool that lets you look at information in new ways. A power map lets you discover insights you might not see in traditional two-dimensional (2-D) tables and charts.

These tools complement one another. If you want to do Business Intelligence in Excel you need to be able to use these Add-Ins. Microsoft seems to have great plans for them and I would anticipate that they will become a “native” part of Excel in future versions of Microsoft Excel.

dynamic row numbers

Dynamic Row Numbers in Excel


Today I want to briefly explain how to correctly and easily number rows in Excel by adding dynamic row numbers using simple formulas. Every neat data table in Excel should have a numbering column in place so that every row can be easily referenced, at least by its item number. One way of numbering rows is to simply input the numbers and drag them down. However, this is static: the numbering will not refresh automatically if you move any rows or add/delete rows. Ok, so let’s dive right into 3 methods for achieving nice and neat dynamic row numbers in Excel.

Method 1 – Dynamic numbering without skipping empty rows

Let’s say we want to dynamically number rows, counting every empty row in between as well, i.e. if there is an empty row between the numbers we want the numbering to account for it and increase the index. I use this approach most often due to its simplicity. See the example below on how this works.

Method 1 – Dynamic numbering w/o skipping rows

The formula is very simple:


A$1 is simply the header of the column to guarantee we start numbering from 1. Easy and neat right?

Method 2 – Dynamic numbering skipping empty rows

This time let’s skip every empty row in between, so that the numbering continues from the last index. This comes in handy when you have sections of data or if you group the rows under different headers but want to retain the right numbering. See the example below:

Method 2 – Dynamic numbering skipping empty rows

Again the formula is dead simple:


We use the static $ marker to make the range always start from the first cell in the column. The formula counts all non-blank rows, so it skips any blanks we leave in between.
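
The difference between the two numbering methods can be illustrated outside Excel with a small Python sketch. It is a rough model under assumptions, not the worksheet formulas: the sample rows are invented, an empty string stands for an empty row, and empty rows are assumed to show no number in Method 2.

```python
rows = ["Apples", "Pears", "", "Plums", "", "Cherries"]  # "" = empty row

# Method 1: the number depends only on the row position, so empty rows
# still consume a number.
method1 = [position for position, _ in enumerate(rows, start=1)]

# Method 2: count the non-blank cells from the first data row down to the
# current row; blank rows get no number and do not advance the count.
method2 = []
filled = 0
for cell in rows:
    if cell != "":
        filled += 1
        method2.append(filled)
    else:
        method2.append(None)

print(method1)  # [1, 2, 3, 4, 5, 6]
print(method2)  # [1, 2, None, 3, None, 4]
```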

Dynamic nested numbering

Now for a bonus – dynamic nested numbering. Sometimes we need to add numbering with nested indices. I certainly encourage using nested numbering as it makes many tables more clear to read and the grouping more obvious. The approach/formula below can easily be reused to support nesting of additional levels. Numbering nested indices manually is often a nightmare if we need to frequently rearrange rows or add/delete some rows in between. Unfortunately, no formula will know for us when to automatically increment the first few nested indices (e.g. 1.2.1 to 1.3.1), but we can automatically increment the last index in the nested numbering index. See below.

Dynamic nested numbering

We need to manually input the first index e.g.

'Cell F2

Then below it we can use an automatic formula that will do the increment for us:

'Cell F3

We can repeat this process with the next indices e.g. 2.1, 3.1 etc. We will always need to type the first one manually but the formula can help renumber the subsequent indices automatically e.g. 2.2,2.3,2.4 etc.
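
The increment step itself is easy to express in code. The Python sketch below shows the string manipulation involved: split off the last component, add one, and glue it back on. The function name `increment_last` is hypothetical and this is not the worksheet formula.

```python
def increment_last(index):
    """Return the nested index with its last component increased by one."""
    # Split "1.2.3" into the prefix "1.2" and the last component "3".
    head, _, last = index.rpartition(".")
    following = str(int(last) + 1)
    # A plain index such as "7" has no prefix to put back.
    return f"{head}.{following}" if head else following


print(increment_last("1.1"))    # 1.2
print(increment_last("2.9"))    # 2.10
print(increment_last("1.2.3"))  # 1.2.4
```

Only the last component changes, which matches the limitation described above: the leading indices still have to be typed by hand whenever a new group starts.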

Hope this helps you with generating those dynamic row numbers!

excel c# multithreading

Multithreading VBA using C#.NET


Multithreading in VBA in Excel is possible! No wait… that can’t be true, right? Almost everyone knows that VBA runs in a single thread. So am I kidding? Not quite! Although this may seem like a workaround, there is actually a way to multithread your VBA algorithms. There is only 1 condition… you must be willing to rewrite them in .NET :). Yeah, now you know the catch. But either way this is not as hard as it seems and can prove really useful. You are probably also curious how efficient native VBA code is vs. C# dll methods called from VBA. Below you will find an answer to both questions: how to introduce multithreading to your VBA code, and how efficient multithreaded C# methods are compared to single-threaded VBA algorithms.

Multithreading VBA: Sorting an array of random values using Quicksort

Quicksort is one of the most popular sorting algorithms. It uses the “Divide and Conquer” strategy to sort an array of values, partitioning the array and applying the QuickSort algorithm to each partition recursively. Because the partitions can be sorted independently of each other, Quicksort is a natural and simple example for multithreading.

Below you can find a simple example of executing a native VBA QuickSort procedure and both a C# QuickSort sequential and parallel method.

Executing C# QuickSort in VBA

Declare Function CreateTestClass Lib "C:\[PATH TO DLL]\TestLib.dll" () As Object
Sub TestQuickSort()
  Dim ar() As Long, testClass As Object
  Dim StartTime As Double, EndTime As Double 'Timer returns seconds since midnight
  Dim qTime As String, qTimeC As String, pqTimeC As String
  Set testClass = CreateTestClass() 'Load DLL for the first time to reduce 1st execution overhead

  ReDim ar(0 To 100000)
  '---VBA sort---
  Call Randomize(100)
  Call RandArray(ar) 'RandArray (not shown here) is assumed to fill ar with random values
  StartTime = Timer
  Call QuickSort(ar)
  EndTime = Timer
  qTime = Format((EndTime - StartTime), "0.00")
  '---C# sequential sort---
  Call Randomize(100)
  Call RandArray(ar)
  StartTime = Timer
  Set testClass = CreateTestClass()
  Call testClass.QuickSortSequential(ar)
  EndTime = Timer
  qTimeC = Format((EndTime - StartTime), "0.00")
  '---C# parallel sort---
  Call Randomize(100)
  Call RandArray(ar)
  StartTime = Timer
  Set testClass = CreateTestClass()
  Call testClass.QuickSortParallel(ar)
  EndTime = Timer
  pqTimeC = Format((EndTime - StartTime), "0.00")

  Call MsgBox("VBA time: " & qTime & ", C# sequential time: " & qTimeC & ", C# parallel time: " & pqTimeC)
End Sub

Simple, right? If you read my post on using C# in Excel VBA, this is probably nothing new to you. However, as you can see, the only setback is really the requirement to reference the C# dll. You can of course add it to your libraries and reference it in other VBA projects using “Tools->References”.

How are the C# methods implemented?

C# TestClass implementation (Unmanaged Exports)

using System;
using System.Runtime.InteropServices;
using RGiesecke.DllExport; // DllExport attribute from the Unmanaged Exports package

namespace TestLib
{
    // COM-visible wrapper so that VBA can call the sort methods on the returned object
    [ComVisible(true), ClassInterface(ClassInterfaceType.AutoDual)]
    public class TestClass
    {
        [return: MarshalAs(UnmanagedType.SafeArray)]
        public int[] QuickSortParallel([MarshalAs(UnmanagedType.SafeArray)]ref int[] ar)
        {
            ParallelSort.QuicksortParallel(ar); // sorts the array in place
            return ar;
        }

        [return: MarshalAs(UnmanagedType.SafeArray)]
        public int[] QuickSortSequential([MarshalAs(UnmanagedType.SafeArray)]ref int[] ar)
        {
            ParallelSort.QuicksortSequential(ar); // sorts the array in place
            return ar;
        }
    }

    static class UnmanagedExports
    {
        // Exported entry point matching the VBA "Declare Function CreateTestClass"
        [DllExport]
        [return: MarshalAs(UnmanagedType.IDispatch)]
        static Object CreateTestClass()
        {
            return new TestClass();
        }
    }
}

To learn how to build C# dlls which can be referenced from VBA projects see my blog post here: EXCEL: Using C# in Excel VBA

Below you can find also the popular C# QuickSort implementation (from Google Summer of code 2008):

C# QuickSort implementation

using System;
using System.Threading.Tasks;

public class ParallelSort
{
    #region Public Static Methods

    public static void QuicksortSequential<T>(T[] arr) where T : IComparable<T>
    {
        QuicksortSequential(arr, 0, arr.Length - 1);
    }

    public static void QuicksortParallel<T>(T[] arr) where T : IComparable<T>
    {
        QuicksortParallel(arr, 0, arr.Length - 1);
    }

    #endregion

    #region Private Static Methods

    private static void QuicksortSequential<T>(T[] arr, int left, int right)
        where T : IComparable<T>
    {
        if (right > left)
        {
            int pivot = Partition(arr, left, right);
            QuicksortSequential(arr, left, pivot - 1);
            QuicksortSequential(arr, pivot + 1, right);
        }
    }

    private static void QuicksortParallel<T>(T[] arr, int left, int right)
        where T : IComparable<T>
    {
        // Partitions smaller than this are sorted sequentially to avoid task overhead
        const int SEQUENTIAL_THRESHOLD = 2048;
        if (right > left)
        {
            if (right - left < SEQUENTIAL_THRESHOLD)
            {
                QuicksortSequential(arr, left, right);
            }
            else
            {
                int pivot = Partition(arr, left, right);
                // Sort both partitions concurrently and wait until both are done
                Parallel.Invoke(new Action[] { delegate {QuicksortParallel(arr, left, pivot - 1);},
                                               delegate {QuicksortParallel(arr, pivot + 1, right);}
                });
            }
        }
    }

    private static void Swap<T>(T[] arr, int i, int j)
    {
        T tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    private static int Partition<T>(T[] arr, int low, int high)
        where T : IComparable<T>
    {
        // Use the middle element as the pivot and park it at the front
        int pivotPos = (high + low) / 2;
        T pivot = arr[pivotPos];
        Swap(arr, low, pivotPos);

        int left = low;
        for (int i = low + 1; i <= high; i++)
        {
            if (arr[i].CompareTo(pivot) < 0)
            {
                left++;
                Swap(arr, i, left);
            }
        }

        // Move the pivot to its final position
        Swap(arr, low, left);
        return left;
    }

    #endregion
}

As you will notice, the above parallel QuickSort implementation uses the C# Parallel.Invoke method, which allows you to easily leverage multithreading in C# without the hassle of having to manage thread joining etc. The C# Parallel class provides the following methods, which make multithreading fairly easy:

  • Parallel.For – for iterating over integers
  • Parallel.ForEach – for iterating over any array of objects
  • Parallel.Invoke – basically invokes any number of actions (threads)
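
The fork-and-wait pattern behind Parallel.Invoke can be sketched in Python with plain threads. This is a structural illustration only, written under stated assumptions: the function names are made up, the threshold is lowered so the small sample actually forks, and in standard CPython the global interpreter lock prevents threads from speeding up CPU-bound work like this, so no performance gain should be expected from it.

```python
import random
import threading

SEQUENTIAL_THRESHOLD = 2048  # plays the same role as the constant in the C# listing


def partition(arr, low, high):
    """Park the middle element at the front, partition, return its final index."""
    pivot_pos = (low + high) // 2
    pivot = arr[pivot_pos]
    arr[low], arr[pivot_pos] = arr[pivot_pos], arr[low]
    left = low
    for i in range(low + 1, high + 1):
        if arr[i] < pivot:
            left += 1
            arr[i], arr[left] = arr[left], arr[i]
    arr[low], arr[left] = arr[left], arr[low]
    return left


def quicksort_sequential(arr, left, right):
    if right > left:
        pivot = partition(arr, left, right)
        quicksort_sequential(arr, left, pivot - 1)
        quicksort_sequential(arr, pivot + 1, right)


def quicksort_parallel(arr, left, right, threshold=SEQUENTIAL_THRESHOLD):
    if right <= left:
        return
    if right - left < threshold:
        quicksort_sequential(arr, left, right)
        return
    pivot = partition(arr, left, right)
    # Fork: one half goes to a new thread, the other stays on this thread.
    worker = threading.Thread(
        target=quicksort_parallel, args=(arr, left, pivot - 1, threshold)
    )
    worker.start()
    quicksort_parallel(arr, pivot + 1, right, threshold)
    worker.join()  # wait for the forked half before returning


original = random.Random(100).sample(range(10000), 1000)  # 1000 distinct values
data = list(original)
quicksort_parallel(data, 0, len(data) - 1, threshold=64)
print(data[:5], data == sorted(original))
```

The two halves never touch the same indices, which is why they can be sorted on different threads without any locking.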

Using the For and ForEach methods is very straightforward, which makes it easy to multithread existing single-threaded algorithms. See a simple Parallel.For example for building a multiplication table below:

Parallel.For(0, maxJ, i =>
{
    for (int j = 0; j < maxJ; j++)
    {
        result[i, j] = i * j;
    }
});

If you would like to see the VBA implementation of QuickSort see below. This implementation was based on this algorithm: Link

VBA QuickSort implementation

Private Sub QuickSort(ByRef values As Variant, Optional ByVal Left As Long, Optional ByVal Right As Long)
  Dim i As Long
  Dim j As Long
  Dim Item1 As Variant

  On Error GoTo Catch
  'An omitted Optional Long arrives as 0, so 0 is treated as "use the array bounds"
  If IsMissing(Left) Or Left = 0 Then Left = LBound(values)
  If IsMissing(Right) Or Right = 0 Then Right = UBound(values)
  i = Left
  j = Right

  'Pivot: the middle element (integer division)
  Item1 = values((Left + Right) \ 2)
  Do While i < j
    Do While values(i) < Item1 And i < Right
      i = i + 1
    Loop
    Do While values(j) > Item1 And j > Left
      j = j - 1
    Loop
    If i < j Then
      Call Swap(values, i, j)
    End If
    If i <= j Then
      i = i + 1
      j = j - 1
    End If
  Loop
  'Recursively sort both partitions
  If j > Left Then Call QuickSort(values, Left, j)
  If i < Right Then Call QuickSort(values, i, Right)
  Exit Sub
Catch:
  MsgBox Err.Description, vbCritical
End Sub
Private Sub Swap(ByRef values As Variant, ByVal i As Long, ByVal j As Long)
  Dim Temp As Variant
  Temp = values(i)
  values(i) = values(j)
  values(j) = Temp
End Sub

QuickSort Performance: Native VBA vs. C# sequential vs. C# multithreading

Now let’s analyze the performance of the various algorithms. The benefits of multithreading become clearly visible when we sort arrays of several sizes using the various algorithms:

Disclaimer: The following performance metrics are not precise as they simply measure the time that elapsed during code execution and do not reflect the actual processor time. What is more the QuickSort implementations are also slightly different which might also somewhat influence performance. The statistics can nevertheless be used to show the relative performance between VBA and C# (referenced from within VBA projects).

QuickSort: Native VBA vs. C# sequential vs. C# multithreading

While it takes almost 15 seconds to sort 2 million elements with the single-threaded VBA QuickSort, the single-threaded C# algorithm requires only slightly more than 1 second and the multithreaded algorithm approx. half a second! For me it was a real surprise that even without multithreading the C# algorithm was so much faster than the native VBA algorithm (10-12x). In the case of the multithreaded algorithm there are of course several performance parameters to consider, like the “SEQUENTIAL_THRESHOLD” parameter.

Conclusions: Multithreading in VBA possible, but not in native VBA

As you can see, multithreading in VBA is possible if you export your VBA algorithms to C# and reference them from inside your VBA projects. As shown above, you can expect significantly improved performance even in single-threaded algorithms and leverage additional performance boosts thanks to multithreading! I know that this may seem a little unsatisfactory, as it’s not actually native VBA multithreading. On the other hand, Excel is not a tool designed for implementing processor-intensive algorithms. What is more, even if VBA multithreading were possible, I hope you noticed that the single-threaded imported C# method was still much faster (10-12x) than the one in VBA.

You may rarely experience such situations in which your VBA algorithms will take more than a couple of seconds to execute. However, for those of you working in analytics – hopefully this will be a useful tool.

Let me know what you think!

Be sure to check out a comparison of all available approaches to VBA multithreading:
EXCEL: Multithreading VBA – VBA vs. VBscript vs. C#.NET.

Here you can download the Excel xlsm file and the C# Project: