Tag Archives: file

Merge CSV files or TXT files in a folder – using Excel or CMD

1 Star2 Stars3 Stars4 Stars5 Stars (3 votes, average: 5.00 out of 5)
Loading...

Often we face the challenge of having to merge csv files or txt files in a folder, into a single file. Excel is the obvious tool for such tasks and today I will show a couple of easy ways for merging multiple files, in a single or even a whole structure of folders, into a single CSV or text file. To merge csv files or other text files it is often best to use Visual Basic for Applications in Excel.

Let’s start with the simplest approach using Windows Command line without having to use Excel.

Merge CSV files using Windows CMD

This approach uses the Windows Command line Copy command.

Open the folder which should contain your CSV or TXT files

Open in Windows Explorer the folder containing CSV or TXT files to be merged. These should be without headers or only the first file should be with headers.
merge CSV files Windows CMD

Open CMD command line within folder

Click on the filepath of the Windows Explorer window and type cmd and hit ENTER.
merge CSV files Windows CMD

Merge the files using COPY command

The CMD Windows command line Window should open. Type the following command and hit ENTER to merge files

copy *.csv merge.csv

merge CSV files Windows CMD
The result will be the newly created merge.csv file with merged data across all CSV files within the directory.

Simply replace *.csv with *.txt to merge text files instead of CSV files

Merge list of csv, txt files

The previous method was very simple and didn’t require the use of Excel or MS Office. The below and the following approaches will provided you with more flexibility when merging files. If you don’t know how to use Macros in Excel read my Tutorial first.

Assuming you want to merge a list of files in a String Array you can use the procedure below. It will merge all provided csv or text files into a single new text file.

Sub MergeFiles(fileNames() As String, newFileName As String, Optional headers As Boolean = True, Optional addNewLine As Boolean = False)
    Dim fileName As Variant, textData As String, fileNo As Integer, result As String, firstHeader As Boolean
    firstHeader = True
    For Each fileName In fileNames
        fileNo = FreeFile
        Open fileName For Input As #fileNo
        textData = Input$(LOF(fileNo), fileNo)
        Close #fileNo
        If headers Then
            result = result & IIf(addNewLine, vbNewLine, "") & IIf(firstHeader, textData, Right(textData, Len(textData) - InStr(textData, vbNewLine)))
            firstHeader = False
        Else
            result = result & IIf(addNewLine, vbNewLine, "") & textData
        End If
    Next fileName
    fileNo = FreeFile
    Open newFileName For Output As #fileNo
    Print #fileNo, result
    Close #fileNo
End Sub

Merge csv, txt files example

See an example below of how to use the MergeFiles procedure:

Dim fileNames(0 To 1) As String
fileNames(0) = "C:\somefolder\test.csv"
fileNames(1) = "C:\somefolder\test1.csv"
    
MergeFiles fileNames, "C:\Merged.csv", True, False

MergeFiles Parameters

fileNames()
Array of Strings representing full file paths to files that are to be merged

newFileName
The name of the new merged file that is to be created

headers
Optional. True by default. This is meant for CSV TXT files (HDR). If True assumes that all files have headers (first row with columns). Only first header will be merged into the new file (newFileName)

addNewLine
Optional. False by default. If True a new line (vbNewLine) character will be added between each merged file

Merge csv, txt files within specified folder

Another case is when you want to merge all csv files within a single folder. This procedure is similar to the previous one with the exception that it runs through all files within a single directory (excluding subdirectories – for that scroll to next procedure). You can also use wildcards such as “*.csv” to be sure that only csv files are merged a not other files – read my post on the VBA Dir function to learn more.

Sub MergeFilesInFolder(folderName As String, newFileName As String, Optional headers As Boolean = True, Optional addNewLine As Boolean = False)
    Dim fileName As Variant, textData As String, fileNo As Integer, result As String, firstHeader As Boolean
    firstHeader = True
    fileName = Dir(folderName)
    Do Until fileName = ""
        fileNo = FreeFile
        Open (Left(folderName, InStrRev(folderName, "\")) & fileName) For Input As #fileNo
        textData = Input$(LOF(fileNo), fileNo)
        Close #fileNo
        If headers Then
            result = result & IIf(addNewLine, vbNewLine, "") & IIf(firstHeader, textData, Right(textData, Len(textData) - InStr(textData, vbNewLine)))
            firstHeader = False
        Else
            result = result & IIf(addNewLine, vbNewLine, "") & textData
        End If
        fileName = Dir
    Loop
    fileNo = FreeFile
    Open newFileName For Output As #fileNo
    Print #fileNo, result
    Close #fileNo
End Sub

Merge csv, txt files in folder example

See an example below of how to use the MergeFilesInFolder procedure:

MergeFilesInFolder "C:\somefolder\*.csv", "C:\MergedFolder.csv", True, False

MergeFilesInFolder Parameters

folderName
A folder including all files to be merged. Wildcards are permitted if supported by the VBA Dir function

newFileName
The name of the new merged file that is to be created

headers
Optional. True by default. This is meant for csv files (HDR). If True assumes that all files have headers (first row with columns). Only first header will be merged into the new file (newFileName)

addNewLine
Optional. False by default. If True a new line (vbNewLine) character will be added between each merged file

Merge csv, txt files within all subfolders

The most complex case is when you want to merge files not only within a certain directory but also within all subdirectories. This will equally work for a scenario when there are no subfolders.

Sub MergeFilesInSubFolders(folderName As String, pattern As String, newFileName As String, Optional headers As Boolean = True, Optional addNewLine As Boolean = False)
    Dim fileName As Variant, folder As Variant, textData As String, fileNo As Integer, result As String, firstHeader As Boolean
    firstHeader = True
    Dim col As Collection
    Set col = New Collection
    col.Add folderName
    TraversePath folderName, col
    For Each folder In col
        fileName = Dir(folder & pattern)
        Do Until fileName = ""
            fileNo = FreeFile
            Open (Left(folder, InStrRev(folder, "\")) & fileName) For Input As #fileNo
            textData = Input$(LOF(fileNo), fileNo)
            Close #fileNo
            If headers Then
                result = result & IIf(addNewLine, vbNewLine, "") & IIf(firstHeader, textData, Right(textData, Len(textData) - InStr(textData, vbNewLine)))
                firstHeader = False
            Else
                result = result & IIf(addNewLine, vbNewLine, "") & textData
            End If
            fileName = Dir
        Loop
    Next folder
    fileNo = FreeFile
    Open newFileName For Output As #fileNo
    Print #fileNo, result
    Close #fileNo
End Sub

Function TraversePath(path As Variant, allDirCollection As Collection)
    Dim currentPath As String, directory As Variant
    Dim dirCollection As Collection
    Set dirCollection = New Collection
     
    currentPath = Dir(path, vbDirectory)
    'Explore current directory
    Do Until currentPath = vbNullString
        If Left(currentPath, 1) <> "." And Left(currentPath, 2) <> ".." And _
            (GetAttr(path & currentPath) And vbDirectory) = vbDirectory Then
            dirCollection.Add path & currentPath & "\"
            allDirCollection.Add path & currentPath & "\"
        End If
        currentPath = Dir()
    Loop
     
    'Explore subsequent directories
    For Each directory In dirCollection
        TraversePath directory, allDirCollection
    Next directory
End Function
Read this post to learn more on using the VBA Dir function to traverse directories and subdirectories

Merge csv, txt files within subfolders example

See an example below of how to use the MergeFilesInSubFoldersprocedure:

MergeFilesInSubFolders "C:\somefolder\", "C:\MergedSubFolders.csv", True, False

MergeFilesInSubFolders Parameters

folderName
A folder with or without subfolders including all files to be merged. Use wildcards with pattern parameter

pattern
If needed a pattern using wildcards permitted by the VBA Dir function

newFileName
The name of the new merged file that is to be created

headers
Optional. True by default. This is meant for csv files (HDR). If True assumes that all files have headers (first row with columns). Only first header will be merged into the new file (newFileName)

addNewLine
Optional. False by default. If True a new line (vbNewLine) character will be added between each merged file

Merge CSV files – filter records

Sometimes we want to download just a subset of records in our CSV files. One way is uploading the data and then filtering it in Excel. But why not do it in one go? See my SQL AddIn or my read CSV file using SQL example in this post here.

VBA Dir function – How to traverse directories?

1 Star2 Stars3 Stars4 Stars5 Stars (4 votes, average: 5.00 out of 5)
Loading...

The VBA Dir function is crucial to exploring any file system with the use of VBA – whether in Excel or other MS Office applications. In some case true you can resort to a Select File Dialog, however, in other cases you need to be able to easily match files or directories in file systems in VBA. Today, let us learn how to use the Dir function.

VBA Dir Syntax

The Excel Dir function returns the first filename (directory) that matches the path and the provided attributes. To retrieve subsequent filenames (directories) that match the path and attributes call the Dir function again without providing ANY arguments.

The VBA Dir function syntax
The VBA Dir function syntax

The Dir function by default DOES NOT TRAVERSE the entire directory structure provided by the given path. It will match filenames (directories) matching the path and attributes within only the given path. Read on to learn how to use Dir to traverse the whole directory structure.

VBA Dir example

Let’s start with a simple example before we jump into more advanced examples. Let’s see how to explore the C:\Root\ directory:

VBA Dir Example Directory
VBA Dir Example Directory

Notice that I highlighted on purpose both calls to the Dir function. The first Dir(path) call will return the first filename within the C:\Root\ directory. Any subsequent Dir() (notice no arguments) calls will return next files under the directory. Also as I have not provided any attributes the Dir function will only match against vbNormal (normal files)!
VBA Dir Example1

Using attributes

Assuming the same directory structure as in the previous example let’s change the default vbNormal attribute to vbDirectory. This will allow us to explore not only file matched under the given path but also the directories.

Notice that now we should be seeing, aside from our file, also directories being listed. The output:
VBA Dir Example 2

Similarly you can use the other VBA Dir attributes to search for hidden, system files, aliases and volumes. Most attributes include normal files so be aware the the files listed WON’T BE LIMITED to just e.g. hidden files.

Using wildcards

The VBA Dir function works great also with wild cards:

  • * – multiple character wildcard, including zero-length
  • ? – single character wildcard

Some common examples:

Traversing directories

The VBA Dir function unfortunately by default does not traverse the entire directory, exploring subfolders. In the examples we were limited to just listing the files and folders of the C:\Root directory:
VBA Dir Directory
A simple recursive VBA Dir implementation won’t do either as the Dir function is reset whenever it is called globally with any arguments. Fortunately, we can use a native VBA object called the VBA Collection. We will stack all subsequent directories onto our Collection and then explore them recursively. The TraversePath function will do just that:

Let’s try it out on our C:\Root directory example:

The TraversePath should not traverse all directories within our Root directory and list any normal files and folders:
VBA Dir Example 3

Working with XML files in VBA (VBA XML)

1 Star2 Stars3 Stars4 Stars5 Stars (13 votes, average: 4.92 out of 5)
Loading...

XML files are one of the most common type of data files apart from text and CSV (comma-separated values) files. Reading data files which are not hierarchical (as XML files or JSON) is relatively easy. You can read in the data row by row and process columns separately. With XML (and JSON) the task is not as easy as the data is hierarchical (parent-child relationships exist between records in the schema) and the number of underlying nodes may vary as opposed to tabular data which usually has a constant number of columns separated with a specific delimiter.

Fortunately, we can use the MSXML2.DOMDocument object in VBA. Let’s however, as always, start with a short introduction as to how XML files a structure before we dive into the examples.

Loading XML document in VBA

The MSXML2.DOMDocument object allows you to easily traverse through an XML structure an extract any XML node and/or attribute needed. Let’s look at the example below.

Below we start by loading the XML document. Notice that I am selecting the load to be performed synchronously and not validation be carried out on parsing the document. Feel free to change these options if needed.

Alternatively load an XML from a string:

That’s it. You have loaded the XML document into memory into the DOMDocument object. The document has been parsed and you can easily traverse the enclosed elements. See next section.

XML DOM nodes in VBA

For the below I will use the following examples XML:

The XML document will provide you with the root of the entire DOM (of type XDoc.DocumentElement). Each DocumentElement (XML DOM node) facilitates the following node references:

Node Reference Type Description
parentNode [XDoc.DocumentElement] The parent node, one node higher in the DOM hierarchy
firstChild [XDoc.DocumentElement] The first child node, first node lower in the DOM hierarchy
lastChild [XDoc.DocumentElement] The last child node, last node lower in the DOM hierarchy
childNodes [Array of type XDoc.DocumentElement] All child nodes of the current node, all nodes lower in the DOM hierarchy
nextSibling [XDoc.DocumentElement] Next sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node
previousSibling [XDoc.DocumentElement] Previous sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node

All the above references allow you to free move within the XML DOM.

ChildNodes

Let’s start by extracting the first list and printing it’s XML and text contents. The basics to moving around the XML DOM is using ChildNodes.

This is the result

Traversing through the whole XML in VBA

Now that we got the basics let’s print out the whole contents of the XML DOM including the basenames (node names).

This is the result:

Easy right? Using the basics above we can easily move around the document. But this still seems like a lot of coding right? Well there is an easier way of moving / extracting items using the DOMDocument object – called XPath.

XML Document example node references

Now that we have a hang of our XML document, based on the example XML I provided above I mapped a reference to how to obtain various elements of our XML file by using node references:

  • DistributionLists [FirstChild]
    • List [ChildNodes(0)]
      • Name: Recon [ChildNodes(0).ChildNodes(0).innerText]
      • TO: John;Bob;Rob;Chris [ChildNodes(0).ChildNodes(1).innerText]
      • CC: Jane;Ashley
      • BCC: Brent
    • (…)

    • List [ChildNodes(1)]
      • Name: Performance Report [ChildNodes(1).ChildNodes(0).innerText]
      • TO: Huck;Ashley
      • CC: Tom;Andrew
      • BCC: John;Seema

XPath in VBA

Instead of traversing the elements/nodes in your XML using the .ChildNodes/.FirstChild/NextChild properties we can also use XPath. XPath is a query language used for selecting XML nodes in an XML document. It is represented by a single string. It allows you to extract any number of nodes (0 or more) which match the specified XPath query.

If you want to learn XPath I can recommend this overview:
https://www.w3schools.com/xml/xpath_syntax.asp

Now let’s jump into an example:

Example 1: Extract all Lists

Example 2: Extracting all TO fields

Example 3: Extracting the first and last Name field

Example 3: Extracting all child List nodes (Name, TO, CC, BCC)

XML Attributes in VBA

Let’s tackle one last example – attributes. Let’s slightly modify the XML above and include an example attribute named attribute.

Using XPath (or traversing the DOM) we can easily extract the attribute as shown below.

Creating XML documents

Creating documents is also quite straight forward in VBA.

Excel XLSB vs XLSX file format. The Pros and Cons of XLSB Files

1 Star2 Stars3 Stars4 Stars5 Stars (8 votes, average: 4.63 out of 5)
Loading...

An XLSB file is a Microsoft Office Excel Binary file. Information inside XLSB file is stored as binary data compared, while in usual XSLX files data is stored as XML files which are then compressed.

Working with large Excel files is often a drag. They open slower, they take an eternity to save and they often need to be uploaded to the Cloud to be shared with your coworkers or family. Why not explore the benefits of the XSLB file format then (sometimes mixed up with XLSB)?

What is an XLSB file format?

This is what is inside an XLSX (or XLSM) file (image on the right). What do you mean inside? – you ask.

Inside an XLSX file
Inside an XLSX file

XLSX and XLSM files are in fact compressed archives with XML files inside. That is because Microsoft has opened the Excel file format and decided to break the insides into XML files. When an XLSX or XLSM file is saved Excel needs to break it down into separate XML files, compress it and finally save it as XLSX or XLSM.

The XLSB file format on the other hand is a binary Excel file. It resembles the old XLS file format which was also a binary file.
What happens when Excel wants to save an XSLB file? Not much really. It mostly dumps the binary data into a binary file and saves it as XLSB.

XLSB advantages (XLSB vs. XLSX)

  • XLSB files are smallerSmaller file size – the Excel binary file uses noticeably less space. This is more noticeable especially when working with large Excel files. In some cases I heard of there being XLSB files that required 10% of the original file size – this is more visible with VERY large files
  • XLSB files save or open fasterOpens/saves more quickly – loading binary data is faster than parsing text (XML) files – similarly as you would compare opening a book in Spanish and having to translate every sentence to English as opposed to picking up a ready translated copy. Similarly, when saving the file – the binary format is more efficient than dumping the data back into the XML and then compressing it. From my experience XLSB files open and get saved 2x faster
  • XLSB files support longer formulasSupports formulas above the 8192 character limit. In other file formats they don’t save properly

Yahoo! Well it would seem there is nothing less obvious to do then to start working only on the XLSB file format. However, it’s not a straightforward decision as there are some minor setbacks. Here are some that come to mind…

XLSB disadvantages (XLSB vs. XLSX)

  • Compatibility – the XLSB Excel format is not supported by Excel 2003 and earlier versions, which frankly is less of a problem nowadays
  • Security (VBA) – with the distinction between the XLSM and XLSX format you know which files may or not contain VBA macros. With XLSB you won’t know for sure. So beware when opening XLSB files from unknown sources or from people/websites you don’t trust
  • You can’t make changes to the Excel Ribbon when working on an XLSB. You must temporarily save your file as XLSX or XLSM, makes changes and save back as XLSB.
  • Lack of interoperability with third-party tools. XLSB is a binary file format unlike the open XML XLSX and XLSM files. Hence you often won’t see your XLSB files working everywhere – like in OpenOffice

Other suggestions when working with large Excel files or datasets

I usually start with the quick wins suggestions and leave the more complex for desert. The XLSB file format is a good start and often won’t require that you meddle with the data/formatting/content of your workbook, rightfully as you shouldn’t have too. However, from time to time there will be those moments when that won’t be enough and you just won’t be able to work any longer with a slow and large Excel file. Here are some useful tips:

  • Reduce the file size by deleting unused cells
  • Save data files without formatting – formatting may account for a lot of storage space and if you are simply working with a dataset and don’t need formatting save your file in .xml format
  • Save data files without formatting – formatting may account for a lot of storage space and if you are simply working with a dataset and don’t need formatting save your file in .xml format or as a .csv
  • Turn automatic calculations off – often even not so large Excel files cause Excel to freeze or crash. That may be because of an abundance of Excel formulas having to recalculate each time a change is made etc. Try to turn Calculation Options to Manual to get rid of this problem
  • PowerQuery Add-In – Excel has its limitations when working with large datasets. The Excel PowerQuery Add-In by Microsoft has been designed to handle Big Data and complex data queries to external databases or datasources. PowerQuery will allow you to work more efficiently with these data sets and will not limit the amount of records you can handle within a single Excel Worksheet

Common myths about XLSB

Do XLSB runs formulas faster than XLSX?

  • Not true

XLSB file are only loaded and unloaded faster (saved and closed) than XLSX files. Afterwards both formats run in RAM memory with similar performance on the same Excel engine. Hence, you won’t see your Excel formula’s running significantly faster. However, if you do save your file frequently you will definitely notice that the XLSB file format saves faster.

Do XLSB files crash more often?

  • Not true

The XLSB file format does not in any way increase the probability of crashing. However, if the file does crash it may be harder to recover. XLSB are binary files, where XLSX and XLSM files are in fact compressed XML files – text files in XML format. Therefore, in a critical situation you have definitely a better chance of reading a text file than a binary file. Then again I wouldn’t worry about this too much.

Can XLSB files be opened only via Excel?

  • True

Binary Excel files are (usually) not supported by other Office Suites (such as Open Office). This is because by introducing the *X formats (docx, xlsx etc.) Microsoft introduced an open file format that allowed these other programs to open as well as edit these files more easily than before. For this reason be aware when working with the binary Excel format when working with collegues who might want to use alternative programs to collaborate with you.

Conclusions

On a daily basis I would recommend sticking to XLSX and XSLM as standard Excel file formats. It is worth reaching out to the XLSB file format whenever you file starts running slow or uses an enormous amount of space.

XLSB will actually not benefit small Excel files and you might even see small XLSB files taking more space than small XLSX/XSLM files. Your clients / coworkers may also have doubts when opening XLSB files as Excel treats these file formats with an extra dose of caution.

Use the XLSB file format MAINLY with very LARGE Excel files.

Writing files in VBA (txt, xml, csv, binary)

1 Star2 Stars3 Stars4 Stars5 Stars (5 votes, average: 4.80 out of 5)
Loading...

write file vbaContinuing on the topic of working with files using VBA. Text/binary files are common ways of storing data as opposed to databases or regular Excel files.

Looking at various resources I missed a single resource which would demonstrate the various methods for PROPERLY writing files in VBA. It is important to remember that you shouldn’t write all fills using the same approach. Be aware of the structure of the file. If you want performance – always select the right approach.

Writing text files in VBA

Approach #1: The Write function (strings in quotes)

Writing strings of data to text files with quotes:

This approach will result in the following output:

Notice the “” quotes. Writing further strings will result in appending further strings wrapped in double quotes which might not be a satisfactory approach. Look down to the Print function to avoid these quotes.

To Append to the text file instead of overwriting replace For Output with For Append in the Open fileName For Output As #fileNo

Approach #2: The Print function (dumping strings without quotes)

Writing strings of data to text files without quotes:

This approach will result in the following output:

This time there are no quotes in the file. In most case you will want to use the Print function instead of the Write function

Writing binary files in VBA

With Binary files often you will be using objects which are not of fixed byte length like Integers. For example you would want to save Strings. In such cases use the VBA Type object data type.

Below a simple example using the Type data type to save an object with an Integer and String.

Writing CSV files in VBA

If you want to save your worksheet as CSV you can always resort to the Text file section. However, this is an ineffective approach. If you want to save your current worksheet as a semicolon delimited file (‘;’) you can use the Workbook.SaveAs function like below.

In case you want to instead copy your worksheet simply replace Move for Copy:

Functions needed to write files in VBA

Function Description
FreeFile Get next free file number available for the Open statement / FileOpen function. Using this function is important especially when operating on multiple files simultaneously.
More info here.
BOF(fileNumber) Returns true if you are at the beginning of the file described by the file number.
More info here.
EOF(fileNumber) Returns true if you have reached the end of the file described by the file number. More info here.
Loc(fileNumber) Returns the current read/write position within an open file. More info here.
LOF(fileNumber) Returns the size in bytes of the file represented by the file number. More info here.

Write file VBA summary

Writing files in VBA is not hard and usually takes just a couple of lines of code. It makes sense however to do it write and be aware of certain traps like not omitting the use of the FreeFile function or knowing the difference between Print and Write.