Reference Columns By Name in Google Sheets Query()

Google Sheets, a part of the largest G Suite, is becoming an essential tool for many people and companies working on an MVP (Minimum Viable Product) and is the first and most approachable tool to automate a new process. A spreadsheet is a critical tool to tune and consolidate a new business process until it stabilizes and reaches a scale where it would make sense to move to a custom system.
In many cases, that means to import one or more CSV files from other systems (eCommerce platform, CRM, etc.) to generate specific reports and trigger certain actions.

Importing CSV files can break our formulas

Importing CSV files into a spreadsheet makes the data from the file accessible to popular formulas like lookup() or the most advanced query(). The latest is a de facto standard for the generation of complex reports inside a Google Sheet. While a spreadsheet is a very powerful tool, it also suffers from all the limitations and vulnerabilities of a spreadsheet. One recurring issue is the fragility of the query() formula. The SQL-like command inside the formula assumes that the CSV structure is static and that each field can be accessed by its column position. This is not always the case, as many of us learned in a hard way when importing the same CSV data. The sources of our data occasionally change the column position of fields by adding or removing additional columns, and every while, they even rename the columns with new labels. While these changes can be justified and presented to us as improvements, they always break our formulas forcing us to revise the entire spreadsheet, updating each field with the new position. It’s a long, cumbersome process that I don’t really enjoy doing. That’s why I designed a system to make my spreadsheets more resilient to this kind of CSV changes.

A better way to import and query CSV files

Let’s describe a very simple Let’s describe a straightforward case of a spreadsheet with 3 tabs:

  • CSV” tab will host the imported data from a CSV file (sample data courtesy of SpatialKey)
  • Report” tab will contain a sample report generated by a query() function working on the CSV data
  • Finally, a “DataMap” tab where we will put a simple table to make our spreadsheet resilient to CSV changes.
The CSV tab with the data from the imported CSV file.
The Report tab shows the results of a query(0) to the data on the CSV tab.

To generate the simple report, I’m using a basic query to extract all the records where Country=’United States’. This is based on the assumption that the “Country” field will always be in column H of the CSV tab.

=query(
    CSV!1:999,
    "SELECT * WHERE H ='United States'", 
    -1)  

Every change that moves the field “Country” to a different column will break my formula.

Step1: Going from Column Letter to Column Position

A solution is to create a map that assigns the column position to a variable (a named range in Google Sheets) and then uses that variable in the query formula. To get there, we first need to change the way we refer to the field position by enclosing the first argument of the query() formula in brackets. This will allow us to refer the columns by column number instead of the column letter. Here is the same formula with the new notation:

=query(
    {CSV!1:999},
    "SELECT * WHERE Col8 ='United States'", 
    -1)  

Step 2: Implementing a Data Map

The next step is to create a map between the column headers of the CSV tab and the Column number. We will put this map in a tab called “DataMap”.

The DataMap tab mapping CSV labels to column positions

The items in the 2nd column are the column labels I plan to use for my queries. I don’t need to map all the headers, only the one I’m gonna use in my formulas.
The items in the first column are calculated by the spreadsheet using the following simple formula. What the formula does, is to search, using the match() formula, for a header in the CSV tab that is identical to the label defined on the cell on the right of the formula.

= "Col" & 
  match(
    B2, CSV!1:1, 0)

Step 3: Creating named range to use in our formulas

We need to create some “variable” to help us not only to write more resilient query() formulas but also to create formulas that are easy to understand than the one based on the column position. To do that, we add a 3rd column where we assign a name to each field. We can be very creative here, but I recommend sticking to a simple naming convention, like starting every field name with “F_” to make it immediately recognizable. Once we have completed the 3rd column, we can name each cell of the first column with those exact names. For example, the 8th cell of the first column (excluding the header) should be called F_COUNTRY. We can do that by using the “Define Named Range” command of Google Sheet, as shown in the picture below.

Naming each cell in the first column of the DataMap tab.

Step 4: Making query() formulas resilient and easier to read

Now we can go back to our query() formula and apply those changes to make the formula much easier to read and resilient to CSV changes. The new formula will be:

=query(
    {CSV!1:999},
    "SELECT * WHERE " & F_COUNTRY & "='United States'", 
    -1)  

This should do the trick. We can now move the “Country” column without breaking our formula. This should do the trick. If the “Country” column changes position inside the CSV file and tab, it will not break our formulas. Also, our spreadsheet can quickly adapt to label changes. If in the future the label “Country” changes in the CSV, e.g., to “Customer Country” we can promptly repair our spreadsheet by merely changing the Country label in our map (cell B9) as shown in the image below:

The updated DataMap now refers to the country with the label “Customer Country”.

Download the sample Spreadsheet

I’ve shared the spreadsheet described in this article and it’s now available to everybody. If you like to test the solution described in this article make a copy of the spreadsheet and make your own changes.

Conclusions

I hope this simple technique can help you create more robust and manageable spreadsheets. If you find this article useful or you have any recommendations on improving this solution, please let me know by leaving a comment in the area below.

Similar Articles