Referring Columns By Name in Google Sheets Query()

How to query a Google Sheet by referring columns by names? This is an issue we experience in particular when importing external CSV files (Comma-Separated Values). I will show a better way to import and link CSV files in a Google spreadsheet. I will also explain a simple technique on how to query data from an external CSV using the powerful query() function to create more resilient and robust spreadsheets.

Google Sheets, an important component of the largest Google G Suite or Google Workspace, are becoming an essential tool for many people to automate repetitive tasks or to create MVPs (Minimum Viable Product). Many times, we find Google Sheets to be the quickest and most approachable tool to automate a new business process. In many cases, that involves importing one or more CSV files from an external systems (e.g. your eCommerce or CRM platforms) to generate reports and trigger certain actions. In a very dynamic quickly evolving context, the structure of the CSV files we need to import can frequently change. Any change to a CSV structure can disrupt our Google spreadsheets that are linking to it. But don’t worry, there is a simple technique to protect the consistency of a spreadsheet even when the CSV files structure changes significantly.

Can we update CSV files without breaking our formulas?

Importing CSV files into a spreadsheet makes new data accessible to popular formulas like lookup() or the most advanced query(). The latest is a de facto standard for the generation of complex reports inside a Google spreadsheet. While spreadsheets are very powerful tools, they suffer from many limitations and vulnerabilities. One recurring issue is the fragility of the query() formula. The SQL-like command inside the formula assumes that the CSV structure is static and doesn’t change when refreshing the content of the file. The assumption is that each field can be accessed by its column position, e.g. column B, column E. This is not always the case, as many of us learned in the hard way when importing the same CSV data. The sources of our data occasionally change the column position of fields by adding or removing columns.Every while, the source of our CSV even rename the columns with new labels. While these changes can be justified and presented to us as improvements, there is always the risk they are going to break our formulas forcing us to manually revise the entire spreadsheet, updating each field and formula with the new position. It’s a long, cumbersome process that I don’t really enjoy doing. That’s why I designed a system to make my spreadsheets more resilient to this kind of CSV changes.

A better way to import and query CSV files

Let’s describe a very simple Let’s describe a straightforward case of a spreadsheet with 3 tabs:

  • CSV” tab will host the imported data from a CSV file (sample data courtesy of SpatialKey)
  • Report” tab will contain a sample report generated by a query() function working on the CSV data
  • Finally, a “DataMap” tab where we will put a simple table to make our spreadsheet resilient to CSV changes.
The CSV tab with the data from the imported CSV file.
The Report tab shows the results of a query(0) to the data on the CSV tab.

To generate the simple report, I’m using a basic query to extract all the records where Country=’United States’. This is based on the assumption that the “Country” field will always be in column H of the CSV tab.

=query(
    CSV!1:999,
    "SELECT * WHERE H ='United States'", 
    -1)  

Every change that moves the field “Country” to a different column will break my formula.

Step1: Going from Column Letter to Column Position with Literal Arrays

A solution is to create a map that assigns the column position to a variable (a named range in Google Sheets) and then uses that variable in the query formula. To get there, we first need to change the way we refer to the field position to a literal array by enclosing the first argument of the query() formula in brackets. This will allow us to refer the columns by column number (Col1, Col2, …) instead of the column letter. Here is the same formula with the new notation:

=query(
    {CSV!1:999},
    "SELECT * WHERE Col8 ='United States'", 
    -1)  

Step 2: Implementing a Data Map

The next step is to create a map between the column headers of the CSV tab and the Column number. We will put this map in a tab called “DataMap”.

The DataMap tab mapping CSV labels to column positions

The items in the 2nd column are the column labels I plan to use for my queries. I don’t need to map all the headers, only the one I’m gonna use in my formulas.
The items in the first column are calculated by the spreadsheet using the following simple formula. What the formula does, is to search, using the match() formula, for a header in the CSV tab that is identical to the label defined on the cell on the right of the formula.

= "Col" & 
  match(
    B2, CSV!1:1, 0)

Step 3: Creating named range to use in our formulas

We need to create some “variable” to help us not only to write more resilient query() formulas but also to create formulas that are easy to understand than the one based on the column position. To do that, we add a 3rd column where we assign a name to each field. We can be very creative here, but I recommend sticking to a simple naming convention, like starting every field name with “F_” to make it immediately recognizable. Once we have completed the 3rd column, we can name each cell of the first column with those exact names. For example, the 8th cell of the first column (excluding the header) should be called F_COUNTRY. We can do that by using the “Define Named Range” command of Google Sheet, as shown in the picture below.

Naming each cell in the first column of the DataMap tab.

Step 4: Making query() formulas resilient and easier to read

Now we can go back to our query() formula and apply those changes to make the formula much easier to read and resilient to CSV changes. The new formula will be:

=query(
    {CSV!1:999},
    "SELECT * WHERE " & F_COUNTRY & "='United States'", 
    -1)  

This should do the trick. We can now move the “Country” column without breaking our formula. This should do the trick. If the “Country” column changes position inside the CSV file and tab, it will not break our formulas. Also, our spreadsheet can quickly adapt to label changes. If in the future the label “Country” changes in the CSV, e.g., to “Customer Country” we can promptly repair our spreadsheet by merely changing the Country label in our map (cell B9) as shown in the image below:

The updated DataMap now refers to the country with the label “Customer Country”.

Download the sample Spreadsheet

I’ve shared the spreadsheet described in this article and it’s now available to everybody. If you like to test the solution described in this article make a copy of the spreadsheet and make your own changes.

Conclusions

I hope this simple technique can help you create more robust and manageable spreadsheets. If you find this article useful or you have any recommendations on improving this solution, please let me know by leaving a comment in the area below.

Similar Articles