Jacob J. Walker's Blog

Scholarly Thoughts, Research, and Journalism for Informal Peer Review

Archive for the ‘Excel & Spreadsheets’ Category

Big Excel Spreadsheet Tip: Don’t Use Smart Tags to Convert Text to Numbers

without comments

Don't do this for big spreadsheets!

Don’t do this!

I often work with really big Excel spreadsheets (>500,000 rows) to do various types of data analysis.  And often I’m grabbing data from databases (how else are you going to get such big spreadsheets???)  And a common thing that can happen, is that numbers will be copied in as being in a text format, and many math and lookup formulas won’t work on them.  One tempting way to convert these is to use a “Smart Tag” where, when you select all the numbers that are being treated as text, you get an ! in a diamond, that you can click on, and it gives you the “nice” option to just “Convert to Number”.   If your spreadsheet is over 10,000 rows, DON’T DO THIS.  It will take forever to complete…  (ok, not really forever, but possibly several days to complete.) Instead you should do the following:

Read the rest of this entry »

Written by Jacob Walker

March 10th, 2017 at 12:34 pm

Fixing Formula Auto-Fill / Replication / Calculated Column Problem in an Excel 2016 Query Table

without comments

One of the great new features of Excel 2016, is that it has Power Query and Power Pivot built-in.  This makes connecting to a database a LOT easier, where a database table, view, or query will get imported into an Excel table, and then new columns can be added to that table with formulas.  For folks like myself who are good with Excel formulas, this is often easier (and sometimes more powerful) than trying to have the SQL query itself do calculations.

But, I had run into a problem, which was that on one of my spreadsheets, every time I would refresh a query, all the new data from the query would not have the formula column automatically fill in like I was expecting.   And since I had a fairly complex spreadsheet, if I forgot about this problem, I would get bad data and reports….   So I needed to fix it.

Read the rest of this entry »

Written by Jacob Walker

December 7th, 2016 at 4:13 pm

Using Match and Index on a Pivot Table: Sometimes Excel has better Performance with “Old School” Formulas

without comments

I am a huge fan of pivot tables, and often use them to help transform raw data into a different table, by “mirroring” data from a pivot table, where I have a table to the left of the pivot table, and I want to have the table show the same value the pivot table has, or possibly transform it with a formula.  Often you can just choose the cell reference that the pivot table has and you are done…  But sometimes the pivot table may have multiple columns, that vary, and you wish to do some sort of lookup on the pivot table…  Traditionally, to do this, it would require creating a GETPIVOTDATA formula.  These have always been a bit complex, and now with PowerPivot being built-in to Excel 2016, they are even MORE complex.  This is why some people choose to use cube functions instead, which can now be done on PowerPivot tables….  But I have tested both of these, and they are very slow when you have a lot of rows.   So, I have gone back to an old school method of using the MATCH function (with absolute references) to determine which column a particular heading is in, and then use the INDEX function to select the value from that heading.  This old school trick turns out to be a LOT more efficient than using GETPIVOTDATA, and thus I thought I should share this with any other Excel nerds who read my blog.

Written by Jacob Walker

September 20th, 2016 at 6:59 pm

The Introduction/Background to my Revised Doctoral Research Proposal

with one comment

Compendium_of_Countries_LogoToday, I finished revising the first part of my doctoral research proposal, as there have been several underlying methodological and technological changes from the original proposal.  While I know doctoral research is usually not of general interest, I am still going to be posting the sections of my revised proposal as I finish them, for those who are interested.  Please feel free to ask questions if you have them, and I will do my best to explain statistical techniques or the technology, etc. that I’m talking about.

Read the rest of this entry »

Written by Jacob Walker

February 20th, 2016 at 6:12 pm

Macro to Find and Delete Blank Strings produced by Formulas

without comments

Some day I’ll write a “Top 10 list” about the problems with Excel, and workarounds, and when I do, this post will be on the list! While I know the title of this post sounds obscure, I run into a problem ALL THE TIME with Excel, that often I have a formula that if there is an error I want it to resolve to being blank, and I can kind of do this with =iferror(formula,””)  And while this looks blank on the screen, and many formulas will treat it as being blank, unfortunately both PivotTables and charts will not!  For years, I have solved this problem by manually going to each cell that has a blank, and pressing delete. (Because find and replace can’t do this!!)  But that wastes a lot of time, so today, I finally decided to take the time to create a macro to fix this, because in my searching on the Internet, I never found one.

Read the rest of this entry »

Written by Jacob Walker

December 30th, 2015 at 9:31 am

Python Script to Automate Refreshing an Excel Spreadsheet

with 3 comments

Often I run into situations where it makes sense to do analysis of a lot of database data in an Excel spreadsheet, but due to the amount of processing the spreadsheet requires when updating, it takes a long time for the spreadsheet to “Refresh All”.

One solution to this problem is to automate the spreadsheet so it refreshes every night.  The following is a small Python script that can do this using the Python for Windows Extension:

Read the rest of this entry »

Written by Jacob Walker

January 24th, 2015 at 9:42 am

Solution to Excel/Access/ODBC to PostgreSQL Integration Problems

without comments

In order to do more in-depth analysis of our student data, we have recently setup ODBC access to our online SIS, which is running on PostgreSQL.  My goal was to have Access connect to the PostgreSQL data source, so I could create queries (views) in access, which I could then import into Excel.  While Excel should easily connect to Access, as they are both Microsoft products, this has NOT been the case. But now that I have been able to figure out how to get it to work, I’m posting the solution for anyone else who might run there head into the same brick wall.

Read the rest of this entry »

Written by Jacob Walker

January 10th, 2015 at 2:40 pm

Excel VBA Code to Scrape California State Job Openings

without comments

I have become interested over time in web scraping and potentially using this to do a massive job market study to determine what education should truly focus on to ensure that we are providing the right skills to our students.   While I’d ultimately like to do this with a website like Indeed, and probably would use something like Python and PostgreSQL, I did a quick VBA script in Excel this morning to scrape the California Job Openings, putting a new opening per sheet, and I thought I’d share the code.

Read the rest of this entry »

Written by Jacob Walker

June 7th, 2014 at 9:27 am

A use for “Data Tables” in Excel

with one comment

One of the features of Excel that I really have never liked is that of “data tables”.  First, they sound too much like “tables” which is a newer feature that really is cool (yet, surprisingly my experience has shown that many otherwise knowledgeable Excel users still don’t know about them).    The other problem with the data table, is that it was really meant to be a what-if tool, but I have not found a situation yet, where I couldn’t create the equivalent of a data table through the use of mixed references within a formula, and the method of setting up a data table has always been tricky, as they use an array function.  (Yet unfortunately, they are still commonly taught, and Microsoft includes them in their advanced certification test.)

But, with my critique of the official data tables feature aside, the concept of having a table that includes numbers as headings to both the rows and columns can be quite useful at times.  And recently I ran into a very good use for this configuration, which is that I wanted to have a table to lookup some data within, but I needed to use 2 variables.  I usually solve this sort of problem by making a compound key by concatenating 2 or more fields into the first field, and then using a VLOOKUP with range_lookup set to FALSE.  But in the case of my problem today, I needed to have both variables be able to do an approximate lookup (in other words, find them within a range of values.)

So I solved this by making a group of data in the form of a data table (although I didn’t use the data table or =TABLE() function, nor did I make it an Excel table), where I had one variable as my row labels, and another set as my column labels.   I then combined an INDEX() formula with 2 MATCH() formulas within it, both of the match formulas having a match_type set to 1.

I know this probably doesn’t make sense to those who aren’t die hard Excel folks…  But, I have been working on a set of Libre Excel Tools for Data Science, which I’m putting together User Defined Formulas to make this sort of thing easier, and with my recent experience, I plan to create a User Defined Formula (UDF) that makes this easier to do.

 

Written by Jacob Walker

April 18th, 2014 at 11:59 am

Five Fundamental Principles about Learning & Teaching Spreadsheets (Excel)

without comments

A 3-D Icon for ExcelAs an instructor who has been teaching Microsoft Excel to students for over a decade, I have found key problems of learning to often occur with students due to the structure of curriculum.  The following five fundamental principles are those that I have found to be true for how students can best learn spreadsheets.

Read the rest of this entry »

Written by Jacob Walker

April 14th, 2014 at 11:59 am