Jacob J. Walker's Blog

Scholarly Thoughts, Research, and Journalism for Informal Peer Review

Archive for the ‘Excel & Spreadsheets’ Category

Happy Pi Day! Using Excel as a Music Synthesizer

without comments

As some of my nerd friends know, today is Pi Day. (3.14)  (Of course next year will be the BIG celebration because it will be 3.14.15!)

This year I was talking with my older daughter about how music and Pi are quite related, since all sound are waves, and waves are based on sine waves, and sine waves are based upon Pi…  So I wanted to show her (and also my students) how to create a formula in Excel for a wave, and how to graph it and then convert it to music.

Read the rest of this entry »

Written by Jacob Walker

March 14th, 2014 at 9:27 am

I accepted the position of Full-Time Faculty of Information Technology at Heald College Online.

with one comment

I am excited to share that today I signed my paperwork to accept becoming a professor with Heald College Online, where I will be instructing information and communication technology courses, along with helping Heald Online with various ICT projects.

Heald has a long history of being a respect business college, and recently received Senior College accreditation through WASC so it can start to offer Bachelor degrees.   Heald Online is much newer, only recently offering full programs online, and I am excited to be part of what is in many ways similar to a start-up.

Although Corinthian Colleges, the company that now owns Heald, has had some issues; from what I can see so far, it appears that there is work being done to solve these issues, and that the people I’m working with are working to provide excellent education.

Written by Jacob Walker

December 26th, 2013 at 8:53 pm

I am now a Certified Microsoft Office Specialist Expert for Excel 2010

with one comment

Today I took and passed my first Microsoft Office Specialist exam, for the Excel 2010 Expert certification.  I have used Excel for years in advanced ways, and have also taught it for many years, including some advanced courses.  Thus, when I recently posted a profile on Talent Exchange, I had listed myself as an Expert.  I was a little hesitant to write that at the time, as I know that there is a lot more that I could still learn, but I felt that my skill level was definitely within the top 10% of users, and probably higher.

Taking the test today gives external verification to my skill level, and I am very happy that not only did I pass the Expert exam, I had a perfect score of 1000 points!  Thus, my self-judgement was reasonable.  (Although, I make no claim that I will score at such a high level if and when I take the Office 2013 expert exams.)

For those interested in taking the exam, I recommend reading the official MOS 2010 Study Guide, and taking the GMetrix practice tests, although I found some flaws in the practice tests, which I’ll be working with GMetrix to rectify.  Thankfully, these same flaws were not in the actual test (at least not the one I took).  But even with the flaws I found in the GMetrix tests, I think I did much better on the final test because I had taken the practice tests.

Written by Jacob Walker

December 18th, 2013 at 4:43 pm

Creating a Link to USPS Zip+4 Lookup in Excel

without comments

I am working on mailing a large number of organizations (which I will talk more about soon in another post), and the list I have in Excel of their mailing addresses often doesn’t include the Zip+4, which I originally thought was needed the business rate for First Class Mail. (Since my original posting, I read the “fine print” more and it seems that a standard ZIP code is sufficient for the rate, as long as I can put the appropriate bar code on the envelope.)

While I would ultimately like to make a full web scraper to automatically download the Zip+4, unfortunately Excel has limitations about how User Defined Functions (UDFs) and QueryTable can work with each other (specifically, I want to create a temporary sheet to download a URL into, to then get a particular cell, and UDFs just can’t do that.)

So as a temporary workaround until I can program something either in Python or write some VBA code that isn’t a UDF, I have created a formula that creates a link to the USPS Zip+4 Lookup webpage.  For those who are interested, here is the formula:

=HYPERLINK(“https://tools.usps.com/go/ZipLookupResultsAction!input.action?resultMode=0&companyName=&address1=”&SUBSTITUTE([@ADDR],” “,”+”)&”&address2=&city=”&SUBSTITUTE([@CITY],” “,”+”)&”&state=”&[@STABBR]&”&urbanCode=&postalCode=&zip=”,”USPS Lookup”)

I should note that this seems to handle having address1 and address2 combined in the same field of ADDR.  You could modify the formula a little if you have separate fields in your spreadsheet for address1 and address2.

Update: It looks like if you do this too many times you get the following message:

To learn about integrating the free Postal Service® Address and Tracking API’s into your application, please visit www.usps.com/webtools.

I will investigate this more, and see how the API might work with Excel.

Written by Jacob Walker

November 19th, 2013 at 5:39 am

Exporting my Wii Fit Data using ScraperWiki

without comments

I have been using Wii Fit Plus (Off and on) for several years, and thus it has some the best longitudinal data about my fitness (especially my weight).  I would like to export the data from the Wii and be able to analyze it in Excel.  Several people have worked on doing this, and one person started to put an easy to use converter on the web, but it looks like it has been taken down.   So, basically at this point, it looks like all the components exist, but that it will take a little coding to get them together for myself.

So given this fact,  I’m starting a new scraper on ScraperWiki that will ultimately extract My Wii Fit Plus Data, and allow others to extract their data also.  To create this scraper, I’m going to try and use the code from the wiifit project, along with other code.

Written by Jacob Walker

May 25th, 2013 at 3:41 pm

My Python Learning Plan

without comments

In my last post I discussed about how I plan to integrate Excel and Python.  With that knowledge in mind, I now have been able to come up with the general order that I should proceed to learn Python while developing several data analysis projects (with my doctoral thesis from UNISA being one of these projects).

Here is the general order and process that I will likely do things:

  1. Learn how to write a web crawling and scraping Python script.  I bought the eBook of the Bad Data Handbook from O’Reilly, in which Chapter 5 is all about Web Scraping best practices, and has Python Code.  I will also use the ScraperWiki to help as a resource.
  2. Learn how to connect this scraping to Excel using a Python COM Server.  I will use the ShowMeDo Video about COM and Python with pyWin32, along with O’Reilly’s Python Programming on Win32.
  3. Learn how to embed these Python Scripts as OLEObjects in Excel, and how to call them using VBA.
  4. Learn more about how Python can interact with databases. Although this is not directly related to my first set of projects, as they can have all data stored in the Excel file, it would be nice to ultimately become less dependent upon Excel.

Written by Jacob Walker

May 14th, 2013 at 7:19 am

Integrating Python into Excel for Web Scraping, Advanced Functions, and More

with one comment

For some time I’ve been wanting to learn Python.  It seems to be one of the best rapid-development open source languages to do all types of scripting activities and integration between different languages.  It also is very commonly used for scraping and advanced data processing.  And I also have realized from my past, that I want to invest my time in learning things that have long-term usability, and proprietary software, in general, does not necessarily have this benefit.

But I still want to use Excel.  While there are open source data analysis tools that are out there, they either don’t have as many features as Excel (as unfortunately Open & Libre Office don’t), or I haven’t learned them (as some of the open source math packages and languages are).  The feature problem is especially the case with data presentation, Excel can make nice looking charts, diagrams, etc, while most of the others take much more work to do these things.  (Maybe some day I will be able to help with the Open Source projects, and make them better, but for right now, I don’t have the time.)

So, my next incremental step to moving towards using more open source software, is to start to do any advanced programming for Excel spreadsheets in Python instead of VBA.  This is especially relevant to me right now, as there are many spreadsheets that I have, that I need to do web scraping for, and while I have learned to do some of this in VBA, I want to invest my energies into learning Python, which has a lot more code pre-written to do this also.

I also want to be able to create Excel files that can be easily distributed.  And this is where the rub comes.  There are several methods that have been developed to integrate Python and Excel, including using a DLL file, 2 Add-ins that have been developed, and the traditional method of using a Python COM server.  Generally all of these require external files, and sometimes external installations.  But I think I can solve this problem by embedding Python Scripts as OLEObjects and call these from Excel using VBA.   These can then setup the COM Server, and do everything else necessary.

While I’m sure I will run into some walls along the way, I think the idea is sound, and as I develop this more, I will share the VBA code that can be used to do this technique.

Written by Jacob Walker

May 14th, 2013 at 6:51 am

Excel VBA Macro Code to Find Word-Level N-Grams in a Text Entry

with 2 comments

Excel has a lot of quantitative power, but it has always had issues with parsing text.  While there are the basic text functions (left, mid, right) and a useful substitution function, sometimes something that seems like it should be easy isn’t.  As a case in point, if you want to extract individual words from text, while it can be done with the Text to Columns feature using the space as a delimiter, it is not very easy to do with a formula, because you basically need to find where a certain space is within the text, which means counting where the previous spaces are, which requires a loop.  Which then requires VBA.  So, I did a little VBA coding tonight, and tried to make a function that not only can find single words, but also phrases, as n-grams.  I have now used this to do some qualitative analysis of text, where combining these functions with Pivot Tables, I have been able to have Excel be a decent basic qualitative analysis tool for text.

Here is the VBA file for the code, since when I tried to post the code inline with this message, my WordPress host thought I was trying to do some sort of injection attack or something.  Also you can find an example of how I used the code in the analysis I recently did of Literature and Medicine college courses.

Update: I have improved the efficiency of the code by using VBA’s split function instead of using a lot of extra loops.



Written by Jacob Walker

November 18th, 2012 at 9:31 pm

Obfuscation of PII in Released Databases via Randomization

without comments

For the past year, my school, Twin Rivers Adult School, has been using an Excel spreadsheet I created as the main database for administering Pell Grants.  While Excel was a good tool to get our system up and running right away, it is not a good long term solution, and a database solution needs to be implemented.  As I talked about in other posts, the commercial solutions are expensive, and do not really meet our needs as a clock-hour based school.  Thus, this summer, myself and Steve Jensen, our Office Technician instructor, are having a special course for our advanced students to work with us to develop a new database.

This I believe will be a good “win-win”.  The students will get real experience working with a real development project, and get some specialized knowledge in Federal Student Aid, which in my opinion is an untapped vertical market.  Further, they will be able to earn lower-division credits from our school, and the adult school will be able to partially reduce its development cost.  (Although, these types of projects are never “free labor” on the part of the students.  There will be an extra investment of time by me and Steve to help the students, and while we hope this time will be less than the equivalent time we would need to do the development ourselves, we are not assured of this.)

But, one of the critical components for us as a school is security.  While we will have our students take a pledge and sign an agreement to not share any personally identifiable information (PII) they should incidentally contact, I still did not want the original data with PII to get disbursed via copying, etc.   So I created a spreadsheet that helped me to randomly assign PII in place of the real information, such as having fake names, emails, etc.   This way I can distribute the real Pell grant spreadsheet to the students, so they can work with real data, and see all the real scenarios the database will need to create, but at the same time, they don’t receive any real personal information about students.

I have placed this randomizing names spreadsheet online, for other database administrators and researchers who wish to obfuscate PII to be able to use.   Currently the spreadsheet does a good job with first and last names, and I hope to improve the middle name algorithm in the future, and also to define the probability distribution function of the the weighted method I have of determining first names and last names.

Written by Jacob Walker

July 1st, 2012 at 3:39 pm

Lesson Study with Teaching Password Cracking Calculator

without comments

I had a much better lesson this week on teaching my students about how password security works, and how to calculate how secure a password is, using Microsoft Excel.  I would like to post more about this soon, but before the details in my memory fade more, this is the basic process I went through, that worked:

  1. Discussion first about how would they hack a password
  2. Go through lower case letter example first
  3. Give pre-filled out handout of Excel
  4. Use partly filled in saved copy to speed up data entry
  5. Have teams come up with a password in the end, and test security with calculator
  6. Possibly next time, we should have a contest within the groups

Written by Jacob Walker

January 15th, 2012 at 7:08 am