Using Google Sheets as a database to extract data converted to Markdown


~6 minutes to read

Table of contents


Introduction

The purpose of this tool is to parse and convert the content of a Google Sheets spreadsheet to Markdown. In this specific case, I just wanted a simple script to reduce the friction in getting a valid output I could use to maintain a “learning log” on my website. So that gives me a quick way to input data in a spreadsheet and then I can take the time whenever needed to update the learning progress as desired by reformatting any of the entries and optionally adding notes to them.


General software requirements

This tool should:

  • Be able to read a private spreadsheet with a service account configured with the Google Sheets API.
  • Convert rows from a Google Sheets spreadsheet into Markdown (see Input and Output sections below for details).
  • Skip rendering rows that are missing a value for the Title or Date columns (it couldn’t be rendered in the correct place properly without this information).
  • Other columns are optional and the corresponding level of nesting will be skipped if no value was provided (e.g. a category Articles may have an item with no sub-category and could be missing any of the other values except for Title and Date).
  • Separate entries by month by parsing the Date column.
  • A command-line parameter should exist to specify a year from which the date should be extracted.
  • The output should be in reverse chronological order (i.e. for a full year with entries for each month, December will appear at the top and January at the bottom).
  • Output a hierarchical format like the following: Category > Sub-category > Title > Activity & notes
  • Group entries by category and sub-category (e.g. if a category is named Articles and there is a sub-category named Python, then Python must be nested under Articles for the corresponding month).
  • Sort on the Date column before doing any parsing on other columns (rows in the spreadsheet can be in non-chronological order).
  • Dates from a single year will be kept (either the year received as a parameter or the most recent year found).
  • Sort Category, Sub-category and Title alphabetically (ascending order from top to bottom).
  • Not sort the Activity column to preserve the order in which rows were added to the spreadsheet.
  • Render links only if the Link column is a valid URL.
  • Not validate the content of columns except for Date and Link (i.e. render the other columns as is).
  • Display items with no category above those having a category. The same logic would be true for sub-categories.
  • Format the output:
  • Date should become headers (##).
  • Category should be bolded.
  • Sub-category should be emphasized.
  • Title should become a link if the Link column contains a valid URL.
  • Activity should be nested under Title, occupying a new line for each activity.
  • Notes should appear (italicized and inside parentheses) next to the activity.
    • If there is a value for Notes but no value for Activity, the output would be next to the Title.

Input

A spreadsheet with the following columns (starting with Date):

[additional info] Date Category Sub-category Title Activity Link Notes
November 13/11/2021 Articles Python Python slots, slots, and object layout https://www.youtube.com/watch?v=Iwf17zsDAnY understood that X does Y
November 14/11/2021 Books Software engineering The Pragmatic Programmer ch. 1 https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/ note 1 here
November 15/11/2021 Books Software engineering The Pragmatic Programmer ch. 2 note 2 here too
November 16/11/2021 Books Software engineering Clean Code
December 1/12/2021 Documentation Python The Python Tutorial Sections 1-4 https://docs.python.org/3/tutorial/index.html
December 2/12/2021 Books Software engineering Clean Code ch. 2-10
Render available values 2/12/2021 Books Clean Coder book note
Render available values 2/12/2021 Python/C API Reference Manual https://docs.python.org/3/c-api/index.html
Title missing: skip rendering 3/12/2021 Articles
Title missing: skip rendering 3/12/2021 Articles Software engineering ch. 3 https://docs.python.org/3/c-api/index.html note
Date missing: skip rendering Articles Python Python slots, slots, and object layout Section 2 https://www.youtube.com/watch?v=Iwf17zsDAnY note not rendered

Output

A Markdown output to the terminal in the following format:

## December

- [Python/C API Reference Manual](https://docs.python.org/3/c-api/index.html)
- **Books**
  - Clean Coder (_book note_)
  - _Software engineering_
    - Clean Code
      - ch. 2-10
- **Documentation**
  - _Python_
    - [The Python Tutorial](https://docs.python.org/3/tutorial/index.html)
      - Sections 1-4

## November

- **Articles**
  - _Python_
    - [Python slots, slots, and object layout](https://www.youtube.com/watch?v=Iwf17zsDAnY) (_understood that X does Y_)
- **Books**
  - _Software engineering_
    - Clean Code
    - [The Pragmatic Programmer](https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/)
      - ch. 1 (_note 1 here_)
      - ch. 2 (_note 2 here too_)
input_output_example
Input/Output example when using this tool.

Installing this tool

Tested only under Python 3.9.7, requires at least Python 3.7+.

Using pip

  • [recommended] Activate a virtual environment.
  • To install dependencies, run from this directory: pip install -r requirements.txt
  • Then execute the script to run the tool: python3 get_learning_logs

Using pipenv

  • Run from this directory: pipenv install.
  • Activate the newly created virtual environment with pipenv shell.
  • Then execute the script to run the tool: python3 get_learning_logs

How to use this tool

You will need to create an environment file (default path: ~/.learning-logs or edit the path for LEARNING_LOGS_ENV_PATH in get_learning_logs.py). It should look as follows:

SERVICE_ACCOUNT_FILEPATH="/path/to/service_account_learning_logs.json"
SPREADSHEET_ID=SPREADSHEET_ID  # Found in the URL of the document
WORKSHEET_ID=0  # First sheet is 0 by default, it comes after the URL parameter `gid`

To set up this project, you will have to:

  • Enable the Google Sheets API for your project (https://support.google.com/googleapi/answer/6158841?hl=en).
  • In the project, search for “service account” in the search bar and create new credentials.
  • Download the credentials and put them at the path SERVICE_ACCOUNT_FILEPATH referenced in the environment file.
  • Create a spreadsheet with the following columns (not necessarily in that order): Date, Category, Sub-category, Title, Activity, Link, Notes.
  • The column Date Should match a date format: select all cells from that column, go to Format > Number > Date to apply the expected format (mm/dd/yyyy). You can then input a date (e.g. 11/1/2021) and you will see a calendar pop up when double-clicking on it. Dragging it down to a new cell will create a new date for the next day (in this example, 11/2/2021).
  • Get its SPREADSHEET_ID (https://docs.google.com/spreadsheets/d/SPREADSHEET_ID_HERE/edit#gid=0) to put it in the environment file.
  • Get the WORKSHEET_ID (the integer after gid= in the URL of the document).
  • From the spreadsheet document (or from the folder view in Google Drive), share with the email associated with the service account (e.g. learning-logs-something@project-name-12345.iam.gserviceaccount.com). Read access (“Viewer” permission) is enough.

By now, the script is ready to be called. I just created an alias to execute it more easily by putting the following in ~/.bash_aliases (you will have to adapt the paths of course):

alias learning-logs='~/.local/share/virtualenvs/learning-logs-to-markdown-XJLvhmzn/bin/python3.9 \
  ~/dev/sglavoie/dev-helpers/learning-logs-to-markdown/get_learning_logs.py'

This will output the converted data from the spreadsheet to Markdown as shown above.


Conclusion

This little tool scratched an itch and will be helpful in making the process of updating the learning progress of this website more straightforward, more convenient and less hands-on! You can find its source code on GitHub.