Parse YAML and push to Confluence in Python

I recently rewrote a system to output a YAML file containing a bunch of information for internal users. However, we use Confluence as our primary information-sharing system, so I needed to parse the YAML file on GitHub (where I push it after every generation), generate some HTML, and then push that up to Confluence on a regular basis. This was surprisingly easy to do, so I wanted to share how I did it.

from atlassian import Confluence
from bs4 import BeautifulSoup
import yaml
import requests
import os

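git_username = "github-username"
# .get() returns None for a missing variable, so main() can report it
git_token = os.environ.get('GIT-TOKEN')
confluence_password = os.environ.get('CONFLUENCE-PASSWORD')
url = ''
# Placeholders: set these for your own file and page (Cloud Functions can only write under /tmp)
path = '/tmp/output.yaml'
page_id = ''
page_title = ''
original_html = '''<table>
  <tr>
    <th>Column Header 1</th>
    <th>Column Header 2</th>
    <th>Column Header 3</th>
    <th>Column Header 4</th>
  </tr>
</table>'''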

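def get_file_from_github(url, username, password):
    # Download the YAML file, authenticating with the GitHub token
    response = requests.get(url, stream=True, auth=(username, password))
    response.raise_for_status()
    with open(path, 'wb') as out_file:
        out_file.write(response.content)
    print('The file was saved successfully')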

def update_confluence(path, page_id, page_title, original_html):
    with open(path, 'r') as yamlfile:
        current_yaml = yaml.safe_load(yamlfile)

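    confluence = Confluence(
            url='https://your-site.atlassian.net',  # placeholder: your Confluence base URL
            username='user@example.com',  # placeholder: your Confluence account email
            password=confluence_password)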
    soup = BeautifulSoup(original_html, 'html5lib')
    table = soup.find('table')
    #This part is going to change based on what you are parsing but hopefully provides a template. 

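    for x in current_yaml['top-level-yaml-field']:
        dump = '\n'.join(x['list-of-things-you-want'])
        pieces = x['desc'].split("-")
        # Append one row per entry; add a <td> for each remaining column (for example from pieces)
        table.append(BeautifulSoup(f'''<tr>
                                  <td style="white-space:pre-wrap; word-wrap:break-word">{dump}</td>
                                </tr>''', 'html.parser'))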
    body = str(soup)
    update = confluence.update_page(page_id, page_title, body, parent_id=None, type='page', representation='storage', minor_edit=False, full_width=True)

def main(request):
    if confluence_password is None:
        print("There was an issue accessing the secret.")
        # Stop here instead of calling Confluence without credentials
        return "Confluence password is not set", 500
    get_file_from_github(url, git_username, git_token)
    update_confluence(path, page_id, page_title, original_html)
    return "Confluence is updated"
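
One thing the loop above doesn't do is escape the values it drops into the HTML. If a YAML string contains < or &, the resulting markup may render oddly or be rejected. Here is a minimal sketch of building the rows with escaping, using only the standard library. The build_row and build_table helpers, the sample data and the column names are all hypothetical, and the entries are assumed to look like what yaml.safe_load returns for the placeholder field names used in the script.

```python
from html import escape

CELL_STYLE = "white-space:pre-wrap; word-wrap:break-word"


def build_row(cells):
    """Return one <tr> with every cell value HTML-escaped."""
    tds = "".join(
        f'<td style="{CELL_STYLE}">{escape(str(cell))}</td>' for cell in cells
    )
    return f"<tr>{tds}</tr>"


def build_table(headers, entries):
    """Build a whole table from column headers and parsed YAML entries."""
    head = "<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>"
    rows = []
    for entry in entries:
        # Same placeholder field names as the script above
        dump = "\n".join(entry["list-of-things-you-want"])
        pieces = entry["desc"].split("-")
        rows.append(build_row([pieces[0].strip(), dump]))
    return f"<table>{head}{''.join(rows)}</table>"


# Hypothetical data shaped like the output of yaml.safe_load
sample = [{"desc": "web - frontend", "list-of-things-you-want": ["a<b", "c&d"]}]
html_table = build_table(["Name", "Items"], sample)
print(html_table)
```

The string this produces could be passed as the body to update_page in place of str(soup), if you would rather skip BeautifulSoup for the table itself.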

Some things to note:

  • Obviously, the YAML parsing depends on the file you are going to parse.
  • The Confluence Page ID is most easily grabbed from the URL in Confluence when you make the page. You can get instructions on how to grab the Page ID here.
  • I recommend making the Confluence page first, grabbing the ID and then running it as an update.
  • I'm running logging through a different engine.
  • The GitHub token should be a read-only token scoped to just the repo you need. Don't create a broadly scoped token.
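
If you'd rather not copy the Page ID by hand, here is a rough sketch of pulling it out of a page URL. It assumes the two URL shapes I'm aware of: a /pages/<id>/ path segment, or a pageId query parameter on viewpage.action links. The page_id_from_url helper is hypothetical, and your Confluence instance may use other URL formats.

```python
import re
from urllib.parse import parse_qs, urlparse


def page_id_from_url(url):
    """Return the numeric page ID found in a Confluence page URL, or None."""
    parsed = urlparse(url)
    # Query-string style: .../pages/viewpage.action?pageId=123456
    query_id = parse_qs(parsed.query).get("pageId")
    if query_id and query_id[0].isdigit():
        return query_id[0]
    # Path style: .../spaces/KEY/pages/123456/Page+Title
    match = re.search(r"/pages/(\d+)(?:/|$)", parsed.path)
    return match.group(1) if match else None


# Hypothetical URL, used only to show the call
print(page_id_from_url("https://example.atlassian.net/wiki/spaces/ENG/pages/123456789/My+Page"))
```

The result is returned as a string, which is fine to pass straight through as page_id.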

The deployment process on GCP couldn't have been easier. Put your secrets in GCP Secret Manager and then run:

gcloud functions deploy confluence_updater --entry-point main --runtime python310 --trigger-http --allow-unauthenticated --region=us-central1 --service-account --set-secrets 'GIT-TOKEN=confluence_git_token:1,CONFLUENCE-PASSWORD=confluence_password:1'
  • I have --allow-unauthenticated just for testing purposes. You'll want to put it behind auth.
  • The --set-secrets flag loads them as environment variables.
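
Since the secrets arrive as plain environment variables, it can help to check them all up front and fail loudly if one is missing. Here is a minimal sketch, assuming the same two variable names as the deploy command above. The load_secrets helper is hypothetical; it accepts any mapping so it can be tried locally without touching the real environment.

```python
import os

REQUIRED_SECRETS = ("GIT-TOKEN", "CONFLUENCE-PASSWORD")


def load_secrets(env=os.environ):
    """Return the required secrets from env, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}


# Local try-out with fake values instead of the real environment
secrets = load_secrets({"GIT-TOKEN": "t", "CONFLUENCE-PASSWORD": "p"})
print(sorted(secrets))
```

In the function itself you would call load_secrets() with no arguments at the start of main and return an error response if it raises.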

There you go! You'll have a free function you can use forever to parse YAML or any other file format from GitHub and push to Confluence as HTML for non-technical users to consume.

The requirements.txt I used is below:


Problems? Hit me up on Mastodon: