Parse YAML and push to Confluence in Python

I recently rewrote a system to output a YAML file containing a bunch of information for internal users. However, we use Confluence as our primary information-sharing system, so I needed to parse the YAML file on GitHub (where I push it after every generation), generate some HTML, and then push that up to Confluence on a regular basis. This turned out to be surprisingly easy, so I wanted to share how I did it.

from atlassian import Confluence
from bs4 import BeautifulSoup
import yaml
import requests
import os

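git_username = "github-username"
# .get() returns None instead of raising KeyError when a secret is missing
git_token = os.environ.get('GIT-TOKEN')
confluence_password = os.environ.get('CONFLUENCE-PASSWORD')
# Raw URLs take the form https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>
url = 'https://raw.githubusercontent.com/org/repo/branch/file.yaml'
page_id = 12345678
page_title = 'Title-Of-Confluence-Page'
path = '/tmp/file.yaml'
original_html = '''<table>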
  <tr>
    <th>Column Header 1</th>
    <th>Column Header 2</th>
    <th>Column Header 3</th>
    <th>Column Header 4</th>
  </tr>
</table>'''

def get_file_from_github(url, username, password):
    response = requests.get(url, auth=(username, password), timeout=30)
    print(response)
    # Fail loudly instead of writing a GitHub error page to disk as "YAML"
    response.raise_for_status()
    with open(path, 'wb') as out_file:
        out_file.write(response.content)
    print('The file was saved successfully')

def update_confluence(path, page_id, page_title, original_html):
    with open(path, 'r') as yamlfile:
        current_yaml = yaml.safe_load(yamlfile)

    confluence = Confluence(
            url='https://your-hosted-confluence.atlassian.net',
            username='your-email@example.com',  # your Atlassian account email
            password=confluence_password,  # on Confluence Cloud this is an API token
            cloud=True)
    soup = BeautifulSoup(original_html, 'html5lib')
    table = soup.find('table')
    
    # This part is going to change based on what you are parsing, but hopefully it provides a template.

    for x in current_yaml['top-level-yaml-field']:
        dump = '\n'.join(x['list-of-things-you-want'])
        pieces = x['desc'].split("-")
        # Assumption: the name shown in the first column is the text before the first "-" in 'desc'
        name = pieces[0].strip()

        table.append(BeautifulSoup(f'''
                                <tr>
                                  <td>{name}</td>
                                  <td>{x['role']}</td>
                                  <td>{x['assignment']}</td>
                                  <td style="white-space:pre-wrap; word-wrap:break-word">{dump}</td>
                                </tr>''', 'html.parser'))
    
    body = str(soup)
    update = confluence.update_page(page_id, page_title, body, parent_id=None, type='page', representation='storage', minor_edit=False, full_width=True)
    
    print(update)

def main(request):
    # Entry point for the HTTP-triggered Cloud Function; `request` is unused
    if not git_token or not confluence_password:
        print("There was an issue accessing the secrets.")
        return "Missing secrets", 500
    get_file_from_github(url, git_username, git_token)
    update_confluence(path, page_id, page_title, original_html)
    return "Confluence is updated"
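If you want to check the row-building step without touching GitHub or Confluence, the loop can be pulled out into a pure function. This is a minimal sketch, not the code I deploy: `build_rows` is a hypothetical helper, the field names (`desc`, `role`, `assignment`, `list-of-things-you-want`) mirror the placeholders above, the "name is the text before the first `-`" rule is an assumption, and it adds `html.escape` so that characters like `&` or `<` in the YAML don't produce invalid storage-format XHTML.

```python
from html import escape


def build_rows(items):
    """Render one <tr> per YAML item and return them as one HTML string.

    `items` is the already-parsed list under the top-level YAML field,
    e.g. yaml.safe_load(f)['top-level-yaml-field'].
    """
    rows = []
    for x in items:
        # Assumption: the display name is the part of 'desc' before the first "-"
        name = x['desc'].split('-')[0].strip()
        # One entry per line; pre-wrap in the cell style keeps the newlines visible
        dump = '\n'.join(x['list-of-things-you-want'])
        rows.append(
            '<tr>'
            f'<td>{escape(name)}</td>'
            f'<td>{escape(x["role"])}</td>'
            f'<td>{escape(x["assignment"])}</td>'
            '<td style="white-space:pre-wrap; word-wrap:break-word">'
            f'{escape(dump)}</td>'
            '</tr>'
        )
    return '\n'.join(rows)


if __name__ == '__main__':
    sample = [{
        'desc': 'Alice - on call',
        'role': 'SRE',
        'assignment': 'Payments & Billing',
        'list-of-things-you-want': ['db-1', 'db-2'],
    }]
    print(build_rows(sample))
```

Inside `update_confluence`, the result could then be appended with `table.append(BeautifulSoup(build_rows(current_yaml['top-level-yaml-field']), 'html.parser'))`.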

Some things to note:

  • Obviously, the YAML parsing depends on the file you are going to parse.
  • The Confluence Page ID is most easily grabbed from the page's URL once you create the page. Atlassian's documentation also explains how to find a page ID.
  • I recommend creating the Confluence page first, grabbing its ID, and then running the script as an update.
  • I'm handling logging through a separate engine, so the script itself only prints.
  • The GitHub token should be a read-only token scoped to just the repo you need. Don't create a broadly scoped token.
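On Confluence Cloud the page URL usually carries the numeric ID after `/pages/`, e.g. `https://your-site.atlassian.net/wiki/spaces/SPACE/pages/12345678/Page+Title`. The helper below is a hypothetical sketch that assumes that layout (plus the older `viewpage.action?pageId=...` form) and pulls the ID out so you don't have to copy it by hand.

```python
import re
from urllib.parse import parse_qs, urlparse


def page_id_from_url(page_url):
    """Return the numeric Confluence page ID from a page URL, or None.

    Assumes either the Cloud layout .../pages/<id>/<title>
    or the older .../viewpage.action?pageId=<id> form.
    """
    parsed = urlparse(page_url)
    # Cloud-style path: /wiki/spaces/<SPACE>/pages/<id>/<title>
    match = re.search(r'/pages/(\d+)(?:/|$)', parsed.path)
    if match:
        return int(match.group(1))
    # Older query-string style: viewpage.action?pageId=<id>
    ids = parse_qs(parsed.query).get('pageId')
    if ids and ids[0].isdigit():
        return int(ids[0])
    return None
```

The returned integer can go straight into `page_id` in the script; `None` means the URL didn't match either assumed layout.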

The deployment process on GCP couldn't have been easier. Put your secrets in GCP Secret Manager and then run:

gcloud functions deploy confluence_updater --entry-point main --runtime python310 --trigger-http --allow-unauthenticated --region=us-central1 --service-account serverless-function-service-account@gcp-project-name.iam.gserviceaccount.com --set-secrets 'GIT-TOKEN=confluence_git_token:1,CONFLUENCE-PASSWORD=confluence_password:1'
  • I have --allow-unauthenticated on just for testing purposes. You'll want to put it behind auth.
  • --set-secrets loads the secrets as environment variables.
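Because `--set-secrets` exposes each secret as an environment variable, it helps to fail with a clear message when one is missing rather than with a bare `KeyError`. A minimal sketch with a hypothetical `require_env` helper; the variable names are the ones from the deploy command above.

```python
import os


def require_env(*names):
    """Return a dict of the requested environment variables.

    Raises RuntimeError listing every missing or empty name, so a
    misconfigured --set-secrets mapping is obvious in the function logs.
    """
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError('Missing secrets: ' + ', '.join(missing))
    return {n: os.environ[n] for n in names}
```

Calling `secrets = require_env('GIT-TOKEN', 'CONFLUENCE-PASSWORD')` at the top of `main` means a missing secret shows up as one explicit error for that request.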

There you go! You'll have a small function (at this volume it should stay within the Cloud Functions free tier) that you can use to parse YAML or any other file format from GitHub and push it to Confluence as HTML for non-technical users to consume.

The requirements.txt I used is below:

atlassian-python-api==3.34.0
beautifulsoup4==4.11.2
functions-framework==3.3.0
install==1.3.5
html5lib==1.1
PyYAML
requests

Problems? Hit me up on Mastodon: https://c.im/@matdevdug