Parse YAML and push to Confluence in Python

I recently rewrote a system so that it outputs a YAML file containing a bunch of information for internal users. However, we use Confluence as our primary information-sharing system, so I needed to parse the YAML file on GitHub (where I push it after every generation), generate some HTML, and push that up to Confluence on a regular basis. This was surprisingly easy to do, so I wanted to share how I did it.

from atlassian import Confluence
from bs4 import BeautifulSoup
import yaml
import requests
import os

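git_username = "github-username"
# Read secrets with .get() so a missing variable yields None instead of a KeyError at import time.
# Avoid printing os.environ: it would write the secrets to the logs.
git_token = os.environ.get('GIT-TOKEN')
confluence_password = os.environ.get('CONFLUENCE-PASSWORD')
url = 'https://raw.githubusercontent.com/org/repo/file.yaml'
page_id = 12345678
page_title = 'Title-Of-Confluence-Page'
path = '/tmp/file.yaml'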
original_html =  '''<table>
  <tr>
    <th>Column Header 1</th>
    <th>Column Header 2</th>
    <th>Column Header 3</th>
    <th>Column Header 4</th>
  </tr>
</table>'''

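def get_file_from_github(url, username, password):
    # Download the raw YAML file, authenticating with the GitHub token
    response = requests.get(url, auth=(username, password), timeout=30)
    # Fail loudly on a 4xx/5xx instead of saving an error page as YAML
    response.raise_for_status()
    with open(path, 'wb') as out_file:
        out_file.write(response.content)
    print('The file was saved successfully')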

def update_confluence(path, page_id, page_title, original_html):
    with open(path, 'r') as yamlfile:
        current_yaml = yaml.safe_load(yamlfile)

    confluence = Confluence(
            url='https://your-hosted-confluence.atlassian.net',
            username='[email protected]',
            password=confluence_password,
            cloud=True)
    soup = BeautifulSoup(original_html, 'html5lib')
    table = soup.find('table')
    
    # This part is going to change based on what you are parsing, but hopefully it provides a template.

    for x in current_yaml['top-level-yaml-field']:
        dump = '\n'.join(x['list-of-things-you-want'])
        pieces = x['desc'].split("-")
        # Assumption: the display name is the first piece of 'desc'; adjust to match your YAML
        name = pieces[0].strip()

        table.append(BeautifulSoup(f'''
                                <tr>
                                  <td>{name}</td>
                                  <td>{x['role']}</td>
                                  <td>{x['assignment']}</td>
                                  <td style="white-space:pre-wrap; word-wrap:break-word">{dump}</td>
                                </tr>''', 'html.parser'))
    
    body = str(soup)
    update = confluence.update_page(page_id, page_title, body, parent_id=None, type='page', representation='storage', minor_edit=False, full_width=True)
    
    print(update)

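def main(request):
    # Stop early if the secret was not loaded rather than failing later with an auth error
    if confluence_password is None:
        print("There was an issue accessing the secret.")
        return ("Confluence password is not set", 500)
    get_file_from_github(url, git_username, git_token)
    update_confluence(path, page_id, page_title, original_html)
    return "Confluence is updated"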
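
One thing the loop above glosses over: values are interpolated into the HTML as-is, so a stray < or & in the YAML may produce markup that Confluence refuses. Here is a minimal sketch of escaping each value first. It assumes the same placeholder field names as above (desc, role, assignment, list-of-things-you-want) and assumes the name is the first piece of desc. build_row is a made-up helper for illustration, not part of any library, and it leaves out the style attribute for brevity.

```python
from html import escape

def build_row(entry):
    """Return one <tr> for a parsed YAML entry, escaping every value."""
    # Assumption: the name is the first piece of 'desc'; adjust to your YAML
    name = entry['desc'].split('-')[0].strip()
    dump = '\n'.join(entry['list-of-things-you-want'])
    cells = [name, entry['role'], entry['assignment'], dump]
    # escape() turns &, <, > and quotes into HTML entities
    tds = ''.join(f'<td>{escape(str(c))}</td>' for c in cells)
    return f'<tr>{tds}</tr>'

# A dict shaped like what yaml.safe_load would return for one entry
sample = {
    'desc': 'alice - platform team',
    'role': 'R&D <lead>',
    'assignment': 'on-call',
    'list-of-things-you-want': ['a', 'b'],
}
row = build_row(sample)
```

The returned string can be handed to BeautifulSoup(row, 'html.parser') and appended to the table exactly like the f-string in the loop.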

Some things to note:

Obviously, the YAML parsing depends on the file you are going to parse.
The Confluence Page ID is most easily grabbed from the URL in Confluence when you make the page. You can get instructions on how to grab the Page ID here.
I recommend making the Confluence page first, grabbing the ID, and then running it as an update.
I'm running logging through a different engine, so the print statements here are only placeholders.
The GitHub token should be a read-only token scoped to just the repo you need. Don't create a token with broad access.
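
For the Page ID note above, here is a small sketch of pulling the ID out of a page URL. It assumes the common Confluence Cloud URL shape .../wiki/spaces/SPACE/pages/ID/Title. The helper name and the example URL are made up for illustration, and other URL styles will simply not match.

```python
import re

def page_id_from_url(page_url):
    """Return the numeric page ID from a Confluence page URL, or None."""
    # Assumption: the ID is the run of digits right after '/pages/'
    match = re.search(r'/pages/(\d+)', page_url)
    return int(match.group(1)) if match else None

# Hypothetical URL built from the placeholders used in the script
example = ('https://your-hosted-confluence.atlassian.net'
           '/wiki/spaces/DOCS/pages/12345678/Title-Of-Confluence-Page')
page_id = page_id_from_url(example)
```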

The deployment process on GCP couldn't have been easier. Put your secrets in the GCP secret manager and then run:

gcloud functions deploy confluence_updater --entry-point main --runtime python310 --trigger-http --allow-unauthenticated --region=us-central1 --service-account serverless-function-service-account@gcp-project-name.iam.gserviceaccount.com --set-secrets 'GIT-TOKEN=confluence_git_token:1,CONFLUENCE-PASSWORD=confluence_password:1'

I have --allow-unauthenticated just for testing purposes. You'll want to put it behind auth.
The --set-secrets flag loads them as environment variables.
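
Since the secrets arrive as plain environment variables, it may be worth failing fast when one is missing. A minimal sketch follows. require_env is a hypothetical helper, and the variable names are the ones from the deploy command above.

```python
import os

def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name)
    if not value:
        # Name the missing variable, but never print the secret values
        raise RuntimeError(f'Missing required environment variable: {name}')
    return value

# Example with a stand-in mapping instead of the real environment
fake_env = {'GIT-TOKEN': 'abc123'}
token = require_env('GIT-TOKEN', fake_env)
```

In the function itself you would call require_env('GIT-TOKEN') and require_env('CONFLUENCE-PASSWORD') without the second argument, so a bad deploy shows up as a clear error in the logs.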

There you go! You'll have a free function you can use forever to parse YAML or any other file format from GitHub and push to Confluence as HTML for non-technical users to consume.

The requirements.txt I used is below:

atlassian-python-api==3.34.0
beautifulsoup4==4.11.2
functions-framework==3.3.0
install==1.3.5
html5lib==1.1

Problems? Hit me up on Mastodon: https://c.im/@matdevdug