I recently rewrote a system to output a YAML file containing a bunch of information for internal users. However, we use Confluence as our primary information-sharing system, so I needed to fetch the YAML file from GitHub (where I push it after every generation), parse it, generate some HTML, and push that up to Confluence on a regular basis. This turned out to be surprisingly easy, so I wanted to share how I did it.
```python
from atlassian import Confluence
from bs4 import BeautifulSoup
import yaml
import requests
import os

# Don't print(os.environ) here: it would write the secrets below into your logs.
git_username = "github-username"
# .get() returns None instead of raising KeyError, so main() can report a missing secret.
git_token = os.environ.get('GIT-TOKEN')
confluence_password = os.environ.get('CONFLUENCE-PASSWORD')
# Raw file URLs have the form https://raw.githubusercontent.com/<org>/<repo>/<branch>/<path>
url = 'https://raw.githubusercontent.com/org/repo/file.yaml'
page_id = 12345678
page_title = 'Title-Of-Confluence-Page'
path = '/tmp/file.yaml'  # /tmp is writable (in-memory) in Cloud Functions

original_html = '''<table>
<tr>
<th>Column Header 1</th>
<th>Column Header 2</th>
<th>Column Header 3</th>
<th>Column Header 4</th>
</tr>
</table>'''


def get_file_from_github(url, username, password):
    response = requests.get(url, auth=(username, password), timeout=30)
    print(response)
    # Fail loudly instead of saving a 401/404 error page as the YAML file.
    response.raise_for_status()
    with open(path, 'wb') as out_file:
        out_file.write(response.content)
    print('The file was saved successfully')


def update_confluence(path, page_id, page_title, original_html):
    with open(path, 'r') as yamlfile:
        current_yaml = yaml.safe_load(yamlfile)
    confluence = Confluence(
        url='https://your-hosted-confluence.atlassian.net',
        username='your-email@example.com',  # placeholder: your Atlassian account email
        password=confluence_password,  # on Confluence Cloud this is an API token
        cloud=True)
    soup = BeautifulSoup(original_html, 'html5lib')
    table = soup.find('table')
    # This part is going to change based on what you are parsing, but hopefully it provides a template.
    for x in current_yaml['top-level-yaml-field']:
        dump = '\n'.join(x['list-of-things-you-want'])
        pieces = x['desc'].split("-")
        # Assumption: the name is the first dash-separated piece of 'desc'.
        name = pieces[0].strip()
        table.append(BeautifulSoup(f'''
<tr>
<td>{name}</td>
<td>{x['role']}</td>
<td>{x['assignment']}</td>
<td style="white-space:pre-wrap; word-wrap:break-word">{dump}</td>
</tr>''', 'html.parser'))
    body = str(soup)
    update = confluence.update_page(page_id, page_title, body, parent_id=None, type='page', representation='storage', minor_edit=False, full_width=True)
    print(update)


def main(request):
    # HTTP entry point for Cloud Functions; `request` is the incoming Flask request (unused).
    if confluence_password is None or git_token is None:
        print("There was an issue accessing the secret.")
        return "Missing secret", 500
    get_file_from_github(url, git_username, git_token)
    update_confluence(path, page_id, page_title, original_html)
    return "Confluence is updated"
```
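The loop above drops YAML values straight into the HTML, so a `<` or `&` in the file would break the table. Here is a minimal, stdlib-only sketch of building the same rows with `html.escape`. The function name `build_rows`, the sample record, and the "first dash-separated piece of `desc` is the name" rule are assumptions for illustration, not part of the original script.

```python
import html


def build_rows(records):
    """Build escaped <tr> rows from parsed YAML records.

    Hypothetical helper: assumes each record has 'desc', 'role', 'assignment'
    and a list under 'list-of-things-you-want', mirroring the loop above.
    """
    rows = []
    for x in records:
        # Assumption: the display name is the first dash-separated piece of 'desc'.
        name = x['desc'].split('-')[0].strip()
        dump = '\n'.join(x['list-of-things-you-want'])
        cells = [name, x['role'], x['assignment']]
        # Escape every value so characters like < and & can't break the markup.
        tds = ''.join(f'<td>{html.escape(str(c))}</td>' for c in cells)
        tds += ('<td style="white-space:pre-wrap; word-wrap:break-word">'
                f'{html.escape(dump)}</td>')
        rows.append(f'<tr>{tds}</tr>')
    return '\n'.join(rows)


# Example: what yaml.safe_load might return for one hypothetical entry.
sample = {
    'top-level-yaml-field': [
        {'desc': 'alice-platform team', 'role': 'SRE', 'assignment': 'R&D',
         'list-of-things-you-want': ['on-call', 'db <primary>']},
    ]
}
print(build_rows(sample['top-level-yaml-field']))
```

The resulting string can be appended the same way as before, e.g. `table.append(BeautifulSoup(build_rows(current_yaml['top-level-yaml-field']), 'html.parser'))`.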
Some things to note:
- Obviously, the YAML parsing depends on the file you're going to parse.
- The Confluence page ID is easiest to grab from the page's URL once you've created the page: on Confluence Cloud it's the number after /pages/.
- I recommend creating the Confluence page first, grabbing its ID, and then running the script as an update.
- I'm handling logging through a different engine, so it isn't shown here.
- The GitHub token should be read-only and scoped to just the repo you need. Don't create a broadly scoped token.
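If you'd rather not copy the page ID by hand, a small helper can pull it out of a pasted URL. This is a hypothetical sketch (`page_id_from_url` is not part of any library): it assumes Cloud-style URLs containing `/pages/<id>/` and also accepts the `pageId=` query parameter used by Server/Data Center `viewpage.action` links.

```python
import re

# Cloud page URLs usually contain /pages/<PAGE_ID>/...; Server/Data Center
# "viewpage.action" links carry the ID as a pageId query parameter instead.
PAGE_ID_RE = re.compile(r'/pages/(\d+)|[?&]pageId=(\d+)')


def page_id_from_url(page_url):
    """Return the numeric Confluence page ID from a URL, or raise ValueError."""
    match = PAGE_ID_RE.search(page_url)
    if match is None:
        raise ValueError(f'no page ID found in {page_url!r}')
    # Exactly one of the two alternatives matched, so one group is set.
    return int(match.group(1) or match.group(2))


# TEAM is a hypothetical space key.
print(page_id_from_url(
    'https://your-hosted-confluence.atlassian.net/wiki/spaces/TEAM/pages/12345678/Title-Of-Confluence-Page'))
# prints 12345678
```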
Deploying to Google Cloud Functions couldn't have been easier. Put your secrets in GCP Secret Manager, then run:
```shell
gcloud functions deploy confluence_updater \
  --entry-point main \
  --runtime python310 \
  --trigger-http \
  --allow-unauthenticated \
  --region=us-central1 \
  --service-account serverless-function-service-account@gcp-project-name.iam.gserviceaccount.com \
  --set-secrets 'GIT-TOKEN=confluence_git_token:1,CONFLUENCE-PASSWORD=confluence_password:1'
```
- I have --allow-unauthenticated set only for testing. For real use, drop it and put the function behind authentication.
- --set-secrets exposes each secret to the function as an environment variable (the :1 suffix pins version 1 of the secret).
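Because the secrets arrive as plain environment variables, it's worth failing fast with a clear message when one is missing rather than erroring deep inside a request. A minimal sketch, assuming the variable names from the --set-secrets mapping above; `load_secrets` and `REQUIRED_SECRETS` are hypothetical names.

```python
import os

# Hypothetical: names match the --set-secrets mapping in the deploy command.
REQUIRED_SECRETS = ('GIT-TOKEN', 'CONFLUENCE-PASSWORD')


def load_secrets(names=REQUIRED_SECRETS, environ=None):
    """Return {name: value} for each required secret, or raise listing the missing ones."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [n for n in names if not environ.get(n)]
    if missing:
        # Report the names only; never print the secret values themselves.
        raise RuntimeError(f'missing secrets: {", ".join(missing)}')
    return {n: environ[n] for n in names}
```

Calling `load_secrets()` at the top of `main()` turns a misconfigured deploy into one obvious error line in the function logs.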
There you go! You'll have a function that costs little or nothing to run, which you can reuse to parse YAML (or any other file format) from GitHub and push it to Confluence as HTML for non-technical users to consume.
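Only the loading step is YAML-specific. As a hypothetical sketch of the "any other file format" case, the parser can be chosen by file extension, using the standard library's json and csv modules for the extra formats. `load_records` is an illustrative name, and the YAML branch assumes PyYAML is installed.

```python
import csv
import io
import json
from pathlib import PurePosixPath


def load_records(filename, text):
    """Parse file contents by extension (hypothetical helper; adjust to your formats)."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in ('.yaml', '.yml'):
        import yaml  # PyYAML, imported lazily so JSON/CSV work without it
        return yaml.safe_load(text)
    if suffix == '.json':
        return json.loads(text)
    if suffix == '.csv':
        # Each row becomes a dict keyed by the header line.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f'unsupported file type: {suffix or filename}')
```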
The requirements.txt I used is below:
```
atlassian-python-api==3.34.0
beautifulsoup4==4.11.2
functions-framework==3.3.0
install==1.3.5
html5lib==1.1
# imported directly by the script
PyYAML==6.0
requests==2.28.2
```
Problems? Hit me up on Mastodon: https://c.im/@matdevdug