In the ever-evolving landscape of content creation, efficiency is key. Recently at the OpenAI Applications Explorers Meetup, Mike Pehel, a marketing consultant specializing in the drone industry and a contractor for the Linux Foundation, introduced an innovative approach to streamline the process of transforming YouTube videos into well-structured, informative blog posts. This method leverages the power of artificial intelligence, specifically OpenAI's GPT-4, to automate what was once a labor-intensive task. Fun fact: we use this tool to create the blog posts here on riis.com.
As always, you can read along here or follow along with the video version of the meetup.
The Old Way vs. The New Way
Traditionally, creating a blog post from a YouTube video involved a time-consuming process. To turn a video into an article, content creators would produce the video, transcribe the content, review the GitHub repository, and re-watch the video at increased speed to capture all the necessary information. This method was not only inefficient but also redundant, as all the required content already existed in various forms.
The new approach simplifies this process dramatically. By utilizing a combination of APIs and AI tools, content creators can now generate a comprehensive blog post with minimal manual input. This method involves:
Inputting basic information into a form
Sending the data to various APIs
Using AI to format and generate a well-structured output
The result is a foundational draft that can be quickly refined into a polished blog post.
Overcoming AI Hallucinations
One of the primary challenges in using AI for content generation is the occurrence of hallucinations - instances where the AI produces confident but incorrect or nonsensical information. Here’s a helpful metaphor:
Imagine compressing every word on the internet into a 2D grid. When the AI needs to predict the next word in a phrase, it’s essentially finding the closest vector on this grid. This simplification helps explain why AI might sometimes produce unexpected or incorrect responses.
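To make that metaphor a little more concrete, here is a toy illustration (not from the talk) of picking the "closest" word to a query vector with cosine similarity over a few hand-made 2D embeddings; real models use thousands of dimensions, but the nearest-vector intuition is the same:
import math

# Toy 2D "embeddings": hypothetical values purely for illustration
word_vectors = {
    "drone": (0.9, 0.8),
    "quadcopter": (0.85, 0.75),
    "recipe": (-0.7, 0.2),
}

def cosine_similarity(a, b):
    dot = a[0] * b[0] + a[1] * b[1]
    norm_a = math.sqrt(a[0] ** 2 + a[1] ** 2)
    norm_b = math.sqrt(b[0] ** 2 + b[1] ** 2)
    return dot / (norm_a * norm_b)

def closest_word(query_vector):
    # The "prediction" is whichever stored vector points in the most similar direction
    return max(word_vectors, key=lambda w: cosine_similarity(word_vectors[w], query_vector))

print(closest_word((0.88, 0.79)))  # prints "drone"; "quadcopter" is a near miss, which is the point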
To mitigate hallucinations, the new approach involves:
Using highly detailed prompts
Creating multiple guideposts from our original content within the prompt to constrain the AI’s responses
The Technical Implementation
The core of this system is built using Flask, a lightweight web application framework in Python. Below is an overview of the directory structure. Feel free to make these files ahead of time and start filling them in as we go through the tutorial.
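The layout here is reconstructed from the files we create throughout the tutorial (plus Flask's default templates folder), so treat it as a guide rather than gospel:
project/
├── run.py
├── config.py
├── requirements.txt
└── app/
    ├── __init__.py
    ├── routes.py
    ├── templates/
    │   ├── index.html
    │   └── article.html
    └── utils/
        ├── youtube_retriever.py
        ├── github_analyzer.py
        ├── file_parser.py
        ├── article_generator.py
        └── article_checker.py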
The first thing to do is fill your requirements.txt file with the following:
anthropic==0.34.1
Flask==1.1.2
GitPython==3.1.43
langchain==0.2.14
markdown2==2.5.0
openai==1.42.0
PyGithub==2.3.0
youtube_transcript_api==0.6.2
Then issue the command pip install -r requirements.txt.
Flask Application Structure
The application follows a typical Flask structure:
--- run.py ---
from app import create_app
app = create_app()
if __name__ == '__main__':
    app.run(debug=True)
--- __init__.py ---
from flask import Flask
from config import Config
def create_app():
    app = Flask(__name__)
    app.config.from_object(Config)

    from app import routes
    app.register_blueprint(routes.main)

    return app
This setup creates a modular application, making it easier to manage different components and scale the project as needed.
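One file the snippets above depend on but never show is config.py, which supplies the Config object imported in __init__.py. A minimal sketch, assuming you keep secrets in environment variables (the utility modules below read OPENAI_API_KEY and GITHUB_TOKEN straight from os.environ), could look like this:
--- config.py ---
import os

class Config:
    # Flask uses this for session signing; any random string is fine for local use
    SECRET_KEY = os.environ.get('SECRET_KEY', 'dev-secret-change-me')
    # Optional: expose the API keys on the config so missing ones are obvious at startup
    OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
    GITHUB_TOKEN = os.environ.get('GITHUB_TOKEN')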
Handling User Input
Flask uses what it calls templates for its pages. This structure allows us to pass information from our API calls into a fixed-format HTML file.
Our first file is index.html. The application uses a simple HTML form to collect user input:
--- index.html ---
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Article Generator</title>
</head>
<body>
<h1>Article Generator</h1>
<form action="/" method="post" enctype="multipart/form-data">
<div id="speakersContainer">
<div class="speaker">
<h3>Speaker</h3>
<label for="speaker_name">Speaker Name:</label>
<input type="text" id="speaker_name" name="speaker_name" required><br><br>
<label for="speaker_bio">Speaker Bio:</label>
<textarea id="speaker_bio" name="speaker_bio"></textarea><br><br>
</div>
</div>
<label for="video_title">Video Title:</label>
<input type="text" id="video_title" name="video_title" required><br><br>
<label for="video_description">Video Description:</label>
<textarea id="video_description" name="video_description"></textarea><br><br>
<label for="youtube_url">YouTube URL:</label>
<input type="text" id="youtube_url" name="youtube_url"><br><br>
<label for="github_url">GitHub Repository URL:</label>
<input type="url" id="github_url" name="github_url"><br><br>
<input type="submit" value="Generate Article">
</form>
</body>
</html>
This form collects essential information such as speaker details, video title and description, YouTube URL, and GitHub repository URL.
We also need to make our article.html file for our generated article.
--- article.html ---
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Generated Article</title>
<style>
body {
font-family: Arial, sans-serif;
line-height: 1.6;
padding: 20px;
max-width: 800px;
margin: 0 auto;
}
h1 {
color: #333;
}
.article-section {
margin-bottom: 30px;
padding: 20px;
background-color: #f9f9f9;
border-radius: 5px;
}
</style>
</head>
<body>
<h1>Generated Article</h1>
<div class="article-section">
{{ article | safe }}
</div>
</body>
</html>
Processing YouTube Transcripts
A crucial part of the application is extracting and processing the YouTube video transcript. This is achieved using the youtube_transcript_api library:
--- youtube_retriever.py ---
from youtube_transcript_api import YouTubeTranscriptApi
from urllib.parse import urlparse, parse_qs
import re
def get_youtube_id(url):
    # Patterns for different types of YouTube URLs
    patterns = [
        r'^https?:\/\/(?:www\.)?youtube\.com\/watch\?v=([^&]+)',
        r'^https?:\/\/(?:www\.)?youtube\.com\/embed\/([^?]+)',
        r'^https?:\/\/(?:www\.)?youtube\.com\/v\/([^?]+)',
        r'^https?:\/\/youtu\.be\/([^?]+)',
        r'^https?:\/\/(?:www\.)?youtube\.com\/shorts\/([^?]+)',
        r'^https?:\/\/(?:www\.)?youtube\.com\/live\/([^?]+)'
    ]

    # Try to match the URL against each pattern
    for pattern in patterns:
        match = re.match(pattern, url)
        if match:
            return match.group(1)

    # If no pattern matches, try parsing the URL
    parsed_url = urlparse(url)
    if parsed_url.netloc in ('youtube.com', 'www.youtube.com'):
        query = parse_qs(parsed_url.query)
        if 'v' in query:
            return query['v'][0]

    # If we still haven't found an ID, raise an exception
    raise ValueError("Could not extract YouTube video ID from URL")

def get_youtube_transcript(url):
    try:
        # Extract video ID from the URL
        video_id = get_youtube_id(url)

        # Get the transcript
        transcript = YouTubeTranscriptApi.get_transcript(video_id)

        # Combine all text parts
        full_transcript = " ".join([entry['text'] for entry in transcript])

        return full_transcript
    except ValueError as e:
        raise ValueError(f"Invalid YouTube URL: {str(e)}")
    except Exception as e:
        raise Exception(f"Error fetching YouTube transcript: {str(e)}")
This code handles various YouTube URL formats and extracts the video transcript, providing a solid foundation for the content generation process.
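If you want to sanity-check this module before wiring it into Flask, a quick standalone script (hypothetical, not part of the app) does the trick; swap in any public video that has captions enabled:
# test_transcript.py: quick manual check of youtube_retriever.py
from app.utils.youtube_retriever import get_youtube_transcript

if __name__ == '__main__':
    url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"  # any captioned public video
    transcript = get_youtube_transcript(url)
    print(f"Transcript length: {len(transcript)} characters")
    print(transcript[:300])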
Analyzing GitHub Repositories
To incorporate code examples and additional context, the application analyzes the provided GitHub repository:
--- github_analyzer.py ---
import os
import tempfile
from git import Repo
from urllib.parse import urlparse
import base64
from github import Github
def analyze_github_repo(repo_url):
    # Extract owner and repo name from URL
    parsed_url = urlparse(repo_url)
    path_parts = parsed_url.path.strip('/').split('/')
    owner, repo_name = path_parts[0], path_parts[1]

    # Initialize GitHub API client using the token from the environment
    g = Github(os.environ.get('GITHUB_TOKEN'))
    repo = g.get_repo(f"{owner}/{repo_name}")

    all_code = ""

    # Try to clone the repository first
    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            Repo.clone_from(repo_url, temp_dir)
            for root, _, files in os.walk(temp_dir):
                for file in files:
                    if file.endswith(('.py', '.js', '.html', '.css', '.java', '.cpp', '.toml', '.xml', '.json', '.jsonl')):
                        file_path = os.path.join(root, file)
                        with open(file_path, 'r', encoding='utf-8') as f:
                            all_code += f"\n\n--- {file} ---\n{f.read()}"
    except Exception as e:
        print(f"Cloning failed: {str(e)}. Falling back to API method.")

        # If cloning fails, fall back to using the GitHub API
        def get_contents(path=''):
            nonlocal all_code
            contents = repo.get_contents(path)
            for content in contents:
                if content.type == 'dir':
                    get_contents(content.path)
                elif content.name.endswith(('.py', '.js', '.html', '.css', '.java', '.cpp', '.toml', '.xml', '.json', '.jsonl')):
                    file_content = base64.b64decode(content.content).decode('utf-8')
                    all_code += f"\n\n--- {content.path} ---\n{file_content}"

        get_contents()

    # Fetch README content
    try:
        readme = repo.get_readme()
        readme_content = base64.b64decode(readme.content).decode('utf-8')
    except:
        readme_content = "README not found"

    return all_code, readme_content
This function attempts to clone the repository locally or falls back to using the GitHub API if cloning fails. It extracts relevant code files and the README content, providing valuable context for the article generation process.
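You can give the analyzer the same standalone treatment; the snippet below is only an illustrative check and assumes GITHUB_TOKEN is exported in your shell and that you substitute a real repository URL:
# test_repo.py: quick manual check of github_analyzer.py
from app.utils.github_analyzer import analyze_github_repo

if __name__ == '__main__':
    code, readme = analyze_github_repo("https://github.com/your-user/your-repo")  # placeholder URL
    print(f"Collected {len(code)} characters of code")
    print(readme[:200])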
By combining these components with AI-powered content generation, the system offers a powerful solution for efficiently transforming YouTube videos into comprehensive blog posts. This approach not only saves time but also ensures that the generated content accurately reflects the original video material while incorporating relevant code examples and additional context from associated GitHub repositories.
Leveraging LangChain for Topic Extraction
LangChain, a powerful framework for developing language model-powered applications, is used to derive topics from the transcript. This helps in structuring the generated article:
--- file_parser.py ---
import os
from openai import OpenAI
from anthropic import Anthropic
from langchain.text_splitter import RecursiveCharacterTextSplitter

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

def parse_transcript(file_path):
    with open(file_path, 'r') as file:
        return file.read()

def derive_topics_from_transcript(transcript):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=3000,
        chunk_overlap=100,
        length_function=len
    )
    chunks = text_splitter.split_text(transcript)

    topics = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that generates concise and relevant topic titles."},
                {"role": "user", "content": f"Given the following chunk of text from a transcript, generate a concise and relevant topic title:\n\nChunk:\n{chunk}\n\nTopic Title:"}
            ],
            max_tokens=100
        )
        print(response.choices[0].message.content.strip())
        topics.append(response.choices[0].message.content.strip())

    return topics
This function splits the transcript into manageable chunks and uses OpenAI’s GPT model to generate relevant topic titles for each chunk.
Connecting the Frontend and Backend
Next, let's create the main route in routes.py:
from flask import Blueprint, render_template, request, jsonify
from app.utils.file_parser import derive_topics_from_transcript
from app.utils.github_analyzer import analyze_github_repo
from app.utils.article_generator import generate_article
from app.utils.youtube_retriever import get_youtube_transcript
import markdown2
main = Blueprint('main', __name__)
@main.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        speaker_name = request.form['speaker_name']
        speaker_bio = request.form['speaker_bio']
        video_title = request.form['video_title']
        video_description = request.form['video_description']
        github_url = request.form.get('github_url', '')
        youtube_url = request.form.get('youtube_url', '')

        transcript_text = ""
        if youtube_url:
            try:
                transcript_text = get_youtube_transcript(youtube_url)
            except Exception as e:
                return jsonify({'error': str(e)}), 400

        if not transcript_text:
            return jsonify({'error': 'No transcript provided or file not found'}), 400

        topics, topic_summaries = [], []
        topics = derive_topics_from_transcript(transcript_text)
        topic_summaries = [""] * len(topics)

        github_code = ""
        readme_content = ""
        repo_size_mb = 0
        if github_url:
            github_code, readme_content = analyze_github_repo(github_url)

        speaker_info = speaker_name + "\n " + speaker_bio

        article = generate_article(
            transcript_text, topics, topic_summaries,
            github_code, readme_content,
            speaker_info,
            video_title, video_description,
        )

        article_html = markdown2.markdown(article)
        return render_template('article.html', article=article_html)

    return render_template('index.html')
This route handles both GET and POST requests. When a POST request is received, it processes the form data, retrieves the YouTube transcript, derives topics (and placeholder topic summaries) from it, analyzes the GitHub repository (if provided), and then calls our generate_article() function, which is up next.
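Before spending API tokens, it's worth confirming the wiring: run python run.py and open http://127.0.0.1:5000/ in a browser, or use Flask's built-in test client for a quick smoke test of the GET path (a sketch, assuming the layout described earlier):
# smoke_test.py: renders the form without calling any external APIs
from app import create_app

app = create_app()

with app.test_client() as client:
    response = client.get('/')
    assert response.status_code == 200
    assert b"Article Generator" in response.data
    print("Form page renders OK")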
Implementing the Article Generation Process
With the groundwork laid for handling user input we can now focus on the core functionality of generating the article. This process involves several key steps:
Combining data from various sources
Using AI to generate the article content
Performing final checks and refinements
Combining Data Sources
The generate_article function in article_generator.py serves as the central point for combining all the gathered information:
import os
from openai import OpenAI
from anthropic import Anthropic
from langchain.text_splitter import RecursiveCharacterTextSplitter
def generate_article(transcript, topics, topic_summaries, combined_code, readme_content, speaker_info, video_title, video_description):
    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

    system_message = """ You are a highly skilled technical writer with experience in the PX4 ecosystem including MAVSDK, MAVLink, uORB, QGroundControl ROS, ROS 2, Gazebo, and the Pixhawk open hardware standards. Your task is to write a well-structured, engaging, and informative article or tutorial. """

    prompt = f"""
    Write an 800-word article based on the following information:
    Speaker Information: {speaker_info}
    Session Information:
    Title: {video_title}
    Description: {video_description}
    Transcript: {transcript}
    Topics: {', '.join(topics)}
    Topic Summaries: {', '.join(topic_summaries)}
    README Content: {readme_content}
    Relevant Code: {combined_code if combined_code else "No relevant code found."}
    Instructions:
    1. Include an introduction and a conclusion.
    2. Use the topics and topic summaries as a framework for the article's content.
    3. Include relevant code snippets from the provided code, explaining each snippet's purpose and functionality.
    4. Avoid code blocks longer than 14 lines. Break them into smaller, logical sections when necessary.
    5. Format the output in markdown.
    6. Aim for a well-structured, engaging, and informative article of approximately 800 words.
    IMPORTANT: Avoid using too many bulleted lists. Consolidate some lists into descriptive paragraphs if possible.
    Use ONLY the Relevant Code provided in the prompt. Do not reference or use any code from your training data or external sources.
    When code is relevant, introduce the concepts behind the code, then present the code, and finally describe how it works. """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt}
        ],
        max_tokens=4000
    )

    article = response.choices[0].message.content
    return article
This function takes all the collected data and constructs a detailed prompt for the AI model. The prompt includes specific instructions on how to structure the article, incorporate code snippets, and maintain a balance between technical depth and readability.
Notice in the system_message we are telling the LLM what it should focus on. You can swap out the subject matters listed here with those relevant to the target industry you are writing a tutorial for.
Take special note of the redundancies within the prompt. LLMs often need to be prompted multiple times with similar, overlapping information to fully hit the target.
The last thing to pay attention to is the request to output in markdown. This gives us formatting without the token cost of the much more verbose HTML.
Generating the Article Content
The article generation process leverages OpenAI's GPT-4 model to create the initial draft. The system message sets the context for the AI, positioning it as a technical writer with expertise in the PX4 ecosystem. This approach helps ensure that the generated content is both technically accurate and well-structured. Back in routes.py, the returned markdown is converted to HTML with markdown2 and rendered through article.html.
Final Checks and Refinements
After the initial article generation, an additional check is performed to enhance the quality and accuracy of the content. The check_code function in article_checker.py verifies that the code snippets in the article match the provided GitHub repository code:
import os
from openai import OpenAI

def check_code(article, combined_code):
    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

    prompt = f"""
    Audit this Article for the correct and accurate use of the code within it. Use the Combined Code as your primary code reference. If code in the Article looks similar or calls similar functions but is slightly different than the Combined Code, replace it in the final output with the relevant Combined Code snippet supplied in part or in whole, whichever fits best for the task.
    If the article's content needs to be slightly modified to fit the code you are replacing it with from Combined Code, do so and only add content that is necessary.
    Article: {article}
    Combined Code: {combined_code if combined_code else "No relevant code found."}
    Return the edited article as your final output. """

    system_message = """ You are a highly skilled technical writer with experience in the PX4 ecosystem including MAVSDK, MAVLink, uORB, QGroundControl ROS, ROS 2, Gazebo, and the Pixhawk open hardware standards. You are editing the content in the article for accuracy and correctness. IMPORTANT: Your focus is auditing the code and replacing it if necessary. Audit the code based on the Combined Code supplied. If no code is supplied, do nothing. Return the whole edited article. Do not add any LLM assistant language such as "no other edits were made", "Here's the edited article..", "Here's the edited article with the code snippets updated based on the Combined Code provided:" or "The rest of the article remains unchanged". Only return the edited or non-edited article copy and nothing else. """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt}
        ],
        max_tokens=4000
    )

    return response.choices[0].message.content
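A natural place to call check_code is in routes.py, right after generate_article() returns and before the markdown is converted to HTML. A sketch of that wiring, assuming article_checker.py lives in app/utils/ alongside the other helpers:
# routes.py (sketch): import the checker with the other utilities
from app.utils.article_checker import check_code

# ...then, inside index(), after the article has been generated:
if github_code:
    # Re-audit the draft so its code snippets match what's actually in the repository
    article = check_code(article, github_code)
# article_html = markdown2.markdown(article) then runs on the audited draft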
Conclusion
By leveraging various AI technologies and APIs, this system automates the process of converting YouTube videos into well-structured, informative blog posts.
The system’s key strengths lie in its ability to:
Extract and process information from multiple sources (YouTube transcripts, GitHub repositories, user input).
Use AI to derive topics and generate coherent, context-aware content.
Implement rigorous checks to ensure code accuracy and proper use of industry terms.
Present the generated content in a clean, readable format.
While the current implementation focuses on technical content related to the PX4 ecosystem, the underlying principles and architecture can be adapted to various domains and content types. Future enhancements could include support for longer articles, integration with slide decks, and the incorporation of images to further enrich the generated content.
As AI technologies continue to evolve, systems like this will play an increasingly important role in content creation and management, enabling content creators to maximize the value of their work across multiple platforms efficiently and effectively.
If you had fun with this tutorial, be sure to join the OpenAI Application Explorers Meetup Group to learn more about awesome apps you can build with AI.