GPT-4o API Guide: From Simple Chat and Image Processing to Video Summarization

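1. What Is GPT-4o?

GPT-4o is an OpenAI model that handles text, audio, image, and video input and can produce text, audio, and image output within a single model. It is faster and more efficient than earlier models across a wide range of tasks, which makes it practical to combine several kinds of input in one workflow. Which modalities you can use depends on the API endpoint; the examples below use the Chat Completions API with text and image input.

This post walks step by step through using the GPT-4o API for conversation, image processing, and video summarization.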

2. Getting Started with the GPT-4o API

2.1 Installing the OpenAI SDK

To use the GPT-4o API, first install the OpenAI Python SDK.

pip install --upgrade openai
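The examples in this post use the `OpenAI()` client interface, which was introduced in version 1.0 of the `openai` package; the 0.x releases used a different, module-level API. A small sketch like the following, using only the standard library's `importlib.metadata`, can confirm which major version is installed (the helper name `openai_sdk_major_version` is ours, for illustration):

```python
from importlib.metadata import PackageNotFoundError, version


def openai_sdk_major_version():
    """Return the installed openai package's major version, or None if it is not installed."""
    try:
        return int(version("openai").split(".")[0])
    except PackageNotFoundError:
        return None


major = openai_sdk_major_version()
if major is None:
    print("openai is not installed; run: pip install --upgrade openai")
elif major < 1:
    print("openai < 1.0 detected; the OpenAI() client used in this post needs 1.0 or later")
else:
    print(f"openai {major}.x is installed")
```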

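2.2 Setting the API Key

After obtaining an API key from OpenAI, set it as an environment variable.

import os

# Sets the key for this Python process only; do not commit a real key to source control
os.environ["OPENAI_API_KEY"] = "<your OpenAI API key>"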
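Hardcoding the key in a script makes it easy to leak, for example by committing it to Git. Because the `OpenAI()` client reads `OPENAI_API_KEY` from the environment by default, an alternative is to export the variable in your shell and only check for it in Python. A minimal sketch of such a check, using a hypothetical helper `require_api_key`:

```python
import os


def require_api_key(var_name="OPENAI_API_KEY"):
    """Return the key stored in `var_name`, raising a clear error if it is missing or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell first, e.g. export {var_name}=..."
        )
    return key
```

Calling `require_api_key()` before creating `OpenAI()` turns a missing or blank key into an error message that names the variable and how to set it.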

3. GPT-4o Usage Examples

3.1 Simple Conversation and Math Problem Solving

This example uses the GPT-4o API to hold a short conversation and solve a math problem.

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "system", "content": "You are a helpful math tutor. Help me with my math homework!"},
    {"role": "user", "content": "Hello! Could you solve 2+2?"}  
  ]
)

print("Assistant: " + completion.choices[0].message.content)

Output (the exact wording can vary between runs):

Assistant: Of course!

2 + 2 = 4

If you have any other questions, feel free to ask!
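The Chat Completions API does not remember earlier calls, so a follow-up question only has context if the previous turns are sent again in `messages`. One way to keep that history, sketched below with a hypothetical `build_history` helper, is to store each (user, assistant) pair and rebuild the list for every request:

```python
def build_history(system_prompt, turns):
    """Build a Chat Completions `messages` list from (user_text, assistant_text) pairs.

    Use None as the assistant text of the last pair when that user message
    has not been answered yet.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages


history = build_history(
    "You are a helpful math tutor. Help me with my math homework!",
    [
        ("Hello! Could you solve 2+2?", "2 + 2 = 4"),
        ("Great, and what is 3*3?", None),  # the new question to send
    ],
)
print([m["role"] for m in history])
```

The result is passed as `messages=` to `client.chat.completions.create`, and the reply text (`completion.choices[0].message.content`) becomes the assistant half of that turn for the next call.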

4. Image Processing

GPT-4o can take images as input directly, either as Base64-encoded data or as a URL.

4.1 Processing a Base64-Encoded Image

import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("triangle.png")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown."},
        {"role": "user", "content": [
            {"type": "text", "text": "What's the area of the triangle?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
        ]}
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)
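The example above hardcodes `image/png` in the data URL, which matches `triangle.png` but would mislabel a JPEG. A slightly more general sketch (the `image_part` helper is hypothetical, not part of the SDK) guesses the MIME type from the file extension with Python's `mimetypes` module and passes http(s) URLs through unchanged:

```python
import base64
import mimetypes


def image_part(source):
    """Build an `image_url` content part from a local image path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        url = source  # remote images are fetched by the API, so the URL is used as-is
    else:
        mime, _ = mimetypes.guess_type(source)  # e.g. "image/png", "image/jpeg"
        if mime is None or not mime.startswith("image/"):
            raise ValueError(f"Cannot determine an image MIME type for {source!r}")
        with open(source, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        url = f"data:{mime};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}
```

`image_part("triangle.png")` can then stand in for the hand-written `image_url` dict in the `content` list.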

4.2 Processing an Image via URL

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown."},
        {"role": "user", "content": [
            {"type": "text", "text": "What does the cat look like?"},
            # The image must be publicly reachable at this URL (example.com is a placeholder)
            {"type": "image_url", "image_url": {"url": "https://www.example.com/cat.jpg"}}
        ]}
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)
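A single user message can also carry several images, which is what section 5 relies on for video frames. The optional `detail` field (`"low"`, `"high"`, or `"auto"`) controls the resolution the model sees, and `"low"` uses fewer tokens per image. A sketch of a builder for such a message; the function name `vision_message` is hypothetical and the URLs are placeholders:

```python
VALID_DETAILS = {"low", "high", "auto"}


def vision_message(prompt, image_urls, detail="auto"):
    """Build one user message: a text prompt followed by one image part per URL."""
    if detail not in VALID_DETAILS:
        raise ValueError(f"detail must be one of {sorted(VALID_DETAILS)}, got {detail!r}")
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    return {"role": "user", "content": content}


msg = vision_message(
    "Compare these two cats.",
    ["https://www.example.com/cat.jpg", "https://www.example.com/cat2.jpg"],
    detail="low",
)
```

The returned dict goes into the `messages` list after the system message, exactly where the hand-written user message sits in the examples above.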

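5. Video Processing

At the time of writing, GPT-4o in the Chat Completions API does not accept audio input directly, and a video file cannot be sent as-is. Video is therefore handled in two parts: frames are sampled from the video and sent as images, and the audio track is transcribed with Whisper.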
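Whisper's transcription endpoint rejects uploads larger than 25 MB, so it can be worth checking the audio file before sending it. The sketch below treats the limit as 25,000,000 bytes, a conservative assumption about how "25 MB" is counted; `check_audio_size` is a hypothetical helper:

```python
import os

# Assumed upload limit for the transcription endpoint (25 MB, counted conservatively)
WHISPER_MAX_BYTES = 25 * 1000 * 1000


def check_audio_size(audio_path, limit=WHISPER_MAX_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds `limit`."""
    size = os.path.getsize(audio_path)
    if size > limit:
        raise ValueError(
            f"{audio_path} is {size / 1_000_000:.1f} MB; split it or lower the bitrate before transcribing"
        )
    return size
```

As a rough guide, the 32 kbit/s MP3 written in 5.1 takes about 4,000 bytes per second, so 25,000,000 bytes corresponds to roughly 6,250 seconds (about 104 minutes) of audio.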

5.1 Extracting Frames and the Audio Track

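import base64

import cv2

try:
    from moviepy import VideoFileClip  # moviepy 2.x
except ImportError:
    from moviepy.editor import VideoFileClip  # moviepy 1.x

def process_video(video_path, seconds_per_frame=2):
    """Sample one frame every `seconds_per_frame` seconds and extract the audio track to MP3."""
    base64Frames = []
    video = cv2.VideoCapture(video_path)
    fps = video.get(cv2.CAP_PROP_FPS)
    # Step between sampled frames; never 0, or the loop below would never advance
    frames_to_skip = max(1, int(fps * seconds_per_frame))
    curr_frame = 0

    while video.isOpened():
        # Jump to the next frame to sample and read it
        video.set(cv2.CAP_PROP_POS_FRAMES, curr_frame)
        success, frame = video.read()
        if not success:
            break  # past the last frame (or the file could not be read)
        # JPEG-encode the frame, then Base64-encode it for a data URL
        _, buffer = cv2.imencode(".jpg", frame)
        base64Frames.append(base64.b64encode(buffer).decode("utf-8"))
        curr_frame += frames_to_skip
    video.release()

    # Save the audio track as a low-bitrate MP3 (enough for speech transcription)
    audio_path = f"{video_path}.mp3"
    clip = VideoFileClip(video_path)
    if clip.audio is None:
        clip.close()
        raise ValueError(f"{video_path} has no audio track")
    clip.audio.write_audiofile(audio_path, bitrate="32k")
    clip.close()
    return base64Frames, audio_path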
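Because every sampled frame becomes one image in the request, it helps to know in advance how many frames `process_video` will produce. Sampling starts at frame 0 and advances by `int(fps * seconds_per_frame)` frames; the sketch below reproduces that arithmetic with a guard so the step is never 0. The helper name is hypothetical, and the total frame count could come from `video.get(cv2.CAP_PROP_FRAME_COUNT)`:

```python
def sampled_frame_indices(total_frames, fps, seconds_per_frame=2):
    """Return the frame indices visited when sampling every `seconds_per_frame` seconds."""
    step = max(1, int(fps * seconds_per_frame))  # frames between samples
    return list(range(0, total_frames, step))


print(len(sampled_frame_indices(1800, 30)))
```

For example, a 60-second clip at 30 fps has 1,800 frames; with the default `seconds_per_frame=2` the step is 60, giving 30 frames.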

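5.2 Video Summary (Visuals + Audio)

# Sample frames and extract the audio ("your_video.mp4" is a placeholder path)
base64Frames, audio_path = process_video("your_video.mp4")

# Transcribe the audio track with Whisper; the with-block closes the file afterwards
with open(audio_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Generate a video summary using both frames and transcript."},
        {"role": "user", "content": [
            # Every content part must be a typed dict, including plain text
            {"type": "text", "text": "These are the frames from the video."},
            # "detail": "low" keeps the token cost of each frame down
            *[{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{x}", "detail": "low"}} for x in base64Frames],
            {"type": "text", "text": f"The audio transcription is: {transcription.text}"}
        ]}
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)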
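Long videos can yield far more frames than are useful in one request, and each frame adds tokens even at `"detail": "low"`. One simple option, sketched below on the assumption that evenly spaced frames are representative enough, is to cap the list before building the message. `evenly_spaced` is a hypothetical helper, and any cap you pick is your own choice rather than an API value:

```python
def evenly_spaced(items, max_items):
    """Pick at most `max_items` elements spread evenly across `items`, keeping their order."""
    if max_items <= 0:
        return []
    if len(items) <= max_items:
        return list(items)
    step = len(items) / max_items  # fractional stride through the list
    return [items[int(i * step)] for i in range(max_items)]


print(evenly_spaced(list(range(10)), 5))
```

For instance, `base64Frames = evenly_spaced(base64Frames, 50)` just before the `client.chat.completions.create` call would send at most 50 frames.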

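6. Conclusion

The GPT-4o API lets a single request combine several kinds of input. In this post, text and images were sent to the model directly, while audio was first converted to text with Whisper; combining the two made it possible to summarize a video from both what is shown and what is said. These examples can serve as a starting point for automating your own tasks with GPT-4o.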

License: Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)

