[Python] BeautifulSoup를 활용한 xml 파싱 방법

728x90

XML은 HTML과 구조가 비슷하여, BeautifulSoup을 활용하면 간단하게 데이터를 추출할 수 있습니다.

BeautifulSoup 설치 및 준비

BeautifulSoup이 설치되지 않았다면 먼저 설치해 주세요.

(Jupyter Notebook에서 BeautifulSoup 설치하는 방법)

pip install beautifulsoup4 lxml

lxml을 추가로 설치하면 XML을 더 빠르고 안정적으로 파싱할 수 있습니다.

기본적인 XML 파싱

XML 데이터를 BeautifulSoup을 사용하여 파싱하는 기본적인 방법입니다.

✅ 예제 코드:

from bs4 import BeautifulSoup

# XML 데이터
xml_data = """
<data>
    <item>
        <name>사과</name>
        <price>1000</price>
    </item>
    <item>
        <name>바나나</name>
        <price>2000</price>
    </item>
</data>
"""

# BeautifulSoup으로 XML 파싱
soup = BeautifulSoup(xml_data, "xml")

# 특정 태그 찾기
first_item = soup.find("item")  # 첫 번째 <item> 요소

# 태그 안의 값 가져오기
name = first_item.find("name").text
price = first_item.find("price").text
print(f"이름: {name}, 가격: {price}")

'''
실행 결과
이름: 사과, 가격: 1000
'''

모든 <item> 태그 가져오기

XML에는 여러 개의 <item> 태그가 있을 수 있으므로, find_all()을 사용하여 모든 데이터를 가져올 수 있습니다.

✅ 예제 코드:

from bs4 import BeautifulSoup

# XML 데이터
xml_data = """
<data>
    <item>
        <name>사과</name>
        <price>1000</price>
    </item>
    <item>
        <name>바나나</name>
        <price>2000</price>
    </item>
</data>
"""

# BeautifulSoup으로 XML 파싱
soup = BeautifulSoup(xml_data, "xml")

# 모든 <item> 태그 찾기
items = soup.find_all("item")

for item in items:
    name = item.find("name").text
    price = item.find("price").text
    print(f"이름: {name}, 가격: {price}")


'''
실행 결과

이름: 사과, 가격: 1000
이름: 바나나, 가격: 2000
'''

특정 속성을 가진 태그 찾기

XML 데이터에 태그 속성이 있는 경우, 속성을 기준으로 특정 태그를 찾을 수도 있습니다.

✅ 예제 코드:

from bs4 import BeautifulSoup

# XML 데이터
xml_data = """
<data>
    <item id="1">
        <name>사과</name>
        <price>1000</price>
    </item>
    <item id="2">
        <name>바나나</name>
        <price>2000</price>
    </item>
</data>
"""

# BeautifulSoup으로 XML 파싱
soup = BeautifulSoup(xml_data, "xml")

# id="2"인 <item> 태그 찾기
banana_item = soup.find("item", {"id": "2"})

name = banana_item.find("name").text
price = banana_item.find("price").text
print(f"이름: {name}, 가격: {price}")

'''
실행 결과

이름: 바나나, 가격: 2000
'''

xml 데이터에서 특정 값 찾기 (find()와 select())

💡 find(), find_all()

find() → 첫 번째로 일치하는 태그 반환
find_all() → 모든 일치하는 태그 리스트 반환

first_item = soup.find("item")  # 첫 번째 item 태그만 가져옴
all_items = soup.find_all("item")  # 모든 item 태그 가져옴

💡 select(), select_one()

select_one() → 첫 번째로 일치하는 요소 반환
select() → 모든 일치하는 요소 리스트 반환

name = soup.select_one("item name") #item 태그 안에 첫 번째 name을 가져옴.
names = soup.select("item name") #item 태그 안에 모든 name을 가져옴.

웹 페이지 xml 파싱 예제

티스토리 RSS 웹 페이지 xml 데이터를 가지고 xml 파싱하는 예제 입니다. (https://spirit0833.tistory.com/rss)

💡 티스토리 RSS 파싱 (find(), find_all())

✅ 예제 코드:

import requests
from bs4 import BeautifulSoup

# 크롤링할 웹페이지 (티스토리 rss 예제)
URL = "https://spirit0833.tistory.com/rss"

# HTTP 요청
response = requests.get(URL)
response.raise_for_status()  # 응답 코드 확인 (문제가 있을 경우 예외 발생)

# XML 파싱
soup = BeautifulSoup(response.text, "xml")
xmlData = soup.find_all("item")
print(f"데이터 수: {len(xmlData)}")

# 모든 item 태그 찾기
for item in xmlData:
    title = item.find("title").text
    link = item.find("link").text
    print(f"제목: {title}, URL주소: {link}")

🎯 실행 결과:

💡 티스토리 RSS 파싱 (select_one(), select())

✅ 예제 코드:

import requests
from bs4 import BeautifulSoup

# 크롤링할 웹페이지 (티스토리 rss 예제)
URL = "https://spirit0833.tistory.com/rss"

# HTTP 요청
response = requests.get(URL)
response.raise_for_status()  # 응답 코드 확인 (문제가 있을 경우 예외 발생)

# XML 파싱
soup = BeautifulSoup(response.text, "xml")
xmlData = soup.select("item title")
print(f"데이터 수: {len(xmlData)}")

# 모든 item 태그 찾기
for item in xmlData:
    title = item.text
    print(f"제목: {title}")

🎯 실행 결과:

저작자표시 비영리 변경금지 (새창열림)

'프로그래밍 > [Python] 파이썬' 카테고리의 다른 글

[Python] 파이썬 CSV 파일 한글 깨지는 문제 3가지 해결 방법 (75)	2025.03.09
[Python] 파이썬 주피터 노트북 라이브러리 설치 및 제거하는 명령어 (74)	2025.03.02
[Python] 파이썬 with문 사용법 핵심 요약 정리 (192)	2025.01.21
[python] 파이썬 아스키코드(ASCII) 변환 함수 (42)	2025.01.09
[Python] 파이썬 날짜, 시간 더하기 빼기 (80)	2025.01.08

취미 탈출 프로젝트

[Python] BeautifulSoup를 활용한 xml 파싱 방법

BeautifulSoup 설치 및 준비

기본적인 XML 파싱

모든 <item> 태그 가져오기

특정 속성을 가진 태그 찾기

xml 데이터에서 특정 값 찾기 (find()와 select())

💡 find(), find_all()

💡 select(), select_one()

웹 페이지 xml 파싱 예제

💡 티스토리 RSS 파싱 (find(), find_all())

💡 티스토리 RSS 파싱 (select_one(), select())

'프로그래밍 > [Python] 파이썬' 카테고리의 다른 글

티스토리툴바

[Python] BeautifulSoup를 활용한 xml 파싱 방법

BeautifulSoup 설치 및 준비

기본적인 XML 파싱

모든 <item> 태그 가져오기

특정 속성을 가진 태그 찾기

xml 데이터에서 특정 값 찾기 (find()와 select())

💡 find(), find_all()

💡 select(), select_one()

웹 페이지 xml 파싱 예제

💡 티스토리 RSS 파싱 (find(), find_all())

💡 티스토리 RSS 파싱 (select_one(), select())

'프로그래밍 > [Python] 파이썬' 카테고리의 다른 글

관련글

티스토리툴바