[Python] Tistory API - Fetching the Full Post List

Taedi

2021. 2. 15. 02:00

Today, as a first step in using the Tistory API, I wrote code that fetches the full list of posts on a blog.

The attempt

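First, I read through the 'Open API Guide' that Tistory provides (for several days).

I was thrown off because the guide does not explain the output parameter, but it appeared to support two formats, json and xml (xml is the default).

Calling .json() on the requests response parses the body into a dict, so I set output to json.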

import math

import requests
import pandas as pd
from tabulate import tabulate


appid = "<your app ID>"
access_token = "<your access token>"
callback_url = "<your registered callback URL>"
blogName = "<the () part of ().tistory.com, or the full blog address>"

def list_of_Posts():
    url = "https://www.tistory.com/apis/post/list"

    params = {
        'access_token': access_token,
        'output': 'json',  # json and xml are supported
        'blogName': blogName,
        'page': '1'  # page number
    }

    res = requests.get(url, params=params)

    if res.status_code == 200:
        res_json = res.json()
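
To make the dict conversion concrete, here is a minimal offline sketch. The payload is a made-up stub that only mirrors the keys this post reads (tistory, item, count, totalCount, posts); it is not a captured API response, and `parse_post_list` is a hypothetical helper name. `json.loads` stands in for `res.json()`, since both turn a JSON body into Python objects.

```python
import json

# Illustrative stub only: it mirrors the keys used in this post,
# not a real response captured from the API.
SAMPLE_BODY = """
{"tistory": {"item": {"count": "10", "totalCount": "25",
                      "posts": [{"id": "1", "title": "Hello"}]}}}
"""


def parse_post_list(body: str) -> dict:
    """Parse a JSON body and return the nested 'item' dict."""
    parsed = json.loads(body)  # str -> dict, the same kind of parsing res.json() does
    return parsed["tistory"]["item"]


item = parse_post_list(SAMPLE_BODY)
print(type(item).__name__)        # dict
print(item["totalCount"])         # a string in this stub, hence int() later
print(item["posts"][0]["title"])  # nested lists and dicts are plain Python objects
```

Once the body is a dict, everything else in this post is ordinary key lookups and list handling.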

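Another nuisance was the page parameter, which means the full post list cannot be fetched in a single request (leaving it empty defaults to 1). I used totalCount and count from the response to compute the total number of pages as "total posts / posts per page", rounded up.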

        # posts per page
        count = int(res_json['tistory']['item']['count'])
        # total number of posts
        totalCount = int(res_json['tistory']['item']['totalCount'])

        # total number of pages
        total_page = math.ceil(totalCount/count)
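
As a quick check of the rounding, the page calculation can be pulled out into a small function. `total_pages` is a hypothetical name, and the guard for a zero page size is an addition of this sketch rather than something the original code does.

```python
import math


def total_pages(total_count: int, per_page: int) -> int:
    """Return how many pages are needed to list total_count posts."""
    if per_page <= 0:
        # Guard added for this sketch: avoids ZeroDivisionError
        return 0
    return math.ceil(total_count / per_page)


print(total_pages(25, 10))  # 3: two full pages plus one page with 5 posts
print(total_pages(20, 10))  # 2: exact multiple, no extra page
print(total_pages(0, 10))   # 0: nothing to fetch
```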

From the resulting dict, I stored the values under ['tistory']['item']['posts'], the key that holds the post information, as a list, then fetched the posts from the remaining pages and appended them. The merge is done simply with '+'.

        data = res_json['tistory']['item']['posts']

        for x in range(2, total_page + 1):

            params = {
                'access_token': access_token,
                'output': 'json',  # json and xml are supported
                'blogName': blogName,  # the () part of ().tistory.com, or the full blog address
                'page': x  # page number
            }

            res = requests.get(url, params=params)
            res.raise_for_status()  # stop on a failed page instead of parsing an error body
            res_json = res.json()

            data = data + res_json['tistory']['item']['posts']
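
The paging logic can also be tried without touching the network by passing the page fetcher in as a function. This is only a sketch: `collect_posts`, `fetch_page`, and `fake_fetch` are hypothetical names, and the fake responses are made-up data shaped like the keys used above.

```python
import math
from typing import Callable


def collect_posts(fetch_page: Callable[[int], dict]) -> list:
    """Fetch page 1, work out the page count, then append the remaining pages."""
    first = fetch_page(1)["tistory"]["item"]
    per_page = int(first["count"])
    total = int(first["totalCount"])
    pages = math.ceil(total / per_page) if per_page else 0

    posts = list(first.get("posts", []))
    for page in range(2, pages + 1):
        item = fetch_page(page)["tistory"]["item"]
        posts = posts + item.get("posts", [])  # same '+' merge as in the post
    return posts


def fake_fetch(page: int) -> dict:
    """Stand-in for the HTTP call: serves 5 made-up posts, 2 per page."""
    all_posts = [{"id": str(n)} for n in range(1, 6)]
    start = (page - 1) * 2
    return {"tistory": {"item": {"count": "2", "totalCount": "5",
                                 "posts": all_posts[start:start + 2]}}}


result = collect_posts(fake_fetch)
print([p["id"] for p in result])  # ['1', '2', '3', '4', '5']
```

In real use, `fetch_page` would wrap the `requests.get(url, params=params)` call shown above and return `res.json()`.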

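When moving the list into a DataFrame, the columns argument can be used to drop unwanted columns or change their order. I left out the comment count and trackbacks since I don't need them right now; switching to the commented-out line brings in every field.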

        # columns = ['id', 'title', 'postUrl', 'visibility', 'categoryId', 'comments', 'trackbacks', 'date']
        columns = ['id', 'title', 'postUrl', 'visibility', 'categoryId', 'date']
        df = pd.DataFrame(data, columns=columns)

        print(tabulate(df, headers='keys', tablefmt='grid'))

        df.to_csv('./result.csv', sep=',', na_rep='NaN', encoding='utf-8-sig')
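
For comparison, the same column selection can be sketched with only the standard library. This is an alternative to the pandas call above, not what the post uses: `posts_to_csv` is a hypothetical helper, the sample rows are made up, and the text is written to an in-memory buffer instead of result.csv.

```python
import csv
import io


def posts_to_csv(posts: list, columns: list) -> str:
    """Write only the chosen columns, in the given order, as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop keys not listed in columns
        restval="NaN",          # fill keys that a row is missing
    )
    writer.writeheader()
    writer.writerows(posts)
    return buffer.getvalue()


# Made-up rows for illustration
sample_posts = [
    {"id": "1", "title": "Hello", "comments": "0"},
    {"id": "2"},
]
csv_text = posts_to_csv(sample_posts, ["id", "title"])
print(csv_text)
```

The `comments` key is dropped because it is not in the column list, and the second row gets `NaN` for its missing title, which matches the intent of `columns=` and `na_rep='NaN'` in the pandas version.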

Full code

import math

import requests
import pandas as pd
from tabulate import tabulate


appid = "<your app ID>"
access_token = "<your access token>"
callback_url = "<your registered callback URL>"
blogName = "<the () part of ().tistory.com, or the full blog address>"

def list_of_Posts():
    url = "https://www.tistory.com/apis/post/list"

    params = {
        'access_token': access_token,
        'output': 'json',  # json and xml are supported
        'blogName': blogName,
        'page': '1'  # page number
    }

    res = requests.get(url, params=params)

    if res.status_code == 200:
        res_json = res.json()

        # posts per page
        count = int(res_json['tistory']['item']['count'])
        # total number of posts
        totalCount = int(res_json['tistory']['item']['totalCount'])

        # total number of pages, rounded up
        total_page = math.ceil(totalCount/count)

        # posts from the first page
        data = res_json['tistory']['item']['posts']

        for x in range(2, total_page + 1):

            params = {
                'access_token': access_token,
                'output': 'json',  # json and xml are supported
                'blogName': blogName,  # the () part of ().tistory.com, or the full blog address
                'page': x  # page number
            }

            res = requests.get(url, params=params)
            res.raise_for_status()  # stop on a failed page instead of parsing an error body
            res_json = res.json()

            data = data + res_json['tistory']['item']['posts']


        # columns = ['id', 'title', 'postUrl', 'visibility', 'categoryId', 'comments', 'trackbacks', 'date']
        columns = ['id', 'title', 'postUrl', 'visibility', 'categoryId', 'date']
        df = pd.DataFrame(data, columns=columns)

        print(tabulate(df, headers='keys', tablefmt='grid'))

        df.to_csv('./result.csv', sep=',', na_rep='NaN', encoding='utf-8-sig')


if __name__ == '__main__':
    list_of_Posts()

Result

tabulate table

CSV file

Reference: https://tistory.github.io/document-tistory-apis/

License: Attribution, NonCommercial, NoDerivatives
