N2T (Notion2Tistory) Usage Notes

GaonHeum 2023. 5. 22.
💡
A program that uploads posts from Notion to Tistory.

Developer's blog

Not the developer's blog, but…


Changes

  • requirements.txt & client.py

    requirements.txt

    beautifulsoup4
    requests
    selenium
    webdriver_manager
    tqdm
    lxml
    git+https://github.com/gaonheum/notion-py.git <- changed to the address of my GitHub fork
    • # If webdriver-manager fails with the cmd_mapping = { KeyError: 'google-chrome'...} error, check its version with pip list; if it is 3.8.0, uninstall it and reinstall 3.7.1

      → I confirmed that version 3.8 also works normally.

      → There is no need to download the zip file, open a cmd window, and run python setup.py install.
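The pip list check above can also be done from Python. A minimal sketch using only the standard library; installed_version is a hypothetical helper name, not part of N2T or webdriver-manager:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version, or None when webdriver-manager is not installed
    print(installed_version("webdriver-manager"))
```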


    notion -> client.py

    • Why I forked notion-py to my GitHub and modified it
      1. setup.py compatibility problem
        • Before
          with open("README.md", "r") as fh:
              long_description = fh.read()
        • After
          with open("README.md", "r", encoding='utf-8') as fh:
              long_description = fh.read()
      2. 'method_whitelist' is deprecated (renamed to 'allowed_methods' in urllib3's Retry)
        • Before
          def create_session(client_specified_retry=None):
              """
              retry on 502
              """
              session = Session()
              if client_specified_retry:
                  retry = client_specified_retry
              else:
                  retry = Retry(
                      5,
                      backoff_factor=0.3,
                      status_forcelist=(502, 503, 504),
                      # CAUTION: adding 'POST' to this list which is not technically idempotent
                      method_whitelist=(
                          "POST",
                          "HEAD",
                          "TRACE",
                          "GET",
                          "PUT",
                          "OPTIONS",
                          "DELETE",
                      ),
                  )
              adapter = HTTPAdapter(max_retries=retry)
              session.mount("https://", adapter)
              return session
        • After
          def create_session(client_specified_retry=None):
              """
              retry on 502
              """
              session = Session()
              if client_specified_retry:
                  retry = client_specified_retry
              else:
                  retry = Retry(
                      5,
                      backoff_factor=0.3,
                      status_forcelist=(502, 503, 504),
                      # CAUTION: adding 'POST' to this list which is not technically idempotent
                      allowed_methods=(
                          "POST",
                          "HEAD",
                          "TRACE",
                          "GET",
                          "PUT",
                          "OPTIONS",
                          "DELETE",
                      ),
                  )
              adapter = HTTPAdapter(max_retries=retry)
              session.mount("https://", adapter)
              return session
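If the fork has to run against both older and newer urllib3 releases, the keyword could be chosen at runtime instead of being hard-coded. This is only a sketch, not what notion-py does: methods_kwarg and build_retry are hypothetical helpers, and it assumes Retry can be imported from urllib3.util.retry.

```python
import inspect

# Same method list as in create_session above
RETRY_METHODS = ("POST", "HEAD", "TRACE", "GET", "PUT", "OPTIONS", "DELETE")


def methods_kwarg(retry_cls):
    """Return the keyword this Retry class accepts for the list of retried methods."""
    params = inspect.signature(retry_cls.__init__).parameters
    return "allowed_methods" if "allowed_methods" in params else "method_whitelist"


def build_retry():
    """Build the same Retry as create_session, using whichever keyword is supported."""
    from urllib3.util.retry import Retry  # imported lazily; requires urllib3

    return Retry(
        5,
        backoff_factor=0.3,
        status_forcelist=(502, 503, 504),
        **{methods_kwarg(Retry): RETRY_METHODS},
    )
```

Inside create_session, the else branch would then become retry = build_retry().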
  • NotionClient.py
    • 17 Line: token_v2 โ†’ notion_token ์œผ๋กœ ๋ณ€๊ฒฝ
      • ๋ณ€๊ฒฝ ์ „
        raise ValueError('[Error] notion token๊ฐ’์ด ์˜ฌ๋ฐ”๋ฅด์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋‹ค์‹œ ํ™•์ธ ํ•ด ์ฃผ์„ธ์š”. [{}]'.format(token_v2))
      • ๋ณ€๊ฒฝ ํ›„
        raise ValueError('[Error] notion token๊ฐ’์ด ์˜ฌ๋ฐ”๋ฅด์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋‹ค์‹œ ํ™•์ธ ํ•ด ์ฃผ์„ธ์š”. [{}]'.format(notion_token))
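Why a wrong name inside an error message matters: if token_v2 is not defined at that point (an assumption; only the rename itself is shown above), Python raises NameError while building the message, so the intended ValueError never appears. A small sketch with hypothetical function names and an English message standing in for the original:

```python
def check_token_before(notion_token):
    if not notion_token:
        # token_v2 does not exist in this scope, so this line raises NameError
        raise ValueError("invalid notion token [{}]".format(token_v2))
    return notion_token


def check_token_after(notion_token):
    if not notion_token:
        raise ValueError("invalid notion token [{}]".format(notion_token))
    return notion_token


def raised_by(func, arg):
    """Return the name of the exception type func(arg) raises, or None."""
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return None
```

Calling raised_by(check_token_before, "") reports a NameError, while the corrected version reports the ValueError with the offending token in the message.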

  • parse.py

    Added an option to post with toggles collapsed

    • Added at line 79
      # Post with toggles closed: drop the 'open' attribute from every <details> tag
      for detail in soup.find_all('details'):
          detail.attrs.pop('open', None)

    Tag conversion

    💡
    Notion tags: Heading 1 = h1, Heading 2 = h2, Heading 3 = h3

    Tistory tags: Heading 1 = h2, Heading 2 = h3, Heading 3 = h4

    Because of this, the heading tags need to be converted.

    • Added at line 33: define the function
      def changeTag(soup, tagName, changeTagName):
          # Rename every <tagName> element in the soup to <changeTagName>
          for tag in soup.find_all(tagName):
              tag.name = changeTagName
    • Added at line 87 (after the "# 제목 제거" (remove title) step): call the function
      # Change tags: h3 -> h4, h2 -> h3, h1 -> h2
      changeTag(article, 'h3', 'h4')
      changeTag(article, 'h2', 'h3')
      changeTag(article, 'h1', 'h2')
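The order of the three calls matters: renaming the deepest heading first keeps each level distinct, while renaming h1 first would let the same elements be renamed again by the later calls. A toy illustration on a plain list of tag names (rename_all and shift are hypothetical stand-ins for changeTag, not BeautifulSoup code):

```python
def rename_all(tags, old, new):
    """Return a copy of tags with every occurrence of old replaced by new."""
    return [new if tag == old else tag for tag in tags]


def shift(tags, order):
    """Apply the renames h1->h2, h2->h3, h3->h4 in the given order."""
    targets = {"h1": "h2", "h2": "h3", "h3": "h4"}
    for old in order:
        tags = rename_all(tags, old, targets[old])
    return tags


notion = ["h1", "h2", "h3"]
print(shift(notion, ["h3", "h2", "h1"]))  # ['h2', 'h3', 'h4']
print(shift(notion, ["h1", "h2", "h3"]))  # ['h4', 'h4', 'h4']
```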
  • SeleniumClient.py
    • Line 66: added → extends the delay from 3 seconds to 10 seconds to leave time for two-factor authentication
      # Wait about 7 more seconds for two-factor authentication
      sleep(7)
    • Line 41: swapped the code in the try and except blocks

      → The original code verifies the Tistory login first. Now that login has been merged into Kakao anyway, I saw no need to run that try block first.
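A fixed sleep like the one added at line 66 is either longer than needed or too short. One alternative (not what N2T does) is to poll for a condition until a deadline. A standard-library sketch; wait_until and its parameters are hypothetical, and the predicate would be whatever check tells you the login has finished:

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate repeatedly until it returns True or timeout seconds pass.

    Returns True as soon as predicate() succeeds, False if the deadline is hit.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With Selenium, the predicate might inspect the driver state, for example the current page URL after login; which condition marks a finished two-factor login is an assumption you would need to verify against the actual pages.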


Notes

  • KakaoTalk two-factor authentication has to be turned off for the program to log in

    → It sometimes works with two-factor authentication left on, but in most cases the time the authentication takes causes an error

    → To keep two-factor authentication on, see the SeleniumClient.py change


Recommended Tistory skins to use with N2T


Fun features


[ Uploaded by N2T ]
