Reducing the number of lists in a web scraper

At the moment, I'm learning and experimenting with scraping content from a variety of web pages. However, I've come across the same code smell in several of my applications: I have many repetitive lists that data gets appended to.



    from requests import get
    import requests
    import json
    from time import sleep
    import pandas as pd

    url = 'https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true'
    list_name = []
    list_price = []
    list_discount = []
    list_stock = []

    response = get(url)
    json_data = response.json()


    def getShockingSales():
        index = 0
        if response.status_code is 200:
            print('Response: ' + 'OK')
        else:
            print('Unable to access')
        total_flashsale = len(json_data['data']['items'])
        total_flashsale -= 1
        for i in range(index, total_flashsale):
            print('Getting data from site... please wait a few seconds')
            while i <= total_flashsale:
                flash_name = json_data['data']['items'][i]['name']
                flash_price = json_data['data']['items'][i]['price']
                flash_discount = json_data['data']['items'][i]['discount']
                flash_stock = json_data['data']['items'][i]['stock']
                list_name.append(flash_name)
                list_price.append(flash_price)
                list_discount.append(flash_discount)
                list_stock.append(flash_stock)
                sleep(0.5)
                i += 1
                if i > total_flashsale:
                    print('Task is completed...')
                    return

    getShockingSales()
    new_panda = pd.DataFrame({'Name': list_name, 'Price': list_price,
                              'Discount': list_discount, 'Stock Available': list_stock})

    print('Converting to Panda Frame....')
    sleep(5)
    print(new_panda)


Would one list be sufficient? Am I approaching this the wrong way?

Tags: python, python-3.x, json

asked 9 hours ago by Minial, edited 6 hours ago


2 Answers

Review

1. Creating functions that read and modify global variables is not a good idea: if someone wants to reuse your function, they won't know about its side effects.
2. index is not useful, and range(0, n) is the same as range(n).
3. Use == rather than is to compare values, hence response.status_code == 200. The is operator checks object identity; it only happens to work for 200 because CPython caches small integers.
4. If response.status_code != 200, I think the function should raise an exception, as @Ludisposed said, rather than return an empty result.
5. You use json_data["data"]["items"] a lot; you could define items = json_data["data"]["items"] instead, but see below.
6. Your use of i is very messy. Never use both a for and a while on the same variable. I think you just want the information for each item, so simply use for item in json_data["data"]["items"]:.
7. print("Getting data from site... please wait a few seconds") is misleading, because the data was already fetched at response = get(url). Also, sleep(0.5) and sleep(5) serve no purpose; they only slow the script down.
8. Speaking of which, requests.get is more explicit than a bare get.
9. You can create a pandas DataFrame directly from a list of dictionaries.
10. If you don't use the response anywhere else, you can pass the URL to the function as an argument.
11. Putting spaces in DataFrame column names is not a good idea, because it rules out attribute access: a column named stock can be read as df.stock, but "Stock Available" cannot. If you still want display-friendly names, you can use pandas.DataFrame.rename.
12. You don't need to import json.
13. The discounts are given as strings like "59%". Integers are preferable if you want to compute with them; I used df.discount = df.discount.apply(lambda s: int(s[:-1])) to convert them.
14. Optional: you might want to use logging instead of printing everything, or at least print to stderr:

        from sys import stderr

        print('Information', file=stderr)
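
Points 5, 6, 13 and 14 can be combined into a short sketch that needs no third-party packages. It assumes the response has already been decoded into a dict with the data -> items shape used in the question; the helper name extract_items and the sample payload are hypothetical, and only the four field names come from the original code. It strips the "%" with str.rstrip rather than slicing, so a value without a trailing "%" keeps all of its digits.

```python
import logging

# Log progress messages instead of printing them (point 14).
logger = logging.getLogger(__name__)

# Field names taken from the question's code.
FIELDS = ("name", "price", "discount", "stock")


def extract_items(json_data):
    """Return one dict per flash-sale item (hypothetical helper).

    Assumes json_data["data"]["items"] is a list of dicts and that
    "discount" is a string such as "59%".
    """
    records = []
    # Iterate over the items directly instead of juggling an index (point 6).
    for item in json_data["data"]["items"]:
        record = {field: item[field] for field in FIELDS}
        # "59%" -> 59, so the value can be used in arithmetic (point 13).
        record["discount"] = int(record["discount"].rstrip("%"))
        records.append(record)
    logger.info("Extracted %d items", len(records))
    return records


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Made-up payload that only mimics the shape of the real response.
    sample = {"data": {"items": [
        {"name": "Item A", "price": 1000, "discount": "59%", "stock": 3, "extra": 1},
        {"name": "Item B", "price": 2500, "discount": "10%", "stock": 0},
    ]}}
    for row in extract_items(sample):
        print(row)
```

The resulting list of dicts is the kind of input point 9 refers to, so it can be passed straight to pd.DataFrame.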




Code

    import requests
    import pandas as pd


    def getShockingSales(url):
        response = requests.get(url)
        columns = ["name", "price", "discount", "stock"]
        # Raises requests.HTTPError for any 4xx/5xx status instead of carrying on
        response.raise_for_status()
        print("Response: OK")
        json_data = response.json()
        # Build the frame directly from the list of item dicts, keeping only the wanted columns
        df = pd.DataFrame(json_data["data"]["items"])[columns]
        # Convert discount strings such as "59%" to integers
        df.discount = df.discount.apply(lambda s: int(s[:-1]))
        print("Task is completed...")
        return df


    URL = "https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true"
    df = getShockingSales(URL)

answered 6 hours ago by Labo


Review

1. Remove unnecessary imports: json and the plain import requests are never used.
2. Don't work in the global namespace; it makes bugs harder to track down.
3. Constants (url) should be UPPER_SNAKE_CASE.
4. Functions (getShockingSales()) should be lower_snake_case.
5. You don't break or return when an invalid status is encountered: the function prints 'Unable to access' and then carries on anyway.
6. if response.status_code is 200: should use == instead of is.

   There is a method for this, though: response.raise_for_status() raises an exception when the response has a 4xx or 5xx status.

7. Why use a while inside the for, and return as soon as the while finishes?

   This is really odd! Loop with either a for or a while, not both, because the while currently makes the for loop irrelevant.

   I suggest sticking with for loops; Python excels at readable for loops (see "Loop like a native").
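
The effect described in point 7 can be checked without any network access by replaying the question's loop structure on a plain list. Both function names below are hypothetical; original_style copies the question's index arithmetic (including total_flashsale = len(items) - 1) and also counts how many times the for body starts.

```python
def original_style(items):
    """Replay the question's for/while structure on a plain list.

    Returns (collected items, number of for-loop iterations started).
    """
    collected = []
    for_iterations = 0
    total_flashsale = len(items) - 1  # same setup as the question
    for i in range(0, total_flashsale):
        for_iterations += 1
        # The inner while walks every remaining index on the first pass...
        while i <= total_flashsale:
            collected.append(items[i])
            i += 1
            if i > total_flashsale:
                # ...and this return ends the whole function, so the for never resumes.
                return collected, for_iterations
    return collected, for_iterations


def native_style(items):
    """The 'loop like a native' version: one pass, no index bookkeeping."""
    collected = []
    for item in items:
        collected.append(item)
    return collected


if __name__ == "__main__":
    sixteen = [f"item {n}" for n in range(16)]
    print(original_style(sixteen)[1])
    print(original_style(["only item"]))
    print(native_style(["only item"]))
```

With 16 items the for body starts only once, because the while consumes every index and then returns. With a single-item response, range(0, 0) is empty and nothing is collected at all; the plain for loop has neither problem.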





"Would one list be sufficient? Am I approaching this the wrong way?"

Yes.

You don't have to use 4 separate lists; instead, create one list of rows and add the column names afterwards.
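
A minimal, standard-library sketch of that idea: each row is a tuple in a fixed column order, and the column names live in one separate tuple. The row values below are made up. zip(*rows) shows that the four per-column sequences from the question can still be recovered if ever needed (note that unpacking it into four names raises ValueError when rows is empty).

```python
# Column names kept once, separately from the data.
COLUMNS = ("Name", "Price", "Discount", "Stock")

# One list of rows instead of four parallel lists (made-up values).
rows = [
    ("Item A", 1000, "59%", 3),
    ("Item B", 2500, "10%", 0),
]

# Pair each value with its column name when a row is needed as a mapping.
as_dicts = [dict(zip(COLUMNS, row)) for row in rows]

# Transpose back into per-column tuples, the equivalent of list_name, list_price, ...
names, prices, discounts, stocks = zip(*rows)

print(as_dicts[0])
print(names)
```

pd.DataFrame(rows, columns=COLUMNS) accepts exactly this shape, which is how create_pd in the code below builds its frame.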



Code

    from requests import get
    import pandas as pd

    URL = 'https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true'


    def get_shocking_sales():
        response = get(URL)
        response.raise_for_status()  # stop on 4xx/5xx responses
        # One list of (name, price, discount, stock) rows instead of four lists
        return [
            (item['name'], item['price'], item['discount'], item['stock'])
            for item in response.json()['data']['items']
        ]


    def create_pd():
        # The column names are attached once, when the DataFrame is built
        return pd.DataFrame(
            get_shocking_sales(),
            columns=['Name', 'Price', 'Discount', 'Stock']
        )


    if __name__ == '__main__':
        print(create_pd())





            share|improve this answer





















              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "196"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });






              Minial is a new contributor. Be nice, and check out our Code of Conduct.










              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211164%2freducing-the-amount-of-list-in-a-webscraper%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              3














              Review




              1. Creating functions that read and modify global variables is not a good idea, for example if someone wants to reuse your function, they won't know about side effects.


              2. index is not useful, and range(0, n) is the same as range(n)


              3. Using == is more appropriate than is in general, hence response.status_code == 200


              4. If response.status_code != 200, I think the function should ~return an empty result~ raise an exception like said by @Ludisposed.


              5. You use json_data["data"]["items"] a lot, you could define items = json_data["data"]["items"] instead, but see below.


              6. Your usage of i is totally messy. Never use both for and while on the same variable. I think you just want to get the information for each item. So just use for item in json_data["data"]["items"]:.


              7. Actually, print("Getting data from site... please wait a few seconds") is wrong as you got the data at response = get(url). Also, sleep(0.5) and sleep(5) don't make any sense.


              8. Speaking from this, requests.get is more explicit.


              9. You can actually create a pandas DataFrame directly from a list of dictionaries.


              10. Actually, if you don't use the response in another place, you can use the url as an argument of the function.


              11. Putting spaces in column names of a DataFrame is not a good idea. It removes the possibility to access the column named stock (for example) with df.stock. If you still want that, you can use pandas.DataFrame.rename


              12. You don't need to import json.


              13. The discounts are given as strings like "59%". I think integers are preferable if you want to perform computations on them. I used df.discount = df.discount.apply(lambda s: int(s[:-1])) to perform this.



              14. Optional: you might want to use logging instead of printing everything. Or at least print to stderr with:



                from sys import stderr



                print('Information', file=stderr)




              Code



              import requests
              import pandas as pd


              def getShockingSales(url):
              response = requests.get(url)
              columns = ["name", "price", "discount", "stock"]
              response.raise_for_status()
              print("Response: OK")
              json_data = response.json()
              df = pd.DataFrame(json_data["data"]["items"])[columns]
              df.discount = df.discount.apply(lambda s: int(s[:-1]))
              print("Task is completed...")
              return df


              URL = "https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true"
              df = getShockingSales(URL)





              share|improve this answer








              New contributor




              Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.























                3














                Review




                1. Creating functions that read and modify global variables is not a good idea, for example if someone wants to reuse your function, they won't know about side effects.


                2. index is not useful, and range(0, n) is the same as range(n)


                3. Using == is more appropriate than is in general, hence response.status_code == 200


                4. If response.status_code != 200, I think the function should ~return an empty result~ raise an exception like said by @Ludisposed.


                5. You use json_data["data"]["items"] a lot, you could define items = json_data["data"]["items"] instead, but see below.


                6. Your usage of i is totally messy. Never use both for and while on the same variable. I think you just want to get the information for each item. So just use for item in json_data["data"]["items"]:.


                7. Actually, print("Getting data from site... please wait a few seconds") is wrong as you got the data at response = get(url). Also, sleep(0.5) and sleep(5) don't make any sense.


                8. Speaking from this, requests.get is more explicit.


                9. You can actually create a pandas DataFrame directly from a list of dictionaries.


                10. Actually, if you don't use the response in another place, you can use the url as an argument of the function.


                11. Putting spaces in column names of a DataFrame is not a good idea. It removes the possibility to access the column named stock (for example) with df.stock. If you still want that, you can use pandas.DataFrame.rename


                12. You don't need to import json.


                13. The discounts are given as strings like "59%". I think integers are preferable if you want to perform computations on them. I used df.discount = df.discount.apply(lambda s: int(s[:-1])) to perform this.



                14. Optional: you might want to use logging instead of printing everything. Or at least print to stderr with:



                  from sys import stderr



                  print('Information', file=stderr)




                Code



                import requests
                import pandas as pd


                def getShockingSales(url):
                response = requests.get(url)
                columns = ["name", "price", "discount", "stock"]
                response.raise_for_status()
                print("Response: OK")
                json_data = response.json()
                df = pd.DataFrame(json_data["data"]["items"])[columns]
                df.discount = df.discount.apply(lambda s: int(s[:-1]))
                print("Task is completed...")
                return df


                URL = "https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true"
                df = getShockingSales(URL)





                share|improve this answer








                New contributor




                Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.





















                  3












                  3








                  3






                  Review




                  1. Creating functions that read and modify global variables is not a good idea, for example if someone wants to reuse your function, they won't know about side effects.


                  2. index is not useful, and range(0, n) is the same as range(n)


                  3. Using == is more appropriate than is in general, hence response.status_code == 200


                  4. If response.status_code != 200, I think the function should ~return an empty result~ raise an exception like said by @Ludisposed.


                  5. You use json_data["data"]["items"] a lot, you could define items = json_data["data"]["items"] instead, but see below.


                  6. Your usage of i is totally messy. Never use both for and while on the same variable. I think you just want to get the information for each item. So just use for item in json_data["data"]["items"]:.


                  7. Actually, print("Getting data from site... please wait a few seconds") is wrong as you got the data at response = get(url). Also, sleep(0.5) and sleep(5) don't make any sense.


                  8. Speaking from this, requests.get is more explicit.


                  9. You can actually create a pandas DataFrame directly from a list of dictionaries.


                  10. Actually, if you don't use the response in another place, you can use the url as an argument of the function.


                  11. Putting spaces in column names of a DataFrame is not a good idea. It removes the possibility to access the column named stock (for example) with df.stock. If you still want that, you can use pandas.DataFrame.rename


                  12. You don't need to import json.


                  13. The discounts are given as strings like "59%". I think integers are preferable if you want to perform computations on them. I used df.discount = df.discount.apply(lambda s: int(s[:-1])) to perform this.



                  14. Optional: you might want to use logging instead of printing everything. Or at least print to stderr with:



                    from sys import stderr



                    print('Information', file=stderr)




                  Code



                  import requests
                  import pandas as pd


                  def getShockingSales(url):
                  response = requests.get(url)
                  columns = ["name", "price", "discount", "stock"]
                  response.raise_for_status()
                  print("Response: OK")
                  json_data = response.json()
                  df = pd.DataFrame(json_data["data"]["items"])[columns]
                  df.discount = df.discount.apply(lambda s: int(s[:-1]))
                  print("Task is completed...")
                  return df


                  URL = "https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true"
                  df = getShockingSales(URL)





                  share|improve this answer








                  New contributor




                  Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.









                  Review




                  1. Creating functions that read and modify global variables is not a good idea, for example if someone wants to reuse your function, they won't know about side effects.


                  2. index is not useful, and range(0, n) is the same as range(n)


                  3. Using == is more appropriate than is in general, hence response.status_code == 200


                  4. If response.status_code != 200, I think the function should ~return an empty result~ raise an exception like said by @Ludisposed.


                  5. You use json_data["data"]["items"] a lot, you could define items = json_data["data"]["items"] instead, but see below.


                  6. Your usage of i is totally messy. Never use both for and while on the same variable. I think you just want to get the information for each item. So just use for item in json_data["data"]["items"]:.


                  7. Actually, print("Getting data from site... please wait a few seconds") is wrong as you got the data at response = get(url). Also, sleep(0.5) and sleep(5) don't make any sense.


                  8. Speaking from this, requests.get is more explicit.


                  9. You can actually create a pandas DataFrame directly from a list of dictionaries.


                  10. Actually, if you don't use the response in another place, you can use the url as an argument of the function.


                  11. Putting spaces in column names of a DataFrame is not a good idea. It removes the possibility to access the column named stock (for example) with df.stock. If you still want that, you can use pandas.DataFrame.rename


                  12. You don't need to import json.


                  13. The discounts are given as strings like "59%". I think integers are preferable if you want to perform computations on them. I used df.discount = df.discount.apply(lambda s: int(s[:-1])) to perform this.



                  14. Optional: you might want to use logging instead of printing everything. Or at least print to stderr with:



                    from sys import stderr



                    print('Information', file=stderr)




                  Code



                  import requests
                  import pandas as pd


                  def getShockingSales(url):
                  response = requests.get(url)
                  columns = ["name", "price", "discount", "stock"]
                  response.raise_for_status()
                  print("Response: OK")
                  json_data = response.json()
                  df = pd.DataFrame(json_data["data"]["items"])[columns]
                  df.discount = df.discount.apply(lambda s: int(s[:-1]))
                  print("Task is completed...")
                  return df


                  URL = "https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true"
                  df = getShockingSales(URL)






                  share|improve this answer








                  New contributor




                  Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.









                  share|improve this answer



                  share|improve this answer






                  New contributor




                  Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.









                  answered 6 hours ago









                  LaboLabo

                  1514




                  1514




                  New contributor




                  Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.





                  New contributor





                  Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.






                  Labo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                  Check out our Code of Conduct.

























                      2














                      Review




                      1. Remove unnecessary imports: import requests and import json are never used


                      2. Don't work in the global namespace



                        Module-level state (the four lists, response and json_data) makes bugs harder to track



                      3. Constants (url) should be UPPER_SNAKE_CASE


                      4. Function names (getShockingSales) should be lower_snake_case, as PEP 8 recommends


                      5. You don't return (or raise) when an invalid status is encountered; the code prints 'Unable to access' and then carries on reading the items anyway



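                      6. if response.status_code is 200: should use == instead of is

                        is tests object identity, not equality; it only appears to work here because CPython caches small integers such as 200

                        There is a method for this, though:

                        response.raise_for_status() raises a requests.HTTPError for any 4xx or 5xx status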




                      7. Why use a while inside the for, and return as soon as the while finishes?



                        This is really odd!
                        Loop with either a for or a while, not both. The while already walks through every item, and the return right after it ends the function, so the for loop never gets past its first iteration.



                        I suggest sticking with for loops; Python excels at readable for loops



                        (Loop like a native)
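To make point 7 concrete, here is a stripped-down reproduction of the question's control flow. It assumes the if/return sits inside the for, right after the while, as the review reads it; the function names are hypothetical. It records which indices get visited: the while covers every item on the first pass of the for, and because range(0, len(items) - 1) is empty when there is only one item, that case collects nothing at all. The native loop has neither problem.

```python
def nested_loop_indices(items):
    """Mimic the question's for/while structure and return the visited indices."""
    visited = []
    last = len(items) - 1            # the question's total_flashsale
    for i in range(0, last):         # upper bound is exclusive
        while i <= last:
            visited.append(i)
            i += 1
        if i > last:
            return visited           # exits during the first pass of the for
    return visited                   # reached only when the for never runs


def native_loop_names(items):
    """Loop like a native: iterate over the items themselves, not indices."""
    return [item["name"] for item in items]


if __name__ == "__main__":
    three = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
    one = [{"name": "only"}]
    print(nested_loop_indices(three))  # [0, 1, 2]
    print(nested_loop_indices(one))    # [] (the single item is skipped)
    print(native_loop_names(one))      # ['only']
```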





                      Would one list be more than sufficient? Am I approaching this wrongly.




                      Yes.



                      You don't have to use 4 separate lists; you can instead build one list of (name, price, discount, stock) tuples and supply the column names when creating the DataFrame.
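One list of row tuples loses nothing compared with four parallel lists: if separate columns are ever needed, zip(*rows) transposes them back. A minimal sketch with made-up sample values:

```python
# Hypothetical sample rows in (name, price, discount, stock) order
rows = [
    ("Kettle", 2990, "40%", 12),
    ("Umbrella", 1590, "25%", 3),
]

# zip(*rows) groups the i-th element of every tuple, giving one tuple per column
names, prices, discounts, stocks = zip(*rows)

print(names)   # ('Kettle', 'Umbrella')
print(stocks)  # (12, 3)
```

With an empty rows list there is nothing to unpack and the four-name assignment raises ValueError, so guard that case if the API can return no items.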



                      Code



                      from requests import get
                      import pandas as pd

                      URL = 'https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true'


                      def get_shocking_sales():
                          """Fetch the flash-sale items as one (name, price, discount, stock) tuple per item."""
                          response = get(URL)
                          # Raises requests.HTTPError on a 4xx or 5xx status instead of carrying on
                          response.raise_for_status()
                          return [
                              (item['name'], item['price'], item['discount'], item['stock'])
                              for item in response.json()['data']['items']
                          ]


                      def create_pd():
                          # One list of row tuples plus the column names is all the DataFrame needs
                          return pd.DataFrame(
                              get_shocking_sales(),
                              columns=['Name', 'Price', 'Discount', 'Stock']
                          )


                      if __name__ == '__main__':
                          print(create_pd())
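Because the fetching function both downloads and parses, it can only be tried against the live API. One option, sketched here with a hypothetical rows_from_payload helper, is to move the list comprehension into its own function so it can be exercised offline. The hand-written payload only assumes the data/items shape the code above already relies on; the extra made-up shopid key shows that unused fields are simply ignored.

```python
def rows_from_payload(payload):
    """Turn a decoded flash-sale response into (name, price, discount, stock) tuples."""
    return [
        (item["name"], item["price"], item["discount"], item["stock"])
        for item in payload["data"]["items"]
    ]


if __name__ == "__main__":
    # Hypothetical payload; keys not listed above (like shopid) are ignored.
    sample = {
        "data": {
            "items": [
                {"name": "Kettle", "price": 2990, "discount": "40%", "stock": 12, "shopid": 1},
                {"name": "Umbrella", "price": 1590, "discount": "25%", "stock": 3, "shopid": 2},
            ]
        }
    }
    print(rows_from_payload(sample))
```

The fetching function would then only call get(URL), response.raise_for_status() and return rows_from_payload(response.json()).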





                          answered 6 hours ago









                          Ludisposed





















