Fruitbat 39 Posted November 25, 2020
Argh - I self-taught myself last year to code Python in order to scrape our fantasy results from RTSports. I was brand new to Python and essentially to HTML, but I managed to get code working to pull down weekly player scores and transactions. It saved me a lot of cutting, pasting, and formatting in Excel. (We have a dynasty league of 28 years, and I have stats and records going all the way back.) Fast forward to this year, and it isn't working, and heck if I can figure out why. Some of that is probably just due to me forgetting how the heck the code works. I'm just getting errors and empty dataframes. Anyone out there with Python scraping experience willing to take a peek?
ChiefD 19,795 Posted November 25, 2020
@Arizona Ron has Python scraping experience.
Osaurus 9,197 Posted November 25, 2020 (edited)
I write a lot of scripts in Python for data processing and can usually figure out most errors. I'm not a programmer/developer, fwiw. I just use them a lot in my field to automate data processing because I'm lazy. Does your code have error/exception handling? That means it tries to run a function, and if the function fails, it either tells you in a log (if it's set up that way) what the error is or it recovers from it. Based on what you posted, it sounds like it runs to completion with no error-handling stoppage, which leads me to believe it's possibly something with the data you're scraping. Maybe a formatting change, or maybe the cursor function isn't successfully grabbing text anymore? I can have a look at it if you'd like. Just PM me.
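The error-handling pattern described in the post above might be sketched as follows. This is an illustration only: `run_step`, `parse_step`, and the logger name are hypothetical and not taken from anyone's script in this thread. It logs any exception with its traceback and also flags the "ran to completion but returned nothing" case, assuming each step returns something with a length (a list or a DataFrame, for example).

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")  # hypothetical logger name


def run_step(name, func, *args):
    """Run one scraping step; log and return None if it raises or yields nothing."""
    try:
        result = func(*args)
    except Exception:
        # log.exception records the message plus the full traceback
        log.exception("step %r raised an exception", name)
        return None
    # Assumes the result supports len(), e.g. a list or a DataFrame
    if result is None or len(result) == 0:
        # No exception, but nothing came back: often a data/format problem
        log.warning("step %r ran to completion but returned no data", name)
        return None
    return result


def parse_step(text):
    """Hypothetical stand-in for a real parsing function."""
    return [line for line in text.splitlines() if line.strip()]


rows = run_step("parse", parse_step, "QB,23\nRB,4\n")
print(rows)
```

Wrapping each stage (login, page fetch, table lookup, DataFrame creation) this way would show which stage first comes back empty.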
AAABatteries 25,282 Posted November 25, 2020
Without even looking at it, my first guess is that your code is expecting the table or data to be formatted in a certain way, or maybe to contain a set number of columns, and RTSports made a change to their website. That's somewhat typical for web scraping, based on my limited knowledge.
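One way to catch that kind of silent layout change is to compare the columns a scrape actually returns against the ones the script expects. A minimal sketch, using the column names from the table posted later in this thread; `compare_columns` and the sample "found" list are hypothetical and for illustration only:

```python
EXPECTED_COLUMNS = ["PLAYER", "POS", "NFL", "BYE", "INJ",
                    "OPP", "DATE", "LINEUP", "PTS", "SCORING"]


def compare_columns(found, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names, ignoring case and whitespace."""
    norm_found = [str(c).strip().upper() for c in found]
    missing = [c for c in expected if c not in norm_found]
    unexpected = [c for c in norm_found if c not in expected]
    return missing, unexpected


# Made-up example of a page whose layout has changed
missing, unexpected = compare_columns(["Player", "Pos", "NFL", "Bye", "Status"])
print("missing:", missing)
print("unexpected:", unexpected)
```

With a DataFrame, `list(df.columns)` could be passed as `found`. Note that this check would pass in the case described later in the thread, where the headers are intact and only the rows are missing, so a row-count check is worth having as well.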
Long Ball Larry 14,304 Posted November 25, 2020
3 hours ago, AAABatteries said:
Without even looking at it, my first guess is that your code is expecting the table or data to be formatted in a certain way, or maybe to contain a set number of columns, and RTSports made a change to their website. That's somewhat typical for web scraping, based on my limited knowledge.
This was my experience using other sites with R.
Fruitbat 39 Posted November 25, 2020 Author (edited)
Yes, I assume there was a subtle change, but I'm not savvy enough to figure out where it is or how to fix it. Here is an excerpt of my code, which may be clunky, I don't know. It was trial and error, and it worked before. I've removed some lines and details (e.g., note the REQURL link: I have a loop that goes through all the teams and weeks, and inside that loop it creates the correct URL for that team/week).

import requests
import pandas as pd
from bs4 import BeautifulSoup

# This URL will be the URL that your login form points to with the "action" tag.
POSTURL = "https://www.rtsports.com"
payload = {
    'ACCOUNTID': '<ID removed>',
    'PASSWORD': '<pwd removed>',
}

# This URL is the page you actually want to pull down with requests.
# (t and w come from the surrounding loop over teams and weeks.)
REQURL = 'https://www.rtsports.com/football/team-capsules.php?LID=23611&UID=rnj07n3via1124l&TID='+team_rt_dict[t]+'&FWK='+str(w)

with requests.Session() as session:
    post = session.post(POSTURL, data=payload)
    r = session.get(REQURL)

soup = BeautifulSoup(r.text, 'lxml')  # Parse the HTML as a string
rt_table = soup.find('table', {'class': 'table table-no-borders table-tight table-hover '})
rt_df_raw = pd.read_html(str(rt_table))
rt_df = rt_df_raw[0]  # this is now the dataframe for one team and one week

It looks like it performs fine through the creation of "rt_table", which is supposed to be the HTML for just the one table on the page I am interested in.
Here is what rt_table looks like -- I snipped out the middle section, leaving just the first 2 player rows and the last 2 player rows: (EDIT TO ADD: I think the data stored in "rt_table" is OK, because I took the below and pasted into a web-based html-to-excel converter and everything came out good -- all players with all their stats in rows with correct headers) <table class="table table-no-borders table-tight table-hover "><thead><tr><th>PLAYER</th><th>POS</th><th>NFL</th><th>BYE</th><th>INJ</th><th>OPP</th><th>DATE</th><th>LINEUP</th><th>PTS</th><th>SCORING</th></tr></thead><tbody></tbody><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(11189);">Ben Roethlisberger <span class="glyphicon glyphicon-volume-up"></span></a></td><td>QB</td><td>PIT</td><td class="text-center">4</td><td class="text-center"><span class="injury InjP" data-original-title="Probable - Quad" data-placement="top" data-toggle="tooltip">P</span></td><td class="text-center"><span class="small">@</span>NYG</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=344288&GC=20200914019">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=903586&ART=20200914221524508495508">Recap</a></td><td class="text-center"><strong>Starter</strong></td><td class="text-right">23</td><td class="text-left" style="padding-left:15px;">Ben Roethlisberger 3 passing TDs (18 pts)<br/> . . . Ben Roethlisberger 10 yd TD pass to JuJu Smith-Schuster<br/> . . . Ben Roethlisberger 13 yd TD pass to James Washington<br/> . . . 
Ben Roethlisberger 8 yd TD pass to JuJu Smith-Schuster<br/>Ben Roethlisberger 229 passing yds (5 pts)<br/></td></tr><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(16108);">Austin Ekeler <span class="glyphicon glyphicon-volume-up"></span></a></td><td>RB</td><td>LAC</td><td class="text-center">6</td><td class="text-center"><span class="injury InjX" data-original-title="On IR - hamstring" data-placement="top" data-toggle="tooltip">X</span></td><td class="text-center"><span class="small">@</span>CIN</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=348565&GC=20200913004">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=562403&ART=20200913193545156385708">Recap</a></td><td class="text-center"><strong>Starter</strong></td><td class="text-right">4</td><td class="text-left" style="padding-left:15px;">Austin Ekeler 84 rushing yds (4 pts)<br/></td></tr> <<<<<SNIPPED>>>>>> <tr class="PlayerRow"><td><a href="javascript:ShowPlayer(16612);">Jace Sternberger</a></td><td>TE</td><td>GNB</td><td class="text-center">5</td><td class="text-center"></td><td class="text-center"><span class="small">@</span>MIN</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=821168&GC=20200913016">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=287930&ART=20200913164324664323708">Recap</a></td><td class="text-center">Bench</td><td class="text-right">0</td><td class="text-left" style="padding-left:15px;"></td></tr><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(15244);">C.J. 
Uzomah</a></td><td>TE</td><td>CIN</td><td class="text-center">9</td><td class="text-center"><span class="injury InjX" data-original-title="On IR - torn right Achilles" data-placement="top" data-toggle="tooltip">X</span></td><td class="text-center">LAC</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=996392&GC=20200913004">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=195296&ART=20200913193545156385708">Recap</a></td><td class="text-center">Bench</td><td class="text-right">4</td><td class="text-left" style="padding-left:15px;">C.J. Uzomah 45 receiving yds (4 pts)<br/></td></tr></table>

But rt_df_raw only pulls in the column headers, which look fine; it's just that the dataset itself is empty:

rt_df_raw
[Empty DataFrame
Columns: [PLAYER, POS, NFL, BYE, INJ, OPP, DATE, LINEUP, PTS, SCORING]
Index: []]
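One thing that stands out in the markup above is the empty `<tbody></tbody>`, with every `PlayerRow` sitting after it rather than inside it. Whether that is what makes `pd.read_html` return an empty DataFrame is an assumption, not something confirmed in this thread. A minimal sketch that sidesteps the question by reading the rows directly, using only the standard library's `html.parser`; the `TableRows` class and the trimmed `sample` string are hypothetical and only mirror the structure of the posted table:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")  # keep <br/>-separated text apart

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Trimmed stand-in for str(rt_table): header in <thead>, empty <tbody>,
# player rows placed after the <tbody> as in the posted markup.
sample = (
    '<table><thead><tr><th>PLAYER</th><th>POS</th><th>PTS</th></tr></thead>'
    '<tbody></tbody>'
    '<tr class="PlayerRow"><td><a href="#">Austin Ekeler</a></td>'
    '<td>RB</td><td class="text-right">4</td></tr>'
    '<tr class="PlayerRow"><td><a href="#">Jace Sternberger</a></td>'
    '<td>TE</td><td class="text-right">0</td></tr></table>'
)

parser = TableRows()
parser.feed(sample)
parser.close()
header, *players = parser.rows
print(header)
print(players)
```

In the original script, `str(rt_table)` could be fed to the parser in place of `sample`, and `pd.DataFrame(players, columns=header)` would then give one DataFrame per team and week. All cell values come back as strings, so PTS would still need converting to numbers.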
arrow1 949 Posted November 25, 2020
Please post your rooster
Bracie Smathers 3,799 Posted November 25, 2020
Anyone know Python for scraping website tables?
Aha! I believe I spotted the problem.
Not THIS --
REQURL = 'https://www.rtsports.com/football/team-capsules.php?LID=23611&UID=rnj07n3via1124l&TID='+team_rt_dict[t]+'&FWK='+str(w)
Try 👉 THIS 👈
Fruitbat 39 Posted November 25, 2020 Author
36 minutes ago, arrow1 said:
Please post your rooster
Andersen, Morten
Baxter, Brad
Blades, Brian
Brien, Doug
Brooks, Reggie
Buffalo Bills
Carlson, Cody
Coates, Ben
Cobb, Reggie
Fryar, Irving
Givins, Ernest
Howard, Desmond
Ingram, Mark (Dolphins)
Marino, Dan
Rison, Andre
Russell, Leonard
Fruitbat 39 Posted November 25, 2020 Author
4 minutes ago, Bracie Smathers said:
Anyone know Python for scraping website tables?
Aha! I believe I spotted the problem.
Not THIS --
REQURL = 'https://www.rtsports.com/football/team-capsules.php?LID=23611&UID=rnj07n3via1124l&TID='+team_rt_dict[t]+'&FWK='+str(w)
Try 👉 THIS 👈
Psychopav 1,186 Posted November 25, 2020
21 hours ago, Fruitbat said:
Argh - I self-taught myself last year to code Python in order to scrape our fantasy results from RTSports. I was brand new to Python and essentially to HTML, but I managed to get code working to pull down weekly player scores and transactions. It saved me a lot of cutting, pasting, and formatting in Excel. (We have a dynasty league of 28 years, and I have stats and records going all the way back.) Fast forward to this year, and it isn't working, and heck if I can figure out why. Some of that is probably just due to me forgetting how the heck the code works. I'm just getting errors and empty dataframes. Anyone out there with Python scraping experience willing to take a peek?
Gotta give credit where credit is due. That IS the best person to self-teach (even if you are a pirate).
Tick 1,466 Posted November 25, 2020
5 hours ago, arrow1 said:
Please post your rooster
Robin, White Chicken Chili, Thing 2, Al Cochino, Bill, Chacha, Beepers, and some other small one I can't remember the name of.
Tick 1,466 Posted November 25, 2020
Just now, Tick said:
Robin, White Chicken Chili, Thing 2, Al Cochino, Bill, Chacha, Beepers, and some other small one I can't remember the name of.
Einstein.
CR69 1,518 Posted November 26, 2020
I can't really help because it's been a long time for me, but I wanted to possibly warn others. I taught myself Python at one point and ended up getting IP banned by Craigslist for hammering their servers with a scraping tool I built lol.
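A common way to avoid hammering a site is to pause between requests. A minimal sketch of that idea; `fetch_all`, the 2-second delay, and the fake URLs are hypothetical, and the demo substitutes a fake fetch and a recording sleep so nothing touches the network:

```python
import time


def fetch_all(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause before every request after the first
        results.append(fetch(url))
    return results


# Demo: a fake fetch, and pauses.append records each pause instead of sleeping
pauses = []
pages = fetch_all(
    ["team1-week1", "team1-week2", "team2-week1"],
    fetch=lambda url: f"<html>{url}</html>",
    delay_seconds=2.0,
    sleep=pauses.append,
)
print(len(pages), pauses)
```

With a logged-in `requests` session, `fetch` could be something like `lambda url: session.get(url).text`, leaving `sleep` at its default. What delay is appropriate depends on the site and its terms of use.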
Fruitbat 39 Posted December 1, 2020 Author
Shameless bump