Fantasy Football - Footballguys Forums


Anyone know Python for scraping website tables?

Fruitbat

Footballguy
Argh - I self-taught myself last year to code Python in order to scrape our fantasy results from RTSports.  Was brand new to Python and essentially HTML, but I managed to get code to work to pull down weekly player scores and transactions.  Saved me a lot of cutting and pasting and formatting in Excel.  (We have a dynasty league of 28 years and I have stats and records going all the way back.)

Fast forward to this year, and it isn't working.  And heck if I can figure out why.  Some of that is probably just due to me forgetting how the heck the code works.  Just getting errors and empty dataframes.

Anyone out there with Python scraping experience willing to take a peek? 

 
I write a lot of scripts in Python for data processing and can usually figure out most errors. I’m not a programmer/developer fwiw. I just use them a lot in my field to automate data processing because I’m lazy. Does your code have error/exception handling? That means it tries to run a function and, if the function fails, it either tells you in a log (if it’s set up that way) what the error is or works around it. Based on what you posted, it sounds like it runs to completion with no error-handling stoppage, which leads me to believe it’s possibly something with the data you’re scraping. Maybe a formatting change, or maybe the cursor function isn’t successfully grabbing text any more? I can have a look at it if you’d like. Just PM me.
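A minimal sketch of the kind of error handling and logging described above, using only the standard library. The names `scrape_week` and `fetch` are hypothetical stand-ins, not anything from the original script: `fetch` is assumed to be whatever function downloads and parses one team/week page and returns a list of rows.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("scraper")


def scrape_week(fetch, team, week):
    """Run one fetch and record what went wrong instead of failing silently.

    `fetch` is a hypothetical callable: fetch(team, week) -> list of rows.
    """
    try:
        rows = fetch(team, week)
    except Exception:
        # log.exception records the full traceback along with the message.
        log.exception("fetch failed for team=%s week=%s", team, week)
        return None
    if not rows:
        # No exception, but nothing came back: often a sign the page changed.
        log.warning("no rows for team=%s week=%s; page layout may have changed",
                    team, week)
    return rows
```

The point of the empty-result warning is that a scraper can run to completion without any exception and still return nothing, which is the situation described in this thread.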

 
Without even looking at it my first guess is your code is expecting the table or data to be formatted in a certain way or contain maybe a set number of columns and RTSports made a change to their website. That’s somewhat typical for web scraping based on my limited knowledge.
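One way to catch that kind of silent layout change early is to compare what was scraped against what the code expects. This is only a sketch: `check_layout` is a hypothetical helper, and the expected column names are the ones shown in the DataFrame output posted in this thread.

```python
# Column names as they appear in the DataFrame output posted in this thread.
EXPECTED_COLUMNS = ["PLAYER", "POS", "NFL", "BYE", "INJ",
                    "OPP", "DATE", "LINEUP", "PTS", "SCORING"]


def check_layout(headers, rows, expected=EXPECTED_COLUMNS):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = [c for c in expected if c not in headers]
    extra = [c for c in headers if c not in expected]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if extra:
        problems.append("unexpected columns: " + ", ".join(extra))
    if not rows:
        problems.append("no data rows found")
    # Rows whose cell count differs from the header count.
    bad = [i for i, row in enumerate(rows) if len(row) != len(headers)]
    if bad:
        problems.append("rows with wrong cell count: " + str(bad))
    return problems
```

Running a check like this right after each page is parsed turns "empty dataframe somewhere downstream" into a specific message about which page and which expectation failed.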

 
This was my experience using other sites with R.

 
Yes, I assume there was a subtle change, but I'm not savvy enough to figure out where it is or how to fix it.  Here is an excerpt of my code, which maybe is clunky, I don't know.  It was trial and error - and it worked before.  I've removed some lines and details (i.e. note in the REQURL link -- I have a loop that goes through all the teams and weeks, and inside that loop it creates the correct URL for that team/week).

import requests
import pandas as pd
from bs4 import BeautifulSoup
from io import StringIO

# This URL is the one your login form points to with its "action" attribute.
POSTURL = "https://www.rtsports.com"

# Login form fields (credentials removed).
payload = {
    'ACCOUNTID': '<ID removed>',
    'PASSWORD': '<pwd removed>',
}

# This URL is the page you actually want to pull down with requests.
# t and w come from the loop over teams and weeks (not shown here).
REQURL = 'https://www.rtsports.com/football/team-capsules.php?LID=23611&UID=rnj07n3via1124l&TID='+team_rt_dict[t]+'&FWK='+str(w)

with requests.Session() as session:
    post = session.post(POSTURL, data=payload)  # log in; the session keeps the cookies
    r = session.get(REQURL)

soup = BeautifulSoup(r.text, 'lxml')  # Parse the HTML as a string
rt_table = soup.find('table', {'class': 'table table-no-borders table-tight table-hover '})
rt_df_raw = pd.read_html(StringIO(str(rt_table)))  # read_html returns a list of DataFrames
rt_df = rt_df_raw[0]  # this is now the dataframe for one team and one week


It looks like it performs fine through the creation of "rt_table" -- which is supposed to be the HTML for just the one table on the page I am interested in.  Here is what rt_table looks like -- I snipped out the middle section, leaving just the first 2 player rows and the last 2 player rows:

(EDIT TO ADD: I think the data stored in "rt_table" is OK, because I took the below and pasted into a web-based html-to-excel converter and everything came out good -- all players with all their stats in rows with correct headers)

<table class="table table-no-borders table-tight table-hover "><thead><tr><th>PLAYER</th><th>POS</th><th>NFL</th><th>BYE</th><th>INJ</th><th>OPP</th><th>DATE</th><th>LINEUP</th><th>PTS</th><th>SCORING</th></tr></thead><tbody></tbody><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(11189);">Ben Roethlisberger <span class="glyphicon glyphicon-volume-up"></span></a></td><td>QB</td><td>PIT</td><td class="text-center">4</td><td class="text-center"><span class="injury InjP" data-original-title="Probable - Quad" data-placement="top" data-toggle="tooltip">P</span></td><td class="text-center"><span class="small">@</span>NYG</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=344288&GC=20200914019">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=903586&ART=20200914221524508495508">Recap</a></td><td class="text-center"><strong>Starter</strong></td><td class="text-right">23</td><td class="text-left" style="padding-left:15px;">Ben Roethlisberger 3 passing TDs (18 pts)<br/> . . . Ben Roethlisberger 10 yd TD pass to JuJu Smith-Schuster<br/> . . . Ben Roethlisberger 13 yd TD pass to James Washington<br/> . . . 
Ben Roethlisberger 8 yd TD pass to JuJu Smith-Schuster<br/>Ben Roethlisberger 229 passing yds (5 pts)<br/></td></tr><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(16108);">Austin Ekeler <span class="glyphicon glyphicon-volume-up"></span></a></td><td>RB</td><td>LAC</td><td class="text-center">6</td><td class="text-center"><span class="injury InjX" data-original-title="On IR - hamstring" data-placement="top" data-toggle="tooltip">X</span></td><td class="text-center"><span class="small">@</span>CIN</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=348565&GC=20200913004">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=562403&ART=20200913193545156385708">Recap</a></td><td class="text-center"><strong>Starter</strong></td><td class="text-right">4</td><td class="text-left" style="padding-left:15px;">Austin Ekeler 84 rushing yds (4 pts)<br/></td></tr>
<<<<<SNIPPED>>>>>>
<tr class="PlayerRow"><td><a href="javascript:ShowPlayer(16612);">Jace Sternberger</a></td><td>TE</td><td>GNB</td><td class="text-center">5</td><td class="text-center"></td><td class="text-center"><span class="small">@</span>MIN</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=821168&GC=20200913016">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=287930&ART=20200913164324664323708">Recap</a></td><td class="text-center">Bench</td><td class="text-right">0</td><td class="text-left" style="padding-left:15px;"></td></tr><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(15244);">C.J. Uzomah</a></td><td>TE</td><td>CIN</td><td class="text-center">9</td><td class="text-center"><span class="injury InjX" data-original-title="On IR - torn right Achilles" data-placement="top" data-toggle="tooltip">X</span></td><td class="text-center">LAC</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&UID=rnj07n3via1124l&X=996392&GC=20200913004">Final</a> | <a href="/football/syndicated-news.php?LID=23611&UID=rnj07n3via1124l&X=195296&ART=20200913193545156385708">Recap</a></td><td class="text-center">Bench</td><td class="text-right">4</td><td class="text-left" style="padding-left:15px;">C.J. Uzomah 45 receiving yds (4 pts)<br/></td></tr></table>


But rt_df_raw only pulls in the column headers...which look fine, it's just that the dataset itself is empty:

Code:
rt_df_raw

[Empty DataFrame
 Columns: [PLAYER, POS, NFL, BYE, INJ, OPP, DATE, LINEUP, PTS, SCORING]
 Index: []]
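
One thing that stands out in the HTML posted above is the empty <tbody></tbody> followed by all the PlayerRow rows sitting directly under <table>. That is only a guess at the cause and is not confirmed anywhere in this thread, but a parser that looks for data rows inside <tbody> could plausibly find headers and nothing else. The sketch below sidesteps the question by collecting every <tr> wherever it sits, using only the standard library. `TableRowParser` and `parse_player_table` are hypothetical names, and SAMPLE is a trimmed copy of the posted table (four columns, two players).

```python
from html.parser import HTMLParser

# Trimmed sample with the same shape as the table posted above:
# an empty <tbody></tbody>, then the PlayerRow rows directly under <table>.
SAMPLE = (
    '<table class="table table-no-borders table-tight table-hover ">'
    '<thead><tr><th>PLAYER</th><th>POS</th><th>NFL</th><th>PTS</th></tr></thead>'
    '<tbody></tbody>'
    '<tr class="PlayerRow"><td><a href="javascript:ShowPlayer(11189);">'
    'Ben Roethlisberger <span class="glyphicon glyphicon-volume-up"></span></a></td>'
    '<td>QB</td><td>PIT</td><td class="text-right">23</td></tr>'
    '<tr class="PlayerRow"><td><a href="javascript:ShowPlayer(16108);">'
    'Austin Ekeler <span class="glyphicon glyphicon-volume-up"></span></a></td>'
    '<td>RB</td><td>LAC</td><td class="text-right">4</td></tr>'
    '</table>'
)


class TableRowParser(HTMLParser):
    """Collect <th> header names and the <td> text of every <tr>."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")  # keep text on either side of <br/> apart

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            if tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr":
            if self._row:  # header rows contain no <td>, so they are skipped
                self.rows.append(self._row)
            self._row = None


def parse_player_table(html):
    """Return (headers, rows) for one table's HTML."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.headers, parser.rows


headers, rows = parse_player_table(SAMPLE)
for row in rows:
    print(dict(zip(headers, row)))
```

If a DataFrame is still wanted afterwards, pd.DataFrame(rows, columns=headers) should build one from this output without going through read_html at all.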
 

Aha!

I believe I spotted the problem.

Not THIS --  

REQURL = 'https://www.rtsports.com/football/team-capsules.php?LID=23611&UID=rnj07n3via1124l&TID='+team_rt_dict[t]+'&FWK='+str(w)

Try  👉 THIS  👈

 
Please post your rooster
Andersen, Morten
Baxter, Brad
Blades, Brian
Brien, Doug
Brooks, Reggie
Buffalo Bills
Carlson, Cody
Coates, Ben
Cobb, Reggie
Fryar, Irving
Givins, Ernest
Howard, Desmond
Ingram, Mark (Dolphins)
Marino, Dan
Rison, Andre
Russell, Leonard

 
Argh - I self-taught myself last year to code Python in order to scrape our fantasy results from RTSports.  Was brand new to Python and essentially HTML, but I managed to get code to work to pull down weekly player scores and transactions.  Saved me a lot of cutting and pasting and formatting in Excel.  (We have a dynasty league of 28 years and I have stats and records going all the way back.)

Fast forward to this year, and it isn't working.  And heck if I can figure out why.  Some of that is probably just due to me forgetting how the heck the code works.  Just getting errors and empty dataframes.

Anyone out there with Python scraping experience willing to take a peek? 
Gotta give credit where credit is due.  That IS the best person to self-teach (even if you are a pirate).

 
I can’t really help because it’s been a long time for me, but I wanted to possibly warn others. I taught myself Python at one point and ended up getting IP banned by Craigslist for hammering their servers with a scraping tool I built lol.
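
For anyone looping over lots of team/week pages, one simple way to avoid hammering a server is to enforce a minimum gap between requests. A minimal sketch using only the standard library; `Throttle` is a hypothetical helper, and the 2-second interval in the usage note is an arbitrary assumption, not a documented limit for any site.

```python
import time


class Throttle:
    """Make sure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without sleeping
        self._sleep = sleep
        self._last = None    # time of the previous call, or None before the first

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Usage would look like creating throttle = Throttle(2.0) once before the loop and calling throttle.wait() immediately before each request inside it.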

 
