Fantasy Football - Footballguys Forums

Anyone know Python for scraping website tables?



Argh - I self-taught myself last year to code Python in order to scrape our fantasy results from RTSports.  I was brand new to Python and essentially to HTML, but I managed to get code working to pull down weekly player scores and transactions.  Saved me a lot of cutting, pasting, and formatting in Excel.  (We have a dynasty league of 28 years and I have stats and records going all the way back.)

Fast forward to this year, and it isn't working.  And heck if I can figure out why.  Some of that is probably just me forgetting how the heck the code works.  I'm just getting errors and empty dataframes.

Anyone out there with Python scraping experience willing to take a peek? 


I write a lot of scripts in Python for data processing and can usually figure out most errors. I’m not a programmer/developer, fwiw; I just use scripts a lot in my field to automate data processing because I’m lazy. Does your code have error/exception handling? That means it tries to run a function and, if the function fails, it either recovers or tells you in a log (if it’s set up that way) what the error was. Based on what you posted, it sounds like the script runs to completion without an error stopping it, which leads me to believe the problem is with the data you’re scraping. Maybe the page formatting changed, or maybe the function that grabs the text isn’t finding it any more? I can have a look at it if you’d like. Just PM me.

Edited by Osaurus
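For anyone unsure what that kind of error handling looks like, here is a minimal sketch using only the standard library. The function name `scrape_one` and the `fetch` callable are hypothetical stand-ins for whatever actually downloads and parses one team/week page; they are not from the original script.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("scraper")


def scrape_one(team, week, fetch):
    """Run one fetch and log the outcome instead of failing silently.

    `fetch` is a hypothetical callable that returns a list of player rows.
    """
    try:
        rows = fetch(team, week)
    except Exception:
        # log.exception records the full traceback alongside the message
        log.exception("fetch failed for team=%s week=%s", team, week)
        return None
    if not rows:
        # No exception was raised, but nothing came back either
        log.warning("no rows for team=%s week=%s", team, week)
    return rows
```

The second check matters for a case like this thread's: an empty result raises no exception, so without an explicit warning the script can run to completion and still produce nothing.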
Link to post
Share on other sites

Without even looking at it my first guess is your code is expecting the table or data to be formatted in a certain way or contain maybe a set number of columns and RTSports made a change to their website. That’s somewhat typical for web scraping based on my limited knowledge.
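One way to catch that kind of site change early is to check the scraped table against what the code expects before doing anything else with it. A rough sketch follows; the column names are the ones shown in this thread's table, while `check_table` itself is a hypothetical helper, not part of the original script.

```python
EXPECTED_COLUMNS = ["PLAYER", "POS", "NFL", "BYE", "INJ",
                    "OPP", "DATE", "LINEUP", "PTS", "SCORING"]


def check_table(headers, rows, expected=EXPECTED_COLUMNS):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if list(headers) != list(expected):
        problems.append(f"unexpected columns: {list(headers)}")
    if not rows:
        problems.append("table has headers but no data rows")
    for i, row in enumerate(rows):
        if len(row) != len(headers):
            problems.append(
                f"row {i} has {len(row)} cells, expected {len(headers)}"
            )
    return problems
```

Printing or logging the returned list for each team/week makes it obvious which assumption the site broke, rather than leaving an empty dataframe to puzzle over.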

3 hours ago, AAABatteries said:

Without even looking at it my first guess is your code is expecting the table or data to be formatted in a certain way or contain maybe a set number of columns and RTSports made a change to their website. That’s somewhat typical for web scraping based on my limited knowledge.

This was my experience using other sites with R.


Yes, I assume there was a subtle change, but I'm not savvy enough to figure out where it is or how to fix it.  Here is an excerpt of my code, which may be clunky, I don't know.  It was trial and error, and it worked before.  I've removed some lines and details (e.g. note the REQURL link: I have a loop that goes through all the teams and weeks, and inside that loop it creates the correct URL for that team/week).

import requests
import pandas as pd
from bs4 import BeautifulSoup

#This URL will be the URL that your login form points to with the "action" tag.
POSTURL = "https://www.rtsports.com"

payload = {
    'ACCOUNTID': '<ID removed>',
    'PASSWORD': '<pwd removed>'
}

        # (The lines below run inside the loop over teams t and weeks w, which is omitted here.)
        #This URL is the page you actually want to pull down with requests.
        REQURL = 'https://www.rtsports.com/football/team-capsules.php?LID=23611&UID=rnj07n3via1124l&TID='+team_rt_dict[t]+'&FWK='+str(w)

        # Log in and fetch the page within the same session so the login cookies carry over.
        with requests.Session() as session:
            post = session.post(POSTURL, data=payload)
            r = session.get(REQURL)

        soup = BeautifulSoup(r.text, 'lxml') # Parse the HTML as a string
        rt_table=soup.find('table',{'class':'table table-no-borders table-tight table-hover '})
        rt_df_raw = pd.read_html(str(rt_table))
        rt_df=rt_df_raw[0] #this is now the dataframe for one team and one week

It looks like it performs fine through the creation of "rt_table", which is supposed to be the HTML for just the one table on the page I am interested in.  Here is what rt_table looks like. I snipped out the middle section, leaving just the first 2 player rows and the last 2 player rows:

(EDIT TO ADD: I think the data stored in "rt_table" is OK, because I took the below and pasted into a web-based html-to-excel converter and everything came out good -- all players with all their stats in rows with correct headers)

<table class="table table-no-borders table-tight table-hover "><thead><tr><th>PLAYER</th><th>POS</th><th>NFL</th><th>BYE</th><th>INJ</th><th>OPP</th><th>DATE</th><th>LINEUP</th><th>PTS</th><th>SCORING</th></tr></thead><tbody></tbody><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(11189);">Ben Roethlisberger <span class="glyphicon glyphicon-volume-up"></span></a></td><td>QB</td><td>PIT</td><td class="text-center">4</td><td class="text-center"><span class="injury InjP" data-original-title="Probable - Quad" data-placement="top" data-toggle="tooltip">P</span></td><td class="text-center"><span class="small">@</span>NYG</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=344288&amp;GC=20200914019">Final</a> | <a href="/football/syndicated-news.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=903586&amp;ART=20200914221524508495508">Recap</a></td><td class="text-center"><strong>Starter</strong></td><td class="text-right">23</td><td class="text-left" style="padding-left:15px;">Ben Roethlisberger 3 passing TDs (18 pts)<br/> . . . Ben Roethlisberger 10 yd TD pass to JuJu Smith-Schuster<br/> . . . Ben Roethlisberger 13 yd TD pass to James Washington<br/> . . . 
Ben Roethlisberger 8 yd TD pass to JuJu Smith-Schuster<br/>Ben Roethlisberger 229 passing yds (5 pts)<br/></td></tr><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(16108);">Austin Ekeler <span class="glyphicon glyphicon-volume-up"></span></a></td><td>RB</td><td>LAC</td><td class="text-center">6</td><td class="text-center"><span class="injury InjX" data-original-title="On IR - hamstring" data-placement="top" data-toggle="tooltip">X</span></td><td class="text-center"><span class="small">@</span>CIN</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=348565&amp;GC=20200913004">Final</a> | <a href="/football/syndicated-news.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=562403&amp;ART=20200913193545156385708">Recap</a></td><td class="text-center"><strong>Starter</strong></td><td class="text-right">4</td><td class="text-left" style="padding-left:15px;">Austin Ekeler 84 rushing yds (4 pts)<br/></td></tr>
<<<<<SNIPPED>>>>>>
<tr class="PlayerRow"><td><a href="javascript:ShowPlayer(16612);">Jace Sternberger</a></td><td>TE</td><td>GNB</td><td class="text-center">5</td><td class="text-center"></td><td class="text-center"><span class="small">@</span>MIN</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=821168&amp;GC=20200913016">Final</a> | <a href="/football/syndicated-news.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=287930&amp;ART=20200913164324664323708">Recap</a></td><td class="text-center">Bench</td><td class="text-right">0</td><td class="text-left" style="padding-left:15px;"></td></tr><tr class="PlayerRow"><td><a href="javascript:ShowPlayer(15244);">C.J. Uzomah</a></td><td>TE</td><td>CIN</td><td class="text-center">9</td><td class="text-center"><span class="injury InjX" data-original-title="On IR - torn right Achilles" data-placement="top" data-toggle="tooltip">X</span></td><td class="text-center">LAC</td><td><a href="/football/nfl-live-boxscore.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=996392&amp;GC=20200913004">Final</a> | <a href="/football/syndicated-news.php?LID=23611&amp;UID=rnj07n3via1124l&amp;X=195296&amp;ART=20200913193545156385708">Recap</a></td><td class="text-center">Bench</td><td class="text-right">4</td><td class="text-left" style="padding-left:15px;">C.J. Uzomah 45 receiving yds (4 pts)<br/></td></tr></table>

But rt_df_raw only pulls in the column headers, which look fine; it's just that the dataset itself is empty:

rt_df_raw

[Empty DataFrame
 Columns: [PLAYER, POS, NFL, BYE, INJ, OPP, DATE, LINEUP, PTS, SCORING]
 Index: []]

Edited by Fruitbat
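One thing that stands out in the pasted HTML is the empty `<tbody></tbody>` right after the header, with every player row sitting outside it. If the installed pandas version only reads body rows from inside `<tbody>`, that would plausibly give headers and no data. That is a guess; nothing in the thread confirms it. A workaround that does not depend on how `read_html` treats `<tbody>` is to collect every `<tr>` directly. The sketch below uses the standard library's `html.parser` rather than BeautifulSoup so that it runs on its own; `RowCollector`, `table_rows`, and the trimmed `SAMPLE` markup are illustrative names and data, not part of the original script.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <tr>, wherever it sits inside the table."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append("\n")  # keep multi-line scoring notes separable

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def table_rows(html):
    """Return all rows of the table markup as lists of cell text."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Trimmed stand-in for the table in this thread: empty tbody, rows outside it.
SAMPLE = (
    '<table><thead><tr><th>PLAYER</th><th>POS</th><th>PTS</th></tr></thead>'
    '<tbody></tbody>'
    '<tr class="PlayerRow"><td><a href="#">Ben Roethlisberger <span></span></a></td>'
    '<td>QB</td><td class="text-right">23</td></tr>'
    '<tr class="PlayerRow"><td><a href="#">Austin Ekeler</a></td>'
    '<td>RB</td><td class="text-right">4</td></tr>'
    '</table>'
)

rows = table_rows(SAMPLE)
headers, players = rows[0], rows[1:]
```

In the real script the input would be `str(rt_table)`, and `pd.DataFrame(players, columns=headers)` should then build the dataframe for that team and week.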
36 minutes ago, arrow1 said:

Please post your rooster

Andersen, Morten
Baxter, Brad
Blades, Brian
Brien, Doug
Brooks, Reggie
Buffalo Bills
Carlson, Cody
Coates, Ben
Cobb, Reggie
Fryar, Irving
Givins, Ernest
Howard, Desmond
Ingram, Mark (Dolphins)
Marino, Dan
Rison, Andre
Russell, Leonard

4 minutes ago, Bracie Smathers said:

Anyone know Python for scraping website tables?

Aha!

I believe I spotted the problem.

Not THIS --  

REQURL = 'https://www.rtsports.com/football/team-capsules.php?LID=23611&UID=rnj07n3via1124l&TID='+team_rt_dict[t]+'&FWK='+str(w)

Try  👉 THIS  👈

:whoosh:

21 hours ago, Fruitbat said:

Argh - I self-taught myself last year to code Python in order to scrape our fantasy results from RTSports.  I was brand new to Python and essentially to HTML, but I managed to get code working to pull down weekly player scores and transactions.  Saved me a lot of cutting, pasting, and formatting in Excel.  (We have a dynasty league of 28 years and I have stats and records going all the way back.)

Fast forward to this year, and it isn't working.  And heck if I can figure out why.  Some of that is probably just me forgetting how the heck the code works.  I'm just getting errors and empty dataframes.

Anyone out there with Python scraping experience willing to take a peek? 

Gotta give credit where credit is due.  That IS the best person to self-teach (even if you are a pirate).


I can’t really help because it’s been a long time for me, but I wanted to warn others. I taught myself Python at one point and ended up getting IP banned by Craigslist for hammering their servers with a scraping tool I built lol.
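A simple guard against that kind of ban is to pause between requests. Here is a minimal sketch; `fetch_politely` and its parameters are hypothetical, the 2 second default is an arbitrary choice, and `fetch` stands in for whatever actually downloads a page (for example a `session.get` call).

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results
```

For a loop over every team and week, as in the script above, this just means building the list of URLs first and passing it through one function instead of firing requests back to back.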
