Motivation

Devon Cantwell had a fun idea.

This reminded me of playing Pictionary with my research lab.

Methods

Let’s grab some words from the APSA 2020 program.

Scraping websites with rvest is easy! (I’ll also use tidyverse and magrittr functions here.)

A fun minimal example is the UN website:

library(rvest)

html <- read_html("https://UN.org") # The UN homepage
links <- html_nodes(html, "a") # "a" nodes are linked text
html_text(links)
##  [1] "مرحبا بكم في موقع الأمم المتحدة"                 
##  [2] "欢迎来到联合国网站"                              
##  [3] "Welcome to the United Nations website"           
##  [4] "Bienvenue sur le site Internet des Nations Unies"
##  [5] "Добро пожаловать в ООН!"                         
##  [6] "Bienvenido al sitio web de las Naciones Unidas"  
##  [7] "\r\n              عربي\r\n              "        
##  [8] "\r\n              中文\r\n              "        
##  [9] "\r\n              English\r\n              "     
## [10] "\r\n              Français\r\n              "    
## [11] "\r\n              Русский\r\n              "     
## [12] "\r\n              Español\r\n              "
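To experiment offline, note that read_html() also accepts a literal HTML string. The minimal sketch below parses a made-up two-link page; the markup and hrefs are hypothetical and are not taken from the UN site. It shows that html_text(trim = TRUE) strips the "\r\n" padding visible in items 7 to 12 above, and that html_attr(links, "href") returns each link’s target.

```r
library(rvest)

# A made-up page standing in for a real site (hypothetical markup)
page <- read_html('<html><body>
  <a href="/en">\r\n   English\r\n   </a>
  <a href="/fr">Francais</a>
</body></html>')

links <- html_nodes(page, "a")              # every <a> (link) node
link_text <- html_text(links, trim = TRUE)  # link text, surrounding whitespace removed
link_href <- html_attr(links, "href")       # the URL each link points to

link_text
link_href
```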

APSA complicates things by making us select a timezone:

url <- "https://convention2.allacademic.com/one/apsa/apsa20/"

read_html(url) %>% html_text()
## [1] "APSA Annual Meeting & Exhibition 2020\n<!--\nform, ul, dl, dt, dd, li, h1, h2, h3, h4, h5, h6 { margin: 0; padding: 0; }\n\n\n-->\nSet your timezoneThe online program contains content that is remote in nature.  Setting your timezone will allow for localization of the times.To change your timezone later, select \"Change Preferences\" from the \"Navigation Menu\" on the left side of the online program home page.Please select your timezone:-- Please select a timezone --[UTC+00:00] - Africa/Abidjan[UTC+00:00] - Africa/Accra[UTC+03:00] - Africa/Addis_Ababa...[UTC+12:00] - Pacific/Wallis\n©2020 All Academic, Inc.   |   Privacy Policy..."
## (output truncated: the full string lists every timezone option, followed by the page's CSS and JavaScript)

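So, we must set a timezone for our session. rvest has several tools for submitting web forms: html_session() simulates a browser session, html_form() parses the forms on a page, set_values() fills in form fields, and submit_form() sends the completed form back to the server.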

mysession <- html_session(url)

# take the first form on the page and choose a timezone
timezone_form <- html_form(mysession)[[1]] %>% 
  set_values(new_timezone = "Africa/Abidjan")

submit_form(mysession, timezone_form)
## <session> https://convention2.allacademic.com/one/apsa/apsa20/
##   Status: 200
##   Type:   text/html; charset=utf-8
##   Size:   15344
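The field name new_timezone comes from the form’s HTML, and listing a parsed form’s fields is a quick way to find such names. This sketch assumes a simplified, hypothetical version of the timezone form; the markup and the example.org URL are placeholders, not copied from the APSA site. It uses html_form_set(), which supersedes set_values() in rvest 1.0 and later.

```r
library(rvest)

# Hypothetical, simplified timezone form (assumed markup, not the real APSA page)
page <- read_html('<html><body>
  <form action="https://example.org/set_timezone" method="post">
    <select name="new_timezone">
      <option value="unselected">-- Please select a timezone --</option>
      <option value="Africa/Abidjan">[UTC+00:00] - Africa/Abidjan</option>
      <option value="America/Denver">[UTC-06:00] - America/Denver</option>
    </select>
    <input type="submit" name="save" value="Save">
  </form>
</body></html>')

# parse the first form; base_url resolves any relative action URL
form <- html_form(page, base_url = "https://example.org/")[[1]]
names(form$fields)   # field names available to fill in

# rvest >= 1.0: html_form_set() supersedes set_values()
form <- html_form_set(form, new_timezone = "Africa/Abidjan")
form$fields$new_timezone$value
```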

With the timezone set for our browser session, we can navigate to APSA’s “Created Panel” listing: jump_to() returns to the program’s home page, and follow_link() follows the first link containing the given text. We can then read the HTML and grab the linked text nodes.

html <- jump_to(mysession, url) %>% 
  follow_link("Browse By Session or Event Type") %>%
  follow_link("Created Panel") %>%
  read_html()

links <- html_nodes(html, "a") # "a" nodes are linked text

html_text(links) %>% head(20)
##  [1] "Search"                                                                                                                                                                                                       
##  [2] "Browse By Day"                                                                                                                                                                                                
##  [3] "Browse By Time"                                                                                                                                                                                               
##  [4] "Browse By Person"                                                                                                                                                                                             
##  [5] "Browse By Mini-Conference"                                                                                                                                                                                    
##  [6] "Browse By Division"                                                                                                                                                                                           
##  [7] "Browse By Session or Event Type"                                                                                                                                                                              
##  [8] "Change Preferences"                                                                                                                                                                                           
##  [9] "Sign In"                                                                                                                                                                                                      
## [10] "Search Tips"                                                                                                                                                                                                  
## [11] "Twitter"                                                                                                                                                                                                      
## [12] ""                                                                                                                                                                                                             
## [13] "Back"                                                                                                                                                                                                         
## [14] ""                                                                                                                                                                                                             
## [15] "Home"                                                                                                                                                                                                         
## [16] "2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAThat's Entertainment! Celebrities, Comedy, and Crime in Political CommunicationSub Unit: Division 38: Political CommunicationSession Submission Type: Created Panel"
## [17] "2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAPolitical Effects of Social MediaSub Unit: Division 40: Information Technology, & PoliticsSession Submission Type: Created Panel"                                   
## [18] "2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAAre Women Electable?Sub Unit: Division 31: Women and Politics ResearchSession Submission Type: Created Panel"                                                       
## [19] "2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAThe Politics and Economics of Bilateral Investment TreatiesSub Unit: Division 16: International Political EconomySession Submission Type: Created Panel"            
## [20] "2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAMedia & AutocracySub Unit: Division 44: Democracy and AutocracySession Submission Type: Created Panel"

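html_text extracts text from HTML nodes. On this page, each link’s text contains a panel’s title (except for the first 15 links, which are site navigation), run together with its time, a “TBA” placeholder, its division, and its session type.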

To clean up the panel titles, I remove everything up to and including “TBA” and everything from “Sub Unit” onward, using the one regular expression to rule them all: .*, which matches any character (.) any number of times (*).
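Here is the pattern applied with stringr to two link texts copied from the output above, plus the “Home” navigation link. Because .* is greedy, .*TBA removes everything through the last “TBA” in a string, so a title that itself contained “TBA” would be cut short. Navigation links pass through unchanged, which is why the next step filters on the URL instead.

```r
library(stringr)

# Two panel link texts from the scraped output, plus a navigation link
raw <- c(
  "2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAPolitical Effects of Social MediaSub Unit: Division 40: Information Technology, & PoliticsSession Submission Type: Created Panel",
  "2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAAre Women Electable?Sub Unit: Division 31: Women and Politics ResearchSession Submission Type: Created Panel",
  "Home"
)

# ".*TBA" deletes everything up to and including "TBA";
# "Sub Unit.*" deletes "Sub Unit" and everything after it
titles <- str_remove_all(raw, ".*TBA|Sub Unit.*")
titles
```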

html_attr extracts the value of a named HTML attribute. Linked URLs are in the “href” attribute.

Let’s put both into a tidy dataframe:

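library(tidyverse) # tibble(), filter(), str_remove_all(), str_detect()
library(magrittr)  # the %<>% assignment pipe

d <- tibble(title = html_text(links) %>% 
              str_remove_all(".*TBA|Sub Unit.*"),
            url = html_attr(links, "href") 
            )

# keep only rows whose URL contains a "session_id" (the panel links)
d %<>% filter( str_detect(url, "session_id") )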

d
## # A tibble: 578 x 2
##    title                                  url                                   
##    <chr>                                  <chr>                                 
##  1 That's Entertainment! Celebrities, Co… https://convention2.allacademic.com//…
##  2 Political Effects of Social Media      https://convention2.allacademic.com//…
##  3 Are Women Electable?                   https://convention2.allacademic.com//…
##  4 The Politics and Economics of Bilater… https://convention2.allacademic.com//…
##  5 Media & Autocracy                      https://convention2.allacademic.com//…
##  6 NGO’s in Politics: Mobilization, Ineq… https://convention2.allacademic.com//…
##  7 Group Representation                   https://convention2.allacademic.com//…
##  8 Governing AAPI People Through Immigra… https://convention2.allacademic.com//…
##  9 The Politics of Fear and Violence      https://convention2.allacademic.com//…
## 10 Populism and Discursive Governance     https://convention2.allacademic.com//…
## # … with 568 more rows
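The same scrape-and-tidy steps can be tried offline on a stand-in page. In this sketch the page, its hrefs, and the session_id values are hypothetical; only the layout of the link text mirrors the scraped output above. It loads just dplyr, stringr, and magrittr rather than the whole tidyverse.

```r
library(rvest)
library(dplyr)     # tibble(), filter()
library(stringr)   # str_remove_all(), str_detect()
library(magrittr)  # %<>%

# Hypothetical listing page: one navigation link and two panel links
page <- read_html('<html><body>
  <a href="index.php">Home</a>
  <a href="panel.php?session_id=1">2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAMedia &amp; AutocracySub Unit: Division 44</a>
  <a href="panel.php?session_id=2">2:00 to 3:30pm MDT (8:00 to 9:30pm GMT)TBAGroup RepresentationSub Unit: Division 31</a>
</body></html>')

links <- html_nodes(page, "a")

d <- tibble(title = html_text(links) %>% str_remove_all(".*TBA|Sub Unit.*"),
            url   = html_attr(links, "href"))

# d %<>% f() is shorthand for d <- d %>% f()
d %<>% filter(str_detect(url, "session_id"))
d
```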

Results

Now that we have a tidy dataframe with a column of text, the world is our oyster. We could follow each URL to get more details on each panel using purrr’s map_dfr, like I did here, but I should get back to writing my APSA paper.
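A minimal sketch of that map_dfr() pattern, under stated assumptions: the helper get_panel_heading() is hypothetical, the "h1" selector is a guess about how a panel page is laid out, and two literal HTML strings stand in for the pages you would normally fetch by passing each entry of d$url to read_html().

```r
library(rvest)
library(purrr)
library(dplyr)

# Hypothetical helper: return a one-row tibble with a page's <h1> heading.
# The "h1" selector is an assumption about the page layout.
get_panel_heading <- function(html_source) {
  page <- read_html(html_source)
  tibble(heading = html_text(html_node(page, "h1"), trim = TRUE))
}

# Literal HTML strings standing in for panel pages; with a live site
# you would map over d$url instead
pages <- c("<html><body><h1>Media &amp; Autocracy</h1></body></html>",
           "<html><body><h1>Group Representation</h1></body></html>")

# map_dfr() calls the helper on each page and row-binds the results
details <- map_dfr(pages, get_panel_heading)
details
```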

For Pictionary, we just need a sample of common words. The tidytext package has a number of helpful tools for this. Most importantly, unnest_tokens “tokenizes” text, here breaking it up into one word per row. From dplyr, anti_join and filter clean up the words, count tallies them, and slice_sample draws a frequency-weighted sample; stringr’s str_c then collapses the sample into a block of text.

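Tip: to get drawable words from messier text, try keeping only words in the NRC dictionary by adding inner_join(distinct(get_sentiments("nrc"), word)) anywhere between unnesting and sampling them. The distinct() matters because the NRC lexicon has one row per word and sentiment, so a plain join would repeat words that carry several sentiments. (The first call to get_sentiments("nrc") downloads the lexicon via the textdata package.)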

library(tidytext)

word_counts <- d %>%
  # get words from a column "title"
  unnest_tokens(word, title) %>% 
  # remove common words (such as "the")
  anti_join(stop_words) %>%
  # keep words longer than 5 letters and drop words with straight apostrophes
  # (curly apostrophes, as in "women’s", still slip through)
  filter( nchar(word) > 5, !str_detect(word, "\\'") ) %>%
  # count how often each word appears
  count(word)

word_counts %>% arrange(-n)
## # A tibble: 780 x 2
##    word              n
##    <chr>         <int>
##  1 political        91
##  2 politics         73
##  3 public           34
##  4 policy           32
##  5 social           20
##  6 gender           19
##  7 international    19
##  8 conflict         18
##  9 democratic       18
## 10 democracy        16
## # … with 770 more rows
word_counts %>% 
  # sample 500 distinct words, weighted by their frequency
  slice_sample(n = 500, weight_by = n) %>%
  # collapse to a block of text, separating words with commas
  pull(word) %>%
  str_c(collapse = ", ")
## [1] "methods, consequences, appointments, parties, global, twitter, entering, environmental, empire, thinking, ethnicity, gaining, resentment, dimensionality, undergraduate, political, regime, injustice, innovative, canadian, research, hybrid, backsliding, approaches, causal, theory, politics, quality, sexual, intervention, influence, authoritarian, freedom, nativity, online, economy, digital, networks, engagement, conflict, crisis, expedience, foundations, generalization, opinion, vulnerable, resistance, public, behavior, immigration, perceptions, security, foreign, poetry, technology, policy, building, broader, movement, unpacking, supreme, rights, understanding, classroom, diversity, responds, leadership, measures, responsibility, nationalism, citizenship, disinformation, packing, electoral, immigrant, ethnic, prejudice, elections, illiberal, civilians, environment, identity, bureaucrats, voting, constitutive, female, perspective, localities, campaign, fragility, social, rollback, biology, communication, strategic, rebels, issues, emergency, judicial, invisibilized, creative, development, divides, contracting, globalization, popular, legitimacy, strategies, perspectives, experiments, donors, ukraine, diffusion, change, tension, investment, populism, persistent, economic, inequality, discursive, alternative, speech, violence, negotiations, chinese, nativism, governance, mobilization, substantive, resilience, reconsidered, indigenous, executive, bargaining, claiming, tendencies, municipal, controversies, lending, resurgence, framing, pulpit, russia, question, models, jurisprudence, pedagogies, contemporary, physical, institutions, presidents, administration, taiwanese, capital, representatives, iberia, challenges, population, conservation, system, turnout, destabilization, participation, simulations, ethnographic, effects, democracy, redistricting, dynamics, europe, inclusion, trends, presidency, integration, elites, adversity, migration, positions, 
transitions, experimental, democratic, qualitative, student, american, regimes, ballot, activism, behaving, gerrymandering, complicated, insecurity, unrepresentability, comparative, refracting, eurasia, communities, societal, tactics, crises, international, follow, service, matter, multilevel, upheaval, socialist, agendas, blaming, america, alliances, insider, populist, formation, autocracy, minorities, inference, voters, toleration, climate, alliance, future, comparing, people, escalation, organized, diverse, control, coding, systems, difficult, military, current, legislative, territorial, religion, contexts, suffering, assessing, effectiveness, outsider, sovereignty, strangers, memory, loyalty, personal, constituents, bodies, information, european, leaving, directions, legislature, practices, emotions, civility, organizations, scientists, lobbying, constitutional, cities, accounting, difference, effect, constitutionalism, executives, institutional, gender, borders, transition, conversion, taiwan, psychology, experience, automation, conditions, relations, reactions, congress, intergovernmental, financial, education, movements, attitudes, backlash, deception, processing, science, conference, ambition, respond, character, structural, classrooms, progressive, seeking, sexuality, countries, transnationalism, misinformation, legislatures, precarity, imagination, visibility, campaigns, responsiveness, protests, learning, connections, collaboration, colonial, advancement, standards, delivery, modern, narratives, celebrities, promote, impacts, dissatisfaction, militaries, intersectionality, sexism, policies, nuclear, comedy, support, aspirations, combatants, protest, analysis, heterogeneity, collection, normative, studies, context, minority, motivations, labeling, sources, geography, measurement, branch, feminism, brexit, polarization, tariffs, spring, evolving, cooperation, management, communist, feminist, equality, advances, durability, segregation, accountability, 
forced, worlds, evidence, neoliberalism, tonight, addressing, credibility, private, financing, forecasts, intelligence, representation, brazil, performance, incivility, provocation, responses, influences, epistemological, sentiment, division, organization, teaching, policymaking, proliferation, dimensions, authoritarianism, epistography, relevance, feedback, lessons, sacralization, africa, theories, technological, welfare, markets, constitution, retreat, violent, process, objects, tradition, advantage, entertainment, evaluations, neighborhoods, actors, representativeness, coalition, peacekeeping, middle, differences, vulnerability, displaying, boundaries, health, unassimilable, terrorism, identities, patterns, judges, eastern, ontology, adults, genera, candidates, station, democratization, interactions, society, opportunities, schemes, exclusion, strategy, environments, duration, developing, governing, forces, mexico, lenses, repression, machiavelli, colonialism, origins, natural, pretty, procedures, challenge, emerging, disputes, fiction, reproduction, women’s, negotiating, communicating, indexing, signaling, housing, fooled, capitalism, disaster, experiential, deterrence, relationship, constraints, services, futures, organizing, corruption, content, confronting, liberal, history, legacy, administrative, dilemmas, shadow, response, persecution, forecasting, pedagogy, surveillance, playing, racism, rhetoric, collapse, deaths, boards, fieldwork, machiavelli’s, spatial, community, russian, vision, simulation, mobilizing, domestic, paleolithic, judging, beliefs"
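Here is the same pipeline at a scale you can check by hand, using three short titles (the first and third appear in the scraped output above; the second is made up for illustration). The set.seed() call is an addition so that the weighted sample is reproducible.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Three toy panel titles (the second one is made up)
toy <- tibble(title = c("Political Effects of Social Media",
                        "Political Parties and Social Movements",
                        "Media & Autocracy"))

toy_counts <- toy %>%
  unnest_tokens(word, title) %>%          # one lowercase word per row
  anti_join(stop_words, by = "word") %>%  # drop stop words such as "of" and "and"
  filter(nchar(word) > 5) %>%             # keep words longer than 5 letters
  count(word)                             # one row per word, with its frequency n

toy_counts

# frequency-weighted sample of 3 distinct words, collapsed to one string
set.seed(1)
sampled <- toy_counts %>%
  slice_sample(n = 3, weight_by = n) %>%
  pull(word) %>%
  str_c(collapse = ", ")
sampled
```

Here "political" and "social" each appear twice, so they are more likely to be drawn, while "media" is dropped by the length filter.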