The Biostatistics Computing Club provides a platform for staff and faculty at Weill Cornell Medicine (WCM) to learn and advance their computational skills, specifically in biostatistical analyses. The club is currently led by Anjile An, research biostatistician III, and Will Simmons, research biostatistician II, in the Department of Population Health Sciences.
The club was founded in 2018 by biostatisticians Dr. Jihui Lee and Elizabeth Mauer as a means of sharing methodology that could be useful across the division. The club hosts meetings approximately once a month, where presenters speak on topics including packages and functions in R and Python, application programming interface basics, and more. Club meetings are currently held both in person and virtually and are regularly attended by 25 to 30 people. All resources, including presentation slides and code, are available on the club website.
“Everyone is welcome to join or present,” An explained. The club is open to those interested in the resources provided and aims to foster intra-divisional knowledge sharing and inter-institutional collaboration. In 2021, Daniel D. Sjoberg, lead data scientist at Memorial Sloan Kettering Cancer Center, presented on “gtsummary,” an R package he wrote for creating analytical and summary tables. Attendees include students in the MS in Biostatistics and Data Science program and professionals working in health informatics and information technology.
“Presentations about programming workflow and tools in R have helped me hone my skills and improve my coding process,” said Rachel Heise, research biostatistician I. “One presentation about creating swimmer plots was really helpful to me; I was able to quickly adapt the code to create swimmer plots to present outcomes in sarcoma clinical trials.”
“The most memorable talks are overviews of presentations given at conferences, such as the 2023 New York R Conference, that were summarized by club members with real world examples,” said Edward Gemson, project coordinator in the Division of Epidemiology. “Members have given talks on how to best clean, run, and organize code that I’ve found extremely helpful and use every day in my work.”
Given the independent nature of the work that statisticians may do, the club allows them to share their projects and provide guidance. Participants are encouraged to submit presentation topics pertinent to their current work, allowing others to learn those skills and workflows.
“The club has typically focused on R programming, but there have been other presentations touching on SAS, SQL, Python, and just general data science methods,” said Brady Rippon, research biostatistician III. “Given the nature of the contracted work in the division, we don’t always get a chance to directly collaborate with each other as statisticians. The computing club allows for that collaborative environment.”
At the most recent meeting, Simmons presented methods for handling time series data, or data with a time component, such as global rainfall by month or electricity usage by day of week. Among the tools he covered was “fable,” an R package that supports analyses with multiple types of time series models, allowing researchers to compare variables that may influence each other, such as interest rates and inflation. Simmons has used the package in his own work tracking pediatric mental health emergency department visits during the COVID-19 pandemic.
“We’re trying to present knowledge without catering to a specific audience,” said An. “It’s a great way to learn from your peers. Even if I don’t immediately use a resource [being presented] in my own work, I know who to talk to if I ever need to learn more about it.”