DataComPy is a widely used validation tool, with over 1.1 million monthly installations, built to compare a pair of dataframes quickly and comprehensively. It makes differences between schemas and data explicit: once all comparisons finish, DataComPy generates a report with metrics such as match percentages, maximum differences, and sample mismatches for comparable columns. It accepts input natively from Pandas, Polars, Spark, and Snowpark, and indirectly from Dask and Ray. It also supports comparing certain kinds of database tables, including Snowflake and DuckDB. In this presentation, we'll look at how DataComPy works, what features it provides, and which practical use cases it is designed to address.
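To make the report metrics above concrete, here is a minimal standard-library sketch of the kind of comparison the abstract describes. It is not DataComPy's API or implementation: `ColumnSummary`, `compare_rows`, and the `abs_tol` and `max_samples` parameters are hypothetical names chosen for illustration, and plain lists of dicts stand in for dataframes.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSummary:
    """Per-column metrics (hypothetical structure, not DataComPy's)."""
    matches: int = 0
    total: int = 0
    max_diff: float = 0.0
    samples: list = field(default_factory=list)

    @property
    def match_pct(self) -> float:
        # An empty comparison is treated as fully matching.
        return 100.0 * self.matches / self.total if self.total else 100.0


def compare_rows(left, right, key, abs_tol=0.0, max_samples=3):
    """Compare two lists of dict rows joined on `key` (illustrative only)."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    shared_keys = sorted(left_by_key.keys() & right_by_key.keys())

    # Schema comparison: which non-key columns appear on each side.
    left_cols = {col for row in left for col in row} - {key}
    right_cols = {col for row in right for col in row} - {key}

    summaries = {}
    for col in sorted(left_cols & right_cols):
        summary = ColumnSummary()
        for k in shared_keys:
            a, b = left_by_key[k].get(col), right_by_key[k].get(col)
            summary.total += 1
            if isinstance(a, (int, float)) and isinstance(b, (int, float)):
                # Numeric values match when they differ by at most abs_tol.
                diff = abs(a - b)
                summary.max_diff = max(summary.max_diff, diff)
                equal = diff <= abs_tol
            else:
                equal = a == b
            if equal:
                summary.matches += 1
            elif len(summary.samples) < max_samples:
                # Keep a few (key, left value, right value) mismatch samples.
                summary.samples.append((k, a, b))
        summaries[col] = summary

    return {
        "columns_only_in_left": sorted(left_cols - right_cols),
        "columns_only_in_right": sorted(right_cols - left_cols),
        "rows_only_in_left": sorted(left_by_key.keys() - right_by_key.keys()),
        "rows_only_in_right": sorted(right_by_key.keys() - left_by_key.keys()),
        "columns": summaries,
    }


# Made-up sample data for the sketch.
original = [
    {"id": 1, "amount": 10.0, "name": "Ann"},
    {"id": 2, "amount": 20.0, "name": "Bob"},
    {"id": 3, "amount": 30.0, "name": "Cy"},
]
updated = [
    {"id": 1, "amount": 10.0, "name": "Ann", "region": "east"},
    {"id": 2, "amount": 20.5, "name": "Robert", "region": "west"},
    {"id": 4, "amount": 40.0, "name": "Dee", "region": "west"},
]

result = compare_rows(original, updated, key="id")
for col, summary in result["columns"].items():
    print(col, summary.match_pct, summary.max_diff, summary.samples)
```

The sketch reports schema differences (columns and rows present on only one side) separately from value differences, so a reader can tell a missing column from a mismatched value. Raising `abs_tol` to 0.5 would make the `amount` column match fully in this sample.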
Sponsor Presentations
DataComPy - Dataframe Comparisons made Explicit (Sponsor: Capital One)
Thursday, May 15th, 2025 1:30 p.m.–2:30 p.m. in Room 316