Bird's-eye view of data in Python
Ecologists and conservationists increasingly survey biodiversity using automated acoustic recorders and machine learning models. This popularity is driven by the availability of inexpensive recording hardware and acoustic recognition software that scales to large volumes of collected data. The amount of data collected through automated recorders is a positive factor for developing machine learning models, but it brings two constraints.
First, the many hours of raw audio gathered require data management at scale. Storage requirements can be met by setting up on-premise data servers and employing cloud services, yet an end-to-end data governance workflow is required to track the entire process from field deployment to aggregated datasets. Here, we present a Python data management solution that uses Pandas along with a metadata database in PostgreSQL to efficiently collect, process (massage, transform), and store the audio data.
Second, the machine learning models are generally trained on pristine recordings, or on training/validation datasets prepared from a subset of specific recording or field conditions, and may or may not generalize well to the entire raw dataset. Confirming the true performance of model predictions and distinguishing true presences from false positives requires knowledge of species sounds and a good ear for them. To ease the process of evaluating model predictions, we developed a top-down listening process in a Jupyter notebook that lets annotators view and listen to clips selected by the sampling criteria in an interactive environment and label them accordingly.
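As a rough illustration of how clips might be queued for listening, the sketch below assumes the sampling criterion is simply the model's confidence score and picks the highest-scoring clips per species. The function name, the tuple layout, and the file names are hypothetical and are not taken from the authors' notebook.

```python
from collections import defaultdict


def top_clips_per_species(predictions, n_per_species=3, min_score=0.0):
    """Return the highest-scoring clips for each species, best first.

    predictions is an iterable of (clip_id, species, score) tuples;
    this layout is an assumption made for the example.
    """
    by_species = defaultdict(list)
    for clip_id, species, score in predictions:
        if score >= min_score:
            by_species[species].append((score, clip_id))
    return {
        species: [clip_id for _, clip_id in sorted(scored, reverse=True)[:n_per_species]]
        for species, scored in by_species.items()
    }


# Hypothetical model output: (clip, predicted species, confidence)
predictions = [
    ("clip_001.wav", "barred owl", 0.97),
    ("clip_002.wav", "barred owl", 0.41),
    ("clip_003.wav", "wood thrush", 0.88),
    ("clip_004.wav", "barred owl", 0.76),
    ("clip_005.wav", "wood thrush", 0.12),
]
review_queue = top_clips_per_species(predictions, n_per_species=2, min_score=0.2)
```

An annotator would then listen to each queued clip and record a label; other criteria, such as random sampling within score bins, would slot into the same structure.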
Python for Biometric Fingerprint Generation Using Generative Adversarial Networks (GAN)
The primary challenge for a cybersecurity system is knowing what type of attack to expect. This knowledge helps to strengthen the process of setting up precautionary systems for mitigating attacks. For machine learning based security systems, knowledge of the possible attacks helps researchers to create robust training data for the model. The training dataset must contain features that have been sampled from a wide array of attack scenarios. However, there is no guarantee of finding large training datasets for a specific problem domain, and creating datasets is a manual task involving a lot of time and effort.
The GAN provides a means of creating realistic samples that could serve as large datasets for cybersecurity studies. Using the GAN, one can train on and generate an unlimited number of samples from various problem domains across different security attack types. Studies have shown that machine learning based security systems are able to accurately detect novel adversarial attacks after training on GAN-generated samples.
The research goal here is to use the GAN to learn the encoding and embedding space. The GAN is then used to generate encoded samples. To generate synthetic fingerprint data, our GAN model makes use of two deep convolutional neural networks. The first is the discriminator D, a conventional convolutional neural network with a sigmoid function at the output node. The second is the generator G, which is responsible for the actual generation of the synthetic fingerprints; it is a deep convolutional neural network that makes use of the convolution operation. The two networks play a minimax game against each other, each trying to maximize its own payoff. The source code for the GAN is written in Python using the PyTorch deep learning library.
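The minimax game can be made concrete with a small numeric sketch. It assumes the standard GAN objective, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize; the abstract does not state which loss the authors actually used, and the probabilities below are made-up values.

```python
import math


def gan_value(d_real, d_fake):
    """Estimate V(D, G) from discriminator outputs (sigmoid probabilities).

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from fake well scores a high value...
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# ...while one that is fully fooled outputs 0.5 everywhere.
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The fooled case evaluates to -2 log 2, the value reached when generated samples are indistinguishable from real ones under this objective.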
Early Learning and Python: A look into learning Python in K-12 education
Technology has rapidly become a part of everyday life, and along with it, coding is becoming a more accessible skill to learn at younger ages. Everyone has a different approach to learning, and age can play a role in which approach is used. For example, younger children are more impressionable, and their minds are like sponges, letting them take in information about Python in quicker succession. Python is often regarded as one of the best languages for introducing beginners to the world of programming. However, not everyone is introduced to Python at the same age. In this poster session, we look at the different teaching methods used at different grade levels in K-12 education and analyze them.
PythonMatics: Simplifying Mathematics Learning with Python for non-science students
Pure mathematics has always been perceived as hard by students everywhere, and efforts have been made to simplify the subject through a variety of teaching methods. Computing is all about problem-solving and has been shown to support the teaching of different subjects. In this proposal, I shall showcase a Python-based teaching tool to improve the learning of mathematics for science and non-science students. The solution is a workbook on Introductory Mathematics with Python, aimed at offering a new and appealing approach to learning mathematics for both groups. The work also addresses how to increase students’ engagement, participation, and understanding, particularly when lessons are delivered online. A substantial part of the project is devoted to developing study materials, built with open-source software, for an introductory course in mathematics for computer science and other fields. The materials are organized as a set of Jupyter notebooks hosted in an open GitHub repository. The notebooks deal with the fundamental concepts of pure mathematics, from set theory to algebra and real analysis and up to number theory, with their applications to everyday life, offering examples of what can be done with a few lines of Python code. In the notebooks I propose activities to observe phenomena, describe problems, experiment, acquire and analyze data, and model the behavior of systems. The material will help undergraduates, high-school students, and everyone else struggling with pure mathematics, as it contains materials for lectures, guided laboratory sessions, and other necessary materials to prepare a ready-to-use application available to all levels.
From Dataset to Features: A Python-Based Evolutionary Approach
In multi-label classification, each instance is assigned to a group of labels. Due to its expanding use in applications across domains, multi-label classification has gained prominence in recent years. Feature selection is one of the most common and important preprocessing steps in machine learning and data mining tasks. Removing highly correlated, irrelevant, and noisy features can increase the performance of an algorithm and reduce computation time. Recently, a Black Hole metaheuristic algorithm, inspired by the phenomenon of black holes, has been developed.
In this poster, we present a modified standalone Black Hole algorithm, obtained by hybridizing the standard Black Hole algorithm with two Genetic Algorithm operators, namely the Crossover and Mutation operators. The synergistic combination of Black Hole and Genetic Algorithms can be used to solve multi-label classification problems in different domains.
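The two Genetic Algorithm operators can be sketched on binary feature masks, where a 1 means the feature is kept. This is only a generic illustration: one-point crossover and bit-flip mutation are assumed here, and the abstract does not say which variants the authors used or how they are wired into the Black Hole update.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover of two equal-length binary feature masks."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


rng = random.Random(0)  # seeded only to make the example repeatable
star_a = [1, 0, 1, 1, 0, 0]  # candidate solutions ("stars") as feature masks
star_b = [0, 1, 0, 0, 1, 1]
child_a, child_b = crossover(star_a, star_b, rng)
mutated = mutate(child_a, rate=0.1, rng=rng)
```

In a hybrid scheme, offspring like these would be scored by a classifier-based fitness function, and the best candidate would play the role of the black hole.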
Visualisation of Wind Tunnel Data Obtained at Seven Times the Speed of Sound
Traditionally, large datasets from wind tunnel experiments have been processed and visualised using MATLAB. However, there is a need for an open-source alternative to make such tools accessible to a wider community, particularly to young scientists and engineers. In this poster, large experimental datasets obtained in a wind tunnel at a flow speed of Mach 7, i.e. seven times the speed of sound, are analysed and visualised using Python. The steps of data processing, analysis, and visualisation are demonstrated, and the results are compared to the corresponding results obtained from MATLAB. The strengths and weaknesses of Python and its libraries are discussed with regard to data reduction, signal processing, ease of coding, and quality of visualisation options. The wind tunnel experiments involve the use of pressure-sensitive paint (PSP) in conjunction with a high-speed camera, a technique that allows for the determination of oxygen partial pressure on a surface in the flow. A relative concentration, ranging from 0% to 100%, is constructed in a pixel-by-pixel fashion from the high-speed videos. The video data are treated with a stabilisation algorithm to remove the jitter that stems from the tunnel’s movement during the experiments. The data are then sent through several signal processing loops to reduce the noise. The frames are time-averaged over approximately 30 ms, and the final quantity, the relative concentration of the gas (e.g. nitrogen or helium), is obtained and visualised in static plots and videos. The aim of this talk is to demonstrate that Python and its libraries are capable of producing scientific-quality visualisations of aerodynamics or wind tunnel data and to persuade researchers, teachers, and students in this field to utilise this open-source resource.
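The time-averaging step can be sketched in plain Python. The example assumes non-overlapping windows and represents each frame as a flat list of pixel intensities; the frame rate, window length, and pixel values are illustrative only, and a real pipeline would more likely operate on NumPy arrays.

```python
def time_average(frames, frame_rate_hz, window_s=0.030):
    """Average consecutive frames in non-overlapping windows of about window_s seconds.

    frames: list of frames, each a flat list of pixel intensities.
    Trailing frames that do not fill a whole window are dropped.
    """
    per_window = max(1, round(frame_rate_hz * window_s))
    averaged = []
    for start in range(0, len(frames) - per_window + 1, per_window):
        block = frames[start:start + per_window]
        # zip(*block) groups the same pixel across all frames in the window
        averaged.append([sum(pixel) / per_window for pixel in zip(*block)])
    return averaged


# Four tiny two-pixel frames recorded at a made-up 100 Hz, averaged over 20 ms
frames = [[0.0, 10.0], [2.0, 14.0], [4.0, 18.0], [6.0, 22.0]]
smoothed = time_average(frames, frame_rate_hz=100, window_s=0.020)
```

With the 30 ms window mentioned above, a camera running at 10 kHz would contribute 300 frames to each averaged image.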
GraphBLAS: Sparse Linear Algebra Gone Wild!
GraphBLAS is a new foundational library for sparse data and graph analytics. It fills the role of a more flexible scipy.sparse for sparse arrays that can be used to build faster NetworkX-like algorithms for network analysis. If you have sparse data or graph workloads that you want to scale and make faster, then this is for you.
GraphBLAS is designed to express graph algorithms in the language of linear algebra, and the primary implementation achieves state-of-the-art performance on many benchmarks. Because of its performance and expressiveness, it can be useful wherever data is sparse.
Come learn what makes GraphBLAS special and how to use it effectively. We will cover the mind-bending mathematical underpinnings and the unconventional Python syntax used by our Python library python-graphblas, along with many examples.
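To give a feel for what "graph algorithms in the language of linear algebra" means, the sketch below writes breadth-first search as repeated sparse vector-matrix products over a boolean (OR, AND) semiring. It deliberately uses plain Python sets and dicts rather than the python-graphblas API, so none of the names here belong to that library.

```python
def bfs_levels(adjacency, source):
    """Breadth-first search as repeated sparse vector-matrix products.

    adjacency maps each node to the set of its out-neighbours, i.e. a
    sparse boolean matrix stored row by row. The frontier is a sparse
    boolean vector stored as a set of node ids.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # frontier x adjacency over the (OR, AND) semiring
        reached = set()
        for node in frontier:
            reached |= adjacency.get(node, set())
        # Mask out nodes that already have a level assigned
        frontier = reached.difference(levels)
        for node in frontier:
            levels[node] = depth
    return levels


graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}
levels = bfs_levels(graph, source=0)
```

Each loop iteration is one matrix-vector product followed by a mask, which is the kind of operation a GraphBLAS implementation is built to execute efficiently on large sparse inputs.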
Exploring Single-Cell RNAseq Data Using Python
Today it is possible to obtain genome-wide transcriptome data from single cells using high-throughput sequencing. Single-cell RNA sequencing (scRNA-seq) has significantly advanced our knowledge of biological systems. The great potential of this technology has motivated computational biologists to develop a range of analysis tools. However, until recently most analysis tools were developed in the R programming language.
Recently, a flourishing body of Python-based computational tools has made it easier to robustly analyze single-cell -omics datasets in a scalable and reproducible way. Here we will dive into an analysis of a single-cell RNA-sequencing dataset with Scanpy and scvi-tools, two popular Python libraries for general-purpose analysis tasks.
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells. scvi-tools (single-cell variational inference tools) is a package for probabilistic modeling of single-cell omics data, built on top of PyTorch and AnnData. The package hosts implementations of several models that perform a wide range of single-cell data analysis tasks, as well as the building blocks to rapidly prototype new probabilistic models.
The goal of this poster is to help PyCon attendees from all backgrounds feel empowered to explore scRNA-seq data. Specifically, we hope attendees leave with the ability to:
- Understand a general workflow for dealing with scRNA-seq data
- Anticipate and avoid some of the most common pitfalls in scRNA-seq analysis
- Build intuition around the tradeoffs inherent in analytical choices
- Feel comfortable and confident working with current Python-based tools for single cell analysis
- Know where to find additional information and assistance
Syne - A Python Based Alzheimer's Detection Assistant
Today, in the USA alone, there are over 5,800,000 citizens over the age of 65 living with Alzheimer's. Alzheimer's costs roughly $305 billion annually, and estimates project this figure to increase to as much as $1.1 trillion per year by 2050. Most patients with Alzheimer's today are diagnosed at the mild dementia stage, only after they have already begun to experience significant memory and thinking issues. However, if all Americans alive today who will develop Alzheimer's were diagnosed earlier, when they have mild cognitive impairment, it would save the US $7.9 trillion. Although there is no cure for Alzheimer's, early diagnosis brings many benefits for the healthcare system, patients, and their families. In addition to the cost savings for patients and the government, early diagnosis enables patients to access treatment options earlier, giving them a greater chance of benefiting from new treatments and the possibility of enrolling in clinical trials for new therapies. Additionally, with a diagnosis, patients can choose to adjust their lifestyle habits to slow cognitive decline and maximize the time they spend with their friends and family. Thus, I decided that there needed to be a solution that enables aging individuals (those most susceptible to dementia) to have their cognitive health screened and monitored in an innovative fashion.
Syne is a screening and data processing platform for cognitive impairment monitoring. Syne helps HCPs screen for changes in cognitive impairment as aging patients are routinely tested over time. Building Syne involved both hardware and software components, including an OpenBCI device, Google Colab, and WordPress, to create a more comprehensive screening and data analysis platform.
Webifying UncertainSCI Using Trame
UncertainSCI non-invasively computes forward model uncertainty propagated from input variability. It uses weighted approximate Fekete points (WAFP) to parsimoniously sample input parameter space, and polynomial chaos emulators (PCE) to accurately estimate model variation of complex computer simulations. UncertainSCI has been developed for use in 3D bioelectric field simulation to work with 3D biophysical models, balancing accurate multivariate analysis with computation cost.
In this presentation, we will describe UncertainSCI, how we used trame to build web apps and web services, and how users can use UncertainSCI to perform UQ on their modeling pipelines.
Determining the Fairness of School Zones in Oklahoma City
The purpose of this project is to research the fairness of OKC public school zones. The aim of my research is to determine whether OKCPS zones are drawn fairly in relation to race, household income, and population distribution. In the fall of 2019, OKCPS consolidated and reconfigured schools as a part of their “Pathway to Greatness” (PTG) program. After the schools were consolidated, some community members became upset. Teachers and parents have expressed concerns over the negative effects of PTG; examples include late buses to school and overcrowded classrooms. An issue of equity regarding fair school zones is present in OKCPS. The Metric Geometry and Gerrymandering Group (MGGG) has developed a Python package known as GerryChain, which is used to detect instances of gerrymandering in districts. The group has used their techniques to find instances of gerrymandering in Virginia. Python packages such as GeoPandas and MAUP were used to create a Geographical Information System dataset for all the school zones in OKC. Mathematical constraints for school capacity and school allocation were developed and added to GerryChain. Python and QGIS assisted in producing and visualizing preliminary results of GerryChain on OKCPS zones. The methods developed will help analyze whether the current OKCPS zones are fair. The final goal of the research is to develop code for generating new and better proposals of school zone partitions. My research will help ensure K-12 students in OKC schools are not being put at a disadvantage. The techniques developed will help judge the fairness of the redrawing of school zones in response to school consolidation and reconfiguration. The research will also provide a basis for other school districts having to grapple with consolidation.
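A school-capacity constraint of the kind described can be pictured as a yes/no check on a proposed zoning plan. The sketch below is a standalone illustration with hypothetical block names, enrolment counts, and capacities; it does not use GerryChain's own constraint interface, and the project's actual constraints are not shown in the abstract.

```python
def within_capacity(assignment, students, capacity):
    """Return True if no school is assigned more students than it can hold.

    assignment: census block -> school zone
    students:   census block -> number of school-age children
    capacity:   school zone  -> maximum enrolment
    """
    enrolled = {}
    for block, school in assignment.items():
        enrolled[school] = enrolled.get(school, 0) + students[block]
    return all(count <= capacity[school] for school, count in enrolled.items())


# Hypothetical plan: three blocks split between two schools
assignment = {"block_a": "north", "block_b": "north", "block_c": "south"}
students = {"block_a": 120, "block_b": 200, "block_c": 310}
capacity = {"north": 350, "south": 300}
plan_is_valid = within_capacity(assignment, students, capacity)
```

A chain of proposed plans could apply a check like this at every step and discard any plan that overfills a school.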
Development of a Novel Non-invasive Smartphone-Based Blood Components Estimation Technique Using Python
Measurements of blood components such as hemoglobin and glucose are essential for monitoring one's health condition. Abnormal hemoglobin and glucose levels can result in severe diseases such as anemia and diabetes. Currently, blood components are measured invasively, which is painful and uncomfortable for patients. Non-invasive technology can overcome these shortcomings and is popular in smart healthcare. Recently, smartphones have integrated built-in sensors that allow point-of-care health tools to be developed using photoplethysmogram (PPG) signals. Improvements in deep learning powered by state-of-the-art Python packages, combined with smartphones, may assist in resolving this issue.
This poster proposes an integrated pipeline, consisting of five specialized modules, to estimate hemoglobin and glucose levels from smartphone PPG signals extracted from fingertip videos. First, the Frame Extraction Module records a 10-second fingertip video using the smartphone's camera and extracts 300 frames from it. Then, the PPG Signal Module takes the series of frames as input and applies our PPG signal generation algorithm to identify the region of interest and calculate the PPG value for each frame. The PPG signal is generated from the red channel, and a Butterworth bandpass filter is applied to reduce motion artifacts. After that, the PPG Features Module extracts characteristic features from the PPG signal, its derivative, and Fourier-transformed signals. Next, the Estimation Module estimates blood components from the extracted features using deep neural models. Finally, the Result Presenting Module sends the results to the end user through a smartphone-based application.
This system is built with several Python packages, including TensorFlow/Keras for deep learning, scikit-learn for machine learning, OpenCV for image processing, SciPy for signal processing, minepy for MIC feature selection, and Chaquopy for integration with Android Studio. We believe this poster will demonstrate the effective utilization of Python to open-source practitioners.
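The step from frames to a raw PPG signal can be sketched as follows. This assumes the PPG value of a frame is the mean red-channel intensity over the region of interest, which is a common simplification and not necessarily the authors' algorithm; frames are given as nested lists of (R, G, B) tuples purely for illustration.

```python
def frame_rate(n_frames, duration_s):
    """Frames per second implied by a clip length and frame count."""
    return n_frames / duration_s


def ppg_value(frame):
    """Mean red-channel intensity of one frame given as rows of (R, G, B) pixels."""
    reds = [pixel[0] for row in frame for pixel in row]
    return sum(reds) / len(reds)


def ppg_signal(frames):
    """One PPG sample per frame, with the mean removed to centre the signal."""
    raw = [ppg_value(frame) for frame in frames]
    mean = sum(raw) / len(raw)
    return [value - mean for value in raw]


# Two tiny made-up frames, each a single row of two pixels
frames = [
    [[(200, 0, 0), (210, 0, 0)]],
    [[(220, 0, 0), (230, 0, 0)]],
]
signal = ppg_signal(frames)
fps = frame_rate(300, 10)  # 300 frames from a 10-second video
```

The centred signal would then go through the bandpass filter; in SciPy that is typically designed with `scipy.signal.butter`, with cut-off frequencies chosen around the expected heart-rate band.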
Simulating Cricket Match in Python
Cricket, with a fanbase of over 2 billion, is the second most popular sport in the world after soccer. Unfortunately, like other sporting events, cricket matches were heavily affected during the Covid-19 pandemic. This simulation project, built in Python, tries to replicate real cricket matches virtually.
In cricket, a team has eleven players. Each ball or delivery in cricket is an event, and the cumulative results of these events (runs or wickets) decide the winner. This project picks two teams and gathers statistics or career profiles for each player using the ESPNcricinfo API. The result of each ball or delivery is simulated based on the player's career statistics. The live score is streamed based on the result of each ball, and the winner is decided based on the cumulative runs.
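The ball-by-ball idea can be sketched as a weighted random draw. The profile below is a made-up distribution of outcomes per delivery, standing in for statistics that the project derives from career data; the simplification that only the batter on strike matters, and all names, are assumptions of this example.

```python
import random


def simulate_ball(profile, rng):
    """Draw one delivery outcome: a run count, or "W" for a wicket."""
    outcomes = list(profile)
    weights = [profile[outcome] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def simulate_innings(batting_profiles, balls=120, rng=None):
    """Simulate an innings until the balls run out or ten wickets fall.

    batting_profiles[i] is the outcome distribution of the i-th batter;
    the next batter comes in whenever a wicket falls.
    """
    rng = rng or random.Random()
    runs = wickets = 0
    for _ in range(balls):
        if wickets == 10:
            break
        outcome = simulate_ball(batting_profiles[wickets], rng)
        if outcome == "W":
            wickets += 1
        else:
            runs += outcome
    return runs, wickets


# Hypothetical per-ball weights: dot balls, singles, twos, fours, sixes, wickets
profile = {0: 40, 1: 30, 2: 8, 4: 12, 6: 5, "W": 5}
team = [profile] * 11
runs, wickets = simulate_innings(team, balls=120, rng=random.Random(7))
```

Running the same function for both teams and comparing the run totals gives a winner; a fuller simulation would also account for bowlers, strike rotation, and run chases.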
aiomonitor-ng: Improving debuggability of complex asyncio applications
The keys to debugging are observability and reproducibility. Despite a series of improvements to the asyncio stdlib over the last few years, it is still challenging to see what’s happening in complex real-world asyncio applications. In particular, when multiple asyncio libraries and your own code are composed together, it is hard to track down silently swallowed cancellations and resource-hogging floods of tasks triggered by the internals of 3rd-party callbacks. Moreover, such misbehaviors are often observed only in production environments where the app faces the actual workloads and I/O patterns, making them even harder to reproduce.
In this talk, I present an improved version of aiomonitor, called aiomonitor-ng (next generation). The original aiomonitor provides live access to a running asyncio process using a telnet socket and a basic REPL to inspect the list of tasks and their current stacks. After it helped me several times with production debugging, I added more features to help track the above issues in asyncio apps running in production: a task creation tracker and a termination tracker. These trackers keep the stack traces whenever a new task is created or terminated, and provide a holistic view of chained stack traces when the tasks are nested to arbitrary depths.
aiomonitor-ng also demonstrates a rich async TUI (terminal UI) based on prompt toolkit and Click, with auto-completion of commands and arguments, far enhancing the original version’s simple REPL.
With the improved aiomonitor-ng, I have successfully debugged several production bugs. I hope this talk will help our fellow asyncio developers build more complex yet stable applications at scale.
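The idea behind a task creation tracker can be sketched with the standard library alone. The example installs a custom task factory through `loop.set_task_factory()` and records the call stack each time a task is created. It is an illustration of the concept, not aiomonitor-ng's implementation, and the names `creation_stacks` and `tracking_task_factory` are invented for it.

```python
import asyncio
import traceback
import weakref

# Maps each live task to the stack captured when it was created.
creation_stacks = weakref.WeakKeyDictionary()


def tracking_task_factory(loop, coro, **kwargs):
    """Create the task as usual and remember where it was created."""
    task = asyncio.Task(coro, loop=loop, **kwargs)
    # Drop the last frame, which is this factory itself.
    creation_stacks[task] = traceback.extract_stack()[:-1]
    return task


async def child():
    await asyncio.sleep(0)


async def parent():
    task = asyncio.create_task(child())
    await task
    # Names of the functions on the stack when the child task was created
    return [frame.name for frame in creation_stacks[task]]


async def main():
    asyncio.get_running_loop().set_task_factory(tracking_task_factory)
    return await asyncio.create_task(parent())


stack_names = asyncio.run(main())
```

Because the recorded stack includes `parent`, a monitoring tool can answer "who spawned this task?", and chaining such records gives the nested view described above.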
Introduction of sphinx-new-tab-link
I created a Sphinx extension, sphinx-new-tab-link. It lets you build HTML in which browsers open external links in new tabs. You can use sphinx-new-tab-link with easy configuration. I will introduce sphinx-new-tab-link to Sphinx users and library developers with a poster and demo.
It can be used with
I hope as many library developers as possible will know this library and use it to make their documentation more useful.
Unleashing the potential of Quantum Machine Learning (QML) using Python
This poster discusses the use of Python for Quantum Machine Learning (QML) research and presents work on using Quantum Algorithms (QAs) to learn image classification. Since QAs are not compatible with image data (unstructured data), the images are compressed into a lower-dimensional feature space using an encoder. A Variational Auto-Encoder, a state-of-the-art deep learning architecture that learns to encode images into latent features by learning to reconstruct them from lower-dimensional representations, is employed to encode images into feature vectors. The feature vectors are used to train a QSVM. The final accuracy achieved by the model is 76.4%.
Topic Models of Jihadist Dark Web Forums: Text Analysis in Python
In this project, I apply natural language processing, specifically Latent Dirichlet Allocation (LDA) topic models using Python (e.g., the nltk, spacy, and sklearn libraries), to uncover underlying topics and sentiments from three Dark Web forums of Jihadist terrorist groups and their supporters. The forums are dedicated to discussions on Islam and the Islamic world. Some forum members sympathize with and support terrorist organizations. Findings indicate that religion was the most prevalent topic in all forums. Forum members also discussed terrorism and terrorist attacks and support for the Mujahideen fighters. A few of the discussions were related to relationships and marriages, health, food, selling electronics, and fake identity cards. This project highlights the importance of finding topics and keywords from larger corpora using Python packages that can aid in real-time classification and removal of online terrorist content. Results support the importance of monitoring all Dark Web forums, including religious forums, for recruitment and radicalization content, as Jihadists use religion to justify their goals and recruit in such forums.
Building Machine Learning Microservices & MLOps using UnionML
The difficulty of transitioning from research to production is a prevalent issue in the machine learning development life cycle. An ML team may need to modularize and rework their code to work more effectively in production. Occasionally, depending on whether the application requires offline, online, or streaming predictions, this can necessitate re-implementing and maintaining feature engineering or model prediction logic in several locations.
In this session, the audience will learn about an open-source microframework for creating machine learning applications. UnionML, developed by the Flyte team, offers a straightforward, user-friendly interface for specifying the fundamental components of your machine learning application, from dataset curation and sampling to model training and prediction. Using these building blocks, UnionML automatically generates the procedures required to fine-tune your models and release them to production in various prediction use cases, such as offline, online, or streaming settings. There will be a live demonstration based on an end-to-end machine learning example written in Python.
We can look to the web for ideas while we consider a solution to this issue. For instance, the HTTP protocol, which provides a backbone of techniques with clearly defined but flexible interfaces, standardizes the way we move data across the internet. As machine learning systems proliferate across industries, we were interested in posing the question, "What if we could develop, automate, and monitor data and ML pipelines at scale?" https://github.com/unionai-oss/unionml
Sim2Real Transfer for Robots using Object Detectors and Python Libraries
High-fidelity simulations for drones and other aerial vehicles may appear incredibly lifelike, but it is difficult to learn control policies in a simulator and then apply them in the real world. One explanation is that real images, particularly on low-power drones, produce output that differs from that produced by simulated images, leaving aside for the moment the fact that, at the level relevant to machine learning, simulated worlds themselves look somewhat different from real ones. To get around this constraint, we concentrate on employing object detectors, which typically transfer well from simulation to the real world, and we extract features from the detected objects to feed into reinforcement learning algorithms.
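One way to picture the feature-extraction step is to turn detector output into a fixed-length vector that a reinforcement learning policy can consume. The detection format, the choice of features, and the function name below are all assumptions made for illustration; the abstract does not specify them.

```python
def detections_to_features(detections, image_width, image_height, max_objects=3):
    """Flatten the most confident detections into a fixed-length feature vector.

    Each detection is (x_min, y_min, x_max, y_max, confidence) in pixels.
    Each kept object contributes (centre_x, centre_y, width, height),
    all scaled to the range 0-1 so the values do not depend on resolution.
    """
    best = sorted(detections, key=lambda d: d[4], reverse=True)[:max_objects]
    features = []
    for x_min, y_min, x_max, y_max, _ in best:
        features += [
            (x_min + x_max) / 2 / image_width,
            (y_min + y_max) / 2 / image_height,
            (x_max - x_min) / image_width,
            (y_max - y_min) / image_height,
        ]
    # Pad with zeros so the policy always sees the same input size.
    features += [0.0] * (4 * max_objects - len(features))
    return features


# One hypothetical detection in a 400 x 200 image
observation = detections_to_features(
    [(100, 50, 300, 150, 0.9)], image_width=400, image_height=200, max_objects=2
)
```

Because the policy only sees box geometry, it is shielded from the pixel-level differences between simulated and real camera images.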
SSS: Building a Seven Segment Sign
Using a Raspberry Pi, 1152 seven-segment displays, and Python, we built a Seven Segment Sign (SSS). The project is designed to make something familiar and boring (seven-segment displays) into something remarkable. For this project to work, we needed tight integration of software and hardware. We designed a modular circuit board that combines hundreds of displays into one cohesive display. We created a Python library that abstracts all of the hardware details of the sign, making it easy for new Python developers to write applications for the SSS. We built an SSS simulator so people can develop applications without access to the hardware. We have created multiple games and demos to show what the SSS can do. And, of course, we even ported Doom. We make extensive use of Python generators in the design of our system. The system is modular, so it can receive input from multiple sources, such as a game controller or a smartphone over the network. We open-source our design (both hardware and software) and provide detailed documentation and tutorials.
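As a small taste of how a library can hide hardware details behind generators, the sketch below encodes digits as segment bitmasks and yields scrolling frames. The bit order (bit 0 = segment a through bit 6 = segment g) is a common convention assumed here, and none of the names come from the project's own library.

```python
# Segment bitmasks with bit 0 = a, bit 1 = b, ... bit 6 = g (assumed convention)
SEGMENTS = {
    " ": 0x00,
    "0": 0x3F, "1": 0x06, "2": 0x5B, "3": 0x4F, "4": 0x66,
    "5": 0x6D, "6": 0x7D, "7": 0x07, "8": 0x7F, "9": 0x6F,
}


def encode(text):
    """Translate a string of digits and spaces into one bitmask per display."""
    return [SEGMENTS[char] for char in text]


def scroll(text, width):
    """Yield successive frames that scroll text across `width` displays."""
    padded = " " * width + text + " " * width
    for start in range(len(padded) - width + 1):
        yield encode(padded[start:start + width])


frames = list(scroll("12", width=2))
```

An application can simply iterate over `scroll(...)` and hand each frame to whatever draws it, whether that is the physical sign or a simulator.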
Don't install Python: teaching programming, SQL, web UI and pygame completely in your browser (PyZombis)
PyZombis is an introductory programming course using Python that covers basic concepts as well as more advanced topics such as databases, user interfaces, and games.
Lectures are complemented with online activities like code visualization and interactive exercises. Chapters have challenges including a hangman and zombie chaser game.
Everything can be run in a browser (even offline), without needing to install Python locally or server-side!
Motivations:
- Produce simpler Open Educational Resources that can be easily adapted for teachers/students with diverse needs
- Zero-Footprint: avoid server operation costs and maintenance burden (ideal for schools without infrastructure or a good internet connection)
- Universal static website: no installation required (learn Python on cell phones, tablets, etc.)
This project is an adaptation of a successful Massive Open Online Course (MOOC) that had more than 120,000 enrolled students and a great completion rate: "Python para Zumbis" by Fernando Masanori.
Thanks to the Python Software Foundation, it has participated in Google Summer of Code 2019, 2021 and 2022 (under the Python Argentina Sub-Org). Several collaborators have contributed code from other countries: Venezuela, Mexico, Colombia and India.
Python is one of the most popular programming languages worldwide. It is popular with new programmers because it is open source and beginner-friendly. However, no programming language is immune to vulnerabilities. Careless programming can lead to multiple insecurities down the road when the program is run. These vulnerabilities can lead to attacks, bugs, and loss of information.
There are multiple types of insecurities that can appear in a Python program if the programmer has not taken the proper steps to secure it. These include insertion flaws, directory flaws, outdated dependencies, and many more. These flaws often result from programmers not taking the extra step to secure their code.
Our poster will educate beginner Python programmers on security in Python programming. It will show them multiple types of vulnerabilities common in Python programs and how to avoid such mistakes. Finally, it will show them how to sanitize and update their programs, as well as other ways to secure a Python program.
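A short example of one such flaw, assuming that "insertion flaws" refers to injection-style bugs such as SQL injection: the first function below builds a query by pasting user input into the SQL text, while the second passes the value separately as a parameter. The table and names are invented for the demonstration and use only the standard `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])


def find_user_unsafe(name):
    # Vulnerable: user input is pasted straight into the SQL text.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(name):
    # Safer: the value is passed separately, so it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


malicious = "nobody' OR '1'='1"
leaked = find_user_unsafe(malicious)   # the injected condition matches every row
blocked = find_user_safe(malicious)    # no user has this literal name
```

The same habit of keeping data separate from commands applies to shell calls and file paths as well.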
The creative art of algorithmic embroidery
The poster will contain examples that turn straightforward commands into elaborate and intricate artworks with loops, randomness and recursive functions using only the built-in turtle library in Python. We will also show how you can turn your art into embroidery patterns that are readable by an embroidery machine using the TurtleThread library and how you can use Python to create decorative ornaments for your Christmas tree. This poster is for anyone interested in the intersection between Python programming, creative coding and arts and crafts!
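To hint at how a few straightforward commands become an intricate pattern, the sketch below uses recursion to generate the drawing commands for a Koch curve. Producing commands as data, rather than drawing directly, is a choice made for this example so it runs without a window; the commands map onto the `forward` and `left` methods of the built-in turtle, and the TurtleThread library is not used here.

```python
def koch(length, depth):
    """Yield ("forward", distance) and ("left", angle) commands for a Koch curve."""
    if depth == 0:
        yield ("forward", length)
        return
    # Four shorter segments with turns of +60, -120 and +60 degrees between them
    for turn in (60, -120, 60, 0):
        yield from koch(length / 3, depth - 1)
        if turn:
            yield ("left", turn)


def draw(commands, pen):
    """Replay commands on any object with forward() and left() methods."""
    for name, value in commands:
        getattr(pen, name)(value)


commands = list(koch(90, depth=2))
```

Calling `draw(commands, turtle.Turtle())` replays the pattern on screen; each extra level of depth multiplies the number of line segments by four.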