VesselVio: Using Python to Develop an Open-Source Application for Vasculature Research

Briefly, vasculature datasets are loaded into the program and initially stored as numpy ndarrays. These arrays are skeletonized, and vessel centerlines and radii are extracted. The skeleton array is then converted into a network graph, which is processed according to user-defined settings to remove isolated vessels and endpoint segments. The graph is then analyzed and reduced to extract whole-network and individual segment features. Finally, the graphs can be converted into 3D meshes so that researchers can visualize their datasets.

The frontend, backend, and packaging of VesselVio were developed exclusively in Python. Backend image loading, annotation processing, and volume skeletonization rely on SimpleITK, nibabel, and scikit-image. Feature extraction and network processing rely on numpy, igraph, and geomdl. Multiprocessing and numpy parallelization rely on numba and the concurrent.futures module. The frontend of the application was built with PyQt5, and dataset visualization was implemented with PyVista. Finally, the application was packaged with PyInstaller. The application source code and website are both hosted on GitHub.
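The skeleton-to-graph step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not VesselVio's implementation (which relies on numpy and igraph): it assumes the skeleton is given as a set of `(z, y, x)` voxel coordinates with 26-connectivity, and the function names `skeleton_to_graph`, `classify_nodes`, and `remove_isolated` are hypothetical.

```python
from itertools import product

# All 26 neighbor offsets in a 3D voxel grid (the voxel itself is excluded).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def skeleton_to_graph(voxels):
    """Build an adjacency map from a set of (z, y, x) skeleton voxels."""
    voxels = set(voxels)
    graph = {v: set() for v in voxels}
    for z, y, x in voxels:
        for dz, dy, dx in OFFSETS:
            neighbor = (z + dz, y + dy, x + dx)
            if neighbor in voxels:
                graph[(z, y, x)].add(neighbor)
    return graph


def classify_nodes(graph):
    """Group nodes by degree: isolated (0), endpoints (1), branch points (3+)."""
    isolated = {v for v, nbrs in graph.items() if len(nbrs) == 0}
    endpoints = {v for v, nbrs in graph.items() if len(nbrs) == 1}
    branches = {v for v, nbrs in graph.items() if len(nbrs) >= 3}
    return isolated, endpoints, branches


def remove_isolated(graph):
    """Drop nodes that have no neighbors (a crude stand-in for vessel filtering)."""
    return {v: nbrs for v, nbrs in graph.items() if nbrs}


# Toy skeleton (made-up coordinates): a Y-shaped vessel plus one isolated voxel.
skeleton = {(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 3), (0, -1, 3), (5, 5, 5)}
graph = skeleton_to_graph(skeleton)
isolated, endpoints, branches = classify_nodes(graph)
filtered = remove_isolated(graph)
```

Endpoints and branch points found this way are the natural places to split the graph into individual vessel segments before per-segment features are computed.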

Integrating React in the Django way!

How to use React with Django?

The most common solution shown in tutorials and blogs is to develop two isolated projects, one for the frontend and one for the backend, with all interaction between them happening through APIs. This approach is used in large projects, which generally have dedicated backend and frontend teams. It may not suit individual or small-scale projects, however, for reasons such as deployment cost, the time required, and the loss of features provided by the Django framework (for example, forms).

A more Django-friendly approach is to serve a single HTML template and then let React take over. This approach gives you the liberty to use plain HTML and vanilla JavaScript for simple pages or forms, and to take advantage of React for highly interactive pages. Using HTML and vanilla JavaScript for the simple pages can reduce the frontend bundle size, which in turn reduces loading time and improves the overall user experience.

How this Package Helps You!

The package provides a command to set up a React app in either JavaScript or TypeScript. The resulting app is a Django app with a webpack configuration that supports CSS and SCSS. During development, the webpack dev server proxies the Django server. After the build script runs, the frontend code is bundled into a single JavaScript file that Django can serve from an HTML template. After setup, it is easy to install other npm packages and modify the webpack configuration to meet specific requirements. Since the package is used in development only, there is no need to add it to the requirements.txt file of the production environment.

Ensuring Inclusive Language in Speech and Writings using Python

Python packages and frameworks used
1. spaCy and Indic NLP Library: for natural language processing on English and Indian languages
2. SpeechRecognition: for extracting text from speech
3. PyTesseract: for extracting text from images
4. EasyOCR: for Hindi-language OCR
5. Streamlit: for creating a web interface

Our tool has the following use cases
1. It can check for inclusive language in presentation files (PPT, PPTX) and documents (TXT, DOC, PDF).
2. It can check for inclusive language in an audio file.
3. It can check for inclusive language in images, for example, advertisements.
4. It detects exclusionary words in Indian languages (Hindi and Kannada) and suggests alternative inclusive terms.
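Once text has been extracted from a document, audio file, or image, the check itself can be as simple as a dictionary lookup. The sketch below is a minimal illustration using only the standard library; the term list `SUGGESTIONS` and the function `check_text` are hypothetical and are not the authors' lexicon or implementation, which builds on spaCy and the Indic NLP Library.

```python
import re

# Hypothetical term list for illustration only; not the authors' lexicon.
SUGGESTIONS = {
    "chairman": "chairperson",
    "manpower": "workforce",
    "mankind": "humankind",
}

# One case-insensitive pattern that matches any listed term as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SUGGESTIONS)) + r")\b",
    re.IGNORECASE,
)


def check_text(text):
    """Return (matched term, suggested alternative, character offset) tuples."""
    return [
        (m.group(0), SUGGESTIONS[m.group(0).lower()], m.start())
        for m in PATTERN.finditer(text)
    ]


findings = check_text("The Chairman asked for more manpower.")
for term, suggestion, offset in findings:
    print(f"{offset}: consider '{suggestion}' instead of '{term}'")
```

A lookup like this ignores context and inflected forms, which is one reason a full NLP pipeline with tokenization and lemmatization is useful in practice.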

Using Python for Disease Variant Analysis

Variant, a term once known only to researchers in the biological sciences, is now quite familiar to the general public. The rise of new variants of the SARS-CoV-2 virus with novel mutations has become a topic of concern during the COVID-19 pandemic. How do researchers identify these variants from genomic data? How can Python be used in this analysis? This poster will address these questions.

Mutations in any organism are usually identified through a next-generation sequencing analysis step called variant calling. Variant calling writes its output in a specialized file format, the Variant Call Format (VCF). A VCF file carries metadata and information on thousands of mutations and is generally large. It is therefore challenging to extract information and identify mutations from such files, especially when there are hundreds of samples. The Python package scikit-allel provides utilities for exploring large-scale genetic variation data in VCF files and helps identify important mutations in downstream analysis. This package depends on scipy, matplotlib, seaborn, pandas, scikit-learn, h5py and zarr. After the mutations are identified, the next step is to visualize them in a meaningful way. This task may be simple for a small virus such as SARS-CoV-2, but it is complicated for eukaryotic organisms with multiple chromosomes, such as mouse or human. Another Python package, QMplot, is useful for visualizing thousands of mutations on each chromosome, making the extracted mutations easier for biologists to interpret. This package uses numpy, scipy, pandas and matplotlib.
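To make the VCF layout concrete, here is a minimal sketch that reads the first five fixed columns of each record using only the standard library. It is not scikit-allel's API; the helper names `parse_vcf` and `is_snv` are hypothetical, and the records in `VCF_TEXT` are made up for illustration.

```python
import io


def parse_vcf(handle):
    """Yield one dict per variant record from a VCF text stream."""
    for line in handle:
        line = line.rstrip("\n")
        # Lines starting with '#' are metadata ('##...') or the column header.
        if not line or line.startswith("#"):
            continue
        chrom, pos, variant_id, ref, alt = line.split("\t")[:5]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": variant_id,
            "ref": ref,
            "alt": alt.split(","),  # several alternate alleles may be listed
        }


def is_snv(variant):
    """True when the reference and every alternate allele are single bases."""
    return len(variant["ref"]) == 1 and all(len(a) == 1 for a in variant["alt"])


# Made-up records for illustration only.
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t250\t.\tCT\tC\t60\tPASS\t.\n"
    "chr2\t75\t.\tG\tA,T\t45\tPASS\t.\n"
)

variants = list(parse_vcf(io.StringIO(VCF_TEXT)))
snvs = [v for v in variants if is_snv(v)]
```

A line-by-line reader like this is fine for small files; for hundreds of samples, array-based tools such as scikit-allel are the more practical choice.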

Don’t let your data model `Drift` away!

Because the real world changes, production data may diverge, or drift, from the baseline data over time. Predictive modeling learns a model from historical data and applies it to new data for which we have no prior knowledge; as conditions change over time, model performance deteriorates. The ultimate criterion is the model quality metric, which might be as simple as accuracy or mean error rate, or a downstream business KPI such as click-through rate.

Monitoring and detecting these deviations from the training (historical) distribution is therefore critical to the health of deployed models: it ensures that they remain relevant in production and provide fair and unbiased predictions over time. If drift goes undetected, predictions may be incorrect, and business decisions based on them may suffer.

Model drift may be caused by a variety of factors, the most common of which are data drift, prior probability drift, and concept drift. Because these drifts entail a statistical change in the data, a variety of statistical and model-based measures, such as the Kullback-Leibler divergence and the Kolmogorov-Smirnov test, can be used to detect them.
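The two measures named above are short enough to write out directly. The following is a minimal sketch using only the standard library: `ks_statistic` computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs), and `kl_divergence` computes the Kullback-Leibler divergence between two discrete distributions, assuming the feature has already been binned into aligned histograms. The sample values are made up for illustration.

```python
import math
from bisect import bisect_right


def ks_statistic(baseline, production):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(baseline), sorted(production)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two aligned lists of bin probabilities."""
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0
    )


# Made-up feature values: the production sample is shifted upward.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
production = [0.6, 0.7, 0.8, 0.9, 1.0]
ks = ks_statistic(baseline, production)

# Made-up class proportions at training time and in production.
kl = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

In a monitoring job, values like these would be compared against an alert threshold chosen for the application, or the KS statistic would be converted to a p-value with a statistics library.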

The poster is an attempt to better understand model drift and how it may be monitored and evaluated in real time using a repeatable method, in order to minimize future mishaps.

Smart Document Recognizer Using Image Processing

To extract information from a document intelligently, we employed a two-part approach:
1. Key information is extracted using the CNN model described in the CUTIE research paper
2. Structured information in the form of tables is detected by the TableNet model

After preprocessing a document for OCR input, we use the CUTIE model to extract information in the form of key-value pairs. The deep learning (DL) model applies a CNN to gridded texts, where the texts are also added as features. It takes the semantic and positional distribution of texts into account to learn different document formats, and it detects key-value information robustly on a smaller dataset.

For detecting and extracting structured information, we replicated the TableNet model from scratch. Its authors used a DL object detection algorithm to detect tables and then extract information from the detected rows and columns. The proposed model is an encoder-decoder network that uses a pre-trained VGG-19 encoder and two decoder branches for pixel-wise detection of columns and tables. Sharing a single encoder is intended to help find active regions based on both column and table features.

To bring it all together, the next step is digitizing the documents into a spreadsheet or database. For serving the models, we used TensorFlow Serving, communicating over the gRPC protocol. Since ML tasks are computationally heavy, we used FastAPI, which has built-in support for async endpoints.

For ease of integration with new documents, we have built a UI tool for swiftly preparing the dataset instead of doing it manually. The extracted information is stored in the desired format, and the whole system can be spun up with the included Docker setup.

Development of a Novel Computer-Aided COVID-19 Diagnosis System Using Python

Diagnostics have proven to be a crucial step in fighting the COVID-19 pandemic. Chest X-ray (CXR) is a time- and cost-efficient modality that could potentially be used to diagnose COVID-19. Unfortunately, CXR is not considered a first-line option for diagnosing COVID-19 because of its low accuracy and its confounding with similar pneumonia cases. However, recent advances in deep learning, powered by state-of-the-art Python packages, may help overcome this issue.

In this poster, we propose an integrated pipeline, consisting of four specialized modules, to diagnose COVID-19 from CXRs. First, a View Classifier Module assigns an input CXR to one of two view positions: posterior-anterior (PA) or anterior-posterior (AP). Because these two view positions have different visual appearances, this classification mimics the evaluation a human radiologist performs before proceeding to the diagnosis decision. Then, a Lung Segmentation Module takes the input CXR and applies our encoder-decoder model to annotate the lung areas in which valid COVID-19 opacities can be found. After that, a COVID-19 Classification Module classifies and localizes the affected regions in the segmented lung image using a CNN model and Class Activation Mapping. Finally, a Report Generation Module takes the positions of the opacities, the area of the affected regions, and the confidence score as inputs and generates a final medical report using fuzzy inference.

This system is built with several Python packages, including TensorFlow/Keras for deep learning, scikit-learn for machine learning, OpenCV for image processing, SciPy for signal processing, DEAP for genetic algorithms, and SkFuzzy for fuzzy inference. We believe this poster will demonstrate the effective use of open-source Python packages in developing a robust medical diagnosis tool and allow us to connect with similarly interested Python practitioners.

Winning in the Casino by solving the Multi-Armed Bandit Problem

Imagine that you are in a casino, looking at several slot machines. Some machines distribute big payouts, others smaller ones. How do you choose which machines to play in order to maximize your winnings? The multi-armed bandit (MAB) is a classic reinforcement learning setup for modelling and solving problems like this. Algorithms designed to tackle it have been used in various real-world problems such as click-ad optimization, A/B testing, webpage ranking, recommendation systems, dialogue systems, anomaly detection, and portfolio design. With the recent proliferation of data science and machine learning applications, MAB is more relevant than ever, especially for aspiring data scientists and machine learning practitioners. In this poster we will show how to implement, from scratch in Python, two of the most successful algorithms for this problem, Thompson Sampling and Upper Confidence Bound (UCB), and how to run an experimental analysis of them. For our implementation and experiments, we use numpy, pandas and PyPlot.

Through this poster we hope the PyCon community will:

  1. Get acquainted with the MAB problem and how solutions to this problem are used in many real-world data science and machine learning applications.
  2. Learn how to implement algorithms to solve the MAB problem and perform comparative analysis in Python.
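As a taste of what such a from-scratch implementation can look like, here is a minimal sketch of both algorithms on Bernoulli slot machines. It uses only the standard library rather than the numpy and pandas stack the poster uses, and it assumes the common UCB1 variant with a Beta(1, 1) prior for Thompson Sampling. The payout probabilities and function names are made up for illustration.

```python
import math
import random


def pull(prob, rng):
    """Play one machine: reward 1 with probability `prob`, otherwise 0."""
    return 1 if rng.random() < prob else 0


def run_ucb1(probs, rounds, rng):
    """UCB1: play the arm with the highest mean reward plus exploration bonus."""
    n_arms = len(probs)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the bound
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        sums[arm] += pull(probs[arm], rng)
        counts[arm] += 1
    return counts


def run_thompson(probs, rounds, rng):
    """Thompson Sampling with a Beta(1, 1) prior on each arm's payout rate."""
    n_arms = len(probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(rounds):
        # Sample a plausible payout rate per arm and play the best sample.
        samples = [
            rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)
        ]
        arm = samples.index(max(samples))
        if pull(probs[arm], rng):
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]


# Made-up payout probabilities; the last machine is the best one.
PROBS = [0.1, 0.5, 0.9]
ucb_counts = run_ucb1(PROBS, 2000, random.Random(0))
ts_counts = run_thompson(PROBS, 2000, random.Random(0))
```

Each function returns how often every machine was played, so a comparative analysis can start by checking how quickly each algorithm concentrates its plays on the best machine.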

Privacy-preserving data sharing in clinical research using Python for a human centred design approach

Sharing clinical databases is a complex process that often entails legal contracts, which in turn limit the scope for harnessing emerging technologies such as machine learning. The ability to repurpose datasets for multiple research projects, funders, and other relevant stakeholders is in high demand. Differential privacy has been a popular approach in industry for addressing these challenges. However, privacy-preserving data sharing is yet to become standard in the clinical workflow. Clinical databases require domain knowledge and clinical understanding to be pre-processed into machine-readable datasets for machine learning. This research explores how a human centred design approach, together with Python-based tools, can help clinicians and digital health developers identify and create a workflow that is compatible with machine learning and preserves the privacy of patients.
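To illustrate the differential privacy idea mentioned above, the sketch below releases a patient count with Laplace noise, a standard mechanism for counting queries. The abstract does not say which mechanism or library the research uses, so this is an assumption for illustration only; the records, the `epsilon` value, and the function names are made up.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    Adding or removing one patient changes a count by at most 1
    (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up patient records for illustration only.
records = [{"age": age} for age in (34, 51, 67, 72, 45, 80, 29, 66)]
rng = random.Random(42)
noisy = private_count(records, lambda r: r["age"] >= 65, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy but noisier answers, which is the trade-off a clinical workflow would have to agree on with its stakeholders.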

Improving Natural Language Understanding in Online Discourse using Graph Machine Learning

Graph Machine Learning (GraphML) is a new branch of machine learning that deals with graph data. This poster discusses how recent advances in GraphML can be applied to natural language text and how they can improve natural language understanding of online discourse. Online discussions on social media take the form of tree or graph structures. Graph Machine Learning techniques are well suited to inferring these structures and deriving contextual information from online discussions.

The poster also discusses how recent Python packages such as Deep Graph Library can be used to implement these concepts in Natural Language Processing. Various graph deep learning models, such as Graph Convolutional Networks and Graph Attention Networks, can be used for this type of task. Improved natural language understanding can help detect derogatory arguments, hate speech, and abusive speech in online discourse, steer discussions in the right direction, and support automatic moderation of online discussions. In a nutshell, this poster aims to cover the basics of GraphML, how it can be leveraged to improve natural language understanding, and its applications.
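The way a reply tree supplies context can be shown with one round of mean neighbor aggregation, the message-passing step that graph convolutional layers build on. The sketch below is plain Python with no learned weights, so it is not a Graph Convolutional Network and does not use Deep Graph Library; the comment IDs, the two-dimensional features, and the function names are made up for illustration.

```python
def build_reply_graph(replies):
    """Turn (child_id, parent_id) reply pairs into an undirected adjacency map."""
    adjacency = {}
    for child, parent in replies:
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    return adjacency


def aggregate(adjacency, features):
    """Replace each node's vector with the mean over itself and its neighbors."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[n] for n in sorted(neighbors)]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


# Made-up thread: two comments reply to the post, and c3 replies to c1.
replies = [("c1", "post"), ("c2", "post"), ("c3", "c1")]

# Made-up two-dimensional text features for each message.
features = {
    "post": [1.0, 0.0],
    "c1": [0.0, 1.0],
    "c2": [1.0, 1.0],
    "c3": [0.0, 0.0],
}

adjacency = build_reply_graph(replies)
context = aggregate(adjacency, features)
```

After aggregation, the vector for `c3` also reflects the comment it replies to, which is the kind of contextual signal a trained graph model can exploit.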

Using Python Libraries to Predict, Optimize, and Provide End Users Decision Confidence

Our work demonstrates a code design that integrates a predictive model trained with the Python library PyCaret into an optimization model built with Pyomo. Using a publicly available dataset, we predict window breakage from manufacturing process settings. Then, using the Pyomo algebraic modeling language (AML), we optimize the process settings to minimize the breakage rate. This case study is useful because decision-makers, such as the manufacturing technician who must choose the process settings, often want to know how an outcome (e.g., window breakage) might change if the “optimal” settings are not used. Our Python code design shows how to efficiently integrate the predictive model into the optimization model and then show an outcome distribution based on how the user might want to change an input parameter. Readers ranging from data scientists to developers may find our example a useful case to extend to their own problems.

Multi-Agent Learning and Surviving the Zombie Apocalypse

In most scenarios, zombies are not intelligent creatures. While their sole focus is on the destruction of humanity, let’s be honest – they don’t typically employ the best strategies to meet this goal. If granted intelligence, imagine how zombies and humans might adapt to each other’s survival strategies!

This poster presentation will enable audience members to utilize Python, Pygame, and the NEAT-Python package for multi-agent learning, neural networks, and deep learning applications. The presentation will detail the process and outcomes of a multi-agent learning simulation between zombies, soldiers, and civilians, all with different win incentives. Actor decisions are determined by custom, Python-language neural networks, while the machine learning environment utilizes the graphics libraries in Pygame.

In the multi-agent simulation, the zombies’ goal is to infect the entire human population. The soldiers’ objective is to protect civilians by identifying and eliminating targets, and civilians are merely trying to stay human. Agent decisions are made every 20 milliseconds and include where to move in the playing field, which agent(s) to target, which agent(s) are threats, etc.

This poster presentation will detail the development and results of this simulation, show how agent strategies evolved through training iterations, and explore how each team’s learning impacted other teams.

The results of the simulation have implications for multi-agent training and show how simultaneous learning impacts the decision making of autonomous actors. While demonstrated here in a fun, post-apocalyptic scenario, multi-agent learning is a research and development priority across government, academia, and industry, with direct applications to autonomous commercial vehicles, drone swarms, autonomous warfare, financial market prediction algorithms, and more.

Audience members should have a basic understanding of Python, and a desire to use Python for innovation in multi-agent learning.