A Framework for Exploratory Data Analysis with Python

Tony Ojeda, Sasan Bahadaran

Exploratory data analysis (EDA) is an important pillar of data science, a critical step required to complete every project regardless of the domain or the type of data you are working with. Yet, analysts and developers often resort to grasping at straws when it comes to their process for exploring their data and trying to find insights. Having witnessed the lack of structure in conventional approaches, I decided to document my own process and come up with a formal framework for data exploration with Python. This poster features both the resulting framework and the Python tools and libraries one can use with it.

Coconut: A Novel Language for Functional Programming in Python

Evan Hubinger

Coconut is a functional programming language that compiles to Python. All valid Python 3 is valid Coconut, and Coconut compiles to universal, version-independent Python. Thus, using Coconut only extends what you can already do in Python, adding simple, elegant, Pythonic functional programming.

* Coconut improves the code it compiles, optimizing away tail calls and producing universal code that runs the same on any Python version, making the 2/3 split a thing of the past.
* Coconut extends Python syntax with new constructs for pattern matching, algebraic data types, partial application, lazy lists, and much more, making functional programming in Python beautiful and elegant in a truly Pythonic style.

Functional programming in Python is often difficult: some things work but are ugly (lambda, functools.partial) and many things don't work at all (tail call optimization, pattern matching, algebraic data types). Coconut solves this problem, doing for functional programming what Python did for imperative programming and bringing the powerful tools of modern functional programming to Python programmers everywhere.
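The abstract lists tail call optimization among the things that don't work at all in plain Python. The sketch below, written in ordinary Python rather than Coconut syntax, shows what that means: a tail-recursive function exceeds CPython's recursion limit, while the hand-written loop (what tail-call elimination effectively produces) does not. It illustrates the problem only; it is not Coconut's compiler output.

```python
import sys


def total(n, acc=0):
    """Tail-recursive sum of 1..n: the recursive call is the last operation."""
    if n == 0:
        return acc
    return total(n - 1, acc + n)


def total_loop(n, acc=0):
    """The same function after manual tail-call elimination: the call becomes a loop."""
    while n != 0:
        n, acc = n - 1, acc + n
    return acc


deep = sys.getrecursionlimit() + 100
try:
    total(deep)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)                                   # True: CPython keeps one frame per call
print(total_loop(deep) == deep * (deep + 1) // 2)   # True
```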

Desmod – A Python-based modeling environment

Pete Grayson

Discrete event simulation is a powerful tool for modeling highly dynamic systems, and Python is a great language for rapid development. The open source Desmod package, based on the excellent SimPy package, provides a complete environment for modeling and simulating such systems. Desmod aims to rival commercial modeling environments based on SystemC in features and performance while offering, of course, the superior development ecosystem of Python.
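Discrete event simulation advances a clock from one scheduled event to the next instead of ticking through fixed time steps. The minimal sketch below shows that core loop with a `heapq`-based event list for a single FIFO server. It is a generic illustration with made-up arrival and service times, not Desmod's or SimPy's API (SimPy expresses the same kind of model with generator-based processes).

```python
import heapq
from collections import deque
from itertools import count


def simulate(arrivals, service_time):
    """Single-server FIFO queue driven by a time-ordered event list.

    arrivals: arrival time of each job; service_time: fixed time per job.
    Returns a dict mapping job index -> completion time.
    """
    seq = count()  # tie-breaker so events at equal times pop in insertion order
    events = [(t, next(seq), "arrive", job) for job, t in enumerate(arrivals)]
    heapq.heapify(events)
    waiting, busy, done = deque(), False, {}
    while events:
        now, _, kind, job = heapq.heappop(events)  # jump the clock to the next event
        if kind == "arrive":
            waiting.append(job)
        else:  # "depart": the server finished a job
            done[job] = now
            busy = False
        if not busy and waiting:
            # Start the next waiting job and schedule its departure event.
            busy = True
            heapq.heappush(events, (now + service_time, next(seq), "depart", waiting.popleft()))
    return done


print(simulate([0.0, 1.0, 1.5, 10.0], service_time=2.0))
# {0: 2.0, 1: 4.0, 2: 6.0, 3: 12.0}
```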

Django REST framework

Tom Christie

An overview of new and upcoming features in Django REST framework, including:

* Building real-time APIs using WebSockets.
* Making the most of the automatic API documentation.
* Using the JavaScript and Python client libraries.

Earning with a capital L

Charles Cossé, Kirby Urner

### Audience

* Current and aspiring Edu-FLS developers
* Parents and teachers

### Objectives

* Discuss a Django-powered ecosystem that compensates Edu-FLS developers
* Explain the platform's benefits from parent, child, and developer perspectives
* Generate interest and discussion within the Python community
* Interact with interested Python community members

### Abstract

Requiring kids to earn their internet access from a self-serve education platform can be an effective way to supplement their education and develop a host of other life skills, such as time management, accountability, and attention to detail. The desire for credits serves as a single point of motivation that can be harnessed to teach any subject matter through well-designed activities. Enabling parents to control the distribution of a subscription fee to activity developers of their choice keeps the user and developer communities connected, with value flowing in both directions. Requiring FLS licensing and usability outside of this Django-powered credit-earning platform stimulates development of Edu-FLS software, which in turn empowers children everywhere.

### Outline

* Large infographic with sub-graphics illustrating individual aspects of the platform
* Elaboration of potential subtle benefits delivered through applications
* Elaboration of subtle benefits delivered through the platform
* Discussion of unrealized potential in all subject areas
* Historical account of platform development

Educational framework for constructing black-box optimization methods based on Python modules

Nadia Udler

Many pressing real-world problems can be stated as problems of global optimization, where the target function is given by values only (a black-box function). They range from robot parameter tuning and optimization of chemical processes to biological systems analysis. Typical examples from financial engineering include model calibration, pricing, hedging, VaR/CVaR computation, credit rating assignment, forecasting, and strategy optimization in electronic trading systems. The choice of algorithm for decision making should depend on the type of the objective function and the availability of a priori information, as well as user constraints such as budget. Such problems are best approached with a library of optimization methods that helps study the nature of the problem. We use Python, SciPy, and scikit-learn to create such a library. This approach also has great educational value, giving students a general roadmap rather than a collection of unrelated heuristic methods, which is essential for understanding and effectively using iterative optimization algorithms in general.
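A black-box method only needs to evaluate the target function at chosen points. The sketch below is one of the simplest such methods, a (1+1) evolution strategy with the classic 1/5th success rule for step-size control, with the evaluation budget exposed as a parameter. It illustrates the method class and is not code from the framework described above; SciPy's `scipy.optimize.minimize(..., method="Nelder-Mead")` is a ready-made derivative-free alternative.

```python
import random


def one_plus_one_es(f, x0, sigma=0.5, budget=2000, seed=0):
    """(1+1) evolution strategy: minimizes f using function values only.

    f: black-box objective; x0: starting point (list of floats);
    budget: maximum number of evaluations of f (the user's cost constraint).
    Returns (best point, best value).
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    evals = 1
    while evals < budget:
        y = [xi + rng.gauss(0.0, sigma) for xi in x]  # Gaussian mutation
        fy = f(y)
        evals += 1
        if fy <= fx:
            x, fx = y, fy
            sigma *= 1.5            # success: enlarge the step size
        else:
            sigma *= 1.5 ** -0.25   # failure: shrink it (1/5th success rule)
    return x, fx


def black_box(v):
    """Stand-in objective; the optimizer never looks inside it."""
    return (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2


x_best, f_best = one_plus_one_es(black_box, [0.0, 0.0])
print(x_best, f_best)  # x_best close to [3.0, -1.0]
```

The step-size rule multiplies sigma by 1.5 on success and by 1.5^(-1/4) on failure, so it settles where roughly one proposal in five succeeds.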


Liam Schumm

Ergonomica is a cross-platform shell implemented in Python. It uses core Python modules such as `os` and `shutil` to create a feature-rich terminal environment. It also uses Python applications such as Suplemon to provide features like a cross-platform text editor. As more people are exposed to Linux/Unix (e.g., in OS X, when using a computing cluster or cloud-based hosting service, or most recently in WSL), there is an increasing appreciation (and need) for the power of using a Command Line Interface (CLI) to manipulate files and processes. However, Bash syntax, like that of standard utilities such as `awk`, `sed`, and `grep`, has a steep learning curve, and usage details can vary across platforms. The goal of Ergonomica is to provide an easy-to-use, cross-platform alternative for common shell functions. This will be of interest to users who drop into a shell only occasionally but want to increase their efficiency (e.g., data scientists and other researchers who need to use a high-performance computing cluster or cloud-based platform for their research), or users who work on multiple platforms. It will also be of special interest to Python users, since it uses elements of Python syntax and several of its features parallel Python built-in functions. Unlike newer shells such as zsh and fish, which seek to improve upon existing Unix shells, Ergonomica was designed from the ground up and features an entirely new syntax (although many commands common to existing shells work as expected). In many cases, a single command can accomplish what would otherwise require an esoteric line of Bash or a small script.

For example, to find all files in the current directory that have "2016" in their name, in Ergonomica one could simply write:

```
ls -> (filter) "2016" in x
```

Similarly, to rename all files whose names are 7 characters long by adding `.py` to the end of their names:

```
ls -> (filter) len(x) == 7 -> (map) x + '.py' -> (splice) -> mv
```

In this way, Ergonomica aims to be simpler and more intuitive, especially for those familiar with a high-level programming language such as Python.
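For comparison, the two Ergonomica pipelines above correspond roughly to the following plain Python, using the same `os` and `shutil` modules the shell is built on. This is a sketch of equivalent behavior, not Ergonomica's implementation; like `os.listdir`, it acts on every directory entry, and it runs against a throwaway temporary directory.

```python
import os
import shutil
import tempfile


def names_containing(directory, text):
    """Equivalent of: ls -> (filter) "2016" in x"""
    return sorted(x for x in os.listdir(directory) if text in x)


def add_py_to_seven_char_names(directory):
    """Equivalent of: ls -> (filter) len(x) == 7 -> (map) x + '.py' -> (splice) -> mv"""
    renamed = []
    for x in os.listdir(directory):  # listdir returns a list, so renaming while looping is safe
        if len(x) == 7:
            shutil.move(os.path.join(directory, x), os.path.join(directory, x + ".py"))
            renamed.append(x + ".py")
    return sorted(renamed)


with tempfile.TemporaryDirectory() as d:
    for name in ["report2016.txt", "2016.csv", "abcdefg", "notes.md"]:
        open(os.path.join(d, name), "w").close()  # create empty demo files
    found = names_containing(d, "2016")
    renamed = add_py_to_seven_char_names(d)
    remaining = sorted(os.listdir(d))

print(found)    # ['2016.csv', 'report2016.txt']
print(renamed)  # ['abcdefg.py']
```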

Exploring Resistance Genes In Ocean Samples Using Python

Patricia Vera-Wolf

The ocean could play an important role as a reservoir of resistance genes. Genetic data from this type of biological sample tends to be voluminous, and managing it can be chaotic and confusing. In this respect, Python is an efficient tool for automating the estimation of resistance genes by comparing the genetic information of the samples with existing resistance databases.
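The abstract does not say how the comparison against resistance databases is done; production pipelines typically align reads with tools such as BLAST against curated databases such as CARD. As a deliberately simplified, hypothetical illustration of the idea, the sketch below flags a sequencing read when enough of its k-mers (length-k substrings) also occur in a reference gene. The gene names and sequences are toy data.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a DNA sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_read(read, reference_genes, k=8, min_fraction=0.5):
    """Names of reference genes sharing >= min_fraction of the read's k-mers.

    Simplification: only the forward strand is checked (no reverse complement).
    """
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return []
    hits = []
    for name, gene_seq in reference_genes.items():
        shared = len(read_kmers & kmers(gene_seq, k))
        if shared / len(read_kmers) >= min_fraction:
            hits.append(name)
    return hits


# Toy reference "database" (made-up sequences, not real resistance genes).
reference = {
    "geneA": "ATGCGTACGTTAGCCGATCGATCGGCTA",
    "geneB": "TTTTTTTTTTGGGGGGGGGGCCCCCCCCCC",
}
print(screen_read("cgtacgttagccgatc", reference))  # ['geneA']
```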

Feeling Down? You're Not Alone! Tech Burnout and Mental Health

Kara Eads

Most people know that tech burnout is related to mental health symptoms like depression and isolation. But did you know tech burnout has been linked to increased ocular problems such as eye strain, irritation, burning sensation, redness, and blurred vision? Tech burnout in the form of emotional exhaustion has also been linked to musculoskeletal pain. Career tech stress has been shown to correlate with increased divorce rates and alcohol and drug use. If you’re feeling burnt out, the good news is, you are normal! Many people in the tech field are in the same boat at one time or another in their careers. Burnout in the tech field can be particularly extreme due to the high-pressure nature of the work. This poster session will cover early warning signs of burnout, symptoms of burnout, and most importantly, techniques you can use to combat burnout. Learn how tech career burnout is related to your overall mental and physical well-being and what you can do to improve your health.

How to Train your Robot - Computer Vision

Luke Bryan

An example of practical computer vision using Dlib for object recognition: Python integrates with Dlib to provide an easy way to recognize faces and objects or to build your own object recognizer.

Hybrid Vocal Classifier: a package for automated labeling of birdsong

David Nicholson

Songbirds learn to sing in much the same way that humans learn to speak. The process of learning and producing song takes place in the song system, a network of areas in the songbird brain. Although the song system is unique to the songbird brain, it has evolved from brain areas common to all vertebrates, including humans. So by discovering how these specialized brain areas work, we can learn more about our own brains. For these reasons, neuroscientists study songbirds as a model of how the brain learns and produces behaviors like speaking a language, playing the piano, or kicking a soccer ball. To understand how the songbird brain learns and produces song, neuroscientists carry out behavioral experiments. Typically, many hours of song are recorded from individuals of a given species. Current analyses of song are limited by the many more hours required to label song by hand. Several groups have proposed methods for automated analysis of birdsong. These methods have contributed greatly to advancing the field, but there are some areas where they can be improved:

1. Until recently, many proposed methods did not have open source software implementations.
2. None of the currently available methods build upon open source packages that are road-tested by the broader data science community.
3. The most commonly used methods rely on comparisons of entire songs instead of labeling the individual elements of songs, known as syllables. Analyzing the entire song, e.g. with cross-correlation of spectrograms, may miss some important effects of experiments.
4. Recently proposed methods that address many of the previous points have not been compared extensively, and there is no software package that incorporates all of these methods so that they can be easily tested by many different labs.

Results
----------

I present Hybrid Vocal Classifier, a Python package for automated labeling of birdsong.
The main advantages of this package are:

- open source, with a Python core and options for use in Matlab
- built on top of well-established packages: scikit-learn and the SciPy stack
- incorporates improvements to recently proposed methods
- feature sets for accurate classification with the K-Nearest Neighbors and Support Vector Machine algorithms
- for groups with access to GPUs, an implementation in Keras (a high-level Python neural networks library) of a recently published neural network for classifying and segmenting song
- a Python implementation of a proposed method that uses a Viterbi-like algorithm to improve the classifications of various machine learning methods

This poster will present the Hybrid Vocal Classifier package for the first time. I compare analysis of experimental effects using single syllables, which this package facilitates, with analysis of entire songs. I will also present analysis of the accuracies achieved with the neural network and the Viterbi-like algorithm. My hope is that this package will provide a tool for songbird science and, at the same time, present birdsong as an interesting test bed for many machine learning algorithms.
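The package's Viterbi-like correction is not specified in detail above. As a generic illustration of the underlying idea, the sketch below runs the standard Viterbi algorithm over per-syllable classifier probabilities (treated as emission scores, a common approximation), combined with a hypothetical table of syllable-to-syllable transition probabilities that acts as a simple model of song "grammar". It picks the most likely label sequence as a whole. All numbers are made up.

```python
import math


def safe_log(p):
    """log(p) that tolerates zero probabilities."""
    return math.log(max(p, 1e-300))


def viterbi(emissions, transitions, initial):
    """Most likely label sequence for a song.

    emissions: one dict per syllable, label -> classifier probability;
    transitions: dict (prev_label, next_label) -> probability;
    initial: dict label -> probability of the first syllable's label.
    """
    labels = list(initial)
    # best[l]: log-probability of the best label path ending in l so far.
    best = {l: safe_log(initial[l]) + safe_log(emissions[0][l]) for l in labels}
    back = []  # back[t][l]: best previous label for l at step t + 1
    for obs in emissions[1:]:
        prev, best, ptr = best, {}, {}
        for l in labels:
            score, arg = max((prev[m] + safe_log(transitions[(m, l)]), m) for m in labels)
            best[l] = score + safe_log(obs[l])
            ptr[l] = arg
        back.append(ptr)
    # Trace the back-pointers from the best final label.
    path = [max(labels, key=best.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical two-syllable song grammar: 'a' and 'b' usually alternate.
transitions = {("a", "a"): 0.1, ("a", "b"): 0.9, ("b", "a"): 0.9, ("b", "b"): 0.1}
initial = {"a": 0.5, "b": 0.5}
scores = [{"a": 0.9, "b": 0.1}, {"a": 0.55, "b": 0.45}, {"a": 0.9, "b": 0.1}]

print([max(s, key=s.get) for s in scores])   # ['a', 'a', 'a'] (per-syllable argmax)
print(viterbi(scores, transitions, initial))  # ['a', 'b', 'a']
```

The ambiguous middle syllable is relabeled because the transition table makes "a, b, a" far more probable than "a, a, a" overall.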

IceLab: A Python-based framework for semiconductor device measurement and analysis

Arnold van der Wal

Summary
-------------

IceLab provides designers of semiconductor components with Python tools that help them define and carry out highly automated device measurements and data analysis. It supports local and remote operation of semiconductor wafer probe stations and measurement equipment. Data is stored in a MongoDB database for subsequent analysis, including interactive visualization with Bokeh or Matplotlib.

Abstract
-----------

Evaluation of new semiconductor device components is time-consuming and often involves a lot of manual work. Good examples of such evaluations are in the fields of parametric and reliability analysis. IceLab helps automate a significant part of this process, saving time and increasing the number of data points for statistical analysis. This framework is already being used successfully within the semiconductor industry. With IceLab, designers can write dedicated measurement flows in Python. These can be executed directly as Python programs or invoked through a GUI that facilitates device selection, wafer stepping control, and data visualization. IceLab provides drivers to control probe stations and measurement equipment over Ethernet and GPIB networks. In fact, it has already been used to remotely control laboratory equipment across continents. Measured data can be stored in MongoDB databases for later processing. Data from multiple devices with varying geometrical dimensions, varying process conditions, or different wafer locations can be combined to form meaningful interactive visualizations for analysis. Both Matplotlib and Bokeh are used for data visualization. This presentation will show how tools like MongoDB, NumPy, pandas, Matplotlib, and Bokeh are integrated into a data-processing pipeline for semiconductor device analysis.

Improving readability of online content by removing abusive speech using Python

Adyasha Maharana, Abhinav Gupta

This poster presents our work in Python that involves building a deep learning model to detect abusive textual content and implementing the model as a web filter during online browsing.

### Abstract

Imagine you are scrolling down your Twitter feed, your favorite Reddit channel, or the comments on one of your Facebook posts, and you come across a rather distasteful comment. This sparks an online war on the post or prevents a constructive discussion from taking place. Either way, you wish you hadn't seen the comment in the first place. Our web content is plagued with cheap, misinformed, and hateful speech, which is a source of mental harassment for many people. Is it possible to ban it? Well, that would supposedly violate the notion of freedom of speech. What can we do then? We can filter it, just as we do with ads we don't like. Taking a stab at this, we are using Python to build a deep learning framework that can detect abusive textual content and assess the readability of web content, and then extend its utility through browser extensions. This poster presents our work in Python building Recurrent Neural Networks (RNNs) with annotated datasets from Kaggle and Twitter. The task of detecting hate speech has its own eccentricities, such as:

+ identifying abusive speech based on certain traits ('fatso')
+ learning to recognize masked insults ('d!ck')
+ detecting sarcastic insults

We describe how the machine learning model works out solutions to such tasks.

### Data Sources

+ Kaggle
+ Crowdflower (Twitter)

### Python Packages Used

+ TensorFlow
+ NLTK
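The abstract does not say how masked insults are handled, and a character-level RNN may learn them directly. One simple, hypothetical preprocessing step is to map common character substitutions back to letters before tokenization, so that 'd!ck' and 'dick' become the same token. The substitution table below is an assumption for illustration; note that it also rewrites legitimate digits.

```python
import re

# Hypothetical table of symbols/digits commonly substituted for letters.
# Caveat: it also rewrites genuine digits such as "1" or "0".
MASK_TABLE = str.maketrans({"!": "i", "1": "i", "@": "a", "4": "a",
                            "3": "e", "0": "o", "$": "s", "5": "s"})


def unmask(text):
    """Lower-case text and undo simple character masking, e.g. 'd!ck' -> 'dick'."""
    words = []
    for token in text.lower().split():
        token = token.strip(".,?!;:")                 # drop sentence punctuation first
        token = token.translate(MASK_TABLE)           # then map masking symbols to letters
        token = re.sub(r"(.)\1{2,}", r"\1\1", token)  # 'sooooo' -> 'soo'
        if token:
            words.append(token)
    return " ".join(words)


print(unmask("d!ck"))              # dick
print(unmask("Y0u are @ l0ser!"))  # you are a loser
```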

JupyterHub: Interactive Learning at Scale

Carol Willing

Jupyter notebooks (formerly IPython notebooks) have accelerated interactive learning and collaboration in the Python community. Just a few years ago, it was a novelty to see a workshop taught with notebooks. Today, it's much more common, and students benefit from manipulating the material through visualizations, video, audio, and prose (and code too). College courses, workshops, user groups, and teams can benefit from Jupyter notebooks, and JupyterHub enables them to do so efficiently at scale. Using JupyterHub in teaching shifts the installation burden from the student to the instructor. Students can focus more on the material being taught and less on operating systems and configuration. Project Jupyter values its roots in scientific education. As such, we are continuously refining tools that will allow an instructor or an administrator to get up and running easily with JupyterHub. The poster will give viewers a greater understanding of:

- JupyterHub's value in teaching at scale
- JupyterHub's use by students and learners
- Deployment and management of JupyterHub by instructors
- Specific resources available to the education community
- A broad overview of JupyterHub's future direction

KnightSky: A Chess Engine that learns

Aubhro Sengupta

When IBM’s Deep Blue beat reigning chess champion Garry Kasparov in 1997, it was established that computers could beat even the best human at chess. Today, chess engines contain many lines of code handcrafted under the guidance of grandmasters. Are you interested in being knee-deep in chess theory just to crank out a half-decent engine? No? This talk is for you. Why not create an engine that learns to improve itself? I created KnightSky to do just that.


Katya Vasilaky

You have an ill-conditioned data set, and you’re ready to make some predictions about what your consumers should buy, or who will win the NFL. Wait, what does ill-conditioned mean? Fortunately, as Jake VanderPlas pointed out at the last PyCon, doing statistics has now become accessible to a much larger set of programmers. Python’s pandas, NumPy, scikit-learn, and SciPy, among others, have all made data analysis much more accessible to the non-statistician. Regularization is one of the steps that might be recommended before running a neural network, for example, and it essentially dampens the effect of certain predictors. But how does this work? And when should we do it? What exactly are the pros (less variance in the solution) and cons (more biased estimates)? This poster will present the least squares problem (an inverse problem), introduce the concept of ill-conditioned data, and describe the technique used to deal with ill-conditioned data, in statistics or machine learning, known as regularization. Regularization dampens the effect of features or covariates that are highly correlated (essentially by filtering or re-weighting the covariates). The poster will highlight common regularization techniques (e.g. Lasso, Tikhonov, Elastic Net), as well as a new method called Iterative (L2) Tikhonov. It will compare the mean squared prediction error from cross-validated experiments with real data sets (NFL wins as well as Kaggle's Rossmann Store Sales data).
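To make the least squares and regularization ideas concrete: Tikhonov (ridge) regularization replaces the least-squares solution of Ax ≈ b with the minimizer of ||Ax - b||^2 + lambda * ||x||^2, i.e. x = (A^T A + lambda I)^(-1) A^T b. The NumPy sketch below builds a synthetic ill-conditioned design (two nearly identical predictors) and compares ordinary least squares with ridge. The iterated version is the standard textbook recurrence x_{k+1} = x_k + (A^T A + lambda I)^(-1) A^T (b - A x_k); the poster's Iterative (L2) Tikhonov method may differ in its details. The data and lambda are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-conditioned design: the second predictor nearly duplicates the first.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)
A = np.column_stack([x1, x2])
b = A @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=n)


def tikhonov(A, b, lam):
    """Ridge / Tikhonov solution: argmin ||Ax - b||^2 + lam * ||x||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)


def iterated_tikhonov(A, b, lam, iters):
    """Iterated Tikhonov: each pass applies Tikhonov to the current residual."""
    M = A.T @ A + lam * np.eye(A.shape[1])
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x + np.linalg.solve(M, A.T @ (b - A @ x))
    return x


ols = np.linalg.lstsq(A, b, rcond=None)[0]
ridge = tikhonov(A, b, lam=1.0)
print(f"condition number of A^T A: {np.linalg.cond(A.T @ A):.1e}")
print("OLS:  ", ols)    # typically large coefficients of opposite sign
print("ridge:", ridge)  # both coefficients close to 1
print("iterated (5 passes):", iterated_tikhonov(A, b, lam=1.0, iters=5))
```

One pass of the iterated scheme reproduces plain ridge; further passes move the estimate back toward least squares, so the number of passes acts as an extra regularization knob.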

Model Management Systems: Scikit-Learn and Django

Benjamin Bengfort, Laura Lorenz, Rebecca Bilbro

Modern web applications incorporate machine learning models to create personalized, interesting, or even safer experiences for their users. From recommendations to troll detection, text summarization, and automatic image captioning, machine learning is becoming a fixture of our experience on the Internet. However, while there are many tools for the administration of content (Django CMS) or the administration of an API (Swagger), machine learning model management systems are still custom software that must be created on a per-application basis. Employing a fitted model that was trained with Scikit-Learn is relatively easy: the model can be pickled and then embedded into a REST API with Flask or Django REST framework. Data in incoming requests can be transformed, and the model then makes a prediction that is returned to the front end. However, as time goes on, the model will need to be retrained on new data or to incorporate new information in order to remain predictive. Different models may be employed as an ensemble or to compare performance, and models may be trained on different parts of the application or on a per-user basis. The end result is that a web application may have many hundreds of models, and if they are simply embedded with the code, they cannot be updated in real time (a new deployment is required). In this poster we introduce a new approach to machine learning in web applications: _model management systems_ (MMS). We present a Django app, similar to the django-admin app, that allows for the storage, curation, and selection of Scikit-Learn models such that both data science teams and users can interact with the machine learning capabilities of the system (similar to how editors and authors interact with content in a CMS). Model management systems are the next step toward allowing many types of web and mobile applications to incorporate machine learning in meaningful ways.
To illustrate this, we present a simple application, Partisan Discourse, which uses a model trained on the 2016 Presidential Campaign Debates to predict the political polarity of text. As users browse the web and read news articles, the application highlights articles and words that are "red" or "blue", indicating partisanship. Users can also submit their own suggestions for an article's political bias, and in so doing generate a personalized text classification model. Expert users (that is, users who might have a professional need for such an application) can also create collective models. The result is a rich web application that allows many models and predictive interactions between the machine and different users. Without a model management system, such an application would not be possible!
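The core idea of a model management system, storing many serialized model versions and choosing which one is served at runtime, can be sketched without Django. The in-memory `ModelRegistry` class below is hypothetical (the poster's Django app is not shown here), and a plain dict of coefficients stands in for a fitted Scikit-Learn estimator, which would be pickled the same way with `pickle.dumps`.

```python
import pickle


class ModelRegistry:
    """Hypothetical in-memory sketch of a model management system (MMS)."""

    def __init__(self):
        self._blobs = {}   # (name, version) -> pickled model bytes
        self._active = {}  # name -> version currently served

    def register(self, name, model):
        """Store a new version of a model and return its version number."""
        version = 1 + max((v for (n, v) in self._blobs if n == name), default=0)
        self._blobs[(name, version)] = pickle.dumps(model)
        return version

    def activate(self, name, version):
        """Choose which stored version is served, without redeploying code."""
        if (name, version) not in self._blobs:
            raise KeyError(f"no model {name!r} version {version}")
        self._active[name] = version

    def load(self, name):
        """Unpickle the active version of a model."""
        return pickle.loads(self._blobs[(name, self._active[name])])


def predict_linear(model, features):
    """Stand-in for estimator.predict: a linear model stored as a plain dict."""
    return sum(w * x for w, x in zip(model["coef"], features)) + model["intercept"]


registry = ModelRegistry()
v1 = registry.register("polarity", {"coef": [0.5, -0.5], "intercept": 0.0})
v2 = registry.register("polarity", {"coef": [0.8, -0.2], "intercept": 0.1})
registry.activate("polarity", v1)
print(predict_linear(registry.load("polarity"), [1.0, 1.0]))  # 0.0
registry.activate("polarity", v2)  # a retrained model goes live at runtime
print(predict_linear(registry.load("polarity"), [1.0, 1.0]))
```

Keying models by name makes per-user models a matter of naming convention (for example, one name per user), which is the kind of scale the abstract describes.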

Monitoring Your Plants With Raspberry Pi

Katya Vasilaky, C. Ryan Considine

With the rise of ag tech and DIY farming, the Raspberry Pi is a feasible way to technify one’s garden and learn more about precision agriculture. But diving into the realm of hardware can be daunting the first time around, even with all the built-in features of the Raspberry Pi. Ordering parts is an investment, and user error in soldering parts together is nontrivial! Underlying all of that is some basic knowledge of circuitry and how to wire a breadboard. There are many tutorials online about building a water sensor with a Raspberry Pi, but one still has to piece together many components from disparate sources, particularly if you have no circuitry or hardware experience. Furthermore, it is difficult to see how the Pi is wired together from a video. Other less obvious steps are how to access the microcontroller from your laptop, install packages on the Linux-based machine, and run a loop that will print out readings. This poster will present a step-by-step instruction kit for building your own Raspberry Pi water sensor, with a live demo (sensor in hand, plus a basic breadboard that demonstrates moving a power source to ground to light up an LED, for hardware novices). The poster will cover what parts to purchase, how to assemble the parts, how to configure the Pi, and finally how to take the Pi’s output, push it to a Postgres database on Heroku, and display the results in a queryable web application.
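The "run a loop that will print out readings" step might look like the sketch below. `read_water_sensor` is a hypothetical placeholder returning random values; on a real Pi it would read the sensor through the GPIO pins (analog sensors usually need an external ADC). An in-memory SQLite database stands in for the Heroku Postgres database; with a Postgres driver such as psycopg2, the SQL is the same apart from using `%s` parameter placeholders.

```python
import random
import sqlite3
import time


def read_water_sensor():
    """Hypothetical placeholder for a real sensor read (GPIO pin or external ADC)."""
    return random.uniform(0.0, 1.0)


def log_readings(conn, n_samples, interval_s=0.0):
    """Read the sensor n_samples times, printing and storing each reading."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, level REAL)")
    for _ in range(n_samples):
        level = read_water_sensor()
        print(f"water level reading: {level:.2f}")
        conn.execute("INSERT INTO readings (ts, level) VALUES (?, ?)", (time.time(), level))
        conn.commit()
        time.sleep(interval_s)  # e.g. 60.0 for one reading per minute


conn = sqlite3.connect(":memory:")  # stand-in for the Heroku Postgres database
log_readings(conn, n_samples=3)
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 3
```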

My Heart Stream On

Paul Logston

Ever wonder how an EKG works? Or what your EKG looks like? Ever wonder if you could build an EKG with Python? At this booth, a live demo will allow you to have your EKG taken*. The poster at this station will cover how data from an Arduino can be packaged and sent to a Python interpreter for display. Visitors will be able to get a PNG of their EKG.
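The abstract does not specify the wire format between the Arduino and Python. Assuming, hypothetically, that the Arduino prints one ADC reading per line over serial (which Python could read with a library such as pyserial), the sketch below parses a chunk of that byte stream and estimates heart rate from rising crossings of a fixed threshold, a crude stand-in for real R-peak detection. The sampling rate, threshold, and trace are made-up values.

```python
def parse_samples(raw):
    """Parse newline-delimited integer readings, dropping a trailing partial line."""
    lines = raw.split(b"\n")[:-1]  # the last element is b'' or an incomplete reading
    return [int(line.strip()) for line in lines if line.strip()]


def heart_rate_bpm(samples, fs_hz, threshold):
    """Beats per minute from rising threshold crossings (crude R-peak detection)."""
    beats = [i for i in range(1, len(samples)) if samples[i - 1] < threshold <= samples[i]]
    if len(beats) < 2:
        return None  # not enough beats to measure an interval
    mean_interval_s = (beats[-1] - beats[0]) / (len(beats) - 1) / fs_hz
    return 60.0 / mean_interval_s


print(parse_samples(b"512\r\n530\r\n9"))  # [512, 530]

# Synthetic trace at 100 Hz: a spike every 80 samples (0.8 s between beats, about 75 bpm).
trace = [900 if i % 80 == 0 else 500 for i in range(400)]
print(heart_rate_bpm(trace, fs_hz=100, threshold=700))
```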

Numerical Simulation of 1D Darcy-Forchheimer Equation using Python

Hassan Saad Ifti

The Darcy-Forchheimer equation is used to quantify flow velocity with respect to the pressure loss across a porous material. From an application perspective, it can be used to relate the heat transfer effect of a coolant gas injected into the external boundary layer through the porous material that acts as the skin of a hypersonic aircraft. A numerical approach is beneficial for predicting this heat transfer with the help of the pressure drop correlation obtained from the Darcy-Forchheimer equation. Since this is a finite difference approach in a single direction, using Python is straightforward and inexpensive, which is particularly important for projects such as this one, where experimental rigs are highly expensive; this demonstrates the added value of an open source language like Python.
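For a concrete picture of the 1D finite difference approach, one common form of the equation is -dp/dx = (mu/K) v + beta * rho * v^2 (a viscous Darcy term plus an inertial Forchheimer term). The sketch below assumes an isothermal ideal gas with constant mass flux G = rho * v through the wall, marches the pressure across the wall with explicit Euler steps, and checks the result against the closed-form solution p(x)^2 = p_in^2 - 2 R T (mu G / K + beta G^2) x that these assumptions allow. All parameter values are illustrative, not taken from the poster.

```python
import math


def pressure_profile(p_in, G, L, mu, K, beta, R_gas, T, n=1000):
    """Explicit finite-difference march of -dp/dx = (mu/K) v + beta * rho * v**2.

    Assumes an isothermal ideal gas (rho = p / (R_gas*T)) and constant mass
    flux G = rho * v, so the right-hand side reduces to C / rho.
    Returns pressures at the n + 1 grid points from x = 0 to x = L.
    """
    dx = L / n
    C = mu * G / K + beta * G**2  # combined Darcy + Forchheimer loss coefficient
    p = [p_in]
    for _ in range(n):
        rho = p[-1] / (R_gas * T)
        p.append(p[-1] - dx * C / rho)
    return p


# Illustrative values (SI units): air through a 5 mm porous wall.
params = dict(p_in=2.0e5, G=0.5, L=5e-3, mu=1.8e-5, K=1e-12, beta=1e7, R_gas=287.0, T=300.0)
p = pressure_profile(**params)

C = params["mu"] * params["G"] / params["K"] + params["beta"] * params["G"] ** 2
exact = math.sqrt(params["p_in"] ** 2 - 2 * params["R_gas"] * params["T"] * C * params["L"])
print(f"numerical outlet pressure: {p[-1]:.1f} Pa, closed form: {exact:.1f} Pa")
```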

nwboot - Automated Package to Provision Network Device(s)

Sheng Wang, Karthik Muthusamy

Network test automation is becoming an increasingly critical component of network product certification among large Tier-1 SPs and Web/DC customers. Product certification relies on a simplified mechanism for provisioning network devices (routers/switches). We would like to demonstrate how, by using Python's standard multiprocessing library along with Python-based network connectivity utilities including paramiko and pexpect, we developed a package that provisions different network platforms concurrently. The package helped solve our internal test automation challenges and has allowed us to extend this capability to our customers.
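A minimal sketch of concurrent provisioning with the standard library is below. `provision` is a hypothetical placeholder that simulates the network I/O a real paramiko or pexpect session would perform, and the device names are made up. It uses `multiprocessing.pool.ThreadPool`, which suits I/O-bound work; `multiprocessing.Pool` offers the same `map` interface with separate processes, provided the worker function is importable and picklable.

```python
import time
from multiprocessing.pool import ThreadPool


def provision(device):
    """Hypothetical provisioning step for one device.

    A real version would open a paramiko SSH session or a pexpect console,
    push the image or configuration, and verify the result. Simulated here.
    """
    time.sleep(0.2)  # stands in for network I/O
    return device, "ok"


devices = ["router-1", "router-2", "switch-1", "switch-2"]  # made-up names
start = time.monotonic()
with ThreadPool(processes=len(devices)) as pool:
    results = dict(pool.map(provision, devices))
elapsed = time.monotonic() - start

print(results)
print(f"{elapsed:.2f} s")  # close to one device's time, not four, since they run concurrently
```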

OneToOne or Abstract? Techniques for customizing User data in Django

Eleanor Stribling

Django, a popular Python web framework, comes with a basic User model out of the box, but what do you do if you want to collect more data from customers when they sign up on your site? What are your options for a brand-new project with special requirements versus an evolving, existing site? This poster will review two options, extending the User model via a OneToOne relationship and writing a custom model based on Django's AbstractUser, covering when to use each and how to implement them.

On the Hour Data Ingestion from the Web to a Mongo Database

Will Voorhees, Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda

Systems designers consider the problem of integrating a variety of systems, from databases to computational processes, in such a way that they can run by themselves. While individual tasks like connecting to a database or fetching data from an API may seem simple in isolation, systems grow increasingly complex as simple components are integrated in meaningful ways. Questions about how to run processes on a schedule, how to detect and automatically recover from errors, how to manage each process, and how to view or administer the system as a whole become critical to success. We were confronted with these questions when we attempted to build a system that would fetch posts from RSS feeds every hour and store them in a Mongo database. This seemingly simple problem statement, intended to create a corpus of natural language for analytics and application-oriented machine learning, became more complex as we integrated each component. In this poster we will present the lessons we learned from building the system and demonstrate a robust architecture that uses Python processes as the backbone for work, scheduling, and administration. The system, called Baleen after the whales that ingest huge amounts of plankton, has been running since March 2016 and has collected over 1 million HTML posts, creating a corpus containing hundreds of millions of words (which we hope will grow to over a billion words by the time PyCon arrives). Some problems came up immediately: what happens when you ingest duplicate documents? How often do you synchronize? How do you handle errors without stopping jobs? Other problems surfaced in the intermediate term: how do you get a quick view of the system as a whole? How do you detect global failure? And still other problems took months to notice: what happens when you run out of memory or disk space? What happens if you ingest video or audio instead of text?
Our solutions may not be the best, but they are our own. We used the schedule library to kick off hourly jobs, and stored information about both the jobs and the collected data in a Mongo database. We created a Flask web application that reads information from Mongo and displays the status of the app. We created a command-line application that allowed us to quickly manage different parts of the system, and a utility that configured our app using a YAML file. We created email handlers to notify us of major failures, and used a suite of Linux tools to deploy our system on AWS.
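Two of the problems above, duplicate documents and errors that should not stop a job, can be handled inside the hourly job itself. The sketch below is a hypothetical simplification, not Baleen's code: posts are deduplicated by a SHA-256 hash of their content, each feed is fetched inside its own try/except so one bad feed is logged rather than aborting the run, and a dict and a list stand in for the Mongo collection and the email alerts. With the schedule library, such a job could be registered with `schedule.every().hour.do(...)` and driven by a loop that calls `schedule.run_pending()`.

```python
import hashlib


def signature(text):
    """Content hash used to recognize duplicate posts."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest(feeds, fetch, store, errors):
    """One scheduled run: fetch every feed, skip duplicates, survive bad feeds.

    feeds: feed URLs; fetch: callable url -> list of post texts;
    store: dict signature -> post (stand-in for the Mongo collection);
    errors: list of messages (stand-in for email notifications).
    Returns the number of new posts stored.
    """
    added = 0
    for url in feeds:
        try:
            posts = fetch(url)
        except Exception as exc:  # log the failure and keep the job running
            errors.append(f"{url}: {exc!r}")
            continue
        for post in posts:
            sig = signature(post)
            if sig not in store:
                store[sig] = post
                added += 1
    return added


def fake_fetch(url):
    """Hypothetical fetcher: one feed fails, the others return the same two posts."""
    if "bad" in url:
        raise ConnectionError("timed out")
    return ["post A", "post B"]


store, errors = {}, []
feeds = ["http://a.example/feed", "http://bad.example/feed", "http://b.example/feed"]
print(ingest(feeds, fake_fetch, store, errors))  # 2 (feed b only repeats feed a's posts)
print(ingest(feeds, fake_fetch, store, errors))  # 0 (the next hourly run finds nothing new)
print(len(errors))                               # 2 (the bad feed failed on both runs)
```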

PyKids as a Service - Implementation details

Meenal Pant

PyKids is a service that enables students to learn Python with only a browser and an internet connection. This poster provides an inside look at the technologies used and the implementation details.

**Outline**

* Key technologies used: Python, Jupyter, tmpnb
* Servers: AWS instances
* Curriculum access: distributed via GitHub and nbviewer
* Updates and student forum: WordPress blog

Python for Energy Policy Research

Anna Liao

From 2011 to 2016, I wrote software and built systems to support energy policy research at Lawrence Berkeley National Laboratory. I would like to show various applications of Python in energy policy research, including networked sensor data collection systems to study energy use in buildings and automated event detection for distribution grid research. The projects I would display are a low-cost embedded system for enabling demand response, an assessment of hot water use in residential buildings, and event detection and data analysis for distribution grid research. I hope to engage the Python community to use their skills for clean energy and smart grid applications.
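As an illustration of automated event detection (not the specific methods used at LBNL, which the abstract does not describe), the sketch below scans per-unit voltage samples and reports runs that fall below or rise above configurable limits. The 0.9 and 1.1 per-unit defaults follow a common convention for voltage sags and swells, and the minimum run length filters out single-sample blips. The sample trace is made up.

```python
def detect_events(samples, low=0.9, high=1.1, min_len=2):
    """Return (start, end, kind) index ranges where per-unit voltage leaves [low, high]."""
    def classify(v):
        return "sag" if v < low else "swell" if v > high else None

    events, start, kind = [], 0, None
    for i, v in enumerate(list(samples) + [None]):  # None marks the end of the data
        current = None if v is None else classify(v)
        if current != kind or v is None:
            # A run just ended: keep it if it was an event lasting long enough.
            if kind is not None and i - start >= min_len:
                events.append((start, i - 1, kind))
            start, kind = i, current
    return events


volts = [1.0, 1.0, 0.85, 0.8, 0.87, 1.0, 1.12, 1.0, 1.15, 1.16]
print(detect_events(volts))  # [(2, 4, 'sag'), (8, 9, 'swell')]
```

The single high sample at index 6 is ignored because it is shorter than `min_len`.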

Python to save lives in Brazil

Mariana Mioto

Introduction
====================

Idiopathic Pulmonary Fibrosis (IPF) carries a prognosis of deteriorating pulmonary function and is responsible for 50% of cases referred for lung transplantation. IPF is frequently associated with a histological and radiological pattern of Usual Interstitial Pneumonia (UIP), and this radiological pattern is characterized on high-resolution computed tomography (HRCT). Considering the complexity of diagnostic decision-making and the difficulty of evaluating the pulmonary parenchyma objectively, this poster presents the initial results of a computational algorithm written in Python. The aim of the study is to automatically segment the lung area as a first step toward a tool for the quantitative and objective analysis of interstitial fibrosis pneumonia with Python algorithms.

Material & Methods
====================

The algorithm proposed in this study performs preprocessing through filters. Scikit-Image filters were used to detect edges in the CT images. We used a public image database containing examinations, in DICOM (Digital Imaging and Communications in Medicine) format, of patients with interstitial lung diseases. The images are preprocessed using the Pydicom library. Lung area segmentation followed a region-based approach. The first step was to apply the Sobel filter to compute the gradient magnitude (elevation) map of the pixels in each tomographic slice. The second step consisted of analyzing the histogram of each slice and establishing mean boundary transition values, from which we constructed markers for these images. Finally, we used the Watershed Transform, also implemented in Scikit-Image, to fill the regions of the elevation map starting from the markers determined above: the method finds basins in the image by flooding them from the given markers.

Conclusion
====================

The segmentation obtained allows the evaluation of the pulmonary parenchyma and can be applied to HRCT images of patients with interstitial fibrosis pneumonia, including suspected UIP, to isolate the image area (the lung) in which pulmonary opacities will subsequently be classified.
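The Sobel, markers, and watershed pipeline described in Material & Methods can be sketched with Scikit-Image as below. This is a minimal illustration, assuming fixed `low`/`high` thresholds in place of the per-slice histogram analysis the poster describes, and a synthetic slice instead of real data; in practice the pixel array would be read with Pydicom (for example `pydicom.dcmread(path).pixel_array`). The function name `segment_lung` and the threshold values are hypothetical.

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed


def segment_lung(slice_img: np.ndarray, low: float, high: float) -> np.ndarray:
    """Return a boolean lung mask for one CT slice.

    `low` and `high` are assumed intensity thresholds: pixels darker than
    `low` seed the (air-filled) lung, pixels brighter than `high` seed the
    background. Pixels in between are left for the watershed to assign.
    """
    # Step 1: the Sobel gradient magnitude serves as the elevation map.
    elevation_map = sobel(slice_img.astype(float))

    # Step 2: markers from the thresholds (0 = unlabeled).
    markers = np.zeros(slice_img.shape, dtype=np.int32)
    markers[slice_img > high] = 1  # background / soft tissue
    markers[slice_img < low] = 2   # lung

    # Step 3: flood the elevation map from the markers.
    labels = watershed(elevation_map, markers)
    return labels == 2


# Synthetic slice: a dark disk (about -900) that ramps up to brighter
# tissue (about 40) between radius 12 and radius 18.
yy, xx = np.mgrid[0:64, 0:64]
r = np.hypot(yy - 32, xx - 32)
synthetic = np.clip(-900 + (r - 12) * (940 / 6), -900, 40)
mask = segment_lung(synthetic, low=-700, high=-100)
print(mask.sum(), "pixels labeled as lung")
```

The ring of intermediate values has no marker, so its pixels are assigned to whichever basin floods them first; that is the step where the watershed earns its keep over plain thresholding.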

Science: give DueCredit to software and methods developers!

Matteo Visconti di Oleggio Castello, Yaroslav O. Halchenko

Science depends on software. From Astronomy to Zoology, every (data) scientist uses software to collect, analyze, and report data. Much of this software is open source and developed first-hand by researchers in the field. While scholars cite relevant papers in their works, software and methods are often forgotten in the reference list. End-users tend to ignore the complexity and ramifications of the methods they are using, contributing to the lack of adequate citations for software. Thus, works that introduce a novel method or software often receive far fewer citations than they have actual users, making it harder for their creators to demonstrate the importance of their work when applying for funding. And because such citations are lacking, young investigators with novel methods are pushed to reinvent the wheel, creating short-lived independent projects instead of contributing to existing ones, since they need recognition for their work. To counter this vicious cycle and give scientific software its due credit, we developed DueCredit, a Python framework that allows users to automatically collect citations of the methods, software, and datasets used in their analysis pipelines. DueCredit aims to be invisible to the user, who just needs to flip a switch to start collecting references. When the analysis script finishes, the user can obtain a report containing references formatted in the citation style of their choice or in BibTeX.
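The core idea, that code declares what it should be cited for and citations are recorded only when that code actually runs and collection is switched on, can be illustrated with a small decorator. This is a hypothetical stand-in written for illustration; it is not DueCredit's API, and the class name, method names, and placeholder references are all assumptions.

```python
import functools
from typing import Callable, Dict, List


class CitationCollector:
    """Hypothetical, minimal illustration of automatic citation collection."""

    def __init__(self, enabled: bool = False) -> None:
        self.enabled = enabled  # the "switch" the user flips
        self._used: Dict[str, List[str]] = {}  # reference -> descriptions

    def cite(self, reference: str, description: str) -> Callable:
        """Decorator: record `reference` each time the function is called."""
        def decorator(func: Callable) -> Callable:
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if self.enabled:
                    descs = self._used.setdefault(reference, [])
                    if description not in descs:
                        descs.append(description)
                return func(*args, **kwargs)
            return wrapper
        return decorator

    def report(self) -> str:
        """One line per reference that was actually used."""
        return "\n".join(
            f"{ref} ({', '.join(descs)})" for ref, descs in sorted(self._used.items())
        )


collector = CitationCollector(enabled=True)


@collector.cite("PLACEHOLDER: smoothing-method paper", "moving-average smoothing")
def moving_average(values, window=3):
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


@collector.cite("PLACEHOLDER: unused-method paper", "never called")
def unused():
    return None


moving_average([1, 2, 3, 4])
print(collector.report())  # → PLACEHOLDER: smoothing-method paper (moving-average smoothing)
```

Only methods that were exercised show up in the report, which is what makes this kind of collection more accurate than a hand-written reference list.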

Synchronization Methods for Distributed Agent Based Models

Christine Harvey

Distributed computing addresses the traditional scaling limitations of Agent Based Models (ABMs) and allows for the development of massive-scale models. Synchronization of agent states between multiple processors is complex and needs to be reliable and efficient. The protocol used to synchronize agent states across processors has a significant impact on the efficiency of the tool. Repast HPC is one of the current tools available for distributed ABMs [1]. It approaches the problem by performing a complete synchronization of all entities of interest at every time step. This poster reviews an alternative approach to entity synchronization, a design which manages persistent, pertinent information. This protocol performs an initial synchronization among all entities with relationships to other agents and then only performs updates and synchronization following changes to relevant information. The pertinent data synchronization technique is an event-driven method to manage the communication and synchronization between the processors. The conservative and the event-driven approaches are both described and analyzed in this poster.
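The difference between the two strategies can be made concrete by counting synchronization messages. The sketch below is a simplified, hypothetical cost model, not Repast HPC's protocol or the poster's implementation: agents live on processors, an agent must be sent to every other processor holding a related agent, the full scheme re-sends every such agent at every step, and the event-driven scheme sends only agents whose state changed after one initial synchronization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Agent:
    agent_id: int
    home: int  # processor that owns the agent
    state: int = 0
    neighbours: Set[int] = field(default_factory=set)  # related agent ids


def remote_targets(agent: Agent, agents: Dict[int, Agent]) -> Set[int]:
    """Processors, other than the owner, that hold a related agent."""
    return {agents[n].home for n in agent.neighbours} - {agent.home}


def full_sync_messages(agents: Dict[int, Agent]) -> int:
    """Full scheme: every agent is re-sent to every interested processor."""
    return sum(len(remote_targets(a, agents)) for a in agents.values())


def event_driven_messages(agents: Dict[int, Agent], changed: Set[int]) -> int:
    """Event-driven scheme: only agents whose state changed are sent."""
    return sum(len(remote_targets(agents[i], agents)) for i in changed)


def simulate(agents: Dict[int, Agent], changes_per_step: List[Set[int]]):
    # both schemes begin with one complete initial synchronization
    full = event = full_sync_messages(agents)
    for changed in changes_per_step:
        for i in changed:
            agents[i].state += 1  # the "pertinent" update
        full += full_sync_messages(agents)
        event += event_driven_messages(agents, changed)
    return full, event


def make_agents() -> Dict[int, Agent]:
    """Agents 0 and 1 on processor 0, agents 2 and 3 on processor 1."""
    return {
        0: Agent(0, home=0, neighbours={1, 2}),
        1: Agent(1, home=0, neighbours={0, 3}),
        2: Agent(2, home=1, neighbours={0}),
        3: Agent(3, home=1, neighbours={1}),
    }


if __name__ == "__main__":
    print(simulate(make_agents(), [{0}, set(), {2, 3}]))  # → (16, 7)
```

In this toy run the full scheme sends 4 messages per step regardless of activity, while the event-driven scheme pays only for the three agent changes; the gap grows as agents change less often.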

The Personality of the Snake: Personality Recognition using Convolutional Neural Networks

Maite Giménez

Science is always trying to improve itself. Recently, the Natural Language Processing (NLP) field of AI has been developing new methods for classifying user profiles based on what users write. This new task is called Author Profiling (AP). Formally, AP is the task of classifying writers, given a text, according to demographic features such as age, gender, or personality traits. There is still limited literature on the topic, and the models that address this task rely on handcrafted resources; therefore, they are restricted by the domain of the problem and by the availability of resources. In this poster, we show how to classify the personality of an author (described as a combination of five traits: openness (O), conscientiousness (C), extroversion (E), agreeableness (A), and stability (S)) based on what they wrote on Twitter. We propose to solve this problem using a Convolutional Neural Network (CNN) architecture developed in Python. We present how to properly train this model using pre-trained word embeddings, so that it is capable of learning the best features for the task at hand without any external dependence. The results show the potential of this approach compared against other state-of-the-art models. We will also present several toolkits available for developing your own system in Python (Keras, Theano & TensorFlow) and discuss their pros and cons. Come and see how to apply this leading-edge CNN to an innovative NLP task in Python!
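To show what such a CNN computes, here is an untrained forward pass of a generic sentence-classification CNN in plain NumPy: embedding lookup, 1-D convolutions over windows of consecutive tokens, ReLU, max-over-time pooling, and one sigmoid score per trait. It is a hypothetical sketch of the general architecture, not the poster's model; the class name, filter widths, and random weights are assumptions, and in practice the network would be built and trained with one of the toolkits mentioned above.

```python
import numpy as np

TRAITS = ("O", "C", "E", "A", "S")


class TinyTextCNN:
    """Hypothetical, untrained text CNN producing one score per trait."""

    def __init__(self, embeddings: np.ndarray, filter_widths=(2, 3), n_filters=4, seed=0):
        rng = np.random.default_rng(seed)
        self.embeddings = embeddings  # (vocab, dim), e.g. pre-trained vectors
        dim = embeddings.shape[1]
        # one bank of n_filters filters per width; each filter spans w tokens
        self.filters = [rng.normal(0, 0.1, (n_filters, w, dim)) for w in filter_widths]
        self.W_out = rng.normal(0, 0.1, (n_filters * len(filter_widths), len(TRAITS)))
        self.b_out = np.zeros(len(TRAITS))

    def forward(self, token_ids):
        """token_ids must be at least as long as the widest filter."""
        x = self.embeddings[token_ids]  # (seq_len, dim)
        pooled = []
        for f in self.filters:
            w = f.shape[1]
            # every window of w consecutive token vectors: (T, w, dim)
            windows = np.stack([x[i:i + w] for i in range(len(x) - w + 1)])
            conv = np.einsum("twd,nwd->tn", windows, f)  # (T, n_filters)
            pooled.append(np.maximum(conv, 0).max(axis=0))  # ReLU + max-over-time
        features = np.concatenate(pooled)
        logits = features @ self.W_out + self.b_out
        return dict(zip(TRAITS, 1 / (1 + np.exp(-logits))))


rng = np.random.default_rng(1)
embeddings = rng.normal(size=(10, 8))  # stand-in for pre-trained word vectors
model = TinyTextCNN(embeddings)
scores = model.forward([1, 4, 2, 7, 3])  # token ids of one (hypothetical) tweet
print(scores)
```

Max-over-time pooling is what lets texts of different lengths map to a fixed-size feature vector before the output layer.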

The Yosai Project: A security framework for Python applications

Darin Gordon

Yosai is a security framework providing services for the major aspects of application security: Authentication, Authorization, and Session Management. It is a port of Apache Shiro, which is written in Java. Yosai addresses an unmet need in the Python community: a feature-rich security framework that frees developers from having to address complex security requirements from scratch.
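Apache Shiro, which Yosai ports, expresses authorization rules as colon-separated wildcard permission strings such as `printer:print:lp7200`. The sketch below is a simplified, hypothetical re-implementation of that matching rule, included to show how such authorization checks work; it is not Yosai's API and omits details such as case normalization.

```python
from typing import List, Set


def _parts(permission: str) -> List[Set[str]]:
    """Split 'domain:action:instance' into parts; commas allow several
    values in one part, e.g. 'printer:print,query'."""
    return [set(part.split(",")) for part in permission.split(":")]


def implies(granted: str, required: str) -> bool:
    """Does a granted wildcard permission cover the required one?"""
    g_parts, r_parts = _parts(granted), _parts(required)
    for i, r in enumerate(r_parts):
        if i >= len(g_parts):
            return True  # granted is shorter: its missing parts mean "anything"
        g = g_parts[i]
        if "*" not in g and not r <= g:
            return False  # this part is neither a wildcard nor a superset
    # granted is longer: its extra parts must all be wildcards
    return all("*" in g for g in g_parts[len(r_parts):])


def is_permitted(granted: List[str], required: str) -> bool:
    """A subject is permitted if any granted permission implies the request."""
    return any(implies(g, required) for g in granted)


print(implies("printer", "printer:print:lp7200"))       # → True
print(implies("printer:print:lp7200", "printer:print"))  # → False
```

Treating a missing trailing part as "everything" keeps role definitions short (`printer` grants all printer actions), while a required permission can never be satisfied by a narrower grant.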

Training and Using Haar Classifiers in OpenCV

Matthew Parmelee

Real-time image recognition and processing has historically been a computationally intensive operation. However, with the advent of Haar-like features and cascading classifiers and their integration with the popular OpenCV library, this process has become much more accessible to the computer vision hobbyist. Using a relatively small pool of training data, users will be able to plan, build, and optimize a cascading classifier to recognize objects of their choosing in real time, with no more than the OpenCV library and a webcam. The goal of this poster is to provide a high-to-low level overview of these techniques, and allow beginners to learn to train and use their own image classifiers from scratch.
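Much of the speed of Haar cascades comes from the integral image (summed-area table), which makes the sum over any rectangle an O(1) lookup, so a Haar-like feature costs a handful of additions regardless of its size. The pure-Python sketch below computes one horizontal two-rectangle feature; the function names are illustrative and this is not OpenCV's internal code.

```python
from typing import List

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """Summed-area table padded with a zero row/column:
    ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> float:
    """Sum of the w-by-h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> float:
    """Horizontal two-rectangle Haar-like feature over a (2*w)-by-h window:
    right half minus left half, large for a dark-to-bright vertical edge."""
    left = rect_sum(ii, x, y, w, h)
    right = rect_sum(ii, x + w, y, w, h)
    return right - left


# 4x4 image: dark left half, bright right half
img = [[0, 0, 10, 10] for _ in range(4)]
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 2, 4))  # → 80.0
```

In OpenCV itself, a trained cascade is loaded with `cv2.CascadeClassifier` and applied with its `detectMultiScale` method, and new cascades are trained with the `opencv_traincascade` tool; each stage of the cascade evaluates many features like this one at multiple positions and scales.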

Twitter Bot Basics with Python

Kerstin Kollmann

Many of us use Twitter on a daily basis, but mostly as users or consumers who communicate via our personal (or organisation) accounts in quasi "official capacity", using our own voice... when we're not busy keeping up with the news, friends, the industry – and the occasional Twitter bot delivering content into our timelines. Programming Twitter bots can actually be quite fun and is not all *that* complicated once you know some Twitter API basics and botiquette, and which Python libraries you can use to talk to Twitter. ... Coming up with a neat, unique idea for a bot might actually turn out to be a greater challenge than understanding how to make it work. ^^ With the help of my poster and using a bot I programmed myself as an example, I will show you how you, too, can build your first simple Twitter bot. I will: walk you through the process of setting up a Twitter app, tell you what you need to know about the platform's APIs (plural, yes!), let you know what to avoid if you do not want your bot to get banned, give tips on how to release your bot into the wild, and demonstrate how to use Python to actually get your bot to do stuff! Additionally, I will provide ideas for potential bots (as well as suggestions for how they can be realised with Python) and point out existing Twitter bots for inspiration.
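A bot's own logic can enforce some basic botiquette before anything reaches the API. The sketch below is hypothetical: `post` is any callable you implement with your Twitter client library of choice (for example Tweepy), and the rules (a minimum interval between posts, no duplicate posts, a length cap) are assumed sensible defaults rather than Twitter's official policy.

```python
import time
from typing import Callable, List, Optional


class PoliteBot:
    """Hypothetical wrapper that checks simple botiquette rules before
    handing text to `post`, which does the actual API call."""

    def __init__(
        self,
        post: Callable[[str], None],
        min_interval_s: float = 3600,
        max_length: int = 280,  # adjust to the platform's current limit
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.post = post
        self.min_interval_s = min_interval_s
        self.max_length = max_length
        self.clock = clock  # injectable so the rules can be tested
        self.last_post_at: Optional[float] = None
        self.history: List[str] = []

    def try_post(self, text: str) -> bool:
        now = self.clock()
        if len(text) > self.max_length:
            return False  # too long for a single post
        if text in self.history:
            return False  # never repeat identical content
        if self.last_post_at is not None and now - self.last_post_at < self.min_interval_s:
            return False  # post less often than we could
        self.post(text)
        self.last_post_at = now
        self.history.append(text)
        return True


if __name__ == "__main__":
    bot = PoliteBot(print, min_interval_s=0)
    bot.try_post("Hello from a very polite bot!")
```

Keeping the rules separate from the API client means they can be tested with a fake `post` and clock, without touching a live account.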

Using Lazy Evaluation to Optimize Python Programs

Joe Jevnik

Lazy evaluation, also known as call by need, is an evaluation strategy which defers all computations until a result is actually required. This is the opposite of eager evaluation, which is what Python normally does: each expression is evaluated as soon as the interpreter reaches it. Lazy evaluation gives the executor more context about how any given expression will be used, opening the door for interesting optimizations like intermediate object sharing or parallel execution. Python has some tools for emulating lazy evaluation, namely closures and generators; however, these objects cannot be used interchangeably with non-lazy values. `lazy_python` is a library which allows users to automatically translate standard, eager Python functions into a lazy equivalent. This allows users to write or use standard Python code while getting the benefits of lazy evaluation. This poster will cover different ways to implement lazy evaluation, focusing on the techniques used in `lazy_python`. It will show how to use the added execution context to implement optimizations. Finally, we will show a real example of using `lazy_python` with `dask` to automatically parallelize the execution of Python programs.
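One common way to implement call by need is with thunks: objects that wrap a deferred computation, build an expression graph through operator overloading, and cache their value once forced. The hand-written sketch below illustrates that technique; it is not `lazy_python`'s API or internals (which translate functions automatically), and the `Thunk` class and `strict` method names are hypothetical.

```python
import operator
from typing import Any, Callable


class Thunk:
    """A deferred computation: nothing runs until .strict() is called,
    and each thunk is evaluated at most once (call by need)."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self._fn = fn
        self._args = args
        self._done = False
        self._value: Any = None

    def strict(self) -> Any:
        if not self._done:
            # force the arguments first, then apply the deferred function
            args = [a.strict() if isinstance(a, Thunk) else a for a in self._args]
            self._value = self._fn(*args)
            self._done = True
            self._fn, self._args = None, ()  # drop references once evaluated
        return self._value

    # Operators extend the expression graph instead of computing anything.
    def __add__(self, other: Any) -> "Thunk":
        return Thunk(operator.add, self, other)

    def __mul__(self, other: Any) -> "Thunk":
        return Thunk(operator.mul, self, other)


calls = []


def expensive(x):
    calls.append(x)  # record every real evaluation
    return x * 10


a = Thunk(expensive, 4)
expr = a * a + 1  # builds a graph; `expensive` has not run yet
print(calls)          # → []
print(expr.strict())  # → 1601
print(calls)          # → [4]  (the shared node `a` was evaluated once)
```

Because `a` appears twice in the graph but is cached after its first evaluation, the expensive call runs once; that memoized sharing is what distinguishes call by need from simple call by name.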

Werk, CLI tool for easing software development

William Holloway

## Best Practices: Too Many To Remember

Server-side automated testing generally asserts that code meets certain quality standards, but depending on capacity, the feedback cycle for the user can be slow. Most of these checks are part of the team's development workflow, but it is easy to forget all the steps, and the server ends up catching problems that could easily have been caught prior to submission. These steps typically occur at predictable points: starting a task, saving a task, submitting a task for review, and finally submitting a task to be merged.

## Solving the Problem: A Tool to Streamline Workflows

Werk is a command line tool that lets teams configure their development workflows. It supports four main commands: start, save, review and done. At each of these points, there are pre and post hooks which give teams granular control over their workflows. For example, a team can configure a pre hook which runs a linter before a developer tries to run “done” on their task. Werk integrates with the popular workflow tracking system JIRA, and with GitHub for code review and as the backing VCS. At Box, we integrated Werk into one of our internal CI build systems so that we could automate triggering CI builds at different stages of the development workflow. Werk reads from a YAML configuration file that contains information about the project’s Git repository, JIRA project details and code review system. With that information, Werk is able to create and transition JIRA tickets. It can open and close pull requests. It also provides pre and post hooks so users can automate which scripts are invoked on the client side.
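The pre/post hook mechanism can be sketched as follows. The configuration dictionary stands in for what a YAML file might contain once loaded (for example with `yaml.safe_load`); its keys, the `run_step` function, and the stub action are hypothetical and do not reflect Werk's real schema or code.

```python
import subprocess
import sys
from typing import Callable, Dict, List

# Hypothetical configuration: step name -> pre/post hook commands.
CONFIG: Dict[str, Dict[str, List[List[str]]]] = {
    "done": {
        "pre": [[sys.executable, "-c", "print('running linter...')"]],
        "post": [[sys.executable, "-c", "print('triggering CI build...')"]],
    },
}


def run_hooks(commands: List[List[str]]) -> bool:
    """Run each hook in order; stop at the first non-zero exit code."""
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            return False
    return True


def run_step(step: str, action: Callable[[], None], config=CONFIG) -> bool:
    """Run pre hooks, the step's own action, then post hooks.
    A failing pre hook aborts the step before the action runs."""
    hooks = config.get(step, {})
    if not run_hooks(hooks.get("pre", [])):
        print(f"{step}: pre hook failed, aborting")
        return False
    action()
    return run_hooks(hooks.get("post", []))


if __name__ == "__main__":
    # the action is a stub standing in for ticket and pull request updates
    run_step("done", lambda: print("transitioning ticket (stub)"))
```

Running the linter as a pre hook means a failing check stops "done" locally, before the server-side pipeline ever sees the change.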

Yellowbrick: Steering Scikit-Learn with Visual Transformers

Rebecca Bilbro, Benjamin Bengfort

In machine learning, model selection is a bit more nuanced than simply picking the 'right' or 'wrong' algorithm. In practice, the workflow includes (1) selecting and/or engineering the smallest and most predictive feature set, (2) choosing a set of algorithms from a model family, and (3) tuning the algorithm hyperparameters to optimize performance. Recently, much of this workflow has been automated through grid search methods, standardized APIs, and GUI-based applications. In practice, however, human intuition and guidance can home in on quality models more effectively than exhaustive search. This poster presents a new Python library, Yellowbrick, which extends the Scikit-Learn API with a visual transformer (visualizer) that can incorporate visualizations of the model selection process into pipelines and modeling workflows. Visualizers enable machine learning practitioners to visually interpret the model selection process, steer workflows toward more predictive models, and avoid common pitfalls and traps. Yellowbrick is an open source, pure Python project that extends Scikit-Learn with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create publication-ready figures and interactive data explorations while still allowing developers fine-grained control of figures. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models, and assist in diagnosing problems throughout the machine learning workflow. In this poster, we'll show not only what you can do with Yellowbrick, but how it works under the hood (since we're always looking for new contributors!). We'll illustrate how Yellowbrick extends the Scikit-Learn and Matplotlib APIs with a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of the Scikit-Learn Pipeline process, providing iterative visual diagnostics throughout the transformation of high-dimensional data.
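The key trick behind a visualizer that lives inside a Pipeline is that it behaves like an ordinary scikit-learn transformer: it records diagnostics in `fit` and passes data through unchanged in `transform`. The class below is a hypothetical illustration of that pattern built on scikit-learn's `BaseEstimator` and `TransformerMixin`; it is not part of Yellowbrick, whose own visualizers have a richer API.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class FeatureVarianceVisualizer(BaseEstimator, TransformerMixin):
    """Hypothetical visualizer: records per-feature variance at fit time
    and passes data through unchanged, so it can sit anywhere in a Pipeline."""

    def __init__(self, feature_names=None):
        self.feature_names = feature_names

    def fit(self, X, y=None):
        self.variances_ = np.asarray(X, dtype=float).var(axis=0)
        return self

    def transform(self, X):
        return X  # identity: the visualizer observes, it does not modify

    def draw(self, ax=None):
        import matplotlib.pyplot as plt  # imported lazily: only needed to plot

        if ax is None:
            _, ax = plt.subplots()
        names = self.feature_names or [f"x{i}" for i in range(len(self.variances_))]
        ax.bar(names, self.variances_)
        ax.set_ylabel("variance")
        return ax


iris = load_iris()
pipe = Pipeline([
    ("raw", FeatureVarianceVisualizer(iris.feature_names)),
    ("scale", StandardScaler()),
    ("scaled", FeatureVarianceVisualizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(iris.data, iris.target)
print(pipe.named_steps["raw"].variances_)     # raw feature variances
print(pipe.named_steps["scaled"].variances_)  # approximately 1 after scaling
```

Calling `pipe.named_steps["raw"].draw()` would then plot what the pipeline saw at that stage, which is the kind of in-pipeline diagnostic the poster describes.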

ZimboPy: Empowering Zimbabwean Girls As Change Makers

Marlene Mhangami, Phoebe Chua

ZimboPy is an organic, on-the-ground effort by a local non-profit organization and Python developers in the Harare software development community to advance the cause of women in technology in Zimbabwe. The program operates in community centers, universities, high schools and tech hubs to make programming accessible to girls regardless of their socio-economic status. Upon first joining a ZimboPy club, many of the girls have never used a computer before, let alone written code. In Zimbabwe, only 17% of computer science undergraduate majors are women, and in the developing world, women make up less than 20% of the information and technology workforce. ZimboPy exists to empower Zimbabwean girls with the skills and confidence necessary not only to enter the local tech industry, but to lead it.

In addition to learning to code, ZimboPy club members also join a global network of women in technology who are working to tackle social challenges through human-centered design and computer science. ZimboPy’s mentorship program invites experienced women developers, mainly from the United States and Europe, to help Zimbabwean girls address local problems that can be solved with technology, such as clean water and e-commerce solutions for small shops in towns and villages. Mentors will travel to Zimbabwe and work with girls as they develop a plan for their applications and pair-program with them, answering questions and providing feedback along the way. To ensure that the girls are successful, mentors will continue to work with their groups even after leaving the country through weekly video conferences and email feedback. Overall, ZimboPy looks forward to changing Zimbabwe's future through creativity, collaboration and the power of Python programming.