Clojure Conj 2018 Trip Report

tom, 2018-12-07

Overview

The 2018 Clojure Conj conference ran from Nov 29 through Dec 1 in Durham, NC. With hundreds of attendees present, primarily from industry, the conference provided a plethora of real-world applications for Clojure, lessons learned, success stories, and introductions to novel technologies and techniques. Clojure continues to be a force to be reckoned with in multiple domains, including Operations Research / Data Science, with a consistent growth path and proven success in industry. The use of the Datomic database, applications of Machine Learning, real-time distributed systems, data engineering, and blended approaches to Artificial Intelligence were recurring themes at this conference. The applied use of Clojure in the medical space provided the most surprise, particularly in light of the measurable success that both companies presenting in that space saw. Brief summaries of the more salient presentations follow, along with a final summary.

REBL (Stuart Halloway)

Halloway demonstrated REBL (Read Evaluate Browse Loop), an alternative means of interactive exploration and discovery with live Clojure-based systems. REBL provides a graphical user interface that enables extensible discovery and navigation similar to the Clojure REPL, with the benefit of leveraging new protocols in Clojure 1.10 (datafy and nav) to allow arbitrary data to be interpreted by REBL into idiomatic views which may be navigated. Cognitect designed REBL to enable rich interaction and exploration of distributed systems such as Datomic Cloud. The result is a lively environment reminiscent of the Smalltalk interactive graphical interface or a version of older Symbolics Lisp environments, where the user effectively interacts with, inspects, and navigates data using integrated textual and graphical views. REBL allows arbitrary custom views and means of navigation, so that results in HTML may be displayed natively as text or a web page, database queries show up as tables or graphs, etc. REBL is currently free to use, but not open source.

See the talk: https://www.youtube.com/watch?v=c52QhiXsmyI

Clojure on the Cyberpunk Frontier of Democracy (Chris Small)

Mr. Small reported on the development of Taiwan’s Polis voting and citizen sentiment analysis system. Polis provides a real-time service where citizens can submit opinions on current topics and political issues, which the Taiwanese government may leverage to find areas of commonality and cross-cutting political support. Polis is implemented as a distributed database using Clojure, with real-time message processing and scalable front-end via Heroku cloud services. For analysis and exploration, the system provides an interactive visualization with the Vega grammar of graphics JavaScript plotting library, via the Clojure-based Oz wrapper library. Chris and his team used Clojure to perform Principal Component Analysis and clustering to analyze voter sentiment on issues, rendering interactive clouds of voters projected along principal components and clustered by issue. Polis served to identify areas where voter sentiment closely aligned to help all parties in government garner support for legislation.
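The analysis step described above (projecting voters onto principal components, then clustering them) can be illustrated with a small sketch. This is not the Polis code: it is written in Python rather than Clojure, uses only the standard library, and the vote matrix, the function names, and the choice of power iteration and k-means are all assumptions made for illustration.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def dominant_eigenpair(matrix, n_iter=200):
    """Power iteration for the largest-magnitude eigenvalue and its unit vector."""
    d = len(matrix)
    # Fixed, asymmetric start vector (an arbitrary choice for this sketch).
    start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(d)))
    v = [(i + 1) / start_norm for i in range(d)]
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v

def pca_project(rows, n_components=2):
    """Project rows onto the top principal components, found by deflation."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(n_components):
        value, axis = dominant_eigenpair(cov)
        axes.append(axis)
        # Remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - value * axis[i] * axis[j] for j in range(d)]
               for i in range(d)]
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes]
            for row in centered]

def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical votes: rows are participants, columns are comments
# (1 = agree, -1 = disagree, 0 = pass).
votes = [
    [1, 1, 1, -1, -1, -1],
    [1, 1, 0, -1, -1, -1],
    [1, 0, 1, -1, -1, 0],
    [-1, -1, -1, 1, 1, 1],
    [-1, -1, 0, 1, 1, 1],
    [-1, 0, -1, 1, 0, 1],
]
projected = pca_project(votes, n_components=2)
labels = kmeans(projected, k=2)
print(labels)
```

In this toy data the first three participants and the last three vote in opposite directions, so they should land in separate clusters. Comments on which the clusters agree would be the areas of commonality the report mentions.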

See the talk: https://www.youtube.com/watch?v=2tBVMAm0-00

A Clojure fusion of symbolic and data driven AI (Huahai Yang)

Dr. Yang, a cognitive neuroscientist formerly of IBM’s Watson team, presented cutting-edge research on his company’s newest product, Juji. Juji is billed as a next-generation chatbot, designed for customer-facing operations and information collection. Juji uses a novel combination of a symbolic rule system and specialized neural network models, represented as an embedded domain-specific language in Clojure. The software fuses classical AI (symbolic rule systems based on propositional logic) with specialized deep-learning systems to create an authentic and relatable artificial customer service agent that is able to politely drive complex dialogue trees toward service goals. For Juji, Clojure (via metaprogramming) serves as the host language for the rules engine and the coordination layer for specialized deep-learning models based in Python and hosted on distributed systems. This architecture allows Juji developers to develop topical dialogues at a fairly high level, while leveraging specialized models and infrastructure (e.g., the TensorFlow deep learning library).

See the talk: https://www.youtube.com/watch?v=phA4bMjKvCY

Developing a Medical Image Viewer in ClojureScript (Oliver Eidel)

Dr. Eidel demonstrated a novel usage of machine learning to analyze mammography to identify possible breast cancer indicators. Using ClojureScript, he developed a browser-based user interface that radiologists use to annotate reports and interactively change brightness and contrast to detect anomalies. Radiologists used the application to build a training set of breast cancer images, using a simple spline path to create bounding geometry around visually suspicious areas of the imagery. This geometric data served as a label for deep-learning models to classify regions of an image that warranted further inspection (as well as the converse). The trained model then automatically highlights regions of interest on new patient images, bridging the gap between variably annotated swaths of imagery — that experts must otherwise take time to examine — to structured data informed by an ML-friendly classifier to quickly identify problematic tissue. This improvement allows experts to prune their working set down to provide more time to apply expertise per patient.

See the talk: https://www.youtube.com/watch?v=kNiGu_VaoTg

Data Science / Machine Learning (Community of Interest)

The Data Science/Machine Learning “unsession” provided an opportunity for members of the community — from neophytes to practitioners — to discuss best practices, challenges, and experiences using Clojure. A general consensus emerged where Clojure is seen as a potential low-level target for implementing efficient ML algorithms, but currently serves as a flexible mid-level language for composing distributed systems that either feed or leverage heterogeneous modeling pipelines. Participants saw no reason that Clojure could not dominate the space — given existing performant numerics libraries like Neanderthal and core.matrix, the Flare ML library, the cortex deep learning library, etc.; the only missing ingredient appears to be a broader base of users. Given that the state-of-the-art in ML typically occurs in ecosystems with existing momentum (typically Python-based services like TensorFlow), the current practice is to either port existing models to Clojure, or wrap them using Clojure-based services. This allows the developer to leverage Clojure’s power and flexibility even in a heterogeneous language setting, providing an end-to-end solution for data engineering, data science, and visualization.

AWS, Meet Clojure (David Chelimsky)

Mr. Chelimsky presented Cognitect’s official library for Amazon Web Services interop from Clojure. Since 2015, they have built a data-driven AWS client specification, with native Clojure integration. The library leverages Amazon’s JSON specifications for their HTTP request format, and extends to all AWS subservices — unlike other AWS wrappers. This yields modular, small specific dependencies by service, which leverage the familiar Clojure map data structure for communication. Given the effort that Cognitect has invested in the AWS ecosystem, the future looks bright for continuing first-class AWS and GovCloud integration.

See the talk: https://www.youtube.com/watch?v=ppDtDP0Rntw

AI Systems: Foundations for Artificial Minds (Ben Kamphaus)

Mr. Kamphaus examined the philosophical implications of ML with an eye toward artificial minds, that is, useful artificial agents with capabilities beyond the current crop of ML-inspired functions. From the practitioner perspective, Ben identified the benefits of the current hype of ML (“deep learning”) in terms of immediate practical performance and production deployment, along with the long-term shortfalls inherent to neural networks, focusing on the inability to reason about or otherwise interrogate the resulting model. From a DevOps perspective, he highlighted using Clojure and Datomic to glue together disparate systems in an ML pipeline, and showed particular enthusiasm toward the prospect of using the Datomic database to provide reproducibility and provenance in model training and development. Current practices often “embed” training assumptions in the deployed model, which is significantly short-sighted in the presence of new data and the need to audit already opaque models. Ben hopes that by keeping training data in a temporally versioned database like Datomic, production models can be matched to the data used to train them, providing some trace of the limitations and biases inherent to the model as well as a basis for comparison when trained with novel data.
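The provenance idea can be sketched with a toy append-only store in which every write receives a transaction id, so a model only needs to record the id it was trained at. This is a Python illustration of the concept only; it is not Datomic's API, and the class, method, and key names (VersionedStore, transact, as_of, trained_as_of) are hypothetical.

```python
class VersionedStore:
    """Toy append-only store: each write gets an increasing transaction id."""

    def __init__(self):
        self._log = []  # entries are (tx, key, value, added)
        self._tx = 0

    def transact(self, assertions=(), retractions=()):
        """Record assertions and retractions under one new transaction id."""
        self._tx += 1
        for key, value in assertions:
            self._log.append((self._tx, key, value, True))
        for key in retractions:
            self._log.append((self._tx, key, None, False))
        return self._tx

    def as_of(self, tx):
        """Rebuild the key -> value view as it existed at transaction tx."""
        view = {}
        for entry_tx, key, value, added in self._log:
            if entry_tx > tx:
                break  # the log is ordered by transaction id
            if added:
                view[key] = value
            else:
                view.pop(key, None)
        return view


store = VersionedStore()

# First batch of (hypothetical) training samples.
tx1 = store.transact(assertions=[("sample-1", 0.2), ("sample-2", 0.9)])

# The model keeps a pointer to the data version, not a copy of the data.
model = {"name": "risk-model-v1", "trained_as_of": tx1}

# Later the data changes: one sample is added and one is retracted.
tx2 = store.transact(assertions=[("sample-3", 0.4)], retractions=["sample-1"])

training_data = store.as_of(model["trained_as_of"])
current_data = store.as_of(tx2)
print(training_data)
print(current_data)
```

Because nothing is overwritten, the exact training set of a deployed model can be recovered later for an audit, or compared against the current data before retraining.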

See the talk: https://www.youtube.com/watch?v=5egU3VrElmA

Clojure vs Sepsis: Path to Real Time Enterprise Data Science (Igor Ges + Gerardo Castro)

The team at HCData built and deployed a real-time distributed computing platform that uses advanced data science techniques to classify hospital patients at risk for sepsis. Their product SPOT interfaces with 146 hospitals and thousands of real-time patient sensor feeds, sending statistics through Apache Kafka message streams to a Clojure-based cloud backed by the Datomic database. Clojure processes apply a custom model to identify which patients show signs consistent with sepsis, and then send warning alerts back to the patient’s hospital indicating the need for intervention. HCData made a significant, measurable impact on patient mortality due to sepsis (and in some cases similar system-wide illnesses) by allowing nurses to screen patients just-in-time rather than on entry or shift changes. HCData is moving toward leveraging its infrastructure to identify other time-sensitive treatments.

See the talk: https://www.youtube.com/watch?v=AyWbB52SzAg

Can you GAN? (Carin Meier)

Carin demonstrated the newly developed first-class support for Clojure in the Apache MXNet deep learning library. She focused on demonstrating a complete implementation of an ML pipeline in Clojure and MXNet using Generative Adversarial Networks (GANs) to train image classifiers for arbitrary topics. Her notional case study focused on the intricacies of training a classifier to recognize images of flan (the dessert), which ended up being a nontrivial challenge. Thankfully, Clojure provides a complete, high-level interface for implementing this and many more ML pipelines in MXNet, including distributed variants to train on cloud clusters.

See the talk: https://www.youtube.com/watch?v=yzfnlcHtwiY

Probabilistic Programming and Meta-Programming in Clojure (Vikash Mansinghka)

Vikash demonstrated a novel development in programming paradigms: probabilistic programming. Through an embedded domain-specific language in Clojure, Vikash and his team provide a general means for programming with stochastic variables, generalized inference, and the ability to compute traces of sophisticated models. The paradigm allows for high-level modeling and inference across sparse or missing data, enabling succinct descriptions of processes that describe phenomena with a strong likelihood. The inferential engine leveraged by the DSL provides an efficient means for optimizing model fitting, which Vikash demonstrated by declaratively inferring multiple models based on a sparse dataset. The team is expanding the generality of this approach to Bayesian inference of data, in the form of BayesDB, in which sparse, missing, or possibly erroneous data may be modeled with an extension to the SQL language. Combining inference with general purpose programming and database technology looks fascinating, particularly the ability to synthesize useful datasets from a sparse set of samples. The results of the inference engine, referred to as traces, provide a symbolic reasoning for the structure of the model. This reasoning bears a striking similarity to the stochastic demand sampling rules I developed for the Helmet language in 2012, which indicates a possible use case for more robust stochastic demand futures.
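To give a feel for traces and inference, here is a deliberately tiny Python sketch. It is not the DSL presented in the talk. It assumes a hypothetical coin-flip model with a discrete prior over the coin's bias, records each random choice in a trace (a plain dictionary), and estimates the posterior by likelihood weighting; all names and numbers are illustrative.

```python
import random

# Hypothetical discrete prior over the coin's bias.
BIAS_GRID = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def sample_prior(rng):
    """Run the latent part of the model, recording the random choice in a trace."""
    return {"bias": rng.choice(BIAS_GRID)}

def likelihood(trace, observed_flips):
    """Probability of the observed flips given the choices recorded in the trace."""
    bias = trace["bias"]
    heads = sum(observed_flips)
    tails = len(observed_flips) - heads
    return bias ** heads * (1.0 - bias) ** tails

def infer(observed_flips, n_samples=5000, seed=0):
    """Likelihood weighting: sample traces from the prior, weight each by the data."""
    rng = random.Random(seed)
    weighted_traces = []
    for _ in range(n_samples):
        trace = sample_prior(rng)
        weighted_traces.append((likelihood(trace, observed_flips), trace))
    return weighted_traces

def posterior_mean(weighted_traces, name):
    """Weighted average of one recorded choice across all traces."""
    total = sum(weight for weight, _ in weighted_traces)
    return sum(weight * trace[name] for weight, trace in weighted_traces) / total

# A sparse dataset: ten flips, eight of them heads.
observed = [True] * 8 + [False] * 2
weighted_traces = infer(observed)
estimate = posterior_mean(weighted_traces, "bias")
print(round(estimate, 2))
```

The model is written once as ordinary code, and the same traces could answer other questions (for example, the probability that the bias exceeds 0.5) without changing the model. A real probabilistic programming system generalizes this to arbitrary programs and to far more efficient inference strategies.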

See the talk: https://www.youtube.com/watch?v=KLGwLkmh8gI

Keynote — Let’s Talk About AI, ML, and Bias (Rebecca Parsons)

Dr. Parsons delivered a compelling talk about the sensitivity of production systems to implicit bias, along with potentially dire consequences. With the push for increased utilization and leveraging “AI” (in the current parlance, typically a variant of Machine Learning), the societal risks are significant. Dr. Parsons highlighted the inherent vulnerabilities in existing ML techniques, where minor details missing from the training set can create significant blind spots for the resulting model, or lead to implicit prejudices. Race, economic status, gender, vocal accent, medical history, and other variations merit intense inspection of the data used to build models, as well as advanced methodologies to detect bias. Since ML models are notoriously difficult to reason about, bias often emerges much later in production. When that bias is used to influence — or potentially countermand — decisions that affect human life, our society enters a danger zone. How often will a young doctor go against the medical determination of “the objective model,” or a judge countermand the sentencing or parole recommendation of “the objective model”? Dr. Parsons presented a strong case that we — as a society — must understand the implications and hidden biases of the technology before rushing to embrace it, before people’s lives are harmed. This involves new techniques for detecting structural bias in training data, as well as an intentional sensitivity analysis step which appears to be missing from the ML community.

See the talk: https://www.youtube.com/watch?v=w1lqZcnamAQ

Summary

The conference was an incredibly useful opportunity to see the latest ideas in industry and the Clojure community writ large. The push toward capturing more of the Machine Learning space is apparent, as is Clojure’s unique position in that space (evidenced in multiple presentations) and Datomic’s ability to support broad applications including ML. In spite of the focus on the rampant success of ML, I deeply appreciated the cautionary themes that also emerged, highlighting the limitations of ML regarding opaqueness towards reasoning and unintended bias. Senior leaders in industry and government are quick to embrace the next big thing (currently ML, or “AI”), but this can come with dire consequences if the subject matter experts do not sufficiently control the hype train to head off uninformed decision making. The conference provided an educated basis to weigh both the positive and negative aspects of ML (i.e., the reality vs. the hype) the next time ML or AI emerges as a topic of discussion. Given the prevalence of ML (or “AI”), more analysts should develop a similar critical foundation.

The increasing prevalence of the Datomic distributed database and its cloud offering via Amazon Web Services is also encouraging. Datomic is a technology that I will be exploring for use with distributed simulation results aggregation, querying, and visualization in the future. The plethora of libraries and services cropping up in the Clojure community (including production systems in industry) only strengthens the case for using Datomic. Its presence as a service on AWS significantly lowers the barrier to experimentation and eventual production use. The general consensus is not only that Clojure is alive and well after a decade of development and use, but also that its practical applications in industry continue to grow.

See all the talks: https://www.youtube.com/playlist?list=PLZdCLR02grLpMkEBXT22FTaJYxB92i3V3