Ben Lambert

Follow me on GitHub

ResumeProjectsPublications

Current Projects

Although I can’t speak about specifics, I am involved in some of the following activities:

  • Building retrieval-augmented generation systems for question answering.
  • ETL pipelines with Spark in the Databricks environment.
  • Computer vision for automotive applications.
  • Various data science and engineering projects.
  • Tools: Python, Spark, SQL, and many others!

Prior Open Source Projects

Combining Argo Workflows with Great Expectations

  • Wrote a medium article (Link) about using Great Expectations for data validation using Argo Workflows.
  • Tools: Argo Workflows, Great Expectations, Kubernetes, Docker, Python

Concurrent Sequence Preprocessing

  • In molecular biology workflows one must commonly compute all possible k-mers from a biological sequence. A k-mer is a string of length k that occurs naturally in the sequence. For example a 4-mer could be ‘ATCG’.
  • This can be a very slow process, but using Golang I was able to create a very performant concurrent k-mer pre-processing tool. (Link)
  • Tools: Golang

Multi-resolution Local Binary Patterns

  • MrLBP is a technique that can be used as a precursor to binning microbial genomes. I found that the published version of this approach was quite slow, so I re-implemented it in Golang and took advantage of its great concurrency model. (Link)
  • Tools: Golang

Ph.D. Research

  • During my Ph.D. work I generated mass amounts of microscopy data. I then used MATLAB to automatically process the images and count cells.
  • To analyze environmental data I used a combination of R ecology packages and custom MATLAB scripts.
  • I also had the opportunity to do some robotics work and created underwater instruments that carried out fluid-handling operations. All control was carried out with an Arduino.
  • Tools: MATLAB, Arduino, R, SolidWorks