Product Experiments

poison-web

AI Model Payload Injection Toolkit - Research tool for embedding invisible payloads in web content. Because sometimes the most dangerous threats are the ones you can't see coming.

GitHub Repo

AI Research

foresight-forge

Daily forecasting pipeline that ingests the news, distills the signal, publishes probabilistic predictions, and threads community feedback into a Markdown briefing.

GitHub Repo

AI Product

Varro

Meta-learning with Qwen using GSPO and MLX for efficient training on Apple Silicon. Built a framework that teaches AI models to learn faster by learning how to learn—because the best AI is one that can teach itself new tricks.

GitHub Repo

AI Research

Mneme

Human research behavior is complex, so I built Mneme to capture it. It transforms raw browsing data into GRPO-ready trajectories that teach AI agents to replicate how humans actually research, because the best AI learns from real human patterns.

GitHub Repo

AI Product

Bridge

Managing AI agents shouldn't require a PhD in prompt engineering. Built Bridge as Mission Control for AI fleets—a CEO-appropriate interface that makes orchestrating AI agents as simple as managing a team of humans.

GitHub Repo

AI Product

Branch

Google Docs are great for collaboration, but they're not Git-friendly. Built Branch to sync Google Docs into local Git repositories—because version control shouldn't be limited to code. Now you can diff, blame, and branch your documents like any other source code.

GitHub Repo

Tool Product

Livy

Drowning in documents but need a clear report? Built Livy to read the mountain, find the gold, and draft the summary. Because your brainpower is too precious for CTRL+F.

GitHub Repo

AI Product

EthicsBench

Analysing ethical behaviour of LLMs. Because AI that can't tell right from wrong is just a very smart sociopath waiting to happen.

GitHub Repo

AI Eval

freechat

Chat apps are everywhere, but what if they could pay for themselves? Built an AI chat platform that seamlessly integrates contextual advertising—because great conversations shouldn't cost a fortune to host.

GitHub Repo

AI Product

Agentic_search

Claude-powered research assistant that blends semantic RAG discovery with tool-driven deep dives and auto-evaluation loops, so questions come back with sourced, defensible answers.

GitHub Repo

AI Product

PD-platform

Built a web interface that transforms natural language queries into SQL using a purpose-built DSL. Connect to any database, ask questions in plain English, and get instant insights powered by LLMs.

GitHub Repo

AI Product

Walter

Built an AI writer that learns from social media feeds using GRPO reinforcement learning. The cool part? It teaches itself to write posts that actually resonate with real audiences by learning from what works.

GitHub Repo

AI Product Research

ParaLLM

Watching LLMs process one by one felt like a dial-up flashback. Built ParaLLM to let them all run wild in parallel—because real AI work needs warp speed.

GitHub Repo

AI Product
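As a rough illustration of the fan-out idea behind ParaLLM (not the project's actual code), the sketch below sends one prompt to several models concurrently using Python's standard thread pool. The `fake_call` function and the model names are hypothetical stand-ins for real API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fake_call(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real, network-bound LLM API call.
    return f"{model}: {prompt.upper()}"


def run_parallel(
    models: list[str],
    prompt: str,
    call: Callable[[str, str], str] = fake_call,
    max_workers: int = 8,
) -> dict[str, str]:
    """Send the same prompt to every model at once and collect the replies."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all calls first so they run concurrently, then gather results.
        futures = {m: pool.submit(call, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}


if __name__ == "__main__":
    print(run_parallel(["model-a", "model-b"], "hello"))
```

Threads suit this case because the calls spend most of their time waiting on the network; swapping `fake_call` for a real client function is the only change needed.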

Galen

Life sciences need task-level evals, so Galen runs mission-style workflows—data extraction, analysis, reporting—to show which LLMs can actually keep up in the lab.

GitHub Repo

AI Eval

GRPO Poker

Curious if a tiny 0.5B LLM could bluff its way through poker with some RL coaching. Turns out, even small models can learn a mean poker face.

Gist

AI Research

agenticsloprank

Uses SlopRank and YamLLMs to rank LLM agents. Because not all AI agents are created equal—some are just sloppier than others.

GitHub Repo

AI Eval

LLM Poker

LLM evaluation: To see how LLMs *really* stack up at playing poker, I built a casino where AIs (and you!) can go all-in. Because strategy is the ultimate LLM test.

GitHub Repo

AI Eval

LLMRank

Rankings are often shallow, just leaderboards lacking depth—so LLMRank lets models critically judge each other. Because when everyone's good, you need nuance to know who's great.

GitHub Repo

AI Eval

task_evals_v2

Task-specific industry evaluations. Because generic benchmarks are like using a butter knife for brain surgery: you need the right tool for the job.

GitHub Repo

AI Eval

CATransformer

Cellular Automata are mesmerizingly complex, but predicting their next move is incredibly difficult. So I trained a transformer to crack it, then had it evolve even smarter versions of itself, to test how new model ensembles could be created.

GitHub Repo

AI Research

Task Evals (Private)

LLM evaluation: Picking the right LLM for a job is often a guessing game. Built a rigorous bootcamp for models: custom evals, RAG, DB hookups, the works. Because choosing your AI shouldn't be a shot in the dark.

GitHub Repo

AI Research

LLM text inspector

LLM evaluation: LLM outputs can be slick, but how *good* are they, really? Created a linguistic detective kit to dissect their prose. Because style *and* substance matter.

GitHub Repo

AI Eval

LOOP Evals

LLM evaluation: LLMs are great at essays, but can they *reason*? So I created an env to throw Wordle, Sudoku, and other logic puzzles at them with LOOP Evals. Because true smarts mean more than just smooth talk.

GitHub Repo

AI Eval

Twain

Everyone wants AI to tell stories, but most attempts are... meh. Coaxed an AI to write surprisingly decent six-part tales with Twain. Because storytelling is an art, even for machines.

GitHub Repo

AI Product

ReflectGPT

LLMs often just steamroll ahead, even when they're wrong. Gave them a 'pause and rethink' button. ReflectGPT lets them catch their own blunders and try again—because real intelligence means admitting you messed up.

GitHub Repo

AI Research
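The "pause and rethink" idea can be sketched as a generate, critique, retry loop. This is a minimal illustration under assumptions, not ReflectGPT's implementation: `generate` and `critic` are hypothetical callables that would wrap LLM calls, and the toy versions below exist only to make the loop runnable.

```python
from typing import Callable, Optional


def reflect(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Answer, ask a critic for feedback, and retry until the critic is satisfied."""
    answer = generate(question, None)
    for _ in range(max_rounds):
        feedback = critic(question, answer)
        if feedback is None:  # critic found nothing to fix
            return answer
        answer = generate(question, feedback)
    return answer  # best effort once the round budget is spent


# Toy stand-ins (assumptions): the first attempt is wrong, the retry is right.
def toy_generate(question: str, feedback: Optional[str]) -> str:
    return "5" if feedback is None else "4"


def toy_critic(question: str, answer: str) -> Optional[str]:
    return None if answer == "4" else "Check the arithmetic."


if __name__ == "__main__":
    print(reflect("What is 2 + 2?", toy_generate, toy_critic))
```

Capping the number of rounds keeps the loop from running forever when the critic is never satisfied.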

galen-evals

Coworker for Life Sciences. Because even AI needs to prove it can handle the complex world of biology before we trust it with our health.

GitHub Repo

AI Eval

Fab AI Evals (Private)

LLM evaluation: Advanced manufacturing needs AI, but generic evals are quite bad. Designed custom benchmarks to find the sharpest AI tools for the factory floor. Because precision matters, from code to an assembly line.

GitHub Repo

AI Eval Product

Shipping Tycoon: Jones Act

The Jones Act is famously complex, so I had a bit of fun turning it into a strategy game. Navigating bureaucracy has never been so (intentionally) frustrating, or so fun.

GitHub Repo

AI Product

QCGOL

Conway's Game of Life is a classic. What happens when you give it a quantum spin? QCGOL explores that rabbit hole.

GitHub Repo

AI Research

Prof

Investment analysis is tough. Sketched out 'Prof,' an AI mentor to train aspiring analysts. The core idea? Even complex finance can be taught with smart AI.

GitHub Repo

AI Research

fomodoro

The Pomodoro timer is great, but felt a bit... analog. Infused it with LLM smarts to create Fomodoro—a focus tool that watches your computer and makes sure you're staying on task, the way a good productivity tool should.

GitHub Repo

AI Tool

Autotune_GPT

What if an LLM could teach itself to be better? Autotune_GPT is that feedback loop: AI improving AI.

GitHub Repo

AI Research

Audiochat

Computers got boring, so I made mine talk back. Turns out an AI voice that's sharp, helpful, and just a bit snarky made working a lot more fun.

GitHub Repo

AI Product

Mini VC

The VC world is a maze. Built 'Mini VC' to simulate the hustle and find patterns in the chaos. Even a simulated investor needs a good thesis.

GitHub Repo

AI Research

test_site

Testing ground for new ideas and experiments. Because every great project starts with a simple test.

GitHub Repo

Experiment

gpt-chrome-extension

We need new form factors to use AI. Made a Chrome extension that brings GPT smarts to any webpage—highlight, click, understand.

GitHub Repo

AI Tool

Loopy

Before 'agents' were all the rage, there was Loopy: an early experiment in getting LLMs to think, observe, and act.

GitHub Repo

AI Research

Slackbot

Needed a way to automate Slack chores and actually *use* the data. Built a bot that not only messages but also neatly logs everything for analysis.

GitHub Repo

AI Tool

Repo reader

Jumping into a new Python codebase can be daunting. Repo Reader uses GPT to give you the lay of the land—like a friendly AI guide for unfamiliar code.

GitHub Repo

AI Tool

Network Analysis

How does information *really* flow and grow? Dove into network analysis to see how connections and shapes impact our collective brainpower.

GitHub Repo

AI Research

scraper

Scrape a website and all sub-domains. Because sometimes you need to see what's hiding beneath the surface.

GitHub Repo

Tool

Alignment

Getting LLMs to play by the rules is a big deal (and a fun puzzle). A cheeky way to keep their creativity in check—maybe even accidentally solved alignment. Mostly kidding. Maybe.

GitHub Repo

AI Research

Autocoder

Also before agents were du jour: What if code could write itself... and then actually run? Autocoder was my dive into that meta-dream. Because the ultimate dev tool builds itself.

GitHub Repo

AI Research

Growth and innovation

Growth and innovation are the engines of progress, but what *really* drives them? Dug into the data to find out. Because understanding the past is key to building a cooler future.

GitHub Repo

AI Research

Company chat

Company knowledge felt scattered across Docs and Slack. Built a system to hoover it all up, pop it in a DB, and let you ask it anything.

GitHub Repo

AI Tool

GPT-search

Wanted GPT answers with actual Google-backed sources. GPT-Search was born. Yeah, should've turned it into Perplexity—hindsight's 20/20, but still a cool hack!

GitHub Repo

AI Tool

file-sumamriser

Choose a file, extract text from it, and recursively summarise. Because sometimes you need to turn a novel into a tweet.

GitHub Repo

Tool
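Recursive summarisation can be sketched as: split the text into chunks, summarise each chunk, join the summaries, and repeat until the result is short enough. The code below is an illustration under assumptions rather than the repo's code; `toy_summarise` is a hypothetical stand-in for an LLM call, and the chunk size and target length are arbitrary.

```python
from typing import Callable


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def recursive_summarise(
    text: str,
    summarise: Callable[[str], str],
    chunk_size: int = 1000,
    target_len: int = 280,
) -> str:
    """Summarise chunk by chunk, then summarise the summaries, until short enough."""
    if len(text) <= target_len:
        return text
    combined = " ".join(summarise(c) for c in chunk_text(text, chunk_size))
    if len(combined) >= len(text):
        # The summariser failed to shrink the text; truncate instead of looping forever.
        return combined[:target_len]
    return recursive_summarise(combined, summarise, chunk_size, target_len)


def toy_summarise(chunk: str) -> str:
    # Hypothetical stand-in for an LLM call: keep the first tenth of the chunk.
    return chunk[: max(1, len(chunk) // 10)]


if __name__ == "__main__":
    print(recursive_summarise("word " * 1000, toy_summarise))
```

The length check before each recursive call is what guarantees termination, whatever summariser is plugged in.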

Basics

Some basic GPT calls I wanted to keep in one place so I don't have to rewrite them again and again. Because reinventing the wheel is so 2022.

GitHub Repo

Tool
