🤸🏻‍♀️ 🤸🏻‍♀️ 🤸🏻‍♀️
🧪 Experiment write-ups
- 25-39. Quantified learning Part I
- 21-24. Test effects of qi gong on people and objects
- 20. AI interface to generate flashcards about Wikipedia’s vital articles (version 1)
- 19. Recreate SWE-bench to collect real-world programming evaluation data
- 18. bmo.cafe for automating tasks
- 17. Try Hume.ai - an empathetic AI
- 16. Learn how OpenAI does public evals
- 15. Learn morse code with chunking and spaced repetition
- 14. Build a simple scalar-based neural network
- 13. Memorize 111 experiments using Avatar the Last Airbender
Larger umbrella of experiments:
Experiment System Changelog if you’re interested in the meta of how I’m running these experiments!
Older experiments
These write-ups were created by running my written notes through ChatGPT.
- 12. Claude vs. GPT on GSM8K math benchmarks - How do they compare? 🟢
- 11. What does a useful parking ticket payer agent look like? 🔴
- 10. GPT vs. Gemini - What does a minimalistic user interface for model comparison look like? 🟢
- 9. How might a set of AI models improve code generation through iterative code and test set generation? 🟢
- 8. What does a good translation interface look like for same-language “translation” across contexts? 🔴
- 7. How is the writing quality of GPT-4 + RAG for querying over my Obsidian notes? 🔴
- 6. What is it like to convert a Next.js app to PWA? 🟢
- 5. What does a suite of personal GPTs on ChatGPT look like for content like research proposals and social media posts? 🟠
- 4. What does it look like to automate UI improvements with GPT-Vision? 🟢
- 3. How accurate is RAG + GPT-4 for an orthopedic surgeon? 🟢
- 2. What would a knowledge graph of science look like, generated recursively entirely by GPT-4? 🟢
- 1. How do I track my hours in a way that feels good? 🟠
- 0. Will I be able to build this site in 1 day, where I’ll stick with it? 🔴
Proposals
These are proposals that I’ve been working on.
- Back-translation for Alignment of LLM Generation of Code, Tests, and more
- Proposal - Crowdsourcing a universal browser automation task graph with continuously running tests
- AI Theorem Proving for International Math Olympiad Problems
- “LLM Operating System”
- Explainable Protocols for 2x+ Retention Improvements to Human Long-Term Memory
- What does a better knowledge graph structure for enhancing human learning alongside LLMs look like?
- What does a configurable format for a single AI agent’s profile look like?
- What does a configurable format look like to represent essential data for a single person?