koaning/README.md
🙂 Vincent D. Warmerdam
┣━━ 📦 Open Source Packages
┃   ┣━━ scikit-lego       - lego bricks for sklearn
┃   ┣━━ drawdata          - draw datasets in jupyter
┃   ┣━━ embetter          - embeddings ready for sklearn
┃   ┣━━ uvtrick           - run functions in external venvs via uv
┃   ┣━━ mktestdocs        - turn markdown files into pytest tests
┃   ┣━━ wigglystuff       - extra notebook widgets
┃   ┣━━ mohtml            - Pythonic HTML (for Marimo)
┃   ┣━━ smartfunc         - turns docstrings into LLM-functions
┃   ┣━━ dicekit           - domain specific interface for dice
┃   ┣━━ taskhut           - basic task routing for annotation
┃   ┣━━ diskdantic        - a mini ORM for files on disk
┃   ┣━━ pbt               - domain specific interface for dice
┃   ┣━━ human-learn       - rule-based components for sklearn
┃   ┣━━ doubtlab          - suite of tools to help find bad labels
┃   ┣━━ simsity           - dead simple vector 'database'
┃   ┣━━ lazylines         - lightweight utils for .jsonl wrangling
┃   ┣━━ fh-matplotlib     - matplotlib for FastHTML
┃   ┣━━ fh-altair         - altair for FastHTML
┃   ┣━━ durations         - pytest duration insights
┃   ┣━━ tuilwindcss       - tailwindcss for textual tui apps
┃   ┣━━ sentence-models   - a different take on textcat
┃   ┣━━ memo              - saves a whole log of time
┃   ┣━━ scikit-partial    - partial_fit() pipelines for sklearn
┃   ┗━━ scikit-bloom      - bloom transformers for sklearn
┣━━ Larger Project Contributions
┃   ┣━━ fairlearn         - contributed the CorrelationFilter
┃   ┣━━ polars            - contributed the .pipe() method
┃   ┗━━ BERTopic          - added lightweight sklearn pipeline support
┣━━ ⭐ Online Projects
┃   ┣━━ calmcode.io       - intermediate developer education
┃   ┗━━ koaning.io        - personal blog
┣━━ 🎙️ Popular Talks
┃   ┣━━ Natural Intelligence is All You Need
┃   ┣━━ Group-by statements that save the day
┃   ┣━━ Tools to Improve Training Data
┃   ┣━━ Optimal on Paper, Broken in Reality
┃   ┣━━ Playing by the Rules-Based-Systems
┃   ┣━━ How to Constrain Artificial Stupidity
┃   ┣━━ The Profession of Solving the Wrong Problem
┃   ┣━━ Winning with Simple, even Linear, Models
┃   ┗━━ Untitled12.ipynb
┣━━ 🔬 Random Experiments
┃   ┣━━ narlogs        - logs all dataframe pipelines
┃   ┣━━ scikit-prune   - prune scikit learn pipelines
┃   ┣━━ gitlit         - tracking github action times across open source
┃   ┣━━ sentimany      - many sentiment models, one repo
┃   ┣━━ tokenwiser     - sklearn token tricks
┃   ┣━━ clumper        - functional API for lists of dicts
┃   ┣━━ whatlies       - exploration tools for word embeddings
┃   ┣━━ skedulord      - makes cron a bit more fun
┃   ┣━━ icepickle      - cool and safe storage for linear models
┃   ┣━━ bulk           - simple bulk labelling interface
┃   ┣━━ evol           - grammar for genetic heuristics
┃   ┗━━ flowshow       - over the top logging decorator
┗━━ 👨‍💻 Employer
    ┣━━ πŸ€ marimo      - better Python notebooks
    ┃   ┣━━ mofresh           - Refresh marimo cells remotely
    ┃   ┣━━ mopaint           - MS paint notebook widget
    ┃   ┣━━ moterm            - Chainable terminal notebook widget
    ┃   ┣━━ mobuild           - Build Python pkgs from marimo notebook
    ┃   ┣━━ mopad             - Gamepad support for Python notebooks
    ┃   ┣━━ motalk            - Webspeechkit for Python notebooks
    ┃   ┗━━ datasette-marimo  - datasette plugin for marimo
    ┣━━ 🎲 :probabl.   - scikit-learn and friends
    ┃   ┣━━ scikit-churn      - safety rails for churn work
    ┃   ┣━━ scikit-playtime   - rethinking pipelines
    ┃   ┗━━ scikit-mdn        - mixture density networks
    ┣━━ 💥 Explosion   - developer tools for nlp
    ┃   ┣━━ prodigy-hf        - Prodigy integration for the HuggingFace stack
    ┃   ┣━━ prodigy-pdf       - Annotate PDFs via Prodigy
    ┃   ┣━━ prodigy-ann       - ANN techniques to find relevant subsets
    ┃   ┣━━ prodigy-segment   - Prodigy integration for Segment Anything
    ┃   ┣━━ prodigy-lunr      - Search techniques to find relevant subsets
    ┃   ┣━━ prodigy-whisper   - Transcribe audio with OpenAI's whisper models
    ┃   ┣━━ prodigy-tui       - Prodigy from the terminal
    ┃   ┗━━ cluestar          - inspiration for your first text labels
    ┗━━ 🤖 Rasa        - conversational software provider
        ┣━━ nlu examples      - custom nlu components for Rasa
        ┣━━ taipo             - data augmentation tools
        ┗━━ algo whiteboard   - nlp education

Follow me on Twitter @fishnets88

Pinned

  1. scikit-lego (Python, 1.4k stars, 121 forks)
     Extra blocks for scikit-learn pipelines.

  2. embetter (Python, 519 stars, 16 forks)
     just a bunch of useful embeddings for scikit-learn pipelines

  3. human-learn (Jupyter Notebook, 822 stars, 57 forks)
     Natural Intelligence is still a pretty good idea.

  4. drawdata (JavaScript, 1.6k stars, 144 forks)
     Draw datasets from within Python notebooks.

  5. wigglystuff (JavaScript, 111 stars, 9 forks)
     A collection of creative AnyWidgets for Python notebook environments

  6. mktestdocs (Python, 145 stars, 10 forks)
     Run pytest against markdown files/docstrings.