Skip to content

Conversation

@semohr
Copy link
Contributor

@semohr semohr commented Oct 14, 2025

This PR completely restructures the MediaFile library from a single monolithic file into a modular, maintainable package architecture. The core functionality remains unchanged while improving code organization, readability, and future maintainability.

Architectural Restructuring

  • Split monolithic mediafile.py (2342 lines) into logical modules:
    • mediafile/__init__.py - Main MediaFile class and public API
    • mediafile/constants.py - Constants and enums (TYPES, ImageType)
    • mediafile/exceptions.py - Custom exception hierarchy
    • mediafile/fields.py - Field descriptor classes (MediaField, DateField, etc.)
    • mediafile/storage/ - Format-specific storage strategies
    • mediafile/utils/ - Utility functions and helpers

Build System & Dependencies

  • Updated to modern Python packaging with pyproject.toml
  • Dropped Python 3.7 and 3.8 support - now requires Python ≥3.9
  • Updated dependency specifications and modernized build configuration
  • Removed Tox in favor of plain GitHub Actions for testing
  • New version: 1.0.0-alpha.1 - reflecting the significant architectural changes
  • Coverage: Code test coverage is now uploaded as artifact (maybe we want to switch to codecov at some point)

To review, try to look at one commit at a time - the changes are organized logically across commits to make the refactoring easier to follow. The source code stayed mostly unchanged: I renamed some functions (removed the underscore _) but other than this it should be functionally identical.

@github-actions
Copy link

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

Copy link
Member

@henry-oberholtzer henry-oberholtzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this refactor a lot - it's a lot easier to tell what mediafile does and where it does it! I'll let others who have more knowledge on this weigh in too.

@JOJ0
Copy link
Member

JOJ0 commented Oct 21, 2025

Great idea! Restructureing is overdue! I started reading this commit by commit. Thanks for thoroughly doing things in a readable manner! Will try to use it in my main branch too very soon and hope to get back with a review at some point

@snejus
Copy link
Member

snejus commented Oct 21, 2025

This looks good. Since this PR does not change the interface to the world, I think, it doesn't require a major version bump. We'd rather save it for the future when we do add breaking changes.

Are you happy to add .git-blame-ignore-revs?

@semohr
Copy link
Contributor Author

semohr commented Oct 21, 2025

I'm not sure the gitblame is able to pickup the moved files but I can add it.

it doesn't require a major version bump

Are we sure about that? Imports changed all over the place and I also renamed some functions. This will break some upstream consumer. Therefore major increase in the version number.

My idea was to to do an 1.0.0alpha now, than with the typehints do an 1.0.0beta (this will most likely also break some things too). Once we are happy with the changes and are sure everything works as expected we do an 1.0.0 or 1.0.0rc1. Standard pip installs will ignore pre-release versions and will still install the latest stable version.

We'd rather save it for the future when we do add breaking changes.

We can always do another major bump on breaking changes.

@snejus

This comment was marked as resolved.

@snejus
Copy link
Member

snejus commented Oct 25, 2025

Imports changed all over the place

Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in mediafile.py. On the other hand, I think we should add all classes that were previously defined in mediafile module to __all__ - I think I've seen some of them being used in the wild.

I also renamed some functions.

Those seem to have been private (prefixed by _), unless you made them private in the previous PR?

@snejus
Copy link
Member

snejus commented Oct 25, 2025

It seems like commits cbc64e3 and 0c6ccd4 should be moved to #87, right? Presumably, the scope of this PR is restructuring of everything under mediafile folder.

@semohr
Copy link
Contributor Author

semohr commented Oct 25, 2025

It seems like commits cbc64e3 and 0c6ccd4 should be moved to #87, right?

I could move them if you want. I would not put them into the uv PR tho. It is just a simple modernization/version bump of the flit packaging setup. And ofcourse I removed tox.

@semohr
Copy link
Contributor Author

semohr commented Oct 25, 2025

Imports changed all over the place

Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in mediafile.py. On the other hand, I think we should add all classes that were previously defined in mediafile module to __all__ - I think I've seen some of them being used in the wild.

I also renamed some functions.

Those seem to have been private (prefixed by _), unless you made them private in the previous PR?

I dont care too much tbh. Thought it might be a good time for the package to approach an 1.0.0 🤷 No hard feelings either way. We can do an 0.14.0 otherwise.
Btw, the semver spec is a surprisingly good read. https://semver.org/spec/v1.0.0.html#how-do-i-know-when-to-release-100. According to the spec, the first major version should mark the point at which we define the public api for future releases.

@semohr
Copy link
Contributor Author

semohr commented Oct 25, 2025

Those seem to have been private (prefixed by _), unless you made them private in the previous PR?

It's indeed the only renaming I have done. We should be fine here 🤞

@semohr
Copy link
Contributor Author

semohr commented Oct 25, 2025

Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in mediafile.py. On the other hand, I think we should add all classes that were previously defined in mediafile module to all - I think I've seen some of them being used in the wild.

You mean all of them 🤣 ? It was only one file beforehand.

@JOJ0
Copy link
Member

JOJ0 commented Oct 26, 2025

Imports changed all over the place

Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in mediafile.py. On the other hand, I think we should add all classes that were previously defined in mediafile module to __all__ - I think I've seen some of them being used in the wild.

I also renamed some functions.

Those seem to have been private (prefixed by _), unless you made them private in the previous PR?

I dont care too much tbh. Thought it might be a good time for the package to approach an 1.0.0 🤷 No hard feelings either way. We can do an 0.14.0 otherwise. Btw, the semver spec is a surprisingly good read. https://semver.org/spec/v1.0.0.html#how-do-i-know-when-to-release-100. According to the spec, the first major version should mark the point at which we define the public api for future releases.

I read this. Thanks for the recommendation. mediafile should have been 1.something a long time ago already IMHO. I think it doesnt really matter what version you give after merging this. I think 1.00-alpha-something is a good idea!

Let's aim for the important part: 1.0.0 should be with new packaging (which then should stay the normal/stable way of packaging aka a stable public api / usage, reading a little bit between the lines this is what the guide tells me!)

Also I think we should merge this one quickly and release it. if it's an alpha, it's ok if something is not ok 100% yet. Even though I think it might be fine already. Am I right that actual changes are:

  • Refactor Exceptions
  • Rename some methods to _something

?

@semohr
Copy link
Contributor Author

semohr commented Oct 26, 2025

Also I think we should merge this one quickly and release it. if it's an alpha, it's ok if something is not ok 100% yet. Even though I think it might be fine already. Am I right that actual changes are:
Refactor Exceptions
Rename some methods to _something

@JOJ0 Yes you are right here. These should be as far as I remember the only changes. To give a bit of context: I started to fix the exceptions and than thought f* it let's restructure all of mediafile. If we want to review this separately I can move it out.

@JOJ0
Copy link
Member

JOJ0 commented Oct 26, 2025

IMHO moving out not necessary. @snejus agrees too?

Copy link
Member

@JOJ0 JOJ0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote vor merging this and bumping to 1.0.0-alpha.1. I would be happy with 0.14.0 too. The goal should be to work on the packaging PR and aim for 1.0.0

Any concerns with merging this already @snejus @henry-oberholtzer?

super().__init__(filename, message)


__all__ = ["UnreadableFileError", "FileTypeError", "MutagenError"]
Copy link
Member

@JOJ0 JOJ0 Oct 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had copilot claude sonnet 4 review if there are any exceptions we are not testing yet (since exceptions are the main thing that actually changed here). The only things it found don't seem to be too important (at least to my understanding, correct me if you think something makes sense to add):

Missing Test Coverage:
The only gaps I found are:

NotImplementedError in MP4BoolStorageStyle: No tests explicitly verify that calling get_list() or set_list() on MP4 boolean fields raises NotImplementedError

ValueError in add_field() method: While the MP4 image format ValueError is tested, there don't appear to be explicit tests for the field validation errors in the add_field() method

@henry-oberholtzer
Copy link
Member

No concerns here, I think it's good to get activity going on Mediafile again, and it'll be easier to fix any issues that do arise with this refactor.

Copy link
Member

@snejus snejus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean all of them 🤣 ? It was only one file beforehand.

@semohr it was indeed! That's why everything that was publicly defined in this file needs to be added to __all__ within mediafile/__init__.py.

$ grep <mediafile.py -E '^(class|def) [^_]'
class UnreadableFileError(Exception):
class FileTypeError(UnreadableFileError):
class MutagenError(UnreadableFileError):
def mutagen_call(action, filename, func, *args, **kwargs):
def loadfile(method=True, writable=False, create=False):
def image_mime_type(data):
def image_extension(data):
class ImageType(enum.Enum):
class Image(object):
class StorageStyle(object):
class ListStorageStyle(StorageStyle):
class SoundCheckStorageStyleMixin(object):
class ASFStorageStyle(ListStorageStyle):
class MP4StorageStyle(StorageStyle):
class MP4TupleStorageStyle(MP4StorageStyle):
class MP4ListStorageStyle(ListStorageStyle, MP4StorageStyle):
class MP4SoundCheckStorageStyle(SoundCheckStorageStyleMixin, MP4StorageStyle):
class MP4BoolStorageStyle(MP4StorageStyle):
class MP4ImageStorageStyle(MP4ListStorageStyle):
class MP3StorageStyle(StorageStyle):
class MP3PeopleStorageStyle(MP3StorageStyle):
class MP3ListStorageStyle(ListStorageStyle, MP3StorageStyle):
class MP3UFIDStorageStyle(MP3StorageStyle):
class MP3DescStorageStyle(MP3StorageStyle):
class MP3ListDescStorageStyle(MP3DescStorageStyle, ListStorageStyle):
class MP3SlashPackStorageStyle(MP3StorageStyle):
class MP3ImageStorageStyle(ListStorageStyle, MP3StorageStyle):
class MP3SoundCheckStorageStyle(SoundCheckStorageStyleMixin, MP3DescStorageStyle):
class ASFImageStorageStyle(ListStorageStyle):
class VorbisImageStorageStyle(ListStorageStyle):
class FlacImageStorageStyle(ListStorageStyle):
class APEv2ImageStorageStyle(ListStorageStyle):
class MediaField(object):
class ListMediaField(MediaField):
class DateField(MediaField):
class DateItemField(MediaField):
class CoverArtField(MediaField):
class QNumberField(MediaField):
class ImageListField(ListMediaField):
class MediaFile(object):

Remember the issue that I fixed in confuse where LazyConfig was missing from __all__ which broke beets?

See this comment as well: #86 (comment)

@semohr
Copy link
Contributor Author

semohr commented Oct 28, 2025

That's the reason why I want to do a proper 1.0.0 release to properly define the public api. See here. I dont think we need to care about this too much if we do a major release.

See this comment too: #86 (comment)

@snejus
Copy link
Member

snejus commented Oct 28, 2025

That's the reason why I want to do a proper 1.0.0 release to properly define the public api. See here. I dont think we need to care about this too much if we do a major release.

I did not realise that you intended to break the public API here 😅. Major version upgrade makes sense in such case.

However, I am not at all convinced that we should do this, given that we simply need to extend __all__ definition to not break it.

A couple of repositories that depend on something that is not currently included in __all__:

$ gh search code --extension=py mediafile.MediaField --json=repository | jq '.[].repository.url' -r | grep -v beets_pr
https://github.com/beetbox/beets
https://github.com/Neurrone/beets-audible
https://github.com/adamjakab/BeetsPluginXtractor
https://github.com/jaeheonshim/vibenet
https://github.com/clinton-hall/nzbToMedia
https://github.com/kergoth/beets-kergoth
https://github.com/SimonTeixidor/beets-plugins
https://github.com/Morikko/beets-multivalue
https://github.com/AlexChalk/dotfiles
https://github.com/mgoltzsche/beets-ytimport
https://github.com/tweitzel/beets-recordingdate
https://github.com/rembo10/headphones
https://github.com/adamjakab/BeetsPluginGoingRunning
https://github.com/Multipixelone/beets-plugins
https://github.com/grantoesterling/beets-rym
https://github.com/bbaserdem/NixOS-Config
https://github.com/EcoG-One/Web-Media-Server-and-Player
https://github.com/beetbox/beets

Not all of them define a strict mediafile = "^1.0..." requirement, and beets is one of them that don't:

mediafile = ">=0.12.0"

@semohr
Copy link
Contributor Author

semohr commented Oct 28, 2025

That's the reason why I want to do a proper 1.0.0 release to properly define the public api. See here. I dont think we need to care about this too much if we do a major release.

I did not realise that you intended to break the public API here 😅. Major version upgrade makes sense in such case.

However, I am not at all convinced that we should do this, given that we simply need to extend __all__ definition to not break it.

A couple of repositories that depend on something that is not currently included in __all__:

$ gh search code --extension=py mediafile.MediaField --json=repository | jq '.[].repository.url' -r | grep -v beets_pr
https://github.com/beetbox/beets
https://github.com/Neurrone/beets-audible
https://github.com/adamjakab/BeetsPluginXtractor
https://github.com/jaeheonshim/vibenet
https://github.com/clinton-hall/nzbToMedia
https://github.com/kergoth/beets-kergoth
https://github.com/SimonTeixidor/beets-plugins
https://github.com/Morikko/beets-multivalue
https://github.com/AlexChalk/dotfiles
https://github.com/mgoltzsche/beets-ytimport
https://github.com/tweitzel/beets-recordingdate
https://github.com/rembo10/headphones
https://github.com/adamjakab/BeetsPluginGoingRunning
https://github.com/Multipixelone/beets-plugins
https://github.com/grantoesterling/beets-rym
https://github.com/bbaserdem/NixOS-Config
https://github.com/EcoG-One/Web-Media-Server-and-Player
https://github.com/beetbox/beets

Not all of them define a strict mediafile = "^1.0..." requirement, and beets is one of them that don't:

mediafile = ">=0.12.0"

This is what you can expect if you don't use lockfiles and do not fix major releases tho. It was planning an alpha release anyways which should not affect these packages in any way. There will probably be quite some more changes if I introduces mypy here too.

Also we already had an __all__ which obviously defines the current "public" api. These exports have not changed so we should be fine on this end too.

@snejus
Copy link
Member

snejus commented Oct 28, 2025

It indeed doesn't break! I guess, confuse broke because it had

# confuse/__init__.py
from core import *

And the star import picked up only the members defined in __all__.

Could you explain why do we have these imports in __all__ and nothing else?

__all__ = [
    "UnreadableFileError",
    "FileTypeError",
    "MediaFile",
    "Image",
    "TYPES",
    "ImageType",
]

@JOJ0
Copy link
Member

JOJ0 commented Oct 29, 2025

I did not realise that you intended to break the public API here 😅. Major version upgrade makes sense in such case.

I think you have a misunderstanding here. The semver guide states "1.0.0 DEFINES" the public API, I think that is what @semohr tries to do here. He didn't mention "breaking"

@semohr
Copy link
Contributor Author

semohr commented Oct 29, 2025

Could you explain why do we have these imports in all and nothing else?

It was like this before. Thought it might be useful to have the types exported too so I added them.

Even tho __all__ practically only defines how an from mediafile export * is handled it also serves as a curated list of names (functions, classes, variables) that constitute the module’s public API. This helps users identify intended entry points and prevents reliance on internal implementation details.

@snejus
Copy link
Member

snejus commented Nov 1, 2025

It was like this before. Thought it might be useful to have the types exported too so I added them.

I didn't realise this! This probably got lost in the big diff, and it showed it as if it was added for the first time 🤦🏼

Well, if that's the case then there's barely any change compared to what we had before. Let's merge this in - feel free to ignore the failing build which will get fixed in the next PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants