Skip to content

Refactor FST algorithms to be exclusively non-mutating instance methods #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 24, 2025

Conversation

michaelpginn
Copy link
Collaborator

@michaelpginn michaelpginn commented Jun 19, 2025

Overview

Refactor (once more) with the following major breaking changes:

  • The majority of FST functionality has been removed from algorithms, which now only contains algorithms such as dijkstra which take an FST and compute some value (and do not return a new FST).
  • All other functionality is now in the FST class again. All methods are instance methods which return a new FST and do not mutate the original FST. See below for rationale.
  • The State and Transition implementation has been extracted to separate modules.

Rationale

PyFoma historically had both mutating and non-mutating methods, which was formalized in the prior refactor. However, this can introduce confusion, particularly for methods like fst1.compose(fst2) where it is unintuitive that fst1 will be mutated.

Mutating methods are already discouraged and possibly being deprecated in comparable frameworks such as Pandas and PyTorch. They can lead to silent logic errors due to unintentional mutations, particularly in environments such as Jupyter notebooks where cells may be run several times.

Furthermore, this refactor removes the dynamically-created methods, simplifying the implementation and making it easier to use things like static type checkers.

The counterargument is that mutation can be desirable in some cases when minimizing memory usage is critical. In these cases, the following pattern is appropriate:

fst.become(fst.invert())

This does not entirely avoid creating copies, but it does avoid keeping these copies in memory. Furthermore, many of the mutating methods previously depended on become anyway, so this approach is equivalent in many cases.

@michaelpginn michaelpginn requested a review from mhulden June 19, 2025 06:52
@michaelpginn michaelpginn linked an issue Jun 19, 2025 that may be closed by this pull request
@michaelpginn michaelpginn added this to the 2.0.0 milestone Jun 19, 2025
@bluebear94
Copy link
Contributor

+1 in terms of readability for relying less on dynamically adding instance methods to FST. I’m a little worried about performance, since I have one FST that takes over 10 minutes to build.

@michaelpginn
Copy link
Collaborator Author

michaelpginn commented Jun 19, 2025

+1 in terms of readability for relying less on dynamically adding instance methods to FST. I’m a little worried about performance, since I have one FST that takes over 10 minutes to build.

@bluebear94 Would you be willing to share an example large FST so I can profile before and after? My sense is that the actual speed shouldn’t be too different, but the memory impact might be of concern.

@bluebear94
Copy link
Contributor

It’s the DeduplicateTypeIOCMedial rule from the 9n-proto repository, which implements deduplication of non-initial Type I oginiþe cfarðerþ in Ŋarâþ Crîþ v9n.

For context, Ŋarâþ Crîþ is my constructed language, with v9n being a work-in-progress revision. Type I oginiþe cfarðerþ involves two identical consonants separated by a vowel (to simplify things greatly), and this rule changes the first consonant to something different. The rule uses a custom rewriting function because the deduplication needs to propagate from end to start, and all the rules need to be applied in parallel. For that reason, I can’t split this rule into smaller ones, so the caching system I have is useless for it.

@mhulden mhulden merged commit 7499af3 into main Jun 24, 2025
1 check passed
@michaelpginn michaelpginn deleted the refactor-3 branch June 24, 2025 17:10
@dhdaines
Copy link
Contributor

dhdaines commented Jul 9, 2025

Thanks for this! It's a big change, but the mutating nature of the methods and the magic used to create them always bothered me.

I also have a rather large FST that I can test, though it is so big that I'm no longer using pyfoma to compile it, but a bridge to rustfst. But I will test this anyway.

(the bridge to rustfst / openfst / at&t could be contributed in the near future...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mutating operators
4 participants