Support static data tables in ecto.dump mix task #419

Closed · wants to merge 1 commit into from
Conversation

maennchen (Contributor)

In progress, depending on outcome of https://groups.google.com/g/elixir-ecto/c/FtyKyZWJGjI

@josevalim (Member)

Sorry for the delay in replying. Your email got stuck in the group's spam filter and was then flagged as spam in my own inbox later. :/

Can you please send a PR with the flag approach? I think it is less internal bookkeeping for us to take care of.

@maennchen (Contributor Author)

@josevalim Yeah, something about that message is strange. I don't see why it would be rated as spam based on its content, and the sender is a Google account as well...

I've added support for flags in the Mix.Task. However, I did not remove the description of the config value. The reasoning for this is that all config values are passed through automatically:

config = Keyword.merge(repo.config(), opts)

This means that, unless the value is specifically excluded, it will be supported via config.
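
For illustration, a value like this in the repo config would flow straight through (the option name is the one from this PR; the table names are made up):

config :my_app, MyApp.Repo,
  additional_data_tables: ["countries", "currencies"]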

Therefore the only question is whether we should officially document and support it. My thinking was that if it is supported anyway, we had better document it properly.

How would you like me to proceed?

  • Remove the documentation
  • Remove the documentation and exclude the value from being passed through via config
  • Leave it as is

@switches [
  dump_path: :string,
  quiet: :boolean,
  repo: [:string, :keep],
  no_compile: :boolean,
  no_deps_check: :boolean,
  additional_data_tables: [:string, :keep]
]
@maennchen (Contributor Author)

@josevalim Here's the flag

@josevalim (Member)

Oh, sorry, there was a misunderstanding. I didn't mean Mix.Task flags, but rather implementing the feature through generic flags:

config :my_app, MyApp.Repo, dump_flags: ~w(--data-table foo --data-table bar)

The idea is that PG/MySQL do not have to know about this feature; instead we just append the given flags. WDYT?

@maennchen (Contributor Author)

@josevalim That solution would really depend on the implementation of the specific dump commands.

Even with MySQL/Postgres, exporting the whole schema plus the data of only specific tables can only be achieved by calling the command twice (once for the schema, once for the table data).
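
For illustration, with pg_dump that means roughly the following two invocations (the Elixir wrapping, file name, and table name are placeholders; connection and authentication handling is elided):

# Hypothetical example: dump the schema, then append one static table's data.
database = "acme_dev"

# First invocation: schema only, written directly to the file.
{_, 0} = System.cmd("pg_dump", ["--schema-only", "--file", "structure.sql", database])

# Second invocation: data of the static table, streamed onto the end of the same file.
{_, 0} =
  System.cmd("pg_dump", ["--data-only", "--table", "countries", database],
    into: File.stream!("structure.sql", [:append])
  )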

It seems that with SQLite, one would need to call the command once per export. (See https://github.com/elixir-sqlite/ecto_sqlite3/blob/e118505193320388cd2a65c264baf754b5ae056b/lib/ecto/adapters/sqlite3.ex#L458)

If we go with a more generic approach instead of supporting dumping specific tables, I think we would need to provide a better interface than just exposing raw arguments to the underlying dump command.

@josevalim (Member)

I see, thanks! In this case, I don't see a reason to have this as part of Ecto. It is more for us to maintain, and it could instead be implemented with a custom alias that dumps and then adds information to the dump, right?

@maennchen (Contributor Author)

@josevalim A custom alias was my first idea on how to solve this. Unfortunately that means copying a lot of code since all the functions are private.

I believe this to be a helpful and simple addition that is worth its maintenance.

Would you be open to posting about this in the Elixir Forum to see if the community is interested in this feature or if it is just me?

@josevalim (Member)

Why do you need to copy code? Can't you append to the generated file after the initial dump?

@maennchen (Contributor Author)

@josevalim I could certainly append to the file by hand, but then I'd have to do that manually with every dump.

We update the dump regularly to speed up development, since running all the migrations takes quite some time.

If I want to automate the export, I would have to create an alias / mix task and copy most of the dumping stuff from the Postgres adapter into it. (All relevant parts of the adapter are private)

(We have ecto.load inside the ecto.reset alias and normally do an ecto.dump before every release.)
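
For context, the aliases described here could look roughly like this in mix.exs (the alias names and their contents are only an assumption based on the workflow above):

defp aliases do
  [
    # Reset loads the existing dump instead of re-running every migration.
    "ecto.reset": ["ecto.drop", "ecto.create", "ecto.load", "ecto.migrate"],
    # Hypothetical alias run before each release to refresh the dump.
    "release.prepare": ["ecto.dump"]
  ]
end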

@josevalim (Member) commented Jun 29, 2022

Sorry, I don't get the concern with updating the file. The location of the file is public; you can append your contents to it without concerns. Either the task or the alias can call the existing dump task.
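
A minimal sketch of that approach, assuming the dump lives at the default priv/repo/structure.sql location (the appended SQL is a placeholder):

# Run the existing dump task, then append extra contents to the generated file.
Mix.Task.run("ecto.dump", [])
File.write!("priv/repo/structure.sql", "\n-- static data appended here\n", [:append])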

@maennchen (Contributor Author)

@josevalim It is basically the same concern as not supporting the dump command at all:

The dump command solves a few issues like authenticating the pg_dump command etc. automatically. It could also be done by hand, but it is a lot more convenient to use it.
The same applies to this issue: I can either write some Elixir code that does the same as the adapter (reading the config, setting up auth for the command, etc.) or do all of that manually.

I thought that this addition might save some people some time / copied code. This is not that important though, and I don't want to take up more of your time. I'll therefore go ahead and close the PR. I will also link to a Gist later that shows how I solved it, for people who have the same issue.

@maennchen closed this Jun 29, 2022
@josevalim (Member) commented Jun 29, 2022

It is basically the same concern as not supporting the dump command at all:

That's beside the point. :) I agree that having a Mix task is convenient; that's not under debate. The question is how to make it happen.

The concern is that people may want to customize dump further and, whenever that happens, the Ecto team and I will have to be the ones reviewing pull requests (and potentially saying no), so I am much more interested in a general solution to the problem.

So I needed help understanding where the complexity is, rather than focusing on the feature in its current shape. :)

The dump command solves a few issues like authenticating the pg_dump command etc

And this was the part that I was missing!

Looking at the code, we only set three flags: --no-acl, --no-owner, and config[:database]. So the only code you have to duplicate is the call to System.cmd (which is orthogonal) and those three flags. MySQL has a bit more logic around it though.

So with that said, I would rather have those adapters expose an API that calls dump and adds the necessary flags, rather than adding piecemeal functionality around it. Something like Ecto.Adapters.Postgres.dump_cmd(flags, opts).

EDIT: changed the tone to a fairer one.

@josevalim (Member)

Or also Ecto.Adapters.SQL.dump_cmd, if that's a functionality most adapters can provide.

@maennchen (Contributor Author)

@josevalim I haven't thought about that one.

Something along those lines?

@callback run_cmd(config :: Keyword.t(), args :: [String.t()], into :: Collectable.t()) :: {:ok, Collectable.t()} | {:error, term()}

(Collectable.t() does not exist, so I'm not sure about the type...)

This could be used something like this:

defmodule Mix.Tasks.Acme.DumpSql do
  @shortdoc "Dumps the repository database structure & static tables"

  use Mix.Task

  import Mix.Ecto
  import Mix.EctoSQL

  alias Acme.Repo

  @impl Mix.Task
  def run(args) do
    ensure_repo(Repo, args)

    config = Repo.config()

    with {:ok, location} <- Repo.__adapter__().structure_dump(source_repo_priv(Repo), config),
         output_stream = File.stream!(location, [:append]),
         {:ok, _stream} <- Repo.__adapter__().dump_cmd(config, ["--table", "table_name"], output_stream) do
      Mix.shell().info("The structure for #{inspect(Repo)} has been dumped to #{location}")
    else
      {:error, term} when is_binary(term) ->
        Mix.raise("The structure for #{inspect(Repo)} couldn't be dumped: #{term}")

      {:error, term} ->
        Mix.raise("The structure for #{inspect(Repo)} couldn't be dumped: #{inspect(term)}")
    end
  end
end

Another way that comes to mind would be something along the lines of this:

  def structure_dump(default, config) do
    # ...    

    after_dump_callback =
      case config[:migration_after_dump_callback] do
        {module, function} -> Function.capture(module, function, 2)
        nil -> fn _run_cmd_fn, output_stream -> {:ok, output_stream} end
      end

    with {:ok, output_stream} <- dump_structure(config, output_stream),
         {:ok, has_versions?} <- has_versions(migration_table, config),
         {:ok, output_stream} <- dump_versions(config, output_stream),
         {:ok, _output_stream} <- after_dump_callback.(&pg_dump(config, &1, &2), output_stream),
         do: {:ok, path}
  end

With this, one could do the following and keep using the built-in ecto.dump mix task:

# config.exs
config :acme, Acme.Repo, migration_after_dump_callback: {Acme.DumpExtension, :call}

# lib/acme/dump_extension.ex
defmodule Acme.DumpExtension do
  def call(dump_cmd_fn, output_stream) do
    with {:ok, output_stream} <- dump_cmd_fn.(["--table", "table_name"], output_stream), # Dump extra table
         output_stream = Enum.into(["REFRESH MATERIALIZED VIEW bla"], output_stream), # Add something by hand (Enum.into/2 returns the collectable itself)
         do: {:ok, output_stream}
  end
end

@josevalim (Member)

Yes! Perhaps we add Ecto.Adapters.SQL.dump_cmd(repo | config, args, opts) instead? So you don't need to directly call the adapter.

I am just not sure if the third argument should be opts or into. I am thinking about proxying all of System.cmd options instead. Why did you pick into? :)

@josevalim (Member)

I think we don't need the migration_after_dump_callback callback. You can override "ecto.dump" as an alias and Ecto should pick it up.

@maennchen (Contributor Author)

The reason I proposed the into argument (which we could default to an empty string) is so that we do not have to keep large amounts of data in memory.

The output of a dump command could potentially be quite large.

If we allow supplying opts directly to the cmd call, that should work as well, since into can be specified there.
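
A small sketch of how proxying the System.cmd options would still cover the streaming case (dump_cmd is the API being proposed here, not something that exists yet; the repo, flags, and path are placeholders):

# Stream the command output straight to disk instead of keeping it in memory.
Ecto.Adapters.SQL.dump_cmd(MyApp.Repo, ["--data-only", "--table", "countries"],
  into: File.stream!("priv/repo/structure.sql", [:append])
)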

So the callback should be this?

@callback dump_cmd(repo_or_config :: module() | Keyword.t(), args :: String.t(), opts :: Keyword.t()) :: {Collectable.t(), exit_status :: non_neg_integer()}

@josevalim (Member)

Change args to [String.t()] and I think we are solid!
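
With that change, the proposed spec would read:

@callback dump_cmd(repo_or_config :: module() | Keyword.t(), args :: [String.t()], opts :: Keyword.t()) ::
            {Collectable.t(), exit_status :: non_neg_integer()}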
