Support static data tables in ecto.dump mix task #419

Closed · wants to merge 1 commit into from
Conversation

maennchen (Contributor)

In progress, depending on outcome of https://groups.google.com/g/elixir-ecto/c/FtyKyZWJGjI

@josevalim (Member)

Sorry for the delay in replying. Your email got stuck in the group's spam filter and was then flagged as spam in my own inbox later. :/

Can you please send a PR with the flag approach? I think it is less internal bookkeeping for us to take care of.

@maennchen (Contributor Author)

@josevalim Yeah, something about that message is strange. I don't see why it would be rated as spam based on its content, and the sender is a Google account as well...

I've added support for flags in the Mix.Task. However, I did not remove the description of the config value. The reasoning for this is that all config values are passed through automatically:

config = Keyword.merge(repo.config(), opts)

This means that, unless the value is specifically excluded, it will be supported via config.
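
For illustration, a value like this in the repo config would flow straight through (the option name is the one from this PR; the table names are made up):

config :my_app, MyApp.Repo,
  additional_data_tables: ["countries", "currencies"]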

Therefore the only question is whether we should officially document and support it. My thinking was that if it is supported anyway, we had better document it properly.

How would you like me to proceed?

  • Remove the documentation
  • Remove the documentation and exclude the value from being passed through via config
  • Leave it as is

@switches [
  dump_path: :string,
  quiet: :boolean,
  repo: [:string, :keep],
  no_compile: :boolean,
  no_deps_check: :boolean,
  additional_data_tables: [:string, :keep]
]
@maennchen (Contributor Author)

@josevalim Here's the flag

@josevalim (Member)

Oh, sorry, there was a misunderstanding. I didn't mean Mix.Task flags, but rather implementing the feature through generic flags:

config :my_app, MyApp.Repo, dump_flags: ~w(--data-table foo --data-table bar)

The idea is that PG/MySQL do not have to know about this feature; instead we just append the given flags. WDYT?

@maennchen (Contributor Author)

@josevalim That solution would really depend on the implementation of the specific dump commands.

Even with MySQL/Postgres, exporting the whole schema plus the data of only specific tables can only be achieved by calling the command twice (once for the schema, once for the table data).
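
For illustration, with pg_dump that means roughly the following two invocations (the Elixir wrapping, file name, and table name are placeholders; connection and authentication handling is elided):

# Hypothetical example: dump the schema, then append one static table's data.
database = "acme_dev"

# First invocation: schema only, written directly to the file.
{_, 0} = System.cmd("pg_dump", ["--schema-only", "--file", "structure.sql", database])

# Second invocation: data of the static table, streamed onto the end of the same file.
{_, 0} =
  System.cmd("pg_dump", ["--data-only", "--table", "countries", database],
    into: File.stream!("structure.sql", [:append])
  )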

It seems that with SQLite, one would need to call the command once per export. (See https://github.com/elixir-sqlite/ecto_sqlite3/blob/e118505193320388cd2a65c264baf754b5ae056b/lib/ecto/adapters/sqlite3.ex#L458)

If we go with a more generic approach instead of supporting dumping specific tables, I think we would need to provide a better interface than just exposing raw arguments to the underlying dump command.

@josevalim (Member)

I see, thanks! In this case, I don't see a reason to have this as part of Ecto. It is more for us to maintain, and it could instead be implemented with a custom alias that dumps and then adds information to the dump, right?

@maennchen (Contributor Author)

@josevalim A custom alias was my first idea on how to solve this. Unfortunately that means copying a lot of code since all the functions are private.

I believe this to be a helpful and simple addition that is worth its maintenance.

Would you be open to posting about this in the Elixir Forum to see if the community is interested in this feature or if it is just me?

@josevalim (Member)

Why do you need to copy code? Can't you append to the generated file after the initial dump?

@maennchen (Contributor Author)

@josevalim I could certainly append to the file by hand, but then I'd have to do that manually with every dump.

We update the dump regularly to speed up development, since running all the migrations takes quite some time.

If I want to automate the export, I would have to create an alias / mix task and copy most of the dumping stuff from the Postgres adapter into it. (All relevant parts of the adapter are private)

(We have ecto.load inside the ecto.reset alias and normally do an ecto.dump before every release.)
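
For context, the aliases described here could look roughly like this in mix.exs (the alias names and their contents are only an assumption based on the workflow above):

defp aliases do
  [
    # Reset loads the existing dump instead of re-running every migration.
    "ecto.reset": ["ecto.drop", "ecto.create", "ecto.load", "ecto.migrate"],
    # Hypothetical alias run before each release to refresh the dump.
    "release.prepare": ["ecto.dump"]
  ]
end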

@josevalim (Member) commented Jun 29, 2022

Sorry, I don't get the concern with updating the file. The location of the file is public; you can append your contents to it without concerns. Either the task or the alias can call the existing dump task.
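
A minimal sketch of that approach, assuming the dump lives at the default priv/repo/structure.sql location (the appended SQL is a placeholder):

# Run the existing dump task, then append extra contents to the generated file.
Mix.Task.run("ecto.dump", [])
File.write!("priv/repo/structure.sql", "\n-- static data appended here\n", [:append])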

@maennchen (Contributor Author)

@josevalim It is basically the same concern as not supporting the dump command at all:

The dump command solves a few issues like authenticating the pg_dump command etc. automatically. It could also be done by hand, but it is a lot more convenient to use it.
The same applies to this issue: I can either write some Elixir code that does the same as the adapter (reading the config, setting up auth for the command, etc.) or do all of that manually.

I thought that this addition might save some people some time / copied code. This is not that important though, and I don't want to take up more of your time. I'll therefore go ahead and close the PR. I will also link to a Gist later that shows how I solved it, for people who have the same issue.

@maennchen closed this Jun 29, 2022
@josevalim (Member) commented Jun 29, 2022

It is basically the same concern as not supporting the dump command at all:

That's beside the point. :) I agree that having a Mix task is convenient; that's not under debate. The question is how to make it happen.

The concern is that people may want to customize dump further and, whenever that happens, the Ecto team and I will have to be the ones reviewing pull requests (and potentially saying no), so I am much more interested in a general solution to the problem.

So I needed help understanding where the complexity is, rather than focusing on the feature in its current shape. :)

The dump command solves a few issues like authenticating the pg_dump command etc

And this was the part that I was missing!

Looking at the code, we only set three flags: --no-acl, --no-owner, and config[:database]. So the only code you have to duplicate is the call to System.cmd (which is orthogonal) and those three flags. MySQL has a bit more logic around it though.

So with that said, I would rather have those adapters expose an API that calls dump and adds the necessary flags, rather than adding piecemeal functionality around it. Something like Ecto.Adapters.Postgres.dump_cmd(flags, opts).

EDIT: changed the tone to a fairer one.

@josevalim (Member)

Or also Ecto.Adapters.SQL.dump_cmd, if that's a functionality most adapters can provide.

@maennchen (Contributor Author)

@josevalim I haven't thought about that one.

Something along those lines?

@callback run_cmd(config :: Keyword.t(), args :: [String.t()], into :: Collectable.t()) :: {:ok, Collectable.t()} | {:error, term()}

(Collectable.t() does not exist, so I'm not sure about the type...)

This could be used something like this:

defmodule Mix.Tasks.Acme.DumpSql do
  @shortdoc "Dumps the repository database structure & static tables"

  use Mix.Task

  import Mix.Ecto
  import Mix.EctoSQL

  alias Acme.Repo

  @impl Mix.Task
  def run(args) do
    ensure_repo(Repo, args)

    config = Repo.config()

    with {:ok, location} <- Repo.__adapter__().structure_dump(source_repo_priv(Repo), config),
         output_stream = File.stream!(location, [:append]),
         {:ok, _stream} <- Repo.__adapter__().dump_cmd(config, ["--table", "table_name"], output_stream) do
      Mix.shell().info("The structure for #{inspect(Repo)} has been dumped to #{location}")
    else
      {:error, term} when is_binary(term) ->
        Mix.raise("The structure for #{inspect(Repo)} couldn't be dumped: #{term}")

      {:error, term} ->
        Mix.raise("The structure for #{inspect(Repo)} couldn't be dumped: #{inspect(term)}")
    end
  end
end

Another way that comes to mind would be something along the lines of this:

  def structure_dump(default, config) do
    # ...    

    after_dump_callback =
      case config[:migration_after_dump_callback] do
        {module, function} -> Function.capture(module, function, 2)
        nil -> fn _run_cmd_fn, output_stream -> {:ok, output_stream} end
      end

    with {:ok, output_stream} <- dump_structure(config, output_stream),
         {:ok, has_versions?} <- has_versions(migration_table, config),
         {:ok, output_stream} <- dump_versions(config, output_stream),
         {:ok, _output_stream} <- after_dump_callback.(&pg_dump(config, &1, &2), output_stream),
         do: {:ok, path}
  end

With this, one could do the following and keep using the built-in ecto.dump mix task:

# config.exs
config :acme, Acme.Repo, migration_after_dump_callback: {Acme.DumpExtension, :call}

# lib/acme/dump_extension.ex
defmodule Acme.DumpExtension do
  def call(dump_cmd_fn, output_stream) do
    with {:ok, output_stream} <- dump_cmd_fn.(["--table", "table_name"], output_stream), # Dump extra table
         output_stream = Enum.into(["REFRESH MATERIALIZED VIEW bla"], output_stream), # Add something by hand (Enum.into/2 returns the collectable itself)
         do: {:ok, output_stream}
  end
end

@josevalim (Member)

Yes! Perhaps we add Ecto.Adapters.SQL.dump_cmd(repo | config, args, opts) instead? So you don't need to directly call the adapter.

I am just not sure if the third argument should be opts or into. I am thinking about proxying all of System.cmd options instead. Why did you pick into? :)

@josevalim (Member)

I think we don't need the migration_after_dump_callback callback. You can override "ecto.dump" as an alias and Ecto should pick it up.

@maennchen (Contributor Author)

The reason I proposed the into argument (which we could default to an empty string) is so that we do not have to keep large amounts of data in memory.

The output of a dump command could potentially be quite large.

If we allow supplying opts directly to the cmd call, that should work as well, since into can be specified there.
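
A small sketch of how proxying the System.cmd options would still cover the streaming case (dump_cmd is the API being proposed here, not something that exists yet; the repo, flags, and path are placeholders):

# Stream the command output straight to disk instead of keeping it in memory.
Ecto.Adapters.SQL.dump_cmd(MyApp.Repo, ["--data-only", "--table", "countries"],
  into: File.stream!("priv/repo/structure.sql", [:append])
)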

So the callback should be this?

@callback dump_cmd(repo_or_config :: module() | Keyword.t(), args :: String.t(), opts :: Keyword.t()) :: {Collectable.t(), exit_status :: non_neg_integer()}

@josevalim (Member)

Change args to [String.t()] and I think we are solid!
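
With that change, the proposed spec would read:

@callback dump_cmd(repo_or_config :: module() | Keyword.t(), args :: [String.t()], opts :: Keyword.t()) ::
            {Collectable.t(), exit_status :: non_neg_integer()}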
