Skip to content

integration of host system information script into Nominatim CLI tool #2739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 13, 2022

Conversation

tareqpi
Copy link
Contributor

@tareqpi tareqpi commented Jun 7, 2022

  • Refactored and included additional host system information into collect_os_info.py
  • Integrated collect_os_info.py into Nominatim's CLI tool
  • Added a test for the new feature in the CI flow

This pull request resolves #2552 and #307

Copy link
Member

@lonvia lonvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added some minor comments. Looks okay in general. pylint has some more complaints, see the CI. You may need to install the latest version of pylint from pip to get the same reports locally.

The file collect_so_info.sh can be deleted, I think?

Could you please add a call to your new command in the CI run, around here. Simply duplicate what is done for the version call. That way, we can see in the CI directly what the output looks like.

@@ -58,16 +61,23 @@ def run(args):
LOG.warning('Analysing performance of indexing function')
from ..tools import admin
with connect(args.config.get_libpq_dsn()) as conn:
admin.analyse_indexing(conn, osm_id=args.osm_id, place_id=args.place_id)
admin.analyse_indexing(
conn, osm_id=args.osm_id, place_id=args.place_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not reformat unrelated code.

message = """
Use this information in your issue report at https://github.com/osm-search/Nominatim/issues
Copy and paste or redirect the output of the file:
$ ./collect_os_info.py > report.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs adaption to the new command.

self.os_info: str = self._os_name_info()

self.nominatim_ver: str = '{0[0]}.{0[1]}.{0[2]}-{0[3]}'.format(
NOMINATIM_VERSION)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nominatim.version now has a convenience function version_str().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I globally searched this function and couldn't find it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this one here:

def version_str(version=NOMINATIM_VERSION):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have synced up the branch to the master branch because the branch was not updated for a long time and then found that function.

"""Generate a report about the host system including software versions, memory,
storage, and database configuration."""

with connect(config.get_libpq_dsn()) as conn:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI test highlights a problem here: this will only work when a Nominatim database was already imported. To avoid that issue, try to connect against database 'postgres' instead. That will always exist.

In general, I would prefer that we catch any exception raised during connect and report something like 'Unable to connect to database' where the version string would be reported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, to connect to the postgres database you can simply use connect("dbname=postgres")

with conn.cursor() as cur:
nominatim_db_exists = cur.execute("""
SELECT datname FROM pg_catalog.pg_database
WHERE datname='nominatim'""")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot really make the assumption that the database is always called 'nominatim'. Users can use arbitrary names in their configuration.

That said, we cannot even make the assumption that they are using a local database. So I was wrong when recommending connect("dbname=postgres"). We need to use any other connection parameters from the user-supplied dsn. You can use make_dsn() to replace the database name in the DSN that you get from config.get_libpq_dsn():

connect(psycopg2.make_dsn(config.get_libpq_dsn(), dbname='postgres'))

You can also use parse_dsn to extract the database name from the DSN and then use the SELECT to find out if the database exists.

@@ -0,0 +1,165 @@
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a license header here (simply copy from other .py files).

@tareqpi
Copy link
Contributor Author

tareqpi commented Aug 6, 2022

Some of the third-party libraries' functions that are introduced in this pull request lack the correct type hints which is why mypy produces errors. I have raised an issue to them and made the necessary pull requests to resolve it as well.

@lonvia
Copy link
Member

lonvia commented Aug 7, 2022

I've usually marked the calls with # type: ignore[no-untyped-call], see for example

LOG.debug(self.mogrify(query, args).decode('utf-8')) # type: ignore[no-untyped-call]

But if you contribute the necessary typing info to typeshed, even better.

@lonvia
Copy link
Member

lonvia commented Aug 8, 2022

With your typeshed PR merged (jay!), there was a new mypy error for a now unneded ignore[no-untyped-call]. I've already fixed that one master. Please rebase.

Copy link
Member

@lonvia lonvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please rebase this once more on master. #2792 already had the adaptions for the type annotations. Also consider squashing your commits to get rid of the intermediate steps. If you do that, only squash your commits, not the ones from micahcochran.


try:
with open("report.md", "w", encoding='utf8') as output:
output.write(report)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if it just writes to stdout, i.e. uses print. Users can always redirect to a file if they want it in a file. If left like this we'd have to add a parameter to state the output filename and that makes things more complicated.

@tareqpi tareqpi force-pushed the collect_os_info.sh branch 5 times, most recently from aa71002 to 72ac916 Compare August 12, 2022 20:50
@tareqpi tareqpi force-pushed the collect_os_info.sh branch from 6d60306 to 5e477e3 Compare August 12, 2022 21:55
Changed the script to functional programming paradigm to remove the big number of local attributes to decrease memory usage when running it. Additional OS info are now included.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants