Use first bad_words as extra parameters, and implement min-p #1536

Closed
pathorn wants to merge 2 commits from the minp_via_badwords_apr30 branch

Conversation

@pathorn (Contributor) commented May 2, 2024

An approach for implementing #1154

The user-facing classes in BatchManager and tensorrt_llm::executor::SamplingConfig are not open source (the constructor implementations live in .a files, and modifying the class layout causes a segfault), so we work around this by repurposing the integers in the first bad_words entry as extra parameters.

In this case, the first integer, reinterpreted as a float, represents min_p (the default value of 0.0 matches the zero padding in bad_words).
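
As a standalone illustration of that bit-level trick (not code from this PR): the float's IEEE-754 bits ride along in an int32 slot, and the default 0.0 packs to integer 0, which is indistinguishable from the tensor's zero padding.

    import struct

    def pack_min_p(min_p: float) -> int:
        # Reinterpret the float's IEEE-754 bits as a signed 32-bit integer.
        packed, = struct.unpack('i', struct.pack('f', min_p))
        return packed

    def unpack_min_p(packed: int) -> float:
        # Inverse: recover the float32 value from its bit pattern.
        value, = struct.unpack('f', struct.pack('i', packed))
        return value

    assert pack_min_p(0.0) == 0  # the default is indistinguishable from zero padding
    assert abs(unpack_min_p(pack_min_p(0.1)) - 0.1) < 1e-7  # round-trips up to float32 precision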

I implemented min-p by piggybacking on the existing logprobs calculation in CUDA, so it should add no performance overhead beyond the logprobs calculation itself.
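
For context, min-p keeps only tokens whose probability is at least min_p times the probability of the most likely token. The PR fuses this into the CUDA logprobs kernel; the NumPy sketch below only illustrates the rule itself and is not the PR's implementation.

    import numpy as np

    def min_p_filter(logits: np.ndarray, min_p: float) -> np.ndarray:
        """Mask out (set to -inf) tokens whose probability falls below min_p * max_prob."""
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        threshold = min_p * probs.max()
        return np.where(probs >= threshold, logits, -np.inf)

    # With min_p = 0.1, any token whose probability is under 10% of the
    # top token's probability is excluded from sampling.
    logits = np.array([2.0, 1.0, -3.0])
    print(min_p_filter(logits, min_p=0.1))  # [2., 1., -inf]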

@pathorn (Contributor, Author) commented May 2, 2024

Here is some example BLS code for adding the min_p value into the bad_words list in the way this PR expects:

            # Assumes the usual Triton BLS imports at module level:
            #   import struct
            #   import numpy as np
            #   from pprint import pprint
            #   import triton_python_backend_utils as pb_utils
            numpy_tensor = preproc_output_tensor.as_numpy()
            if trtllm_tensor_name == "bad_words_list":
                # bad_words_list carries a row of word-token data and a row of
                # offsets, padded to the same length.
                bad_words_data, bad_words_offsets = numpy_tensor[0]
                # Debug output: temporarily print the full tensors.
                opt = np.get_printoptions()
                np.set_printoptions(threshold=np.inf)
                pprint(numpy_tensor)
                min_p = 0.0
                if "min_p" in bls_input_tensors_map:
                    minptensor = bls_input_tensors_map["min_p"].as_numpy()
                    pprint(minptensor)
                    min_p = float(minptensor[0, 0])
                # Reinterpret the float's bits as an int32 so it can ride
                # along in the integer bad_words tensor.
                min_p_int, = struct.unpack('i', struct.pack('f', min_p))
                extra_data = np.array([min_p_int], dtype=np.int32)
                if bad_words_offsets[0] == -1:
                    # Special case: if no bad_words are passed, numpy_tensor will be [[[0], [-1]]].
                    # In this case, we don't want to prepend [0] because that would add a
                    # bad-word offset where there otherwise was none.
                    bad_words_data = extra_data
                    bad_words_offsets = np.array([-1], dtype=np.int32)
                else:
                    # Prepend the min_p word.
                    bad_words_data = np.concatenate((extra_data, bad_words_data), axis=0)
                    # The offsets array is padded with -1, so we first add one to make the
                    # padding all zeros, then trim_zeros and subtract one.
                    bad_words_offsets = np.trim_zeros(bad_words_offsets + 1) - 1
                    # Then, we prepend an extra 0 element to account for the extra bad_word being added.
                    bad_words_offsets = np.concatenate((np.array([0], dtype=np.int32), bad_words_offsets), axis=0)
                    # Then, we shift the offsets by the length of the newly added data.
                    bad_words_offsets = bad_words_offsets + len(extra_data)
                    # Finally, we pad the offsets with -1 to match the length of bad_words_data:
                    bad_words_offsets = np.concatenate(
                        (bad_words_offsets,
                         np.array([-1] * (len(bad_words_data) - len(bad_words_offsets)), dtype=np.int32)),
                        axis=0)
                numpy_tensor = np.array([[bad_words_data, bad_words_offsets]], dtype=np.int32)
                print("Final:")
                pprint(numpy_tensor)
                np.set_printoptions(**opt)

            trtllm_input_tensors.append(
                pb_utils.Tensor(trtllm_tensor_name,
                                numpy_tensor))
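
For completeness, here is a hypothetical decode-side sketch (illustrative only; the PR's actual consumer is the C++ batch manager) showing how min_p is recovered from the tensor built above:

    import struct
    import numpy as np

    def extract_min_p(bad_words_tensor: np.ndarray) -> float:
        # The first integer of bad_words_data carries the float bits of min_p.
        bad_words_data, _bad_words_offsets = bad_words_tensor[0]
        min_p, = struct.unpack('f', struct.pack('i', int(bad_words_data[0])))
        return min_p

    # Round trip for the "no real bad words" case, where the tensor is [[[bits], [-1]]]:
    min_p_bits, = struct.unpack('i', struct.pack('f', 0.05))
    tensor = np.array([[[min_p_bits], [-1]]], dtype=np.int32)
    print(extract_min_p(tensor))  # ~0.05 (float32 precision)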

@pathorn force-pushed the minp_via_badwords_apr30 branch from 7e1acc9 to 3731f5b on May 3, 2024
@juney-nvidia (Collaborator) commented

@pathorn

Hi Pathorn

Thanks for your interest in submitting this MR to TRT-LLM.

The current process for merging a community MR into TRT-LLM is:

  • After the contributor finishes the implementation and it passes local tests, TRT-LLM engineers will review the MR and provide feedback; several iterations of code refinement/discussion are usually necessary :)
  • Once the MR is ready to land, a TRT-LLM engineer will cherry-pick it into our internal git repo.
  • Later, when the new TRT-LLM version is pushed to GitHub, we will acknowledge the contributor by name in the announcement notes.

Please let me know whether the above process makes sense to you.
Thanks

June

@pathorn force-pushed the minp_via_badwords_apr30 branch from 3731f5b to 3d7d658 on May 29, 2024
@pathorn force-pushed the minp_via_badwords_apr30 branch from 3d7d658 to 0481a36 on June 5, 2024
@pathorn force-pushed the minp_via_badwords_apr30 branch from 0481a36 to f4c8e1c on June 25, 2024
@pathorn force-pushed the minp_via_badwords_apr30 branch from f4c8e1c to a248ba1 on September 25, 2024
@pathorn force-pushed the minp_via_badwords_apr30 branch 2 times, most recently from d1f2263 to 55db1b6 on October 9, 2024
@pathorn force-pushed the minp_via_badwords_apr30 branch from 55db1b6 to 6b54714 on October 16, 2024
@DanBlanaru mentioned this pull request on Feb 6, 2025
@pathorn closed this on Feb 7, 2025