Conversation

@Pasewark
Contributor

MLPs need more parameters for larger block sizes. A transformer can handle different block sizes with the same number of parameters, making it easy to scale to much larger block sizes while keeping a flexible representation. The CompressTransformer applies a regular transformer, with no causal masking, to the token embeddings in a block. It then returns the final embedding of the last token, which serves as a summary of all the tokens in the block. You can choose the number of layers with the num_layers parameter.
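
To make the mechanism concrete, here is a minimal sketch of that idea (not the repo's actual implementation): a small non-causal transformer encoder is run over the embeddings of one block, and the output at the last position is taken as the block summary. The class and parameter names below are illustrative only.

import torch
from torch import nn

class TinyBlockCompressor(nn.Module):
    # illustrative stand-in for the CompressTransformer idea:
    # a non-causal transformer over one block of embeddings, whose
    # last output position summarizes the whole block
    def __init__(self, dim, num_heads, num_layers):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model = dim,
            nhead = num_heads,
            batch_first = True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers = num_layers)

    def forward(self, block_embeds):
        # block_embeds: (batch, block_size, dim)
        out = self.encoder(block_embeds)  # no causal mask - every token attends to every other
        return out[:, -1]                 # (batch, dim) summary of the block

compressor = TinyBlockCompressor(dim = 128, num_heads = 2, num_layers = 2)
summary = compressor(torch.randn(1, 16, 128))  # one block of 16 embeddings -> summary of shape (1, 128)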

You can use it like this:

# import paths are assumed here; adjust them to wherever these classes live in the repo
from native_sparse_attention_pytorch.compress_networks import CompressTransformer
from native_sparse_attention_pytorch.transformer import Transformer

dim_head = 64
compress_block_size = 16
num_kv_heads = 2
mlp_expand_factor = 0.3
vocab_size = 32000
hidden_size = 1280
num_hidden_layers = 22
num_heads = 20
sliding_window_size = 512
fine_block_size = 16
num_fine_selected = 4
overlap_size = 0
compress_num_layers = 2

compress_transformer = CompressTransformer(
    num_layers = compress_num_layers,
    dim = dim_head * num_kv_heads,
    num_heads = num_kv_heads,
)

model = Transformer(
    num_tokens = vocab_size,
    dim = hidden_size,
    depth = num_hidden_layers,
    heads = num_heads,
    dim_head = dim_head,
    kv_heads = num_kv_heads,
    use_sparse_attn = True,
    use_flex_sliding_window = True,
    use_triton_fine_selection = False,
    use_flex_fine_selection = False,
    sparse_attn_kwargs = dict(
        sliding_window_size = sliding_window_size,
        compress_block_size = compress_block_size,
        compress_block_overlap_len = overlap_size,
        compress_mlp = compress_transformer,
        selection_block_size = fine_block_size,
        num_selected_blocks = num_fine_selected,
        use_diff_topk = True,
        interpolated_importance_score = False,
        query_heads_share_selected_kv = True
    ),
)
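
As a quick smoke test, and assuming the example Transformer's forward accepts a batch of token ids and returns per-position logits over the vocabulary (the signature here is an assumption, check the repo's training script), you can run something like:

import torch

ids = torch.randint(0, vocab_size, (1, 2048))  # sequence length chosen to be divisible by the block sizes above
logits = model(ids)  # expected shape: (1, 2048, vocab_size)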

@lucidrains
Owner

@Pasewark thank you Eric! add a single e2e test and we are good for merging! 🙏

@Pasewark
Contributor Author

Okay, I have added a test. Thanks!

@lucidrains lucidrains merged commit cb34259 into lucidrains:main Mar 20, 2025
1 check passed