[ctx_profile] Profile reader and writer #91859

Merged
merged 5 commits into from
May 15, 2024
92 changes: 92 additions & 0 deletions llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -0,0 +1,92 @@
//===--- PGOCtxProfReader.h - Contextual profile reader ---------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
///
/// \file
///
/// Reader for contextual iFDO profile, which comes in bitstream format.
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_PROFILEDATA_CTXINSTRPROFILEREADER_H
#define LLVM_PROFILEDATA_CTXINSTRPROFILEREADER_H

#include "llvm/ADT/DenseSet.h"
#include "llvm/Bitstream/BitstreamReader.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/ProfileData/PGOCtxProfWriter.h"
#include "llvm/Support/Error.h"
#include <map>
#include <vector>

namespace llvm {
/// The loaded contextual profile, suitable for mutation during IPO passes. We
/// generally expect a fraction of counters and of callsites to be populated.
/// We continue to model counters as vectors, but callsites are modeled as a map
/// of a map. The expectation is that, typically, there is a small number of
/// indirect targets (usually, 1 for direct calls); but potentially a large
/// number of callsites, and, as inlining progresses, the callsite count of a
/// caller will grow.
class PGOContextualProfile final {
public:
using CallTargetMapTy = std::map<GlobalValue::GUID, PGOContextualProfile>;
using CallsiteMapTy = DenseMap<uint32_t, CallTargetMapTy>;

private:
friend class PGOCtxProfileReader;
GlobalValue::GUID GUID = 0;
SmallVector<uint64_t, 16> Counters;
CallsiteMapTy Callsites;

PGOContextualProfile(GlobalValue::GUID G,
SmallVectorImpl<uint64_t> &&Counters)
: GUID(G), Counters(std::move(Counters)) {}

Expected<PGOContextualProfile &>
getOrEmplace(uint32_t Index, GlobalValue::GUID G,
SmallVectorImpl<uint64_t> &&Counters);

public:
PGOContextualProfile(const PGOContextualProfile &) = delete;
PGOContextualProfile &operator=(const PGOContextualProfile &) = delete;
PGOContextualProfile(PGOContextualProfile &&) = default;
PGOContextualProfile &operator=(PGOContextualProfile &&) = default;

GlobalValue::GUID guid() const { return GUID; }
const SmallVectorImpl<uint64_t> &counters() const { return Counters; }
const CallsiteMapTy &callsites() const { return Callsites; }
CallsiteMapTy &callsites() { return Callsites; }

bool hasCallsite(uint32_t I) const {
return Callsites.find(I) != Callsites.end();
}

const CallTargetMapTy &callsite(uint32_t I) const {
assert(hasCallsite(I) && "Callsite not found");
return Callsites.find(I)->second;
}
void getContainedGuids(DenseSet<GlobalValue::GUID> &Guids) const;
};

class PGOCtxProfileReader final {
BitstreamCursor &Cursor;
Expected<BitstreamEntry> advance();
Error readMetadata();
Error wrongValue(const Twine &);
Error unsupported(const Twine &);

Expected<std::pair<std::optional<uint32_t>, PGOContextualProfile>>
readContext(bool ExpectIndex);
bool canReadContext();

public:
PGOCtxProfileReader(BitstreamCursor &Cursor) : Cursor(Cursor) {}

Expected<std::map<GlobalValue::GUID, PGOContextualProfile>> loadContexts();
};
} // namespace llvm
#endif
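
The shape declared in this header (counters as a vector, callsites as a map from callsite index to a map from callee GUID to sub-context) can be illustrated with standard containers only. This is a minimal sketch, not the PR's code: `ToyContext`, `addCallee`, and `collectGuids` are hypothetical names, `std::map` and `std::set` stand in for `DenseMap` and `DenseSet`, and a null pointer stands in for the `Expected` error path of `getOrEmplace`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for PGOContextualProfile, built on std containers.
struct ToyContext {
  uint64_t Guid = 0;
  std::vector<uint64_t> Counters;
  // callsite index -> (callee GUID -> sub-context)
  std::map<uint32_t, std::map<uint64_t, ToyContext>> Callsites;

  ToyContext(uint64_t G, std::vector<uint64_t> C)
      : Guid(G), Counters(std::move(C)) {}

  // Same idea as getOrEmplace: a callee GUID may appear only once per
  // callsite. Returns nullptr on a duplicate (assumption: the real code
  // returns an llvm::Expected carrying an error instead).
  ToyContext *addCallee(uint32_t Index, uint64_t G, std::vector<uint64_t> C) {
    auto [Iter, Inserted] =
        Callsites[Index].emplace(G, ToyContext(G, std::move(C)));
    return Inserted ? &Iter->second : nullptr;
  }

  // Same idea as getContainedGuids: collect this context's GUID and,
  // recursively, the GUIDs of every sub-context.
  void collectGuids(std::set<uint64_t> &Guids) const {
    Guids.insert(Guid);
    for (const auto &[Index, Targets] : Callsites)
      for (const auto &[CalleeGuid, Callee] : Targets)
        Callee.collectGuids(Guids);
  }
};
```

A direct call would populate one entry in the inner map; an indirect call with several observed targets would populate several entries under the same callsite index.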
91 changes: 91 additions & 0 deletions llvm/include/llvm/ProfileData/PGOCtxProfWriter.h
@@ -0,0 +1,91 @@
//===- PGOCtxProfWriter.h - Contextual Profile Writer -----------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file declares a utility for writing a contextual profile to bitstream.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_PROFILEDATA_PGOCTXPROFWRITER_H_
#define LLVM_PROFILEDATA_PGOCTXPROFWRITER_H_

#include "llvm/Bitstream/BitstreamWriter.h"
#include "llvm/ProfileData/CtxInstrContextNode.h"

namespace llvm {
enum PGOCtxProfileRecords { Invalid = 0, Version, Guid, CalleeIndex, Counters };

enum PGOCtxProfileBlockIDs {
ProfileMetadataBlockID = 100,
ContextNodeBlockID = ProfileMetadataBlockID + 1
};

/// Write one or more ContextNodes to the provided raw_fd_stream.
/// The caller must destroy the PGOCtxProfileWriter object before closing the
/// stream.
/// The design allows serializing a bunch of contexts embedded in some other
/// file. The overall format is:
///
/// [... other data written to the stream...]
/// SubBlock(ProfileMetadataBlockID)
/// Version
/// SubBlock(ContextNodeBlockID)
/// [RECORDS]
/// SubBlock(ContextNodeBlockID)
/// [RECORDS]
/// [... more SubBlocks]
/// EndBlock
/// EndBlock
///
/// The "RECORDS" are bitstream records. The record IDs are in
/// PGOCtxProfileRecords (Version is used only for metadata). All contexts will
/// have Guid and Counters, and all but the roots have CalleeIndex. The order in
/// which the records appear does not matter, but they must precede any
/// subcontexts, because that helps keep the reader code simpler.
///
/// Subblock containment captures the context->subcontext relationship. The
/// "next()" relationship in the raw profile, between call targets of indirect
/// calls, is just modeled as peer subblocks where the callee index is the
/// same.
///
/// Versioning: the writer may produce additional records not known by the
/// reader. The version number indicates a more structural change.
/// The current version, in particular, is set up to expect optional extensions
/// like value profiling - which would appear as additional records. For
/// example, value profiling would produce a new record with a new record ID,
/// containing the profiled values (much like the counters).
class PGOCtxProfileWriter final {
SmallVector<char, 1 << 20> Buff;
Contributor

Why not use a llvm::MemoryBuffer?

Member Author

How? The buffer is passed to the BitstreamWriter constructor, and I don't see it taking a MemoryBuffer. What am I missing?

Contributor

My mistake, I didn't look hard enough and assumed that the usage was concerned with just a buffer of char with a particular size.

BitstreamWriter Writer;

void writeCounters(const ctx_profile::ContextNode &Node);
void writeImpl(std::optional<uint32_t> CallerIndex,
const ctx_profile::ContextNode &Node);

public:
PGOCtxProfileWriter(raw_fd_stream &Out,
std::optional<unsigned> VersionOverride = std::nullopt)
: Writer(Buff, &Out, 0) {
Writer.EnterSubblock(PGOCtxProfileBlockIDs::ProfileMetadataBlockID,
CodeLen);
const auto Version = VersionOverride ? *VersionOverride : CurrentVersion;
Writer.EmitRecord(PGOCtxProfileRecords::Version,
SmallVector<unsigned, 1>({Version}));
}

~PGOCtxProfileWriter() { Writer.ExitBlock(); }

void write(const ctx_profile::ContextNode &);

// Constants used in writing which a reader may find useful.
static constexpr unsigned CodeLen = 2;
static constexpr uint32_t CurrentVersion = 1;
static constexpr unsigned VBREncodingBits = 6;
};

} // namespace llvm
#endif
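
The block layout described in the writer's header comment can be made concrete with a toy renderer that prints the nesting for a small context tree. This is only a sketch of the layout, assuming nothing about `BitstreamWriter` itself: `ToyNode`, `renderNode`, and `renderProfile` are hypothetical names, and the output is indented text rather than a bitstream.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory node; not ctx_profile::ContextNode.
struct ToyNode {
  uint64_t Guid;
  std::optional<uint32_t> CalleeIndex; // absent for roots
  std::vector<uint64_t> Counters;
  std::vector<ToyNode> Subcontexts; // peers may share a CalleeIndex
};

// Emit the records first, then the subcontexts, then close the block.
void renderNode(const ToyNode &N, int Depth, std::string &Out) {
  const std::string Pad(2 * Depth, ' ');
  Out += Pad + "SubBlock(ContextNodeBlockID)\n";
  Out += Pad + "  Guid " + std::to_string(N.Guid) + "\n";
  if (N.CalleeIndex)
    Out += Pad + "  CalleeIndex " + std::to_string(*N.CalleeIndex) + "\n";
  Out += Pad + "  Counters x" + std::to_string(N.Counters.size()) + "\n";
  for (const ToyNode &Sub : N.Subcontexts)
    renderNode(Sub, Depth + 1, Out);
  Out += Pad + "EndBlock\n";
}

// The metadata block wraps the Version record and every root context.
std::string renderProfile(const std::vector<ToyNode> &Roots,
                          unsigned Version) {
  std::string Out = "SubBlock(ProfileMetadataBlockID)\n";
  Out += "  Version " + std::to_string(Version) + "\n";
  for (const ToyNode &R : Roots)
    renderNode(R, 1, Out);
  Out += "EndBlock\n";
  return Out;
}
```

Two targets of the same indirect call would show up as two sibling `SubBlock(ContextNodeBlockID)` entries carrying the same `CalleeIndex` value.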
2 changes: 2 additions & 0 deletions llvm/lib/ProfileData/CMakeLists.txt
@@ -7,6 +7,8 @@ add_llvm_component_library(LLVMProfileData
ItaniumManglingCanonicalizer.cpp
MemProf.cpp
MemProfReader.cpp
PGOCtxProfReader.cpp
PGOCtxProfWriter.cpp
ProfileSummaryBuilder.cpp
SampleProf.cpp
SampleProfReader.cpp
173 changes: 173 additions & 0 deletions llvm/lib/ProfileData/PGOCtxProfReader.cpp
@@ -0,0 +1,173 @@
//===- PGOCtxProfReader.cpp - Contextual Instrumentation profile reader ---===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Read a contextual profile into a data structure suitable for maintenance
// throughout IPO.
//
//===----------------------------------------------------------------------===//

#include "llvm/ProfileData/PGOCtxProfReader.h"
#include "llvm/Bitstream/BitCodeEnums.h"
#include "llvm/Bitstream/BitstreamReader.h"
#include "llvm/ProfileData/InstrProf.h"
#include "llvm/ProfileData/PGOCtxProfWriter.h"
#include "llvm/Support/Errc.h"
#include "llvm/Support/Error.h"

using namespace llvm;

// FIXME(#92054) - these Error handling macros are (re-)invented in a few
// places.
#define EXPECT_OR_RET(LHS, RHS) \
auto LHS = RHS; \
if (!LHS) \
return LHS.takeError();

#define RET_ON_ERR(EXPR) \
if (auto Err = (EXPR)) \
return Err;

Expected<PGOContextualProfile &>
PGOContextualProfile::getOrEmplace(uint32_t Index, GlobalValue::GUID G,
SmallVectorImpl<uint64_t> &&Counters) {
auto [Iter, Inserted] = Callsites[Index].insert(
{G, PGOContextualProfile(G, std::move(Counters))});
if (!Inserted)
return make_error<InstrProfError>(instrprof_error::invalid_prof,
"Duplicate GUID for same callsite.");
return Iter->second;
}

void PGOContextualProfile::getContainedGuids(
DenseSet<GlobalValue::GUID> &Guids) const {
Guids.insert(GUID);
for (const auto &[_, Callsite] : Callsites)
for (const auto &[_, Callee] : Callsite)
Callee.getContainedGuids(Guids);
}

Expected<BitstreamEntry> PGOCtxProfileReader::advance() {
return Cursor.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);
}

Error PGOCtxProfileReader::wrongValue(const Twine &Msg) {
return make_error<InstrProfError>(instrprof_error::invalid_prof, Msg);
}

Error PGOCtxProfileReader::unsupported(const Twine &Msg) {
return make_error<InstrProfError>(instrprof_error::unsupported_version, Msg);
}

bool PGOCtxProfileReader::canReadContext() {
auto Blk = advance();
if (!Blk) {
consumeError(Blk.takeError());
return false;
}
return Blk->Kind == BitstreamEntry::SubBlock &&
Blk->ID == PGOCtxProfileBlockIDs::ContextNodeBlockID;
}

Expected<std::pair<std::optional<uint32_t>, PGOContextualProfile>>
PGOCtxProfileReader::readContext(bool ExpectIndex) {
RET_ON_ERR(Cursor.EnterSubBlock(PGOCtxProfileBlockIDs::ContextNodeBlockID));

std::optional<ctx_profile::GUID> Guid;
std::optional<SmallVector<uint64_t, 16>> Counters;
std::optional<uint32_t> CallsiteIndex;

SmallVector<uint64_t, 1> RecordValues;

// We don't prescribe the order in which the records come in, and we are ok
// if other unsupported records appear. We seek in the current subblock until
// we get all we know.
auto GotAllWeNeed = [&]() {
return Guid.has_value() && Counters.has_value() &&
(!ExpectIndex || CallsiteIndex.has_value());
};
while (!GotAllWeNeed()) {
RecordValues.clear();
EXPECT_OR_RET(Entry, advance());
if (Entry->Kind != BitstreamEntry::Record)
return wrongValue(
"Expected records before encountering more subcontexts");
EXPECT_OR_RET(ReadRecord,
Cursor.readRecord(bitc::UNABBREV_RECORD, RecordValues));
switch (*ReadRecord) {
case PGOCtxProfileRecords::Guid:
if (RecordValues.size() != 1)
return wrongValue("The GUID record should have exactly one value");
Guid = RecordValues[0];
break;
case PGOCtxProfileRecords::Counters:
Counters = std::move(RecordValues);
if (Counters->empty())
return wrongValue("Empty counters. At least the entry counter (one "
"value) was expected");
break;
case PGOCtxProfileRecords::CalleeIndex:
if (!ExpectIndex)
return wrongValue("The root context should not have a callee index");
if (RecordValues.size() != 1)
return wrongValue("The callee index should have exactly one value");
CallsiteIndex = RecordValues[0];
break;
default:
// OK if we see records we do not understand, like records (profile
// components) introduced later.
break;
}
}

PGOContextualProfile Ret(*Guid, std::move(*Counters));

while (canReadContext()) {
EXPECT_OR_RET(SC, readContext(true));
auto &Targets = Ret.callsites()[*SC->first];
auto [_, Inserted] =
Targets.insert({SC->second.guid(), std::move(SC->second)});
if (!Inserted)
return wrongValue(
"Unexpected duplicate target (callee) at the same callsite.");
}
return std::make_pair(CallsiteIndex, std::move(Ret));
}

Error PGOCtxProfileReader::readMetadata() {
EXPECT_OR_RET(Blk, advance());
if (Blk->Kind != BitstreamEntry::SubBlock)
return unsupported("Expected the profile metadata subblock");
RET_ON_ERR(
Cursor.EnterSubBlock(PGOCtxProfileBlockIDs::ProfileMetadataBlockID));
EXPECT_OR_RET(MData, advance());
if (MData->Kind != BitstreamEntry::Record)
return unsupported("Expected Version record");

SmallVector<uint64_t, 1> Ver;
EXPECT_OR_RET(Code, Cursor.readRecord(bitc::UNABBREV_RECORD, Ver));
if (*Code != PGOCtxProfileRecords::Version)
return unsupported("Expected Version record");
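if (Ver.size() != 1)
return wrongValue("The Version record should have exactly one value");
// Report the version that was read (Ver[0]), not the record code.
if (Ver[0] > PGOCtxProfileWriter::CurrentVersion)
return unsupported("Version " + Twine(Ver[0]) +
" is higher than supported version " +
Twine(PGOCtxProfileWriter::CurrentVersion));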
return Error::success();
}

Expected<std::map<GlobalValue::GUID, PGOContextualProfile>>
PGOCtxProfileReader::loadContexts() {
std::map<GlobalValue::GUID, PGOContextualProfile> Ret;
Contributor

Can this be a DenseMap? If we iterate over the contents, I'm thinking that non-determinism may be masked by integer GUIDs being ordered by std::map. Also, std::map has log(n) lookup, which will usually be slower than DenseMap.

Member Author

There shouldn't be too many roots, and the use of map is because of pointer / iterator stability. Now, the set of roots shouldn't change, but this matches the other place std::map is used (for call targets).
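
The pointer / iterator stability argument can be checked in isolation. The sketch below is not from the PR; it uses a hypothetical helper, `pointerSurvivesInsertions`, to show the `std::map` guarantee being relied on: a pointer to an element stays valid across later insertions. `DenseMap`, by contrast, may invalidate iterators and pointers when an insertion makes it grow.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Returns true if a pointer taken to a std::map element still refers to the
// same element after many later insertions (guaranteed for node-based maps).
bool pointerSurvivesInsertions() {
  std::map<uint64_t, std::string> Roots;
  Roots.emplace(42, "root");
  const std::string *Before = &Roots.at(42);
  for (uint64_t K = 1000; K < 2000; ++K)
    Roots.emplace(K, "other");
  return Before == &Roots.at(42) && *Before == "root";
}
```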

RET_ON_ERR(readMetadata());
while (canReadContext()) {
EXPECT_OR_RET(E, readContext(false));
auto Key = E->second.guid();
if (!Ret.insert({Key, std::move(E->second)}).second)
return wrongValue("Duplicate roots");
}
return Ret;
}
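
The record loop in `readContext` accepts records in any order and skips codes it does not know, stopping once it has everything it needs. A minimal sketch of that policy over an in-memory list follows. It makes several assumptions: `ToyRecord`, `ToyParsed`, and `parseRecords` are hypothetical names, the numeric codes mirror the `PGOCtxProfileRecords` enum values (Guid = 2, CalleeIndex = 3, Counters = 4), and `std::nullopt` stands in for the `llvm::Error` paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

constexpr unsigned GuidCode = 2, CalleeIndexCode = 3, CountersCode = 4;

struct ToyRecord {
  unsigned Code;
  std::vector<uint64_t> Values;
};

struct ToyParsed {
  uint64_t Guid;
  std::vector<uint64_t> Counters;
  std::optional<uint32_t> CalleeIndex;
};

// Returns std::nullopt on malformed input.
std::optional<ToyParsed> parseRecords(const std::vector<ToyRecord> &Records,
                                      bool ExpectIndex) {
  std::optional<uint64_t> Guid;
  std::optional<std::vector<uint64_t>> Counters;
  std::optional<uint32_t> Index;
  auto GotAllWeNeed = [&] {
    return Guid && Counters && (!ExpectIndex || Index);
  };
  for (std::size_t I = 0; !GotAllWeNeed(); ++I) {
    if (I == Records.size())
      return std::nullopt; // ran out of records before finding everything
    const ToyRecord &R = Records[I];
    switch (R.Code) {
    case GuidCode:
      if (R.Values.size() != 1)
        return std::nullopt;
      Guid = R.Values[0];
      break;
    case CountersCode:
      if (R.Values.empty())
        return std::nullopt; // at least the entry counter is expected
      Counters = R.Values;
      break;
    case CalleeIndexCode:
      if (!ExpectIndex || R.Values.size() != 1)
        return std::nullopt; // roots must not carry a callee index
      Index = static_cast<uint32_t>(R.Values[0]);
      break;
    default:
      break; // unknown record: skip, for forward compatibility
    }
  }
  return ToyParsed{*Guid, std::move(*Counters), Index};
}
```

Because the loop exits as soon as the required fields are present, any records that follow them in the same block are not inspected by this sketch.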