
add security note about accessing urls #1600



Open: gregsdennis wants to merge 4 commits into main


gregsdennis (Member)

What kind of change does this PR introduce?

clarification

Issue & Discussion References

Summary

Adds a security note about performing network operations when encountering URLs.

The last sentence in the addition was taken directly from @awwright's comment in the issue.

Does this PR introduce a breaking change?

no

gregsdennis added this to the stable-release milestone Apr 26, 2025
gregsdennis requested a review from a team April 26, 2025 09:20
gregsdennis moved this to In Progress in Stable Release Development Apr 26, 2025

jdesrosiers (Member) left a comment

This doesn't mention any security considerations. It's a requirement that we made for security reasons, but it's not a security consideration itself.

We could talk about the security considerations that led to that decision, but that feels out-of-place to me. This section should be about things implementers need to consider and protect against. It's not supposed to be a place for us to justify decisions we made for security reasons.

Because this requirement is a "SHOULD" and not a "MUST", we could talk about the security considerations that implementers who choose to support that kind of retrieval need to be aware of. That's the only way I think this makes sense.

Comment on lines 1997 to 1998
the host system to various security vulnerabilities, such as man-in-the-middle
attacks or data leaks.

Member

I don't want to sound alarmist, but RCEs are also a potential if there's the potential of bad parsing and malicious intent. I think MitM is a low risk, but a notable consideration.

How do you imagine data leaks might happen? By virtue of making a request to a URL from a system which should be invisible?

Member Author

A misbehaving implementation with access to the internet could send your data to another server, unrequested. To avoid this, we instruct implementations not to make network calls by default. Making use of the network is thus opt-in, which suggests that the user understands the risks.

I can add the RCE risk to the list.
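To make the "opt-in by default" behavior concrete, here is a minimal Python sketch of a schema registry that treats an IRI purely as an identifier and only retrieves over HTTP(S) when the caller explicitly enables it. The names `SchemaRegistry`, `RetrievalNotAllowed`, `allow_network`, and the injected `fetch` callable are hypothetical, chosen for illustration; they are not the API of any particular implementation.

```python
from urllib.parse import urlsplit


class RetrievalNotAllowed(Exception):
    """Raised when resolving an IRI would require retrieval the caller did not opt into."""


class SchemaRegistry:
    """Hypothetical registry: IRIs are identifiers first, locations only on request."""

    def __init__(self, allow_network=False, fetch=None):
        self._schemas = {}
        self._allow_network = allow_network  # off by default
        self._fetch = fetch  # hypothetical callable: uri -> schema dict

    def register(self, uri, schema):
        # Schemas supplied up front are always resolvable, with no I/O at all.
        self._schemas[uri] = schema

    def resolve(self, uri):
        if uri in self._schemas:
            return self._schemas[uri]
        scheme = urlsplit(uri).scheme
        # Network retrieval happens only if the caller opted in AND supplied a
        # fetcher, and only for http(s). file: IRIs are never read here.
        if self._allow_network and self._fetch is not None and scheme in ("http", "https"):
            schema = self._fetch(uri)
            self._schemas[uri] = schema  # cache so each IRI is fetched once
            return schema
        raise RetrievalNotAllowed(
            f"no schema registered for {uri!r} and retrieval is not enabled"
        )
```

A caller who does want network retrieval would construct `SchemaRegistry(allow_network=True, fetch=...)` with a fetcher of their own choosing, so the decision (and the associated risk) is explicit rather than implied by the presence of a URL in `$ref` or `$schema`.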

Member

I'm sure there are nuances that I'm not familiar with in this area, but I don't see any of these things as risks worth mentioning.

A misbehaving implementation with access to the internet could send your data to another server, unrequested

I don't see how that's possible. We're talking about retrieving schemas over a network. Information is coming into the system, never out. The only data that could be leaked is what public schemas your network is accessing.

I think MitM is a low risk, but a notable consideration.

I see MitM as essentially the same thing as data leakage. MitM is about covertly intercepting communications that are thought to be done privately. If you're retrieving a publicly available schema there's no need for MitM because the schema is already public. Again, the only information that could be exposed is which schemas you're accessing.

RCEs are also a potential if there's the potential of bad parsing and malicious intent.

I'm not sure what you mean by this. It would need to be code sent by the attacker that gets executed by the implementation even though the implementation isn't intended to execute it. I don't see how that's possible.

Relequestual (Member)

Minor issue, but otherwise looks good. Thanks!

jdesrosiers (Member) left a comment

Something that occurs to me is that the real risk is evaluating untrusted schemas. Retrieving a schema you control that is accessible only on a VPN is a safe practice. The risk arises only on an untrusted network, because that opens the possibility of an untrusted schema getting into your system.

I think this section should focus on specific risks of network/filesystem access related to untrusted schemas. For example, if a system accepts user schemas and one of those schemas has a filesystem reference, you don't want an untrusted schema trying to access your filesystem.

Once we've covered that, we can simply say that accessing schemas over an untrusted network opens the possibility of unintentionally evaluating untrusted schemas due to malicious actors. I wouldn't even mention specific types of network attacks. I think that's out of scope.
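As an illustration of the filesystem-reference case, here is a minimal Python sketch of a check an application accepting user-supplied schemas might run before evaluation: collect every `$ref`, `$dynamicRef`, and `$schema` value, resolve it against the schema's base IRI, and flag anything that points outside the schema's own document or an explicit allowlist. The helpers `iter_references` and `disallowed_references` are hypothetical. For brevity the sketch ignores `$id` changing the base IRI in nested subschemas, and it conservatively flags reference-shaped strings even in non-schema locations such as `const`.

```python
from urllib.parse import urldefrag, urljoin

# Keywords whose string values are treated as references (simplified: a full
# implementation would also track `$id`, which changes the base IRI).
REFERENCE_KEYWORDS = ("$ref", "$dynamicRef", "$schema")


def iter_references(node, base_uri):
    """Yield each reference found in `node`, resolved against `base_uri`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in REFERENCE_KEYWORDS and isinstance(value, str):
                yield urljoin(base_uri, value)
            else:
                yield from iter_references(value, base_uri)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item, base_uri)


def disallowed_references(schema, base_uri, allowed_prefixes=()):
    """Return references in an untrusted schema that point outside the
    schema's own document and outside every allowed prefix."""
    document = urldefrag(base_uri).url
    bad = []
    for ref in iter_references(schema, base_uri):
        target = urldefrag(ref).url
        # Fragment-only references such as "#/$defs/name" stay in the document.
        if target == document or target.startswith(tuple(allowed_prefixes)):
            continue
        bad.append(ref)
    return bad
```

For example, with `untrusted = {"$schema": "https://json-schema.org/draft/2020-12/schema", "properties": {"name": {"$ref": "#/$defs/name"}, "secret": {"$ref": "file:///etc/passwd"}}, "$defs": {"name": {"type": "string"}}}`, calling `disallowed_references(untrusted, "https://example.com/uploads/123", allowed_prefixes=("https://json-schema.org/",))` returns `['file:///etc/passwd']`. Note that a relative `$ref` such as `../../etc/passwd` can also reach the filesystem if the base IRI is itself a `file:` IRI, which is why references are resolved before they are checked.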

@@ -1990,6 +1990,13 @@ A malicious schema author could place executable code or other dangerous
material within a `$comment`. Implementations MUST NOT parse or otherwise take
action based on `$comment` contents.

When encountering an IRI that also represents a valid file system or network
location, implementations are discouraged from automatically making an operation to
access that location. Schema authors should take care when configuring

Member

The audience of the spec isn't schema authors and we shouldn't be speaking directly to them. We've moved away from that in other places and I think we should stick to that as a policy.

We can say the same thing this is saying, but from the perspective of the implementation.

Member Author

I think this is probably the one place where it should be okay to address the schema author. They should know the risks of using an implementation.

karenetheridge (Member)

FWIW this is what I note for security considerations in my implementation -- https://metacpan.org/pod/JSON::Schema::Modern#SECURITY-CONSIDERATIONS -- as regular expressions provide a potential vector for executing code or creating a DoS.

gregsdennis (Member Author)

@karenetheridge thank you. I notice that what you have is particularly focused on regular expressions, which are already included in the validation spec.

karenetheridge (Member)

I notice that what you have is particularly focused on regular expressions

Yes, since I don't support fetching schemas from disk or the network, I think this is the only direct source of vulnerabilities that a user might not already be aware of.

I think the key to emphasize (and we can repeat it in a few places if relevant) is "do not trust schemas from external sources".

Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Security considerations should mention treating URIs as URLs (from $ref and $schema)
4 participants