I only recently started documenting my code, because it helps me. However, I feel like my documentation is a bit too verbose and probably unnecessary for the obvious parts of my code.
So I started writing a comment above each small group of lines, explaining in one short sentence what it does or why I do it, and then leaving a blank line under the group before the next one so it is easier to read.
What do you think about this?
Edit: here is a real code example from one of my projects:
```python
import httpx
from django.contrib.auth import aauthenticate, alogin
from django.http import HttpRequest, HttpResponse
from django.shortcuts import redirect, render

# BASE_API_URI, CLIENT_ID, CLIENT_SECRET, OAUTH2_REDIRECT_URI and
# generate_error_dictionary are defined elsewhere in the project.


async def discord_login_callback(request: HttpRequest) -> HttpResponse:
    async def exchange_oauth2_code(code: str) -> tuple[dict | None, dict | None]:
        data = {
            'grant_type': 'authorization_code',
            'code': code,
            'redirect_uri': OAUTH2_REDIRECT_URI
        }
        headers = {
            'Content-Type': 'application/x-www-form-urlencoded'
        }
        async with httpx.AsyncClient() as client:
            # get user's access and refresh tokens
            response = await client.post(f"{BASE_API_URI}/oauth2/token", data=data, headers=headers, auth=(CLIENT_ID, CLIENT_SECRET))
            if response.status_code == 200:
                tokens = response.json()
                access_token, refresh_token = tokens["access_token"], tokens["refresh_token"]
                # get user data via discord's api
                user_data = await client.get(f"{BASE_API_URI}/users/@me", headers={"Authorization": f"Bearer {access_token}"})
                user_data = user_data.json()
                user_data.update({"access_token": access_token, "refresh_token": refresh_token})  # add tokens to user_data
                return user_data, None
            else:
                # if any error occurs, return error context
                context = generate_error_dictionary("An error occurred while trying to get user's access and refresh tokens", f"Response Status: {response.status_code}\nError: {response.content}")
                return None, context

    code = request.GET.get("code")
    user, context = await exchange_oauth2_code(code)
    # login if user's discord user data is returned
    if user:
        discord_user = await aauthenticate(request, user=user)
        await alogin(request, user=discord_user, backend="index.auth.DiscordAuthenticationBackend")
        return redirect("index")
    else:
        return render(request, "index/errorPage.html", context)
```
Yeah, my general rule of thumb is that the following four things should be in the documentation:
1. Why?
2. Why not? IMO this is often more important, as you might know a few pitfalls in approaches people might want to try but that aren't being used, for good reasons.
3. Quirks and requirements of parameters and return values. This ensures that someone doesn't need to skim your code just to use it.
4. If applicable, the context for the code's existence. This is often helpful years down the line when trying to refactor something.
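As a rough illustration of those four points in a Python docstring (the language of the original post), here is a sketch. The function, the upstream API and its timestamp format, and the history in the "Context" paragraph are all invented for the example; the only real fact it leans on is that `datetime.fromisoformat()` did not accept a trailing `Z` before Python 3.11.

```python
from datetime import datetime


def parse_api_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp string returned by the (hypothetical) upstream API.

    Why: the API sends timestamps like "2024-01-31T12:00:00Z", and the rest of
    the code base expects timezone-aware datetime objects.

    Why not call datetime.fromisoformat() directly? Before Python 3.11 it
    rejects the trailing "Z", so it is normalised to "+00:00" first.

    Parameters and return value:
        value: must include a time zone ("Z" or an explicit offset); naive
            strings are rejected instead of being silently assumed to be UTC.
        Returns a timezone-aware datetime.

    Raises:
        ValueError: if the string is not ISO 8601 or has no time zone.

    Context (hypothetical): added when the API switched from Unix epochs to
    ISO strings; the normalisation can go once Python < 3.11 is dropped.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no time zone: {value!r}")
    return parsed
```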
Yep. I mostly document why the obvious or best-practice solution is wrong here. The answer is usually that we rely on other poorly written code, third-party or internal.
One interesting thing I read was that commenting code can be considered a code smell. That doesn't mean it's bad; it just means that if you find yourself needing a comment, you should ask whether there's a better way to write the code so the comment isn't needed. Mostly there is, but sometimes there isn't.
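A small Python sketch of that idea: the first version needs a comment to explain its condition, while the second moves the same meaning into named constants and a helper function. The shipping rule, threshold, country code and function names are all hypothetical.

```python
HOME_COUNTRY = "DE"           # hypothetical shop location
FREE_SHIPPING_THRESHOLD = 50  # hypothetical minimum order total
STANDARD_SHIPPING = 4.99


# Before: the condition needs a comment to be understood.
def shipping_cost_v1(total: float, country: str) -> float:
    # free shipping for domestic orders of 50 or more
    if country == "DE" and total >= 50:
        return 0.0
    return 4.99


# After: the names say the same thing, so the comment can go.
def qualifies_for_free_shipping(total: float, country: str) -> bool:
    return country == HOME_COUNTRY and total >= FREE_SHIPPING_THRESHOLD


def shipping_cost(total: float, country: str) -> float:
    return 0.0 if qualifies_for_free_shipping(total, country) else STANDARD_SHIPPING
```

Both versions behave identically; the refactor only changes where the explanation lives.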
API docs are also an exception IMO, especially if they are used to generate public-facing documentation for someone who may not want to read your code.
I agree with you, though: generally, people should be able to understand what's going on by reading your code and tests.
I know there are documentation generators (like JSDoc for JavaScript) that let you write documentation directly in your code and have a documentation site auto-generated at each deployment. There are definitely mixed views on this, though.
Good comments describe the "why", or rationale, not the "what". This function doesn't need any comments at all... but it needs a far better name, like logAndReturnSeed. That said, depending on what specifically you're doing, I'd probably advocate for not printing the value in this function, because that feels odd, so I'd probably end up writing it like this:
```python
import random


def roll_d10() -> int:
    # randint includes both endpoints, so this returns 1 through 10
    return random.randint(1, 10)
```
And I, as a senior developer, think that level of comments is great.
You mentioned that this is a trivial example but the main skill in commenting is using it sparingly when it adds value - so a more realistic example might be more helpful.
Great summary. The only thing I would add is that when we say "Answer Why?" we're implicitly including "WTF?!". It's the one version of "what" that's usually worth the screen lines it costs, usually with a link to the unsolved upstream bug report at the heart of the mess.
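A sketch of such a "WTF?!" comment in Python. The vendor API, its page-size bug and the function name are hypothetical, and the issue link is deliberately left as a placeholder rather than a real URL.

```python
def page_size_for_request(requested: int) -> int:
    # WTF?! Why cap at 99 when the API docs say the maximum is 100?
    # Because the (hypothetical) vendor API returns an empty page for
    # limit=100, even though its docs list 100 as valid.
    # Upstream bug report: <link to the upstream issue>
    # Remove this cap once that issue is fixed.
    return max(1, min(requested, 99))
```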
I'm curious what people think about documenting intent, which in my opinion crosses over with the "what".
Regarding self-documenting code: I agree, but I also think that means potentially using five lines where one would do, if it makes maintenance more straightforward. That crazy Perl one-liner makes perfect sense today, but not in three years.
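To illustrate with Python standing in for Perl: both snippets below find the most frequent error message in a small, made-up log. The dense version works, but the spread-out version names each step, so it is more likely to stay readable later.

```python
from collections import Counter

LOG = """\
2024-05-01 ERROR db timeout
2024-05-01 INFO  started
2024-05-02 ERROR db timeout
2024-05-02 ERROR disk full
"""

# Dense version: works today, but hard to decode in three years.
top_error = Counter(l.split(None, 2)[2] for l in LOG.splitlines() if l.split()[1] == "ERROR").most_common(1)[0][0]


# Same logic over a few more lines, with each step named.
def most_common_error(log: str) -> str:
    error_messages = []
    for line in log.splitlines():
        _date, level, message = line.split(None, 2)
        if level == "ERROR":
            error_messages.append(message)
    counts = Counter(error_messages)
    return counts.most_common(1)[0][0]
```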
Summarize and explain larger parts of code at the top of classes and methods. What is their purpose, how do they tackle the problem, how should they be used, and so on.
Add labels/subtitles to smaller chunks of code (maybe 4-10 lines) so people can quickly navigate them without having to read line by line. Stuff like "Loading data from X", "Converting from X to Y", "Handling case X". Occasionally I'll slip in a "because ..." to explain unusual or unexpected circumstances, e.g. an API doesn't follow expected standards or its own documentation. Chunks requiring more explanation than that should probably be extracted into separate methods.
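A minimal Python sketch of that labeling style: each short chunk gets a subtitle, and the one unusual step gets a "because ...". The CSV format and the exporter quirk it works around are invented for the example.

```python
import csv
import io


def total_by_name(raw_csv: str) -> dict[str, float]:
    # Loading rows from the CSV text
    rows = csv.DictReader(io.StringIO(raw_csv))

    # Converting amounts from strings to floats
    # (blank amounts count as 0, because the hypothetical exporter writes
    # an empty field instead of "0")
    amounts = [(row["name"], float(row["amount"] or 0)) for row in rows]

    # Summing amounts per name
    totals: dict[str, float] = {}
    for name, amount in amounts:
        totals[name] = totals.get(name, 0.0) + amount
    return totals
```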
There is no need to explain what every line of code is doing, coders can read the code itself for that. Instead focus on what part of the overall task a certain chunk of code is handling, and on things that might actually need explaining.
Imagine your code as English sentences. If it is hard to read, rephrase it. If something is getting long and drawn out, use paragraphs (methods and functions). At the end of the day, the easier it is to read, the better, unless there's a performance cost worth considering.
Like the top-level comment suggests, you should comment your methods. I would go one step further and use a standard comment format. I like Ruby, so YARDoc immediately comes to mind. With a YARDoc comment, you describe what the method does, the parameter types and descriptions, what it returns, the exceptions it might raise, etc.
Even better, standardized comments not only make the code easier to read for you and others; most of the time, you also get rendered documentation for free. For example, here is a library I wrote:
This style of auto-generated documentation is available for pretty much all mature languages, and I highly recommend that you hit the ground running with them 👍
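The Python counterpart of YARD tags is a structured docstring. The sketch below uses the Google style, one common convention that Sphinx can render via its `napoleon` extension (reStructuredText and NumPy styles are alternatives). The `chunk` function itself is just a made-up example; `inspect.getdoc()` and `help()` are the standard-library ways to read the same text back.

```python
import inspect


def chunk(items: list, size: int) -> list[list]:
    """Split a list into consecutive chunks of a fixed size.

    Args:
        items: The list to split. It is not modified.
        size: Maximum length of each chunk. Must be at least 1.

    Returns:
        A list of chunks; the last one may be shorter than ``size``.

    Raises:
        ValueError: If ``size`` is less than 1.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Tools read the same docstring: help(chunk) and pydoc display it, and
# inspect.getdoc() returns it with the indentation cleaned up.
print(inspect.getdoc(chunk).splitlines()[0])
```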
I rarely read comments in code, at least not from within the source code. I of course write comments explaining the behavior of public-facing interfaces, and elsewhere where they serve to generate documentation, but very rarely otherwise. And I use that generated documentation, so in a roundabout way I do read comments, just outside of the code base.
For instance I might use godoc to get a general idea of components but if I’m in the code I’ll be reading the code instead.
As others have said, your code should generally, but not always, clearly express what it does. It is fine to comment on why you decided to implement something in a way that isn’t immediately clear.
I’m not saying others don’t read comments in code; some do. I just never find myself looking at docs in code. The most important skill I have cultivated over the decades has been learning to read and follow the actual code itself.
Line or block comments. Reserved for when you're doing something non-obvious, like a hack, a workaround because of a bug that can't be fixed yet etc. Designed to help other programmers (or yourself a few months later) to understand what's going on. Ideally you shouldn't have any of these but life ain't perfect.
If parts of your code are intended to be used as libraries, modules, APIs etc. there are standard methods of documenting those and extracting the documentation automatically in a readable format — like JavaDoc, Swagger etc. Modern IDEs will generate interface hints on the fly so most people nowadays rely on those, but they're not a 100% substitute for the human-written description next to a class or method.
Unit tests describe the intent of a piece of code and provide concrete pass/fail criteria. The same goes for other types of tests, like end-to-end tests, regression tests, etc. Each kind of test comes with its own frameworks, which have their own ways of outlining specifications.
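For instance, a Python `unittest` sketch where each test name reads as a requirement. The `slugify` function is hypothetical and kept trivial so the focus stays on the tests; you would normally run them with `python -m unittest`.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase and join words with '-'."""
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    # Each test name states one intended behavior, so a failing test
    # reads like a broken requirement.
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_repeated_whitespace(self):
        self.assertEqual(slugify("  Hello   World  "), "hello-world")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```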
Speaking of specifications those are also a very important type of documentation. Usually provided by the product owner and fleshed out by technical people like architects or team leads, they're documented in tools like JIRA as part of the development process. They are at the core of the work done by programmers and testers.
Speaking of processes and procedures, it helps everybody if they're documented as well, usually in a wiki. They help a new hire get up to speed faster and they explain how the toolchains are set up for development, testing, deployment and bug fixing.
The human interfaces are a particularly interesting and important aspect and they're usually modeled and shared in specific tools by UX people.
Last but not least the technical as well as business designs should be documented as well. These usually circulate as PDF, DOC, Excel, PPT over email and file shares. Typically made and contributed to by business analysts and software architects.
For new code, I mostly use JSDoc function headers on public class methods and exported functions, with one or two sentences explaining what the function does.
I also try to explain what to expect in edge cases, like passing an empty string, null, and so on; I then write unit tests for those cases.
I also always mention whether a function is pure, or whether a method changes the state of its object. As a side note, I find it odd that almost no language has a keyword for pure functions or read-only methods.
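Python has no such keyword either, so a note in the docstring is the closest equivalent. This sketch translates the JSDoc habit described above into Python; the `Cart` class, the `average` function, and the "Pure"/"Mutates"/"Read-only" labels are a made-up convention, not a standard.

```python
from typing import Optional


class Cart:
    def __init__(self) -> None:
        self.items: list[float] = []

    def add(self, price: float) -> None:
        """Append a price to the cart.

        Mutates: changes ``self.items``.
        Edge cases: raises ValueError for a negative price.
        """
        if price < 0:
            raise ValueError("price must not be negative")
        self.items.append(price)

    def total(self) -> float:
        """Return the sum of all prices.

        Read-only: does not change the cart.
        Edge cases: an empty cart totals 0.0.
        """
        return float(sum(self.items))


def average(prices: list[float]) -> Optional[float]:
    """Return the mean of ``prices``.

    Pure: depends only on its argument and does not modify it.
    Edge cases: returns None for an empty list instead of raising.
    """
    if not prices:
        return None
    return sum(prices) / len(prices)
```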
If I add a big new chunk of code that spans multiple files but is fairly self-contained, I create an .md file explaining the big picture. For example, I recently added my own self-contained library to my Angular frontend that handles WebSocket tasks like subscribing, unsubscribing, buffering, pausing, etc., and I created an .md file explaining it.
Essentially, it's a function that doesn't produce side effects, like modifying variables outside of its scope or modifying its parameters, and whose result depends only on its inputs. This is something you should always try to incorporate into your code, as it makes the function much easier to test and less risky to use, since it doesn't rely on external, unrelated values.
To give you an example in JavaScript, here are two ways to replace every number in a list that also appears in another list (the banned numbers) with 0.
First, a way to do it with an impure function:
```javascript
let bannedNumbers = [4, 6]
const nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

function replaceWithZero(nums) {
  for (let i = 0; i < nums.length; i++) {
    if (bannedNumbers.includes(nums[i])) {
      nums[i] = 0
    }
  }
}

replaceWithZero(nums)
console.log("numbers are : ", nums)
```
Here the function replaceWithZero does two things that make it impure. First, it modifies its parameter. This can lead to issues, because any other code holding a reference to the same array sees it change unexpectedly. Second, it uses a non-constant variable from outside its scope (bannedNumbers), which is bad because if someone changes bannedNumbers somewhere else in the code, the behavior of the function changes.
A proper pure implementation could look something like this:
```javascript
const nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

function replaceWithZero(nums) {
  const bannedNumbers = [4, 6]
  const result = []
  for (const num of nums) {
    result.push(bannedNumbers.includes(num) ? 0 : num)
  }
  return result
}

const replaced = replaceWithZero(nums)
console.log("numbers are : ", replaced)
```
Here we are not modifying anything outside of the function's scope, nor its parameters. This means that no matter where, when, or how often we call this function, it will always behave the same when given the same inputs! That is the whole goal of pure functions.
Obviously, in practice you can't make everything 100% pure; when making an HTTP request, for example, you always depend on external factors. But you can minimize those factors by making the HTTP request and then running the result only through pure functions.
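A minimal Python sketch of that split: a pure function that only transforms the response body, and a thin impure wrapper that makes the actual HTTP request with the standard library's `urllib`. The endpoint and the JSON shape (a list of objects with `title` and `state`) are assumptions for the example; only the pure part needs a test, and that test needs no network.

```python
import json
from urllib.request import urlopen


def open_issue_titles(payload: str) -> list[str]:
    """Pure: the result depends only on ``payload``; nothing else is touched."""
    issues = json.loads(payload)
    return [issue["title"] for issue in issues if issue.get("state") == "open"]


def fetch_open_issue_titles(url: str) -> list[str]:
    """Impure shell: the only place that depends on the network."""
    with urlopen(url) as response:
        payload = response.read().decode("utf-8")
    return open_issue_titles(payload)


# Usage (hypothetical endpoint, so not executed here):
# titles = fetch_open_issue_titles("https://example.com/api/issues")
```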