Does Bing Chat give reliable answers to math and physics questions? If not, is it possible to make it more reliable?
I realize and understand the criticisms of ChatGPT, and I have personally seen how bad it can be. Once I asked it to count the number of days until a random date, given the present date, and it failed miserably, again and again. Trust me! I get the criticism. But what about Bing Chat?
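For reference, the day-counting task is one that can be computed exactly instead of asked of a chatbot. A minimal Python sketch using only the standard library; the function name `days_until` and the two dates are made-up examples, not the ones from the original question:

```python
from datetime import date

def days_until(today: date, target: date) -> int:
    """Return the number of days from `today` to `target` (negative if target is in the past)."""
    return (target - today).days

# Hypothetical example dates, chosen only for illustration.
print(days_until(date(2023, 5, 1), date(2023, 12, 25)))  # prints 238
```

Because `date` subtraction handles month lengths and leap years, the result does not depend on anyone (or anything) counting correctly.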
Have you ever tried asking it your physics and maths questions? I was coding a while ago and had a pretty complex question that a very popular Reddit coding community could not solve, but Bing Chat gave an answer in an instant! I was genuinely impressed. Apparently it searches multiple webpages, reads and understands them, and gives an answer that combines what it learned from its search. Again, the question I asked was pretty complex, but it answered in an instant and the answer was right! This was coding, where it's pretty hard to get the right answer on the first try; I have found it's more "trial and error".
So yeah!
1. Can I rely partially on Bing Chat for math questions?
2. If not, can I ask it to form a search query that captures my question precisely?
3. If not, should I ask it to "Answer this question and cite your sources"?
4. Can I do something more, along the lines of what I did in 3? What are your thoughts on this?
I won't be able to reply to each of your comments anytime soon, but know that I deeply appreciate this community and its members and their help :')
Language models are designed to produce responses that read as coherent and convincing to the user. They don't care about factuality, and in fact have no ability to "know" if they are correct. And they don't "care".
If you want a smart query tool that lets you ask math problems, you should try something like Wolfram Alpha. It's not perfect, but it's at least designed with the intent to produce answers to math problems.
They don’t care about factuality, and in fact have no ability to “know” if they are correct. And they don’t “care”
I suspect that most people think (maybe not even consciously) that these models answer questions by retrieving data and then writing a response which incorporates that data, rather than just generating text that may or may not contain actual facts.
It really bears repeating over and over that all these so-called AI systems do is take a prompt and output text in response to it that reads as if a human wrote it.
You can describe an API to these models, and they can then make calls to it.
I have my own GPT-4 instance instructed to gather information as necessary, so it asks me questions when it needs to.
You can get it to “ask questions” with specific syntax which can then be translated to API calls. This is a way you can get an LLM to consider new information in its tasks.
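A rough sketch of what that translation layer might look like, in Python. Everything here is an assumption made for illustration: the `CALL name(args)` line syntax, the `days_between` tool, and the `TOOLS` registry are made up and are not part of any real LLM API.

```python
import re
from datetime import date

# Hypothetical convention: the model is told to emit a line such as
#   CALL days_between(2023-05-01, 2023-12-25)
# whenever it needs a fact it cannot reliably produce itself.
CALL_PATTERN = re.compile(r"^CALL (\w+)\((.*)\)$", re.MULTILINE)

def days_between(start: str, end: str) -> str:
    """Example tool: exact day count between two ISO-formatted dates."""
    return str((date.fromisoformat(end) - date.fromisoformat(start)).days)

# Registry of tools the model is allowed to request (names are made up).
TOOLS = {"days_between": days_between}

def run_tool_calls(model_output: str) -> list[str]:
    """Find every CALL line in the model's text and run the matching tool."""
    results = []
    for name, raw_args in CALL_PATTERN.findall(model_output):
        if name not in TOOLS:
            results.append(f"{name}: unknown tool")
            continue
        args = [a.strip() for a in raw_args.split(",") if a.strip()]
        results.append(f"{name} -> {TOOLS[name](*args)}")
    return results

reply = "I need the exact figure first.\nCALL days_between(2023-05-01, 2023-12-25)"
print(run_tool_calls(reply))  # prints ['days_between -> 238']
```

In a real setup the results would be sent back to the model as the next message, so that the final answer is built on computed values instead of generated ones.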
No. Bing Chat runs on the same family of OpenAI models as ChatGPT, after all. It will oftentimes provide information that doesn’t match the sources it provides at all.
Don’t trust it blindly.
No. As far as I know, the only LLM that can be reliable with math and physics is GPT-4 with the Wolfram plugin, since it runs the math through the Wolfram API and double-checks the validity of its results. Everything else has a habit of hallucinating a lot and giving wrong answers.
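This is not how the Wolfram plugin works internally, but the general idea of checking a model's answer with real computation can be sketched in a few lines of Python. The function name `check_roots` and the example claim are hypothetical:

```python
from fractions import Fraction

def check_roots(coeffs, claimed_roots):
    """Substitute each claimed root into a polynomial and report whether it evaluates to zero.

    coeffs are listed from the highest power down, e.g. [1, -5, 6] for x^2 - 5x + 6.
    """
    report = {}
    for root in claimed_roots:
        x = Fraction(root)
        value = Fraction(0)
        for c in coeffs:  # Horner's rule, in exact rational arithmetic
            value = value * x + Fraction(c)
        report[root] = (value == 0)
    return report

# Suppose a chatbot claims x^2 - 5x + 6 = 0 has roots 2 and 4 (the second is wrong).
print(check_roots([1, -5, 6], [2, 4]))  # prints {2: True, 4: False}
```

Substitution only verifies the roots that were claimed; it does not find roots the model missed.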
I have experience with GPT-4, and in particular I've occasionally used it for math questions in my work. I'm not sure how Bing Chat compares.
For GPT-4, I've noticed the following:
How reliable the answer is depends on how easy or obscure the question is. It hasn't lied to me on easy or introductory material, but once your questions start becoming more obscure, and it's less likely to have the answer in the training set, it starts making things up.
I think of it as search to an extent - it needs to have the answer in the training data to find it. Unlike Google, it can usually find an answer even if you don't use the proper terms. But if it doesn't find an answer, it might make something up.
"Easy or introductory" is relative - I have been able to get good answers for some masters-level math, and some wrong ones for lower-level things. Ultimately it depends on how many resources on the topic were in the training set.
It's actually much more reliable at detecting errors than it is at generating text. So you can open a new chat and ask, "Is the following true: ..." and it will catch most of its own errors. Once it starts catching errors, you should know you've left the reliable "easy questions" territory, and even if it can still be useful, you should exercise much more care.
The way you phrase a prompt matters a lot. For example, if you ask it to explain its reasoning step by step, it becomes much more accurate.
It is generally good at rephrasing questions to use better terminology.
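To make the two prompting habits above concrete, here is a small Python sketch that only builds the prompt text. The helper names and the exact wording are just one possible phrasing, not anything GPT-4 requires:

```python
def step_by_step_prompt(question: str) -> str:
    """Wrap a question so the model is asked to show its reasoning."""
    return f"{question}\n\nExplain your reasoning step by step before giving the final answer."

def verification_prompt(claim: str) -> str:
    """Build the follow-up to paste into a fresh chat to check an earlier answer."""
    return f"Is the following true: {claim}"

# Example question and claim, made up for illustration.
print(step_by_step_prompt("What is the derivative of x^3 * sin(x)?"))
print(verification_prompt("The derivative of x^3 * sin(x) is 3x^2 sin(x) + x^3 cos(x)."))
```

The verification prompt should go into a new chat, so that the model is not influenced by its own earlier answer.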
Bing Chat might be different in some regards. I know that it automatically searches the web for sources and, when generating an answer, bases it on the contents of the sources it found - but I don't have experience with it.
That said, asking for additional sources (besides the search results it found) shouldn't improve the accuracy. It might just give you something you can use to fact-check it.