Hello! This repository contains the code, datasets, and model weights for my Rubik's Cube solving robot.
The robot works like this:
- You place a scrambled cube in a specific orientation
- The robot turns the cube so that one face points at the camera
- The robot recognizes the colors on that face using a neural network
- The robot points each of the remaining faces at the camera and reads them
- Once the robot knows the full Rubik's Cube configuration, it can start solving
- To solve the First Two Layers (F2L) of the cube, the robot uses heuristic search to find the best combination of moves
- To solve the rest of the cube, the robot uses a neural network trained to apply known formulas for a given cube state
| To code the robot, I used Python and trained a few custom machine learning models using PyTorch and a Kaggle dataset. |
|---|
First, I worked on the main algorithm: the one that actually solves the Rubik's Cube. My idea was that if a computer could learn which move is best for any given cube configuration, it would be able to solve the entire cube. So I found a dataset on Kaggle with just what I needed: tens of thousands of Rubik's Cube configurations, each with a corresponding "next move" that brings the cube closer to a solved state. Here is the dataset: Kaggle.com (Thank you, Anton). With this dataset, I trained a dense neural network in PyTorch that takes 54 numbers as input (representing a Rubik's Cube configuration) and outputs 19 numbers, one for each type of move you can make (including "stop"). |
However, although the model reported 100% accuracy, that did not hold up in practice. When I ran the model on many scrambled Rubik's Cubes, it got only 30-50% of the moves correct, and for some reason it always mispredicted the last two moves before the cube was solved. I asked ChatGPT and my dad for suggestions. I tried adding the previous 30 moves the robot had already made as an input to the model, and I also made the model predict how many moves away from solved the input cube state is. These changes helped only a little, so I moved on to other parts of the code. I will revisit this model later. |
Next, I worked on making the robot actually turn and manipulate the Rubik's Cube. In Python, I created a "Claw" class, which lets me control each individual claw: extend/retract, twist, set_angle, etc. I also created a "Claw Machine" class, which orchestrates all four claws and actually manipulates the cube. Some key methods in this class are "turn_face()", which turns any given face once clockwise; "turn_cube()", which rotates the entire cube around a given axis; and "move()", which combines the previous two methods to turn any face in any direction. Here is a demo of the robot turning the Rubik's Cube: |
TurnDemo.mp4
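
As a rough illustration of how a "move()" method can be built on top of a clockwise-only "turn_face()", here is a minimal sketch. It assumes moves are written in standard cube notation ("R", "U2", "F'") and that a counter-clockwise turn is performed as three clockwise quarter-turns. The class name `LoggingClawMachine`, the helper `parse_move`, and the notation itself are assumptions made for this example, not the project's real code. The sketch leaves out the "turn_cube()" reorientation step and records calls instead of driving hardware.

```python
# Hypothetical sketch: names and behavior are assumptions, not the project's real code.
FACES = ("U", "D", "L", "R", "F", "B")

# Number of clockwise quarter-turns for each notation suffix.
SUFFIX_TO_QUARTER_TURNS = {"": 1, "2": 2, "'": 3}


def parse_move(move):
    """Split a move such as "R", "U2" or "F'" into (face, clockwise quarter-turns)."""
    face, suffix = move[:1], move[1:]
    if face not in FACES or suffix not in SUFFIX_TO_QUARTER_TURNS:
        raise ValueError(f"unknown move: {move!r}")
    return face, SUFFIX_TO_QUARTER_TURNS[suffix]


class LoggingClawMachine:
    """Stand-in for the real class: records face turns instead of moving servos."""

    def __init__(self):
        self.log = []

    def turn_face(self, face):
        # The real method would drive the claws; here we only record the turn.
        self.log.append(face)

    def move(self, move):
        # Any direction is expressed as repeated clockwise quarter-turns.
        face, turns = parse_move(move)
        for _ in range(turns):
            self.turn_face(face)


machine = LoggingClawMachine()
machine.move("R'")
print(machine.log)
```

On real hardware, a counter-clockwise twist of the claw would probably be faster than three clockwise turns. The repeated-turn version is used here only because it needs nothing beyond the clockwise "turn_face()" described above.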
| To make the robot "see" what colors the Rubik's cube has on each face, I mounted a camera, connected it to the Jetson Nano, and created a few versions of a program that extracts the colors of each face. |
|---|
My first idea was to hard-code a program that took an image of a cube face, divided it into a 3x3 grid, took the average color of each section, and compared it to a list of known colors. Whichever known color a section was closest to became the predicted color for that sticker. This approach worked surprisingly well, but it was still not 100% accurate. So I decided to try something else: detecting the colors using a neural network. To do this, I first needed a large dataset of Rubik's Cube face images, each labeled with the correct color pattern. I could not find a good dataset, so I did what my dad suggested: I used the robot I already had to scramble, turn, and take pictures of the cube. Using a custom index remapping dictionary, the computer kept track of the Rubik's Cube configuration after each rotation, so the robot could automatically label each picture it took with the correct color pattern. Here is a timelapse of the automatic data collection process: |
DataCollection.MOV
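
The hard-coded approach described above (3x3 grid, average color per section, nearest known color) can be sketched roughly as follows. This is a minimal illustration, not the project's actual program. It assumes the image is already cropped to the face and given as a square list of rows of (R, G, B) tuples. The reference RGB values are illustrative guesses rather than calibrated values, and the function names are made up for this example.

```python
# Illustrative reference colors (assumed values, not calibrated for any camera).
REFERENCE_COLORS = {
    "white": (255, 255, 255),
    "yellow": (255, 213, 0),
    "red": (183, 18, 52),
    "orange": (255, 88, 0),
    "green": (0, 155, 72),
    "blue": (0, 70, 173),
}


def average_color(pixels):
    """Return the per-channel mean of a list of (r, g, b) tuples."""
    count = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / count for ch in range(3))


def nearest_color(rgb):
    """Return the name of the reference color closest to rgb (squared RGB distance)."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, REFERENCE_COLORS[name]))
    return min(REFERENCE_COLORS, key=squared_distance)


def classify_face(image):
    """Split a square image into a 3x3 grid and name the color of each cell."""
    cell = len(image) // 3
    face = []
    for grid_row in range(3):
        row = []
        for grid_col in range(3):
            pixels = [
                image[y][x]
                for y in range(grid_row * cell, (grid_row + 1) * cell)
                for x in range(grid_col * cell, (grid_col + 1) * cell)
            ]
            row.append(nearest_color(average_color(pixels)))
        face.append(row)
    return face
```

Plain RGB distance is sensitive to lighting, which may be one reason a fixed color list falls short of 100% accuracy. Red and orange in particular sit close together under warm light.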
Currently, I am just short of finishing the project. I still need to train a more reliable vision model and implement the main Rubik's Cube solving algorithm with the improvements I mentioned: heuristic search for F2L and a formula-based neural network for the rest of the puzzle.
By heuristic search, I mean this:
- Training a neural network to take a cube configuration as input and predict how "good" that configuration is, that is, how many moves it will take to solve from that point.
- Given a scrambled cube, the program will try every possible move combination (up to 4 moves deep). It will select the sequence whose final configuration the above neural network predicts is closest to solved, and the robot will perform those moves.
- The process repeats until the First Two Layers of the Rubik's Cube are solved.
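
The search loop described in the list above could look roughly like this. It is a sketch under stated assumptions: `estimate_distance` stands in for the neural network that predicts how many moves remain, `apply_move` stands in for the cube simulator, and a toy integer puzzle replaces the cube so the example stays short and runnable. All function names are hypothetical.

```python
from itertools import product


def best_sequence(state, moves, apply_move, estimate_distance, max_depth=4):
    """Try every move sequence up to max_depth and return (sequence, score) for the best.

    Lower scores mean "closer to solved". The empty sequence is the baseline,
    so an empty tuple is returned when no sequence improves on the current state.
    """
    best_seq, best_score = (), estimate_distance(state)
    for depth in range(1, max_depth + 1):
        for seq in product(moves, repeat=depth):
            current = state
            for move in seq:
                current = apply_move(current, move)
            score = estimate_distance(current)
            if score < best_score:  # strict: ties keep the earlier, shorter sequence
                best_seq, best_score = seq, score
    return best_seq, best_score


def search_until_done(state, moves, apply_move, estimate_distance, is_done, max_rounds=50):
    """Repeat the depth-limited search until is_done(state) or no progress is made."""
    performed = []
    for _ in range(max_rounds):
        if is_done(state):
            break
        seq, _score = best_sequence(state, moves, apply_move, estimate_distance)
        if not seq:  # the heuristic sees no improvement within max_depth moves
            break
        for move in seq:
            state = apply_move(state, move)
        performed.extend(seq)
    return state, performed


# Toy stand-in for the cube: the state is an integer and the goal is 20.
TOY_MOVES = ("inc", "dec", "double")


def toy_apply(state, move):
    return {"inc": state + 1, "dec": state - 1, "double": state * 2}[move]


def toy_distance(state):
    return abs(20 - state)


final_state, performed = search_until_done(
    3, TOY_MOVES, toy_apply, toy_distance, lambda s: s == 20
)
print(final_state, performed)
```

With 18 face turns and a depth of 4, this enumerates roughly 111,000 sequences per round, so on a real cube it would help to score the final states in batches rather than one network call at a time.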
By "formula-based" neural network, I mean this:
- Training a neural network to take an input cube configuration, and predict what specific formula is best to apply in this situation.
- This network will be trained using the same dataset I used for the first version of the Cube Solving algorithm.
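
To make the "formula-based" idea above concrete, here is a small sketch of the step that would come after the network: turning its output scores into a formula, that is, a fixed sequence of moves. The formula table, the names in it, and the `pick_formula` helper are all assumptions for illustration. The project's actual formula set is not listed here.

```python
# Hypothetical formula table: each entry maps a formula name to its move sequence.
FORMULAS = {
    "sune": ["R", "U", "R'", "U", "R", "U2", "R'"],
    "t_perm": ["R", "U", "R'", "U'", "R'", "F", "R2", "U'", "R'", "U'", "R", "U", "R'", "F'"],
    "stop": [],  # the cube is solved, nothing left to do
}
FORMULA_NAMES = list(FORMULAS)


def pick_formula(scores):
    """Return (name, moves) for the formula with the highest network score.

    scores must hold one number per formula, in the same order as FORMULA_NAMES.
    """
    if len(scores) != len(FORMULA_NAMES):
        raise ValueError("expected one score per formula")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    name = FORMULA_NAMES[best_index]
    return name, FORMULAS[name]


# Example with made-up scores standing in for the network's output.
name, moves = pick_formula([0.1, 2.5, -1.0])
print(name, moves)
```

Predicting a whole formula instead of a single move means one wrong prediction costs a full sequence, but it also means the last moves before the solved state come from a known formula rather than from the per-move predictions.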
Finally, while the project is still a work in progress, here is a demo video in which the robot "solves" a Rubik's Cube by following a predetermined formula that I wrote: VIDEO DEMO: YouTube - Robot Following a Formula
Thank you, God Bless!