Game Development Community

General thoughts on a Different CrC type file transfer

by John "Mythic" Henry · in General Discussion · 08/15/2008 (11:49 am) · 4 replies

The Topic really doesn't explain this well, so Lets see if I can describe what I am thinking of...

To begin with, this is nothing more then Thinking aloud...
Im just a good programmer, but not good enough to do this idea.

The basics as we all know of the crc check, is to compare a file that has just been
transfered with a second value, A CRC value, to make sure we got it correctly.

How about we go the opposite of this and send just a Long single value, that
when entered into a specialized program, it converts that value back into the
file that needed to be transfered?? Simple in theory, but initially most probably
not easy to do.

The basic thoughts on this:

You should be able to put together a single 128bit or maybe 256bit binary value
combined with the following additional values sent in the previous packet:
File name, File type, File Size.

So instead of sending a 2Mb file, you send two 256byte packets.
On recieving these packets, the File is recreated line by line using the single
value sent in the second packet.

Logic implies, or so I think its possible, to come up with a Single value that when
run through an application with the 3 values in the first packet you could recreate the
Binary lines that end up as the original file.

The idea is to start with the basic design of a Crc check and reverse it.
I'm sure we could use a few other constants to make sure it can be duplicated.
For example: Date time the (RcR we'll call it for now), was compiled, combined
with how long it took create it... Any number of Constants could be used.

It takes the Idea of mathmatics and compression programs a bit further, or maybe
in a completely different direction. The best way to look at it would be at the Binary
Level with no interest in what the file actually contains. We shouldn't care.

I've never really dug into compression applications like Zip, so I may be barking up
a Dead tree that has been tried before. I'll have to do some hunting around a bit
and see how if ever this has been tried before.

I figured this would be a great place to discuss the Idea itself as a system like this
would be of great use to the gaming community.

Years ago, we didn't have the computing power to create this, I think maybe now we do.

By using two packets, and knowing what File type we are recreating, this would reduce the
processing needed to start with and also gives us our constants that would be the same
this way on any computer it is recreated on.
We should also send the Same Random Seed. Another constant that would simplify it.

Some possible parts would be:
Constant Data Length. Each line, (although it actually is not a Line, just a set length to Calc).
Say 128bits.

So we use the Constants and run them through some sort of Hash check
using the 256 bit value sent through.

Starting the clock at the same point the Original App started, we recreate a line (128bits),
append it to the File created from the First packet data.

At the end of a set cycle, the next Line (128 bits) is created, and this is where
the Random Seed value comes in as the PC Random generator is not actually random :)

This actually could work, thinking this as I'm typing. We know that the PC will reproduce the
same numbers if it is set up correctly to match another PC. But we dont have to depend
on that. We can recreate a generator ourselves that will reproduce the same numbers as
needed. So actually it is not a Random generator, but a Generator of numbers that can be
used in the code.

With start time/end time/file size/seed value and the 256 bit value, this might be possible.

The toughest part is creating a Code that would take the numbers from the generator and
the bits from the original line and come up with a single 256 value that can be inputed back in
and recreate each line again.

Just a bit of rambling this is.
I've been back at work with the engine again now that ArcaneFx is available for Tgea.

#1
08/15/2008 (1:26 pm)
Quote:
Logic implies, or so I think its possible, to come up with a Single value that when
run through an application with the 3 values in the first packet you could recreate the
Binary lines that end up as the original file.

This isn't possible. Even with a 2048 bit hash, you'll have collisions which will make your idea impossible to implement. Two different files, even at the same size, can have the same hash. Hashes are one-way by design, and there's a reason no such solution exists yet. :)
#2
08/15/2008 (1:59 pm)
Note that a 256-byte string represents 2^2048 different values, which is a pretty large space, much larger than an MD5 hash, so while you will get collisions, they will probably be infrequent.

that said, John, i think you probably are barking up a wrong tree here,
but it sounds like an interesting and educational tree, so i'd be curious to see what more you come up with.

one way to explore the idea would be to reduce the data size a lot and see how it would work.
for example instead of a 2MB file and a 2048-bit hash,
consider first just 1-byte files and a 1-bit hash. then move on to a 2-bit hash, etc.
#3
08/15/2008 (6:26 pm)
@John
As far as I know, Hashes and CRC are 1 way. What you are suggesting could loosely be considered as file compression.
#4
08/18/2008 (7:28 am)
Suprising I got responses so fast. I've never looked at this before, It was just an
Idea that came to me while I was sorting out a glitch in the render code, sitting there
waiting for the DataBlocks to transfer... :)

This could be classed as File Compression and transfer combined.
It's a form of file encryption as well if you look at.

I am very comfortable with using compression technology, even dealing
with zip compression built into code. I'd just never dug deeper into it. Shortly after posting
I decided to see what came up on the web and found some good stuff.

The only problem I can see is that everybody working on compression technology is just
trying to improve or enhance the current systems already worked out. No new inovation
that I can find. So if anybody is working along similar lines of thought, thier keeping it closed
and quiet.

I do understand CrC at the basic level and know it was not designed for more then
a way to verify the file after transfer. From the reading I did on, (I believe it was the Huffman
system used by most including zip), the fault occurrance is very low, but it still can be faulty.

I even ran into someone describing how to Defeat the CRC check, reproducing the CrC within
the file itself after any changes.

As for using a Hash table, that was merely the first inklings of the idea. As I've thought on it
some more, I'm just trying to visual a basic idea first. The implementation always come second.

So the thoughts follow this train:

1: A file at the very basic level is [101100111001...]. So the line (size of section to reproduce),
will only EVER contain either a 1 or 0. Simple to begin with.

2: A Line of Code [Say starting small] (8 bits) contains a Finite number of possible combinations.
I know I am right on that one at least :)

Now starting from those two basic Constants: The contents are either [1/0] and set # of combos.
We now need to come up with a way to ReCreate some set number of iterations of this from a
RcR data packet Created from the Original Code.

Lets drop to 4 bits, easy to do by hand and much simpler to start with:

How many combos are possible?

Dec: 0 -- 15
To
Binary: 0 -- 15
0000 0001 0010 0011 0100 0101 0110 0111
1000 1001 1010 1011 1100 1101 1110 1111

So we have only 15 possible combos for any one 4bit line to reproduce.
Remember: We want to reproduce the original lines from one Line of code.

So lets try a Generator that creates a series of Numbers the same every time.
The generator runs at a set period, so for every tick, you can either have a new
psuedo random number OR the next in the series. We can do whatever is needed.
It should ALWAYS recreate the same numbers starting from the initial seed value.

[SM == Sending Machine]
1: we seed the generator with any random byte value, Saving that seed value to send.
2: we generate a single packet of data that corresponds to the Lines we want to reproduce.
3: we send 2 packets to RM (shorthand for Receiving machine).

[RM == Receiving machine]
1:Sets up the generator with the seed number from packet one.
2:Creates an empty file of the correct name from packet one.
3:Starts the clock on the generator and starts recreating the file
one line (4 bits) at a time.
4: Each line is appended to the new file created.

If it goes as planned, the created file could pass a CRC check unless the
CrC uses the File creation date.

So the second packet contains the Data used by the RM to calulate what each line
should be from the numbers generated combined somehow with the second packet data.

Could this be done? This much I might be able to do, so we'll see what kind off test code I
can come up with this week :)

I've already started getting a headache remembering my basics:
A == Generated number
B == Wanted 4bit value (0-15)
C == Data used for Each 4bit value created...

So [ A ? C = B ]
Psuedo code formula [ ? == I dont know ] (LoL).
without knowing HOW [?] im going to use those two values I cant determine either one *sigh*.

At least we know [ C ] will be Constant once created on the Sending machine.
Anybody out there got a PHD in theorectical Mathmatics??
I think I am going to drop down to a 1 bit value to start with, make it even easier
as Orion recommended. The formulas for this may get tough tho.

Also, as we have quite a range of Data available in a single 256 byte packet, we could
maybe even cycle through the Packet values according to a set cyclic period.
There are loads of possible ways to go.