building an audio file from scratch

['building an audio file from scratch', '\nin python\n\n \\n \nthat is the horrifying result of self-made wav files. i cant actually remember why i got interested in this, but it went by with only a few hiccups. most of my work was based off the following website: http://soundfile.sapp.org/doc/WaveFormat/ which has some fantastic visualisations.\n \\n in short, the aim of this project was to create a wav file, fully from scratch, in python. i chose wav files because they are probably the simplest format: the samples are stored uncompressed, so no maths is needed to encode them.\n\n (**header**) \nbasically every file has a header, telling the computer some key info. the header of a wav file starts with either RIFF (little endian, i.e. the least significant byte is stored first) or RIFX (big endian). RIFF is more commonly used. \\n \nthe first part of the header is the chunkID, which is just "RIFF" (or "RIFX") in ascii. this is big-endian regardless. \\n the next 4 bytes are ChunkSize, the size of the file minus the 8 bytes of this field and chunkID. this is calculated last. \\n \nfinally, the header ends with Format, which is just "WAVE" in ascii. big endian again. \n \'\'\' \nChunkID = bytes.fromhex(\'52 49 46 46\') #"RIFF"\nFormat = bytes.fromhex(\'57 41 56 45\') #"WAVE"\n#4 bytes for "WAVE" + the whole fmt chunk (24 bytes) + the whole data chunk (8 + len(data)), so 36 + len(data)\nChunkSize = int_to_hex(4 + (len(Subchunk1ID) + len(Subchunk1Size) + len(AudioFormat) + len(NumChannels) +\n len(SampleRate) + len(ByteRate) + len(BlockAlign) + len(BitsPerSample)) +\n (8 + len(data))) #spoilers \n \'\'\' \n (;)\n (**file**) \nthe file is made of two chunks, fmt and data. \\n \nthe fmt chunk stores data about the format of the audio. 
\\n \n \'\'\' \nSubchunk1ID = bytes.fromhex(\'66 6d 74 20\') #"fmt "\nSubchunk1Size = bytes.fromhex(\'10 00 00 00\') #16 bytes\nAudioFormat = bytes.fromhex(\'01 00\') #no compression/PCM\n\n#you can change these\nNumChannels = bytes.fromhex(\'01 00\') #1 channel\nSampleRate = bytes.fromhex(\'44 AC 00 00\') #44100 samplerate\nBitsPerSample = bytes.fromhex(\'10 00\') #16 bit \n \'\'\' \nyou can see what makes up the fmt part in that snippet, it\'s pretty simple. however, there are two fields which are missing: \n \'\'\' \n#bytes per second = samplerate * channels * bytes per sample\ntemp = int.from_bytes(SampleRate, byteorder="little") * int.from_bytes(NumChannels, byteorder="little") * int.from_bytes(BitsPerSample, byteorder="little") // 8\nByteRate = int_to_hex(temp)\n\n#bytes per sample frame = channels * bytes per sample (// keeps it an int)\ntemp = int.from_bytes(NumChannels, byteorder="little") * (int.from_bytes(BitsPerSample, byteorder="little") // 8)\nBlockAlign = int_to_hex(temp, 2) \n \'\'\' \nthe byterate is just the product of the samplerate, the number of channels and the number of bytes per sample (bits per sample / 8). the block align is just the product of the bytes per sample and the number of channels. (;) \n(**actual_data**) \nnow you have to create the data subchunk. this is very simple, only 3 parts: the subchunk2ID which is just "data" in ascii, subchunk2Size which is the number of bytes in the data part, and the data part, which is a load of bytes. "8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2\'s-complement signed integers, ranging from -32768 to 32767." \\n\nheres an example file:\n \'\'\' \\x52\\x49\\x46\\x46\\x84\\x61\\x74\\x61\\x57\\x41\\x56\\x45\\x66\\x6d\\x74\\x20\\x10\\x00\\x00\\x00\\x01\\x00\\x01\\x00\\x10\\x27\\x00\\x00\\x80\\x38\\x01\\x00\\x01\\x00\\x08\\x00\\x64\\x61\\x74\\x61\\x71\\x3e\\x02\\x00\\xfe\\xff\\xff\\xd9\\x05\\xff\\xff\\xd8\\xf5\\xd6\\x6a\\xff\\xff\\x64\\x71\\xff\\xff\\xff\\xff\\x3f\\x83\\xff\\xff\\x8b\\xf6\\xbb\\xe0\\xff\\xff\\x08 \'\'\' \nfinally, just combine it all. 
i used the "ab" file mode in python, which just appends binary data to a file. \n \'\'\' \n#create file (or overwrite previous)\nwith open("test.wav", "wb"):\n    pass\n\n#write\naudioFile = open("test.wav", "ab")\n\n\n#riff header\naudioFile.write(ChunkID)\naudioFile.write(ChunkSize)\naudioFile.write(Format)\n#fmt sub\naudioFile.write(Subchunk1ID)\naudioFile.write(Subchunk1Size)\naudioFile.write(AudioFormat)\naudioFile.write(NumChannels)\naudioFile.write(SampleRate)\naudioFile.write(ByteRate)\naudioFile.write(BlockAlign)\naudioFile.write(BitsPerSample)\n#data sub\naudioFile.write(Subchunk2ID)\naudioFile.write(Subchunk2Size)\naudioFile.write(data)\n\naudioFile.close() \n \'\'\' \nit must be in this order too. time to make the "data". \n (;) \n(**making_the_sound**) \nmy first thought was a sine wave because they make nice sounds. a pure tone is just a sine wave, and i think all audio can be expressed as a sum of loads of sine waves. seeing as each sample is a measurement of the amplitude of the wave at one moment, i needed the amplitude of a sine wave, which is A(t) = Amplitude × sin(2πft), where f is the frequency of the wave and t is the time along the wave. so, i just needed to write a function to generate a list of bytes that represent a sound wave: \n \'\'\' \nimport math\n\ndef generate_sine_wave(frequency, num_samples, sample_rate=int.from_bytes(SampleRate, byteorder="little")):\n    sine_wave = []\n    for i in range(num_samples):\n        t = i / sample_rate #time of sample i, in seconds\n        amplitude = math.sin(2 * math.pi * frequency * t) #between -1 and 1\n        byte_value = int((amplitude + 1) * 127.5) #shifted to 0-255, unsigned 8bit\n        sine_wave.append(byte_value)\n\n    return bytes(sine_wave) #convert list into a bunch of bytes \n \'\'\' \nand then: \n \'\'\' dataArray.append(generate_sine_wave(440, 441000)) \'\'\' 440 Hz is the frequency of A4 in modern music, the A above middle C. 
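a quick sanity check on the numbers in that call, as a rough sketch. the variable names here are just mine for illustration, and it assumes the 44100 samplerate from the fmt chunk:

```python
sample_rate = 44100   # samples per second (assumed, from the fmt chunk)
frequency = 440       # A4
num_samples = 441000  # the value passed to generate_sine_wave

# how long the generated tone lasts
duration_secs = num_samples / sample_rate    # 10.0 seconds

# how many samples are spent drawing one full cycle of the wave
samples_per_cycle = sample_rate / frequency  # about 100.2 samples

# how many full cycles end up in the file
cycles = frequency * duration_secs           # 4400.0

print(duration_secs, round(samples_per_cycle, 2), cycles)
```

so 441000 samples is ten seconds of A4, with roughly 100 samples describing each cycle.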
then i wrote everything out and it sounded awful, and then i remembered that i was using 16bit, and my sine function was giving an 8bit output, so i rewrote it: \n \'\'\' \nimport struct\n\ndef generate_16bit_sine_wave(frequency, num_samples, sample_rate=int.from_bytes(SampleRate, byteorder="little")):\n    sine_wave = bytearray() #quicker\n    for i in range(num_samples):\n        t = i / sample_rate\n        amplitude = math.sin(2 * math.pi * frequency * t)\n        sample_value = max(-32768, min(32767, int(amplitude * 32767))) #stops it going over the limits\n        sine_wave.extend(struct.pack(\'<h\', sample_value)) #little endian signed 16bit\n\n    return bytes(sine_wave) \n \'\'\' \nthen i went to combine two waves, to play an interval. i thought adding the amplitudes would work, and lucky guess from me because that is indeed how that works. heres the function i wrote to add them, 16 bits at a time: \n \'\'\' \ndef sum_bytes(byte_arrays):\n    max_length = max(len(arr) for arr in byte_arrays)\n    padded_arrays = [arr + bytes(max_length - len(arr)) for arr in byte_arrays] # makes them the same length\n    result = []\n    for i in range(0, max_length, 2):\n        byte_pairs = [arr[i:i+2] for arr in padded_arrays]\n        int_pairs = [int.from_bytes(pair, byteorder="little", signed=True) for pair in byte_pairs]\n        byte_sum = sum(int_pairs)\n        byte_sum = max(-32768, min(byte_sum, 32767)) # clip it\n        result.extend(byte_sum.to_bytes(2, byteorder="little", signed=True))\n\n    return bytes(result) \n \'\'\' \nso then i did this, which would generate two sine waves an octave apart:\n \'\'\' \ndataArray = []\ndataArray.append(generate_16bit_sine_wave(440, 44100*secs))\ndataArray.append(generate_16bit_sine_wave(440*2, 44100*secs))\ndata = sum_bytes(dataArray) \n \'\'\' \nas the samplerate is 44.1khz i can multiply that by a value to return that many seconds of wave. \\n \ni doubled the 440 to make the two waves an octave apart, which should be nice and consonant. \n \nfor some reason, you can only hear the A4. 
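one thing worth knowing about summing: two full-volume sine waves can add up to nearly twice the 16bit range, so the clip in sum_bytes flattens the peaks. below is a minimal sketch of one alternative, averaging the waves instead of summing and clipping. sine_16bit and mix_average are made-up names for illustration (not functions from the code above), and the sketch assumes every wave has the same length:

```python
import math
import struct

def sine_16bit(frequency, num_samples, sample_rate=44100):
    # hypothetical stand-in for the 16bit generator: little endian signed samples
    return b"".join(
        struct.pack("<h", int(math.sin(2 * math.pi * frequency * i / sample_rate) * 32767))
        for i in range(num_samples)
    )

def mix_average(waves):
    # average equal-length 16bit waves sample by sample, so the result
    # can never leave the -32768..32767 range
    count = len(waves[0]) // 2
    unpacked = [struct.unpack(f"<{count}h", wave) for wave in waves]
    mixed = [sum(samples) // len(waves) for samples in zip(*unpacked)]
    return struct.pack(f"<{count}h", *mixed)

a = sine_16bit(440, 1000)
b = sine_16bit(880, 1000)
mixed = mix_average([a, b])

# the plain sum of the two waves overshoots the 16bit limit somewhere
raw_peak = max(x + y for x, y in zip(struct.unpack("<1000h", a), struct.unpack("<1000h", b)))
print(raw_peak > 32767, len(mixed))
```

the trade-off is that averaging makes each individual tone quieter as more waves are added, while summing keeps them loud but clips.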
i added a fifth: \n \'\'\' dataArray.append(generate_16bit_sine_wave(440*(chromaticRatio**7), 44100*secs)) \'\'\' (the chromatic ratio is the twelfth root of two, which is the ratio of frequencies one semitone apart in 12TET. a fifth is 7 semitones. i might do more on this another time) \n \nthat sounds nice, so i added a few more tones: \n \'\'\' \ndataArray.append(generate_16bit_sine_wave(220, 44100*secs))\ndataArray.append(generate_16bit_sine_wave(440, 44100*secs))\ndataArray.append(generate_16bit_sine_wave(440*(chromaticRatio**7), 44100*secs))\ndataArray.append(generate_16bit_sine_wave(880*(chromaticRatio**7), 44100*secs))\ndataArray.append(generate_16bit_sine_wave(440*(chromaticRatio**4), 44100*secs))\ndataArray.append(generate_16bit_sine_wave(440*2, 44100*secs)) \n \'\'\' \n \nit has started clipping, so i\'ve floor-divided the sums by 2, which halves the amplitude, and used these tones: \n \'\'\' \ndataArray.append(generate_16bit_sine_wave(440*2, 44100*secs)) #root 8va\ndataArray.append(generate_16bit_sine_wave(440, 44100*secs)) #root\ndataArray.append(generate_16bit_sine_wave(440*(chromaticRatio**7), 44100*secs)) #5th\ndataArray.append(generate_16bit_sine_wave(440*(chromaticRatio**10), 44100*secs)) #7th\ndataArray.append(generate_16bit_sine_wave(440*(chromaticRatio**4), 44100*secs)) #3rd\ndataArray.append(generate_16bit_sine_wave(440*3, 44100*secs)) #an octave and a fifth above the root (the root two octaves up would be 440*4) \n \'\'\' \n \nsounds quite cool. now time for a progression? 
\n \'\'\' \nsecs = 4\ndataArray.append(b"".join([generate_16bit_sine_wave(440, 44100*secs), generate_16bit_sine_wave(440*(chromaticRatio**2), 44100*secs)]))\ndataArray.append(b"".join([generate_16bit_sine_wave(440*(chromaticRatio**7), 44100*secs), generate_16bit_sine_wave(440*(chromaticRatio**7)*(chromaticRatio**2), 44100*secs)]))\ndataArray.append(b"".join([generate_16bit_sine_wave(440*(chromaticRatio**4), 44100*secs), generate_16bit_sine_wave(440*(chromaticRatio**4)*(chromaticRatio**2), 44100*secs)])) \n \'\'\' \nthis should play the A major triad for 4 seconds, then go a tone above, to B major, for 4 seconds: \n it kinda works, although its difficult to hear the different tones.\nthen i went into doing random noises, like so \n \'\'\' \ndataArray.append(b"".join([generate_16bit_sine_wave(r.randint(50, 5000), r.randint(10000, 80000)) for i in range(50)])) \n \'\'\' \nin fact, the audio file at the top of this page was generated from the following lines: \n \'\'\' \n#the same 5 frequency bands, repeated 6 times, each wave made of 50 random tones\nbands = [(50, 5000), (50, 2000), (2000, 5000), (5000, 10000), (10000, 20000)]\nfor _ in range(6):\n    for low, high in bands:\n        dataArray.append(b"".join([generate_16bit_sine_wave(r.randint(low, high), r.randint(10000, 80000)) for i in range(50)])) \n \'\'\' \n\nsome fun ones: \n \n(**10kHz,_8bit,_2_waves_moving_randomly:**) \n \nthe same at 16bit (still using the 8bit method to generate the amplitude, which causes the weird noise) \n \n \n(**20.005kHz,_16bit,_3_waves**) \nsame as above, here is with 8bit generation: \n \nand 16bit: \n \nwhich sounds worse. still some fun stuff in there i might sample\n \n(**fun_sounds**) \nthis is 44.1kHz, 16bit, one wave moving randomly very fast. i think i undersampled this which is what led to the jumpiness: \n \\n \n44.1kHz, 16bit, 16bit generation, 6 waves apparently: \n \\n \n44.1kHz, 16bit, 14 waves (half of the one at the top). this takes an average rather than summing the 14 waves. \n \\n \nsimilar vibe here, 5 waves instead: \n \nnice one\n\\n\noverall i\'m just happy that the files actually play, even if it\'s nowhere close to what i\'m after. \\n \nso thats about it. see below a gallery of 2 helpful graphics. \n(**gallery**) \nthe photos that helped me do this: \n \n \nthese are gifs for some reason but yes very helpful \\n \nthanks for reading this']
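for anyone who wants the whole layout in one place, here is a rough sketch of the same header, fmt chunk and data chunk packed with python's struct module, then read back with the standard wave module as a check. this is a different route from the hex strings used above, not the original code. build_wav is a made-up name, and the sketch assumes 16bit PCM only:

```python
import io
import math
import struct
import wave

def build_wav(samples, sample_rate=44100, num_channels=1):
    # hypothetical helper: packs signed 16bit samples into RIFF/WAVE bytes
    bits_per_sample = 16
    data = struct.pack(f"<{len(samples)}h", *samples)
    block_align = num_channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align              # bytes per second
    # "fmt " id, chunk size 16, format 1 (PCM), then the format fields
    fmt_chunk = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, num_channels,
                            sample_rate, byte_rate, block_align, bits_per_sample)
    data_chunk = struct.pack("<4sI", b"data", len(data)) + data
    # everything after the first 8 bytes: "WAVE" + both chunks
    chunk_size = 4 + len(fmt_chunk) + len(data_chunk)
    return struct.pack("<4sI4s", b"RIFF", chunk_size, b"WAVE") + fmt_chunk + data_chunk

# one second of A4 at full volume
samples = [int(math.sin(2 * math.pi * 440 * i / 44100) * 32767) for i in range(44100)]
wav_bytes = build_wav(samples)

# read it back with the standard library to check the header is understood
with wave.open(io.BytesIO(wav_bytes), "rb") as w:
    params = (w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())

print(params)  # (1, 2, 44100, 44100)
```

writing wav_bytes to a file opened in "wb" mode should give something a normal audio player can open.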