My view (based on a lot of RISC research) is that this isn't a meaningful question. As computer architect John Mashey said: there are CPUs for which RISC/CISC is probably not a very relevant classification, such as older CPUs or embedded processors or specialized ones. Tortuous arguments about whether a 6502, or a PDP-8, or an IBM 360/44, or an XDS Sigma 7, etc., are RISC or CISC do not usually lead to great insight.
How about neither? CISC and RISC are extremes (for CISC, look up the VAX instruction set; the INDEX instruction, for example). There's plenty of room in the middle.
Some of the people in that 2005 discussion got it: 6502 is neither RISC nor CISC, it predates either concept.
It's an impoverished instruction set, trying to get the most utility possible out of a very limited number of transistors that you could fit on a chip at the time.
Early mainframes (pre-IBM 360) and minicomputers (DEC PDP-8, DG Nova) were designed under similar constraints and were similarly quirky and hard to use effectively. They also tended to have very poor code density which, combined with small memory sizes, made things such as bytecode interpreters and threaded code popular, despite the order of magnitude (or more) slowdown that imposed on already slow machines.
It is interesting to contrast the 6502 with the earlier 6800, designed by many of the same people at their previous employer. The 6800 is in fact much more RISC-like, with a very regular instruction set and addressing modes no more complex than a 16-bit index register plus an 8-bit offset, which is very familiar to anyone working with modern ISAs such as RISC-V.
The problem the 6800 has is that there is only one register that can be used as a pointer, which means something as simple as a memcpy, or adding together elements from two arrays, requires you to load the single index register from memory, use it to access the data, (probably) increment it, and write it back to memory. All this takes 18 cycles and 7 bytes of code. And then you do the same with the destination (or other source) pointer.
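To make the 18-cycle, 7-byte figure concrete, here is a small Python tally of one plausible 6800 sequence for the load side: LDX, LDAA 0,X, INX, STX, with the pointer kept in the direct page (the first 256 bytes). The per-instruction sizes and cycle counts are assumptions based on my reading of the Motorola 6800 timing tables, and `src_ptr` is a hypothetical label.

```python
# Hypothetical 6800 sequence: read one byte through a pointer stored in the
# direct page, then advance the pointer and write it back.
# (mnemonic, size in bytes, cycles); timings assumed from the 6800 datasheet.
LOAD_SIDE_6800 = [
    ("LDX src_ptr", 2, 4),  # fetch the 16-bit pointer into the only index register
    ("LDAA 0,X",    2, 5),  # read the byte it points at
    ("INX",         1, 4),  # advance the pointer
    ("STX src_ptr", 2, 5),  # write the pointer back to memory
]


def tally(seq: list[tuple[str, int, int]]) -> tuple[int, int]:
    """Return (total bytes, total cycles) for a list of instructions."""
    return sum(b for _, b, _ in seq), sum(c for _, _, c in seq)


print(tally(LOAD_SIDE_6800))  # (7, 18)
```

Every array touched in the loop repeats this pointer dance, because X has to be reloaded and saved each time.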
The 6502, on the other hand, uses its indirect addressing modes to let you work directly with up to 128 16-bit pointers stored in Zero Page. And if you're accessing the same elements in each array/string, you can use the Y register as an offset for all of them and only have to INY to step between them (checking for Y wrapping if the arrays are larger than 256 bytes).
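A minimal Python sketch of how the 6502's indirect indexed mode, written (zp),Y, forms an address: the 16-bit base pointer is read little-endian from zero page and Y is added to it. It assumes NMOS behaviour, where the high-byte fetch wraps within zero page, and a flat 64 KiB bytearray as memory; `indirect_indexed_address` and `copy_with_y` are illustrative names, not taken from any emulator.

```python
def indirect_indexed_address(mem: bytearray, zp: int, y: int) -> tuple[int, bool]:
    """Return (effective_address, page_crossed) for a (zp),Y operand."""
    lo = mem[zp & 0xFF]
    hi = mem[(zp + 1) & 0xFF]  # wraps $FF -> $00, staying inside zero page
    base = (hi << 8) | lo
    addr = (base + (y & 0xFF)) & 0xFFFF
    # Crossing into a new page is what costs the extra cycle on loads.
    page_crossed = (addr & 0xFF00) != (base & 0xFF00)
    return addr, page_crossed


def copy_with_y(mem: bytearray, src_zp: int, dst_zp: int, count: int) -> None:
    """Model the loop LDA (src),Y / STA (dst),Y / INY for up to 256 bytes.

    Both pointers stay in zero page untouched; only Y changes.
    """
    if not 0 <= count <= 256:
        raise ValueError("a single Y sweep covers at most 256 bytes")
    for y in range(count):
        s, _ = indirect_indexed_address(mem, src_zp, y)
        d, _ = indirect_indexed_address(mem, dst_zp, y)
        mem[d] = mem[s]


mem = bytearray(65536)
mem[0x10], mem[0x11] = 0x00, 0x20  # source pointer $2000 at $10/$11
mem[0x12], mem[0x13] = 0x00, 0x30  # destination pointer $3000 at $12/$13
mem[0x2000:0x2004] = b"ABCD"
copy_with_y(mem, 0x10, 0x12, 4)
print(bytes(mem[0x3000:0x3004]))  # b'ABCD'
```

Larger copies would bump the pointers' high bytes in zero page each time Y wraps, which is the check the comment above mentions.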
Loading a byte from one array, storing it to another, and bumping Y takes 5 bytes of code and 13-14 cycles in total on 6502, vs 14 bytes of code and 36 cycles on the 6800.
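The 6502 side of that comparison can be checked with a small tally. This sketch assumes the standard NMOS timings: LDA (zp),Y takes 5 cycles plus 1 when the access crosses a page, STA (zp),Y always takes 6, and INY takes 2. `src` and `dst` are hypothetical zero-page labels.

```python
# (mnemonic, size in bytes, base cycles, +1 cycle if the indexed access crosses a page)
COPY_STEP_6502 = [
    ("LDA (src),Y", 2, 5, True),   # load via a zero-page pointer plus Y
    ("STA (dst),Y", 2, 6, False),  # stores always pay the indexing cycle: fixed 6
    ("INY",         1, 2, False),  # step to the next element
]


def tally(seq: list[tuple[str, int, int, bool]], page_crossed: bool) -> tuple[int, int]:
    """Return (total bytes, total cycles), adding the page-cross penalty where it applies."""
    size = sum(b for _, b, _, _ in seq)
    cycles = sum(c + (1 if penalty and page_crossed else 0) for _, _, c, penalty in seq)
    return size, cycles


print(tally(COPY_STEP_6502, page_crossed=False))  # (5, 13)
print(tally(COPY_STEP_6502, page_crossed=True))   # (5, 14)
```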
Also, the 6800's second accumulator, B, gets very little use, mostly as a loop counter, which makes most of the instructions that can operate on it a waste of opcode space and silicon.
The 6502 is really quite ugly and complex in comparison to the 6800, but it is much more effective in its use of code bytes, memory cycles, and transistors.
The Z80 is, amazingly, very comparable overall to the 6502 in code density and use of memory cycles despite a very different design, but it uses twice as many transistors. The 8080 also uses more transistors than the 6502 while being quite a bit less effective than either the Z80 or the 6502.
The first ARM was microcoded.[1] The DEC Alpha used microcode-like firmware (PALcode) for a number of functions. There are RISC-V implementations using microcode.[2]
Not all CISC chips use microcode. Indeed, you can argue that the 6502 and Z80 don't have microcode but rather a ROM sequencer. It depends on how fine you want to grind your axe.
Second, it is totally normal for a production design to be a non-ideal product, full of compromises.
The whole idea of CISC is relatively simple hardware, with many things done under the hood by software (microcode). Because of this, it is normal on CISC for some instructions to take more than 20 cycles (some PDP instructions took 48 cycles), while others take 3-8 cycles.
RISC avoids long operations. Moreover, the whole idea of RISC is to make the hardware extremely straightforward and uniform, so that ideally ALL instructions take the same number of cycles.
But remember the compromises. The power of CISC was that its microcode could be updated much more easily than RISC hardware. Because of this, some IBM processors (certainly CISC ones) even offered the option of creating a unique CPU with its own opcodes for EACH client.
Even more: some IBM processors could load new microcode at boot time, and IBM built its line of mainframes on this. In the top machines nearly all standard opcodes were implemented in hardware, while in the cheapest machines most operations were carried out by microcode.
For RISC, yes, ARM also offers an option to include custom client opcodes, but that came in a totally different era and at a totally different price.
My favorite comment: "Much of the software written in the 70s was not primitive. The 6502 ISA was designed for hand-written assembly code, fast memory (relative to the CPU clock), what they could fit on a chip, and what microprocessor programmers were used to."
The ensuing argument over whether the 6502 ISA was primitive is a great nerd fight.